GROTOAP2

GROTOAP2 (GROund Truth for Open Access Publications) is a dataset for training and performance evaluation in document content analysis tasks, such as document zone classification. It is the successor to the GROTOAP dataset.

GROTOAP2 was built automatically from the PubMed Central Open Access Subset. It contains 13,210 ground truth files that store the geometrical and logical structure of the articles' content. The corresponding PDF files can be downloaded from the PMC repository using the provided script.

This repository contains a sample of 132 ground truth files. The full dataset can be downloaded from: http://cermine.ceon.pl/grotoap2/.

Publisher: RepOD

Publication year: 2014

Related publication: http://dx.doi.org/10.1045/november14-tkaczyk

Type of resource: Dataset

Area of study: Technology and engineering

Funder: European Commission

Funding program: FP7

Grant number: 283595

License for files: CC-BY-4.0

Authors

Author                  Affiliation
Tkaczyk, Dominika       ICM, University of Warsaw
Szostek, Paweł          ICM, University of Warsaw
Bolikowski, Łukasz      ICM, University of Warsaw

Cite this dataset as:

Tkaczyk, D.; Szostek, P.; Bolikowski, Ł. (2014) GROTOAP2. RepOD. http://dx.doi.org/10.18150/8527338

Publicly available in RepOD since: 2015-09-29 11:10 (CEST)