GROTOAP2 (GROund Truth for Open Access Publications) is a dataset for training and evaluating document content analysis methods, such as document zone classification. GROTOAP2 is the successor of the GROTOAP dataset.

GROTOAP2 was built automatically from the PubMed Central Open Access Subset. It contains 13,210 ground truth files that store the geometrical and logical structure of the articles' content. The corresponding PDF files can be downloaded from the PMC repository using the provided script.

This repository contains a sample of 132 ground truth files. The full dataset can be downloaded from:

Publisher: RepOD

Publication year: 2014

Related publication:

Type of resource: Dataset

Area of study: Technology and engineering

Funder: European Commission

Funding program: FP7

Grant number: 283595

License for files: CC-BY-4.0

Author                  Affiliation
Tkaczyk, Dominika       ICM, University of Warsaw
Szostek, Paweł          ICM, University of Warsaw
Bolikowski, Łukasz      ICM, University of Warsaw

Cite this dataset as:

Tkaczyk, D.; Szostek, P.; Bolikowski, Ł. (2014) GROTOAP2. RepOD.

Publicly available in RepOD since: 2015-09-29 11:10 (CEST)
