Why share research data in an open repository?
Open sharing of data can bring advantages both to the development of science and to the researcher's career:
- sharing data allows its re-analysis and encourages new interpretations,
- open data can be used for conducting new research, and it can be merged with existing data to create new combinations,
- open data can be used both by other scientists and by people from outside the academic community,
- sharing data makes it easier to check whether published scientific works are based on reproducible results,
- depositing research data in a repository guarantees safe, long-term storage,
- preparing data for sharing requires that you describe it properly, so it becomes easier to use in the future,
- data deposited in a repository has a fixed URL and a DOI (digital object identifier) number, which makes proper citation easier and allows the researcher to include a list of published datasets in their CV,
- data in a repository is supplied with a standardized set of metadata, which makes it easier to find,
- repositories may supply researchers with information on how often their data is viewed and downloaded.
How to share data in the Repository for Open Data?
Who can deposit data in RepOD?
Data can be deposited by every registered user who conducts research and wants to share data pertaining to their research project. You only need to remember that the person depositing data must have all intellectual property and related rights necessary for open sharing of the materials (RepOD Legal Guide).
What kind of data can be deposited?
Any data that was created, collected or annotated for the purpose of scientific research can be shared in RepOD. The repository is intended for sharing data from all areas of knowledge. The data can be raw or pre-processed, provided that the description contains information on all the actions performed. The size of a single dataset may not exceed 50 GB. If you would like to deposit a larger dataset, please contact us at firstname.lastname@example.org. Please do not deposit datasets that have already been published and assigned a DOI number.
How do you deposit your data?
- First, you need to register on our website. Every registered user has their own user folder associated with their account. All your published datasets are listed in your user folder.
- To start depositing data, select the “Create dataset” option from the menu. Then enter the metadata (the information that describes your dataset) into the provided fields, following the instructions next to the fields, and click “Next: add data”. This creates a draft version of your dataset. You can now proceed to uploading your prepared data files (see How to prepare data for sharing?). You can leave the deposit procedure at any time and continue later. To do this, go to your “Dashboard”: in the “My datasets” section you will find a list of all your datasets, including drafts. A DOI number is reserved for your dataset already at the draft stage, and it will not change later. If individual files in your dataset exceed 200 MB in size, please use the RepOD API to upload them (for instructions and scripts see the link at the bottom of the page). If your files exceed 8 GB, please contact us at email@example.com. Please also remember that the maximum size of a dataset is 50 GB. Larger datasets may be deposited only by individual arrangement.
- You can now assign your dataset to a group (for example, a group associated with your research institution, or with a project under which your research is conducted). In order to do this, please go to the page of your group of interest and click “Join group”. Then go to your dataset's page, go to the section “Groups”, and choose the group from the list. You may assign the dataset to more than one group, if you find that appropriate.
- When your dataset is ready for publication, please click “Submit dataset”. The word “Draft” next to your dataset's title will now vanish, and your dataset will appear in your user folder.
- So far, your dataset is only visible to you (it is marked “Private”). Now the repository editor will check whether the deposited files can be considered research data at all (and not spam), and whether there are any errors in the entered metadata (the editor may correct basic typing mistakes; for more serious corrections, they will contact the dataset creator). The editor does not review the contents of the dataset, its merits or quality. After approval by the editor, the dataset will become visible to everyone and the DOI number will be registered. The approval process may take up to 2 working days. When the dataset is visible to the public, the marking “Private” will disappear, and you will receive an e-mail notification.
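The size limits mentioned in the steps above (200 MB per file for upload through the website, 8 GB per file, 50 GB per dataset) can be checked before you start uploading. The following is only a minimal sketch: the function names and the labels "web", "api" and "contact" are made up for illustration and are not part of RepOD, and it assumes decimal units (1 MB = 1,000,000 bytes), which the text above does not specify.

```python
from pathlib import Path

# Assumption: decimal units; RepOD may count sizes differently.
MB = 1000 ** 2
GB = 1000 ** 3

API_UPLOAD_THRESHOLD = 200 * MB  # larger files should go through the RepOD API
MAX_FILE_SIZE = 8 * GB           # larger files require contacting the repository
MAX_DATASET_SIZE = 50 * GB       # larger datasets require individual arrangements


def classify_file(size_bytes: int) -> str:
    """Return a hypothetical label describing how a file of this size is handled."""
    if size_bytes > MAX_FILE_SIZE:
        return "contact"
    if size_bytes > API_UPLOAD_THRESHOLD:
        return "api"
    return "web"


def check_dataset(folder: str) -> dict:
    """Summarise all files below `folder` against the limits listed above."""
    sizes = {
        str(path): path.stat().st_size
        for path in Path(folder).rglob("*")
        if path.is_file()
    }
    total = sum(sizes.values())
    return {
        "routes": {name: classify_file(size) for name, size in sizes.items()},
        "total_bytes": total,
        "within_dataset_limit": total <= MAX_DATASET_SIZE,
    }
```

Calling check_dataset("my_data") on a local folder (the folder name is a placeholder) returns one label per file together with the total size, so you can see in advance which files need the API and whether you should contact the repository first.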
Can you modify the deposited data?
You cannot modify a published dataset yourself. If you find it necessary to change something in the dataset (e.g. you notice a mistake in your metadata), please contact the repository editor by e-mail (at firstname.lastname@example.org) and ask them to make the changes.
If you only want to update a dataset, you should deposit a new version by creating a new dataset with a new version number. In this case, it is best to state in the title and/or description that the dataset is a subsequent version of a previously published dataset. You can also contact the repository editor and ask them to modify the description of the older dataset, so that it contains a link to the updated version.
Can the deposited data be withdrawn and how?
Because the data may have been cited in scientific publications, withdrawing data is only possible in justified cases and requires intervention from a RepOD editor. If you need to withdraw your data, please contact us via e-mail (email@example.com) and explain why the withdrawal is necessary. If the data is withdrawn, the current URL will lead to a message stating that the data has been withdrawn, and the assigned DOI number will continue to direct users to that page.
What are RepOD groups and where do they come from?
Groups allow datasets to be combined into collections. A scientific unit (a research institute, a university, a faculty, etc.) or a research project can have its own group. Groups are open: any repository user can join an existing group. New groups are created by the website editor. Any registered user who has shared or would like to share a dataset can send an e-mail request to the RepOD editor (at firstname.lastname@example.org), asking for a new group to be created for their dataset. For example, this could be a group corresponding to a scientific institution the user is affiliated with. It is also possible to assign one dataset to several groups, for instance groups corresponding to the different scientific institutions with which the authors of a given dataset are affiliated.
How to prepare data for sharing?
A dataset in our repository is just a collection of files with a description in the form of metadata. The dataset can be a collection of all data related to one publication or one research project, research question or experiment. The decision about the scope of data combined into one dataset lies solely with the dataset's creator. The dataset's structure, and the number and formats of files that constitute it, can be shaped accordingly. Please refer to the guidelines below:
- You can use any file format, but it is better to avoid formats that require one particular (especially commercial) program to work properly. Remember that the data will be stored over the long term, and popular programs appear and disappear all the time, so it is best to use formats that do not depend on particular software. For example, it is better to store tabular data as .csv and not .xls. You can also deposit the same table in two formats.
- If you have a lot of files to deposit and the files can be grouped in a sensible manner, you can consider packaging them into .zip archives. You should however note that no preview will be available for packaged files.
- A single file in the dataset cannot exceed 8 GB. If it is not possible to divide your data into files of this size, please contact us. We will try to find a solution together.
- The naming pattern of the files should be informative and transparent; this can significantly help users to understand and use the data.
- Not all metadata fields are compulsory, but try to fill as many as possible. The more information gets stored in the metadata, the bigger the chance that people who are genuinely interested in your data will be able to find it.
- In order for the data to be usable, the research process should be described in as much detail as possible. You can put the methodology descriptions into the data files themselves, but if – for some reason – this is not a good solution in your case, you can also create a separate ReadMe.txt file in which the research methodology and context will be described. If there is a scientific publication describing how the data was collected, you should link to that as well, pasting the link into the provided field when entering the metadata.
- The description of the data (metadata) can be written in Polish or English. We encourage you to write in English, as it will make the shared data available to more people.
- Research data can be shared in many ways. We encourage you to use one of the open licences (CC0, CC-BY), but it is also possible to share data in RepOD without specifying a licence, under the conditions of fair use. Please remember that in order to share data, you need to have the necessary legal rights (including verbal consent of all co-authors). Please also note that it is the depositor who is responsible for proper anonymization of all personal and sensitive data. More information on legal issues can be found in the RepOD Legal Guide.
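Two of the guidelines above, writing a separate ReadMe.txt and packaging many files into a .zip archive, can be scripted with the Python standard library. This is a sketch under stated assumptions, not a RepOD tool: the function names, the layout of the ReadMe file and the archive name are all hypothetical, and you should adapt the ReadMe contents to your own project.

```python
import zipfile
from pathlib import Path


def write_readme(folder: str, methodology: str, publication_url: str = "") -> Path:
    """Write a plain-text ReadMe.txt describing how the data was produced."""
    lines = ["Methodology and context", "", methodology.strip(), ""]
    if publication_url:
        lines += ["Related publication:", publication_url, ""]
    readme = Path(folder) / "ReadMe.txt"
    readme.write_text("\n".join(lines), encoding="utf-8")
    return readme


def package_folder(folder: str, archive_path: str) -> list[str]:
    """Zip every file below `folder`, keeping relative paths as archive names."""
    root = Path(folder)
    archive = Path(archive_path).resolve()
    names = []
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(root.rglob("*")):
            # Skip directories and the archive itself if it lives inside `folder`.
            if path.is_file() and path.resolve() != archive:
                arcname = path.relative_to(root).as_posix()
                zf.write(path, arcname)
                names.append(arcname)
    return names
```

Since no preview is available for packaged files, it may be worth uploading ReadMe.txt next to the archive as well as inside it, so that users can read the description without downloading the whole package.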
Who can use the deposited data and how?
Viewing and downloading data does not require registration. Any interested party can view, download and use the shared files. Any restrictions on using the data result only from the legal status of the files: the open licence granted by the depositor or the provisions of fair use (RepOD Legal Guide).
Please note that regardless of the legal status of the data, good scientific practice requires that you state the name of the author(s) and the source whenever you use the data – even if it is shared under a CC0 licence. The only exception to this rule is when such attribution is technically impossible (e.g. automatic data mining). If you use the data in your own publication, you should formally cite both the dataset available from the repository and the related scientific publication, if it exists.
Here is an example of a proper citation format for a dataset downloaded from RepOD:
Kowalska, A., Wiśniewska, B. (2015) Results of measurements from 2002-2004. RepOD. http://dx.doi.org/10.5072/0069791
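If you cite many datasets, the pattern shown in the example above (authors, year, title, repository name, DOI link) can be produced by a small helper. This is only an illustrative sketch: the function name and its parameters are hypothetical, and the pattern is inferred from the single example above rather than from a formal citation standard.

```python
def format_citation(authors, year, title, doi, repository="RepOD"):
    """Build a citation string following the pattern of the example above.

    `authors` is a list of names already written as "Surname, I.".
    """
    author_part = ", ".join(authors)
    return f"{author_part} ({year}) {title}. {repository}. http://dx.doi.org/{doi}"


citation = format_citation(
    ["Kowalska, A.", "Wiśniewska, B."],
    2015,
    "Results of measurements from 2002-2004",
    "10.5072/0069791",
)
print(citation)
```

With the values shown, the printed line is identical to the example citation above.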