Data repositories or archives
A data repository allows researchers to upload and publish their data, thereby making the data available for other researchers to re-use. Similarly, a data archive allows users to deposit and publish data, but it will generally offer greater levels of curation to community standards, have specific guidelines on what data can be deposited, and be more likely to offer long-term preservation as a service. The terms “data repository” and “data archive” are sometimes used interchangeably.
A data repository or archive will typically:
- Assign a persistent identifier such as a “digital object identifier” (DOI); a DOI makes the data easier to discover and cite
- Assist with metadata provision, e.g. through the use of a template
- Allow you to apply a licence to your data
- Aid compliance with the FAIR data principles (data that are Findable, Accessible, Interoperable, and Reusable), because data are published online with appropriate metadata and are assigned a persistent identifier; see Jones, Sarah, & Grootveld, Marjan. (2017, November). How FAIR are your data? Zenodo. http://doi.org/10.5281/zenodo.1065991
- Accept a wide range of data types
- Provide long-term access and, in some cases, long-term preservation
- Offer useful search, navigation and visualisation functionality
- Help your data reach a wider audience of potential users
- Manage requests for data on your behalf
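To illustrate why a persistent identifier aids citeability, the sketch below turns a DOI, however it was pasted, into a link through the doi.org resolver. The function name `doi_to_url` and the list of prefixes it strips are assumptions made for this example, not part of any repository's tooling; the DOI used is the one cited above.

```python
def doi_to_url(doi: str) -> str:
    """Return the https://doi.org/ resolver URL for a DOI string."""
    cleaned = doi.strip()
    # Strip common prefixes that people paste in front of the bare DOI.
    prefixes = (
        "https://doi.org/",
        "http://doi.org/",
        "https://dx.doi.org/",
        "http://dx.doi.org/",
        "doi:",
    )
    for prefix in prefixes:
        if cleaned.lower().startswith(prefix):
            cleaned = cleaned[len(prefix):]
            break
    # A DOI starts with the "10." directory indicator and has a "/" separator.
    if not cleaned.startswith("10.") or "/" not in cleaned:
        raise ValueError(f"not a DOI: {doi!r}")
    return "https://doi.org/" + cleaned


print(doi_to_url("http://doi.org/10.5281/zenodo.1065991"))
# prints https://doi.org/10.5281/zenodo.1065991
```

Because the identifier stays the same even if the repository reorganises its website, a citation built this way keeps working over time.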
When to select a data repository?
Choose early so that you can familiarise yourself with the repository’s requirements. Requirements may include:
- depositing in certain file formats
- using a specific metadata standard
- including documentation that helps describe your data.
Understanding such requirements will enable you to design your data collection materials for easier metadata and documentation creation.
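As a rough illustration of planning for these requirements, the sketch below checks a draft metadata record against a list of required fields before deposit. The field names in `REQUIRED_FIELDS` and the sample record are hypothetical; a real repository publishes its own metadata standard, which should replace them.

```python
# Hypothetical required fields; substitute the repository's own list.
REQUIRED_FIELDS = ("title", "creators", "description", "licence", "file_format")


def missing_fields(record: dict) -> list[str]:
    """Return the required fields that are absent or empty in a record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


# Invented draft record: the description is empty and no licence is set.
draft = {
    "title": "Example survey responses",
    "creators": ["A. Researcher"],
    "description": "",
    "file_format": "csv",
}

print(missing_fields(draft))
# prints ['description', 'licence']
```

Running a check like this during data collection, rather than at deposit time, shows early which documentation still needs to be written.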
How to select a data repository
Ask:
- Is it reputable? Is it listed in re3data.org, meaning that it meets the registry's conditions of inclusion?
- Is it appropriate to your discipline?
- Will it take the data you want to deposit?
- Is there a size limit?
- Does it provide a DOI / persistent identifier?
- Does it provide guidance on how the data should be cited?
- Does it provide access control, where necessary, for your research data?
- Does it ensure long-term preservation / curation?
- Does it provide expert help with e.g. metadata provision, curation?
- Is there a charge?
Other questions may apply depending on your requirements. For more information, see the checklist from the UK’s Digital Curation Centre.
Locate a data repository
Some universities have their own data repositories where researchers can deposit, share and license their data resources for discovery and use by others. There are more than 600 discipline-specific data repositories worldwide with community-specific standards. They may also be called data centres or archives.
re3data.org (Registry of Research Data Repositories) is the primary place to locate a data repository. You can search it by research discipline and then filter by access category, data usage licence, whether the repository assigns a persistent identifier to the data, and so on.
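The search-then-filter approach described above can be sketched on a small local list of candidates. This does not call re3data.org; the `Repository` class, its fields, and the three entries are invented purely to show how filtering by discipline, access category and persistent identifier narrows a shortlist.

```python
from dataclasses import dataclass


@dataclass
class Repository:
    name: str
    subjects: tuple      # disciplines the repository covers
    data_access: str     # e.g. "open", "restricted", "closed"
    assigns_pid: bool    # whether deposits receive a persistent identifier


def shortlist(repos, subject, access="open", require_pid=True):
    """Return names of repositories matching subject, access and PID needs."""
    wanted = subject.lower()
    return [
        r.name
        for r in repos
        if wanted in (s.lower() for s in r.subjects)
        and r.data_access == access
        and (r.assigns_pid or not require_pid)
    ]


# Invented entries for illustration only.
candidates = [
    Repository("Example Social Archive", ("Social Sciences",), "restricted", True),
    Repository("Example Open Store", ("Social Sciences", "Economics"), "open", True),
    Repository("Example Lab Server", ("Economics",), "open", False),
]

print(shortlist(candidates, "social sciences"))
# prints ['Example Open Store']
```

Relaxing a filter, for example passing `access="restricted"`, changes the shortlist, which mirrors how adjusting the filters in the registry widens or narrows the results.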
Re3data uses a series of symbols to indicate key services.
To be registered in re3data.org a research data repository must:
- be run by a legal entity, such as a sustainable institution (e.g. library, university)
- clarify access conditions to the data and repository as well as the terms of use
- have a focus on research data
Discipline-specific repositories have the expertise and resources to deal with particular types of data. They have different policies and may charge for their services.
See also PLOS recommended repositories
Multidisciplinary repositories
If there is no discipline-specific repository in your area, select a general repository. These can handle a variety of different data types. Charges may apply but can be included in a funding application. Key general repositories are listed below. This list is for information purposes only and is not exhaustive:
Data Hub provides free access to its core features, letting you search for data, register published datasets, and create and manage groups of datasets
Dataverse
Dryad hosts a wide range of data types. For some journals there is no charge to deposit in Dryad.
figshare archives data and software for all subjects and is suitable for small to medium-sized projects that do not require specialised curation
GitHub is a code hosting site where you can store and share code for free
Open Science Framework
Zenodo is a multi-disciplinary data repository where researchers can deposit both publications and data and create links between them
See also:
ICPSR is an international consortium of more than 750 academic institutions and research organizations that maintains a data archive of more than 250,000 files of research in the social and behavioral sciences
How to find a trustworthy repository for your data? Guides for Researchers from OpenAIRE