
Research Data Management: Metadata


Metadata is similar to Documentation (see related tab) but is more structured, conforms to set standards and is machine readable. It is required to facilitate archiving, discovery and citation of the dataset.

What is metadata?

Metadata is a formal, structured description of a dataset, used by archives to create catalogue records. There are three categories of metadata:

Descriptive metadata includes author, title, keywords and abstract, and enables users to find resources online.

Administrative metadata includes information about when and how a resource was created as well as file type, technical information and access rights.

Structural metadata provides information about the relationship between the parts that make up a compound object, e.g. relating articles, issues and volumes of serial publications, or the pages and chapters of a book.

Metadata describes the content, quality, condition, and other characteristics of a dataset. It enables data to be preserved, minimizes duplication of effort in the collection of expensive digital data and fosters the sharing of digital data resources. Good metadata answers the following questions:

  • Who created the data?
  • What is the content of the data?
  • When were the data created?
  • Where are the data located geographically?
  • How were the data developed?
  • Why were the data developed?

Example of metadata from the Library catalogue

Why is metadata essential?

Metadata enables data developers to:

  • Avoid data duplication by checking whether the data already exist
  • Share reliable information about a dataset by creating metadata for it
  • Reuse a dataset with confidence in its origins and quality
  • Publicize the data they have created by making the metadata available in repositories
  • Cite their datasets and increase the visibility of the data.

Metadata enables users to:

  • Search for and get access to data from a variety of sources
  • Restrict searches to a geographic region
  • Determine whether the data will be applicable for use in a particular study
  • Acquire a dataset
  • Know the restrictions on how a dataset may be used

Metadata enables organizations to:

  • Safeguard their investment in their data by retaining information about how it was collected, processed, quality controlled, used and restricted
  • Create a permanent record of the dataset, which is critical to institutional memory
  • Ensure that datasets “live on” for the organization after researchers leave or retire
  • Re-use a dataset in another research project, if appropriate, so that future researchers know how the dataset was created
  • Advertise its research and enable new partnerships and collaborations by data sharing

Essential Fields

Title: Name of dataset or research project that produced it. (Include both if applicable.)

Creator(s): Names and addresses of the group that created the data.

Identifier: Unique identifier or number that is used to identify the data. This could be an internal project number or code to reference the data.

Abstract/Description: A brief synopsis of the project or data that another researcher can review quickly to see the relevance of the project to what they are seeking.

Dates: All the dates associated with the project. The most important is probably the release date of the data, but you'll eventually want to include:

  • start and end date of the project
  • time period covered by the data or project
  • maintenance cycle of the data
  • update schedule of the data
  • any other important dates that will help document the process and aid in preservation

Rights: Any known intellectual property rights held for the data or project.
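
The essential fields above can be captured in a simple machine-readable record. The following is a minimal sketch in Python, not a formal metadata standard: the lowercase key names, the helper `missing_essential_fields`, and every value in the example record are assumptions made for illustration.

```python
import json

# Essential fields named in this guide; the lowercase keys are an assumed convention.
ESSENTIAL_FIELDS = ("title", "creators", "identifier", "description", "dates", "rights")


def missing_essential_fields(record):
    """Return the essential fields that are absent or empty in a record."""
    return [field for field in ESSENTIAL_FIELDS if not record.get(field)]


# Hypothetical example record; every value is invented for illustration.
example_record = {
    "title": "Example Soil Moisture Survey",
    "creators": ["Example Research Group, Example University"],
    "identifier": "PROJ-0001",
    "description": "Daily soil moisture readings from a hypothetical field site.",
    "dates": {
        "released": "2020-01-31",
        "project_start": "2018-01-01",
        "project_end": "2019-12-31",
    },
    "rights": "",  # left empty on purpose so the check below reports it
}

# Report which essential fields still need to be filled in.
print(missing_essential_fields(example_record))

# Serialize the record as JSON so that software can read it.
print(json.dumps(example_record, indent=2))
```

Running a check like this before depositing a dataset is one way to catch incomplete records early. A repository may require different or additional fields.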

Recommended Fields

Contributor(s): Names and addresses of additional individuals that contributed to the project.

Subject: Keywords, phrases, or subject headings that will describe the subject or content of the data. (In adding these, think of how you would search for the materials.)

Funders: Organizations or agencies that funded the research or project.

Access Information: The location of the data and how the researcher can access the materials. (Confidentiality can be addressed here as well.)

Language: The language(s) of the content.

Location: If the data relates to a physical location, the spatial coverage should be documented.

Methodology: The process of how the data was generated, including:

  • the equipment and software used, including the version
  • the experimental protocol
  • data validation and quality assurance of the data
  • any other relevant information

Data Processing: Documenting the alterations made to the data will aid in preservation of the data and record who made changes and for what reasons at specific times.

Sources: Citations for the sources that were used during the project. (Include where the other data or material was stored and how it was accessed when appropriate.)

List of File Names:  List all of the data files associated with the project and include the file extensions. (e.g.,

File Formats: Format(s) of the data and any software that is required to read the data including the version. (e.g., TIFF, FITS, JPEG, HTML)

File Structure: Organization of the data file(s) (and the layout of the variables when applicable).

Variable List: List of variables in the data files, when applicable.

Code Lists: Explanation of codes or abbreviations used in the file names, variables of the data, or the project over all that will help the user understand the project. (e.g., "999" indicates a missing value in the data)
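
A documented code list also lets software interpret coded values correctly. The sketch below assumes a hypothetical code list in which "999" marks a missing value, as in the example above. The names `CODE_LIST` and `decode_value` and the sample readings are invented for illustration.

```python
# Hypothetical code list: maps each coded value to its documented meaning.
CODE_LIST = {"999": "missing value"}


def decode_value(raw):
    """Return None for values the code list marks as missing, else a float."""
    text = str(raw).strip()
    if CODE_LIST.get(text) == "missing value":
        return None
    return float(text)


# Invented sample readings, as they might appear in a data file.
readings = ["12.5", "999", "14.0"]
decoded = [decode_value(value) for value in readings]
print(decoded)
```

Without the code list, the value 999 could be mistaken for a real measurement and distort any analysis of the data.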

Versions: Record a date/time stamp for each file and use a separate identifier for each version.

Checksums: Used to test if your file has changed over time. (This will aid in the long term preservation of the data and help make it secure by tracking alterations.)
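
As an illustration, a checksum can be computed with Python's standard `hashlib` module. This is a minimal sketch that assumes SHA-256 as the algorithm; a repository may prescribe a different one. The function names `sha256_checksum` and `has_changed` are hypothetical.

```python
import hashlib


def sha256_checksum(path, chunk_size=65536):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so that large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def has_changed(path, recorded_checksum):
    """Return True if the file no longer matches the checksum recorded earlier."""
    return sha256_checksum(path) != recorded_checksum
```

Record the checksum in the metadata when the file is deposited, then recompute it later and compare the two values. Any difference means the file contents have changed since the checksum was recorded.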

Related Materials: Links or location of materials that are related to the project. (e.g., articles, presentations, papers)

Citation: The recommended way to cite the data or the information needed.
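
A recommended citation can be assembled from fields already recorded in the metadata. The sketch below assumes a simple "Creator (Year). Title. Publisher. Identifier" pattern; this pattern, the function name `format_citation`, and all example values are assumptions, so follow the citation style that your repository or publisher recommends.

```python
def format_citation(creators, year, title, publisher, identifier):
    """Build a citation string from metadata fields (assumed pattern)."""
    names = "; ".join(creators)
    return f"{names} ({year}). {title}. {publisher}. {identifier}"


# Hypothetical values, invented for illustration.
citation = format_citation(
    creators=["Example Research Group"],
    year=2020,
    title="Example Soil Moisture Survey",
    publisher="Example Repository",
    identifier="PROJ-0001",
)
print(citation)
```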

What is a metadata standard?

A standard provides a structure to describe data with:
  • Common terms for consistency between records
  • Common definitions for easier interpretation
  • Common language for ease of communication
  • Common structure to quickly locate information

Standards provide a uniform summary description of a dataset.

The Research Data Alliance Standards Directory contains widely used metadata standards in the Arts and Humanities, Engineering, Life Sciences, Physical Sciences and Mathematics, Social and Behavioural Sciences and General Research Data.

The Digital Curation Centre provides links to information about discipline specific metadata standards, including profiles, tools to implement the standards, and use cases of data repositories currently implementing them.

Biosharing is an educational resource on inter-related data standards, databases and policies in the life, environmental and biomedical sciences.

The Data Documentation Initiative (DDI) is an international standard for describing the data produced by surveys and other observational methods in the social, behavioural, economic, and health sciences. DDI is a free standard that can be used to document and manage different stages in the research data lifecycle, such as conceptualization, collection, processing, distribution, discovery, and archiving. Documenting data with DDI facilitates understanding, interpretation, and use by people, software systems, and computer networks.

CEDAR (Center for Expanded Data Annotation and Retrieval) is a repository of community-defined metadata templates. Its goal is to improve metadata and its use in the biomedical sciences. The CEDAR metadata tools can be used to create, annotate, analyze, validate and search metadata based on the fields and relations defined in the metadata templates.