Documentation and Metadata

Having data available is of no use if it cannot be understood. Without metadata to provide provenance and context, the data can’t be used effectively. For example, a table of numbers is useless if no headings describe what the columns and rows contain. You should therefore ensure that open datasets include consistent metadata, that is, information about the data, so that the data is fully described. This requires that the information accompanying the data is captured in documentation and metadata.

Documentation

Documentation provides context for your work. It allows your collaborators, colleagues and future you to understand what has been done and why.

Data documentation can be done on different levels. All documentation accompanying data should be written in clear, plain language. Documentation ensures that data users have sufficient information to understand the source, strengths, weaknesses, and analytical limitations of the data so that they can make informed decisions when using it.

Figure: A person walks through a dark wood, setting lights along the way. The lights are blocks of text, representing pieces of documentation, and they make it easy for colleagues to find their way. In the darkness another figure can be seen: someone who got lost in the woods where no documentation was available. Illustration about peer review. *The Turing Way* project illustration by Scriberia. Used under a CC-BY 4.0 licence. DOI: 10.5281/zenodo.3332807.

Metadata

Metadata is information about the data: descriptors that facilitate cataloguing and discovering data. Metadata is often intended for machine reading.

When data is submitted to a trusted data repository, the machine-readable metadata is generated by the repository. If the data is not in a repository, a text file with machine-readable metadata can be added as part of the documentation.
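
As a minimal sketch of such a text file, the Python snippet below writes a small JSON metadata record that can be stored next to the data. The field names (`title`, `creators`, `licence`, `keywords`, `date_created`) and the helper functions are illustrative assumptions, not a recognised standard. A real project would follow the schema that its community or repository expects.

```python
import json
from pathlib import Path


def build_metadata(title, creators, licence, keywords, date_created):
    """Collect descriptive fields into a plain dictionary (hypothetical schema)."""
    return {
        "title": title,
        "creators": list(creators),
        "licence": licence,
        "keywords": sorted(set(keywords)),  # de-duplicate and order for stable output
        "date_created": date_created,  # ISO 8601 date string, e.g. "2024-01-31"
    }


def write_metadata(record, path):
    """Write the record as UTF-8 JSON so that both people and programs can read it."""
    path = Path(path)
    path.write_text(json.dumps(record, indent=2, ensure_ascii=False), encoding="utf-8")
    return path


def read_metadata(path):
    """Load a metadata record previously written by write_metadata."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

JSON is used here only because it is plain text and widely supported. Any structured text format that the intended tools can parse would serve the same purpose.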

The type of research and the nature of the data also influence what kind of documentation is necessary.
The level of documentation and metadata will vary according to the project, and the range of people the data needs to be understood by.
Examples of documentation may include items like data dictionaries (see here for a template) or codebooks, protocols, logbooks or lab journals, README files, research logs, analysis syntax, algorithms and code comments.
Variables should be defined and explained using data dictionaries or codebooks.
Data should be stored in logical and hierarchical folder structures, with a README file used to describe the structure. The README file is helpful for others and will also help you find your data in the future (Fuchs & Kuusniemi, 2018). See the README template from Cornell for an example.
It is best practice to use recognised community metadata standards to make it easier for datasets to be combined.
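
To illustrate the point about data dictionaries, the sketch below defines a small dictionary of variables, checks whether every column in a dataset has an entry, and serialises the dictionary as CSV. The variable names, the four fields (`variable`, `description`, `type`, `unit`) and both helper functions are hypothetical choices made for this example, not a prescribed format.

```python
import csv
import io

# Hypothetical data dictionary: one entry per variable in the dataset.
DATA_DICTIONARY = [
    {"variable": "participant_id", "description": "Anonymised participant identifier", "type": "string", "unit": ""},
    {"variable": "age", "description": "Age at time of measurement", "type": "integer", "unit": "years"},
    {"variable": "reaction_time", "description": "Mean response latency", "type": "float", "unit": "milliseconds"},
]


def undocumented_columns(data_columns, dictionary):
    """Return the columns in the data that have no data dictionary entry."""
    documented = {entry["variable"] for entry in dictionary}
    return [column for column in data_columns if column not in documented]


def dictionary_to_csv(dictionary):
    """Serialise the data dictionary as CSV text, ready to save next to the data."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["variable", "description", "type", "unit"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(dictionary)
    return buffer.getvalue()
```

Running `undocumented_columns` against the header row of a data file before sharing it is one simple way to catch variables that were added to the data but never explained.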

Community Standards - Metadata

The use of community-defined standards for metadata is vital for reproducible research and allows for the comparison of heterogeneous data from multiple sources, domains and disciplines. Metadata standards are often discipline-specific. For example, for brain data, the Brain Imaging Data Structure is the standard to use. Not every discipline uses metadata standards, however. You can check whether your discipline does through FAIRsharing, a resource to identify and cite the metadata or identifier schemas, databases or repositories that exist for your data and discipline. There are also situations in which researchers use more general metadata standards. For example, when they store their data in a generic archive, they have to adhere to the metadata standards of that archive.

In this case, a text file with discipline-specific metadata can be added as part of the documentation.

Want to learn more about Metadata and Metadata Standards? Watch an introduction video.

Tagging

Tags are keywords assigned to files. They are a way to add metadata to files so that they can be organised more flexibly. While a file can only be in one folder at a time, it can have an unlimited number of tags.

Some tips include:

Use short tag names (one or two words)
Be consistent with tags
Remember that not all file formats allow tags, and that tags may be stripped when files are transferred

See Tagging and Finding Your Files by MIT Libraries for more information.
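
The difference between folders and tags can be sketched in a few lines of Python. The `TagIndex` class below is a hypothetical, in-memory illustration and not the tagging feature of any particular operating system or tool. It keeps tags in a separate index rather than inside the files, which is one way to avoid losing tags when a file format does not support them or when files are transferred.

```python
from collections import defaultdict


def normalise_tag(tag):
    """Lower-case and trim a tag so that 'Raw Data ' and 'raw data' match."""
    return " ".join(tag.lower().split())


class TagIndex:
    """In-memory mapping between file paths and tags (illustrative only)."""

    def __init__(self):
        self._tags_by_file = defaultdict(set)

    def tag(self, path, *tags):
        """Attach one or more tags to a file path."""
        for tag in tags:
            self._tags_by_file[path].add(normalise_tag(tag))

    def tags_for(self, path):
        """Return the sorted tags of one file."""
        return sorted(self._tags_by_file.get(path, set()))

    def files_with(self, *tags):
        """Return the sorted paths that carry all of the given tags."""
        wanted = {normalise_tag(tag) for tag in tags}
        return sorted(
            path for path, have in self._tags_by_file.items() if wanted <= have
        )
```

Each path sits in exactly one folder but can carry any number of tags, and normalising tags on entry is a simple way to keep them consistent.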

Additional Resources

Videos on Data Description & Documentation and Data Quality from the TU Delft Open Science MOOC.
Example of data documentation by Larsen et al., 2021
Webinar: The Data You Document are the Data We Love
Slides: FAIRify your data: data documentation and metadata
Controlled vocabularies for the social sciences: what they are, and why we need them
Research Data Management: Metadata
Data dictionaries and codebooks by Buchanan et al., 2021.

References

Fuchs, S., & Kuusniemi, M. E. (2018). Making a research project understandable - Guide for data documentation. Zenodo. 10.5281/zenodo.1914401
FAIRsharing Team. (2024). FAIRsharing record for: Brain Imaging Data Structure. FAIRsharing. 10.25504/FAIRSHARING.RD1J6T
Larsen, R. J., Gagoski, B., Morton, S. U., Ou, Y., Vyas, R., Litt, J., Grant, P. E., & Sutton, B. P. (2021). Dataset for "Quantification of Magnetic Resonance Spectroscopy data using a combined reference: Application in typically developing infants". In Illinois Data Bank. 10.13012/B2IDB-3548139_V1
Buchanan, E. M., Crain, S. E., Cunningham, A. L., Johnson, H. R., Stash, H., Papadatou-Pastou, M., & Isager, P. M. (2021). Getting Started Creating Data Dictionaries: How to Create a Shareable Data Set. Advances in Methods and Practices in Psychological Science. 10.1177/2515245920928007
