Skip to article frontmatterSkip to article content

Prerequisite

PrerequisiteImportanceNotes
Version ControlHelpfulCan be used to version the compendium
Open ResearchHelpfulComponents are part of the compendium
Reproducible EnvironmentsHelpfulCan be used to make the compendium reproducible
Binder HubHelpfulCan be used to publish the compendium
MakeHelpfulCan be used for automation in the compendium

Summary

A research compendium is a collection of all digital parts of a research project including data, code, texts (protocols, reports, questionnaires, meta data). The collection is created in such a way that reproducing all results is straightforward Nuest et al., 2017Gentleman & Temple Lang, 2007.

This chapter has many prerequisites as it takes all digital components of a project together into a reproducible research package. That said: a research compendium can be constructed with minimal technical knowledge. The main purpose is that all elements of a project are published together, so a basic folder structure combining all components can be sufficient.

An illustration showing a person churning a big machine that takes scientific information from multiple papers and gives one output of readable file.

Figure 1:The Turing Way project illustration by Scriberia. Used under a CC-BY 4.0 licence. DOI: The Turing Way Community & Scriberia (2024).

Motivation

A research compendium [def] combines all elements of your project, allowing others to reproduce your work, and should be the final product of your research project. Publishing your research paper along with a research compendium allows others to access your input, test your analysis, and, if the compendium can be executed, rerun to assess the resulting output. This does not only instill trust in your research but can give you more visibility. Others may use your research in unexpected ways, some of which are discussed below (refer to section: Using a research compendium).

Background

A research compendium at its most basic is a comprehensive set of files that combines all components of a project. This compendium can be downloaded and run locally to recreate the work done, or it can contain elements that allow it to be executed on a remote server. Executable research compendia aim to make the computational part of a scientific publication reproducible by providing all the building blocks available and give a description of how the user can execute the contained code.

Structure of a Research Compendium

Three principles should be kept in mind when constructing a research compendium Marwick et al., 2018.

With these principles, a wide variety of compendia are possible. Let’s start with the most basic version.

Basic Compendium

A basic compendium follows these three principles. It separates data and methods into a conventional folder structure, and describes the computational environment in a designated file. Furthermore, any compendium should have a landing page in the form of a README document.

compendium/
├── data
│   ├── my_data.csv
├── analysis
│   └── my_script.R
├── DESCRIPTION
└── README.md

Executable Compendium

The following folder can be considered an executable research compendium. It contains all the digital parts of the research project (code, data, text, figures) and all the information on how to obtain the results. The computing environment is described in the Dockerfile, the dependencies of files and how to automatically generate the results are described in the Makefile. Additionally we have a README.md describing what the compendium is about and a LICENSE file with info on how it can be used.

compendium/
├── CITATION
├── code
│   ├── analyse_data.R
│   └── clean_data.R
├── data_clean
│   └── data_clean.csv
├── data_raw
│   ├── datapackage.json
│   └── data_raw.csv
├── Dockerfile
├── figures
│   └── flow_chart.jpeg
├── LICENSE
├── Makefile
├── paper.Rmd
└── README.md

Separating Methods, Data, Output

The principles of a research compendium state that it should clearly separate Methods, Data, and Output. Phrased differently, this means we should distinguish between three types of files and folders:

The examples mentioned here are not exhaustive and some may first be “human-generated” and at some point become “read-only” (for example a human may generate the data metadata datapackage.json, but once that is done it may become something not to be touched). In other words, whether a folder contains files in either of these categories, may depend on the life cycle of the project.

Creating a Compendium

If you already use some of the tools in this book - such as version control, Makefiles, and/or reproducible environments - it may come naturally to you to create a research compendium. This is, because a version control repository can be a research compendium; A Makefile makes it executable; A reproducible environment makes it reproducible. To create a research compendium, we recommend to first think about what the components of your project are and create the folder structure accordingly. Use names for files and folders that make it easy for others to understand what they contain. It is a good idea to think about this early in the research process and start your project with the mindset that the output in the end is a research compendium rather than just a research paper.

Publishing a Compendium

There are several options to publish a research compendium:

For examples, see the label/tag/community “research-compendium” (applied on GitHub, Zenodo, OSF) or as a fallback the term “research compendium” in the description (used on GitLab). For more info, see also Research Compendium.

In the future, the research compendium may even be the publication itself allowing peer review of the entire research project.

Using a Compendium

A research compendium can be used in several ways, including (but not limited to):

Checklist

To create a research compendium, follow these steps:

See the EMNLP 2020 reproducibility checklist or the AGILE reproducible checklist for conference submission checklists.

Further Reading

References
  1. Nuest, D., Konkol, M., Pebesma, E., Kray, C., Schutzeichel, M., Przibytzin, H., & Lorenz, J. (2017). Opening the Publication Process with Executable Research Compendia. D-Lib Magazine, 23(1/2). 10.1045/january2017-nuest
  2. Gentleman, R., & Temple Lang, D. (2007). Statistical Analyses and Reproducible Research. J. Comput. Graph. Stat., 16(1), 1–23. 10.1198/106186007X178663
  3. The Turing Way Community, & Scriberia. (2024). Illustrations from The Turing Way: Shared under CC-BY 4.0 for reuse. Zenodo. 10.5281/ZENODO.3332807
  4. Marwick, B., Boettiger, C., & Mullen, L. (2018). Packaging data analytical work reproducibly using R (and friends). PeerJ Preprints. 10.7287/peerj.preprints.3192v2
  5. Belliard, F., Rusne Sileryte, Graser, A., Broman, K., Teperek, M., Granell, C., Hofer, B., Ostermann, F., Wang, Y., Nüst, D., Hettne, K., & Clare, C. (2019). AGILE Reproducible Paper Guidelines. 10.17605/OSF.IO/CB7Z8