Data Papers

What are Data Papers?

The purpose of a data paper is to describe a dataset and make it more FAIR (findable, accessible, interoperable and reusable). It is therefore different from a research article: it is not meant to report findings from your data.

It is a peer-reviewed article focused on describing datasets and the circumstances of their collection: in effect, a searchable metadata document.

Some journals specialise in data papers, for example Scientific Data and Biodiversity Data Journal. Data journals are usually open access, so publishing a data paper normally involves an article processing charge (APC). The APC varies considerably between journals, from £0 to several thousand pounds. There is a helpful list of journals on the GBIF data papers page, or see our list of data journals in the sub-chapter on Data Articles in the Research Data Management chapter.

Why write a Data Paper?

There are many reasons to write a data paper.

Public benefit:

  1. Validation of data in research papers

  2. Public trust in science through greater transparency

  3. Economic benefit for the private sector

  4. Opportunities for citizen science

Research community benefits:

  1. More efficient research

  2. Re-use in teaching

  3. Easier to find useful data

  4. Data archived and preserved for future use

  5. New research made possible

Personal benefit:

  1. Career recognition

  2. Credit for data stewardship

  3. Citations

  4. New collaborations

Three of these benefits are particularly important:

Makes your data more FAIR

FAIR (Findable, Accessible, Interoperable and Reusable) data has become the gold standard for data management and is increasingly being adopted across many disciplines. Making a dataset FAIR usually means depositing it in an online repository, and then you need to tell people where to find it. A data paper is a good place to advertise your dataset, making it even more findable.

For more information on how to make your data FAIR, see GO FAIR (https://www.go-fair.org), including their overview of the FAIR principles and their FAIRification process (https://www.go-fair.org/fair-principles/).

Two of the FAIR principles are that data are described with rich metadata (F2) and that the metadata clearly and explicitly include the identifier of the data they describe (F3). This is exactly what a data paper does: it describes the metadata (such as how the data were collected) and includes the dataset's digital object identifier (DOI).

Writing a data paper increases not only the findability of your data but also its accessibility. Another FAIR principle is that (meta)data are retrievable by their identifier using a standardised communications protocol (A1). A data paper supports this because the article has its own DOI, so the description of your data (its metadata) can itself be retrieved through that identifier.
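Because DOIs resolve over HTTP, the metadata behind a DOI can be retrieved by software as well as by people following a link. DOI content negotiation, offered through doi.org for Crossref and DataCite DOIs, returns machine-readable metadata when the request asks for it in the `Accept` header. The following is a minimal sketch using only Python's standard library. The function names and the DOI `10.1234/example-dataset` are hypothetical placeholders, and fetching real metadata needs network access.

```python
import json
import urllib.request

DOI_RESOLVER = "https://doi.org/"
# Media type for CSL JSON, a citation metadata format available
# through DOI content negotiation
CSL_JSON = "application/vnd.citationstyles.csl+json"


def metadata_request(doi: str) -> urllib.request.Request:
    """Build an HTTP request asking doi.org for machine-readable metadata.

    The Accept header asks for CSL JSON instead of the human-readable
    landing page of the dataset or article.
    """
    return urllib.request.Request(DOI_RESOLVER + doi, headers={"Accept": CSL_JSON})


def fetch_metadata(doi: str) -> dict:
    """Resolve a DOI and return its metadata as a dict (requires network access)."""
    with urllib.request.urlopen(metadata_request(doi), timeout=30) as response:
        return json.load(response)


if __name__ == "__main__":
    # Hypothetical placeholder DOI: replace with the DOI of a real dataset or article
    request = metadata_request("10.1234/example-dataset")
    print(request.full_url, request.get_header("Accept"))
```

With a real DOI, `fetch_metadata` should return a dictionary of citation metadata, typically including fields such as `title` and `author`. Requesting a standard format from the resolver, rather than scraping a web page, is what makes this kind of retrieval machine-actionable.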

Increases your number of articles and citations

Research culture, research impact and career progression are still linked to the number of journal articles researchers publish and the journals they publish in.
We therefore need to convert the effort we put into producing sustainable datasets (open or restricted, in repositories) into the impact gained from publishing journal articles. A data paper adds another peer-reviewed article to your research article for the same project, and gives credit to researchers who take the time to implement good data stewardship practices.

Data papers are quick to write and tend to come back from review quickly, so the turnaround is generally short, which helps if you need to bolster your publication list.

More publications also mean more opportunities for your work to be cited, so writing a data paper can increase the impact of your research.

Reproducibility and sustainability

Writing down your methodology in a structured way in a data paper makes your research more transparent, and therefore easier for others to understand, reproduce and reuse. A data paper, together with a research compendium attached to the research article, is a great way to signal to others that your research is reproducible.

Being more open and transparent, and particularly taking a reproducible research approach, makes your research more sustainable. But why is this such a good thing? When research can be fully reviewed, including the protocols, data and code as well as the interpretations and conclusions, it can be fully validated. Other researchers can then trust the quality of the research and reuse any part of it with confidence. The research has more longevity and is therefore more cost-effective.

This is especially important when developing new scientific methods for the applied sciences. The methods need to be robust so that investigations using them, and the interpretations drawn from their results, hold up to scrutiny.

How do you write a Data Paper?

First, deposit the dataset in a repository of your choice. You can do this free of charge using one of the many open repositories, such as Zenodo, Open Science Framework or Figshare. In most cases a data paper links to an open dataset, but this may not be possible with sensitive data. In that case you can link to a restricted-access dataset instead, and it is important that the data availability statement in your data paper explains in detail how to apply for access to the data.

Repositories such as these will assign your dataset a Digital Object Identifier (DOI). You can then write your article using a data paper template provided by a data journal, and link the dataset to the paper by including its DOI in the data availability statement.
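The link itself is simple, but it is worth checking that you are citing a bare DOI (for example, without a stray prefix or trailing punctuation) and, for sensitive data, that the statement includes access instructions. The following is a sketch of a hypothetical helper, not a journal requirement. It drafts a statement and sanity-checks the DOI against a pattern Crossref has suggested for modern DOIs. The wording is illustrative only, so follow your chosen journal's own guidance.

```python
import re

# A pattern Crossref has suggested for matching modern DOIs. It covers the
# vast majority of DOIs in use but is not a complete definition of DOI syntax.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/[-._;()/:A-Z0-9]+$", re.IGNORECASE)


def availability_statement(repository: str, doi: str, restricted: bool = False,
                           access_instructions: str = "") -> str:
    """Draft a data availability statement linking to a dataset by its bare DOI.

    Hypothetical helper: the wording is an example, not any journal's template.
    """
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a recognisable DOI: {doi!r}")
    link = f"https://doi.org/{doi}"
    if restricted:
        # Restricted (e.g. sensitive) data: readers must be told how to apply for access
        if not access_instructions:
            raise ValueError("restricted datasets need instructions on how to apply for access")
        return (f"The dataset is available under restricted access in {repository} "
                f"at {link}. {access_instructions}")
    return f"The dataset is openly available in {repository} at {link}."
```

For example, `availability_statement("Zenodo", "10.1234/example-dataset")` (a hypothetical placeholder DOI) returns `"The dataset is openly available in Zenodo at https://doi.org/10.1234/example-dataset."`, while a restricted dataset without access instructions raises an error instead of producing an incomplete statement.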

Data paper templates are broadly similar across data journals and typically include sections such as:

  • Background and summary - used to introduce the context of your dataset and how it fits in with other similar or related datasets. You could also comment on the significance or uniqueness of the dataset.

  • Methods - this includes all the information about how you collected the data.

  • Data records - where the data are located, what format they are in, and an overview of the files, such as a data dictionary or a link to a wiki.

  • Technical validation - what analyses were done to ensure the technical quality of the dataset?

  • Usage notes - how could the dataset be reused, and how can others apply to reuse it?
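A data dictionary for the Data records section can be started semi-automatically. The following is a minimal sketch of a hypothetical helper, not part of any journal's template. It reads a CSV file's contents, lists each column, guesses a simple type, counts missing values and leaves the descriptions blank for you to write. The sample data are invented for illustration.

```python
import csv
import io


def infer_type(values):
    """Guess a simple type label for a column from its non-empty values."""
    non_empty = [v for v in values if v.strip()]
    if not non_empty:
        return "empty"
    for label, cast in (("integer", int), ("decimal", float)):
        try:
            for v in non_empty:
                cast(v)
        except ValueError:
            continue  # at least one value does not fit this type; try the next
        return label
    return "text"


def data_dictionary(csv_text):
    """Return one entry per column: name, guessed type, missing count, blank description."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    entries = []
    for column in reader.fieldnames or []:
        # Short rows give None for missing fields; treat them as empty strings
        values = [row[column] or "" for row in rows]
        entries.append({
            "variable": column,
            "type": infer_type(values),
            "missing": sum(1 for v in values if not v.strip()),
            "description": "",  # to be written by the data creator
        })
    return entries


if __name__ == "__main__":
    # Invented sample data for illustration only
    sample = "site,count,temperature_c\nA,3,12.5\nB,,13.0\n"
    for entry in data_dictionary(sample):
        print(entry)
```

For the sample, the sketch guesses `site` as text, `count` as integer with one missing value, and `temperature_c` as decimal. The guessed types are only a starting point: the descriptions, units and allowed values still need to come from the people who collected the data.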

This structured approach, using a template, makes data papers simple to write. They are meant to be short and to the point, and are therefore much less time-consuming to write than a research article.

How do you review a Data Paper?

General guidance on how to peer review a paper is given in our peer review chapter.

When reviewing a data paper specifically, it may be useful to focus on the following questions (adapted from F1000Research's guidance):

  • Is the rationale for creating the dataset clearly described?

  • Is the dataset clearly presented in a usable and accessible format?