Open Data

The world is witnessing a significant global transformation, facilitated by technology and digital media, and fuelled by data and information. This transformation has enormous potential to foster more transparent, accountable, efficient, responsive, and effective research. Yet only a very small proportion of the original data is published in conventional journals. Despite existing policies on archiving data, in today’s practice data are primarily stored in private files rather than in secure institutional repositories, and are effectively lost to the public (and often even to the researcher who generated them).

This lack of data sharing is an obstacle to international research (be it academic, governmental, or commercial) for two main reasons:

  1. It is generally difficult or impossible to reproduce a study without the original data.

  2. The data cannot be reused or incorporated into new work by other researchers if they cannot obtain access to it.

Accordingly, there is an ongoing global data revolution that seeks to advance collaboration and the creation and expansion of effective, efficient research programs. Open data [def] is crucial to meeting these objectives. Open data is freely available on the internet. Any user is permitted to download, copy, analyse, re-process, and re-use it for any other purpose with minimal financial, legal, and technical barriers.

This represents a real shift in how research works. Funders are starting to require researchers to make their data available and to submit Data Management Plans as part of project proposals. At the moment, anyone who wishes to use data from a researcher often has to contact that researcher and make a request. “Open by default” remedies this with a presumption of publication for all. If access to data is restricted, for instance for security reasons, the justification for this should be made clear. Free access to and subsequent use of data is of significant value to society and the economy. Data should, therefore, be open by default and only as closed as necessary.

You can find more about the practical steps to make your data available in the section describing Steps to Share your Data in the subchapter: Sharing and Archiving Data.

Barriers to Data Sharing

Many academics find sharing data difficult. Recent surveys [SBH+18] conducted amongst researchers list the following reasons:

  • Organising data in a presentable and useful way is challenging (mentioned by 46%)

  • Researchers are unsure about copyright and licensing (mentioned by 37%)

  • Researchers do not know which repository to use for different data types (raised by 33%)

These are cultural challenges that could be addressed by changing practice in the future. However, there are also legal, ethical, or contractual reasons that sometimes prevent making data publicly available in its entirety or even in part. Below, we discuss some of the reasons why this may be the case.

An image detailing why private data should be used. A person stands next to a well with 'private data' written on it and a padlock around it. It is black and white and blue. The text lists that 'people deserve - dignity, agency, privacy, rights, confirmed consent.'

Fig. 13 The Turing Way project illustration by Scriberia. Original version on Zenodo. http://doi.org/10.5281/zenodo.3695300

Privacy And Data Protection

Many fields of research involve working with sensitive personal data, with medical research being the most obvious example. Please see the sensitive data chapter for more information about the different types of sensitive data, and the Managing Sensitive Data Projects chapter for guidance on how to manage these data. In particular, the Data Privacy Strategies section can help you to safely manage and protect sensitive personal data.

National and Commercially Sensitive Data

In many cases, companies are understandably unwilling to publish much of their data. The reasoning is that disclosing a company’s commercially sensitive information will damage its commercial interests and undermine its competitiveness. This rests on the view that, in competitive markets, innovation will only occur if information has some protection. If a company spends time and money developing something new, and the details are then made public, its competitors can easily copy it without having to invest the same resources. The result would be that no one innovates in the first place. Similarly, for reasons of public safety, governments are often unwilling to publish data that relates to issues such as national security. In such cases, it may not be possible to make data open, or it may only be possible to share partial or obscured datasets.
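
As a rough illustration of what sharing a partial or obscured dataset could look like in practice, the Python sketch below removes some fields entirely and coarsens others before release. The field names, the example records, and the helper functions `make_shareable` and `to_decade` are all hypothetical and chosen only for illustration. Which fields need protecting, and how strongly, depends on your data and on your legal and contractual obligations, so a step like this does not by itself make a dataset safe to publish.

```python
# Hypothetical field names that must not be released (assumption for this sketch).
SENSITIVE_FIELDS = {"customer_name", "contract_value"}


def to_decade(year):
    """Coarsen a year to the start of its decade, for example 2017 becomes 2010."""
    return (year // 10) * 10


def make_shareable(records, drop_fields=SENSITIVE_FIELDS, coarsen=None):
    """Return new records with sensitive fields removed and selected fields coarsened.

    `coarsen` maps a field name to a function that reduces its precision.
    The input records are left unchanged.
    """
    coarsen = coarsen or {}
    shared = []
    for record in records:
        public = {}
        for key, value in record.items():
            if key in drop_fields:
                continue  # withhold this field entirely
            if key in coarsen:
                value = coarsen[key](value)  # obscure by reducing precision
            public[key] = value
        shared.append(public)
    return shared


# Made-up example records, not real data.
records = [
    {"site": "A", "year": 2017, "customer_name": "Example Ltd",
     "contract_value": 12000, "output_tonnes": 41.5},
    {"site": "B", "year": 2021, "customer_name": "Sample plc",
     "contract_value": 8000, "output_tonnes": 37.0},
]

shareable = make_shareable(records, coarsen={"year": to_decade})
```

Here `shareable` keeps only the `site`, the coarsened `year`, and `output_tonnes` for each record, while the original `records` stay intact for internal use. If you share a reduced dataset like this, it helps to document what was withheld or obscured and why, in line with the principle that the justification for restricting access should be made clear.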

Chapter Tags: This chapter is curated for the Turing Data Study Group (turing-dsg).