How to use the Data Hazards project

A cartoon sketch of a female-presenting researcher using the Data Hazard labels on a research project. The first section in the image is the researcher looking at 10 data hazard labels under the heading 'Learn'. An arrow leads to the next step, where under the heading 'Apply' a group of people are looking at a research paper with the researcher and talking to one another; one person is raising a hand to ask a question. An arrow then leads to 'Display', where the researcher is pointing to their paper in front of an audience, with the hazard labels on each side. Learn, Apply and Display are all connected to the bigger title of Reflect, surrounded by a thought bubble. To the right-hand side of the Reflect heading, the researcher and another person are sticking hazard labels to a piece of paper.

Figure 1: *The Turing Way* project illustration by Scriberia. Used under a CC-BY 4.0 licence. DOI: 10.5281/ZENODO.3332807 (The Turing Way Community & Scriberia, 2024).

There are four steps to using the Data Hazard labels:

Learning: familiarising yourself with the Data Hazard labels.
Applying: deciding which Hazard labels are relevant to your project.
Reflecting: considering what to do differently and which mitigations to put in place.
Displaying: showing the labels alongside your work can help you to communicate that you’ve thought about these broad ethical issues and how you’d like others to use your work.

In addition, whether or not you have used the labels yet, you can contribute to the project. If you think any labels are missing or could be improved, you can say so: the project aims to evolve through collaboration.

1. Learning about the Data Hazard labels

The first step in using the Data Hazards materials is learning about the Data Hazards labels: familiarising yourself with them so that you can later apply them to a project.

Data Hazards labels are intended to represent as broad a selection as possible of the ethical risks associated with data-centric work. This includes, but is not limited to, the risks considered by ethics committees, which often focus on risks to research participants that could lead to legal repercussions for the research organisation, such as consent and privacy. It also includes issues like algorithmic bias or danger of misuse that might result from downstream outputs of research, rather than from the research process itself.

Learning about the Data Hazard labels is usually part of a Data Hazards workshop, but you can also do it independently by:

Reading the labels on the website - click on each of them for more information!
Practising applying them to a project.
Talking about them with other people.
Printing out label cards and using them in a workshop.

2. Applying the Data Hazard labels

Applying the labels means deciding which labels are relevant to a project. There is no prescriptive way to do this. However, one approach we suggest is to go through the labels one at a time and decide whether each applies to your project, and why or why not.
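If you want to keep a written record of this label-by-label pass, a small script is one option. The following is a minimal sketch, not part of the Data Hazards materials: the `LabelDecision` class, the `review_labels` function and the example answers are all hypothetical, and the two label names are only the ones mentioned later in this chapter, not the full set.

```python
from dataclasses import dataclass


@dataclass
class LabelDecision:
    """One reviewed label: whether it applies, and the reasoning behind that."""
    label: str
    applies: bool
    reason: str


def review_labels(labels, decisions):
    """Return one LabelDecision per label, in the order the labels were given.

    Raises ValueError if any label has no recorded decision, so that no
    label is skipped silently.
    """
    missing = [name for name in labels if name not in decisions]
    if missing:
        raise ValueError(f"Labels not yet considered: {missing}")
    return [LabelDecision(name, *decisions[name]) for name in labels]


# Illustrative subset only; the full set of labels is on the Data Hazards website.
labels = ["Ranks or classifies people", "Risk to privacy"]

# Hypothetical answers for an imaginary project: (applies, reason).
decisions = {
    "Ranks or classifies people": (False, "The model outputs no per-person scores."),
    "Risk to privacy": (True, "Records are linked at the individual level."),
}

for decision in review_labels(labels, decisions):
    status = "applies" if decision.applies else "does not apply"
    print(f"{decision.label}: {status} ({decision.reason})")
```

Recording a reason for labels that do not apply, as well as for those that do, keeps the "why or why not" part of the exercise visible to anyone who reads the record later.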

What projects can I apply the labels to?

We recommend that you apply the labels to your own work unless you’ve been invited to give feedback on another piece of work.

The labels have been applied to many different types of projects that use data or data-intensive methods. Some examples of projects include:

A predictive model that uses machine learning to predict people’s traits from their genetic mutations.
A digital humanities project that uses web-scraping and natural language processing to analyse the text of political speeches.
An NHS data-linkage project to create a new database for researchers to use.
A modelling project that aims to improve whole-cell models of a bacterium.

These projects were labelled as part of various workshops, and the labelling is not recorded or shared publicly.

When should I apply Data Hazards in my work?

Data Hazards labels can be applied at any stage in the research life-cycle. The best time to apply the labels is close to the beginning of the research project, at the same time as you might consider your university ethics process or a Registered Report (you can also pre-register your Data Hazards analysis!). However, much of our research builds on itself, so researchers have also found workshops useful for reflecting and planning what they can do better in follow-up projects.

At the beginning of your workflow, you might want to plan how to avoid certain Data Hazards where you can. If you can’t avoid them because of where your data has come from, you may want to acknowledge this. For example, if your project uses sensitive data, which Data Hazard labels will apply, and how can you design your project in a way that avoids certain harms?

As you collect and analyse your data, you might want to revisit, iteratively, the potential Data Hazards in the information you are collecting. If your project uses data that was collected in the past, you can still consider which labels may be relevant to the dataset.

When you report your results, we recommend that you also report mitigation strategies together with the labels; see examples of how others have done this. This helps people who see your outputs in the future: they can be aware of potential risks as they proceed with the project, and continue to think of solutions to any issues related to the research topic.

3. Reflecting on Safety Precautions

This chapter is a good place to read about Self Reflection; it provides discussion of identity and positionality, power and privilege, and self-reflection prompts. Reflecting on where you stand in your work means you are more likely to see and acknowledge mitigation strategies that apply to the data hazards of your work. Likewise, reflecting on the history of where your work has come from may shed light on how and why some data hazards apply to it, and prompt thinking about how to alleviate them later.

Are you using data? Then doing some reflection as suggested above could help you think of which Data Hazards labels you might encounter throughout your project, for example “ranks or classifies people” or “risk to privacy”.

A key part of the Data Hazards framework is to reflect on what to do differently. Thinking about mitigation strategies means suggesting ways to prevent the risks associated with your work and, if no prevention is possible at present, ways to avoid them in the future. The point is not to enumerate every scenario in which your work carries or could carry risks, or every possible mitigation measure, but to reflect on, acknowledge and promote awareness of the ethical implications of the work that we do.

4. Displaying the Data Hazard labels

Displaying the labels showcases your reflections and how you have applied them, and can help people visualise their importance in your work.

The Data Hazards website has some suggestions and templates on how to present the labels.

Contributing to Data Hazards materials

In addition to the steps above, the Data Hazard labels are a collaborative effort to create a shared vocabulary that evolves with time. If you find that labels are missing, you can suggest a new one yourself. Guidelines on how to contribute are laid out in the contribution section of the Data Hazard website. There are different ways of contributing: hosting a workshop on how to apply the Data Hazards framework, applying the labels to your work and sharing this, or proposing new labels to add to the existing ones.

You can reach the team via a GitHub discussion post, or by contacting them directly via email.

References

The Turing Way Community, & Scriberia. (2024). Illustrations from The Turing Way: Shared under CC-BY 4.0 for reuse. Zenodo. 10.5281/ZENODO.3332807
