Checklist and Resources

Checklist

Identify whether your dataset has missing data; visualisation tools can help (see Visualising Missingness)
Try to determine what type of missing data you have (MCAR, MAR or MNAR), as described in Missing Data Structures, and don’t forget about Structured Missingness
Choose an appropriate missing data handling method; Missing Data Handling Methods is a good starting point for ideas
Apply the missing data handling method and then continue with your analyses!
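
As a rough illustration of the checklist, the sketch below walks through steps 1 to 4 on a small, made-up dataset using pandas and scikit-learn. It is not taken from the tutorials listed below. The data, the column names `age` and `income`, and the choice of median imputation are all assumptions made for illustration. The "missingness check" only compares an observed variable between rows with and without missing values. A clear difference is evidence against MCAR, but it cannot tell MAR apart from MNAR.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy dataset: "income" is more often missing for older people,
# so missingness depends on an observed variable (MAR-like by construction).
rng = np.random.default_rng(0)
n = 1000
age = rng.integers(20, 70, size=n)
income = 20_000 + 800 * age + rng.normal(0, 5_000, size=n)
p_missing = np.where(age > 50, 0.4, 0.05)
income[rng.random(n) < p_missing] = np.nan
df = pd.DataFrame({"age": age, "income": income})

# Step 1: identify whether (and how much) data is missing, per column.
missing_fraction = df.isna().mean()
print(missing_fraction)

# Step 2 (rough check only): compare an observed variable between rows where
# income is missing (True) and where it is present (False). A large gap is
# evidence against MCAR; it cannot distinguish MAR from MNAR.
income_missing = df["income"].isna()
group_means = df.groupby(income_missing)["age"].mean()
print(group_means)

# Steps 3-4: choose and apply a handling method. Median imputation plus a
# missingness indicator column is used here only as a simple baseline.
imputer = SimpleImputer(strategy="median", add_indicator=True)
imputed = imputer.fit_transform(df)
print(imputed.shape)  # (1000, 3): age, imputed income, indicator for income
```

In practice, step 2 would draw on the visualisations and reasoning described earlier in the chapter, and step 3 might point to a more principled method such as multiple imputation instead of a single median fill.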

References

The code in this chapter was partly based on the following online tutorials:

Visualizing Missing Data: A Python notebook exploring the use of the missingno library.
Gallery of Missing Data Visualisations: A tutorial on missing data visualisation in R.
Imputing Missing Data with R; MICE package
Intro to MICE: An Imputation Strategy: A short notebook introducing how to implement MICE in Python.
Lastly, the scikit-learn documentation is very helpful and detailed on implementing missing data handling in Python:
- 6.4. Imputation of missing values
- Imputing missing values with variants of IterativeImputer
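
For orientation, here is a minimal sketch of multiple imputation with scikit-learn's `IterativeImputer`, which is inspired by MICE. The scikit-learn documentation notes that running it repeatedly with `sample_posterior=True` and different random seeds can be used for multiple imputation. The toy data, the number of imputations `m = 5`, and the quantity being estimated (the mean of one column) are assumptions for illustration. The results are pooled with Rubin's rules. `IterativeImputer` is still marked experimental, so it has to be enabled explicitly before it can be imported.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables IterativeImputer)
from sklearn.impute import IterativeImputer

# Hypothetical toy data: three correlated columns, ~20% of cells removed
# completely at random (MCAR by construction).
rng = np.random.default_rng(42)
n = 500
x1 = rng.normal(size=n)
x2 = 2.0 * x1 + rng.normal(scale=0.5, size=n)
x3 = x1 - x2 + rng.normal(scale=0.5, size=n)
X = np.column_stack([x1, x2, x3])
X_missing = X.copy()
X_missing[rng.random(X.shape) < 0.2] = np.nan

# Create m completed datasets. sample_posterior=True draws imputations from
# the model's predictive distribution, so the datasets differ between seeds.
m = 5
estimates, within_vars = [], []
for seed in range(m):
    imputer = IterativeImputer(sample_posterior=True, max_iter=10, random_state=seed)
    X_completed = imputer.fit_transform(X_missing)
    col = X_completed[:, 1]
    estimates.append(col.mean())                # estimate of interest
    within_vars.append(col.var(ddof=1) / n)     # its complete-data variance

# Pool with Rubin's rules.
q_bar = np.mean(estimates)                      # pooled point estimate
u_bar = np.mean(within_vars)                    # average within-imputation variance
b = np.var(estimates, ddof=1)                   # between-imputation variance
total_var = u_bar + (1 + 1 / m) * b             # total variance of q_bar
print(q_bar, np.sqrt(total_var))
```

The between-imputation term `b` is what a single imputation would ignore. Including it makes the standard error reflect the uncertainty introduced by the missing values.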

Other textbook and paper references used but not previously cited directly:

On types of missing data [Buu15, MC18]
On multiple imputation [AWLvanBuuren21, dGvDJ+13]

What to Learn Next

If you happen to be handling sensitive data in your project, check out the Working on Sensitive Data Projects chapter.

Alternatively, if you want to make your research project and data analysis pipeline more reproducible, see the chapter on Reproducibility with Make, a build automation tool.