Checklist

References

The coding segments of this chapter were created in part with reference to several online tutorials:

Other textbooks and papers used that have not been directly cited earlier:

What to Learn Next

If you happen to be handling sensitive data in your project, check out the Working on Sensitive Data Projects chapter.

Alternatively, if you want to make your research project and data analysis pipeline more reproducible, see the chapter on Reproducibility with Make, a build automation tool.

Further Reading

  • Flexible Imputation of Missing Data: A much more in-depth look at missing data imputation that goes further into characterising missing data, including mathematical definitions, and describes imputation methods.
  • Getting Started with naniar: More R functions to visualise data missingness, including one that uses decision trees to map out the proportion of missingness in a variable based on all other variables.
  • The papers cited throughout this chapter are all good resources for further reading. The original paper on MICE (Buuren & Groothuis-Oudshoorn, 2011) and the review papers on missing data handling (Pigott, 2001; Oluwaseye Joel et al., 2022) are especially great resources.
  • For more R visualisation and imputation packages see:
  • The Turing-Roche partnership has some resources on structured missingness:
References
  1. Buuren, S. van. (2015). Types of missing data [Book]. In An Introduction to Medical Statistics, Fourth Edition (pp. 306–307). Oxford University Press. https://www-users.york.ac.uk/~mb55/intro/typemiss4.htm
  2. Mack, C., Su, Z., & Westreich, D. (2018). Types of Missing Data [Book]. In Managing Missing Data in Patient Registries: Addendum to Registries for Evaluating Patient Outcomes: A User’s Guide, Third Edition [Internet]. Rockville (MD): Agency for Healthcare Research and Quality. https://www.ncbi.nlm.nih.gov/books/NBK493614/
  3. de Goeij, M. C. M., van Diepen, M., Jager, K. J., Tripepi, G., Zoccali, C., & Dekker, F. W. (2013). Multiple imputation: dealing with missing data. Nephrology Dialysis Transplantation, 28(10), 2415–2420. 10.1093/ndt/gft221
  4. Austin, P. C., White, I. R., Lee, D. S., & van Buuren, S. (2021). Missing Data in Clinical Research: A Tutorial on Multiple Imputation. Canadian Journal of Cardiology, 37(9), 1322–1331. 10.1016/j.cjca.2020.11.010
  5. van Buuren, S., & Groothuis-Oudshoorn, K. (2011). mice: Multivariate Imputation by Chained Equations in R. Journal of Statistical Software, 45(3), 1–67. 10.18637/jss.v045.i03
  6. Pigott, T. D. (2001). A Review of Methods for Missing Data. Educational Research and Evaluation, 7(4), 353–383. 10.1076/edre.7.4.353.8937
  7. Oluwaseye Joel, L., Doorsamy, W., & Sena Paul, B. (2022). A Review of Missing Data Handling Techniques for Machine Learning. International Journal of Innovative Technology and Interdisciplinary Sciences, 5(3), 971–1005. 10.15157/IJITIS.2022.5.3.971-1005
  8. Mitra, R., McGough, S. F., Chakraborti, T., Holmes, C., Copping, R., Hagenbuch, N., Biedermann, S., Noonan, J., Lehmann, B., Shenvi, A., Doan, X. V., Leslie, D., Bianconi, G., Sanchez-Garcia, R., Davies, A., Mackintosh, M., Andrinopoulou, E.-R., Basiri, A., Harbron, C., & MacArthur, B. D. (2023). Learning from data with structured missingness. Nature Machine Intelligence, 5(1), 13–23. 10.1038/s42256-022-00596-z
  9. Jackson, J., Mitra, R., Hagenbuch, N., McGough, S., & Harbron, C. (2023). A Complete Characterisation of Structured Missingness. https://arxiv.org/abs/2307.02650