One neat way of sharing your research workflow - whether that is your data cleaning process, the statistical modelling that you have applied to your data set, or how you have visualised your results - is through Jupyter notebooks.
Working with sensitive data may, however, create barriers to sharing your Jupyter notebook: you do not want to commit a Jupyter notebook containing sensitive data to GitHub.
You can get around this hurdle by manually clearing your notebook’s output before each Git commit you do.
This process is, however, time-consuming, cumbersome and - most importantly - extremely error-prone.
You only need to forget to clear the output once to inadvertently expose your data.
Another, much more efficient (and failsafe!) way of doing this is by using nbstripout.
nbstripoutis a utility that - when used as a git filter or pre-commit hook - automatically strips the Jupyter (or IPython/Zeppelin) notebook output before Git even gets the chance to see it.
In other words, it simulates the Clear All Output procedure in the Jupyter notebook user interface.
The latest version of nbstripout can be installed from PyPI (The Python Package Index) using the command pipinstall--upgradenbstripout.
If you are using Anaconda, you can install nbstripout via the conda package manager: condainstall-cconda-forgenbstripout.
Once nbstripout is installed, you need to add it to your local Git repository.
Start by creating a new repository or navigating to one that you are already using. Once there, add nbstripout using the command nbstripout--install.
You can check that nbstripout has successfully been applied as a filter by running the command cat.git/config, or checking git attributes by running the commandcat.git/info/attributes.
nbstripout is generally installed in one local Git repository at a time, so that you can control when it is applied as a filter.
However, if all of your notebooks deal with sensitive data, it might be a good idea to install nbstripout globally across all of your Git repositories.
This way, no notebooks risk slipping under the radar.
To install nbstripout globally, run the command: nbstripout--install--global