Sharing your Jupyter notebook#
One neat way of sharing your research workflow - whether that is your data cleaning process, the statistical modelling that you have applied to your data set, or how you have visualised your results - is through Jupyter notebooks.
Working with sensitive data may, however, create barriers to sharing your Jupyter notebook: you do not want to commit a Jupyter notebook containing sensitive data to GitHub.
You can get around this hurdle by manually clearing your notebook’s output before each Git commit you do.
This process is, however, time-consuming, cumbersome and - most importantly - extremely error-prone.
You only need to forget to clear the output once to inadvertently expose your data.
Another, much more efficient (and failsafe!) way of doing this is by using
nbstripoutis a utility that - when used as a git filter or pre-commit hook - automatically strips the Jupyter (or IPython/Zeppelin) notebook output before Git even gets the chance to see it.
In other words, it simulates the Clear All Output procedure in the Jupyter notebook user interface.
The latest version of
nbstripout can be installed from PyPI (The Python Package Index) using the command
pip install --upgrade nbstripout.
If you are using Anaconda, you can install
nbstripout via the conda package manager:
conda install -c conda-forge nbstripout.
Setting up the git filter and attributes#
nbstripout is installed, you need to add it to your local Git repository.
Start by creating a new repository or navigating to one that you are already using. Once there, add
nbstripout using the command
You can check that
nbstripout has successfully been applied as a filter by running the command
cat .git/config, or checking git attributes by running the command
Removing the git filter and attributes#
If you decide that you would like to remove
nbstripout, simply run
nbstripout --uninstall whilst in the repository.
nbstripout is generally installed in one local Git repository at a time, so that you can control when it is applied as a filter.
However, if all of your notebooks deal with sensitive data, it might be a good idea to install
nbstripout globally across all of your Git repositories.
This way, no notebooks risk slipping under the radar.
nbstripout globally, run the command:
nbstripout --install --global