Binder

In this chapter, we will discuss Project Binder and mybinder.org as a means to transparently and interactively share research.

Why should you share your work?

Motivation for sharing research outputs is more deeply explored in the Open Research chapter.

In short, sharing your research code can help provide context to the results you present by illustrating the process you went through to reach them. By sharing code, we also avoid reinventing the wheel in order to make progress on a research topic since the previous tools are available to be built on top of.

However, the biggest barrier to sharing code is often installing packages and setting up the computational environment, as we will see in the next section. By sharing your work via platforms such as mybinder.org:

  • Installing software packages is no longer a challenge

  • People using different operating systems have similar experiences since the computation is happening on the platform, not their local machine

  • Your work can be distributed to a broader audience since the technical barrier has been lowered

What is Project Binder?

We’ve discussed why it’s important to share your work and we’ve reached a point where we’ve decided to publish some Jupyter Notebooks with analysis code on a collaboration platform, such as GitHub.

GitHub is a great platform for sharing code statically. If the repository is public, anyone can navigate to your Notebook and read the contents. However, running code is a lot more complicated than just displaying it as GitHub does. A lot of interdependent parts are required to run code, such as:

  • a copy of the code itself;

  • the appropriate software to execute it;

  • any extra packages the code depends on that aren’t shipped as part of the core software;

  • any input data the analysis requires;

  • and you also need some hardware (a computer!) to run it on as well.

On top of acquiring all those parts, you also have to install them correctly and in such a way that they are not influenced or come into conflict with other software that may be running on your machine. It’s a lot of work!

How much easier would it be if we could run code in the browser, similar to how it’s displayed? This is what Project Binder aims to achieve.

Project Binder provides a user with the following infrastructure:

  • some hardware to execute code, usually a server hosted in the cloud but can be on-premise hardware too;

  • a computational environment containing:

    • the approriate software,

    • any extra package dependencies,

    • any required input data,

    • and a copy of the code itself (Notebooks or scripts);

  • a URL to where the environment is running so the code can be interacted with by you or your collaborators.

Project Binder has packaged together all of the moving parts that make it challenging to share computational work into a simple to use interface. There is a free and public version of this interface running at mybinder.org.

The cartoon below, by Juliette Taka, demonstrates one workflow a that scientist using Binder might adopt.

You can find out more about Project Binder and mybinder.org on their About mybinder.org page.

Disambiguation

In this section, there are some related terms, which will be outlined here for clarity:

  • Project Binder: An open community that makes it possible to create sharable, interactive, reproducible environments. The technological output of this project is a BinderHub.

  • BinderHub: A cloud-based infrastructure for generating Binders. The most widely-used is mybinder.org, which is maintained by the Project Binder team. It is built upon a range of open source tools, including JupyterHub, for providing cloud compute resources to users via a browser; and repo2docker, for building docker images from projects. Since it is an open project, it is possible to create other BinderHubs which can support more specialised configurations. One such configuration could include authentication to enable private repositories to be shared amongst close collaborators.

  • A Binder: A sharable version of a project that can be viewed and interacted within a reproducible computational environment running in the cloud via a web browser. By automating the installation of the computing environment (as discussed in the Reproducible Environments chapter), Project Binder transforms the overhead of sharing such an environment into the act of sharing a URL.

  • mybinder.org: A public and free BinderHub. Because it is public, you should not use it if your project requires any personal or sensitive information (such as passwords).

  • Binderize: The process of making a Binder from a project.

When is it appropriate to use mybinder.org?

Maintaining a free, anonymous service in the cloud is a lot of voluntary work and costs a lot of money. In order to reduce the running costs somewhat, mybinder.org places computational restrictions on each running Binder instance. These restrictions are:

  • 1 CPU, and

  • 1 GB of RAM.

Hence, mybinder.org is not an appropriate place to perform end-to-end replications of Machine Learning workflows, for example!

And this is the primary reason why this chapter on Binder has been placed in the “Guide for Communication”. With these computational restrictions, mybinder.org lends itself very well to hosting interactive demonstrations and learning resources for software packages or research analyses. In this scenario, the people clicking the Binder link probably want to learn something, and sitting through a time-consuming model-training process likely won’t help them achieve that. Instead, you could provide pre-trained models or instructions on how to train the models on their own hardware and come back to the Binder for the remainder of the interactive tutorial.

So, when is it appropriate to use mybinder.org?

  • When you want to communicate something in an interactive manner, such as short analyses, tutorials, or even blogs! Check out Achintya Rao’s blog powered by mybinder.org!

  • When the code and associated data (if relevant) are publicly available

  • When the code you want to run interactively does not require a lot of resource or specialist resources (for example, GPUs)

FAQs

Many common questions are answered on the About mybinder.org page.

How do I save my changes back to my repository?

Unfortunately, you can’t. At least, not from the command line in a running Binder instance.

Writing back to a hosted repository, whether it be on GitHub or some other platform, will require a credential of some kind to authorise you to write to that repository. And as has been mentioned, mybinder.org is a completely public service and you should not provide any sensitive information to a running Binder instance under any circumstances.

However, mybinder.org does run an add-on called jupyter-offlinenotebook which provides a download button to save your notebooks locally, even if your browser has lost its connection with the cloud infrastructure that is providing the compute! This means you can save your progress locally, update your repository with your saved notebooks, and relaunch your Binder with the updated notebooks.

A screenshot of the control panel of a Jupyter Notebook with a download button highlighted by a purple rectangle.

Fig. 82 Using this “Download” button in a Jupyter Notebook running on mybinder.org will allow you to save your notebooks locally, even after the Binder instance has been disconnected from computational resources.

How can I collaborate with my peers on mybinder.org?

It’s not impossible, but there’s definitely room to develop this feature in comparison to other “free cloud compute” services available.

Those who are interested in this, can find out more in this Discourse post and in the jupyterlab-link-share repository.

How is mybinder.org different to Google Colab?

Google Colab provides a “kitchen sink” computational environment with many of the most popular data science software packages pre-installed. In contrast, mybinder.org builds bespoke images for each repository launched, specifically installing the packages listed in your configuration files.

Can I connect to INSERT DATA PROVIDER HERE?

Network connections on mybinder.org are quite limited for security and abuse-prevention purposes. That being said you should be able to connect to an external data provider so long as it satisfies the following two criteria:

  • It can be accessed over an HTTP/HTTPS connection

  • You do not need credentials to access the data

Remember, mybinder.org is an entirely public service and under no circumstances should you provide confidential information, such as credentials, to a Binder instance.

How to create a Binder-ready project

The next chapter contains a Zero-to-Binder tutorial that will guide you through creating your first Binder-ready project on GitHub.