jupyterhub / binderhub



Run your code in the cloud, with technology so advanced, it feels like magic!

Home Page: https://binderhub.readthedocs.io

License: BSD 3-Clause "New" or "Revised" License

Python 84.57% HTML 3.28% JavaScript 9.24% CSS 1.34% Shell 0.78% Dockerfile 0.47% Mustache 0.32%

jupyterhub binder jupyter-notebook

binderhub's Introduction

BinderHub

What is BinderHub?

BinderHub allows you to BUILD and REGISTER a Docker image from a Git repository, then CONNECT it with JupyterHub, giving you a public IP address through which users can interact with the code and environment in a live JupyterHub instance. You can select a specific branch name, commit, or tag to serve.

BinderHub ties together:

JupyterHub to provide a scalable system for authenticating users and spawning single user Jupyter Notebook servers, and
Repo2Docker which generates a Docker image using a Git repository hosted online.

BinderHub is built with Python, Kubernetes, Tornado, npm, webpack, and Sphinx.

Documentation

For more information about the architecture, use, and setup of BinderHub, see the BinderHub documentation.

Contributing

To contribute to the BinderHub project you can work on:

answering questions others have,
writing documentation,
designing the user interface, or
writing code.

To see how to build the documentation, edit the user interface, or modify the code, see the contribution guide.

Installation

BinderHub is based on Python 3. It is currently kept up to date only on GitHub, but it can be installed from there using pip:

pip install git+https://github.com/jupyterhub/binderhub

See the BinderHub documentation for a detailed guide on setting up your own BinderHub server.

Why BinderHub?

Collections of Jupyter notebooks are becoming more common in scientific research and data science. The ability to serve these collections on demand enhances the usefulness of these notebooks.

Who is BinderHub for?

Users who want to easily interact with computational environments that others have created.
Authors who want to create links that allow users to immediately interact with a computational environment that they specify.
Deployers who want to create their own BinderHub to run on whatever hardware they choose.

License

See LICENSE file in this repository.


binderhub's Issues

URL Type: Zip files

This is a future feature idea. Some of the sphinx-gallery devs said that it'd be useful if Binder supported zip files containing the full contents of a GitHub repository.

So you could give Binder a URL to a zip file whose contents are those of a GitHub repository. Binder would unzip it first, and then treat the contents the same way as any other repo. It doesn't seem too tough to implement, but I wonder if there would be security issues.

cc @Titan-C in case he has any thoughts!
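On the security question, two concrete risks are archive entries that use absolute paths or `../` components to write outside the checkout directory, and archives that expand to an enormous size. A minimal sketch of a guarded extraction step, assuming the zip has already been downloaded into memory; the name `safe_extract` and the 1 GiB default cap are illustrative choices, not anything Binder defines:

```python
import io
import os
import zipfile


def safe_extract(zip_bytes: bytes, dest: str, max_total_size: int = 1 << 30) -> list:
    """Extract a zip archive into `dest`, refusing entries that would escape it.

    Hypothetical helper: every member is validated before anything is
    extracted, so a bad archive leaves `dest` untouched.
    """
    dest_root = os.path.realpath(dest)
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        members = zf.infolist()
        # Declared uncompressed sizes: a cheap first check against zip bombs.
        if sum(m.file_size for m in members) > max_total_size:
            raise ValueError("archive expands beyond the allowed size")
        for member in members:
            target = os.path.realpath(os.path.join(dest_root, member.filename))
            # Absolute paths and "../" tricks resolve outside dest_root.
            if os.path.commonpath([dest_root, target]) != dest_root:
                raise ValueError(f"unsafe path in archive: {member.filename!r}")
        zf.extractall(dest_root)
        return [m.filename for m in members]
```

After extraction, the unpacked directory could be handed to the same build path as a cloned repository.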

Add support for RISE display

It would be really cool if users could build repositories that were meant to be viewed with RISE. This way people would click a link and they'd be taken to an interactive RISE view automatically, which could be used to step people through material, tell stories in a linear fashion, etc.

This should be tagged as a future feature request but I think it'd be pretty cool if we could implement this such that we use RISE when giving talks about Binder in the future!

Federation support

@fperez and I came up with a pretty good (and really simple) federation scheme for binders late this Friday evening.

I'll document that here in a day or two!

Rebuild does not clear status bar

After a failed build, if you enter an updated file path and press Rebuild, the status bar is not cleared when the new build starts.

Building Julia dependencies

We also need to support dependencies for Julia. Discuss implementations etc here!

Repository qualities to test + incorporating tests in general

Before Binder 2.0 we should test it on a few specific repositories to make sure some common use cases work as expected. What should we try?

Now:

Python + requirements.txt
Python + environment.yml
Python + Dockerfile
Python + Pre-binder-2.0 Dockerfile

Future:

R + ???
Python + interactivity/jupyter widgets?
???

Milestones for summer conferences

Hey folks - @yuvipanda and I met to discuss some milestones we should hit before a few key summer conferences. The first is JuliaCon, where Fernando is giving a talk, and the second is JupyterCon, where we're all giving talks of one kind or another.

I've added two lists to our project page here: https://github.com/jupyterhub/binderhub/projects/1

I've tried to break them down like this:

Task name | (Opt or Req) [ INITIALS ]

Where opt = optional but preferred, and req = required for that conference. The initials correspond to C=chris, Y=yuvi. Feel free to convert any of those into issues or add your initials so that we can start tackling them!

Decide on what to do for default when there is no build pack detected

For a lot of repositories there's no requirements.txt or environment.yaml file.

We should decide what to do in these cases.

Directly talk to the JupyterHub API to launch services

We should have code in binderhub that actually talks to the JupyterHub API to launch the user, and return info about the launched pod rather than rely on redirects forever.

Figure out a local setup / testing solution for Binder

So people can use this for local development as well, not just for use on binder.

Add UI to generate badges

Right now people don't know how to generate badges unless they've been on Gitter! Make a fairly small UI for this.
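A small UI could be backed by a helper that turns a repository into copy-pasteable Markdown. A minimal sketch, assuming the current `/repo/<username>/<repo>/<path-to-start>` launch URLs and a badge image served at `<host>/badge.svg`; the host and the badge path are placeholders, not confirmed endpoints:

```python
from urllib.parse import quote


def launch_url(host: str, username: str, repo: str, path: str = "") -> str:
    """Build a launch link in the /repo/<username>/<repo>/<path-to-start> form."""
    url = f"{host.rstrip('/')}/repo/{quote(username)}/{quote(repo)}"
    if path:
        # quote() leaves "/" alone, so nested notebook paths stay readable.
        url += "/" + quote(path.lstrip("/"))
    return url


def badge_markdown(host: str, username: str, repo: str, path: str = "") -> str:
    """Markdown for a 'launch binder' badge; the badge.svg location is assumed."""
    badge_svg = f"{host.rstrip('/')}/badge.svg"
    return f"[![Binder]({badge_svg})]({launch_url(host, username, repo, path)})"


print(badge_markdown("https://beta.mybinder.org", "binder-examples", "requirements"))
```

The same two functions could produce reStructuredText or HTML variants by changing only the final format string.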

Add a README

What is this thing, what is the scope of it, why do we want it?

Also how to run it.

Building JavaScript dependencies

We also need to support dependencies for JavaScript (maybe including things that require a function call after installing, e.g. some of the jupyter extensions?). Discuss implementations etc here!

Improve error messages for users on building

Build fails if there is no requirements.txt in the root directory.

Create example binder repos

We need to do at least two things:

Make sure all the old binder examples point to our new examples that cover the same kinds of material
Cover some use-cases many people will have, using each of the languages / build workflows that we support.

Here's where new examples will be located:

https://github.com/binder-examples

Old repos

New repos

legacy dockerfile (using the andrewosh image)
from jupyter stacks dockerfile
from your own dockerfile

Use cases

Installing LaTeX

New use cases for which we should have repos:

Switch to querying the Docker Registry API to cache images

Right now we do some crazy hacks to detect if an image has already been built.

Instead, we should hit the Docker Registry API to check if it exists.
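The Docker Registry HTTP API V2 answers `HEAD /v2/<name>/manifests/<reference>` with 200 when the tagged image exists and 404 when it does not, which is exactly the check needed. A minimal sketch using only the standard library; it assumes an anonymous registry, whereas real registries usually require a bearer token first, so a 401 is surfaced as an error rather than handled:

```python
import urllib.error
import urllib.request

# Media type for a schema-2 image manifest.
MANIFEST_V2 = "application/vnd.docker.distribution.manifest.v2+json"


def manifest_url(registry: str, name: str, tag: str) -> str:
    """URL of the manifest for <name>:<tag> in a Registry API V2 registry."""
    return f"{registry.rstrip('/')}/v2/{name}/manifests/{tag}"


def image_exists(registry: str, name: str, tag: str, timeout: float = 10.0) -> bool:
    """True if the registry reports a manifest for name:tag, False on 404."""
    req = urllib.request.Request(
        manifest_url(registry, name, tag),
        method="HEAD",
        headers={"Accept": MANIFEST_V2},
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise  # 401/403 (auth needed) and server errors are not handled here
```

For example, `image_exists("https://registry.example.com", "binder/myrepo", "abc123")` would send `HEAD https://registry.example.com/v2/binder/myrepo/manifests/abc123` (the hostname and image name are placeholders).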

Support for pointing to gists

Quicker than setting up an entire github repository!

Check if image already exists before attempting to build it

Shouldn't rebuild image twice!
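One way to make "don't build twice" mechanical is to derive the image name deterministically from the repository and the resolved commit, so the same inputs always map to the same tag and the existence check becomes a simple lookup. A minimal sketch; the hashing scheme, the `prefix` argument, and the in-memory `existing` set (standing in for a registry or local Docker query) are all assumptions:

```python
import hashlib
from typing import Callable, Set


def image_spec(prefix: str, repo_url: str, commit: str) -> str:
    """Deterministic name:tag for a given repository URL and commit hash."""
    # Hash the URL so arbitrary git URLs become valid, lowercase image names.
    repo_hash = hashlib.sha256(repo_url.encode("utf-8")).hexdigest()[:16]
    return f"{prefix}{repo_hash}:{commit}"


def build_if_missing(spec: str, existing: Set[str], build: Callable[[str], None]) -> bool:
    """Run `build` only when `spec` is not already known; return True if it built."""
    if spec in existing:
        return False
    build(spec)
    existing.add(spec)
    return True
```

This sketch does not stop two concurrent launches of the same commit from both starting a build; that would need a lock or an in-progress marker on top.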

UI - Add indicator/message that file path is optional

File to Launch is optional.

ToDo for beta.mybinder.org

ToDo before beta

Chris

Build the Binder diagram
Finalize new text for main page (under 'how it works', since it is slightly different)
Make favicon work
Figure out how to get images inheriting from 'binder-base' to work. Figure out what is actually in that image, and if we can find the source somewhere
Test a lot of binder repositories to make sure we're fully compatible
Blog post for the release (#50)

Yuvi

Rebuild the UI landing page using HTML/CSS
Get an IP address for beta.mybinder.org
Fix the progress / logging UX to be understandable
Not doing for now: ~~Set up Prometheus / Grafana for metrics on this (assuming this isn't a lot of work, otherwise we move it to later, but I think it'd be good to start collecting data about users/downtime ASAP)~~
Make sure that currently existing 'launch binder' buttons work
Figure out a permalink solution v2
Support non-master refs
Scale up the beta mybinder deployment (maybe autoscale?)

Min

Add support for anaconda yml installs

Set limits on building Docker containers

So we don't actually overload machines by building containers that are way too large.

Information to track from incoming users

We should decide on what pieces of information we'd like to know about where users come from, what is the thing that brought them to SG, etc.

Some specific ideas:

Referring location (github.com / personal website / copypaste / sphinx-gallery)
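The referring location can mostly be read from the HTTP `Referer` header. A minimal sketch of bucketing it into the categories above; treating a missing header as "copypaste" is an approximation (browsers and privacy settings also strip it), and the `source=sphinx-gallery` query parameter is a hypothetical convention that sphinx-gallery links would have to adopt:

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def classify_referrer(referer: Optional[str], query: str = "") -> str:
    """Bucket an incoming launch by where the user came from."""
    # Hypothetical explicit tag, e.g. /repo/user/repo?source=sphinx-gallery
    if parse_qs(query).get("source") == ["sphinx-gallery"]:
        return "sphinx-gallery"
    if not referer:
        # No Referer: typed or pasted URL (or a stripped header).
        return "copypaste"
    host = (urlparse(referer).hostname or "").lower()
    if host == "github.com" or host.endswith(".github.com"):
        return "github.com"
    return "personal website"
```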

Make notebooks iframeable

Currently they are not by default. Turn on allowing iframes.
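In the classic Notebook server, framing is governed by the `frame-ancestors` directive of the Content-Security-Policy header, which can be overridden through `NotebookApp.tornado_settings`. A config-fragment sketch for `jupyter_notebook_config.py`, where Jupyter supplies `c`; the permissive `*` is only for illustration, and listing the specific sites that should embed notebooks is safer:

```python
# jupyter_notebook_config.py (Jupyter injects `c` when it loads this file)
# Allow the notebook to be embedded in <iframe>s by overriding the
# frame-ancestors directive of the Content-Security-Policy header.
c.NotebookApp.tornado_settings = {
    "headers": {
        "Content-Security-Policy": "frame-ancestors 'self' *",
    }
}
```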

Add support for building from environment.yml

What can I do to help get conda working? Supporting environment.yml is super important for lots of scientific cases where requirements.txt is a non-starter and building from a Dockerfile can take hours. But for now, I don't know what repo I should look at or how to get testing.

Increase bandwidth for notebook data transfer

This will be particularly useful for more data-heavy interactive stuff in the notebook, e.g. ipywidgets. I think this is the relevant config parameter:

c.NotebookApp.iopub_data_rate_limit
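`NotebookApp.iopub_data_rate_limit` is expressed in bytes per second. A config-fragment sketch for `jupyter_notebook_config.py` (Jupyter supplies `c`); the value `1.0e10` is an illustrative "effectively unlimited" setting, not a tuned recommendation:

```python
# jupyter_notebook_config.py (Jupyter injects `c` when it loads this file)
# Maximum rate, in bytes/sec, at which output may be sent on the iopub
# channel before it is rate-limited. Illustrative value only.
c.NotebookApp.iopub_data_rate_limit = 1.0e10
```

The same setting can be passed on the command line as `jupyter notebook --NotebookApp.iopub_data_rate_limit=1.0e10`.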

can somebody make me an admin?

@willingc @minrk @yuvipanda ? :-)

Support building arbitrary Dockerfiles

Gotta do it! Lots of people have pretty custom stacks...

Not exactly sure how to restrict the docker images we'll run, or why exactly we should. Should do a proper, full on thorough examination of the security properties...

Blog post

Probably going to riff off of Andrew's original blog post:

https://elifesciences.org/labs/a7d53a88/toward-publishing-reproducible-computation-with-binder

Once we announce the beta we should partially do so with a blog post on the jupyter website, twitter, etc. Here are some things to mention:

Principles behind the changes
Technical backend differences
What's different between current and previous from a user perspective
- New URL structure
- The initial environment is different
- Be more specific / complete about your requirements
- No more manual building, it gets automatically rebuilt if someone clicks a link
- This means that if you push to master it'll now automatically get rebuilt
- However now you can point it to a specific hash / tag / branch
- It should be much faster after the initial build
Public grafana dashboard
Hardware restrictions for users
Link to documentation for new deployments
Future development stuff
- R support

Suggestions from other folks? @yuvipanda @willingc @minrk ?

Write documentation on getting set up for development

This needs the following pre-requisites:

A Kubernetes cluster that is running and we have API access to
A JupyterHub that's configured properly to run a tmpnb type setup
A Docker Registry to push images to

We should lay out how to set all these up, a sample builder_config.py file and instructions on dev setup

Binderfile config file

At the Jupyter sprint we discussed setting up a "meta-config" file called a Binderfile. I think we should think about implementing this sooner than later, as it will take care of many user concerns (e.g. running shell commands) and will be helpful for extending binder functionality (e.g., telling Binder to start in JupyterLab mode).

General behavior

multiple file types supported in a hierarchical fashion (e.g., reqs.txt, env.yml etc)
for these, if one is found, then it triggers a build and anything lower in the hierarchy doesn't happen

Specific to binderfile

if you want MORE than one of these to be triggered, you need a binder.yml file. This has key: value pairs.
- the key is the name of one of the files that could be in the hierarchy (e.g. pip, conda)
- the value is a list of items, one per line, that mimics exactly the content that would have been inside of the file if it were a standalone file instead of being inside of binder.yml
In this case, each key will be triggered
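The two behaviors described here (first match wins in the hierarchy, while a binder.yml triggers every key it lists) can be sketched as a single selection function. The priority order below is an assumption, since the proposal does not fix one, and binder.yml is taken to be already parsed into a dict:

```python
from typing import List, Optional, Set

# Assumed priority order, highest first; the proposal leaves this open.
HIERARCHY = ["Dockerfile", "environment.yml", "requirements.txt"]


def build_steps(repo_files: Set[str], binder_yml: Optional[dict] = None) -> List[str]:
    """Return which build steps a repository would trigger."""
    if binder_yml is not None:
        # binder.yml present: every key is triggered, in the order written.
        return list(binder_yml)
    for name in HIERARCHY:
        if name in repo_files:
            # First match wins; anything lower in the hierarchy is skipped.
            return [name]
    # Nothing detected; the default for this case is still undecided.
    return []
```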

Example

inside binder.yml:

image: <path-to-image>

pip: |
    package1
    package2==2.0
conda: |
    package3
bash: |
    touch myfile.txt
    python myfile.py
    nbconfig etc etc

Outstanding questions

Should each section name be exactly the same as the name of the file if it were standalone? E.g., use environment.yml: and requirements.txt: above instead of conda: and pip:, respectively.
How to support that the value can also point to a file of the same nature (e.g., ./docs/requirements.txt)

Links

Original etherpad

Change the favicon for fail

What do people think about making the favicon change if the build fails? Might be a useful feature since building can take a while sometimes and you'd get a little visual cue in your tab that way...

BUILDER: Environment variables

We should let users define environment variables w/ their repositories.

Should this be its own file, or should we just ask people to use a binder.yml file in this case?
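If it ends up as its own file, a plain `KEY=VALUE` format would be simple to parse. A minimal sketch; the format (one assignment per line, `#` comments, split on the first `=`) is an assumption, not a decided file spec:

```python
import re
from typing import Dict

# Conventional environment variable names: letters, digits, underscores.
_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def parse_env_file(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines into a dict, skipping blanks and # comments."""
    env: Dict[str, str] = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first "=" only, so values may themselves contain "=".
        key, sep, value = line.partition("=")
        key = key.strip()
        if not sep or not _NAME.fullmatch(key):
            raise ValueError(f"line {lineno}: expected KEY=VALUE, got {raw!r}")
        env[key] = value.strip()
    return env
```

The resulting dict could then be passed as the environment of the spawned container.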

Pre-provision Binder instances w/ data

In the sphinx-gallery issue (sphinx-gallery/sphinx-gallery#244 (comment)) someone mentioned it would be very useful to be able to pre-provision some Binder instances with, e.g., data that is normally downloaded before examples. Sometimes the data can take many hours to download, but the examples themselves only take a minute or two.

API Proposal

This needs a versioned, simple RESTful API that allows for a wide variety of use cases.

Some of the use cases it should cater for:

Working as an on-demand backend for ipywidgets and related toolkits that want to talk to a kernel
mybinder.org like use-cases with various levels of authentication

We should collect other use cases too before finalizing the API design.

Some API design guidelines I like (and should re-read before doing the design):

Google's API Design guidelines https://cloud.google.com/apis/design/
Heroku's API Design guidelines https://geemus.gitbooks.io/http-api-design/content/en/

Feature a list of 'featured' repositories on the binder home page

So people don't have to 'go find something useful' when we launch.

Berkeley: Mechanical engineering workshop

@aculich could you note a few thoughts / pros and cons about your use of Binder in the event we recently spoke about? Stuff like something that would have made the use of binder more simple, something you expected it to do that it didn't do, that kind of thing. I'd like to keep track of the experience folks have!

Stable URL scheme for badges + links to launch binder

This is different from #13 since it focuses solely on what links people will use when clicking 'launch binder' links, rather than an HTTP API that code can also call. Requires different design and guarantees.

We also need to have a compatibility layer so current launch binder links continue to work.

Current URL structure

Only supported URL structure now is:

/repo/<username>/<repo>/<path-to-start>

While a great start, it has some extensibility problems:

Assumes it is using GitHub, and doesn't provide an easy way to use other providers.
Unclear how it can handle arbitrary git URLs (that aren't hosted in a 'provider' as such)
Not specified how you can provide other arbitrary run-time parameters in jupyterhubs that support it (such as memory limits, extra data to be provisioned, etc)

Whatever we do, we'll make sure these links continue to work for the foreseeable future. We should still find them and send PRs to their owners to update them, though.

Proposed new URL structure

/v2/repo/<git-clone-url>/ref/<ref>/?<runtime-params>

where:

git-clone-url is URL encoded URL pointing to a git repository (such as a github URL). We will interpret this pretty generously.
ref points to a commit hash, branch or tag. If it's a branch or tag it'll be redirected to a commit hash - permanently for a tag and temporarily for a branch.
runtime-params is query parameters used for all runtime parameters - including the path to launch, and in the future other parameters too (extra data to load, RAM requirements, etc). These will be formally defined too.
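Constructing such a URL mostly comes down to encoding the clone URL as a single path segment. A minimal sketch that also shows how a current `/repo/<username>/<repo>/<path-to-start>` link could be translated for the compatibility layer; the `path` runtime parameter name and the `master` default ref are assumptions, since neither is formally defined yet:

```python
from urllib.parse import quote, urlencode


def v2_url(host: str, git_clone_url: str, ref: str, **runtime_params: str) -> str:
    """Build /v2/repo/<git-clone-url>/ref/<ref>/?<runtime-params>."""
    # safe="" also encodes "/" and ":", so the clone URL stays one path segment.
    repo = quote(git_clone_url, safe="")
    url = f"{host.rstrip('/')}/v2/repo/{repo}/ref/{quote(ref, safe='')}/"
    if runtime_params:
        url += "?" + urlencode(runtime_params)
    return url


def legacy_to_v2(host: str, legacy_path: str, default_ref: str = "master") -> str:
    """Translate /repo/<username>/<repo>/<path-to-start> into the v2 form."""
    parts = legacy_path.strip("/").split("/", 3)
    if len(parts) < 3 or parts[0] != "repo":
        raise ValueError(f"not a legacy launch path: {legacy_path!r}")
    _, username, repo = parts[:3]
    params = {"path": parts[3]} if len(parts) == 4 and parts[3] else {}
    return v2_url(host, f"https://github.com/{username}/{repo}", default_ref, **params)
```

One practical caveat: some proxies and web servers decode or reject `%2F` inside path segments, so this may need checking against whatever sits in front of BinderHub.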

This separates the build parameters (git url + ref) from the runtime parameters. Also establishes that the canonical URL is the one with the commit hash, rather than branch or tag info.
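The redirect rule from the `ref` description can be made concrete: commit hashes are served directly, tags redirect permanently, and branches redirect temporarily. A minimal sketch; the `refs` mapping is a hypothetical stand-in for asking the git host (for example via `git ls-remote`), and 301/302 are one reasonable choice of status codes (308/307 would also fit):

```python
from http import HTTPStatus
from typing import Dict, Optional, Tuple

# Hypothetical lookup table: ref name -> (kind, commit hash).
Refs = Dict[str, Tuple[str, str]]


def resolve_ref(ref: str, refs: Refs) -> Tuple[Optional[HTTPStatus], str]:
    """Return (redirect status or None, commit hash) for a requested ref."""
    if ref not in refs:
        # Not a known branch or tag: treat it as a commit hash and serve it.
        return None, ref
    kind, commit = refs[ref]
    if kind == "tag":
        # A tag is expected not to move, so the redirect can be permanent.
        return HTTPStatus.MOVED_PERMANENTLY, commit
    if kind == "branch":
        # A branch moves with every push, so redirect only temporarily.
        return HTTPStatus.FOUND, commit
    raise ValueError(f"unknown ref kind: {kind!r}")
```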

Possible modifications:

~~Version this API too, with a /v2/ prefix~~ Decided to actually do this.
Special case 'path' runtime parameter, include that in the path. Not sure how necessary that is.

Set up doctr w/ the jupyterhub account to auto-build the docs

I just realized that the docs being served on the website are really outdated. I've updated them with a push to gh-pages, but we should get doctr set up to auto-build with Travis so we don't need to think about this. I could set it up myself, but then it'd be tied to my GitHub account; I think it'd be better if the jupyterhub account were doing the building. What do folks think?

BUILDER: Apt install

We should support apt-installs from a text file
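A minimal sketch of turning such a file into one `apt-get` invocation, assuming one or more package names per line with `#` comments; the file name `apt.txt` and the strict name check (which also rejects version pins like `pkg=1.0`) are assumptions made for illustration:

```python
import re
from typing import Optional

# Debian-style package names; rejecting anything else also blocks option
# injection such as "--allow-unauthenticated".
_PKG = re.compile(r"[a-z0-9][a-z0-9+.-]*")


def apt_install_command(text: str) -> Optional[str]:
    """Build an apt-get command from the contents of a hypothetical apt.txt."""
    packages = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        for name in line.split():
            if not _PKG.fullmatch(name):
                raise ValueError(f"invalid package name: {name!r}")
            packages.append(name)
    if not packages:
        return None
    return ("apt-get update && apt-get install -y --no-install-recommends "
            + " ".join(packages))
```

In a Dockerfile this would typically become a `RUN` line, often followed by clearing `/var/lib/apt/lists` to keep the image small.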

Building R dependencies

We should figure out a way to handle R dependencies. Here are some thoughts from Carl on this:

R mechanism for listing dependencies:

In an R package, the DESCRIPTION file plays the role of a requirements.txt in stating the dependencies, minimal version needed, and where to get them (e.g. CRAN or additional cran-type repo like bioconductor).

This approach does not accommodate installing something that is not the most recent version of a package. (CRAN archives old sources, but because, unlike python or ruby gems distribution, CRAN is designed to provide binaries & you can't guarantee binaries build for an old /archived source, the default install does not immediately support installing archived packages).

If you just have a list of packages you want, I recommend something along the lines of what we do with rocker, e.g.

install2.r `cat deps.txt`

Where deps.txt is just a list of package names you want to install. If these come from multiple repos (cran & bioconductor), just list those as arguments to -r:

install2.r -r "https://cran.rstudio.com" -r "https://bioconductor.org/packages/release" `cat deps.txt`

If you want to install the same version each time, just use an MRAN snapshot of the appropriate date.

This installs everything from source of course, and assumes you have the system dependencies (like openssl, or libxml2) installed already, but you probably do since python is the same way. Compiling a full R stack from scratch can take some time; you could save effort by building on any of the existing Rocker stack images (using an R version tag as appropriate).

Alternative approaches which I don't recommend for general use:

packrat is a rather heavyweight solution. Packrat is designed to lock you into a particular version of every package. The only way to guarantee you can install the right version of some remote source (could be from GitHub, from CRAN, etc) is to make a local copy of that source, and this is exactly what packrat does. While it creates a manifest listing dependencies, versions, and sources that looks kinda like a requirements.txt file, it is not intended that you generate such a manifest by hand, and it only supports the notion of == version, which means it is going to be a nuisance to maintain if you are regularly trying to provide access to an R image with updated software.
https://github.com/mangothecat/pkgsnap is a lighter-weight approach, which generates a .csv file of your current library. Again, it is generated from your current install rather than by writing out a list by hand, and it has no notion of >= dependency, but it tracks the name, version, and source of installation.

These are great for an individual user who wants to lock their library at the current state, which may include an arbitrary mix of up-to-date and not up-to-date packages.

Add checkbox to launch with JupyterLab instead of Notebook

Should be fairly trivial to do!

Shall await an 'OK' from @ellisonbg and/or @fperez before doing this, though.

Make this work with minikube

We should possibly allow configuring this to not need an image registry, for the (common?) special case of just having one node (such as minikube). It'll just check local docker instead of registry.

This might eventually allow us to have a single node non-kubernetes setup if needed.

Add better error handling when no repo is entered

Reproduce: Press Launch without entering a repo. There is no error indication to the user that a repo hasn't been entered.

Separate out the provider of repositories (GitHub) from the building code

Should make it trivial to add more sources in the future (such as Bitbucket)
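One way to draw that line is a small interface that the building code depends on, with one subclass per host. A minimal sketch; `RepoProvider`, the prefixes `gh`/`bb`, and the method names are hypothetical, and the Bitbucket class is included only to show that a new source is one extra class:

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Type


class RepoProvider(ABC):
    """Everything the building code needs to know about where a repo lives."""

    def __init__(self, spec: str) -> None:
        self.spec = spec  # e.g. "<username>/<repo>"

    @abstractmethod
    def clone_url(self) -> str:
        """URL that `git clone` can use."""


class GitHubProvider(RepoProvider):
    def clone_url(self) -> str:
        return f"https://github.com/{self.spec}"


class BitbucketProvider(RepoProvider):
    def clone_url(self) -> str:
        return f"https://bitbucket.org/{self.spec}"


PROVIDERS: Dict[str, Type[RepoProvider]] = {"gh": GitHubProvider, "bb": BitbucketProvider}


def make_provider(prefix: str, spec: str) -> RepoProvider:
    cls = PROVIDERS.get(prefix)
    if cls is None:
        raise ValueError(f"unknown provider prefix: {prefix!r}")
    return cls(spec)


def clone_command(provider: RepoProvider) -> List[str]:
    # The building code only sees the interface, never host-specific details.
    return ["git", "clone", provider.clone_url()]
```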

Update license to Jupyter BSD version

ToDo for Binder v0.1beta

After many months of hard work it is time to launch the new Binder backend! Let's use this issue to coordinate the last few steps we need to take. @yuvipanda and @freeman-lab correct me if I'm wrong, but here's the list as I see it:

ToDo

Tech

Set slowspawn timeout to 0 so that pre-built images will launch faster (@yuvipanda)
Error pages: incl kubernetes full error + check for other miscellaneous errors to catch #91 (@minrk)
tmpnb auth bug causing people to have to clear their cache (@minrk or @yuvipanda)
Figure out a UI to create the badges for newly-built binders ( #122 )
Figure out the URL for the badges ( #122 )
Make sure that the badge SVG will still link properly so images don't break ( #122 )

Documentation

Documentation on user repo structure (@choldgraf)
Flow chart on when to use a Dockerfile (@choldgraf @yuvipanda @minrk )
Templates / best-practices for Dockerfiles (@choldgraf @yuvipanda @minrk )
Add a little more error handling (@yuvipanda)

Misc

Swap omgwtf to mybinder.org (@yuvipanda)
Have a privacy policy (@yuvipanda and @choldgraf)
~~Deprecate all the old Binder repos to point to the new deployment. (@choldgraf)~~
~~Transfer DNS ownership to Jupyter (@freeman-lab and @choldgraf)~~
Make sure old badge URLs will still work ( #122 )

Release

Point mybinder.org to the new Binder deployment (@yuvipanda)
Send out a blog post talking about changes etc (@choldgraf) (handled in another repo)

Non blocking

Delete stale users
UI improvements so it doesn't look like a form (maybe discuss w/ Granger UI team)
Get the Binder logo on the notebook pages (@choldgraf)

Update

Here's a whiteboard from JupyterCon:

503 Error After Building

A repository (GitHub link, mybinder link) that was launching successfully as recently as Sunday, July 16th at approximately 5:00 PM (Pacific) started failing to launch by that following morning, Monday, July 17th at approximately 10:00 AM. It was successfully built from the cache, but the new tab that normally contains the environment instead returned a 503 server error. The repo was originally built roughly one month ago (using beta.mybinder.org).

Thinking it might be a caching issue, I forced a rebuild by adjusting some non-critical whitespace. Again, it was successfully built and pushed, but again I got the 503 error. I reproduced this issue with a different repo that had previously deployed successfully.

Any help would be greatly appreciated. This is a great product, and I love using it to share Jupyter notebooks!