Comments (15)
Building a binder image
This is used at the API endpoint (e.g. api.mybinder.org) to work with binder image resources.
Creating a new binder
POST /binders/CodeNeuro/notebooks/ HTTP/1.1
Content-Type: application/json

{
  "name": "codeneuro-notebooks",
  "repo": "https://github.com/CodeNeuro/notebooks",
  "requirements": "repo/requirements.txt",
  "notebooks": "repo/notebooks",
  "services": [
    {
      "name": "spark",
      "version": "1.4.1",
      "params": {
        "heap_mem": "4g",
        "stack_mem": "512m"
      }
    }
  ]
}
That's copied straight from your current API, though it would be good to spec those out.
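As a sketch of what a client would assemble for that request (the endpoint and field names come from the example above; the helper name here is mine, not part of any spec):

```python
import json

def make_binder_spec(name, repo, requirements, notebooks, services):
    """Assemble the JSON body for POST /binders/<org>/<repo>/."""
    return {
        "name": name,
        "repo": repo,
        "requirements": requirements,
        "notebooks": notebooks,
        "services": services,
    }

spec = make_binder_spec(
    name="codeneuro-notebooks",
    repo="https://github.com/CodeNeuro/notebooks",
    requirements="repo/requirements.txt",
    notebooks="repo/notebooks",
    services=[{"name": "spark", "version": "1.4.1",
               "params": {"heap_mem": "4g", "stack_mem": "512m"}}],
)
body = json.dumps(spec)  # send with Content-Type: application/json
```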
Detail on a binder
Right now the GET on /apps/<organization>/<repo>/ returns the redirect URI (yes, tmpnb does this too because of its limited purpose, but those are already launched). In my opinion this should tell you about the resource, returning the spec that was in the POST as well as the status.
GET /binders/CodeNeuro/notebooks/ HTTP/1.1
would then return
{
  "name": "codeneuro-notebooks",
  "repo": "https://github.com/CodeNeuro/notebooks",
  "requirements": "repo/requirements.txt",
  "notebooks": "repo/notebooks",
  "services": [
    {
      "name": "spark",
      "version": "1.4.1",
      "params": {
        "heap_mem": "4g",
        "stack_mem": "512m"
      }
    }
  ]
}
That could include the status as well.
Beyond that, I think a HEAD request makes sense here for checking whether a binder exists.
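A HEAD check is cheap for callers to interpret; here's a minimal sketch of the client-side logic, assuming 200 means the binder exists and 404 means it does not (those codes are my assumption, not yet specced):

```python
def binder_exists(status_code):
    """Interpret the status of HEAD /binders/<org>/<repo>/.

    200 -> binder exists, 404 -> it does not; anything else is an error.
    """
    if status_code == 200:
        return True
    if status_code == 404:
        return False
    raise RuntimeError("unexpected status: %d" % status_code)
```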
from binder.
Launching a binder
Right now this is part of the GithubBuildHandler as a GET on the /apps/ resource. This piece could actually be distant from GitHub, relying only on image names (those that have been built, whitelisted, whatever). In the Docker API (just as a reference), they POST to /containers/create with the payload.
Since we're talking "precanned" images (in the sense that they were built prior or already exist) that get launched either by a user visiting a resource or via an AJAX call from JavaScript, I think we can pick a solid resource name. I was tilted towards spawn since it's a noun and we've already been using it in tmpnb; picked containers after discussion on gitter.
Since we're creating a container, we'd want to start this off as a POST, with a GET retrieving that same information.
POST /containers/CodeNeuro-notebooks/ HTTP/1.1
Accept: application/json
Which would return
{
  "id": "12345"
}
If the resource was immediately available, then it could include the location. Otherwise, retrieving the location for that specific container would be by GET:
GET /containers/CodeNeuro-notebooks/12345
which returns
{
  "location": "...",
  "id": "12345"
}
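The two-step launch then becomes: POST for an id, GET until a location shows up. A minimal polling sketch, where `fetch` stands in for whatever performs the GET and returns the parsed JSON body (the retry count and delay are arbitrary choices, not part of the spec):

```python
import time

def wait_for_location(fetch, container_id, tries=30, delay=1.0):
    """Poll GET /containers/<name>/<id> until a location appears."""
    for _ in range(tries):
        info = fetch(container_id)  # parsed JSON body of the GET
        location = info.get("location")
        if location:
            return location
        time.sleep(delay)
    raise TimeoutError("container %s never reported a location" % container_id)
```

For now the server side could answer these GETs by long polling; websockets could come later without changing the client's contract.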
You may ask yourself then, what if someone GETs the resource directly?
GET /containers/CodeNeuro-notebooks/ HTTP/1.1
Authorization: 5c011f6b474ed90761a0c1f8a47957a6f14549507f7929cc139cbf7d5b89
This should return all of the current containers that user is allowed to see.
[
  {
    "location": "...",
    "id": "12345",
    "uri": "/containers/CodeNeuro-notebooks/12345"
  },
  {
    "location": "...",
    "id": "787234",
    "uri": "/containers/CodeNeuro-notebooks/787234"
  }
]
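Filtering that listing by the caller could be as simple as the sketch below; the `owner` field on each container record is a hypothetical way to track ownership, not something in the spec:

```python
def visible_containers(containers, user):
    """GET /containers/<name>/ should return only what the caller may see.

    Each record is assumed to carry hypothetical 'owner' and 'image' fields.
    """
    return [
        {"location": c["location"], "id": c["id"],
         "uri": "/containers/%s/%s" % (c["image"], c["id"])}
        for c in containers
        if c["owner"] == user
    ]
```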
from binder.
The last thing I posted, about returning all the currently spawned containers, would be super helpful for operations (as I've experienced with tmpnb). It's probably worth thinking about authentication sooner rather than later, even if only for your own administration.
It's easy to defer to a separate authentication store, relying on an LRU key store (that's what I made https://github.com/rgbkrk/lru-key-store for, when deferring to a separate identity service) to keep yourself from making repeated calls to, e.g., GitHub or another provider.
from binder.
Working with pools of pre-allocated binders
Thinking about the pooling that tmpnb does (and that I would hope for in binder), I imagine we could have an endpoint at /pool/ to set up capacities (and inspect allocations) for images:
GET /pool/{imageName} HTTP/1.1
returns
{
  "running": 123,
  "available": 12,
  "minPool": 1
}
Updating the pool (by POST or PUT):
POST /pool/{imageName}
Authorization: 9f66083738d8e8fa48e2f19d4bd3bdb4821fa2d3fdc7d84e4228ded5e219
{
  "minPool": 512
}
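The bookkeeping behind those two calls might reduce to something like this (field names follow the GET /pool/ example above; the function name is mine):

```python
def containers_to_spawn(pool):
    """Given the GET /pool/<imageName> body, decide how many fresh
    containers the pool manager should allocate to stay at or above
    minPool available containers."""
    return max(0, pool["minPool"] - pool["available"])
```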
from binder.
How would you feel if spawn and pool both take image names (binders?) and it's up to the underlying implementation for whether they'll run that image or not?
from binder.
That seems good to me. I'm imagining that would look something like:
- Send the authorized PUT to /pool/ with the format you described above to create the pool
- Check imageName against a whitelist of pool-able images (presumably only a small set of single-kernel images)
- If imageName is either forbidden or does not exist, synchronously return some error code
- Update the number of containers allocated to the pool with an authorized POST

Sound right?
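Those steps could be sketched as a handler like the one below; the whitelist contents and the status codes (403 for a forbidden image, 400 for a malformed body) are my assumptions, since none of that is specced yet:

```python
# Hypothetical whitelist of pool-able images.
POOLABLE_IMAGES = {"jupyter-minimal", "jupyter-scipy"}

def update_pool(image_name, body, pools):
    """Handle PUT/POST /pool/<imageName>; return (status_code, response).

    `pools` is an in-memory dict standing in for whatever store
    tracks per-image pool state.
    """
    if image_name not in POOLABLE_IMAGES:
        return 403, {"error": "image is not pool-able"}
    if "minPool" not in body:
        return 400, {"error": "minPool is required"}
    pool = pools.setdefault(
        image_name, {"running": 0, "available": 0, "minPool": 0})
    pool["minPool"] = body["minPool"]
    return 200, pool
```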
Thinking along these lines, we should definitely come up with a plan for decoupling image names from GitHub repos. There's currently an empty class called OtherSourceHandler which was intended to complement GithubBuildHandler, but for images from arbitrary sources (whose names can't be uniquely constructed from an organization/name combo). Perhaps that change should go in another issue...
from binder.
Yeah, sorry I brain dumped it all here. This would work well on a wiki once we're specced out. I imagine I'll use the same API in a revision of tmpnb, which in my eyes would be a fairly restricted launcher that relies on the pure Docker API (including swarm support out of the box).
For binder images that get created, I'd think of those as existing in a particular namespace - then you'd assume they're trusted.
Thinking about what I wrote though, maybe spawn and pool both take a binderName since it's coupled with services, etc.
from binder.
This is awesome @rgbkrk ! I really like the API design and strategy. A couple comments (after chatting with @andrewosh ):
Great to split the GET and POST for the spawn; we wanted to decouple those anyway to make it easier to support a loading screen / query for finished deployments.
Not totally clear on the best way to "spawn" an image that's already been "spawned" as part of a pool. Does it make sense that a POST to spawn either (1) returns a location immediately if the image was part of a pool, which we can figure out via a GET on the pool, or (2) triggers an actual "spawn" and just returns an id, and then polling or websockets could be used to check whether it's been deployed and has a valid location?
Reasonable, or is there a better design?
Another question, re: naming, is whether we only support images linked to GitHub repos, in which case we can have a one-to-one mapping from repos <-> names. This will of course work for binders built from repos (by design), but for the "standard" images we want to support (e.g. for the lighter-weight kernels), do we need more flexibility? Or can we assume those are linked to repos too?
from binder.
and 👍 to putting this into a wiki once specced out, and then eventually into proper documentation =)
from binder.
(2) triggers an actual "spawn", and just returns an id, and then polling or websockets could be used to check whether its been deployed and has a valid location.
I was curious about that too. I'll go ahead and update the above to reflect that, as I think the POST should return immediately. For now we can stick with long polling on the GET.
from binder.
Not totally clear the best way to "spawn" an image that's already been "spawned" as part of a pool.
Continuum once told me that I was cheating, since they're pre-spawned. I say it's an optimization we needed.
As you've said, they are already spawned when in the pool. Perhaps they're hatching at this point? Being allocated? Other nomenclature:
- /hatchling/
- /container/
These are the states we're wondering about:
- Allocated, unused containers
- Allocated, being used containers
- Containers being culled
- Containers being allocated

Perhaps this is a relationship between a container and a userContainer. That, or a user has a container resource.
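Those states might be modeled explicitly; here's a sketch with names of my own choosing, including the rule that only a pooled (allocated, unused) container can be claimed by a user:

```python
from enum import Enum

class ContainerState(Enum):
    ALLOCATING = "allocating"  # being spun up for the pool
    POOLED = "pooled"          # allocated, unused
    IN_USE = "in_use"          # allocated, claimed by a user
    CULLING = "culling"        # being torn down

def claim(state):
    """A pooled container is the only kind a user can claim."""
    if state is not ContainerState.POOLED:
        raise ValueError("cannot claim a container in state %s" % state.value)
    return ContainerState.IN_USE
```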
from binder.
What do you think about calling the resource we bundle/build an environment? To a user we might call them binders, but in other contexts they're a built-out resource for on-demand computation coupled with:
- specification of software to install
- notebooks, data, code to pull in (from a repo or otherwise - ok to stick with GH for the moment)
- linked services
- volumes, etc.
- the application to serve/run
The last one is important since there are many uses for kernels beyond notebooks. As @parente put it on a hackpad we were mocking up for a Kernel Service API:
... launch this random.choice(['kernel', 'notebook', 'dashboard-app']) ...
As long as we make an API spec we're happy with, we'll probably end up with at least these three cluster managers:
- Kubernetes - Binder
- Marathon / Mesos - mentioned as done by IBM
- Docker / Swarm - tmpnb, dockerspawner (jupyterhub), ephemit
from binder.
Once you have the kernels decoupled as the only thing that is provisioned, you can serve notebooks from a separate content store. That content store could even be GitHub itself. 👍 Doing it with gists is pretty simple as well. This also opens up the use of Google Drive, firebase, etc. as content stores that also allow for realtime collaboration.
However, we still want to be able to access datasets, etc. in typical POSIX ways (CSVs, etc.) straight from repositories. Hence we either need to mount volumes, build the data into the container, or expose services for data.
I do have a prototype that cheats and uses the notebook contents API to inject data from a repo. That code is not being used, but I've tried it out locally for fun. Example that renders a notebook from a gist, populates thebe cells, and connects you to a tmpnb kernel.
from binder.
Re: naming -- which is the most important part ;) -- we're way into aiming for a more general spec, to be used here and elsewhere, and here's our current favorite (with pithy summaries from above):
- build - POST defines and creates a build, GET returns info on the build
- container - POST triggers a launch (if unavailable) or returns a location (if available), GET returns a location (once available)
- pool - POST or PUT sets up capacities, GET inspects allocations

build was formerly binder; the new name is more general. container was formerly spawn, but the new name is more allowing of the fact that these containers may have already been deployed (in a pool).
from binder.
build makes sense, though the attached services make it more like a deployment specification/template.
from binder.