swissdatasciencecenter / renku-notebooks
An API service to provide Jupyter notebooks for the Renku platform.
Home Page: https://renkulab.io
License: Apache License 2.0
The notebook service should provide an endpoint that determines which image should be used for a particular commit, assuming that we only build images when the files that define the environment change. For example, we could use the GitLab CI functionality that limits job triggers to changes in specific paths:
image_build:
  stage: build
  image: docker:stable
  before_script:
    - docker login -u gitlab-ci-token -p $CI_JOB_TOKEN $CI_REGISTRY
  script:
    - CI_COMMIT_SHA_7=$(echo $CI_COMMIT_SHA | cut -c1-7)
    - docker pull renku/singleuser:latest
    - docker build --tag $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA_7 .
    - docker push $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA_7
  only:
    changes:
      - Dockerfile
      - requirements.txt
      - envs/*
  tags:
    - image-build
related to #98
Still failing on usernames with non-alphanumeric characters:
{"reason":"FieldValueInvalid","message":"Invalid value: \"renku-jupyter-rok-2erosk-rok_2erosk-proj-85a5229\": a DNS-1123 label must consist of lower case alphanumeric characters or '-', and must start and end with an alphanumeric character (e.g. 'my-name', or '123-abc', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?')"}
The username was rok.roskar. This issue is continued from SwissDataScienceCenter/renku#252.
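A minimal sketch of one possible escaping scheme (hypothetical, not the service's actual one): lowercase the username, replace anything outside the DNS-1123 alphabet with '-', trim dashes, and append a short hash so distinct usernames cannot collide after escaping.

```python
import hashlib
import re


def safe_username(username: str) -> str:
    """Map an arbitrary username to a valid DNS-1123 label (sketch)."""
    # replace every character outside [a-z0-9-] and trim leading/trailing '-'
    safe = re.sub(r"[^a-z0-9-]", "-", username.lower()).strip("-") or "user"
    # short hash of the original name so "rok.roskar" and "rok-roskar" differ
    digest = hashlib.md5(username.encode()).hexdigest()[:5]
    return f"{safe}-{digest}"
```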
If the notebook server takes a long time to spawn (e.g. because images have to be downloaded), JH will wait for 30 seconds and then redirect to a URL like .../hub/user/...
which fails with a 401 because it's not actually a valid URL (it still gets proxied to the notebooks service for some reason, which then queries gitlab for a non-existent project, hence the 401).
The way to fix this is to wait until the server is ready before redirecting. It's unclear how to get this information.
moved from SwissDataScienceCenter/renku#210 -- see SwissDataScienceCenter/renku#210 (comment)
Sometimes notebook instances stop (e.g. node enters an out of memory condition) and current work is permanently lost.
Is there a way to provide some form of backup for these cases?
Changes in here needed to resolve SwissDataScienceCenter/renku#299
https://github.com/SwissDataScienceCenter/renku-notebooks/blob/master/jupyterhub/spawners.py#L111
returns a 404 in access_level = gl_project.members.get(self.gl_user.id).access_level: the project members are only the manually added ones, not the ones belonging to a group (even though access was granted to the group).
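A sketch of a possible fix against python-gitlab, using the same names as the snippet above: fall back to the /members/all endpoint (exposed as members_all), which also lists members inherited from parent groups.

```python
def get_access_level(gl_project, user_id):
    """Return the user's access level, including group-inherited membership.

    Sketch: `project.members` lists only directly added members, while
    `project.members_all` also includes members inherited from groups.
    """
    try:
        # direct project members only
        return gl_project.members.get(user_id).access_level
    except Exception:  # python-gitlab raises GitlabGetError (404) here
        # includes members who got access through a parent group
        return gl_project.members_all.get(user_id).access_level
```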
Jupyterhub should be a dependency of renku-notebooks.
We should allow the notebook service to consume JWT tokens for authentication - this is a pre-requisite for providing dedicated resources to certain users and/or groups.
Since jupyterlab/jupyterlab-git#210 has been merged, should we add the extension by default?
On Renkulab: if an operation in a notebook runs out of memory, this is not reported by JupyterLab. The kernel will restart, but the cell will continue to appear to be pending. There is no error message in the notebook itself.
The <service-prefix> endpoint right now just returns a JSON of the user object. It should give the authenticated user an overview of the running notebook servers and the means to stop them.
One main issue to resolve is how to actually get the information about running servers per user. Right now this seems to be kept in the user object obtained through the HubOAuthenticator, but this is cached via some mechanism that is (as of yet) obscure to me. Since the user info is cached, we don't get up-to-date info unless we reduce the caching timeout. There should be some way to query the hub directly, however.
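For reference, a sketch of querying the hub's REST API directly instead of relying on the cached user object (the hub URL and token handling here are illustrative, not the service's actual configuration):

```python
import json
import urllib.request

HUB_API = "http://jupyterhub:8081/hub/api"  # hypothetical in-cluster URL
API_TOKEN = "..."  # a JupyterHub API token authorized to read /users


def get_user_servers(username):
    """Ask the hub directly for the user's servers (bypasses the cache)."""
    req = urllib.request.Request(
        f"{HUB_API}/users/{username}",
        headers={"Authorization": f"token {API_TOKEN}"},
    )
    with urllib.request.urlopen(req) as resp:
        user = json.load(resp)
    # the user model maps server names to their current state
    return user.get("servers", {})
```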
related to SwissDataScienceCenter/renku-ui#151
Add options to include a GPU request in the pod manifest.
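A minimal sketch of how such an option could map onto the pod manifest, assuming a hypothetical gpu_request field in the server options (in Kubernetes, GPUs cannot be overcommitted, so they may only appear under limits):

```python
def apply_gpu_option(pod_resources, options):
    """Merge a hypothetical `gpu_request` option into a pod resources dict.

    `pod_resources` is the `resources` mapping of the container spec;
    GPUs must be declared as limits, not requests.
    """
    gpus = int(options.get("gpu_request", 0))
    if gpus > 0:
        pod_resources.setdefault("limits", {})["nvidia.com/gpu"] = str(gpus)
    return pod_resources
```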
The server information we receive from the REST API does not include server state. If we stop a server, it disappears from the list of the user's servers -- however, it may still be in the process of shutting down. If the user then tries to start it again before it shuts down completely (in k8s, before the pod is completely gone), JH returns a 400 (pod is terminating). We need to somehow get more up-to-date information about server state to mitigate this.
This is needed to better respond to problems during server spawn, e.g. when an image is not able to be pulled.
See also: SwissDataScienceCenter/renku-ui#273
After #56, if the image is still building, renku-notebooks launches the default image instead of waiting for imageBuildTimeout.
Need some actual tests
Pod disk storage is turning out to be a problem (not unexpectedly) -- we should allow pods to make a resource request for ephemeral storage.
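A sketch of what that could look like, assuming a hypothetical storage_request server option holding a Kubernetes quantity string:

```python
def apply_storage_option(pod_resources, options):
    """Merge a hypothetical `storage_request` option (e.g. "2Gi") into the
    pod's resource requests under the `ephemeral-storage` resource name."""
    storage = options.get("storage_request")
    if storage:
        pod_resources.setdefault("requests", {})["ephemeral-storage"] = storage
    return pod_resources
```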
...
"resources": {
    "cpu_request": {
        "default": 0.1,
        "displayName": "Number of CPUs",
        "enum": "float",
        "options": [0.1, 0.5, 1, 2, 4, 8]
    },
...
Note the enum instead of type.
renku-notebooks/singleuser/Dockerfile
Line 51 in e3a5b98
From magic recipe:
conda create -y -n renku python=3.6
$(conda env list | grep renku | awk '{print $2}')/bin/pip install -e git+https://github.com/SwissDataScienceCenter/renku-python.git#egg=renku
mkdir -p ~/.renku/bin
ln -s "$(conda env list | grep renku | awk '{print $2}')/bin/renku" ~/.renku/bin/renku
echo "export PATH=~/.renku/bin:$PATH" >> $HOME/.bashrc
source $HOME/.bashrc
renku --version
which renku
Avoid infinite loop if build job is stuck.
See:
renku-notebooks/jupyterhub/spawners.py
Lines 137 to 168 in d7ac292
At the moment, we simply issue an API request to jupyterhub for a server spawn and wait until jupyterhub reports that this server is running. This is definitely not the way we want to handle server launches.
We should improve the sequence to provide some extra feedback to the user about what is happening. Most importantly, the notebooks service flask app should not block waiting for the server to spawn. We should discuss how to move this forward -- some options:
- if the server for <namespace>/<project>/<sha> is not running, the endpoint returns a page that shows the status of the pod/server being launched and a 202 return code
- if the server for <namespace>/<project>/<sha> is running, the server/notebook is returned as normal
- DELETE on the JH API for the server so that it is immediately removed from the proxy etc.
- a <namespace>/<project>/<sha>/status endpoint? Or just <namespace>/<project>/<sha>?status?
- POST launches the notebook, GET gives the status of the launch
- if application/json is requested, return the JSON for the server status
Once this is done, it will most likely fix #23.
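A rough sketch of the non-blocking launch flow discussed above (Flask; the routes, status values, and in-memory server state are purely illustrative):

```python
from flask import Flask, jsonify

app = Flask(__name__)

# stand-in for real JupyterHub/k8s state:
# (namespace, project, sha) -> "spawning" | "running"
SERVERS = {}


@app.route("/<namespace>/<project>/<sha>", methods=["POST"])
def launch(namespace, project, sha):
    """Trigger the spawn and return immediately with 202 -- never block."""
    key = (namespace, project, sha)
    SERVERS.setdefault(key, "spawning")
    return jsonify(status=SERVERS[key]), 202


@app.route("/<namespace>/<project>/<sha>", methods=["GET"])
def status(namespace, project, sha):
    """200 once running, 202 while spawning, 404 if unknown."""
    state = SERVERS.get((namespace, project, sha))
    if state == "running":
        return jsonify(status="running"), 200
    if state == "spawning":
        return jsonify(status="spawning"), 202
    return jsonify(status="not found"), 404
```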
The notebooks service should publish its swagger file.
From @erbou on September 28, 2018 10:22
Is your feature request related to a problem? Please describe.
JupyterLab, while starting, will not notify me if something went wrong, and can make me wait forever.
Describe the solution you'd like
Give the option to get more details (can be summarized, no need for a full log) about what it is doing, and what's left to do, with a brief status (Ok, Fail) of operation, such as:
* [Ok] starting docker container
* [Fail] cloning repo
* [ ] importing data <- for git submodule update / git lfs pull
Describe alternatives you've considered
At a minimum we should report an error, so that we know when there's no point in waiting any longer.
Copied from original issue: SwissDataScienceCenter/renku-ui#325
The defaultUrl value is not propagated to the container.
Currently, the notebook service sets a max-age value of 12 hours on the cache-control header of the server_options endpoint. I assume that this was not set on purpose, but that it's just a side effect of serving the JSON as a static file. I suggest dramatically reducing this value or removing the header altogether.
URLs returned by the notebook service API should be absolute, i.e. include the host name:
url: "/jupyterhub/user/cramakri/cramakri-weather-2d-9789210/"
should become
url: "https://renkulab.io/jupyterhub/user/cramakri/cramakri-weather-2d-9789210/"
such that the UI can open that URL directly when opening a notebook server tab for the user.
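A sketch of the fix, assuming the external host name is available from the service's deployment configuration (the hard-coded default below is only illustrative):

```python
from urllib.parse import urljoin


def absolute_url(path, base="https://renkulab.io"):
    """Join a service-relative URL with the deployment's external host.

    In the real service, `base` would come from configuration rather than
    being hard-coded here.
    """
    return urljoin(base, path)
```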
The @authenticated
decorator should accept JupyterHub-issued tokens.
We should provide a way to check with the registry whether the image exists; if it doesn't, give options to either build it or to use the default image.
Set Git config lfs.storage
to a separate volume that can be shared among notebook servers spawned for the same project.
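A sketch of setting this per repository (the shared mount path is hypothetical):

```python
import subprocess


def set_lfs_storage(repo_path, project_slug):
    """Point lfs.storage at a per-project shared volume so LFS objects are
    fetched once and shared among the project's notebook servers.

    The /lfs-cache mount path is illustrative.
    """
    shared = f"/lfs-cache/{project_slug}"
    subprocess.run(
        ["git", "config", "--local", "lfs.storage", shared],
        cwd=repo_path, check=True,
    )
    return shared
```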
Pod \"jupyter-johann-2Et-johann-2et-newnew-a342109\" is invalid: [metadata.name: Invalid value: \"jupyter-johann-2Et-johann-2et-newnew-a342109\": a DNS-1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character
?branch=<name> should create a server with a different name that includes the branch name.
Clicking on Connect while the server is not ready results in a 500 page on jupyterhub.
It appears that JH doesn't offer an API for removing spawning servers. We should somehow be able to remove them, however, if e.g. someone tries to spawn a server and a requested number of GPUs are not available.
The mounted volume (emptyDir) does not respect case at the moment. It results in a lab instance which cannot be used.
Edit: this happens when converting existing projects to Renku (and not when creating them through the UI).
Use gunicorn or similar in the Docker image.
When a user requests a new server, check how many are already running; if that number is equal to MAX_USER_SERVERS, shut down the oldest one before starting the new one.
Note (@leafty): shut down the oldest from the same (project, user) pair. The idea is that the user is iterating on his/her Docker image.
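A sketch of that eviction rule, with hypothetical server records and stand-ins for the real JupyterHub stop/start calls:

```python
MAX_USER_SERVERS = 3  # illustrative limit


def launch_with_eviction(servers, new_server, stop, start):
    """Start `new_server`, first evicting the oldest server of the same
    (project, user) pair if the user is at the server limit.

    `servers` is a list of dicts with 'project', 'user' and 'started'
    keys; `stop` and `start` stand in for the real JH API calls.
    """
    if len(servers) >= MAX_USER_SERVERS:
        same = [
            s for s in servers
            if (s["project"], s["user"]) == (new_server["project"], new_server["user"])
        ]
        if same:
            oldest = min(same, key=lambda s: s["started"])
            stop(oldest)
            servers.remove(oldest)
    start(new_server)
    servers.append(new_server)
    return servers
```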
Once the pod is running, the page loads fine. This happened while waiting for the pod to start.
log:
10.36.0.8 - - [10/Jul/2018 06:56:58] "GET /jupyterhub/services/notebooks/demo/test/74871936eed7d586d8034a3ecadd444f369492df?branch=master HTTP/1.1" 500 -
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 2309, in __call__
return self.wsgi_app(environ, start_response)
File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 2295, in wsgi_app
response = self.handle_exception(e)
File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1741, in handle_exception
reraise(exc_type, exc_value, tb)
File "/usr/local/lib/python3.6/site-packages/flask/_compat.py", line 35, in reraise
raise value
File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 2292, in wsgi_app
response = self.full_dispatch_request()
File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1815, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1718, in handle_user_exception
reraise(exc_type, exc_value, tb)
File "/usr/local/lib/python3.6/site-packages/flask/_compat.py", line 35, in reraise
raise value
File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1813, in full_dispatch_request
rv = self.dispatch_request()
File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1799, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "/app/src/notebooks_service.py", line 77, in decorated
return f(user, *args, **kwargs)
File "/app/src/notebooks_service.py", line 218, in launch_notebook
headers=headers
File "/usr/local/lib/python3.6/json/__init__.py", line 354, in loads
return _default_decoder.decode(s)
File "/usr/local/lib/python3.6/json/decoder.py", line 339, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/local/lib/python3.6/json/decoder.py", line 357, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
The spawner should only execute the instruction to use a certain image and not do anything about the image build itself.
The launch_notebook function is synchronous -- this will eventually lead to a terrible user experience. We should be using asyncio.
The server_options need validation in several places:
- the values.yaml that the admin passes on deployment
- when the serverOptions in the request body are processed
It's not obvious what the validation should consist of, however. Some candidates:
- resources.cpu_request, resources.mem_request etc.
- that serverOptions passed in through the request conform to the types specified in the values.yaml.
This validation was left as a todo from PR #68.
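A sketch of what the type validation could look like, assuming each declared option carries a type name and an optional list of allowed options (shaped roughly like the server_options JSON shown earlier on this page):

```python
# map declared type names to the Python types an incoming value may have
TYPES = {"float": (int, float), "int": int, "string": str, "boolean": bool}


def validate_server_options(requested, declared):
    """Return a list of validation errors; empty means all options pass.

    `requested` is the serverOptions mapping from the request body;
    `declared` is the options mapping from values.yaml.
    """
    errors = []
    for name, value in requested.items():
        spec = declared.get(name)
        if spec is None:
            errors.append(f"unknown option: {name}")
            continue
        expected = TYPES.get(spec.get("type", "string"))
        if expected and not isinstance(value, expected):
            errors.append(
                f"{name}: expected {spec['type']}, got {type(value).__name__}"
            )
        allowed = spec.get("options")
        if allowed and value not in allowed:
            errors.append(f"{name}: {value!r} not in allowed options {allowed}")
    return errors
```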
It would be cool to have vim pre-installed on the default image.
If a server is already running, this is not handled properly.
From @rokroskar on September 21, 2018 6:8
We need a Dockerfile that uses the nbrsessionproxy Dockerfile as a base, adding on renku-related pieces. See https://github.com/jupyterhub/nbrsessionproxy/blob/master/Dockerfile for a start.
Copied from original issue: SwissDataScienceCenter/renku#425
- a server_options endpoint
- read server_options from the POST body and relay it to the spawner
This should be set in the repository on checkout:
git lfs install --skip-smudge --local
Otherwise, every git checkout
command will try to pull LFS objects, leading to general unhappiness.
Need to configure SSH deploy keys for pushing to the chart repo and Docker Hub.
We should have GIT_LFS_SKIP_SMUDGE=1
set by default and the possibility to override it via server_options
.