
renku-notebooks's Introduction

Renku notebooks


A simple service using the Amalthea operator to provide interactive Jupyter notebooks for the Renku platform.

The service relies on renku-gateway for authentication. However, anonymous users are supported as well, in which case anyone can start and use sessions for public Renku projects. The notebook service can therefore run even without renku-gateway installed or present; in that case only sessions for anonymous users can be launched.

Endpoints

The service defines endpoints to list a user's active sessions and to start or stop a session. It can also provide the logs of a running user session, as well as information about work that was saved automatically when a user stopped a session without committing and pushing all of their work to their project repository.

The API endpoints are documented in the Swagger page of any Renku deployment, usually available at https://<domain-name>/swagger/?urls.primaryName=notebooks%20service.

You can also explore the endpoints in more detail in the Swagger page of the renkulab.io deployment.
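
Below is a minimal sketch of how a client could call these endpoints with the Python requests library. The API prefix, the payload fields, and the bearer-token handling are assumptions for illustration only; in a real deployment authentication is handled by renku-gateway, and the authoritative request/response formats are the ones in the Swagger page.

import requests

# Hypothetical base URL and token; both depend on the deployment.
BASE = "https://<domain-name>/api/notebooks"
HEADERS = {"Authorization": "Bearer <access-token>"}

# List the user's active sessions.
servers = requests.get(f"{BASE}/servers", headers=HEADERS).json()

# Start a new session for a project at a given commit.
payload = {
    "namespace": "<gitlab-namespace>",
    "project": "<project-name>",
    "commit_sha": "<commit-sha>",
}
response = requests.post(f"{BASE}/servers", json=payload, headers=HEADERS)
print(response.status_code, response.json())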

Sequence diagram

Please note that the notebook service does not execute kubectl commands from a shell against the Kubernetes cluster; it uses the Kubernetes client for Python to perform the equivalent queries directly. The simple kubectl commands in the diagram below are shown only for clarity.
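
For example, listing the JupyterServer custom resources that back the sessions looks roughly like the sketch below. The CRD group and version ("amalthea.dev"/"v1alpha1") and the namespace are assumptions that depend on how Amalthea is installed.

from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside the cluster

custom_api = client.CustomObjectsApi()
servers = custom_api.list_namespaced_custom_object(
    group="amalthea.dev",      # assumption: Amalthea's CRD group
    version="v1alpha1",        # assumption
    namespace="renku",         # assumption: deployment namespace
    plural="jupyterservers",
)
for item in servers["items"]:
    print(item["metadata"]["name"])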

Please refer to the swagger page on the renkulab.io deployment for additional information on the format of the requests and responses from the API.

  sequenceDiagram
    participant User
    participant Notebooks
    participant k8s
    participant Gitlab
    participant Image Repo
    User->>+Notebooks: GET /servers/<server_name><br>GET /servers
    Notebooks->>k8s: `kubectl get jupyterservers`
    k8s->>Notebooks: <br>
    Notebooks->>-User: List of servers
    User->>+Notebooks: POST /servers<br>{project, commit_sha, image}
    Notebooks->>+Gitlab: Check that the project, commit sha exist
    Gitlab->>Notebooks: <br>
    Notebooks->>+Image Repo: Check that the image exists
    Image Repo->>Notebooks: <br>
    Notebooks->>+k8s: `kubectl create jupyterserver`
    k8s->>Notebooks: <br>
    Notebooks->>-User: Server information
    User->>+Notebooks: DELETE /servers/<server_name>
    Notebooks->>k8s: `kubectl delete jupyterserver`
    k8s->>Notebooks: <br>
    Notebooks->>-User: Delete confirmation
    User->>+Notebooks: GET /servers/server_options
    Notebooks->>-User: List of allowable server options
    User->>+Notebooks: GET /logs/<server_name>
    Notebooks->>k8s: `kubectl logs`
    k8s->>Notebooks: <br>
    Notebooks->>-User: Logs
    User->>+Notebooks: GET /images?image_url=<image_url>
    Notebooks->>+Image Repo: Check that the image exists
    Image Repo->>Notebooks: <br>
    Notebooks->>-User: Image exists

Usage

The best way to use renku-notebooks is as part of a Renku platform deployment. As described above, using renku-notebooks without the other Renku platform components only allows anonymous sessions for public Renku projects. This is limiting because anonymous sessions do not let users save their work; they are intended for quickly testing something out or exploring what Renku has to offer.

When used as part of Renku, the notebook service receives all required user credentials from renku-gateway, another service in the Renku platform. These credentials include information about the user and their Git credentials. The notebook service then uses the Git credentials to clone the user's repository, pull images from the registry if needed, and set up a proxy that handles and authenticates all Git commands issued by the user in the session, without asking the user to log in to GitLab every time they launch a session.
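
As an illustration of the kind of Git credential the service receives, the sketch below clones a repository over HTTPS with a GitLab OAuth token. This is not the service's actual code, just the standard oauth2:<token> URL form that such a credential enables.

import subprocess

def clone_with_token(repo_url: str, token: str, dest: str) -> None:
    # e.g. https://oauth2:<token>@gitlab.example.com/group/project.git
    authed_url = repo_url.replace("https://", f"https://oauth2:{token}@", 1)
    subprocess.run(["git", "clone", authed_url, dest], check=True)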

Building images and charts

To build the images and render the chart locally, use chartpress. Install it with pip or use poetry install.

Development flow

You can run the notebook service locally in a few easy steps:

  • install poetry
  • run poetry install
  • create a copy of example.config.hocon in the root of the repository called .config.hocon and fill in the required values
  • if using VS code simply use the Flask configuration from .vscode/launch.json
  • if not using VS code execute FLASK_APP=renku_notebooks/wsgi.py FLASK_ENV=development CONFIG_FILE=.config.hocon poetry run flask run --no-debugger -h localhost -p 8000

In addition to the above steps, if you have a running Renku deployment you can use telepresence (https://www.telepresence.io/docs/latest/install/) to route traffic from the deployment to your development environment. After you have set up telepresence you can simply run the run-telepresence.sh script. This script will try to find a Renku Helm deployment in your current K8s context and active namespace, and will then redirect all traffic for the notebooks service from the deployment to your local machine at port 8000. Combining telepresence with the steps above lets you quickly test the notebook service in a full Renku deployment.

renku-notebooks's People

Contributors

ableuler, ciyer, cramakri, dependabot-preview[bot], dependabot[bot], github-actions[bot], jachro, jirikuncar, leafty, lorenzo-cavazzi, m-alisafaee, olevski, pameladelgado, panaetius, renkubot, rokroskar, snyk-bot


renku-notebooks's Issues

Use symlink, not alias for renku in singleuser

echo "alias renku=$CONDA_DIR/envs/renku/bin/renku" >> /home/$NB_USER/.bashrc

From magic recipe:

conda create -y -n renku python=3.6
$(conda env list | grep renku | awk '{print $2}')/bin/pip install -e git+https://github.com/SwissDataScienceCenter/renku-python.git#egg=renku
mkdir -p ~/.renku/bin
ln -s "$(conda env list | grep renku | awk '{print $2}')/bin/renku" ~/.renku/bin/renku
echo "export PATH=~/.renku/bin:$PATH" >> $HOME/.bashrc
source $HOME/.bashrc
renku --version
which renku

Create a notebooks landing page

The <service-prefix> endpoint right now just gives a JSON of the user object. It should give the authenticated user an overview of the running notebook servers and the means to stop them.

One main issue to resolve is how to actually get the information about running servers per user. Right now this seems to be kept in the user object obtained through the HubOAuthenticator, but that object is cached via a mechanism that is (as yet) unclear to me. Since the user info is cached, we don't get up-to-date information unless we reduce the caching timeout. There should be some way to query the hub directly, however.

cc/ @ableuler @ciyer

related to SwissDataScienceCenter/renku-ui#151

disable smudge with repo option

This should be set in the repository on checkout:

git lfs install --skip-smudge --local

Otherwise, every git checkout command will try to pull LFS objects leading to general unhappiness.

Return absolute instead of relative URLs

URLs returned by the notebook service API should be absolute, i.e. include the host name.

...
url: "/jupyterhub/user/cramakri/cramakri-weather-2d-9789210/"

should become

url: "https://renkulab.io/jupyterhub/user/cramakri/cramakri-weather-2d-9789210/"

such that the UI can open that URL directly when opening a notebook server tab for the user.
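
A possible fix, sketched with Flask (assuming the service sits behind a proxy that forwards the external host correctly), is to build the absolute URL from the incoming request instead of returning a bare path:

from flask import request

def absolute_server_url(path: str) -> str:
    # request.host_url is e.g. "https://renkulab.io/"
    return request.host_url.rstrip("/") + path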

Don't make us wait forever when starting a Docker container / JupyterLab session fails.

From @erbou on September 28, 2018 10:22

Is your feature request related to a problem? Please describe.
JupyterLab will not notify me if something went wrong while starting, and can make me wait forever.

Describe the solution you'd like
Give the option to get more details (can be summarized, no need for a full log) about what it is doing and what's left to do, with a brief status (Ok, Fail) for each operation, such as:

* [Ok]   starting docker container
* [Fail]  cloning repo
* [    ]   importing data   <- for git submodule update / git lfs pull

Describe alternatives you've considered
At a minimum we should report an error, so that we know when there's no point in continuing to wait.

Copied from original issue: SwissDataScienceCenter/renku-ui#325

Fix float server options

...
 "resources": {
        "cpu_request": {
            "default": 0.1,
            "displayName": "Number of CPUs",
            "enum": "float",
            "options": [
                0.1,
                0.5,
                1,
                2,
                4,
                8
            ]
        },
...

Note the `enum` key used where `type` is expected.

Create a better server launch flow

At the moment, we simply issue an API request to jupyterhub for a server spawn and wait until jupyterhub reports that this server is running. This is definitely not the way we want to handle server launches.

We should improve the sequence to provide some extra feedback to the user about what is happening. Most importantly, the notebooks service flask app should not block waiting for the server to spawn. We should discuss how to move this forward -- some options:

  • if the server for <namespace>/<project>/<sha> is not running, the endpoint should return a page that shows the status of the pod/server being launched and a 202 return code (a rough sketch follows at the end of this issue)
  • if the server for <namespace>/<project>/<sha> is running, the server/notebook is returned as normal
  • the page with the status should redirect to the running server once it's up
  • we should use k8s/docker clients for checking on the server spawn so we can report errors as they arise. If the pod launch errors, call DELETE on the JH API for the server so that it is immediately removed from the proxy etc.
  • do we need a <namespace>/<project>/<sha>/status endpoint? Or just <namespace>/<project>/<sha>?status? No, use CRUD and make POST launch the notebook, GET gives the status of the launch

Once this is done, it will most likely fix #23
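
A rough sketch of the non-blocking flow proposed above. The route shape and the get_pod_phase / server_info helpers are hypothetical; the point is to return 202 with a status while the pod is still coming up instead of blocking the request.

from flask import Flask, jsonify

app = Flask(__name__)

def get_pod_phase(server_name):
    """Hypothetical helper: query k8s for the pod phase of a server."""
    ...

def server_info(server_name):
    """Hypothetical helper: full description of a running server."""
    ...

@app.route("/servers/<server_name>", methods=["GET"])
def server_status(server_name):
    phase = get_pod_phase(server_name)
    if phase == "Running":
        return jsonify(server_info(server_name)), 200
    if phase in {"Failed", "Unknown"}:
        return jsonify({"error": f"pod is in phase {phase}"}), 500
    # Pending, ContainerCreating, etc.: tell the client to keep polling.
    return jsonify({"status": phase}), 202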

Session backup

Sometimes notebook instances stop (e.g. the node enters an out-of-memory condition) and current work is permanently lost.

Is there a way to provide some form of backup for these cases?

use dns-safe server names

Still failing on usernames with non-alphanumeric characters:

{"reason":"FieldValueInvalid","message":"Invalid value: \"renku-jupyter-rok-2erosk-rok_2erosk-proj-85a5229\": a DNS-1123 label must consist of lower case alphanumeric characters or '-', and must start and end with an alphanumeric character (e.g. 'my-name',  or '123-abc', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?')"

The username was rok.roskar. This issue is continued from SwissDataScienceCenter/renku#252.
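
A possible sanitizer, sketched under the DNS-1123 label rules quoted above (lower-case alphanumerics and '-', must start and end with an alphanumeric, at most 63 characters); the short-hash suffix for uniqueness is an assumption:

import hashlib
import re

def dns_safe(name: str, max_len: int = 63) -> str:
    # Replace every disallowed character and trim leading/trailing dashes.
    safe = re.sub(r"[^a-z0-9-]", "-", name.lower()).strip("-")
    # Append a short hash of the original name so that e.g. "rok.roskar"
    # and "rok-roskar" do not collide after sanitization.
    suffix = hashlib.md5(name.encode()).hexdigest()[:5]
    return f"{safe[:max_len - 6]}-{suffix}".strip("-")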

how to remove pending servers

It appears that JH doesn't offer an API for removing spawning servers. We should somehow be able to remove them, however, if e.g. someone tries to spawn a server and the requested number of GPUs is not available.

400 when trying to access a server that is being terminated

The server information we receive from the REST API does not include server state. If we stop a server, it disappears from the list of the user's servers -- however, it may still be in the process of shutting down. If the user then tries to start it again before it shuts down completely (in k8s, before the pod is completely gone), JH returns a 400 (pod is terminating). We need to somehow get more up-to-date information about server state to mitigate this.

should wait for server to be ready before redirecting

If the notebook server takes a long time to spawn (e.g. because images have to be downloaded), JH will wait for 30 seconds and then redirect to a URL like .../hub/user/... which fails with a 401 because it's not actually a valid URL (it still gets proxied to the notebooks service for some reason, which queries GitLab for a non-existent project, hence the 401).

The way to fix this is to wait until the server is ready before redirecting. It's unclear how to get this information.

moved from SwissDataScienceCenter/renku#210 -- see SwissDataScienceCenter/renku#210 (comment)

http 500 while pod is starting

Once the pod is running, the page loads fine. This happened while waiting for the pod to start.

log:

10.36.0.8 - - [10/Jul/2018 06:56:58] "GET /jupyterhub/services/notebooks/demo/test/74871936eed7d586d8034a3ecadd444f369492df?branch=master HTTP/1.1" 500 -
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 2309, in __call__
    return self.wsgi_app(environ, start_response)
  File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 2295, in wsgi_app
    response = self.handle_exception(e)
  File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1741, in handle_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/local/lib/python3.6/site-packages/flask/_compat.py", line 35, in reraise
    raise value
  File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 2292, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1815, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1718, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/local/lib/python3.6/site-packages/flask/_compat.py", line 35, in reraise
    raise value
  File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1813, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1799, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/app/src/notebooks_service.py", line 77, in decorated
    return f(user, *args, **kwargs)
  File "/app/src/notebooks_service.py", line 218, in launch_notebook
    headers=headers
  File "/usr/local/lib/python3.6/json/__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "/usr/local/lib/python3.6/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/local/lib/python3.6/json/decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Introduce timeout for pending images.

Avoid infinite loop if build job is stuck.

See:

while True:
    if status == 'success':
        # the image was built
        # it *should* be there so lets use it
        self.image = '{image_registry}'\
            '/{namespace}'\
            '/{project}'\
            ':{commit_sha_7}'.format(
                image_registry=os.getenv('IMAGE_REGISTRY'),
                commit_sha_7=commit_sha_7,
                **options
            ).lower()
        self.log.info(
            'Using image {image}.'.format(image=self.image)
        )
        break
    elif status in {'failed', 'canceled'}:
        self.log.info(
            'Image build failed for project {0} commit {1} - '
            'using {2} instead'.format(
                project, commit_sha, self.image
            )
        )
        break
    yield gen.sleep(5)
    status = self._get_job_status(pipeline, 'image_build')
    self.log.debug(
        'status of image_build job for commit '
        '{commit_sha_7}: {status}'.format(
            commit_sha_7=commit_sha_7, status=status
        )
    )

was SwissDataScienceCenter/renku#318 reported by @leafty
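
A bounded variant of the polling loop, sketched as a standalone helper: poll a status callback until the build settles or a deadline passes. The 30-minute budget and the callback signature are assumptions.

import time

def wait_for_image_build(get_status, timeout=30 * 60, poll_interval=5):
    """Return the final job status, or 'timeout' if the deadline passes."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status()
        if status in {'success', 'failed', 'canceled'}:
            return status
        time.sleep(poll_interval)
    return 'timeout'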

use asyncio in launch_notebook

The launch_notebook function is synchronous -- this will eventually lead to a terrible user experience. It should use asyncio.

Modify `cache-control` header for `server_options` endpoint

Currently, the notebook service sets a max-age value of 12 hours on the cache-control header for the server_options endpoint. I assume that this was not set on purpose, but that it's just a side-effect of serving the JSON as a static file. I suggest dramatically reducing this value or removing the header altogether.
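
One way to do this in Flask (a sketch, not the current implementation) is to build the response explicitly and set a short or disabled cache header:

from flask import jsonify

def server_options_response(options: dict):
    response = jsonify(options)
    response.headers["Cache-Control"] = "no-cache"  # or e.g. "max-age=60"
    return response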

validate server_options

The server_options need validation in several places:

  • in the values.yaml that the admin passes on deployment
  • in the service code where the request with serverOptions in the body is processed

It's not obvious what the validation should consist of, however.

  1. Enforce only a specific set of server options, e.g. resources.cpu_request, resources.mem_request etc.
  2. Make sure that the serverOptions that are passed in through the request conform to the types specified in the values.yaml.
  3. ?

This validation was left as a todo from pr #68.
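
A sketch of point 2 above: check each serverOptions value sent in the request against the type declared by the admin in values.yaml. The shape of the declared options dictionary is an assumption.

TYPE_MAP = {"float": (int, float), "int": int, "boolean": bool, "enum": str}

def validate_server_options(requested: dict, declared: dict) -> list:
    """Return a list of human-readable validation errors (empty if valid)."""
    errors = []
    for name, value in requested.items():
        if name not in declared:
            errors.append(f"unknown option: {name}")
            continue
        expected = TYPE_MAP.get(declared[name].get("type"), object)
        if not isinstance(value, expected):
            errors.append(f"{name} should be of type {declared[name]['type']}")
    return errors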

OOM issue not reported on jupyterlab

On Renkulab: if an operation in a notebook runs out of memory, this is not reported by JupyterLab. The kernel will restart, but the cell will continue to appear to be pending. There is no error message in the notebook itself.

accept a JWT from trusted source

We should allow the notebook service to consume JWT tokens for authentication - this is a prerequisite for providing dedicated resources to certain users and/or groups.
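
A minimal sketch of validating such a token with PyJWT; the public key source, audience, and algorithm are assumptions and would come from the deployment's identity provider:

import jwt  # PyJWT

def decode_trusted_token(token: str, public_key: str) -> dict:
    # Raises jwt.InvalidTokenError if the signature, audience or expiry
    # checks fail.
    return jwt.decode(
        token,
        public_key,
        algorithms=["RS256"],   # assumption
        audience="renku",       # assumption
    )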

provide appropriate image ref/tag to use for a commit

The notebook service should provide an endpoint that determines which image should be used for a particular commit - this assumes that we only build images on changes to the files that define the environment. For example, we could use the GitLab CI functionality that limits job triggers to changes on specific paths:

image_build:
  stage: build
  image: docker:stable
  before_script:
    - docker login -u gitlab-ci-token -p $CI_JOB_TOKEN http://$CI_REGISTRY
  script:
    - CI_COMMIT_SHA_7=$(echo $CI_COMMIT_SHA | cut -c1-7)
    - docker pull renku/singleuser:latest
    - docker build --tag $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA_7 .
    - docker push $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA_7
  only: 
    changes:
      - Dockerfile
      - requirements.txt 
      - envs/*
  tags:
    - image-build

related to #98
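
A sketch of the lookup logic such an endpoint could use: walk back through the commit history (newest first) and return the short SHA of the most recent commit that touched one of the files defining the environment. The commit list is represented abstractly here; in the service it would come from the GitLab API, and the exact set of environment files is an assumption.

ENV_FILES = {"Dockerfile", "requirements.txt"}  # plus anything under envs/

def image_tag_for_commit(commits):
    """commits: newest-first list of (sha, changed_files) tuples."""
    for sha, changed_files in commits:
        if ENV_FILES & set(changed_files) or any(
            path.startswith("envs/") for path in changed_files
        ):
            return sha[:7]
    return None  # no environment-defining change found in the history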

Project with upper case is not started properly

The mounted volume (emptyDir) does not respect case at the moment. This results in a lab instance that cannot be used.

Edit: This happens when converting existing projects to Renku (and not when creating them through the UI).

Still some uppercase issues

Pod \"jupyter-johann-2Et-johann-2et-newnew-a342109\" is invalid: [metadata.name: Invalid value: \"jupyter-johann-2Et-johann-2et-newnew-a342109\": a DNS-1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character

limit the number of simultaneous servers

When a user requests a new server, check how many are already running and if that number is equal to MAX_USER_SERVERS then shut down the oldest one before starting the new one.

Note (@leafty): shut down the oldest from the same (project, user) pair. The idea is that the user is iterating on his/her Docker image.
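
A sketch of that policy (the helpers and the limit are hypothetical): before starting a new server for a (project, user) pair, shut down the oldest ones until there is room for one more.

MAX_USER_SERVERS = 3  # assumption

def enforce_server_limit(user, project, list_servers, stop_server):
    """list_servers/stop_server are hypothetical helpers around the k8s API."""
    servers = sorted(list_servers(user, project), key=lambda s: s["started"])
    while len(servers) >= MAX_USER_SERVERS:
        stop_server(servers.pop(0)["name"])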
