
renku-python's Introduction

Renku Python Library, CLI and Service

A Python library for the Renku collaborative data science platform. It includes a CLI and SDK for end-users as well as a service backend. It provides functionality for the creation and management of projects and datasets, and simple utilities to capture data provenance while performing analysis tasks.

NOTE:

renku-python is the Python library and core service for Renku. It does not start the Renku platform itself; for that, refer to the Renku docs on running the platform.

Renku for Users

Installation

Renku releases and development versions are available from PyPI. You can install them using any tool that knows how to handle PyPI packages. Our recommendation is to use pipx.

Note

We do not officially support Windows at this moment. The way Windows handles paths and symlinks interferes with some Renku functionality. We recommend using the Windows Subsystem for Linux (WSL) to use Renku on Windows.

Prerequisites

Renku depends on Git under the hood, so make sure that you have Git installed on your system.

Renku also offers support to store large files in Git LFS, which is used by default and should be installed on your system. If you do not wish to use Git LFS, you can run Renku commands with the -S flag, as in renku -S <command>. More information on Git LFS usage in renku can be found in the Data in Renku section of the docs.

Renku uses CWL to execute recorded workflows when calling renku update or renku rerun. CWL depends on Node.js to execute the workflows, so installing Node.js is required if you want to use those features.

For development of the service, Docker is recommended.
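
As a quick sanity check before installing, the prerequisites above can be probed from Python. This is a minimal sketch, not part of Renku: the helper name missing_prerequisites is hypothetical, and the executable names (git, git-lfs, node) are assumptions that may differ on your system (some distributions ship Node.js as nodejs).

```python
import shutil
from typing import Callable, Iterable, List, Optional

# Assumed executable names; adjust them for your platform if needed.
PREREQUISITES = ("git", "git-lfs", "node")


def missing_prerequisites(
    names: Iterable[str] = PREREQUISITES,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the names that cannot be found on the current PATH."""
    return [name for name in names if which(name) is None]


if __name__ == "__main__":
    missing = missing_prerequisites()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All prerequisites found.")
```

The lookup function is injectable so the check can be exercised without touching the real PATH.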

pipx

First, install pipx and make sure that the $PATH is correctly configured.

$ python3 -m pip install --user pipx
$ python3 -m pipx ensurepath

Once pipx is installed, use the following commands to install Renku and check where it was installed.

$ pipx install renku
$ which renku
~/.local/bin/renku

pipx installs Renku into its own virtual environment, making sure that it does not pollute any other packages or versions that you may have already installed.

Note

If you install Renku as a dependency in a virtual environment and the environment is active, your shell will default to the version installed in the virtual environment, not the version installed by pipx.

To install a development release:

$ pipx install --pip-args=--pre renku

pip

$ pip install renku

The latest development versions are available on PyPI or from the Git repository:

$ pip install --pre renku
# - OR -
$ pip install -e git+https://github.com/SwissDataScienceCenter/renku-python.git#egg=renku

Use the following installation steps, based on your operating system and preferences, if you would like to work with the command line interface and do not need the Python library to be importable.

Windows

Note

We don't officially support Windows yet, but Renku works well in the Windows Subsystem for Linux (WSL). As such, the following can be regarded as a best effort description on how to get started with Renku on Windows.

Renku can be run using the Windows Subsystem for Linux (WSL). To install the WSL, please follow the official instructions.

We recommend you use the Ubuntu 20.04 image in the WSL when you get to that step of the installation.

Once WSL is installed, launch the WSL terminal and install the packages required by Renku with:

$ sudo apt-get update && sudo apt-get install git python3 python3-pip python3-venv pipx

Since Ubuntu has an older version of git LFS installed by default which is known to have some bugs when cloning repositories, we recommend you manually install the newest version by following these instructions.

Once all the requirements are installed, you can install Renku normally by running:

$ pipx install renku
$ pipx ensurepath

After this, Renku is ready to use. You can access your Windows drives through the mount points under /mnt/, and you can execute Windows executables (e.g. *.exe) as usual directly from the WSL (so renku run myexecutable.exe will work as expected).

Docker

The containerized version of the CLI can be launched using the following Docker command.

$ docker run -it -v "$PWD":"$PWD" -w="$PWD" renku/renku-python renku

It makes sure your current directory is mounted to the same place in the container.

CLI Example

Initialize a Renku project:

$ mkdir -p ~/temp/my-renku-project
$ cd ~/temp/my-renku-project
$ renku init

Create a dataset and add data to it:

$ renku dataset create my-dataset
$ renku dataset add my-dataset https://raw.githubusercontent.com/SwissDataScienceCenter/renku-python/master/README.rst

Run an analysis:

$ renku run --name my-workflow -- wc < data/my-dataset/README.rst > wc_readme

Trace the data provenance:

$ renku workflow visualize wc_readme

These are the basics, but there is much more that Renku allows you to do with your data analysis workflows. The full documentation will soon be available at: https://renku-python.readthedocs.io/

Renku as a Service

This repository includes a renku-core RPC service written as a Flask application that provides (almost) all of the functionality of the Renku CLI. This is used to provide one of the backends for the RenkuLab web UI. The service can be deployed in production as a Helm chart (see helm-chart).

Deploying locally

To test the service functionality, you can deploy it quickly and easily using docker-compose up (see docker-compose: https://pypi.org/project/docker-compose/). Make a copy of the renku/service/.env-example file and configure it to your needs. The setup exposes the service behind a traefik reverse proxy to mimic an actual production deployment. You can access the proxied endpoints at http://localhost/api. The service itself is exposed on port 8080, so its endpoints are available directly under http://localhost:8080.

API Documentation

The renku core service documents its API as an OpenAPI 3.0.x spec. You can retrieve the YAML of the specification itself with:

$ renku service apispec

If deploying the service locally with docker-compose you can find the swagger-UI under localhost/api/swagger. To send the proper authorization headers to the service endpoints, click the Authorize button and enter a valid JWT token and a gitlab token with read/write repository scopes. The JWT token can be obtained by logging in to a renku instance with renku login and retrieving it from your local renku configuration.

In a live deployment, the swagger documentation is available under https://<renku-endpoint>/swagger. You can authorize the API by first logging into renku normally, then going to the swagger page, clicking Authorize and picking the oidc (OAuth2, authorization_code) option. Leave the client_id as swagger and the client_secret empty, select all scopes and click Authorize. You should now be logged in and you can send requests using the Try it out buttons on individual requests.

Developing Renku

For testing the functionality from source it is convenient to install renku in editable mode using pipx. Clone the repository and then do:

$ pipx install \
    --editable \
    <path-to-renku-python>[all] \
    renku

This will install all the extras for testing and debugging.

If you already use pyenv to manage different python versions, you may be interested in installing pyenv-virtualenv to create an ad-hoc virtual environment for developing renku.

Once you have created and activated a virtual environment for renku-python, you can use the usual pip commands to install the required dependencies.

$ pip install -e .[all]  # use ".[all]" (quoted) for zsh

Service

Developing the service and testing its APIs can be done with docker compose (see "Deploying Locally" above).

If you have a full RenkuLab deployment at your disposal, you can use telepresence v1 to develop and debug locally. Just run the start-telepresence.sh script and follow the instructions. Mind that the script doesn't work with telepresence v2.

Running tests

We use pytest for running tests. You can use our run-tests.sh script to run a specific set of tests.

$ ./run-tests.sh -h

We lint the files using black and isort.

Using External Debuggers

Local Machine

To run renku via e.g. the Visual Studio Code debugger, you need to run it via the Python executable in whatever virtual environment was used to install renku. If there is a package needed for the debugger, you need to inject it into the virtual environment first, e.g.:

$ pipx inject renku ptvsd

Finally, run renku via the debugger:

$ ~/.local/pipx/venvs/renku/bin/python -m ptvsd --host localhost --wait -m renku.ui.cli <command>

If using Visual Studio Code, you may also want to set pathMappings in the Remote Attach configuration so that it will find your source code, e.g.:

{
    "name": "Python: Remote Attach",
    "type": "python",
    "request": "attach",
    "port": 5678,
    "host": "localhost",
    "pathMappings": [
        {
            "localRoot": "<path-to-renku-python-source-code>",
            "remoteRoot": "<path-to-renku-python-source-code>"
        }
    ]
}

renku-python's Issues

metadata: properly serialize nested JSON-LD-aware classes

  • Implement following helper attributes builders: jsonld.container.(set|list|index)
    • example usage: jsonld.container.list(Author)

Options

a) build a single @context from all nested objects
b) include @context in each nested object + implement custom loader?
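
Option (a) could look roughly like the sketch below. It is only an illustration under strong assumptions: every @context is a plain dict (JSON-LD also allows strings and lists), nested objects are plain dicts or lists of dicts, and on a term conflict the outer object's definition wins. The function names collect_context and hoist_context are hypothetical.

```python
from typing import Any, Dict


def collect_context(obj: Dict[str, Any]) -> Dict[str, Any]:
    """Merge the @context of obj and of every nested dict into one mapping."""
    merged: Dict[str, Any] = {}
    for key, value in obj.items():
        if key == "@context":
            continue
        children = value if isinstance(value, list) else [value]
        for child in children:
            if isinstance(child, dict):
                merged.update(collect_context(child))
    # Terms defined on the outer object override nested ones on conflict.
    merged.update(obj.get("@context", {}))
    return merged


def hoist_context(obj: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of obj that carries a single top-level @context."""

    def strip(node: Any) -> Any:
        if isinstance(node, dict):
            return {k: strip(v) for k, v in node.items() if k != "@context"}
        if isinstance(node, list):
            return [strip(item) for item in node]
        return node

    result = strip(obj)
    result["@context"] = collect_context(obj)
    return result
```

A naive merge like this silently changes meaning when two nested objects map the same term to different IRIs, which is one argument for considering option (b).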

extracting submodule history fails if not modified by renga

Sequence to reproduce:

mkdir foo
mkdir bar
cd foo
renga init
echo woop > ../woop
renga datasets add dataset ../woop
cd ../bar
renga init
renga datasets create dataset
renga datasets add dataset ../foo/data/dataset/woop
renga run wc data/dataset/foo/data/dataset/woop > woop.wc
cd ../foo
echo woop2 > data/dataset/woop
git commit -am 'commiting changes to woop'
cd ../bar
git submodule update --rebase --remote
git commit -am 'update submodule'
renga status

On branch master
Files generated from outdated inputs:
  (use "renga log <file>..." to see the full lineage)

	woop.wc:

Normally it should display: <name_of_submodule>@<commit_sha1>.

Renga init with endpoint does not save it

Scenario:

renga init --endpoint https://example.com --autosync

Then the following is not working:

renga contexts list

I can fix it by changing .renga/config.yml from:

core:
  autosync: true
  generated: '2017-10-25T11:36:10.689296'
  name: MyProject
endpoints:
  https://example.com:
    vertex_id: '20688'

to

core:
  autosync: true
  generated: '2017-10-25T11:36:10.689296'
  name: MyProject
  default: https://example.com
endpoints:
  https://example.com:
    vertex_id: '20688'

but it would be nice to have it inferred from the init command.
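
One way the inference could work, sketched on a plain dict that mirrors the YAML above. The helper name with_default_endpoint is hypothetical, and the rule (only infer when exactly one endpoint is configured and no default is set) is an assumption, not the project's actual behavior.

```python
import copy
from typing import Any, Dict


def with_default_endpoint(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of config where core.default is set if it can be inferred."""
    result = copy.deepcopy(config)
    core = result.setdefault("core", {})
    endpoints = result.get("endpoints") or {}
    # Only infer when nothing was set explicitly and there is a single candidate.
    if "default" not in core and len(endpoints) == 1:
        core["default"] = next(iter(endpoints))
    return result


config = {
    "core": {"autosync": True, "name": "MyProject"},
    "endpoints": {"https://example.com": {"vertex_id": "20688"}},
}
fixed = with_default_endpoint(config)
```

Working on a deep copy keeps the loaded configuration untouched until the caller decides to write it back.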

Better errors + cleanup when `renga init` is executed in an already existing git repo

If you run renga init in an already existing git repo, the error is confusing and gives no hint about the -f flag.
Also, it does not roll back or clean up the .renga folder it created in the process, which causes an error when you later try renga init -f.

If you run renga init in an already existing renga repo, the error is confusing as you get a FileExistsError.

Publish docs

  • include autodoc for clients and cli
  • enable hook for Read the Docs

from_config doesn't work as advertised

In [4]: client = renga.from_config(endpoint='http://localhost')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-4-b511b07226e7> in <module>()
----> 1 client = renga.from_config(endpoint='http://localhost')

~/Projects/renga-python/renga/cli/_client.py in from_config(config, endpoint)
     31     """
     32     if config is None:
---> 33         config = read_config()
     34         project_config_path = get_project_config_path()
     35         if project_config_path:

TypeError: read_config() missing 1 required positional argument: 'path'
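
The traceback shows read_config being called without its required path argument. The sketch below reproduces that shape with stand-in functions and shows one possible fix, giving path a default. The function bodies and the DEFAULT_CONFIG_PATH value are hypothetical and do not reflect the real renga implementation.

```python
from typing import Any, Dict, Optional

DEFAULT_CONFIG_PATH = "~/.renga/config.yml"  # hypothetical default location


def read_config(path: Optional[str] = None) -> Dict[str, Any]:
    """Stand-in loader; a real one would parse the file at the resolved path."""
    resolved = path or DEFAULT_CONFIG_PATH
    return {"path": resolved, "endpoints": {}}


def from_config(
    config: Optional[Dict[str, Any]] = None,
    endpoint: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a client configuration, loading the default file when none is given."""
    if config is None:
        config = read_config()  # no longer raises TypeError
    if endpoint is not None:
        config = {**config, "default": endpoint}
    return config


client_config = from_config(endpoint="http://localhost")
```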

Re-run CWL steps on the hosted platform

  • use the CI pipeline functionality of gitlab to rerun steps
  • create custom images for rerunning renga-generated steps
  • create a .gitlab-ci.yml automatically in each renga repo

Import datasets from renku-aware repos

Importing from a git repository that contains a .renku directory should automatically reuse the included metadata about authors/creators of various entities.

  • remove the local filesystem path (privacy issues)
  • reference the original dataset metadata file: $ref: ...
  • use submodule index to iterate over files when importing from a Git repo

addresses SwissDataScienceCenter/renku#135

create basic CLI

Need a CLI that allows for this workflow at a minimum:

$ renga login
  • guides the user through obtaining an offline token
  • set up the platform access points
  • create a ~/.renga.conf file that stores user settings and tokens
$ renga init <project>
  • initialize a project, including adding a node to the KG
  • creates a .renga metadata file for project-specific configuration
$ renga add
  • add code and/or data from KG
  • from git repo
  • from URL
$ renga notebook
  • launch a notebook, mounting . in the notebook container and setting it up with the proper environment for interacting with the platform

Construct workflows from steps

link together several steps to form a workflow

  • construct a CWL workflow from a file's provenance graph
  • resolve dependency paths and save workflow to disk for reuse
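
The dependency resolution mentioned above can be illustrated with a topological sort. This sketch assumes each step is described only by the file paths it reads and writes; the step names, the dictionary layout and the function order_steps are hypothetical, and it uses graphlib from the standard library (Python 3.9 or later) rather than CWL tooling.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def order_steps(steps: Dict[str, Dict[str, Set[str]]]) -> List[str]:
    """Order steps so that each one runs after the steps producing its inputs."""
    # Map every output path to the step that produces it.
    producers = {path: name for name, io in steps.items() for path in io["outputs"]}
    # A step depends on the producers of its inputs; raw inputs have no producer.
    graph = {
        name: {producers[path] for path in io["inputs"] if path in producers}
        for name, io in steps.items()
    }
    return list(TopologicalSorter(graph).static_order())


steps = {
    "plot": {"inputs": {"counts.txt"}, "outputs": {"plot.png"}},
    "count": {"inputs": {"data/raw.csv"}, "outputs": {"counts.txt"}},
}
ordered = order_steps(steps)
```

TopologicalSorter raises CycleError if the recorded steps depend on each other in a loop, which is a useful early check before writing a workflow to disk.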

trying to access a nonexistent context should return a 404

inside a renga-deployed notebook:

import renga
client = renga.from_env()
client.contexts[0]

--> HTTP 500

from the logs:

sqlalchemy.exc.StatementError: (builtins.ValueError) bytes is not a 16-char string 
[SQL: 'SELECT contexts.created AS contexts_created, contexts.updated AS contexts_updated, contexts.id AS contexts_id, contexts.spec AS contexts_spec, contexts.jwt AS contexts_jwt 
FROM contexts 
WHERE contexts.id = %(param_1)s'] [parameters: [{'%(140547263475216 param)s': '0'}]]
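
The log message suggests that the id '0' could not be interpreted as a UUID before the query ran. A minimal sketch of the desired behavior, assuming context ids are UUIDs: validate the id first and answer 404 for both malformed and unknown ids. The in-memory dict and the get_context helper are hypothetical stand-ins for the real storage and view code.

```python
import uuid
from typing import Dict, Tuple


def get_context(contexts: Dict[uuid.UUID, dict], raw_id: str) -> Tuple[int, dict]:
    """Return (status, body); malformed and unknown ids both map to 404."""
    try:
        context_id = uuid.UUID(raw_id)
    except ValueError:
        # The id is not a valid UUID string, so no such context can exist.
        return 404, {"error": "context not found"}
    context = contexts.get(context_id)
    if context is None:
        return 404, {"error": "context not found"}
    return 200, context


known = uuid.UUID("12345678-1234-5678-1234-567812345678")
store = {known: {"id": str(known)}}
status, body = get_context(store, "0")
```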

fix renga notebook cli

  • renga notebook fails
  • renga notebook show doesn't show the running notebooks
  • implement renga notebook stop
  • implement numbering of running notebooks for easier navigation?
  • implement automatic opening of chosen notebook in browser

will be done in #31

cli: fix issues with `datasets add`

  • authors are removed after calling add (related to #119)
  • files/<NAME>/path is not serialized as str but pathlib.PosixPath
  • check how the target is joined to origin path (//)
  • warn when importing local git repository
  • adding a specific file without using -t doesn't work
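
For the path serialization item, the general idea is to convert path objects to plain strings at the serialization boundary. The sketch below shows this with json from the standard library purely as an illustration; the project's metadata format and serializer may differ, and the metadata layout here is made up.

```python
import json
from pathlib import PurePath, PurePosixPath


def encode_paths(value):
    """Fallback for json.dumps: turn path objects into plain strings."""
    if isinstance(value, PurePath):
        return str(value)
    raise TypeError(f"Object of type {type(value).__name__} is not JSON serializable")


# Hypothetical metadata entry holding a path object instead of a string.
metadata = {"files": {"woop": {"path": PurePosixPath("data/dataset/woop")}}}
serialized = json.dumps(metadata, default=encode_paths)
```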

renga io buckets throws an error

$ renga io buckets

Traceback (most recent call last):
  File "/Users/rok/.virtualenvs/renga/bin/renga", line 11, in <module>
    load_entry_point('renga', 'console_scripts', 'renga')()
  File "/Users/rok/.virtualenvs/renga/lib/python3.6/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/Users/rok/.virtualenvs/renga/lib/python3.6/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/Users/rok/.virtualenvs/renga/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/rok/.virtualenvs/renga/lib/python3.6/site-packages/click/core.py", line 1064, in invoke
    sub_ctx = cmd.make_context(cmd_name, args, parent=ctx)
  File "/Users/rok/.virtualenvs/renga/lib/python3.6/site-packages/click/core.py", line 621, in make_context
    self.parse_args(ctx, args)
  File "/Users/rok/Projects/renga-python/renga/cli/_group.py", line 28, in parse_args
    if args[0] in self.commands:
IndexError: list index out of range
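
The traceback ends in args[0] being read from an empty argument list. A guarded version of that lookup might look like the following sketch. It is written without click so it stays self-contained, and split_subcommand is a hypothetical helper rather than the actual code in renga/cli/_group.py.

```python
from typing import Dict, List, Optional, Tuple


def split_subcommand(
    args: List[str],
    commands: Dict[str, object],
    default: Optional[str] = None,
) -> Tuple[Optional[str], List[str]]:
    """Return (command name, remaining args) without assuming args is non-empty."""
    if args and args[0] in commands:
        return args[0], args[1:]
    # No arguments, or the first one is not a known command: use the default.
    return default, list(args)


commands = {"list": object(), "create": object()}
name, rest = split_subcommand([], commands)
```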
