swissdatasciencecenter / renku-python



A Python library for the Renku collaborative data science platform.

Home Page: https://renku-python.readthedocs.io/

License: Apache License 2.0

Python 99.08% Shell 0.30% Makefile 0.08% Dockerfile 0.08% JavaScript 0.47%

renku

renku-python's Issues

metadata: properly serialize nested JSON-LD-aware classes

Implement the following helper attribute builders: jsonld.container.(set|list|index)
- example usage: jsonld.container.list(Author)

Options

a) build a single @context from all nested objects
b) include @context in each nested object + implement custom loader?
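A minimal sketch of the helper builders and of option a). It assumes each JSON-LD-aware class carries a hypothetical `_jsonld_context` dict and a hypothetical `_nested` map of attribute name to container descriptor; these names, and the schema.org IRIs, are illustrative only and not the library's API.

```python
class _Container:
    """Hypothetical stand-in for jsonld.container: each builder records the
    JSON-LD @container keyword together with the nested class of the values."""

    def _build(self, keyword, cls):
        return {"@container": keyword, "type": cls}

    def set(self, cls):
        return self._build("@set", cls)

    def list(self, cls):
        return self._build("@list", cls)

    def index(self, cls):
        return self._build("@index", cls)


container = _Container()


class Author:
    # Hypothetical context for an author object
    _jsonld_context = {"name": "http://schema.org/name"}
    _nested = {}


class Dataset:
    _jsonld_context = {
        "name": "http://schema.org/name",
        "authors": {"@id": "http://schema.org/author", "@container": "@list"},
    }
    # Attribute name -> container descriptor of a nested JSON-LD-aware class
    _nested = {"authors": container.list(Author)}


def merged_context(cls, seen=None):
    """Option a): build a single @context from a class and all nested classes."""
    seen = set() if seen is None else seen
    if cls in seen:  # guard against cycles between nested classes
        return {}
    seen.add(cls)
    context = {}
    for descriptor in cls._nested.values():
        context.update(merged_context(descriptor["type"], seen))
    context.update(cls._jsonld_context)  # the outer class wins on key conflicts
    return context
```

Option b) would instead keep each nested object's own @context and need a loader that resolves them; the merged form avoids that at the cost of possible term clashes between classes.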

include JSON-LD context with all metadata

include contexts with:

project
~~CWL steps/workflows~~
~~.renga metadata~~ (is included in project)
datasets

nested serialization to be handled via #119

implement SDK initialization

want something like:

client = renga.from_env()
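One possible shape for `from_env`, sketched under assumptions: the environment variable names `RENGA_ENDPOINT` and `RENGA_ACCESS_TOKEN` and the `Client` class are hypothetical placeholders, not a confirmed interface.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class Client:
    """Hypothetical minimal client holding what the SDK needs to reach the platform."""
    endpoint: str
    access_token: Optional[str] = None


def from_env(environ: Mapping[str, str] = os.environ) -> Client:
    """Build a client from environment variables (variable names are assumptions)."""
    try:
        endpoint = environ["RENGA_ENDPOINT"]
    except KeyError:
        raise RuntimeError("RENGA_ENDPOINT is not set") from None
    return Client(endpoint=endpoint.rstrip("/"),
                  access_token=environ.get("RENGA_ACCESS_TOKEN"))
```

Passing the mapping explicitly keeps the function testable without touching the real process environment.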

enable launching of a specific notebook

allow e.g. the following:

$ renga runner notebook --notebook-path notebooks/my_notebook.ipynb
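A sketch of the option and of the URL that would open the chosen notebook, assuming a classic Jupyter Notebook server (which serves notebooks under `/notebooks/<path>`). argparse is used only to keep the sketch dependency-free; the real CLI is built differently, and any authentication token the server needs is not handled here.

```python
import argparse
from urllib.parse import quote


def notebook_url(base_url: str, notebook_path: str) -> str:
    """URL that opens a given notebook on a classic Jupyter Notebook server."""
    return "{}/notebooks/{}".format(base_url.rstrip("/"), quote(notebook_path))


parser = argparse.ArgumentParser(prog="renga runner notebook")
parser.add_argument("--notebook-path", default=None,
                    help="notebook to open once the server is running")
```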

instruct the user of changes to git repo if initializing renku in an existing one

renku init has some side effects if run in an existing git repo (inits lfs, for example). The user should be made aware of these changes and also instructed on how to bring the existing repo in-line with what we expect (e.g. command to run to add existing files to git-lfs).

CLI: dev documentation

create basic CLI

Need a CLI that allows for this workflow at a minimum:

$ renga login

guides the user through obtaining an offline token
set up the platform access points
create a ~/.renga.conf file that stores user settings and tokens

$ renga init <project>

initialize a project, including adding a node to the KG
creates a .renga metadata file for project-specific configuration

$ renga add

add code and/or data from KG
from git repo
from URL

renga notebook

launch a notebook, mounting . in the notebook container and setting it up with the proper environment for interacting with the platform

add option to specify a notebook image
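The minimum workflow above maps onto one subcommand each. This argparse skeleton only illustrates the command surface; the `--image` flag name and the positional `source` argument are assumptions, and none of the commands do any work yet.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="renga")
    sub = parser.add_subparsers(dest="command", required=True)

    # renga login: obtain an offline token and store it in ~/.renga.conf
    sub.add_parser("login", help="obtain an offline token and store settings")

    # renga init <project>: create the project and its .renga metadata
    init = sub.add_parser("init", help="initialize a project")
    init.add_argument("project")

    # renga add: add code and/or data from the KG, a git repo or a URL
    add = sub.add_parser("add", help="add code and/or data")
    add.add_argument("source", help="KG reference, git repository or URL")

    # renga notebook: launch a notebook container mounting the current directory
    notebook = sub.add_parser("notebook", help="launch a notebook")
    notebook.add_argument("--image", default=None,
                          help="notebook image to use (hypothetical flag name)")
    return parser
```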

create notebook images with renga installed

make several general notebook images with renga pre-installed based on https://hub.docker.com/r/jupyter/

workflows: fix serialisation of File and Path

renga workflow create <FILE> generates invalid CWL due to wrong YAML serialisation.
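One way to avoid the problem, sketched here as an assumption about where the fix belongs: normalise the document to plain data before dumping it, so `pathlib` objects never reach the YAML serialiser. CWL `File` objects are written as `{"class": "File", "path": ...}`; the helper names are hypothetical.

```python
import pathlib


def to_cwl_value(value):
    """Recursively turn pathlib paths into plain strings so the dumped
    document contains only types a YAML/JSON serialiser writes natively."""
    if isinstance(value, pathlib.PurePath):
        return str(value)
    if isinstance(value, dict):
        return {str(key): to_cwl_value(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_cwl_value(item) for item in value]
    return value


def cwl_file(path):
    """A CWL File object referring to `path`."""
    return {"class": "File", "path": to_cwl_value(path)}
```

The normalised structure can then be passed to whichever YAML dumper the project uses.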

SDK: Read/write data with client

define schemas for configs

`default_bucket` used in `add` when `autosync=true` but it is not defined at `init` time

In other words, the default bucket is not created by the CLI at project creation time.

return state and creation time in CLI listings

Better errors + cleanup when `renga init` is executed in an already existing git repo

If you run renga init in an already existing git repo, the error is confusing, and it is hard to find out about the -f flag.
Also, it does not roll back or clean up the .renga folder it created in the process, which causes an error when you later try renga init -f.

If you run renga init in an already existing renga repo, the error is confusing, as you only get a FileExistsError.
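A sketch of both fixes: a readable error that mentions -f instead of a bare FileExistsError, and a rollback that deletes `.renga` only if this call created it. `InitError`, `init_metadata` and the injectable `setup` step are hypothetical names used for illustration.

```python
import pathlib
import shutil


class InitError(Exception):
    """Hypothetical user-facing error for `renga init`."""


def default_setup(metadata):
    """Hypothetical setup step: write an empty configuration file."""
    (metadata / "config.yml").write_text("core: {}\n")


def init_metadata(project_dir, force=False, setup=default_setup):
    """Create .renga/ and remove it again if initialization fails half way."""
    project_dir = pathlib.Path(project_dir)
    metadata = project_dir / ".renga"
    if metadata.exists() and not force:
        raise InitError(
            "{} is already a renga project; use `renga init -f` to "
            "reinitialize it.".format(project_dir))
    created = not metadata.exists()
    metadata.mkdir(parents=True, exist_ok=True)
    try:
        setup(metadata)
    except Exception:
        if created:  # roll back only what this call created
            shutil.rmtree(str(metadata))
        raise
    return metadata
```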

create a jupyter notebook docker image with renga-python pre-installed

from_config doesn't work as advertised

In [4]: client = renga.from_config(endpoint='http://localhost')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-4-b511b07226e7> in <module>()
----> 1 client = renga.from_config(endpoint='http://localhost')

~/Projects/renga-python/renga/cli/_client.py in from_config(config, endpoint)
     31     """
     32     if config is None:
---> 33         config = read_config()
     34         project_config_path = get_project_config_path()
     35         if project_config_path:

TypeError: read_config() missing 1 required positional argument: 'path'

create python api skeleton

Basic python API that can be used inside e.g. a jupyter notebook to interact with the platform.

fix renga notebook cli

renga notebook fails
renga notebook show doesn't show the running notebooks
implement renga notebook stop
implement numbering of running notebooks for easier navigation?
implement automatic opening of chosen notebook in browser

will be done in #31

Implement `renga status` that checks the validity of data provenance

all CWL outputs should be produced from the latest version of their inputs

--
addresses SwissDataScienceCenter/renku#117
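The validity check reduces to comparing, per output, the input versions that were used against the latest versions. This sketch assumes both maps have already been extracted from the provenance and git history, checks direct inputs only (not transitive ones), and uses a hypothetical function name.

```python
from typing import Dict, List


def outdated_outputs(used: Dict[str, Dict[str, str]],
                     latest: Dict[str, str]) -> List[str]:
    """Return outputs generated from an input version that is no longer the latest.

    used:   output path -> {input path: commit of the input used to create it}
    latest: input path  -> latest commit touching that input
    Inputs missing from `latest` are treated as up to date.
    """
    stale = []
    for output, inputs in sorted(used.items()):
        if any(latest.get(path, commit) != commit for path, commit in inputs.items()):
            stale.append(output)
    return stale
```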

extracting submodule history fails if not modified by renga

Sequence to reproduce:

mkdir foo
mkdir bar
cd foo
renga init
echo woop > ../woop
renga datasets add dataset ../woop
cd ../bar
renga init
renga datasets create dataset
renga datasets add dataset ../foo/data/dataset/woop
renga run wc data/dataset/foo/data/dataset/woop > woop.wc
cd ../foo
echo woop2 > data/dataset/woop
git commit -am 'committing changes to woop'
cd ../bar
git submodule update --rebase --remote
git commit -am 'update submodule'
renga status

On branch master
Files generated from outdated inputs:
  (use "renga log <file>..." to see the full lineage)

	woop.wc:

Normally it should display: <name_of_submodule>@<commit_sha1>.
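The `<name_of_submodule>@<commit_sha1>` labels can be derived from `git submodule status`, whose lines consist of a status character (space, `-`, `+` or `U`), the commit SHA-1, the path and an optional `(describe)` suffix. A sketch of the parsing; the function name is hypothetical and paths containing spaces are not handled.

```python
from typing import Dict


def submodule_revisions(status_output: str) -> Dict[str, str]:
    """Parse `git submodule status` output into {path: "<path>@<sha1>"}."""
    revisions = {}
    for line in status_output.splitlines():
        if not line.strip():
            continue
        # Drop the status character, then take the SHA-1 and the path
        sha1, path = line[1:].split()[:2]
        revisions[path] = "{}@{}".format(path, sha1)
    return revisions
```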

runner-spawned notebook should clone the environment's branch

Renga init with endpoint does not save it

Scenario:

renga init --endpoint https://example.com --autosync

Then the following is not working:

renga contexts list

I can fix it by changing .renga/config.yml from:

core:
  autosync: true
  generated: '2017-10-25T11:36:10.689296'
  name: MyProject
endpoints:
  https://example.com:
    vertex_id: '20688'

to:

core:
  autosync: true
  generated: '2017-10-25T11:36:10.689296'
  name: MyProject
  default: https://example.com
endpoints:
  https://example.com:
    vertex_id: '20688'

but it would be nice to have it inferred from the init command.
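Inferring the default only needs `init` to write `core.default` alongside the endpoint entry. A sketch on the already-loaded config dict, mirroring the YAML layout shown above; the function name is hypothetical, and loading/saving the YAML file is left out.

```python
def record_endpoint(config: dict, endpoint: str, vertex_id: str) -> dict:
    """Store the endpoint given to `renga init --endpoint` and make it the default."""
    config.setdefault("endpoints", {})[endpoint] = {"vertex_id": vertex_id}
    config.setdefault("core", {})["default"] = endpoint
    return config
```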

Import datasets from renku-aware repos

Importing from a git repository that contains a .renku directory should automatically reuse the included metadata about authors/creators of various entities.

remove the local filesystem path (privacy issues)
reference the original dataset metadata file: $ref: ...
use submodule index to iterate over files when importing from a Git repo

addresses SwissDataScienceCenter/renku#135

notebooks: recreate context when id from config is not found

$ make start
$ renga login
$ renga init --autosync
$ renga notebooks launch
$ make stop
$ docker-compose rm
$ make start
$ renga login
$ renga notebooks launch
# ERROR
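A sketch of the fallback: look up the stored context id and, if the platform no longer knows it (e.g. after `docker-compose rm`), create a new one and persist its id. The `context_id` config key, the mapping of known contexts and the `create` callable are all hypothetical.

```python
def get_or_recreate_context(config, contexts, create):
    """Return the context stored in the project config, or recreate it.

    config:   project configuration dict holding a "context_id" (assumed key)
    contexts: mapping context id -> context as currently known to the platform
    create:   callable creating a new context and returning its id
    """
    context_id = config.get("context_id")
    if context_id in contexts:
        return contexts[context_id]
    new_id = create()
    config["context_id"] = new_id  # persist so the next launch finds it
    return contexts[new_id]
```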

renga runner rerun should fail gracefully if no tool is found

If no CWL is found, we should not raise an error, because that breaks the pipeline on repos that have not used renga from the outset. Instead, fail gracefully to allow other stages of the pipeline to queue.

addresses SwissDataScienceCenter/renku#117
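A sketch of the graceful path: when no CWL tool exists, report it on stderr and exit with status 0 so later CI stages still run. It assumes tools live in `.renga/workflow/` (the location used in the rerun example elsewhere in these issues); the functions are hypothetical and the actual rerun is only a placeholder.

```python
import pathlib
import sys


def find_tools(repo_dir, workflow_dir=".renga/workflow"):
    """List the CWL files renga stored in the repository (directory is an assumption)."""
    return sorted(pathlib.Path(repo_dir, workflow_dir).glob("*.cwl"))


def rerun(repo_dir) -> int:
    """Return the process exit code for a hypothetical `renga runner rerun`."""
    tools = find_tools(repo_dir)
    if not tools:
        # Not an error: the repository may predate the use of renga, and a
        # non-zero exit code would stop the remaining CI stages.
        print("No CWL tool found; skipping rerun.", file=sys.stderr)
        return 0
    for tool in tools:
        print("would rerun {}".format(tool))  # placeholder for the actual run
    return 0
```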

Re-run CWL steps on the hosted platform

use the CI pipeline functionality of GitLab to rerun steps
create custom images for rerunning renga-generated steps
create a .gitlab-ci.yml automatically in each renga repo

provide a template LFS configuration for the newly initialized project

~~- [ ] renga init should also create a .gitattributes file that will sync the data/ directory.~~

CLI functions like renga run should add appropriate paths to git lfs for tracking
- renga run invocations should add the outputs to .gitattributes by default -- provide a flag to override the behavior (done in #116)
- renga datasets add should add all files except for metadata.json

addresses SwissDataScienceCenter/renku#117
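Tracking a path with LFS amounts to a `.gitattributes` line of the form `<path> filter=lfs diff=lfs merge=lfs -text` (the same line `git lfs track` writes). A sketch that appends such lines, skipping `metadata.json` and already-listed paths; the function name is hypothetical and paths containing spaces are not escaped.

```python
import pathlib

LFS_ATTRIBUTES = "filter=lfs diff=lfs merge=lfs -text"


def track_with_lfs(repo_dir, paths, skip=("metadata.json",)):
    """Append LFS tracking lines for `paths` to .gitattributes, skipping
    files whose name is in `skip` and paths that are already listed."""
    attributes = pathlib.Path(repo_dir, ".gitattributes")
    text = attributes.read_text() if attributes.exists() else ""
    tracked = {line.split()[0] for line in text.splitlines() if line.strip()}
    new_lines = [
        "{} {}".format(path, LFS_ATTRIBUTES)
        for path in paths
        if pathlib.PurePosixPath(path).name not in skip and path not in tracked
    ]
    if new_lines:
        # Make sure the appended block starts on its own line
        prefix = "" if not text or text.endswith("\n") else "\n"
        attributes.write_text(text + prefix + "\n".join(new_lines) + "\n")
    return new_lines
```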

Resolve lineage in submodules

addresses SwissDataScienceCenter/renku#117

Simple inputs and outputs

Define the inputs, outputs and labels of a context:

labels:
  - renga.contexts.inputs.<name>(=[<bucket_id>,<file_id>])

relates to SwissDataScienceCenter/renga-deployer#46

cli: fix issues with `datasets add`

authors are removed after calling add (related to #119)
files/<NAME>/path is not serialized as str but pathlib.PosixPath
check how the target is joined to origin path (//)
warn when importing local git repository
adding a specific file without using -t doesn't work

CLI: add project vertex_id to http headers

Currently the project vertex_id is passed as a label; move it to the HTTP headers to get the same behavior as storage.

done in #31

Jupyter notebook docker image with Renga edition installed and minimal scientific packages

Fix bugs with provenance calculation

fix DAG display when the same file is used as input in multiple steps

--
addresses SwissDataScienceCenter/renku#117
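The duplicate-node problem disappears if files are graph nodes keyed by path, so a file read by several steps is a single node with one edge per consuming step. A sketch using the standard library's `graphlib` (Python 3.9+); the input format is a hypothetical simplification, and step names are assumed not to collide with file paths.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def build_dag(steps):
    """steps: step name -> (list of input files, list of output files).

    Returns a graph mapping every node to its set of predecessors.
    """
    graph = {}
    for step, (inputs, outputs) in steps.items():
        graph.setdefault(step, set()).update(inputs)
        for path in inputs:
            graph.setdefault(path, set())
        for path in outputs:
            graph.setdefault(path, set()).add(step)
    return graph
```

`list(TopologicalSorter(build_dag(steps)).static_order())` then lists each file exactly once, before the steps that consume it.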

possibility to upload/download files from CLI

For example (the exact syntax may differ):

renga io buckets upload BUCKET_ID FILENAME

renga io download FILE_ID > FILENAME

authentication: how to ensure the deployer-spawned session remains authenticated

Currently we pass an access token to the deployed container, which will expire. One solution could be to simply send a refresh token instead of the access token, but this has security implications. We need to solve this problem either by having a flow with authentication via the python client or by some other means.
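Whichever flow is chosen, the session has to notice an expiring access token before using it. A sketch that reads the standard JWT `exp` claim (without verifying the signature) and calls a `refresh` callable shortly before expiry; where `refresh` obtains a new token from is exactly the open question above, so it is left as a hypothetical hook.

```python
import base64
import json
import time


def token_expires_at(access_token: str) -> float:
    """Read the `exp` claim (seconds since the epoch) from a JWT payload.

    The signature is NOT verified; this is only used to decide when to refresh.
    """
    payload = access_token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore base64 padding
    return float(json.loads(base64.urlsafe_b64decode(payload))["exp"])


def fresh_token(access_token, refresh, leeway=60.0, now=time.time):
    """Return a usable access token, calling the hypothetical `refresh()` hook
    when the current one expires within `leeway` seconds."""
    if token_expires_at(access_token) - leeway <= now():
        return refresh()
    return access_token
```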

construct a CWL workflow from a file's provenance graph
resolve dependency paths and save workflow to disk for reuse

import renga
client = renga.from_env()
client.contexts[0]

--> HTTP 500

from the logs:

sqlalchemy.exc.StatementError: (builtins.ValueError) bytes is not a 16-char string 
[SQL: 'SELECT contexts.created AS contexts_created, contexts.updated AS contexts_updated, contexts.id AS contexts_id, contexts.spec AS contexts_spec, contexts.jwt AS contexts_jwt 
FROM contexts 
WHERE contexts.id = %(param_1)s'] [parameters: [{'%(140547263475216 param)s': '0'}]]
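The log shows the id `'0'` reaching a UUID column. Validating the id before querying lets the service answer 404 instead of failing inside the database layer with HTTP 500. A sketch of that check; the handler shape and names are hypothetical, and on the client side `contexts[0]` should arguably index a list rather than be sent as an id.

```python
import uuid
from typing import Optional


def parse_context_id(raw) -> Optional[uuid.UUID]:
    """Return the id as a UUID, or None if it cannot be one (e.g. '0')."""
    try:
        return uuid.UUID(str(raw))
    except ValueError:
        return None


def get_context(raw_id, contexts):
    """Hypothetical handler returning (status code, body)."""
    context_id = parse_context_id(raw_id)
    if context_id is None or context_id not in contexts:
        return 404, {"error": "context {} not found".format(raw_id)}
    return 200, contexts[context_id]
```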

cli: rerun with different parameters

enable this:

$ renga runner rerun --job job.yml .renga/workflow/step12345.cwl
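A CWL job file maps input ids to values, so rerunning with different parameters means overlaying `job.yml` on the defaults recorded in the tool. A sketch on already-loaded data; it assumes the expanded map form of CWL `inputs` (id to `{"type": ..., "default": ...}`), treats types ending in `?` as optional, and uses a hypothetical function name.

```python
def resolve_job(tool_inputs, job):
    """Combine the defaults stored in a CWL tool with values from a job file.

    tool_inputs: input id -> {"type": ..., "default": ...}
    job:         input id -> value, as loaded from e.g. job.yml
    """
    unknown = set(job) - set(tool_inputs)
    if unknown:
        raise ValueError("unknown inputs in job: {}".format(", ".join(sorted(unknown))))
    values = {name: spec["default"] for name, spec in tool_inputs.items() if "default" in spec}
    values.update(job)  # job values override recorded defaults
    missing = {name for name, spec in tool_inputs.items()
               if name not in values and not str(spec.get("type", "")).endswith("?")}
    if missing:
        raise ValueError("no value for inputs: {}".format(", ".join(sorted(missing))))
    return values
```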

dataset: recognize github urls and parse /tree/<branch-name>

we should recognize /tree/<branch-name> and check out the branch
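A sketch of the URL handling with the standard library. It assumes the URL points at the branch root, so everything after `/tree/` is taken as the branch name (branch names may contain `/`); the function name is hypothetical.

```python
from urllib.parse import urlparse, urlunparse


def split_github_tree_url(url):
    """Split https://github.com/<owner>/<repo>/tree/<branch> into (clone URL, branch).

    Returns (url, None) for anything that is not a GitHub /tree/ URL.
    """
    parsed = urlparse(url)
    parts = parsed.path.strip("/").split("/")
    if parsed.netloc in ("github.com", "www.github.com") and len(parts) >= 4 and parts[2] == "tree":
        owner, repo, branch = parts[0], parts[1], "/".join(parts[3:])
        path = "/{}/{}.git".format(owner, repo)
        return urlunparse((parsed.scheme, parsed.netloc, path, "", "", "")), branch
    return url, None
```

The result can then be used as `git clone --branch <branch> <clone URL>`.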

renga io buckets throws an error

$ renga io buckets

Traceback (most recent call last):
  File "/Users/rok/.virtualenvs/renga/bin/renga", line 11, in <module>
    load_entry_point('renga', 'console_scripts', 'renga')()
  File "/Users/rok/.virtualenvs/renga/lib/python3.6/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/Users/rok/.virtualenvs/renga/lib/python3.6/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/Users/rok/.virtualenvs/renga/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/rok/.virtualenvs/renga/lib/python3.6/site-packages/click/core.py", line 1064, in invoke
    sub_ctx = cmd.make_context(cmd_name, args, parent=ctx)
  File "/Users/rok/.virtualenvs/renga/lib/python3.6/site-packages/click/core.py", line 621, in make_context
    self.parse_args(ctx, args)
  File "/Users/rok/Projects/renga-python/renga/cli/_group.py", line 28, in parse_args
    if args[0] in self.commands:
IndexError: list index out of range
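The traceback points at `if args[0] in self.commands:` in `renga/cli/_group.py`, which fails when `renga io buckets` is called with no further arguments. The guard below shows the check with the empty case handled; it is isolated into a hypothetical helper, and the surrounding `parse_args` logic is not reproduced.

```python
def first_arg_is_command(args, commands):
    """Guarded version of the check in renga/cli/_group.py: with no
    arguments there is nothing to look up, so report False."""
    return bool(args) and args[0] in commands
```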

include autodoc for clients and cli
enable hook for Read the Docs

problem with projects url

renga init tries to post to /api/projects/ but the URL should be /api/projects
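A sketch of building API URLs so they never carry a trailing slash, whether or not the configured endpoint has one; the helper name is hypothetical.

```python
from urllib.parse import urljoin


def api_url(endpoint: str, resource: str) -> str:
    """Join an endpoint and an API path, e.g. .../api/projects (no trailing slash)."""
    return urljoin(endpoint.rstrip("/") + "/", resource.strip("/"))
```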
