swissdatasciencecenter / renku-python
A Python library for the Renku collaborative data science platform.
Home Page: https://renku-python.readthedocs.io/
License: Apache License 2.0
jsonld.container.(set|list|index)
jsonld.container.list(Author)
Options:
a) build a single @context from all nested objects
b) include @context in each nested object + implement custom loader?
include contexts with:
nested serialization to be handled via #119
want something like:
client = renga.from_env()
allow e.g. the following:
$ renga runner notebook --notebook-path notebooks/my_notebook.ipynb
add renga-python to PyPI
renku init has some side effects if run in an existing git repo (it initializes git-lfs, for example). The user should be made aware of these changes and also instructed on how to bring the existing repo in line with what we expect (e.g. the command to run to add existing files to git-lfs).
Need a CLI that allows for this workflow at a minimum:
$ renga login
~/.renga.conf file that stores user settings and tokens
$ renga init <project>
.renga metadata file for project-specific configuration
$ renga add
renga notebook
.
in the notebook container and setting it up with the proper environment for interacting with the platform
make several general notebook images with renga pre-installed based on https://hub.docker.com/r/jupyter/
renga workflow create <FILE> generates invalid CWL due to wrong YAML serialisation.
In other words, the default bucket is not created by the CLI at project creation time.
If you run renga init in an already existing git repo, the error is confusing, and it is hard to find out about the -f flag.
Also, it does not roll back or clean up the .renga folder it created in the process, which throws an error when you try renga init -f later.
If you run renga init in an already existing renga repo, the error is confusing as you get a FileExistsError.
In [4]: client = renga.from_config(endpoint='http://localhost')
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-4-b511b07226e7> in <module>()
----> 1 client = renga.from_config(endpoint='http://localhost')
~/Projects/renga-python/renga/cli/_client.py in from_config(config, endpoint)
31 """
32 if config is None:
---> 33 config = read_config()
34 project_config_path = get_project_config_path()
35 if project_config_path:
TypeError: read_config() missing 1 required positional argument: 'path'
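One way to avoid this TypeError would be to give read_config a default for path. The following is only a minimal sketch: the default location, the JSON file format, and the body of from_config are assumptions made for illustration, and the real renga helpers may differ.

```python
import json
from pathlib import Path

# Hypothetical default location; the real project may resolve this differently.
DEFAULT_CONFIG_PATH = Path.home() / ".renga.conf"


def read_config(path=None):
    """Load the config from ``path``, falling back to the default location."""
    path = Path(path) if path is not None else DEFAULT_CONFIG_PATH
    if not path.exists():
        # A missing file is treated as an empty configuration.
        return {}
    with path.open() as fp:
        # JSON is used here only to keep the sketch dependency-free.
        return json.load(fp)


def from_config(config=None, endpoint=None):
    """Return the settings a client would be built from."""
    if config is None:
        config = read_config()  # no positional argument required anymore
    if endpoint is not None:
        config = dict(config, endpoint=endpoint)
    return config
```

With a default in place, from_config(endpoint='http://localhost') no longer depends on the caller knowing where the configuration file lives.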
Basic python API that can be used inside e.g. a jupyter notebook to interact with the platform.
renga notebook fails
renga notebook show doesn't show the running notebooks
renga notebook stop will be done in #31
--
addresses SwissDataScienceCenter/renku#117
Sequence to reproduce:
mkdir foo
mkdir bar
cd foo
renga init
echo woop > ../woop
renga datasets add dataset ../woop
cd ../bar
renga init
renga datasets create dataset
renga datasets add dataset ../foo/data/dataset/woop
renga run wc data/dataset/foo/data/dataset/woop > woop.wc
cd ../foo
echo woop2 > data/dataset/woop
git commit -am 'commiting changes to woop'
cd ../bar
git submodule update --rebase --remote
git commit -am 'update submodule'
renga status
On branch master
Files generated from outdated inputs:
(use "renga log <file>..." to see the full lineage)
woop.wc:
Normally it should display: <name_of_submodule>@<commit_sha1>.
Scenario:
renga init --endpoint https://example.com --autosync
Then the following is not working:
renga contexts list
I can fix this by changing .renga/config.yml from:
core:
  autosync: true
  generated: '2017-10-25T11:36:10.689296'
  name: MyProject
endpoints:
  https://example.com:
    vertex_id: '20688'
to:
core:
  autosync: true
  generated: '2017-10-25T11:36:10.689296'
  name: MyProject
  default: https://example.com
endpoints:
  https://example.com:
    vertex_id: '20688'
but it would be nice to have it inferred from the init command.
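The inference asked for here could look roughly like the sketch below. It assumes the parsed config is a plain dict, that the default key lives under core, and that a single configured endpoint is a safe fallback; infer_default_endpoint is a hypothetical helper, not part of renga.

```python
def infer_default_endpoint(config):
    """Return the default endpoint, inferring it when only one is configured."""
    core = config.get("core", {})
    if "default" in core:
        return core["default"]
    endpoints = list(config.get("endpoints", {}))
    if len(endpoints) == 1:
        # Exactly one endpoint is known, so it is the only sensible default.
        return endpoints[0]
    raise ValueError("No default endpoint set; pass one explicitly.")


config = {
    "core": {"autosync": True, "name": "MyProject"},
    "endpoints": {"https://example.com": {"vertex_id": "20688"}},
}
print(infer_default_endpoint(config))
```

An explicit default always wins, so hand-edited configurations keep working unchanged.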
Importing from a git repository that contains a .renku directory should automatically reuse the included metadata about authors/creators of various entities.
$ref: ...
addresses SwissDataScienceCenter/renku#135
$ make start
$ renga login
$ renga init --autosync
$ renga notebooks launch
$ make stop
$ docker-compose rm
$ make start
$ renga login
$ renga notebooks launch
# ERROR
If no CWL is found, we should not raise an error, because that breaks the pipeline on repos that have not used renga from the outset. Instead, fail gracefully to allow other stages of the pipeline to queue.
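A rough sketch of that behaviour, assuming workflows are stored as *.cwl files under .renga/workflow and that the pipeline stage only looks at the exit code; find_workflows and main are hypothetical names:

```python
import sys
from pathlib import Path


def find_workflows(repo_root):
    """Return all CWL files under the assumed .renga/workflow directory."""
    workflow_dir = Path(repo_root) / ".renga" / "workflow"
    if not workflow_dir.is_dir():
        return []
    return sorted(workflow_dir.glob("*.cwl"))


def main(repo_root="."):
    workflows = find_workflows(repo_root)
    if not workflows:
        # Nothing to run: report it and exit successfully so that later
        # pipeline stages are still queued.
        print("No CWL files found; skipping.", file=sys.stderr)
        return 0
    for workflow in workflows:
        print(f"Would run {workflow}")
    return 0
```

A caller would typically end with sys.exit(main()), so a repository without any CWL still produces a zero exit status.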
addresses SwissDataScienceCenter/renku#117
.gitlab-ci.yml automatically in each renga repo
- [ ] renga init should also create a .gitattributes file that will sync the data/ directory.
renga run should add appropriate paths to git lfs for tracking
renga run invocations should add the outputs to .gitattributes by default -- provide a flag to override the behavior (done in #116)
renga datasets add should add all files except for metadata.json
addresses SwissDataScienceCenter/renku#117
addresses SwissDataScienceCenter/renku#117
Define inputs and outputs and labels of a context:
labels:
- renga.contexts.inputs.<name>(=[<bucket_id>,<file_id>])
relates to SwissDataScienceCenter/renga-deployer#46
authors are removed after calling add (related to #119)
files/<NAME>/path is not serialized as str but pathlib.PosixPath
//
)
-t doesn't work
currently the project vertex_id is passed as a label -- move it to headers to have the same behavior as storage
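For the files/<NAME>/path report above, where the value is emitted as pathlib.PosixPath rather than str, one possible fix is to convert path objects at serialization time. The sketch uses the standard json module purely for illustration; the actual metadata serializer in renga may be different, and encode_path is a hypothetical helper.

```python
import json
from pathlib import Path, PurePath


def encode_path(obj):
    """json ``default`` hook: emit path objects as plain strings."""
    if isinstance(obj, PurePath):
        return str(obj)
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


metadata = {"files": {"woop": {"path": Path("data/dataset/woop")}}}
print(json.dumps(metadata, default=encode_path))
```

Checking against PurePath covers PosixPath and WindowsPath alike, so the hook does not depend on the platform the metadata was written on.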
done in #31
--
addresses SwissDataScienceCenter/renku#117
like for example (it can be different):
renga io buckets upload BUCKET_ID FILENAME
renga io download FILE_ID > FILENAME
Currently we pass an access token to the deployed container, which will expire. One solution could be to simply send a refresh token instead of the access token, but this has security implications. We need to solve this problem either by having a flow with authentication via the python client or by some other means.
We need a way to link notebook code in the graph
link together several steps to form a workflow
addresses #62
Image name cannot be renga-deployer:
https://github.com/SwissDataScienceCenter/renga-python/blob/2f1aa1246e86063355116bdcc4fbaf4afaca45b6/.travis/deploy-docker.sh#L4-L5
Travis cannot build docker images with sudo: false:
https://github.com/SwissDataScienceCenter/renga-python/blob/2f1aa1246e86063355116bdcc4fbaf4afaca45b6/.travis.yml#L22
inside a renga-deployed notebook:
import renga
client = renga.from_env()
client.contexts[0]
--> HTTP 500
from the logs:
sqlalchemy.exc.StatementError: (builtins.ValueError) bytes is not a 16-char string
[SQL: 'SELECT contexts.created AS contexts_created, contexts.updated AS contexts_updated, contexts.id AS contexts_id, contexts.spec AS contexts_spec, contexts.jwt AS contexts_jwt
FROM contexts
WHERE contexts.id = %(param_1)s'] [parameters: [{'%(140547263475216 param)s': '0'}]]
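The log suggests that the integer index 0 was forwarded to the service as a context id, which the database then rejects. A minimal sketch of a collection that keeps positional access and lookup by id apart; ContextCollection and its two callables are hypothetical stand-ins for the real client code:

```python
class ContextCollection:
    """Hypothetical collection that separates positions from identifiers."""

    def __init__(self, fetch_all, fetch_one):
        self._fetch_all = fetch_all  # callable returning a list of contexts
        self._fetch_one = fetch_one  # callable taking a context id

    def __getitem__(self, key):
        if isinstance(key, int):
            # Positional access: list the contexts, then pick one locally.
            return self._fetch_all()[key]
        # Anything else is treated as an identifier and sent to the service.
        return self._fetch_one(key)


contexts = ContextCollection(
    fetch_all=lambda: ["ctx-a", "ctx-b"],
    fetch_one=lambda context_id: f"fetched {context_id}",
)
print(contexts[0])
print(contexts["abcd"])
```

The server side could additionally answer malformed ids with a 4xx response instead of HTTP 500, but that is outside the scope of this sketch.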
enable this:
$ renga runner rerun --job job.yml .renga/workflow/step12345.cwl
we should recognize /tree/<branch-name> and checkout the branch
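Recognizing the branch could be done by splitting the URL before cloning. A small sketch, assuming GitHub/GitLab-style URLs of the form <repo>/tree/<branch-name>; split_branch is a hypothetical helper:

```python
from urllib.parse import urlparse


def split_branch(url):
    """Split a repository URL into (repository URL, branch or None)."""
    parsed = urlparse(url)
    path, marker, branch = parsed.path.partition("/tree/")
    if not marker or not branch:
        # No /tree/<branch-name> suffix: nothing to check out.
        return url, None
    repo_url = parsed._replace(path=path).geturl()
    return repo_url, branch.strip("/")


print(split_branch(
    "https://github.com/SwissDataScienceCenter/renga-python/tree/master"
))
```

Everything after /tree/ is treated as the branch name, so a URL that also carries a sub-directory would need extra handling, since branch names may themselves contain slashes.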
$ renga io buckets
Traceback (most recent call last):
File "/Users/rok/.virtualenvs/renga/bin/renga", line 11, in <module>
load_entry_point('renga', 'console_scripts', 'renga')()
File "/Users/rok/.virtualenvs/renga/lib/python3.6/site-packages/click/core.py", line 722, in __call__
return self.main(*args, **kwargs)
File "/Users/rok/.virtualenvs/renga/lib/python3.6/site-packages/click/core.py", line 697, in main
rv = self.invoke(ctx)
File "/Users/rok/.virtualenvs/renga/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/Users/rok/.virtualenvs/renga/lib/python3.6/site-packages/click/core.py", line 1064, in invoke
sub_ctx = cmd.make_context(cmd_name, args, parent=ctx)
File "/Users/rok/.virtualenvs/renga/lib/python3.6/site-packages/click/core.py", line 621, in make_context
self.parse_args(ctx, args)
File "/Users/rok/Projects/renga-python/renga/cli/_group.py", line 28, in parse_args
if args[0] in self.commands:
IndexError: list index out of range
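The traceback points at args[0] being read when no subcommand was given. A guard along these lines would avoid the IndexError; it is written as a standalone function rather than as a patch to the click group in _group.py, and first_is_command is a hypothetical name:

```python
def first_is_command(args, commands):
    """True when a known subcommand leads ``args``; False for an empty list."""
    return bool(args) and args[0] in commands


commands = {"buckets", "download"}
print(first_is_command([], commands))
print(first_is_command(["buckets"], commands))
```

Inside parse_args, the empty case can then fall through to the normal argument handling, which is what produces the usual usage message.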
Now: <renga.models.deployer.Context at 0x7f54d419eb00>
Expected: Context(id=foobar, ...)
on all models
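A shared mixin is one way to get this representation on every model. The sketch below assumes each model lists the attributes worth showing in a _repr_fields tuple; both the mixin and that attribute name are hypothetical.

```python
class ReprMixin:
    """Build ``ClassName(field=value, ...)`` from the names in ``_repr_fields``."""

    _repr_fields = ("id",)

    def __repr__(self):
        parts = ", ".join(
            f"{name}={getattr(self, name, None)!r}" for name in self._repr_fields
        )
        return f"{type(self).__name__}({parts})"


class Context(ReprMixin):
    def __init__(self, id):
        self.id = id


print(repr(Context("foobar")))  # Context(id='foobar')
```

Models with more identifying attributes would only need to override _repr_fields.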
renga init tries to post to /api/projects/ but the URL should be /api/projects