renku-project-template's Introduction

renku-project-template

A repository of base templates for new Renku projects. The next sections outline what different files in the template are used for.

For running interactive environments from Renkulab

Dockerfile - File for building a docker image that you can launch from renkulab, to work on your project in the cloud. Template-supplied contents will allow you to launch an interactive environment from renkulab, with pre-installed renku CLI and software dependencies that you put into your requirements.txt, environment.yml, or install.R. You can and should add to this Dockerfile if libraries you install require linux software installations as well; for more information see: https://github.com/SwissDataScienceCenter/renkulab-docker.

.gitlab-ci.yml - Configuration for running gitlab CI, which builds a docker image out of the project on git push to renkulab so that you can launch your interactive environment (don't remove, but you can modify to add extra CI functionality).

.dockerignore - Files and directories to be excluded from docker build (you can append to this list); https://docs.docker.com/engine/reference/builder/#dockerignore-file.

Setting the version of the renku-cli

The default version of the renku CLI used in the interactive environment is specified in the Dockerfile in a line similar to this:

ARG RENKU_VERSION={{ __renku_version__ | default("0.16.0") }}

The client creating the project (either via the UI in RenkuLab or the renku CLI) can override this default setting. The version is set as follows:

  • if the client (the renku core service or the renku CLI) is using a released version, then pass this to the project template
  • if the client is on a development version, use the default provided by the template
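The two rules above can be sketched as a small selection function. This is only an illustration: the function name, the regular expression, and the assumption that a released version is exactly X.Y.Z are hypothetical and not taken from renku itself.

```python
import re

# Assumption: a released version is exactly X.Y.Z with nothing after it.
RELEASE_PATTERN = re.compile(r"^\d+\.\d+\.\d+$")


def select_renku_version(client_version: str, template_default: str) -> str:
    """Return the renku CLI version that would end up in the Dockerfile."""
    if RELEASE_PATTERN.match(client_version):
        # Released client: pass its own version on to the template.
        return client_version
    # Development client: keep the default provided by the template.
    return template_default


# A released client overrides the default; a development client does not.
print(select_renku_version("0.16.1", "0.16.0"))
print(select_renku_version("0.16.1.dev3.g1a2b3c4", "0.16.0"))
```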

For managing software dependencies

requirements.txt - Required by template's Dockerfile; add your python pip dependencies here.

environment.yml - Required by template's Dockerfile; add your python conda dependencies here.

install.R - Required by template's Dockerfile (for r-based projects only).

For the landing page for your project

README.md - Edit this file to provide information about your own project. Initial contents explain how to use a renku project.

For renku CLI

.renku - Directory containing renku metadata that renku commands update (caution: don't update this manually).

.renkulfsignore - File similar to .gitignore for telling renku to NOT store listed files in git LFS. Use in conjunction with renku config lfs_threshold <[size]kb>, which tells renku to NOT store files below a threshold size in LFS. Initial configuration is set to 100kb.

By default, renku commands (like renku run and renku dataset) store all output files above a configurable threshold size of 100kb in git LFS to prevent accidentally committing large files to git. Committing large files (e.g. datasets, graphics, videos, audio samples) to git without LFS tracking is a problem because they slow down git commands (and thus renku commands). However, sometimes:

  • an imported dataset will come with markdown (*.md) and/or code (e.g. *.py).
  • a code file (like *.ipynb) will be generated from a renku run (e.g. with papermill).
  • generated or imported data could be small (e.g. <100kb)

Tracking files with LFS is good, but it limits your ability to use commands like git diff to view changes, and to see the contents of the files in the project's page on renkulab.

Thus, you can edit .renkulfsignore to add files with particular paths or extensions that are relevant for your project. renku commands will consult .renkulfsignore and not track those files with git LFS.
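A rough sketch of how the threshold and the ignore list interact. This is not renku's implementation: the function name is hypothetical, and fnmatch is used as a stand-in for the .gitignore-style matching that renku actually applies.

```python
from fnmatch import fnmatch

DEFAULT_THRESHOLD_BYTES = 100 * 1024  # the 100kb initial configuration


def should_track_with_lfs(path, size_bytes, ignore_patterns,
                          threshold_bytes=DEFAULT_THRESHOLD_BYTES):
    """Return True if this sketch would store the file in git LFS."""
    # Files matching a pattern from .renkulfsignore stay in plain git.
    if any(fnmatch(path, pattern) for pattern in ignore_patterns):
        return False
    # Otherwise only files above the threshold go to LFS.
    return size_bytes > threshold_bytes


# Hypothetical ignore list: keep notebooks and markdown out of LFS.
patterns = ["*.ipynb", "*.md"]
print(should_track_with_lfs("data/big.csv", 5_000_000, patterns))
print(should_track_with_lfs("notebooks/out.ipynb", 5_000_000, patterns))
```

With this logic a large notebook stays diffable in plain git, while a large CSV is still sent to LFS.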

Note: When you start a new interactive environment, by default the LFS-tracked files (e.g. files above the configured threshold AND not on this list) are in their "pointer" form. Run renku storage pull <filepath> to pull the real content into each file, or git lfs pull to replace all pointers with real content all at once. Since these are large files, you might be better off pulling them one at a time.

For organizing project files

data - Initially empty directory where renku dataset creates subdirectories for your named datasets and the files you add to those datasets (if you haven't or will not be creating renku datasets, you can remove this directory).

notebooks - Initially empty directory to help you organize jupyter notebooks (not a requirement, you can remove this directory).

For git to ignore

.gitignore - Files and directories to be excluded from git repository (this template only requires the #renku section, but the others are nice-to-haves for common paths to ignore).

renku-project-template's People

Contributors

ableuler, champost, ciyer, cramakri, dependabot[bot], emmjab, gavin-k-lee, joke1196, leafty, lorenzo-cavazzi, mana-alisafaee, olevski, pameladelgado, panaetius, rokroskar, seanrmurphy

renku-project-template's Issues

Project Environment Configuration

Description

It should be possible to store project-level configuration. Examples of this include:

  • Environment resource requirements (e.g., an interactive environment for this project requires 2 GPUs and 16G of memory)
  • The URL to connect to (e.g., jupyterlab or rstudio)

Python version out of date

With the updates to JupyterLab 3.0, the default Python version has gone to 3.9.

A few of the Docker image tags are therefore out of date:

ARG RENKU_BASE_IMAGE=renku/renkulab-py:3.8-0.8.0

ARG RENKU_BASE_IMAGE=renku/renkulab-py:3.8-0.8.0

name: Basic Python (3.8) Project

E.g. on a fresh v0.8.0 project: [screenshot not preserved]

Related to SwissDataScienceCenter/renkulab-docker#177

Add server options to the project's config file

Description

We should allow the user to specify default server_options, e.g. default_url. These runtime parameters should go in a config.ini file in the project root directory.

Examples of this include:

  • Environment resource requirements (e.g., an interactive environment for this project requires 2 GPUs and 16G of memory)
  • The URL to connect to (e.g., jupyterlab or rstudio)

New template for batch execution

For batch execution described in SwissDataScienceCenter/renku/issues/1929, we suggest creating a new template.

Compared to the original template, this one:

Such a template can be useful for batch execution with the commands proposed in SwissDataScienceCenter/renku-python/issues/2213.

Dockerfile.batch, for example:

ARG RENKU_BASE_IMAGE=renku/renkulab-batch
FROM ${RENKU_BASE_IMAGE}

# Uncomment and adapt if code is to be included in the image
# COPY src /code/src

# Uncomment and adapt if your R or python packages require extra linux (ubuntu) software
# e.g. the following installs apt-utils and vim; each pkg on its own line, all lines
# except for the last end with backslash '\' to continue the RUN line
#
# USER root
# RUN apt-get update && \
#    apt-get install -y --no-install-recommends \
#    apt-utils \
#    vim
# USER ${NB_USER}

# install the python dependencies
COPY requirements.txt /tmp/
RUN pip install -r /tmp/requirements.txt

# RENKU_VERSION determines the version of the renku CLI
# that will be used in this image. To find the latest version,
# visit https://pypi.org/project/renku/#history.
ARG RENKU_VERSION=0.15.1

########################################################
# Do not edit this section and do not add anything below

RUN if [ -n "$RENKU_VERSION" ] ; then \
    pipx uninstall renku && \
    pipx install --force renku==${RENKU_VERSION} \
    ; fi

########################################################

Extra job in .gitlab-ci.yml, for example:

batch_image_build:
  stage: build
  image: docker:stable
  before_script:
    - docker login -u gitlab-ci-token -p $CI_JOB_TOKEN http://$CI_REGISTRY
  script: |
    CI_COMMIT_SHA_7=$(echo $CI_COMMIT_SHA | cut -c1-7)
    docker build --tag $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA_7-batch -f Dockerfile.batch .
    docker push $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA_7-batch

template for courses

SwissDataScienceCenter/renku-notebooks#304 is proposing to let projects use a default image in cases like courses and workshops where no changes are expected to occur for the docker image.

In this case, we could include a template that doesn't have a Dockerfile and .gitlab-ci.yml and explains in the README.md file how to use this template.

This would probably be adapted by the course/workshop instructor to be forked by the participants.

Project Template Readme

Description

A project template should include a readme that describes the features of the template and how to change configuration options after instantiation. This readme should become the new project's readme similar to create-react-app.

pre-populate source code part of the project

A suggestion was made to create a src/python directory with the necessary things inside to make pip install src/python/<project-name> work.

This is needed in order to make it easy for users to refactor early.

Set automated_template_update and immutable_template_files according to each template's needs

An example can be seen in https://github.com/SwissDataScienceCenter/renku-project-template/pull/97/files#diff-d865b06b497efc8c56c63e5bedfc86f6da6a005cdb7d30e3702286f3d918aaa6
Set allow_template_update: true for a template if updating files from the template in a user's project doesn't break anything (only files not touched by the user get updated). Set immutable_template_files to a list of files in the template that a user should not modify (modifying them prevents them from being updated, which could lead to discrepancies when updating).

Do this for all our templates, deciding for each which values are appropriate.

Change variables in the manifest file from dictionaries to objects

Currently, variables are dictionaries where the key is the variable name and the values represent its description.
We should use an object instead so that we can specify other properties like type, default, and support enumerations.

Instead of summary: short summary added at the beginning of the readme file, we should have something like this:

  - name: summary
    description: short summary added at the beginning of the readme file
    type: string

For enumerations, we should have something like the following:

  - name: plugins
    description: list of plugins to install in the template
    type: enum
    enum: ["none", "VScode", "GIS", "VScode and GIS"]
    default: none

Initial support for strings and enumerations may be enough.
We could also support numbers and booleans for completeness and to have a better validation system, as well as a better UX in the UI.
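To illustrate why objects help with validation, here is a minimal sketch that checks a value against one of the variable objects proposed above. The function name and error messages are hypothetical, and only the string and enum types from the proposal are handled.

```python
def resolve_variable(spec, value=None):
    """Validate a value against a variable spec, falling back to its default."""
    if value is None:
        if "default" not in spec:
            raise ValueError(f"variable '{spec['name']}' needs a value")
        value = spec["default"]

    kind = spec.get("type", "string")
    if kind == "string":
        if not isinstance(value, str):
            raise ValueError(f"variable '{spec['name']}' must be a string")
    elif kind == "enum":
        if value not in spec["enum"]:
            raise ValueError(
                f"variable '{spec['name']}' must be one of {spec['enum']}"
            )
    else:
        raise ValueError(f"unsupported variable type '{kind}'")
    return value


# The two variable objects from the proposal, written as Python dicts.
summary = {
    "name": "summary",
    "description": "short summary added at the beginning of the readme file",
    "type": "string",
}
plugins = {
    "name": "plugins",
    "description": "list of plugins to install in the template",
    "type": "enum",
    "enum": ["none", "VScode", "GIS", "VScode and GIS"],
    "default": "none",
}

print(resolve_variable(plugins))          # falls back to the default
print(resolve_variable(plugins, "GIS"))   # accepted enum member
```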

.gitlab-ci should not pull base image

Description

The .gitlab-ci.yml file pulls the base image to make it possible to transparently distribute bug fixes by using the same tag to reference a corrected version of the base image, but this breaks the expectation that images not be updated beneath the user's feet.

The line that pulls the base image should be removed, and we should have a better solution for notifying the users about updates to the base image. (See SwissDataScienceCenter/renku#632)

add renkulfsignore file

renku-python is adding functionality to optionally keep certain filepaths (including wildcard file matching) out of LFS (see: SwissDataScienceCenter/renku-python#1210)

This is useful for when you have code files in a dataset, or want to diff data files that you know are small, that are output from a renku run.

We should add a template that includes *.ipynb for starters, which is currently hardcoded functionality.

Define new folder structure for templates

Define a folder structure for templates. The initial proposal is that templates are stored in git repositories. In the top level of the repository is a folder per template. E.g., python-basic, python-datascience, r, r-tidyverse, julia, etc.

Inside each folder is a file that contains template metadata (e.g., display name), and the files that need to be rendered to realize the template.

support new git sha versions of renku-python in images

renku-python has switched to using dev versions of the scheme X.Y.Z.devN.gGITSHA, with GITSHA being the sha of the git commit the version was made from. Regular release versions are still just X.Y.Z

In the acceptance tests, we already changed to checking out the commit and installing from source if the version contains a git sha.

Our images should be updated accordingly, e.g. this line.
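A small sketch of how an image build script could tell the two version forms apart. The function name is hypothetical, and accepting "+g" as well as ".g" before the sha is an assumption added for tolerance, not something stated in this issue.

```python
import re

# X.Y.Z.devN.gGITSHA as described above; plain releases are just X.Y.Z.
# Assumption: a "+" separator before "g" is tolerated as well as ".".
DEV_VERSION = re.compile(r"^(\d+\.\d+\.\d+)\.dev(\d+)[.+]g([0-9a-f]+)$")


def git_sha_from_version(version):
    """Return the git sha embedded in a dev version, or None for a release."""
    match = DEV_VERSION.match(version)
    return match.group(3) if match else None


for candidate in ("0.16.0", "0.16.0.dev12.g3f9a1c2"):
    sha = git_sha_from_version(candidate)
    if sha is None:
        print(f"{candidate}: install the released package")
    else:
        print(f"{candidate}: check out commit {sha} and install from source")
```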

templatise plug-ins/add-ins in project creation

In the context of the VNC and other targeted use-cases, it would be nice to templatise/modularise the various options, for example like: [image not preserved]

To do:

  • scope out necessary functionality used in jinja
  • see what changes in renku-python are needed, if any
  • put together a PoC

Automatically update template repo ref in renku-ui values

Changes to master in this repo could automatically open a PR in the renku-ui repo to change the corresponding line in the values.yaml file of the UI chart. This would ensure that released versions of the ui chart use the up-to-date version of that repo. What do you think @SwissDataScienceCenter/renku-ui-maintainers?

Add install.R to parallel requirements.txt but for R projects

There's no widely accepted way to include dependencies in R projects, but one way is to include an install.R file (@cchoirat suggestion) executed by R. We should include this file & lines for installing stuff from the file in the minimal R template.

file to add:

install.R

lines for docker:

COPY install.R /tmp/install.R
RUN R -f /tmp/install.R

Can an empty project be a minimal renku set-up

Would it be possible for the Empty project to be the skinniest possible version of a Renku project? What I mean is that if I'd like to use Renku for neither Python nor R (I used it for Bash), I would still expect that when I create an empty project through the Renku UI, I get a Renku project where I can use Renku commands like renku run ..., renku dataset create ... and so on. What I imagine the project would have is the most basic Renku Dockerfile, .gitlab-ci.yml, .dockerignore, .gitignore and probably the .renku folder. Perhaps the Empty project could even be named in a more descriptive way, e.g. Empty Renku project, but even if we keep the name as it is now, I believe it won't be that misleading.

Add VNC to the standard Renku Project Template Stack

Collaborators have asked for this base vnc image to be included within the usual Renku Project Stack for ease of access.

The VNC template has matured over the last few months and is now used by many of our imaging-related counterparts. It will also be used in other contexts down the line (GIS etc.). It has been extensively tested and seems by all accounts to be stable, with desirable features like a clipboard, fullscreen and renku branding.

We should integrate this into the Renku project stack.

conda build failing

The default docker build fails with

SpecNotFound: /home/jovyan/test.yml is not a valid yaml file.

originally reported by @cchoirat

create templates for different environments

We should start to enable projects with different default environments. Whether this requires a full separate template or just different templates for the Dockerfile is yet to be determined.

Store a link to the source template

Description

When a template is instantiated, the metadata should include a link to the exact template (including version) from which it came.

It would be nice to have git bash completion available out of the box

When you create a new project and start an interactive environment it would be nice to have git bash-completion pre-installed already in the image so you can use tab completion for all the git commands. I successfully installed them by adding RUN sudo apt-get install git-core bash-completion --assume-yes to my project's Dockerfile.

kernel names cannot have spaces

If a project has spaces in the name, the kernel creation fails -- need to escape spaces (or perhaps also other characters?)
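One possible way to derive a safe kernel name is sketched below. The function name is hypothetical, and the allowed character set (ASCII letters, digits, dash, dot, underscore) is an assumption based on Jupyter's kernel naming rules rather than on this template's code.

```python
import re


def kernel_name_from_project(project_name):
    """Turn a project name into a kernel name without spaces or odd characters."""
    # Replace every run of characters outside [A-Za-z0-9._-] with one dash.
    cleaned = re.sub(r"[^A-Za-z0-9._-]+", "-", project_name.strip())
    # Drop dashes left over at the ends and normalise the case.
    return cleaned.strip("-").lower()


print(kernel_name_from_project("My Cool Project"))
```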

make the renku-python version templateable

Ideally, at the time of project creation, the version of renku-python installed in a project's docker image would be the same as the version of the client that was used to create the project. This is now easily done by overriding the renku-python version in the Dockerfile. We should make this version templateable so the client that is used to initialize the project can inject the version there.

Use the default jinja filter to make sure that the template stays compatible with older versions of renku-python: https://jinja.palletsprojects.com/en/2.11.x/templates/#default

Include variables in manifest

Include template's available variables in the manifest.yaml with a short description.

E.g.

- folder: python-cuda
  variables:
    - description: a short description going in the Readme file
    - cuda-version: specify cuda version (only 9.x or 10.x)

Could the prompt in the interactive session be a bit shorter

At the moment when I open a terminal window in an Interactive Env the prompt looks like this:

jovyan@jupyter-jakub-2echrobasik1-kubas-2ddatascience-dbash-2d335595ca:/work/kubas-datascience-in-bash$ █

Could we shorten/simplify that to something as simple as
renku%> █
or
kubas-datascience-in-bash%> █ ?

Remove `image-build` tag from gitlab ci file

We currently add the image-build tag to the image-build job in the gitlab ci file. At the same time we recommend admins to tag the runners when deploying runners for Renku. We should change the default behaviour to not using tags.

Update `renku.ini`

renku.ini file in all templates contains a [renku "interactive"] section that is not supported by renku-python anymore. It should be renamed to [interactive].
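The rename could be scripted along these lines. This is a sketch using Python's configparser; the function name is hypothetical and the sample default_url entry is only illustrative. Note that configparser drops comments when writing the file back.

```python
import configparser
import io

OLD_SECTION = 'renku "interactive"'
NEW_SECTION = "interactive"


def migrate_renku_ini(text):
    """Return renku.ini contents with the old section renamed to [interactive]."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(text)

    if parser.has_section(OLD_SECTION) and not parser.has_section(NEW_SECTION):
        parser.add_section(NEW_SECTION)
        # Copy every option over, then drop the unsupported section.
        for key, value in parser.items(OLD_SECTION):
            parser.set(NEW_SECTION, key, value)
        parser.remove_section(OLD_SECTION)

    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


# Illustrative file contents; real templates may contain other options.
print(migrate_renku_ini('[renku "interactive"]\ndefault_url = /lab\n'))
```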

Reduce unnecessary docker image builds

Description

At the moment the CI/CD in the projects is configured in such a way that the docker image is built for every single commit. This is unnecessary most of the time, and it lengthens the time a user has to wait to start an environment, even for the simplest of changes. It also occupies gitlab-runner resources.

Proposal

By default, a docker image should be built only when selected files are modified: Dockerfile, requirements.txt, environment.yml.
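The decision itself is simple and could look roughly like this. The function name is hypothetical, only files at the repository root are considered, and both spellings of the conda file are accepted as an assumption.

```python
# Files whose modification should trigger an image build in this sketch.
BUILD_TRIGGER_FILES = {
    "Dockerfile",
    "requirements.txt",
    "environment.yml",
    "environment.yaml",  # assumption: accept both spellings of the conda file
}


def image_rebuild_needed(changed_paths):
    """Return True if any changed path is one of the build trigger files."""
    return any(path in BUILD_TRIGGER_FILES for path in changed_paths)


# A commit touching only notebooks and docs would reuse the existing image.
print(image_rebuild_needed(["README.md", "notebooks/analysis.ipynb"]))
print(image_rebuild_needed(["README.md", "requirements.txt"]))
```

In the project itself this check would more likely live in the CI configuration than in Python; the code only spells out the rule.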

Limit scope of Docker build context when building an image

The problem

When building a docker image, default behavior is to ship the repository, including data and all, as part of the docker context.

Desired solution

Change default behavior. Provide a .dockerignore file, build from a subdirectory, or take advantage of buildkit.

Alternatives

Point to best practices for building docker images, and leave it to users to implement them. This may not be ideal for non-experts.

Rproj file in template

Previously we had a workaround for creating the .Rproj file in the entrypoints for the R dockerfile, but we decided it should go into the template. However, until now, file names could not be templated. Now that we have SwissDataScienceCenter/renku-python#1271, we can include the templated name.

Suggestion: use pipenv by default in renku

In my view, Renku suffers from the problem that it needs, on the one hand, an exact requirements.txt file with pinned versions and, on the other hand, a user-friendly requirements.txt/solution.

This problem has been solved by pipenv, so I wonder whether it should become the default in renku. From an interface point of view, the user would no longer have to care about requirements.txt, which is a step toward a friendlier experience.

I understand there might be other considerations at play here. I just wanted to share my opinion.