cookiecutter-docker-science's Introduction

Cookiecutter Docker Science provides the following features.
  • Improves the reproducibility of results in machine learning projects with Docker
  • Generates a directory layout and file templates suited to machine learning projects
  • Lets you edit code with your favorite editor (Atom, Vim, Emacs, etc.)
  • Provides make targets useful for data analysis (Jupyter Notebook, test, lint, Docker, etc.)

NOTE: Please visit the home page before you get started.

Many researchers and engineers run machine learning or data mining experiments. For such data engineering tasks, they rely on various tools and system libraries that are constantly updated, and installing and updating them causes problems in local environments. Even when we work in hosted environments such as EC2, we are not free from this problem. Some experiments succeed on one instance but fail on another, since library versions can differ between EC2 instances.

By contrast, with a single command we can create identical Docker containers in which the needed tools are already installed at the correct versions, without changing system libraries on the host machine. This aspect of Docker is important for the reproducibility of experiments and for keeping projects running in continuous integration systems.

Unfortunately, running experiments in a Docker container is troublesome. Adding a new library to requirements.txt or the Dockerfile does not install it the way it would on a local machine: we need to recreate the Docker image and container each time. We also need to forward ports to see server responses, such as the Jupyter Notebook UI launched in the Docker container, on our local PC. Cookiecutter Docker Science provides utilities that make working in a Docker container simple.

This project is a tiny template for machine learning projects developed in Docker environments. In machine learning tasks, projects grow in their own ways to fit the target task, but in the initial state most of the directory structure and Makefile targets are common. Cookiecutter Docker Science generates an initial directory layout that fits simple machine learning tasks.

To generate a project from the cookiecutter-docker-science template, please run the following command.

$ cookiecutter git@github.com:docker-science/cookiecutter-docker-science.git

The cookiecutter command then asks several questions about the project to generate, as follows.

$ cookiecutter git@github.com:docker-science/cookiecutter-docker-science.git
project_name [project_name]: food-image-classification
project_slug [food_image_classification]:
jupyter_host_port [8888]:
description [Please Input a short description]: Classify food images into several categories
Select data_source_type:
1 - s3
2 - nfs
3 - url
data_source [Please Input data source]: s3://research-data/food-images

Then you get the generated project directory, food_image_classification (named after project_slug).
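The prompts above can also be answered on the command line. The following is a minimal sketch, assuming the cookiecutter CLI is installed and that the template accepts the same keys shown in the prompts; the wrapper function generate_project and its argument order are hypothetical and not part of the template.

```shell
# Hypothetical wrapper: generate a project without interactive prompts.
# Assumes the cookiecutter CLI is installed and the template URL is reachable.
TEMPLATE="git@github.com:docker-science/cookiecutter-docker-science.git"

generate_project() {
  # $1: project name, $2: host port for Jupyter, $3: data source location
  # --no-input skips the prompts; key=value pairs override the defaults.
  cookiecutter --no-input "$TEMPLATE" \
    project_name="$1" \
    jupyter_host_port="$2" \
    data_source_type="s3" \
    data_source="$3"
}

# Example call (not executed here):
# generate_project food-image-classification 8888 s3://research-data/food-images
```

Keys that are not passed, such as project_slug and description, fall back to the template defaults.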

The following is the initial directory structure generated in the previous section.

├── Makefile                          <- Makefile with targets for tasks such as creating the Docker container or
│                                        fetching input files.
├── config                            <- Configuration files used in scripts
│   │                                    or Jupyter Notebook.
│   └── jupyter_config.py
├── data                              <- Input resources.
├── docker                            <- Dockerfiles.
│   ├── Dockerfile                    <- Base Dockerfile with the basic settings.
│   ├── Dockerfile.dev                <- Dockerfile for experiments. Its image is derived from the base Docker image.
│   │                                    It does not copy files or directories; instead, the container mounts the top
│   │                                    directory of the project on the host.
│   └── Dockerfile.release            <- Dockerfile for production. Its image is derived from the base Docker image.
│                                        It copies the files and directories under the project top directory.
├── model                             <- Model files created in the experiments.
├── my_data_science_project           <- cookiecutter-docker-science creates a directory with the same name
│   │                                    as the project. Put the Python files used in scripts
│   │                                    or Jupyter Notebook here.
│   └── __init__.py
├── notebook                          <- ipynb files saved from Jupyter Notebook.
├── requirements.txt                  <- Libraries needed in the project. The libraries listed in this file
│                                        are installed in the Docker images for both development and production.
├── requirements_dev.txt              <- Libraries needed to run experiments. The libraries listed in this file
│                                        are installed only in the Docker image for development.
└── scripts                           <- Script files that generate model files or run evaluation.

Cookiecutter Docker Science provides many Makefile targets to support experiments in a Docker container. Users can run a target with the make [TARGET] command.

init

After cookiecutter-docker-science generates the directories and files, users run this command first. init sets up the resources for experiments. Specifically, init runs the init-docker and sync-from-source targets.

  • init-docker

    init-docker first creates the Docker image based on docker/Dockerfile.

  • sync-from-source

    sync-from-source downloads the input files specified during project generation. If you want to change the input files, please modify this target to download from the new data source.

create-container

create-container creates a Docker container based on the created image and logs in to the container.

start-container

With start-container, users can start and log in to the Docker container created by create-container.

jupyter

The jupyter target launches the Jupyter Notebook server.

profile

The profile target shows miscellaneous information about the project, such as the port number and container name.

clean

The clean target removes artifacts such as models and *.pyc files.

  • clean-model

    clean-model removes the model files in the model directory.

  • clean-pyc

    clean-pyc removes *.pyc and *.pyo files and __pycache__ directories.

  • clean-docker

    clean-docker removes the Docker image and container generated with make init-docker and make create-container. When we update Python libraries in requirements.txt or system tools in the Dockerfile, we need to remove the Docker image and container with this target, then create the updated image and container with make init-docker and make create-container.
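As an illustration of what a clean-pyc style cleanup involves, here is a minimal sketch using find. The function name clean_pyc is hypothetical, and the actual recipe in the generated Makefile may differ.

```shell
# Hypothetical helper: remove Python bytecode artifacts under a directory.
clean_pyc() {
  target_dir="${1:-.}"
  # Delete compiled bytecode files (*.pyc and *.pyo).
  find "$target_dir" -type f \( -name '*.pyc' -o -name '*.pyo' \) -delete
  # Delete __pycache__ directories; -prune keeps find from descending
  # into a directory that is about to be removed.
  find "$target_dir" -type d -name '__pycache__' -prune -exec rm -rf {} +
}

# Example call (not executed here):
# clean_pyc .
```

Source files such as *.py are left untouched, so the cleanup is safe to repeat.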

distclean

The distclean target removes all reproducible objects. Specifically, this target runs the clean target and removes all files in the data directory.

  • clean-data

    clean-data removes all datasets in the data directory.

lint

The lint target checks whether the code meets the coding standard.

test

The test target runs the tests.

sync-to-source

The sync-to-source target uploads the local files stored in data to the specified data source, such as S3 or an NFS directory.

With Cookiecutter Docker Science, data scientists and software engineers do their development in the host environment. They open Jupyter Notebook in a browser on the host machine, connecting to the Jupyter server launched in the Docker container. They also write ML scripts and library classes on the host machine. Code modifications in the host environment are reflected in the container environment. In the container, they just launch the Jupyter server or start ML scripts with the make command.

Files and directories

When you log in to a Docker container with make create-container or make start-container, the login directory is /work. This directory contains the top-level project directories from the host, such as data and model. The container mounts the host project directory at /work, so you can edit files on the host with your favorite editor, such as Vim, Emacs, Atom, or PyCharm, and the changes are reflected in the container.

Jupyter Notebook

We can run Jupyter Notebook in the Docker container. Jupyter Notebook uses its default port 8888 inside the Docker container (NOT on the host machine), and that port is forwarded to the host port you specified (JUPYTER_HOST_PORT) when running cookiecutter. You can open the Jupyter Notebook UI at "http://localhost:JUPYTER_HOST_PORT". Notebooks you save are stored in the notebook directory.
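To make the mount and the port forwarding concrete, the sketch below prints the kind of docker run command that could produce this setup. It is an assumption about how the pieces fit together, not the exact recipe in the generated Makefile; the function name and the container and image names are hypothetical.

```shell
# Host-side port; falls back to 8888 when the caller does not set it.
JUPYTER_HOST_PORT="${JUPYTER_HOST_PORT:-8888}"
# Jupyter's default port inside the container.
JUPYTER_CONTAINER_PORT=8888

# Hypothetical helper: print (not run) a docker run command that mounts the
# current directory at /work and forwards the Jupyter port.
# $1: container name, $2: image name
print_create_container_cmd() {
  echo "docker run -it -v $(pwd):/work -p ${JUPYTER_HOST_PORT}:${JUPYTER_CONTAINER_PORT} --name $1 $2"
}

print_create_container_cmd my-container my-image
```

In the -p option the host port comes first and the container port second, which is why only the first number changes when the host port is overridden.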

Generate Docker Image for production

The make init-docker command creates a Docker image based on docker/Dockerfile.dev, which contains libraries for development. These libraries are not needed in production.

To create a Docker image for production that does not contain development libraries such as Jupyter, run make init-docker with the environment variable MODE set to release: make init-docker MODE=release.

Override port number for Jupyter Notebook

When a project is generated with cookiecutter, the default host port for Jupyter Notebook is 8888. This port is commonly used and may collide with other server processes.

If we already have a container, we first need to remove it with make clean-container. We then create a new container with a different port by passing the Jupyter port parameter (JUPYTER_HOST_PORT) to make create-container. For example, the following command creates a Docker container that forwards Jupyter's default port 8888 to port 9900 on the host.

make create-container JUPYTER_HOST_PORT=9900

When you then launch Jupyter Notebook in the Docker container, you can open it at http://localhost:9900.

Specify suitable Dockerfile in stages

Some projects have multiple Dockerfiles. For example, Dockerfile.gpu contains the settings for GPU machines, and Dockerfile.cpu contains settings that can be used in production on non-GPU machines.

To use one of these Dockerfiles, override the setting by adding a parameter to the make command. For example, to create a container from docker/Dockerfile.cpu, run make create-container DOCKERFILE=docker/Dockerfile.cpu.
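The interplay between MODE and DOCKERFILE can be pictured with the following sketch. The function select_dockerfile is hypothetical, and the precedence it uses (an explicit DOCKERFILE wins over MODE) is an assumption rather than documented behavior of the generated Makefile.

```shell
# Hypothetical helper: decide which Dockerfile a build would use.
# Assumed precedence: explicit DOCKERFILE, then MODE=release, then the dev default.
select_dockerfile() {
  if [ -n "${DOCKERFILE:-}" ]; then
    echo "$DOCKERFILE"
  elif [ "${MODE:-}" = "release" ]; then
    echo "docker/Dockerfile.release"
  else
    echo "docker/Dockerfile.dev"
  fi
}

# Example calls (commented out so the block has no side effects):
# MODE=release; select_dockerfile
# DOCKERFILE=docker/Dockerfile.cpu; select_dockerfile
```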

Show target specific help

The help target prints the details of a specified target. For example, to get the details of the clean target:

$ make help TARGET=clean
target: clean
dependencies: clean-model clean-pyc clean-docker
description: remove all artifacts

As we can see, the dependencies and description of the specified target (clean) are shown.

Apache version 2.0

See CONTRIBUTING.md.

cookiecutter-docker-science's People

Contributors

funwarioisii, graph226, himkt, paralax, takahi-i, varunkashyap, yoheikikuta

cookiecutter-docker-science's Issues

Windows problems

  • pwd command is missing
  • -u option causes a problem in docker/Dockerfile

init-docker fails on macOS

Hi, thanks for the wonderful tool.
I always use cookiecutter-docker-science on Ubuntu 16.04, but make init-docker did not work when I tried to use it on macOS.

Command to reproduce:

First, create a project with the default settings.

$ cookiecutter git@github.com:docker-science/cookiecutter-docker-science.git
project_name [project_name]: test-coociecutter-docker-science
project_slug [test_coociecutter_docker_science]:
jupyter_host_port [8888]:
description [Please Input a short description]:
Select data_source_type:
1 - s3
2 - nfs
3 - url
Choose from 1, 2, 3 [1]:
data_source [Please Input data source]:
Select use_nvidia_docker:
1 - no
2 - yes
Choose from 1, 2 [1]:

Second, execute make init-docker.

$ cd test_coociecutter_docker_science
$ make init-docker
docker build -t test_coociecutter_docker_science-image -f docker/Dockerfile --build-arg UID=1663316204 .
Sending build context to Docker daemon  19.97kB
Step 1/10 : FROM ubuntu:16.04
 ---> 20c44cd7596f
Step 2/10 : RUN apt-get update && apt-get install -y   git   python3.5   python3-pip   python3-dev
 ---> Using cache
 ---> 8bb9d3f9bec9
Step 3/10 : RUN pip3 install --upgrade pip
 ---> Using cache
 ---> 5e906e1cbec5
Step 4/10 : COPY ./requirements.txt /requirements.txt
 ---> Using cache
 ---> 91d1c70b7126
Step 5/10 : RUN pip install -r /requirements.txt
 ---> Using cache
 ---> aac86f8ffef4
Step 6/10 : ARG UID
 ---> Using cache
 ---> 1b13e2d797c6
Step 7/10 : RUN useradd docker -u $UID -s /bin/bash -m
 ---> Running in 416e1cfe122a
Error processing tar file(exit status 1): write /var/log/faillog: no space left on device
make: *** [init-docker] Error 1

When building the Docker image, is it not possible to assign the macOS UID (1663316204) to a user on Ubuntu?

I deleted the following lines from the Dockerfile, and I was able to set the UID manually after creating the container.

https://github.com/docker-science/cookiecutter-docker-science/blob/f95b81e167264a46655bd9e4cc6120a1913d833c/%7B%7B%20cookiecutter.project_slug%20%7D%7D/docker/Dockerfile#L18:L20

root@08bfc3edbca1:/work# useradd docker -u 1663316204 -s /bin/bash -m
root@08bfc3edbca1:/work# su docker
docker@08bfc3edbca1:/work$ id -u
1663316204

So I added entrypoint.sh and fixed it like this.
I hope this will be useful for you.

Conditions

  • MacBook Pro (13-inch, 2017, Four Thunderbolt 3 Ports)
    • macOS Sierra 10.12.6
  • cookiecutter-docker-science: 93a3602

I use Docker for Mac.

$ docker version
Client:
 Version:           18.06.1-ce
 API version:       1.38
 Go version:        go1.10.3
 Git commit:        e68fc7a
 Built:             Tue Aug 21 17:21:31 2018
 OS/Arch:           darwin/amd64
 Experimental:      false

Server:
 Engine:
  Version:          18.06.1-ce
  API version:      1.38 (minimum version 1.12)
  Go version:       go1.10.3
  Git commit:       e68fc7a
  Built:            Tue Aug 21 17:29:02 2018
  OS/Arch:          linux/amd64
  Experimental:     true

Add python package installation directory to $PATH

I got the following warnings when building a Docker image with make init-docker.

  WARNING: The scripts f2py, f2py3 and f2py3.6 are installed in '/home/docker/.local/bin' which is not on PATH.
  Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
  WARNING: The scripts jupyter, jupyter-migrate and jupyter-troubleshoot are installed in '/home/docker/.local/bin' which is not on PATH.
  Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
  WARNING: The script jsonschema is installed in '/home/docker/.local/bin' which is not on PATH.
  Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
  WARNING: The script jupyter-trust is installed in '/home/docker/.local/bin' which is not on PATH.
  Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
  WARNING: The script pygmentize is installed in '/home/docker/.local/bin' which is not on PATH.
  Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
  WARNING: The scripts iptest, iptest3, ipython and ipython3 are installed in '/home/docker/.local/bin' which is not on PATH.
  Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
  WARNING: The scripts jupyter-kernel, jupyter-kernelspec and jupyter-run are installed in '/home/docker/.local/bin' which is not on PATH.
  Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
  WARNING: The script jupyter-nbconvert is installed in '/home/docker/.local/bin' which is not on PATH.
  Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
  WARNING: The scripts jupyter-bundlerextension, jupyter-nbextension, jupyter-notebook and jupyter-serverextension are installed in '/home/docker/.local/bin' which is not on PATH.
  Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
  WARNING: The script jupyter-console is installed in '/home/docker/.local/bin' which is not on PATH.
  Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
  WARNING: The script chardetect is installed in '/home/docker/.local/bin' which is not on PATH.
  Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.

I failed to run make jupyter in the Docker container. The problem is fixed by adding the path to the environment variable $PATH, as the warning describes.

export PATH=$PATH:/home/docker/.local/bin

I would like to run targets without running the workaround.
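One way to apply the workaround without duplicating the entry on repeated shells is sketched below. The helper name add_to_path is hypothetical; a permanent fix would more likely set PATH in the Dockerfile, which this sketch does not cover.

```shell
# Hypothetical helper: append a directory to PATH only if it is not there yet.
add_to_path() {
  case ":$PATH:" in
    *":$1:"*) ;;               # already present, nothing to do
    *) PATH="$PATH:$1" ;;      # append to the end
  esac
}

# Directory named in the pip warnings above.
add_to_path /home/docker/.local/bin
export PATH
```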

Replace CI service to GitHub Actions

I would like to replace Travis CI with GitHub Actions or CircleCI, since Travis might not provide a free plan for open source projects in the future.

Support type hinting

Add target

type-check: ## check types with mypy
	mypy -p package-name

Add dependency to requirements_dev.txt

Need to add mypy.

aws: command not found

When starting make init in a python environment without aws-cli, I got the following error:

pyenv: aws: command not found

Should awscli be in requirements.txt when using s3 data sources?

Setting file for fitting environments

When we use multiple Dockerfiles or requirements.txt files for separate purposes, we need to specify the settings through the environment variables described in #53.

But passing such variables as command line parameters is tedious, so I would like to add a basic settings file for the environment. The following is a sample of the settings file (.env).

DOCKERFILE=docker/Dockerfile.test
REQUIREMENTS=test_requirement.txt

The template generates a .env_template file rather than .env, so the settings are not loaded until users rename .env_template to .env and add their settings to the file.
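The proposed behavior (load settings only when .env exists) can be sketched in shell as follows. The function load_env is hypothetical, and a Makefile would more likely use its own include mechanism; this only illustrates the idea.

```shell
# Hypothetical helper: export the KEY=VALUE pairs from a settings file if it exists.
load_env() {
  env_file="${1:-./.env}"
  # Do nothing when the file is absent (for example, only .env_template exists).
  [ -f "$env_file" ] || return 0
  set -a            # export every variable assigned while sourcing
  . "$env_file"
  set +a
}

# Example call (not executed here):
# load_env ./.env && echo "$DOCKERFILE"
```

The default path is written as ./.env because the dot command searches PATH for names without a slash.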

why not support pipenv?

pipenv is the officially recommended Python packaging tool from Python.org.
I think it's good to support pipenv on cookiecutter-docker-science.

Add make target to run specified test cases

I would like to run only specified test cases. The following is the implementation.

export TARGET_TEST_CASE=tests.simple_application.TestApplication

run-specified-test-case: ## Run specified test case
	$(PYTHON) -m unittest $(TARGET_TEST_CASE)

We run this target with make run-specified-test-case.

Remove container only

Currently make clean-docker removes image and containers. I would like to have a command to remove only container.

Make the `make init-docker` faster

The make init-docker takes a lot of time.
This is detrimental to developer experience.
I think the make init-docker should be faster.

(Currently no idea 😇 )

Make port number for Jupyter random

Sometimes make create-container fails because the port is already occupied. I would like the port to be chosen at random from 5000 to 9999.
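A possible starting point for this request is sketched below. The function random_port is hypothetical; it only draws a number in the requested range and does not check whether the port is actually free.

```shell
# Hypothetical helper: print a random integer between 5000 and 9999 inclusive.
random_port() {
  awk -v min=5000 -v max=9999 \
    'BEGIN { srand(); print int(min + rand() * (max - min + 1)) }'
}

# Example call (not executed here):
# make create-container JUPYTER_HOST_PORT="$(random_port)"
```

Because srand() seeds from the current time, two calls within the same second may return the same port.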

Error when I run the make command

docker@8c350098e734:/work$ make
/bin/sh: 1: python: not found
Makefile:44: recipe for target 'help' failed
make: *** [help] Error 127
