This repo takes the FasterRCNN example from the CNTK samples GitHub repo and makes it run inside a Docker container, both locally and on a remote NC-series Data Science VM in Azure, complete with GPU support.
- Derive a new docker container
- Configure Conda dependencies
Some videos of this process:
- Remote Compute with AML Workbench Part A - Create a Data Science VM in Azure
- Remote Compute with AML Workbench Part B - Create a Custom Docker Base Image
- Remote Compute with AML Workbench Part C - Link a Remote GPU based VM and Run an Experiment
To get started, create a new project in Azure Machine Learning Workbench and copy the files from this repo into the new project directory.
This project contains source images (created with VoTT) and code to train a model that can identify fluffy animals.
To train this model you can use environments created from the pre-made Docker and Conda configs in this repo.
The base Docker image that is used by default (microsoft/mmlspark:plus-0.7.91) when creating new AML Workbench projects does not have all the dependencies installed that are required by CNTK 2.2 (it's actually OpenCV that is the problem). Luckily, it's possible to derive a new container from microsoft/mmlspark:plus-0.7.91 and add the required dependencies.
The supplied docker\dockerfile installs the required dependencies during the build. You can see the extra dependencies being installed in the file. Note that some are commented out in the dockerfile - these are extra dependencies listed by OpenCV that were not required for this project.
RUN apt-get update
RUN apt-get -y install libpng-dev
RUN apt-get -y install libjasper-dev
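Putting the steps above together, a minimal derived dockerfile might look like the following sketch. The package list shown is illustrative; the supplied docker\dockerfile in this repo is the authoritative version.

```dockerfile
# Derive from the default AML Workbench base image
FROM microsoft/mmlspark:plus-0.7.91

# Install the system libraries that OpenCV (needed by CNTK 2.2) depends on
RUN apt-get update
RUN apt-get -y install libpng-dev
RUN apt-get -y install libjasper-dev
```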
To build your new container, run the following command from the Docker folder.
docker build -t reponame/imagename .
Once the container is built, edit docker.compute and update the base Docker image:
baseDockerImage: "reponame/imagename"
See Running a script on a remote Docker for instructions on linking a remote Docker environment before continuing.
There is a different dockerfile and Conda file for GPU-based machines. Derive from docker_gpu\dockerfile and make sure the .compute file reflects the new container name. Also make sure you reference the conda_dependencies_gpu.yml file in the .runconfig file.
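As a sketch, the relevant entry in the .runconfig file might look like this. The aml_config path and the exact property name follow the AML Workbench run configuration conventions and may differ in your project; check your generated .runconfig for the actual key.

```yaml
# In the .runconfig file: point Conda at the GPU dependency list
CondaDependenciesFile: aml_config/conda_dependencies_gpu.yml
```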
docker build -t reponame/imagename .
Once you've built the container, you'll need to push it to Docker Hub so the remote Data Science VM can pull it.
Get an account at Docker Hub then run the following commands:
docker login
docker push reponame/imagename
Note: Your "reponame" is the name of your Docker Hub account.
Importantly, make sure you add the following line to the .compute file:
nvidiaDocker: true
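Putting the .compute settings together, the file might contain entries like the following sketch. The image name is a placeholder for whatever you pushed to Docker Hub, and other fields generated by AML Workbench are omitted.

```yaml
# .compute file for the remote GPU VM (sketch; other generated fields omitted)
baseDockerImage: "reponame/imagename"   # your image pushed to Docker Hub
nvidiaDocker: true                      # run the container with GPU support
```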
Now set up the remote environment by using the following command (where remoteenvname is the name you used when you ran the az ml computetarget attach command):
az ml experiment prepare -c remoteenvname
Once that completes, you will be able to run your experiments by calling:
az ml experiment submit -c remoteenvname
If you don't want to build and publish your own containers, you may use the premade ones:
- jakkaj/ml for non-GPU
- jakkaj/mlgpu for GPU
Run az ml experiment submit -c docker or az ml experiment submit -c remoteenvname to run the experiment.
A nice way to run experiments in different environments is to use the Visual Studio Code Tools for AI Plugin.
The experiment will save the trained model in the "output" folder. The system will download the AlexNet pre-trained model the first time it runs.