anibali / docker-pytorch
A Docker image for PyTorch
License: MIT License
I'm checking out the Dockerfile here: https://github.com/anibali/docker-pytorch/blob/master/dockerfiles/1.8.1-cuda11.1-ubuntu20.04/Dockerfile, and noticed that it uses:
nvidia/cuda:11.1.1-base-ubuntu20.04
pytorch=1.8.1=py3.8_cuda11.1_cudnn8.0.5_0
Is my understanding right, or do the CUDA resources in the base image differ from those bundled with installing torch, such that the resources aren't redundant?
I was trying to use anibali/pytorch:1.8.1-cuda11.1-ubuntu20.04 as the base image for my own Docker image, which needs google-cloud-sdk. But since google-cloud-sdk is not available for Ubuntu 20.04 yet, it does not work. Do you have a workaround for this? Maybe an Ubuntu 18 based image? Anything else?
Below is a glimpse of my Dockerfile.
ARG BASE_IMAGE=anibali/pytorch:1.8.1-cuda11.1-ubuntu20.04
FROM $BASE_IMAGE
......
RUN export CLOUD_SDK_REPO="cloud-sdk-$(lsb_release -c -s)" && \
echo "deb https://packages.cloud.google.com/apt $CLOUD_SDK_REPO main" > /etc/apt/sources.list.d/google-cloud-sdk.list && \
curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add - && \
apt-get update && \
apt-get install -y google-cloud-sdk kubectl
Hi, I am stuck with Docker, GPU and PyTorch. My image builds successfully, but on inference invocation, when using CUDA, it gives the following error:
ImportError: /root/.cache/torch_extensions/py38_cu113/fused/fused.so: cannot open shared object file: No such file or directory
I am using an nvidia/ image.
Also, I would like to ask: why do we use Miniconda with the PyTorch CUDA image?
Hi, thanks for your work on the Docker image. I am fairly new to Docker and wonder if there is a way to pip install other needed packages in this image, "matplotlib" for example. I tried running "docker......pytorch /bin/bash" and doing the pip install there. However, when I exit and come back again, the package is gone. Need some help~
I ran the command docker pull anibali/pytorch:1.4.0-cuda10.1
After connecting to the container with the command docker run -it anibali/pytorch:1.4.0-cuda10.1 sh
neither nvcc nor nvidia-smi is found.
Output from my VM's nvidia-smi:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 435.21 Driver Version: 435.21 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| 0 Tesla V100-PCIE... Off | 00000000:37:00.0 Off | 0 |
| N/A 59C P0 46W / 250W | 0MiB / 32510MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla V100-PCIE... Off | 00000000:86:00.0 Off | 0 |
| N/A 49C P0 37W / 250W | 0MiB / 32510MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
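One way to see what the container actually provides is to check from Python inside it. The sketch below is only a diagnostic aid, and the function name cuda_tool_report is hypothetical. It rests on two assumptions: that the image is built on the base flavour of nvidia/cuda, which does not ship the nvcc compiler, and that nvidia-smi is only made available when the container is started with GPU access (for example with --gpus all or --runtime=nvidia), which the docker run command above does not request.

```python
import shutil


def cuda_tool_report():
    """Report which CUDA-related tools are visible in this environment."""
    report = {
        # Full path of each executable, or None if it is not on PATH.
        "nvcc": shutil.which("nvcc"),
        "nvidia-smi": shutil.which("nvidia-smi"),
        # None means PyTorch could not be imported, so the answer is unknown.
        "torch_cuda_available": None,
    }
    try:
        import torch
    except ImportError:
        return report
    report["torch_cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for name, value in cuda_tool_report().items():
        print(f"{name}: {value}")
```

If torch_cuda_available is True while nvcc is None, PyTorch can use the GPU even though the CUDA compiler is absent; nvcc is only needed when compiling CUDA code inside the container.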
Here is my project path and the Python file I want to run:
[yao@gpu02 ~/apps/DSFD]pwd
/home/yao/apps/DSFD
[yao@gpu02 ~/apps/DSFD]ls
data eval_tools fddb_test.py layers model README.md utils widerface_val.py
demo.py face_ssd.py imgs LICENSE __pycache__ sfd WIDERFace_DSFD_RES152.pth
Then I try to run this command:
sudo docker run --rm -it --init --runtime=nvidia -v "/home/yao/apps/DSFD:/app" pytorch/pytorch:0.4.1-cuda9-cudnn7-runtime python3 demo.py
but it returns python3: can't open file 'demo.py': [Errno 2] No such file or directory
I know it's a path error, but I don't know how to fix it.
Please help me!
Thanks~
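A likely cause, offered as an assumption rather than a diagnosis: the project is mounted at /app, but the pytorch/pytorch image probably starts in a different working directory, so the relative path demo.py is looked up somewhere else. The sketch below builds a command that sets the working directory explicitly with Docker's --workdir flag. The helper name docker_run_command is hypothetical.

```python
import shlex


def docker_run_command(image, host_dir, script, mount_point="/app"):
    """Build a `docker run` argument list that runs `script` from the mounted directory."""
    return [
        "docker", "run", "--rm", "-it", "--init",
        "--runtime=nvidia",
        # Mount the project directory into the container.
        f"--volume={host_dir}:{mount_point}",
        # Start in the mount point so that relative paths such as demo.py resolve.
        f"--workdir={mount_point}",
        image,
        "python3", script,
    ]


if __name__ == "__main__":
    cmd = docker_run_command(
        "pytorch/pytorch:0.4.1-cuda9-cudnn7-runtime",
        "/home/yao/apps/DSFD",
        "demo.py",
    )
    # shlex.join quotes each argument so the line can be pasted into a shell.
    print(shlex.join(cmd))
```

An equivalent fix is to keep the original command and pass the absolute path /app/demo.py instead of demo.py.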
I have finished creating a new Docker image, but when I try to run my script I get an error, as if Docker can't find my script.
How do I actually use the Docker image to run a script?
docker run --rm -it --init \
  --gpus=all \
  --ipc=host \
  --volume="$PWD:/home/data/AI/pytroch/deeplabv3" \
  envpytorch17cu11 python3 script.py
I used that command to run my script. Thank you.
Many thanks for this repo.
I pulled the cuda9.1 image, and when I ran some DNN model, it worked fine. But when I ran nvcc command, it showed:
nvcc: command not found
I debugged this issue following here, and when I checked the /usr/local/cuda-9.1 directory, there was no bin sub-directory.
So how can I run the nvcc command in this image? Looking forward to your reply.
Hello,
I use the cuda92 Docker image.
But when I type in nvcc, it does not find cudnn.
Is the cudnn library not installed?
I have installed CUDA version 11.4.0, and I would like to know whether I can use the CUDA 11.3 Dockerfile to build my Docker container.
It's too big for me to install; my CentOS 7 shows 'no space left on device'.
When building from the Dockerfile, there is a user prompt while installing openssh-server.
Can ARG DEBIAN_FRONTEND=noninteractive be added to the Dockerfiles, after the FROM command?
The command '/bin/sh -c /home/user/miniconda/bin/conda create -y --name py36 python=3.6.9 && /home/user/miniconda/bin/conda clean -ya' returned a non-zero code: 127
When I create and enter the container, I am logged in as user. How do I switch to root?
My computer (host) has CUDA 9.0 installed, and I have also installed the NVIDIA Container Toolkit.
When I use your image anibali/pytorch:cuda-9.0, it works fine.
root@zbp-PowerEdge-T630:~# docker run -it --gpus all anibali/pytorch:cuda-9.0 /bin/bash
user@91fafcb855d7:/app$ python
Python 3.6.5 |Anaconda, Inc.| (default, Apr 29 2018, 16:14:56)
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
True
>>>
But when I use your image anibali/pytorch:cuda-10.0, it fails.
root@zbp-PowerEdge-T630:~# docker run -it --gpus all anibali/pytorch:cuda-10.0 /bin/bash
docker: Error response from daemon: OCI runtime create failed: container_linux.go:346: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: requirement error: unsatisfied condition: cuda>=10.0\\\\n\\\"\"": unknown.
ERRO[0000] error waiting for container: context canceled
In my opinion, the PyTorch Docker image should be independent of my computer's CUDA.
Is there something I missed? Thank you.
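The error message suggests that the image declares a requirement (cuda>=10.0) that is checked against the host's NVIDIA driver when the container starts: the CUDA libraries live in the image, but the driver is shared with the host. The sketch below only illustrates that kind of comparison. It is not the NVIDIA Container Toolkit's implementation, and the function names are hypothetical.

```python
import re


def parse_version(text, width=3):
    """Turn a dotted version such as '10.1' into a zero-padded tuple, e.g. (10, 1, 0)."""
    parts = [int(part) for part in text.split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def driver_satisfies(requirement, driver_cuda_version):
    """Check a requirement like 'cuda>=10.0' against the CUDA version the driver supports."""
    match = re.fullmatch(r"cuda>=(\d+(?:\.\d+)*)", requirement.strip())
    if match is None:
        raise ValueError(f"unsupported requirement: {requirement!r}")
    return parse_version(driver_cuda_version) >= parse_version(match.group(1))


if __name__ == "__main__":
    # A driver that only supports CUDA 9.0 does not meet cuda>=10.0.
    print(driver_satisfies("cuda>=10.0", "9.0"))
    # A newer driver that supports CUDA 10.1 does.
    print(driver_satisfies("cuda>=10.0", "10.1"))
```

On the host, the CUDA version shown in the header of nvidia-smi is the highest version the installed driver supports, so upgrading the host driver is usually what satisfies such a requirement.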
Hello - thank you for providing these images! I tried to derive an image from the 2.0.0 image, and get some key errors.
Dockerfile:
FROM anibali/pytorch:2.0.0-cuda11.8-ubuntu22.04
# Set up time zone.
ENV TZ=UTC
RUN sudo ln -snf /usr/share/zoneinfo/$TZ /etc/localtime
# Install system libraries required by OpenCV.
RUN sudo apt-get update \
&& sudo apt-get install -y libgl1-mesa-glx libgtk2.0-0 libsm6 libxext6 \
&& sudo rm -rf /var/lib/apt/lists/*
Errors:
Step 1/13 : FROM anibali/pytorch:2.0.0-cuda11.8-ubuntu22.04
2.0.0-cuda11.8-ubuntu22.04: Pulling from anibali/pytorch
Digest: sha256:a1459aac2f1cb30bc1e382d6949c0806a111c46a83ab67fa60694248dcd2c03f
Status: Downloaded newer image for anibali/pytorch:2.0.0-cuda11.8-ubuntu22.04
---> d8420ba982b7
Step 2/13 : ENV TZ=UTC
---> Using cache
---> 0b5861802da0
Step 3/13 : RUN sudo ln -snf /usr/share/zoneinfo/$TZ /etc/localtime
---> Using cache
---> a5a8d93bbbf1
Step 4/13 : RUN sudo apt-get update && sudo apt-get install -y libgl1-mesa-glx libgtk2.0-0 libsm6 libxext6 && sudo rm -rf /var/lib/apt/lists/*
---> Running in 0ed0b8f554fa
Get:1 http://security.ubuntu.com/ubuntu jammy-security InRelease [110 kB]
Get:2 http://archive.ubuntu.com/ubuntu jammy InRelease [270 kB]
Err:1 http://security.ubuntu.com/ubuntu jammy-security InRelease
The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 871920D1991BC93C
Get:3 http://archive.ubuntu.com/ubuntu jammy-updates InRelease [119 kB]
Get:4 http://archive.ubuntu.com/ubuntu jammy-backports InRelease [108 kB]
Err:2 http://archive.ubuntu.com/ubuntu jammy InRelease
The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 871920D1991BC93C
Err:3 http://archive.ubuntu.com/ubuntu jammy-updates InRelease
The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 871920D1991BC93C
Err:4 http://archive.ubuntu.com/ubuntu jammy-backports InRelease
The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 871920D1991BC93C
Reading package lists...
W: http://security.ubuntu.com/ubuntu/dists/jammy-security/InRelease: The key(s) in the keyring /etc/apt/trusted.gpg.d/ubuntu-keyring-2012-cdimage.gpg are ignored as the file is not readable by user '_apt' executing apt-key.
W: http://security.ubuntu.com/ubuntu/dists/jammy-security/InRelease: The key(s) in the keyring /etc/apt/trusted.gpg.d/ubuntu-keyring-2018-archive.gpg are ignored as the file is not readable by user '_apt' executing apt-key.
W: GPG error: http://security.ubuntu.com/ubuntu jammy-security InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 871920D1991BC93C
E: The repository 'http://security.ubuntu.com/ubuntu jammy-security InRelease' is not signed.
W: http://archive.ubuntu.com/ubuntu/dists/jammy/InRelease: The key(s) in the keyring /etc/apt/trusted.gpg.d/ubuntu-keyring-2012-cdimage.gpg are ignored as the file is not readable by user '_apt' executing apt-key.
W: http://archive.ubuntu.com/ubuntu/dists/jammy/InRelease: The key(s) in the keyring /etc/apt/trusted.gpg.d/ubuntu-keyring-2018-archive.gpg are ignored as the file is not readable by user '_apt' executing apt-key.
W: GPG error: http://archive.ubuntu.com/ubuntu jammy InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 871920D1991BC93C
E: The repository 'http://archive.ubuntu.com/ubuntu jammy InRelease' is not signed.
W: http://archive.ubuntu.com/ubuntu/dists/jammy-updates/InRelease: The key(s) in the keyring /etc/apt/trusted.gpg.d/ubuntu-keyring-2012-cdimage.gpg are ignored as the file is not readable by user '_apt' executing apt-key.
W: http://archive.ubuntu.com/ubuntu/dists/jammy-updates/InRelease: The key(s) in the keyring /etc/apt/trusted.gpg.d/ubuntu-keyring-2018-archive.gpg are ignored as the file is not readable by user '_apt' executing apt-key.
W: GPG error: http://archive.ubuntu.com/ubuntu jammy-updates InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 871920D1991BC93C
E: The repository 'http://archive.ubuntu.com/ubuntu jammy-updates InRelease' is not signed.
W: http://archive.ubuntu.com/ubuntu/dists/jammy-backports/InRelease: The key(s) in the keyring /etc/apt/trusted.gpg.d/ubuntu-keyring-2012-cdimage.gpg are ignored as the file is not readable by user '_apt' executing apt-key.
W: http://archive.ubuntu.com/ubuntu/dists/jammy-backports/InRelease: The key(s) in the keyring /etc/apt/trusted.gpg.d/ubuntu-keyring-2018-archive.gpg are ignored as the file is not readable by user '_apt' executing apt-key.
W: GPG error: http://archive.ubuntu.com/ubuntu jammy-backports InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 871920D1991BC93C
E: The repository 'http://archive.ubuntu.com/ubuntu jammy-backports InRelease' is not signed.
E: Problem executing scripts APT::Update::Post-Invoke 'rm -f /var/cache/apt/archives/*.deb /var/cache/apt/archives/partial/*.deb /var/cache/apt/*.bin || true'
E: Sub-process returned an error code
The command '/bin/sh -c sudo apt-get update && sudo apt-get install -y libgl1-mesa-glx libgtk2.0-0 libsm6 libxext6 && sudo rm -rf /var/lib/apt/lists/*' returned a non-zero code: 100
Attempts to install the keys manually failed with a variety of confusing errors. Is this likely a problem with images, or something wrong in my local setup?
Currently there is a dearth of PyTorch images that support the ARM architecture, and the CUDA version provided by NGC is very large.
Is it possible to build your own version for ARM?
Hi, I am facing the following error, when I try to run specified dockerfile.
ERROR: failed to solve: nvidia/cuda:11.0-base-ubuntu20.04: docker.io/nvidia/cuda:11.0-base-ubuntu20.04: not found
I am trying to use your image as a base image to build my own container
ARG BASE_IMAGE=anibali/pytorch:1.8.1-cuda11.1-ubuntu20.04
FROM $BASE_IMAGE
ARG TF_SERVING_VERSION=0.0.0
ARG NB_USER=jovyan
USER root
ENV DEBIAN_FRONTEND noninteractive
ENV NB_USER $NB_USER
ENV NB_UID 1033
ENV HOME /home/$NB_USER
ENV NB_PREFIX /
ENV PATH $HOME/.local/bin:$PATH
# Use bash instead of sh
SHELL ["/bin/bash", "-c"]
# Install system libraries required by OpenCV.
RUN sudo apt-get update \
&& sudo apt-get install -y libgl1-mesa-glx libgtk2.0-0 libsm6 libxext6 \
&& sudo rm -rf /var/lib/apt/lists/*
# Install OpenCV from PyPI.
RUN pip install opencv-python==4.5.1.48
# more code below ...
When I run docker build, I get the error below:
=> ERROR [ 2/19] RUN sudo apt-get update && sudo apt-get install -y libgl1-mesa-glx libgtk2.0-0 libsm6 libxext6 && sudo rm -rf /var/lib/apt/lists/* 9.6s
------
> [ 2/19] RUN sudo apt-get update && sudo apt-get install -y libgl1-mesa-glx libgtk2.0-0 libsm6 libxext6 && sudo rm -rf /var/lib/apt/lists/*:
#4 1.644 Get:1 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64 InRelease [1575 B]
#4 1.671 Ign:2 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu2004/x86_64 InRelease
#4 1.679 Get:3 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu2004/x86_64 Release [564 B]
#4 1.690 Get:4 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu2004/x86_64 Release.gpg [833 B]
#4 1.761 Err:1 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64 InRelease
#4 1.761 The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A4B469963BF863CC
#4 1.772 Get:5 http://archive.ubuntu.com/ubuntu focal InRelease [265 kB]
#4 1.773 Get:6 http://security.ubuntu.com/ubuntu focal-security InRelease [114 kB]
#4 1.872 Get:7 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu2004/x86_64 Packages [2445 B]
#4 2.284 Get:8 http://archive.ubuntu.com/ubuntu focal-updates InRelease [114 kB]
#4 2.298 Get:9 http://security.ubuntu.com/ubuntu focal-security/multiverse amd64 Packages [25.8 kB]
#4 2.378 Get:10 http://archive.ubuntu.com/ubuntu focal-backports InRelease [108 kB]
#4 2.391 Get:11 http://security.ubuntu.com/ubuntu focal-security/main amd64 Packages [1778 kB]
#4 2.653 Get:12 http://archive.ubuntu.com/ubuntu focal/multiverse amd64 Packages [177 kB]
#4 2.666 Get:13 http://archive.ubuntu.com/ubuntu focal/universe amd64 Packages [11.3 MB]
#4 3.462 Get:14 http://security.ubuntu.com/ubuntu focal-security/restricted amd64 Packages [1139 kB]
#4 3.970 Get:15 http://security.ubuntu.com/ubuntu focal-security/universe amd64 Packages [873 kB]
#4 4.289 Get:16 http://archive.ubuntu.com/ubuntu focal/restricted amd64 Packages [33.4 kB]
#4 4.290 Get:17 http://archive.ubuntu.com/ubuntu focal/main amd64 Packages [1275 kB]
#4 4.401 Get:18 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 Packages [2227 kB]
#4 4.805 Get:19 http://archive.ubuntu.com/ubuntu focal-updates/multiverse amd64 Packages [30.3 kB]
#4 4.809 Get:20 http://archive.ubuntu.com/ubuntu focal-updates/universe amd64 Packages [1156 kB]
#4 4.883 Get:21 http://archive.ubuntu.com/ubuntu focal-updates/restricted amd64 Packages [1247 kB]
#4 4.988 Get:22 http://archive.ubuntu.com/ubuntu focal-backports/main amd64 Packages [51.2 kB]
#4 4.989 Get:23 http://archive.ubuntu.com/ubuntu focal-backports/universe amd64 Packages [26.0 kB]
#4 5.188 Reading package lists...
#4 6.378 W: GPG error: https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64 InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A4B469963BF863CC
#4 6.378 E: The repository 'https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64 InRelease' is not signed.
------
executor failed running [/bin/bash -c sudo apt-get update && sudo apt-get install -y libgl1-mesa-glx libgtk2.0-0 libsm6 libxext6 && sudo rm -rf /var/lib/apt/lists/*]: exit code: 100
It looks like NVIDIA did a key rotation that caused this breaking change: https://forums.developer.nvidia.com/t/notice-cuda-linux-repository-key-rotation/212772
Can you please help?
Hi @anibali ,
When I try to build the docker image for CUDA 7.5, I get the following error for step 18/28:
EnvironmentLocationNotFound: Not a conda environment: /home/user/miniconda/envs/py36/envs/pytorch-py36
The command '/bin/sh -c conda install -y --name pytorch-py36 -c soumith magma-cuda75 && conda clean -ya' returned a non-zero code: 1
Do you have a pre-built Docker image stored somewhere, or on Docker Hub? That would help avoid building from scratch.
PyTorch 1.4 has been released. Would it be possible to adapt the CUDA 10 container to it?
Could you please help me get an image with PyTorch 1.3 and CUDA 10 on Ubuntu 18.04?
May I know what the file "Dockerfile.template" is used for? It seems that it is not used by any of the Dockerfiles in the other directories.
sudo apt-get install vim seems to produce an error.
Hello, dear author, thank you for your work!
As a novice, I have encountered a problem that may be relatively low-level. Please give me some advice.
When I tried to build my own image based on your image anibali/pytorch:1.10.0-nocuda, there was a problem with the Dockerfile's RUN sudo mkdir -p /superBrain_face. The error was reported as:
sudo: effective uid is not 0, is /usr/bin/sudo on a file system with the 'nosuid' option set or an NFS file system without root privileges?
The command '/bin/sh -c sudo mkdir -p /superBrain_face' returned a non-zero code: 1
When I tried to execute the command sudo su in your image, I encountered the same problem.
Can you give me some advice? Thank You Very much~
Thanks for taking the initiative and putting up this repo. Very helpful!
I have been looking for an easy way to put pytorch in a container and your approach is interesting, in specific the lines used to install the pytorch related packages. For example, for the CUDA 10 Dockerfile you run conda install for:
pytorch=1.2.0=py3.6_cuda10.0.130_cudnn7.6.2_0
I haven't found this package name in the PyTorch docs, nor is it a typical package name (the = symbol shows up twice?). Where is this package coming from? Does it install cudnn as well?
Also, I was super amazed that you start with the base build of nvidia/cuda and not the devel one, which is the common choice. Unfortunately, after everything gets installed the resulting image is around 2.8 GB, which is very similar to other PyTorch images that use the devel flavor of nvidia/cuda, so not much is saved in terms of image size. Why is that?
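On the doubled = sign: conda accepts package specifications of the form name=version=build, where the third field is the build string. The sketch below splits the specification quoted above into those three fields to show what each part encodes. The names CondaSpec and split_conda_spec are hypothetical, and the function handles only this simple form, not conda's full matching syntax.

```python
from typing import NamedTuple, Optional


class CondaSpec(NamedTuple):
    name: str
    version: Optional[str]
    build: Optional[str]


def split_conda_spec(spec):
    """Split 'name=version=build' into its fields; missing fields become None."""
    fields = spec.split("=")
    if not 1 <= len(fields) <= 3 or not fields[0]:
        raise ValueError(f"not a simple conda spec: {spec!r}")
    # Pad with None so that 'name' and 'name=version' are accepted too.
    fields += [None] * (3 - len(fields))
    return CondaSpec(*fields)


if __name__ == "__main__":
    spec = split_conda_spec("pytorch=1.2.0=py3.6_cuda10.0.130_cudnn7.6.2_0")
    print(spec.name)
    print(spec.version)
    print(spec.build)
```

Read this way, the build string names a build made for Python 3.6, CUDA 10.0.130 and cuDNN 7.6.2, which suggests (but does not by itself prove) that the package was built against those library versions.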
Hello,
I am trying to build a container from the Dockerfile for cuda-10.0. After downloading this file, I run the command "sudo docker build -t ai-api ." in the directory where the file is saved. I then get the following error: "Error response from daemon: failed to parse Dockerfile: Syntax error - can't find = in "RUN". Must be of the form: name=value". Any idea what the issue could be?
Thanks!
Thanks for sharing the script to install PyTorch with Docker. Could you also share the Dockerfile to install PyTorch 0.4.1 with CUDA 9.1 and cuDNN 7.0? Thanks
such as :pandas
I tried to run the docker command, however, I got this error:
docker: Error response from daemon: Unknown runtime specified nvidia
I am on Mac OS High Sierra, any idea?
I am new to Docker & your project has helped me understand better about how to write Dockerfiles (Thank you ^^).
You seem to be using Miniconda for installing PyTorch. But in my case, since I need more packages, installing a few more leads to a size of 5GB+. (The size of the docker-pytorch image with Miniconda was already 3.5GB.)
Is there any way to reduce the size further? It would be very helpful if the size could be brought down to <2GB (since I have a bad internet connection with limited bandwidth).
I read somewhere about multi-stage builds and was wondering if they could be used here somehow. Also, are there any other means through which I can reduce the image size?
I copied and pasted (several times) the command from the README, which is:
docker run --rm -it --init \
--runtime=nvidia \
--ipc=host \
--user="$(id -u):$(id -g)" \
--volume=$PWD:/app \
-e NVIDIA_VISIBLE_DEVICES=0 \
anibali/pytorch python3 main.py
And I get the following error:
docker: invalid reference format: repository name must be lowercase
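A possible explanation, offered as an assumption: in the command above $PWD is not quoted, so if the current directory contains a space, the shell splits the --volume argument in two and Docker treats the second half as the image name. If that half contains uppercase letters, Docker reports that the repository name must be lowercase. The sketch below uses Python's shlex module to imitate shell word splitting on a hypothetical path, /Users/me/My Project.

```python
import shlex

# Hypothetical working directory containing a space and uppercase letters.
pwd = "/Users/me/My Project"

# Unquoted, as in --volume=$PWD:/app: the argument is split at the space.
unquoted = shlex.split(
    f"docker run --volume={pwd}:/app anibali/pytorch python3 main.py"
)

# Quoted, as in --volume="$PWD:/app": the path stays in one argument.
quoted = shlex.split(
    f'docker run --volume="{pwd}:/app" anibali/pytorch python3 main.py'
)

if __name__ == "__main__":
    print(unquoted)
    print(quoted)
```

In the unquoted case the first positional argument after the options is Project:/app, not anibali/pytorch, so writing --volume="$PWD:/app" may be enough to avoid the error.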
Hi, I found that Docker images based on Ubuntu 16.04 with CUDA 10.2 are really rare. I really hope a version like this becomes available. Thanks!
I was able to fit Python 3.10, torch=2.0.0 and CUDA 11.8 into just 2.5GB.
https://github.com/michaelfeil/infinity/blob/main/libs/infinity_emb/Dockerfile
Is this approach something you would consider?
Hi,
Thanks for the script! I have one question regarding the VOLUME command. If I define a volume /home/shared in the Dockerfile, it seems like this volume becomes root access only. Do you know how to mount a volume while preserving its user access?