Comments (15)
We haven't done any work with CUDA in a Docker image yet, but I can give some pointers on where to start, and external contributions to make it work are welcome. Since TensorFlow already has working Dockerfiles with GPU support, and their files are similar to ours, those would be a good example to use.
Our Dockerfile is based on TensorFlow's Dockerfile.devel. If you compare the differences and see what had to be changed, you can apply similar changes to TensorFlow's Dockerfile.devel-gpu and we can add it here. Since they figured out all the issues with CUDA and cuDNN in their Dockerfile, hopefully basing it on their example will work here as well.
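As a rough sketch of that approach, the essential delta is the base image and making the CUDA libraries visible. This is a hypothetical fragment modeled on TensorFlow's Dockerfile.devel-gpu, not a tested Dockerfile; the image tag and paths are assumptions:

```dockerfile
# Hypothetical fragment: swap the Ubuntu base image for an NVIDIA CUDA
# devel image and expose the CUDA libraries to the dynamic linker.
FROM nvidia/cuda:7.5-cudnn4-devel
ENV LD_LIBRARY_PATH /usr/local/cuda/lib64:$LD_LIBRARY_PATH
# ...the remaining steps would follow TensorFlow Serving's Dockerfile.devel...
```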
from serving.
@kirilg I have compiled my steps to set up TF serving with CUDA on docker: https://gist.github.com/revilokeb/58f3419340652bbe73ab07903fdc9426.
However, those steps unfortunately still lead to the following error at the last step, bazel build -c opt --config=cuda tensorflow_serving/...:
ERROR: /root/.cache/bazel/_bazel_root/f8d1071c69ea316497c31e40fe01608c/external/org_tensorflow/tensorflow/core/kernels/BUILD:72:1: undeclared inclusion(s) in rule '@org_tensorflow//tensorflow/core/kernels:concat_lib_gpu':
this rule is missing dependency declarations for the following files included by 'external/org_tensorflow/tensorflow/core/kernels/concat_lib_gpu.cu.cc':
'/serving/tensorflow/third_party/eigen3/Eigen/Core'.
Any idea what could be missing?
(Compilation without CUDA works smoothly, and the mnist and inception examples run fine; I needed to manually set CROSSTOOL as in #17, otherwise that error occurred.)
Took a look and I'm still not entirely sure why this is happening. I see your Dockerfile is using a different technique to install CUDA and Bazel than what TensorFlow is doing. Perhaps installing Bazel the tested way is worth a try, since it may differ from what apt-get installs.
@damienmg do you know why this error is showing up in a Docker container using CUDA?
Thanks for your reply. I went back and installed bazel-0.2.0 as described in https://tensorflow.github.io/serving/setup, and then also bazel-0.3.0 as in the tested way, but neither affects the error above.
On the other hand I have no problem building TensorFlow with GPU support, i.e. git clone --recursive https://github.com/tensorflow/tensorflow.git followed by ./configure && bazel build -c opt --config=cuda tensorflow/tools/pip_package:build_pip_package. So the error does not seem related to the Bazel or TensorFlow side.
I have now used Craig Citro's recipe (https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tools/docker/Dockerfile.devel-gpu) to build the Docker container. I can confirm that this works perfectly for TensorFlow, but I am getting the exact same error as above when trying to build TF Serving.
I managed to build the Serving GPU Docker image successfully after some modifications to https://github.com/tensorflow/serving/blob/master/tensorflow_serving/tools/docker/Dockerfile.devel:
- Changed FROM ubuntu:14.04 to FROM nvidia/cuda:7.5
- Manually added cxx_builtin_include_directory: "/usr/local/cuda-7.5/include" in $SERVING_ROOT/tensorflow/third_party/gpus/crosstool
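For reference, that change amounts to declaring an extra include path in the compiler spec of the crosstool file, so Bazel's hermeticity check accepts headers pulled in from the CUDA install. A hedged sketch of the relevant lines (the exact surrounding context and file name depend on the TensorFlow version checked out):

```
# inside the toolchain section of the CROSSTOOL file under
# $SERVING_ROOT/tensorflow/third_party/gpus/crosstool
cxx_builtin_include_directory: "/usr/local/cuda-7.5/include"
```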
@LiberiFatali good to hear! Unfortunately I cannot confirm it from my side, as the above error keeps appearing. I have also checked on a second machine, but got the same error again.
I have tried your recipe; here is what I did:
- docker build -t tfs_cuda . (with https://github.com/tensorflow/serving/blob/master/tensorflow_serving/tools/docker/Dockerfile.devel as the Dockerfile in the current directory)
- docker run --rm -v /myhosthome:/mydockerhome -ti tfs_cuda:latest bash
- copied cuDNN 5 from /mydockerhome to /usr/local/cuda/include and /usr/local/cuda/lib64
- git clone --recurse-submodules https://github.com/tensorflow/serving
- set cxx_builtin_include_directory: "/usr/local/cuda-7.5/targets/x86_64-linux/include" (setting it to "/usr/local/cuda-7.5/include" does not work for me, i.e. gives the error "this rule is missing dependency declarations for the following files included by 'external/org_tensorflow/tensorflow/core/kernels/resize_nearest_neighbor_op_gpu.cu.cc': '/usr/local/cuda-7.5/targets/x86_64-linux/include/cuda_runtime.h'" etc.)
- cd /serving/tensorflow; ./configure; cd ..
- bazel build -c opt --config=cuda tensorflow_serving/...
This results in:
ERROR: /root/.cache/bazel/_bazel_root/f8d1071c69ea316497c31e40fe01608c/external/org_tensorflow/tensorflow/core/kernels/BUILD:585:1: undeclared inclusion(s) in rule '@org_tensorflow//tensorflow/core/kernels:transpose_functor_gpu':
this rule is missing dependency declarations for the following files included by 'external/org_tensorflow/tensorflow/core/kernels/transpose_functor_gpu.cu.cc':
'/serving/tensorflow/third_party/eigen3/Eigen/Core'.
Anything obvious that you have done differently?
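One way to compare two setups is to check what the crosstool file actually declares. This is a hypothetical helper, not part of either recipe; the function name and the two CUDA include paths are assumptions taken from the paths discussed above:

```shell
# check_crosstool: report which of the CUDA include directories discussed
# above a given crosstool file declares, to help chase "undeclared
# inclusion" errors. Paths are assumptions; adjust to your install.
check_crosstool() {
  crosstool="$1"
  for dir in /usr/local/cuda-7.5/include \
             /usr/local/cuda-7.5/targets/x86_64-linux/include; do
    if grep -q "cxx_builtin_include_directory: \"$dir\"" "$crosstool"; then
      echo "declared: $dir"
    else
      echo "missing:  $dir"
    fi
  done
}
```

Run it as, e.g., check_crosstool $SERVING_ROOT/tensorflow/third_party/gpus/crosstool/CROSSTOOL on each machine and diff the output.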
@revilokeb I use CUDA 7.5 with cudnn-7.0-linux-x64-v4.0-prod on Ubuntu 14.04.
My local TensorFlow Serving checkout is not up to date with the official GitHub repo; I searched my local repo and there is no build rule containing "tensorflow/core/kernels:transpose_functor_gpu".
@LiberiFatali many thanks! No clue yet why this strange error keeps coming back consistently on two different machines. Will keep investigating...
@LiberiFatali, how do you run the Docker image build? Using nvidia-docker?
Not with the nvidia-docker command, although I did go through the process of installing nvidia-docker.
I ran it like this:
docker build --pull -t tfserving_gpu -f tensorflow_serving/tools/docker/Dockerfile.devel-gpu .
Note that my local repo is not the latest GitHub code.
I have tried the latest code and also got the "undeclared inclusion(s) in rule" error.
Here is the version of my local repo that can be used to build Docker GPU image successfully:
@LiberiFatali many thanks for checking the latest code too, and for providing your local repo version. I can confirm that with your version above I am also able to build TF Serving on Docker with GPU.
Complete steps using nvidia-docker can be found here. Of course, using nvidia-docker is only one way of doing it.
I was then curious whether I could use that Docker image to run the inception example, in particular with the server in Docker and the client on the host (or any other IP). It is indeed working; the steps are here.
I have had a quick look at the diff between b4e9815 and the current master branch, but so far have not been able to resolve the build error above.
Great work, @revilokeb. I tried it out; apart from needing to specify LD_LIBRARY_PATH, everything worked.
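For anyone hitting the same missing-library errors, the LD_LIBRARY_PATH fix mentioned above looks roughly like this; the CUDA path is an assumption based on the default install location, so adjust it to your setup:

```shell
# Make the CUDA runtime libraries visible to the dynamic linker before
# starting the model server; /usr/local/cuda/lib64 is the usual default
# install path on these images.
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:${LD_LIBRARY_PATH}
echo "$LD_LIBRARY_PATH"
```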
Closing out this old issue. There is a GPU Docker container here now.