Comments (20)
Hi @Edwardmark!
Thank you for the extensive description of the problem. I suspect your issue might be connected to the "gpu" backend of the external_source operator in DALI. Currently, the GPU input is not yet supported - we are finishing this effort (#53). It's going to be released in tritonserver:21.06.
Should you like to verify that it's about the GPU input, please update your tritonserver to 21.04. With this version we added a missing error log in DALI Backend (#43).
from dali_backend.
@szalpal I changed the dali_det_post pipeline as follows:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import nvidia.dali as dali
import nvidia.dali.fn as fn
import nvidia.dali.types as types

pipe = dali.pipeline.Pipeline(batch_size=32, num_threads=8)
with pipe:
    nmsed_boxes = fn.external_source(device='cpu', name="NMSED_BOXES")
    scale_ratio = fn.external_source(device='cpu', name='SCALE_RATIO_INPUT')
    # Rescale BBOX
    ratio = fn.reductions.min(scale_ratio)
    nmsed_boxes /= ratio
    pipe.set_outputs(nmsed_boxes)

pipe.serialize(filename="1/model.dali")
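As a plain-NumPy cross-check (not part of the pipeline, and the values below are made up for illustration), the rescaling step above simply divides every box coordinate by the smallest element of the scale-ratio tensor:

```python
import numpy as np

# Hypothetical inputs matching the pipeline's per-sample shapes:
# NMSED_BOXES is [100, 4]; SCALE_RATIO_INPUT is [2]. One box shown here.
nmsed_boxes = np.array([[10.0, 20.0, 110.0, 220.0]], dtype=np.float32)
scale_ratio = np.array([0.5, 0.25], dtype=np.float32)

# fn.reductions.min(scale_ratio) reduces the ratio tensor to its minimum.
ratio = scale_ratio.min()  # -> 0.25

# nmsed_boxes /= ratio divides every coordinate by that minimum.
scaled_boxes = nmsed_boxes / ratio  # -> [[40., 80., 440., 880.]]
```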
But I met the same error:
I0520 02:15:41.783194 133528 ensemble_scheduler.cc:509] Internal response allocation: nmsed_classes, size 400, addr 0x7fb0844b0e00, memory type 2, type id 0
I0520 02:15:41.788463 133528 ensemble_scheduler.cc:524] Internal response release: size 4, addr 0x7fb0844b0200
I0520 02:15:41.788483 133528 ensemble_scheduler.cc:524] Internal response release: size 1600, addr 0x7fb0844b0400
I0520 02:15:41.788489 133528 ensemble_scheduler.cc:524] Internal response release: size 400, addr 0x7fb0844b0c00
I0520 02:15:41.788496 133528 ensemble_scheduler.cc:524] Internal response release: size 400, addr 0x7fb0844b0e00
I0520 02:15:41.788517 133528 infer_request.cc:502] prepared: [0x0x7fadd40015e0] request id: , model: dali_det_post, requested version: -1, actual version: 1, flags: 0x0, correlation id: 0, batch size: 1, priority: 0, timeout (us): 0
original inputs:
[0x0x7fadd40019b8] input: NMSED_BOXES, type: FP32, original shape: [1,100,4], batch + shape: [1,100,4], shape: [100,4]
[0x0x7fadd4001868] input: SCALE_RATIO_INPUT, type: FP32, original shape: [1,2], batch + shape: [1,2], shape: [2]
override inputs:
inputs:
[0x0x7fadd4001868] input: SCALE_RATIO_INPUT, type: FP32, original shape: [1,2], batch + shape: [1,2], shape: [2]
[0x0x7fadd40019b8] input: NMSED_BOXES, type: FP32, original shape: [1,100,4], batch + shape: [1,100,4], shape: [100,4]
original requested outputs:
SCALED_NMSED_BOXES_OUTPUT
requested outputs:
SCALED_NMSED_BOXES_OUTPUT
tritonclient.utils.InferenceServerException: [StatusCode.UNAVAILABLE] Socket closed
> /app/model_repository/ensemble-face_det-ucs/grpc_client.py(182)main()
In addition, my first preprocess model is defined as follows:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import nvidia.dali as dali
import nvidia.dali.fn as fn
import nvidia.dali.types as types
import argparse
import numpy as np
import os

pipe = dali.pipeline.Pipeline(batch_size=32, num_threads=8)
with pipe:
    expect_output_size = (640., 640.)
    images = fn.external_source(device='cpu', name="IMAGE_RAW")
    images = fn.image_decoder(images, device="mixed", output_type=types.RGB)
    raw_shapes = fn.shapes(images, dtype=types.INT32)
    images = fn.resize(
        images,
        mode='not_larger',
        size=expect_output_size,
    )
    resized_shapes = fn.shapes(images, dtype=types.INT32)
    ratio = fn.slice(resized_shapes / raw_shapes, 0, 2, axes=[0])
    images = fn.crop_mirror_normalize(images, mean=[0.], std=[255.], output_layout='CHW')
    images = fn.pad(images, axis_names="HW", align=expect_output_size)
    pipe.set_outputs(images, ratio)

os.system('rm -rf 1 && mkdir -p 1')
pipe.serialize(filename="1/model.dali")
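To illustrate what the resize-and-ratio part of this pipeline computes (a hedged NumPy sketch, independent of DALI; the helper name and example shape are mine): with mode='not_larger', the image is scaled by a single factor chosen so that neither side exceeds 640, and the SCALE_RATIO output is resized_shape / raw_shape over the first two dimensions:

```python
import numpy as np

def not_larger_resize_ratio(raw_hw, target=(640.0, 640.0)):
    """Sketch of a 'not_larger'-style resize: a single scale factor is
    picked so neither output side exceeds the target size."""
    h, w = raw_hw
    scale = min(target[0] / h, target[1] / w)
    resized = (round(h * scale), round(w * scale))
    # The pipeline's `ratio` output corresponds to resized / raw (H and W).
    ratio = np.array([resized[0] / h, resized[1] / w], dtype=np.float32)
    return resized, ratio

# A 720x1280 image: the longer side (1280) is scaled down to 640,
# so the scale factor is 0.5 and the output is 360x640.
resized, ratio = not_larger_resize_ratio((720, 1280))
```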
Any advice on how to make it work, please? Thanks. @szalpal
@szalpal I changed the version to 21.04 and changed all inputs to cpu, but still no error log is shown, and I get the same log as below. What is your advice? Thanks.
The output is the same as with 21.03:
I0520 02:58:13.877026 1181 plan_backend.cc:2447] Running face_det-ucs_0_gpu0 with 1 requests
I0520 02:58:13.877071 1181 plan_backend.cc:3378] Optimization profile default [0] is selected for face_det-ucs_0_gpu0
I0520 02:58:13.877337 1181 plan_backend.cc:2869] Context with profile default [0] is being executed for face_det-ucs_0_gpu0
I0520 02:58:14.543531 1181 infer_response.cc:139] add response output: output: num_detections, type: INT32, shape: [1,1]
I0520 02:58:14.543578 1181 ensemble_scheduler.cc:509] Internal response allocation: num_detections, size 4, addr 0x7f7bf04b0200, memory type 2, type id 0
I0520 02:58:14.543609 1181 infer_response.cc:139] add response output: output: nmsed_boxes, type: FP32, shape: [1,100,4]
I0520 02:58:14.543621 1181 ensemble_scheduler.cc:509] Internal response allocation: nmsed_boxes, size 1600, addr 0x7f7bf04b0400, memory type 2, type id 0
I0520 02:58:14.543642 1181 infer_response.cc:139] add response output: output: nmsed_scores, type: FP32, shape: [1,100]
I0520 02:58:14.543653 1181 ensemble_scheduler.cc:509] Internal response allocation: nmsed_scores, size 400, addr 0x7f7bf04b0c00, memory type 2, type id 0
I0520 02:58:14.543672 1181 infer_response.cc:139] add response output: output: nmsed_classes, type: FP32, shape: [1,100]
I0520 02:58:14.543683 1181 ensemble_scheduler.cc:509] Internal response allocation: nmsed_classes, size 400, addr 0x7f7bf04b0e00, memory type 2, type id 0
I0520 02:58:14.544713 1181 ensemble_scheduler.cc:524] Internal response release: size 4, addr 0x7f7bf04b0200
I0520 02:58:14.544741 1181 ensemble_scheduler.cc:524] Internal response release: size 1600, addr 0x7f7bf04b0400
I0520 02:58:14.544749 1181 ensemble_scheduler.cc:524] Internal response release: size 400, addr 0x7f7bf04b0c00
I0520 02:58:14.544764 1181 ensemble_scheduler.cc:524] Internal response release: size 400, addr 0x7f7bf04b0e00
I0520 02:58:14.544789 1181 infer_request.cc:497] prepared: [0x0x7f79300016b0] request id: , model: dali_det_post, requested version: -1, actual version: 1, flags: 0x0, correlation id: 0, batch size: 1, priority: 0, timeout (us): 0
original inputs:
[0x0x7f7930001a88] input: NMSED_BOXES, type: FP32, original shape: [1,100,4], batch + shape: [1,100,4], shape: [100,4]
[0x0x7f7930001938] input: SCALE_RATIO_INPUT, type: FP32, original shape: [1,2], batch + shape: [1,2], shape: [2]
override inputs:
inputs:
[0x0x7f7930001938] input: SCALE_RATIO_INPUT, type: FP32, original shape: [1,2], batch + shape: [1,2], shape: [2]
[0x0x7f7930001a88] input: NMSED_BOXES, type: FP32, original shape: [1,100,4], batch + shape: [1,100,4], shape: [100,4]
original requested outputs:
SCALED_NMSED_BOXES_OUTPUT
requested outputs:
SCALED_NMSED_BOXES_OUTPUT
tritonclient.utils.InferenceServerException: [StatusCode.UNAVAILABLE] Socket closed
> /app/model_repository_2104/ensemble-face_det-ucs/grpc_client.py(182)main()
It's possible that, even though you changed the ExternalSource to "cpu", the bug still prevents normal processing. Anyhow, we've just merged the GPU input feature to upstream. It's going to be released in tritonserver:21.06; however, it's very easy to run the upstream dali_backend with the latest tritonserver release.
Could you try it out and verify whether the GPU input solves your problem, or whether we need to dig deeper? The instructions on how to build the dali_backend docker image are here: Docker build
@szalpal It works, thank you very much.
@szalpal How can I build the Docker image without downloading the git repositories? I mean, if I download the related git repos beforehand, what changes should I make to the CMakeLists in dali_backend? When building the Docker image, the following errors occur, which look like a network error:
Step 12/19 : RUN mkdir build_in_ci && cd build_in_ci && cmake -D CMAKE_INSTALL_PREFIX=/opt/tritonserver -D CMAKE_BUILD_TYPE=Release -D TRITON_COMMON_REPO_TAG="r$TRITON_VERSION" -D TRITON_CORE_REPO_TAG="r$TRITON_VERSION" -D TRITON_BACKEND_REPO_TAG="r$TRITON_VERSION" .. && make -j"$(grep ^processor /proc/cpuinfo | wc -l)" install
---> Running in e11becb3e19f
-- The C compiler identification is GNU 9.3.0
-- The CXX compiler identification is GNU 9.3.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc - works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ - works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Build configuration: Release
-- RapidJSON found. Headers: /usr/include
-- RapidJSON found. Headers: /usr/include
Scanning dependencies of target repo-core-populate
[ 11%] Creating directories for 'repo-core-populate'
[ 22%] Performing download step (git clone) for 'repo-core-populate'
Cloning into 'repo-core-src'...
Switched to a new branch 'r21.05'
Branch 'r21.05' set up to track remote branch 'r21.05' from 'origin'.
[ 33%] No patch step for 'repo-core-populate'
[ 44%] Performing update step for 'repo-core-populate'
fatal: unable to access 'https://github.com/triton-inference-server/core.git/': GnuTLS recv error (-110): The TLS connection was non-properly terminated.
CMake Error at /dali/build_in_ci/_deps/repo-core-subbuild/repo-core-populate-prefix/tmp/repo-core-populate-gitupdate.cmake:55 (message):
Failed to fetch repository
'https://github.com/triton-inference-server/core.git'
make[2]: *** [CMakeFiles/repo-core-populate.dir/build.make:117: repo-core-populate-prefix/src/repo-core-populate-stamp/repo-core-populate-update] Error 1
make[1]: *** [CMakeFiles/Makefile2:96: CMakeFiles/repo-core-populate.dir/all] Error 2
make: *** [Makefile:104: all] Error 2
CMake Error at /usr/local/share/cmake-3.17/Modules/FetchContent.cmake:912 (message):
Build step for repo-core failed: 2
Call Stack (most recent call first):
/usr/local/share/cmake-3.17/Modules/FetchContent.cmake:1003 (__FetchContent_directPopulate)
/usr/local/share/cmake-3.17/Modules/FetchContent.cmake:1044 (FetchContent_Populate)
CMakeLists.txt:72 (FetchContent_MakeAvailable)
As far as I know, cloning git repos is unfortunately inherent to building backends in Triton. Is there a particular reason you would like to clone the repos beforehand? If you want to use the latest tritonserver version (21.05), I merged the PR which applies that today (#68), so you can clone the upstream dali_backend.
@szalpal Because the network is not always good, I want to clone the repos beforehand and then just use them to make the build process quicker.
I see. It would be possible to tweak the root CMakeLists.txt file in order to achieve what you want. However, it is not in our scope right now (and I doubt it ever will be), so we will not implement it; you would need to try it yourself.
IMPORTANT: this is a dirty explanation of a workaround, and we certainly do not support nor plan to support this way of building in the foreseeable future. We also highly discourage changing this building procedure for production environments.
The point is that there are three repos that need to be acquired to properly build any backend: core, common and backend. Our build procedure acquires them in these three declarations:
Lines 54 to 71 in bb9204c
Should you like to acquire them from your disk instead, first clone all three repos you need; then you can switch from fetching content from a git repository to fetching it from a disk location by changing the GIT, GIT_SHALLOW and GIT_REPOSITORY subcommands. Below is the documentation of the FetchContent functions, which might be helpful:
https://cmake.org/cmake/help/latest/module/FetchContent.html
https://cmake.org/cmake/help/latest/module/ExternalProject.html#command:externalproject_add
You should pay attention to the Directory Options of the ExternalProject_Add directive.
@szalpal Thank you very much.
@szalpal Could you please give me more hints on how to change the GIT, GIT_SHALLOW and GIT_REPOSITORY subcommands? Thanks. I changed the lines as follows:
FetchContent_Declare(
  repo-common
  SOURCE_DIR /dali/common/
)
FetchContent_Declare(
  repo-core
  SOURCE_DIR /dali/core/
)
FetchContent_Declare(
  repo-backend
  SOURCE_DIR /dali/backend/
)
FetchContent_MakeAvailable(repo-common repo-core repo-backend)
Is that right?
The directories /dali/common/, /dali/core/ and /dali/backend/ were obtained by:
git clone https://github.com/triton-inference-server/common.git
git clone https://github.com/triton-inference-server/core.git
git clone https://github.com/triton-inference-server/backend.git
I built the docker image successfully.
What is the problem you are facing?
@szalpal I just want to make sure that the way I tried is the correct way to replace the git repos with local ones.
The docker build process is OK, but when I run the server, it shows a bug:
I0617 08:17:59.462826 81 dali_backend.cc:269] Triton TRITONBACKEND API version: 1.0
I0617 08:17:59.462836 81 dali_backend.cc:273] 'dali' TRITONBACKEND API version: 1.4
Segmentation fault (core dumped)
How should I deal with that?
As I mentioned above, we do not support nor plan to support this kind of building procedure. Therefore, I unfortunately won't be able to answer all the questions, simply because I haven't tried or tested it.
The error you're facing is there because the server verifies the API version the backend has been built with. Be sure to use the proper version of the backend.git repo, which has the following defines:
#define TRITONBACKEND_API_VERSION_MAJOR 1
#define TRITONBACKEND_API_VERSION_MINOR 0
I checked out the 21.05 branch, and the problem is solved. Thank you very much, @szalpal.
@szalpal Do I have to install nvidia-dali-nightly?
https://github.com/triton-inference-server/dali_backend/blob/main/docker/Dockerfile.release#L65
Thanks.
@szalpal Thanks.
Not necessarily. We recommend using the latest DALI release.
If I use DALI 1.2, would dali_backend support GPU input?
@Edwardmark, yes. Although we don't guarantee backwards compatibility; therefore, only the latest DALI version is properly tested and maintained.