
triton-inference-server / dali_backend


The Triton backend that allows running GPU-accelerated data pre-processing pipelines implemented in DALI's Python API.

Home Page: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/index.html

License: MIT License

CMake 4.19% Python 35.28% C++ 44.43% C 0.65% Shell 15.44%
nvidia-dali dali deep-learning gpu data-preprocessing python fast-data-pipeline image-processing

dali_backend's People

Contributors

banasraf, guanluo, gwangsoohong, januszl, laochonlam, mc-nv, nv-kmcgill53, szalpal


dali_backend's Issues

Using custom DALI plugins with Triton

Hi! I'm trying to use custom DALI plugins with Triton and I've followed the guide in dali_backend/docs/examples/dali_plugin/README.md, but somehow it doesn't work. Below are the two errors encountered that really confuse me.

The first one looks like this:
(screenshot attached: 2022-07-04 10-35-14)
I've noticed the mismatch between the version of DALI I'm using (1.15) and the version of the 'dali' TRITONBACKEND API (1.10). Is this really a problem? And if so, how can I fix it?

The second one looks like this:
(screenshot attached: 2022-07-04 10-41-21)
I think it is the same problem as the one mentioned in the guide. However, I've double-checked my backend configuration and it looks fine to me. The plugin library is located at /model_repo/libcustom_operation.so, and the command I use to start the Triton server is:

sudo docker run --gpus=1 --rm -p8000:8000 -p8001:8001 -p8002:8002 -v /home/yychen/Work/triton/model_repo/:/models nvcr.io/nvidia/tritonserver:22.06-py3 tritonserver --model-repository=/models --backend-config dali,plugin_libs=/models/libcustom_operation.so

Thanks! I'd appreciate your help.
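For reference, here is a minimal sketch of how a custom plugin is typically loaded on the side where the pipeline is defined and serialized; the library path and the operator name (custom_operation) are assumptions derived from the file name above, not the actual plugin API, and custom plugins generally have to be built against the same DALI version the backend ships with:

import nvidia.dali as dali
import nvidia.dali.plugin_manager as plugin_manager

# Assumed path to the plugin built against the matching DALI version; the same
# library is what --backend-config dali,plugin_libs=... points at on the server.
plugin_manager.load_library("./libcustom_operation.so")

@dali.pipeline_def(batch_size=8, num_threads=4, device_id=0)
def pipe():
    data = dali.fn.external_source(device="cpu", name="DALI_INPUT_0")
    # Hypothetical operator name exposed by the plugin after load_library().
    return dali.fn.custom_operation(data)

pipe().serialize(filename="1/model.dali")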

Ensemble model cannot be inferenced by clients, and there is no clear error log to debug.

Description
I run an ensemble model that contains three models executed sequentially, one after another. I checked each model individually and each one is OK, and I also checked a two-model ensemble, which is OK too. But when I connect all three together and run the gRPC client, the server crashes without meaningful error logs, as follows:

Traceback (most recent call last):
  File "grpc_client.py", line 209, in <module>
    main()
  File "grpc_client.py", line 182, in main
    results = triton_client.infer(model_name=model_name, inputs=inputs, outputs=outputs)
  File "/usr/local/lib/python3.8/dist-packages/tritonclient/grpc/__init__.py", line 1086, in infer
    raise_error_grpc(rpc_error)
  File "/usr/local/lib/python3.8/dist-packages/tritonclient/grpc/__init__.py", line 61, in raise_error_grpc
    raise get_error_grpc(rpc_error) from None
tritonclient.utils.InferenceServerException: [StatusCode.UNAVAILABLE] Socket closed

Triton Information
21.03 docker container

To Reproduce
The ensemble config.pbtxt is as follows:

platform: "ensemble"
max_batch_size: 16
input [
  {
    name: "IMAGE_RAW"
    data_type: TYPE_UINT8
    dims: [ -1 ]
  }
]
output [
  {
    name: "SCALE_RATIO" # ratio
    data_type: TYPE_FP32
    dims: [2]
  },
  {
    name: "NUM_DETECTIONS"
    data_type: TYPE_INT32
    dims: [ 1 ]
  },
  {
    name: "NMSED_SCORES"
    data_type: TYPE_FP32
    dims: [ 100 ]
  },
  {
    name: "NMSED_CLASSES"
    data_type: TYPE_FP32
    dims: [ 100 ]
  },
  {
    name: "SCALED_NMSED_BOXES"
    data_type: TYPE_FP32
    dims: [ 100, 4 ]
  }
]

ensemble_scheduling {
  step [
    {
      model_name: "dali_det_pre"
      model_version: -1
      input_map {
        key: "IMAGE_RAW"
        value: "IMAGE_RAW"
      }
      output_map {
        key: "DALI_OUTPUT_0"
        value: "NORM_IMG"
      }
      output_map {
        key: "DALI_OUTPUT_1"
        value: "SCALE_RATIO"
      }
    },
    {
      model_name: "face_det-ucs"
      model_version: -1
      input_map {
        key: "images"
        value: "NORM_IMG"
      }
      output_map {
        key: "num_detections"
        value: "NUM_DETECTIONS"
      }
      output_map {
        key: "nmsed_boxes"
        value: "NMSED_BOXES"
      }
      output_map {
        key: "nmsed_scores"
        value: "NMSED_SCORES"
      }
      output_map {
        key: "nmsed_classes"
        value: "NMSED_CLASSES"
      }
    },
    {
      model_name: "dali_det_post"
      model_version: -1
      input_map {
        key: "NMSED_BOXES"
        value: "NMSED_BOXES"
      }
      input_map {
        key: "SCALE_RATIO_INPUT"
        value: "SCALE_RATIO"
      }
      output_map {
        key: "SCALED_NMSED_BOXES_OUTPUT"
        value: "SCALED_NMSED_BOXES"
      }
    }
  ]
}

The client is as follows:

FLAGS = parse_args()

    triton_client = tritonclient.grpc.InferenceServerClient(url=FLAGS.url,
                                                            verbose=FLAGS.verbose)

    model_name = FLAGS.model_name
    model_version = -1

    print("Loading images")

    image_data = load_images(FLAGS.img_dir if FLAGS.img_dir is not None else FLAGS.img,
                             max_images=FLAGS.batch_size * FLAGS.n_iter)

    image_data = array_from_list(image_data)
    inputs = generate_inputs(FLAGS.input_names, image_data.shape, "UINT8")
    outputs = generate_outputs(FLAGS.output_names)

    # Initialize the data
    inputs[0].set_data_from_numpy(image_data)
    # Test with outputs
    results = triton_client.infer(model_name=model_name, inputs=inputs, outputs=outputs)
    print(results)

Expected behavior
Results should be obtained without error, but instead the server just crashes.
@deadeyegoodwin Looking forward to your reply.

The dali_det_post model config.pbtxt is as follows:

backend: "dali"
max_batch_size: 32
input [
  {
    name: "NMSED_BOXES"
    data_type: TYPE_FP32
    dims: [ 100, 4 ]
  },
  {
    name: "SCALE_RATIO_INPUT"
    data_type: TYPE_FP32
    dims: [ 2 ]
  }
]
output [
  {
    name: "SCALED_NMSED_BOXES_OUTPUT" 
    data_type: TYPE_FP32
    dims: [100, 4]
  }
]
dynamic_batching {
  preferred_batch_size: [ 4, 8, 16, 32 ]
  max_queue_delay_microseconds: 100
}

The above pipeline is generated using the following code:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import nvidia.dali as dali
import nvidia.dali.fn as fn
import nvidia.dali.types as types

pipe = dali.pipeline.Pipeline(batch_size=32, num_threads=8)
with pipe:
    nmsed_boxes = fn.external_source(device='gpu', name="NMSED_BOXES")
    scale_ratio = fn.external_source(device='gpu', name='SCALE_RATIO_INPUT')
   
    # Rescale BBOX
    ratio = fn.reductions.min(scale_ratio)
    nmsed_boxes /= ratio
    pipe.set_outputs(nmsed_boxes)

pipe.serialize(filename="1/model.dali")

The above dali_det_post model runs correctly on its own, but connecting it to the first two models crashes the server.

Replacing the above post-processing model with the following Python backend model runs without error:

for request in requests:
    # Get INPUT0
    in_0 = pb_utils.get_input_tensor_by_name(request, "INPUT0")
    # Get INPUT1
    in_1 = pb_utils.get_input_tensor_by_name(request, "INPUT1")

    out_0 = in_0.as_numpy() / np.min(in_1.as_numpy())

    # Create output tensors. You need pb_utils.Tensor
    # objects to create pb_utils.InferenceResponse.
    out_tensor_0 = pb_utils.Tensor("OUTPUT0",
                                   out_0.astype(output0_dtype))

    # Create InferenceResponse. You can set an error here in case
    # there was a problem with handling this inference request.
    # Below is an example of how you can set errors in inference
    # response:
    #
    # pb_utils.InferenceResponse(
    #    output_tensors=..., TritonError("An error occurred"))
    inference_response = pb_utils.InferenceResponse(
        output_tensors=[out_tensor_0])
    responses.append(inference_response)

Python config.pbtxt is as follows:

name: "python_det_post"
backend: "python"

input [
  {
    name: "INPUT0"
    data_type: TYPE_FP32
    dims: [ 100, 4 ]
    
  }
]
input [
  {
    name: "INPUT1"
    data_type: TYPE_FP32
    dims: [ 2 ]
    
  }
]
output [
  {
    name: "OUTPUT0"
    data_type: TYPE_FP32
    dims: [ 100, 4 ]
  }
]

instance_group [{ kind: KIND_CPU }]

Any suggestions please? @deadeyegoodwin Thanks in advance.

With a Triton ensemble model, how can I keep the DALI internal output on the GPU?

I have an ensemble model (DALI + detection model), and I found that the internal output of DALI is fixed to TRITONSERVER_MEMORY_CPU_PINNED in the server log:

I0206 08:36:48.496881 1 infer_response.cc:165] add response output: output: DALI_OUTPUT_0, type: FP32, shape: [64,3,512,320]
I0206 08:36:48.496914 1 pinned_memory_manager.cc:131] pinned memory allocation: size 125829120, addr 0x7f085e000090
I0206 08:36:48.496918 1 ensemble_scheduler.cc:509] Internal response allocation: DALI_OUTPUT_0, size 125829120, addr 0x7f085e000090, memory type 1, type id 139652737555264
I0206 08:36:48.496923 1 infer_response.cc:165] add response output: output: DALI_OUTPUT_1, type: INT64, shape: [64,3]
I0206 08:36:48.496926 1 pinned_memory_manager.cc:131] pinned memory allocation: size 1536, addr 0x7f08658000a0
I0206 08:36:48.496928 1 ensemble_scheduler.cc:509] Internal response allocation: DALI_OUTPUT_1, size 1536, addr 0x7f08658000a0, memory type 1, type id 139652737552464

After I change the memtype to TRITONSERVER_MEMORY_GPU in the snippet below,

output_shape.data(), output_shape.size()));
void* buffer;
TRITONSERVER_MemoryType memtype = TRITONSERVER_MEMORY_CPU_PINNED;
int64_t memid = 0;
auto buffer_byte_size = std::accumulate(
output_shape.begin(), output_shape.end(), 1,

the DALI output image shape became GPU, but the image data is still TRITONSERVER_MEMORY_CPU_PINNED:

I0207 07:39:24.490871 1 infer_response.cc:165] add response output: output: DALI_OUTPUT_0, type: FP32, shape: [64,3,512,320]
I0207 07:39:24.490905 1 pinned_memory_manager.cc:131] pinned memory allocation: size 125829120, addr 0x7f47fc000090
I0207 07:39:24.490909 1 ensemble_scheduler.cc:509] Internal response allocation: DALI_OUTPUT_0, size 125829120, addr 0x7f47fc000090, memory type 1, type id 0
I0207 07:39:24.490913 1 infer_response.cc:165] add response output: output: DALI_OUTPUT_1, type: INT64, shape: [64,3]
I0207 07:39:24.490917 1 ensemble_scheduler.cc:509] Internal response allocation: DALI_OUTPUT_1, size 1536, addr 0x7f47f8000000, memory type 2, type id 0
I0207 07:39:24.490986 1 ensemble_scheduler.cc:524] Internal response release: size 125829120, addr 0x7f47fc000090

I want to know whether it is possible to keep the internal outputs entirely on the GPU, and what the right way to configure this is. I think it could improve the throughput of the ensemble model.

Cannot build dali backend docker

I tried to build the DALI backend Docker image on both the 20.12 and master branches, but I am stuck at the CMake step.

-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE  
-- DALI includes dir: /usr/local/lib/python3.6/dist-packages/nvidia/dali/include
-- DALI libs dir: /usr/local/lib/python3.6/dist-packages/nvidia/dali
-- DALI libs: dalidali_kernelsdali_operators
CMake Error at CMakeLists.txt:81 (add_subdirectory):
  The source directory

    /dali/extern/Catch2

  does not contain a CMakeLists.txt file.


-- Configuring incomplete, errors occurred!
See also "/dali/build_in_ci/CMakeFiles/CMakeOutput.log".
See also "/dali/build_in_ci/CMakeFiles/CMakeError.log".
The command '/bin/sh -c mkdir build_in_ci && cd build_in_ci &&     cmake                                                -D CMAKE_BUILD_TYPE=Release                        -D TRITON_DALI_SKIP_DOWNLOAD=ON ..                 -D CMAKE_INSTALL_PREFIX=/opt/tritonserver &&     make -j"$(grep ^processor /proc/cpuinfo | wc -l)" install' returned a non-zero code: 1

Not able to build Triton-server with dali backend

Hi Team,

I am trying to build a Docker container of Triton server with DALI included.

I downloaded the archive "dali_backend-main.zip", extracted it, and then followed the steps below:
$ cd dali_backend-main/
$ docker build -f docker/Dockerfile.release -t tritonserver:dali-latest .
But I am facing the issue below; I am attaching the log segment where the issue started.
[..]
Setting up packagekit (1.1.13-2ubuntu1.1) ...
invoke-rc.d: could not determine current runlevel
invoke-rc.d: policy-rc.d denied execution of force-reload.
Failed to open connection to "system" message bus: Failed to connect to socket /var/run/dbus/system_bus_socket: No such file or directory
Created symlink /etc/systemd/user/sockets.target.wants/pk-debconf-helper.socket → /usr/lib/systemd/user/pk-debconf-helper.socket.
Setting up packagekit-tools (1.1.13-2ubuntu1.1) ...
Setting up software-properties-common (0.98.9.5) ...
Processing triggers for libc-bin (2.31-0ubuntu9.2) ...
Processing triggers for dbus (1.12.16-2ubuntu2.1) ...
Cannot add PPA: 'ppa:~deadsnakes/ubuntu/ppa'.
ERROR: '~deadsnakes' user or team does not exist.
The command '/bin/sh -c apt-get update && apt-get install -y software-properties-common && add-apt-repository ppa:deadsnakes/ppa && apt-get update && apt-get install -y zip wget build-essential autoconf autogen unzip python3.8 python3-pip libboost-all-dev rapidjson-dev && rm -rf /var/lib/apt/lists/*' returned a non-zero code: 1
[..]
The whole log is in the attached file:
triton build fail log.txt

Please suggest what is going wrong.

Failed to open serialized model file: models/dali/1/model.dali

Hello!

I'm trying to use the autoserialization feature, taking the pipeline definition and config from this example, but it doesn't work. Triton tries to load the model, but then fails:

I0714 12:40:59.798507 1 dali_model.h:92] Loading DALI pipeline from file models/dali/1/model.dali
E0714 12:40:59.798660 1 dali_model.cc:32] Failed to open serialized model file: models/dali/1/model.dali
I0714 12:40:59.798695 1 dali_backend.cc:169] TRITONBACKEND_ModelFinalize: delete model state
E0714 12:40:59.798719 1 model_repository_manager.cc:1348] failed to load 'dali' version 1: Unknown: DALI Backend error: Failed to open serialized model file: models/dali/1/model.dali

Triton image I use: nvcr.io/nvidia/tritonserver:22.06-py3

Segfault when max_batch_size > 1

Hi everybody

I am facing issues when enabling the dynamic scheduler with a max_batch_size bigger than 1, which gives me a segfault when submitting requests. The main README says that DALI requires homogeneous batch sizes. How would I achieve that when using the Triton C API directly? In the tests introduced with the PR enabling dynamic batching, I can't find anything enforcing homogeneous batch sizes. Am I missing something?

We are using the C API of the triton r21.06 release with a dali pipeline which is created with a batch size of 64 and then set the max_batch_size in the triton config.pbtxt file to 32 for all elements of the ensemble model.

Return String Value Issue

I have a DALI model which gets a size-16 integer array, and I want to return the string value "Positive" or "Negative" depending on the sum of the values in this array. So far I'm getting errors and I'm not sure it is even possible with a DALI model. What may be the issue with the pipeline below?
Thanks,

Dali pipeline

@dali.pipeline_def(batch_size=8, num_threads=min(mp.cpu_count(), 4), device_id=0)
def pipe():
    input0 = dali.fn.external_source(device="cpu", name="DALI_INPUT_0")
    sum = dali.fn.reductions.sum(input0)

    if (sum >= 0):
         string_result = "Positive"
    else:
         string_result = "Negative"

    string_result_e = np.char.encode(string_result, encoding='utf-8')

    output0 = dali.fn.external_source(source=string_result_e)

    return  output0;

Model server configuration:

name: "dali_simple_test"
backend: "dali"
max_batch_size: 8
input [
    {
        name: "DALI_INPUT_0"
        data_type: TYPE_INT8
        dims: [ -1 ]
    }
]
output [
    {
        name: "DALI_OUTPUT_0"
        data_type: TYPE_UINT8
        dims: [ -1 ]
    }
]

Error message I got:

Operator node with name DALI_INPUT_0 not found.

How to send image binary to dali in triton?

Hi, I am trying to send data through the Triton SDK as BYTES data, however I get a 400 error with the following message:

[/opt/dali/dali/operators/decoder/nvjpeg/nvjpeg_decoder_decoupled_api.h:546] [/opt/dali/dali/image/image_factory.cc:89] Assert on "CheckIsPNG(encoded_image, length) + CheckIsBMP(encoded_image, length) + CheckIsGIF(encoded_image, length) + CheckIsJPEG(encoded_image, length) + CheckIsTiff(encoded_image, length) + CheckIsPNM(encoded_image, length) + CheckIsJPEG2k(encoded_image, length) == 1" failed: Encoded image has ambiguous format

It seems like the image format was not recognized correctly. Would you mind pointing out what's wrong?
The following is the code I used to test. I changed the type to TYPE_STRING so that BYTES data could be sent through the Triton SDK.

import nvidia.dali as dali

pipe = dali.pipeline.Pipeline(batch_size=256, num_threads=4, device_id=0)
with pipe:
    images = dali.fn.external_source(device="cpu", name="DALI_INPUT_0")
    images = dali.fn.image_decoder(images, device="mixed")
    images = dali.fn.resize(images, resize_x=224, resize_y=224)
    pipe.set_outputs(images)

pipe.serialize(filename="/models/dali/1/model.dali")

triton model repository

models
├── dali
│   ├── 1
│   │   └── model.dali
│   └── config.pbtxt

config.pbtxt

name: "dali"
backend: "dali"
max_batch_size: 256
input [
    {
     name: "DALI_INPUT_0"
     data_type: TYPE_STRING
     dims: [ -1 ]
    }
]

output [
    {
     name: "DALI_OUTPUT_0"
     data_type: TYPE_FP32
     dims: [ 224, 224, 3 ]
    }
]

predict.py

import tritonclient.http as httpclient
import numpy as np


if __name__ == "__main__":
    img_bytes = open("sample.jpg", "rb").read()

    img_data = np.array([img_bytes], dtype=bytes)
    img_data = np.stack([img_data], axis=0)

    infer_input = httpclient.InferInput("DALI_INPUT_0", img_data.shape, "BYTES")
    infer_input.set_data_from_numpy(img_data, binary_data=True)
    triton_client = httpclient.InferenceServerClient(url="localhost:8000", verbose=True)

    result = triton_client.infer("dali", [infer_input])
    print(result)

sample.jpg
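For comparison, here is a sketch of an alternative client-side encoding (the one I believe the dali_backend image-decoder examples use): the raw encoded file is sent as a 1-D UINT8 array. This assumes the model's config.pbtxt declares DALI_INPUT_0 as TYPE_UINT8 with dims: [ -1 ] instead of TYPE_STRING:

import numpy as np
import tritonclient.http as httpclient

# Raw, undecoded JPEG bytes as a 1-D uint8 array.
encoded = np.fromfile("sample.jpg", dtype=np.uint8)
batch = np.expand_dims(encoded, axis=0)  # add the batch dimension -> shape (1, N)

infer_input = httpclient.InferInput("DALI_INPUT_0", batch.shape, "UINT8")
infer_input.set_data_from_numpy(batch, binary_data=True)

triton_client = httpclient.InferenceServerClient(url="localhost:8000")
result = triton_client.infer("dali", [infer_input])
print(result.as_numpy("DALI_OUTPUT_0").shape)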

DALI Pipeline with GPU input broken with latest version.

Hello,

We are using multiple DALI backend steps in an ensemble pipeline.

We are using the master version of dali_backend, compiled from source inside a Docker image of tritonserver 21.02.

There is a DALI pipeline in the middle of the ensemble - that receives GPU input.

The pipeline will segfault on any input.

I've installed LLDB and compiled a debug instance of dali_backend.

The issue stems from the addition of #16 .

(lldb) bt
* thread #15, name = 'tritonserver', stop reason = signal SIGSEGV: address access protected (fault address: 0x7ffe56096000)
    frame #0: 0x00007ffff69e96c3 libc.so.6`___lldb_unnamed_symbol1092$$libc.so.6 + 83
  * frame #1: 0x00007fff88368d15 libtriton_dali.so`void triton::backend::dali::detail::copy<triton::backend::dali::StorageCpu>(dst=0x00007ffa90003580, src=0x00007ffe56096000, count=2764800) at io_descriptor.h:78:9
    frame #2: 0x00007fff8836558e libtriton_dali.so`triton::backend::dali::IODescriptor<char, triton::backend::dali::StorageCpu, true>::append(this=0x00007ffa90003500, buffer="", size=2764800) at io_descriptor.h:160:33
    frame #3: 0x00007fff88357049 libtriton_dali.so`triton::backend::dali::detail::GenerateInputs(request=0x00007ffb640038d0) at dali_backend.h:148:24
    frame #4: 0x00007fff88357ac4 libtriton_dali.so`triton::backend::dali::detail::ProcessRequest(response=0x00007ffa90002fb0, request=0x00007ffb640038d0, executor=0x00007ffdf0006d90) at dali_backend.h:206:44
    frame #5: 0x00007fff8835b483 libtriton_dali.so`triton::backend::dali::TRITONBACKEND_ModelInstanceExecute(instance=0x00007ffdf0010340, reqs=0x00007ffa90000f30, request_count=1) at dali_backend.cc:423:71

Since GenerateInputs is templated over IoDescr<true>, it is only ever implemented for StorageBackend=Cpu.

The code seems to assume a CPU input at https://github.com/triton-inference-server/dali_backend/blob/main/src/dali_backend.h#L129, but this is overridden by Triton at https://github.com/triton-inference-server/dali_backend/blob/main/src/dali_backend.h#L133, possibly with an input whose device_type == GPU.

However, when a GPU input is present, because of the templating, memcpy will be called with a GPU pointer and crash the pipeline.

Since this information is only provided to dali_backend at runtime, we need some runtime checks to correctly choose between GPU or CPU storages, instead of templating.

Thanks!

Insufficient CUDA driver version error on CPU-only system

Hi! I am using dali_backend as an image pre-processing model in my pipeline. It works as expected when deployed on a system with GPUs, but when I launch the server on a CPU-only system I get the following error: dali_backend.cc:438 | CUDA runtime API error cudaErrorInsufficientDriver (35).

Here is my pipeline:

def export_pipeline(
    input_name='DALI_INPUT_0',
    image_dim=[800, 800],
    batch_size=1,
    num_threads=1,
    device='cpu',
    padding=True,
    device_id=None,
    output_dir='./out/',
):

    pipeline = dali.pipeline.Pipeline(
        batch_size=batch_size,
        num_threads=num_threads,
        device_id=device_id
    )

    with pipeline:
        data = fn.external_source(name=input_name, device=device)

        images = fn.image_decoder(data, device=device)
        images = fn.resize(images, size=image_dim, mode="not_larger", max_size=image_dim)
        images = fn.pad(images, fill_value=0, shape=[image_dim[0], image_dim[1], 1])

        input_shape = fn.shapes(images)
        input_shape = fn.cast(input_shape, dtype=dali.types.FLOAT)

        images = fn.transpose(images, perm=[2, 0, 1])
        images = fn.cast(images, dtype=dali.types.FLOAT)

        shapes = fn.peek_image_shape(data)
        shapes = fn.cast(shapes, dtype=dali.types.FLOAT)

        out = [
            images,
            input_shape,
            shapes
        ]
        pipeline.set_outputs(*out)

Thank you!

dynamic batching for dali_backend

Hi guys, is there dynamic batching for the DALI backend? Currently, I can only improve throughput by adding more model instances. I'm using dali_backend for ResNet-50 preprocessing (e.g. image decoding, resizing, and normalization).

Encoded image has ambiguous format

I have some trouble when a request goes from the Triton client to a Triton server that has a DALI model. I made the DALI model like the pipeline you recommend:
Error in thread 8: [/opt/dali/dali/operators/decoder/nvjpeg/nvjpeg_decoder_decoupled_api.h:565] [/opt/dali/dali/image/image_factory.cc:89] Assert on "CheckIsPNG(encoded_image, length) + CheckIsBMP(encoded_image, length) + CheckIsGIF(encoded_image, length) + CheckIsJPEG(encoded_image, length) + CheckIsTiff(encoded_image, length) + CheckIsPNM(encoded_image, length) + CheckIsJPEG2k(encoded_image, length) == 1" failed: Encoded image has ambiguous format

How to provide mean & stddev to dali.fn.normalize

Hi all, I am super new to the DALI backend, but the benchmarks I recently ran are incredible. I am trying to preprocess data with the DALI backend, but I'm struggling with the "normalize" operation. I would like to specify a "per channel" mean & stddev ([c1, c2, c3] for example), but I keep getting errors.

Here is what I've tried so far:

...
mean=[0.485, 0.456, 0.406],
stddev=[0.229, 0.224, 0.225]
...

...
mean=(0.485, 0.456, 0.406),
stddev=(0.229, 0.224, 0.225)
...

...
mean=dali.types.Constant([0.485, 0.456, 0.406]),
stddev=dali.types.Constant([0.229, 0.224, 0.225])
...

Here is the full code of my pipeline:

import nvidia.dali as dali
from nvidia.dali.plugin.triton import autoserialize


@autoserialize
@dali.pipeline_def(batch_size=8, num_threads=8, device_id=0)
def pipe():

    # Getting the images.
    images = dali.fn.external_source(device="gpu", name="input_0")

    # Resizing.
    images = dali.fn.resize(images, resize_x=224, resize_y=224)

    # Permutting NHWC to NCHW.
    images = dali.fn.transpose(images, perm=[2, 0, 1])

    # Rescaling 0-255 to 0-1.
    images = images / 255.

    # Normalizing (images - mean) / std.
    images = dali.fn.normalize(
        images,
        batch=False,
        mean=dali.types.Constant([0.485, 0.456, 0.406]),
        stddev=dali.types.Constant([0.229, 0.224, 0.225])
    )

    return images

Many thanks !!
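For comparison, a later issue in this list folds the per-channel mean/std into dali.fn.crop_mirror_normalize, which accepts per-channel lists directly. Below is a minimal sketch adapted to the pipeline above (values from the question, scaled to the 0-255 input range); treat it as a sketch of that alternative, not a confirmed fix for fn.normalize:

import nvidia.dali as dali
import nvidia.dali.types as types
from nvidia.dali.plugin.triton import autoserialize


@autoserialize
@dali.pipeline_def(batch_size=8, num_threads=8, device_id=0)
def pipe():
    images = dali.fn.external_source(device="gpu", name="input_0")
    images = dali.fn.resize(images, resize_x=224, resize_y=224)
    # One operator handles the HWC->CHW transpose, the 0-255 rescale (via
    # mean/std expressed in the 0-255 range) and per-channel normalization.
    return dali.fn.crop_mirror_normalize(
        images,
        dtype=types.FLOAT,
        output_layout="CHW",
        mean=[0.485 * 255, 0.456 * 255, 0.406 * 255],
        std=[0.229 * 255, 0.224 * 255, 0.225 * 255])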

Error when executing Mixed operator decoders__Image when sending image binary to dali in triton

Hello, I'm trying to send an image in binary format to the Triton server with DALI preprocessing.
I'm trying to send JPEG or PNG images as bytes, but I'm getting an error: Unrecognized image format. Supported formats are: JPEG, PNG, BMP, TIFF, PNM, JPEG2000 and WebP.
This is how the DALI pipeline looks:

@dali.pipeline_def(batch_size=64, num_threads=4, device_id=0)
def pipe():
    images = dali.fn.external_source(device="cpu", name="DALI_INPUT_0")
    images = dali.fn.decoders.image(images, device="mixed", output_type=types.RGB)
    images = dali.fn.resize(images, resize_x=224, resize_y=224, device='gpu')
    return dali.fn.crop_mirror_normalize(images,
                                           dtype=types.FLOAT16,
                                           output_layout="CHW",
                                           device='gpu',
                                           mean=[0.485 * 255, 0.456 * 255, 0.406 * 255],
                                           std=[0.229 * 255, 0.224 * 255, 0.225 * 255])

Model repository for triton server:

model_repository_dali_back/
├── dali
│   ├── 1
│   │   └── model.dali
│   └── config.pbtxt
├── ensemble_dali_vit
│   ├── 1
│   └── config.pbtxt
└── vit_base_trt
    ├── 1
    │   └── model.plan
    └── config.pbtxt

Dali config.pbtxt:

name: "dali"
backend: "dali"
max_batch_size: 64
input [
{
    name: "DALI_INPUT_0"
    data_type: TYPE_STRING
    dims: [ -1 ]
}
]
output [
{
    name: "DALI_OUTPUT_0"
    data_type: TYPE_FP16
    dims: [ 3, 224, 224 ]
}
]
parameters: [
  {
    key: "num_threads"
    value: { string_value: "4" }
  }
]
dynamic_batching {}

Script to send request to triton:

import numpy as np
import tritonclient.http as httpclient
from tritonclient.utils import triton_to_np_dtype

# Setup a connection with the Triton Inference Server. 
triton_client = httpclient.InferenceServerClient(url="localhost:8000")

input_name = "INPUT"
output_name = "OUTPUT"
model_name = "ensemble_dali_vit"
## READ IMAGE
img_bytes = open('test_img.png', "rb").read() ## Also try with .jpeg image
img_data = np.array([img_bytes], dtype=bytes)
transformed_img = np.stack([img_data], axis=0)

# Specify the names of the input and output layer(s) of our model.
test_input = httpclient.InferInput(input_name, transformed_img.shape, datatype="BYTES")
test_input.set_data_from_numpy(transformed_img, binary_data=True)
test_output = httpclient.InferRequestedOutput(output_name, binary_data=True)
# Querying the server
results = triton_client.infer(model_name=model_name, inputs=[test_input], outputs=[test_output])
test_output = results.as_numpy(output_name)
print(test_output)
The error log:
Traceback (most recent call last):
  File "/home/proevgenii1/tensorrt/mock_test_triron.py", line 22, in <module>
    results = triton_client.infer(model_name=model_name, inputs=[test_input], outputs=[test_output])
  File "/root/anaconda3/envs/tensorrt/lib/python3.9/site-packages/tritonclient/http/__init__.py", line 1490, in infer
    _raise_if_error(response)
  File "/root/anaconda3/envs/tensorrt/lib/python3.9/site-packages/tritonclient/http/__init__.py", line 65, in _raise_if_error
    raise error
tritonclient.utils.InferenceServerException: in ensemble 'ensemble_dali_vit', Runtime error: Critical error in pipeline:
Error when executing Mixed operator decoders__Image encountered:
Error in thread 1: [/opt/dali/dali/operators/decoder/nvjpeg/nvjpeg_decoder_decoupled_api.h:615] [/opt/dali/dali/image/image_factory.cc:102] Unrecognized image format. Supported formats are: JPEG, PNG, BMP, TIFF, PNM, JPEG2000 and WebP.
Stacktrace (7 entries):
[frame 0]: /opt/tritonserver/backends/dali/dali/libdali.so(+0x85422) [0x7fdb5ee1e422]
[frame 1]: /opt/tritonserver/backends/dali/dali/libdali.so(dali::ImageFactory::CreateImage(unsigned char const*, unsigned long, dali::DALIImageType)+0x204) [0x7fdb5ef2adf4]
[frame 2]: /opt/tritonserver/backends/dali/dali/libdali_operators.so(+0x96c9d4) [0x7fdb519bc9d4]
[frame 3]: /opt/tritonserver/backends/dali/dali/libdali.so(dali::ThreadPool::ThreadMain(int, int, bool, std::string const&)+0x1d0) [0x7fdb5ef01430]
[frame 4]: /opt/tritonserver/backends/dali/dali/libdali.so(+0x7470bf) [0x7fdb5f4e00bf]
[frame 5]: /usr/lib/x86_64-linux-gnu/libpthread.so.0(+0x8609) [0x7fdc20e87609]
[frame 6]: /usr/lib/x86_64-linux-gnu/libc.so.6(clone+0x43) [0x7fdc1f7fa133]
. File: 
Stacktrace (6 entries):
[frame 0]: /opt/tritonserver/backends/dali/dali/libdali_operators.so(+0x595282) [0x7fdb515e5282]
[frame 1]: /opt/tritonserver/backends/dali/dali/libdali_operators.so(+0x96d53d) [0x7fdb519bd53d]
[frame 2]: /opt/tritonserver/backends/dali/dali/libdali.so(dali::ThreadPool::ThreadMain(int, int, bool, std::string const&)+0x1d0) [0x7fdb5ef01430]
[frame 3]: /opt/tritonserver/backends/dali/dali/libdali.so(+0x7470bf) [0x7fdb5f4e00bf]
[frame 4]: /usr/lib/x86_64-linux-gnu/libpthread.so.0(+0x8609) [0x7fdc20e87609]
[frame 5]: /usr/lib/x86_64-linux-gnu/libc.so.6(clone+0x43) [0x7fdc1f7fa133]

Current pipeline object is no longer valid.

It looks like the image format was not recognized correctly. Or am I doing something wrong?

Jetson Jetpack Support

Hi! I would like to ask whether there is a plan for the DALI backend to support aarch64 architectures (i.e. Jetson devices, etc.). Thanks!

Multi-GPU configuration with DALI: Signal (11) on Triton 21.12 during ensemble processing

Hi!

I'm trying to use DALI to preprocess images to send to YOLO. I have a two-GPU system that, after a few minutes of running DALI, segfaults and dies. The DALI step is run in a Triton ensemble model that sends data to a YOLO model compiled to TensorRT. The stack trace seems to indicate the problem is within DALI.

Repro:

The preprocessing code is the following:

import nvidia.dali as dali


@dali.pipeline_def(batch_size=64, num_threads=4, device_id=0)
def pipe():
    images = dali.fn.external_source(
        device="gpu", name="DALI_INPUT_0", dtype=dali.types.UINT8)
    images = dali.fn.color_space_conversion(
        images, image_type=dali.types.BGR, output_type=dali.types.RGB, device='gpu')
    images = dali.fn.resize(images, mode="not_larger",
                            resize_x=640, resize_y=384, device='gpu')
    images = dali.fn.crop(images, crop_w=640, crop_h=384, crop_pos_x=0, crop_pos_y=0,
                          fill_values=114, out_of_bounds_policy="pad", device='gpu')
    images = dali.fn.transpose(images, perm=[2, 0, 1], device='gpu')
    images = dali.fn.cast(images, dtype=dali.types.FLOAT, device='gpu')
    return images


pipe().serialize(filename="1/model.dali")

and config.pbtxt

name: "preprocessbgr"
backend: "dali"
max_batch_size: 64 
input [
{
    name: "DALI_INPUT_0"
    data_type: TYPE_UINT8
    dims: [ -1, -1, 3 ]
}
]
 
output [
{
    name: "OUTPUT_0"
    data_type: TYPE_FP32
    dims: [ 3, 384, 640 ]
}
]
dynamic_batching { }

About 100 fps of images (batch size 1 with 32 concurrency) are being sent to Triton, causing it to throw this error after several rounds of processing. The error reproduces after a few minutes:

Signal (11) received.
 0# 0x00005572DF1FBBD9 in tritonserver
 1# 0x00007F6BFF552210 in /usr/lib/x86_64-linux-gnu/libc.so.6
 2# 0x00007F6BF5602D40 in /usr/local/cuda/compat/lib.real/libcuda.so.1
 3# 0x00007F6BF57383E3 in /usr/local/cuda/compat/lib.real/libcuda.so.1
 4# 0x00007F6BF585EB02 in /usr/local/cuda/compat/lib.real/libcuda.so.1
 5# 0x00007F6BF55BF2E3 in /usr/local/cuda/compat/lib.real/libcuda.so.1
 6# 0x00007F6BF55BFAC4 in /usr/local/cuda/compat/lib.real/libcuda.so.1
 7# 0x00007F6BF55C1BD5 in /usr/local/cuda/compat/lib.real/libcuda.so.1
 8# 0x00007F6BF562FAAE in /usr/local/cuda/compat/lib.real/libcuda.so.1
 9# 0x00007F6BC80CF0D9 in /opt/tritonserver/backends/dali/libtriton_dali.so
10# 0x00007F6BC809EFED in /opt/tritonserver/backends/dali/libtriton_dali.so
11# 0x00007F6BC80F3B65 in /opt/tritonserver/backends/dali/libtriton_dali.so
12# 0x00007F6BC8098130 in /opt/tritonserver/backends/dali/libtriton_dali.so
13# dali::ThreadPool::ThreadMain(int, int, bool) in /opt/tritonserver/backends/dali/dali/libdali.so
14# 0x00007F6ABDAE526F in /opt/tritonserver/backends/dali/dali/libdali.so
15# 0x00007F6BFFDBE609 in /usr/lib/x86_64-linux-gnu/libpthread.so.0
16# clone in /usr/lib/x86_64-linux-gnu/libc.so.6

Also, interestingly enough, running this with

instance_group [
    {
        count: 1
        kind: KIND_GPU
        gpus: [ 0 ]
    }
]

causes Signal (11) to show up faster (seconds rather than minutes) during ensembling over the two GPUs.

Other things I've tried

I've tried multiple changes, such as

  1. Calling .gpu() and setting the external source to cpu, which causes the Signal (11) to show up later.
  2. Using a single GPU instance makes DALI play nicely and it does not crash.

Theories

  1. There is some kind of weirdness with the dali backend confusing gpu tensors across gpus perhaps?
  2. Since the issue takes a few minutes to occur without instance groups, it is likely that cross scheduling devices (preprocessing in gpu:0 and then tensorrt on gpu:1) is causing issues and segfaults.

Versions

NVIDIA Release 21.12 (build 30441439)

Any thoughts? Appreciate this in advance.

ModuleNotFoundError: No module named 'numpy'

When running this pipeline it complains it can't find numpy:

import nvidia.dali as dali
import nvidia.dali.types as types
from nvidia.dali.plugin.triton import autoserialize
import numpy as np


@autoserialize
@dali.pipeline_def(batch_size=3, num_threads=1, device_id=0)
def pipe():
    max_size = 896
    fill_value = 114.0/255

    images = dali.fn.external_source(device="gpu", name="images")
    images = dali.fn.crop_mirror_normalize(images, output_layout=types.NCHW, scale=1.0/255, dtype=types.FLOAT)
    images = dali.fn.resize(images, max_size=max_size, mode="not_larger")

    shapes = dali.fn.shapes(images, dtype=types.INT64)

    w = dali.fn.slice(shapes, 2, 1, axes=[0])
    h = dali.fn.slice(shapes, 1, 1, axes=[0])

    dx = dali.fn.cast((max_size - w) // 2, dtype=types.FLOAT)
    dy = dali.fn.cast((max_size - h) // 2, dtype=types.FLOAT)

    shift = dali.fn.stack(dx, dy, 0.0, axis=0)
    transform = dali.fn.cat(np.identity(3, dtype=np.float32), shift, axis=1)

    letterboxed = dali.fn.warp_affine(images, transform, size=[max_size, max_size], fill_value=fill_value, interp_type=types.INTERP_LINEAR)
    
    return letterboxed

Cannot run dali model in deepstream-triton 5.1.21.2

The same model runs successfully on Triton server 20.11. But when I run it on deepstream-triton 5.1.21.2, there is an error related to the tag version.

E0716 06:43:17.929970 5269 logging.cc:43] coreReadArchive.cpp (32) - Serialization Error in verifyHeader: 0 (Magic tag does not match)
E0716 06:43:17.930068 5269 logging.cc:43] INVALID_STATE: std::exception
E0716 06:43:17.930074 5269 logging.cc:43] INVALID_CONFIG: Deserialize the cuda engine failed.
W0716 06:43:17.930082 5269 autofill.cc:225] Autofiller failed to detect the platform for retinaface_preprocess (verify contents of model directory or use --log-verbose=1 for more details)
W0716 06:43:17.930086 5269 autofill.cc:248] Proceeding with simple config for now
I0716 06:43:17.930418 5269 model_repository_manager.cc:810] loading: retinaface_preprocess:1
E0716 06:43:17.941151 5269 model_repository_manager.cc:986] failed to load 'retinaface_preprocess' version 1: Not found: unable to load backend library: /usr/lib/x86_64-linux-gnu/libstdc++.so.6: cannot allocate memory in static TLS block
ERROR: infer_trtis_server.cpp:1044 Triton: failed to load model retinaface_preprocess, triton_err_str:Invalid argument, err_msg:load failed for model 'retinaface_preprocess': version 1: Not found: unable to load backend library: /usr/lib/x86_64-linux-gnu/libstdc++.so.6: cannot allocate memory in static TLS block;

ERROR: infer_trtis_backend.cpp:45 failed to load model: retinaface_preprocess, nvinfer error:NVDSINFER_TRTIS_ERROR
ERROR: infer_trtis_backend.cpp:184 failed to initialize backend while ensuring model:retinaface_preprocess ready, nvinfer error:NVDSINFER_TRTIS_ERROR
0:00:09.727932979  5269      0x4ab4780 ERROR          nvinferserver gstnvinferserver.cpp:362:gst_nvinfer_server_logger:<primary-inference> nvinferserver[UID 5]: Error in createNNBackend() <infer_trtis_context.cpp:246> [UID = 5]: failed to initialize trtis backend for model:retinaface_preprocess, nvinfer error:NVDSINFER_TRTIS_ERROR
I0716 06:43:17.941406 5269 server.cc:280] Waiting for in-flight requests to complete.
I0716 06:43:17.941431 5269 server.cc:295] Timeout 30: Found 0 live models and 0 in-flight non-inference requests
0:00:09.728082338  5269      0x4ab4780 ERROR          nvinferserver gstnvinferserver.cpp:362:gst_nvinfer_server_logger:<primary-inference> nvinferserver[UID 5]: Error in initialize() <infer_base_context.cpp:81> [UID = 5]: create nn-backend failed, check config file settings, nvinfer error:NVDSINFER_TRTIS_ERROR
0:00:09.728095248  5269      0x4ab4780 WARN           nvinferserver gstnvinferserver_impl.cpp:439:start:<primary-inference> error: Failed to initialize InferTrtIsContext
0:00:09.728099778  5269      0x4ab4780 WARN           nvinferserver gstnvinferserver_impl.cpp:439:start:<primary-inference> error: Config file path: /data/deepstream-retinaface/dstest_ssd_nopostprocess.txt
0:00:09.728476338  5269      0x4ab4780 WARN           nvinferserver gstnvinferserver.cpp:460:gst_nvinfer_server_start:<primary-inference> error: gstnvinferserver_impl start failed
Error: gst-resource-error-quark: Failed to initialize InferTrtIsContext (1): gstnvinferserver_impl.cpp(439): start (): /GstPipeline:pipeline0/GstNvInferServer:primary-inference:

How do I fix this problem?

Dali backend support feature to handle init function

Is your feature request related to a problem? Please describe.
Currently dali_backend only works for pre- and post-processing functions that do not require initialization. What if the init part takes a long time? Is there any solution for this?

Best practice for video ingestion

With DALI, outside of Triton, I would call fn.readers.video with a list of video filepaths. However, is that still feasible within Triton? All the examples I see here pass numpy arrays to ExternalSource. Is there an example with video decoding available that I missed? My current use case isn't streaming video, so I have videos 'at rest'.
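For reference, a minimal sketch of the standalone (outside-Triton) fn.readers.video usage the question refers to; the file paths, batch size and sequence length are placeholder assumptions, and this is not a claim that the reader works inside dali_backend:

import nvidia.dali as dali
import nvidia.dali.fn as fn

# Placeholder paths to videos "at rest" on local disk.
video_files = ["/data/videos/clip_0.mp4", "/data/videos/clip_1.mp4"]

@dali.pipeline_def(batch_size=2, num_threads=4, device_id=0)
def video_pipe():
    # Decodes fixed-length frame sequences on the GPU, reading directly from the files.
    frames = fn.readers.video(device="gpu", filenames=video_files,
                              sequence_length=16, random_shuffle=False)
    return frames

pipe = video_pipe()
pipe.build()
(sequences,) = pipe.run()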

Multi input model crashes TRITON server without errors

Description
When using a multi-input DALI model, the Triton server crashes when an inference request is sent from a Python client. The server does not give any errors.
The number of outputs does not influence the issue.

Triton Information
What version of Triton are you using?
20.11 and 20.12

Are you using the Triton container or did you build it yourself?
NGC Container.
The server is run using the following command: tritonserver --model-store=/Workdir/Models --strict-model-config=false --exit-on-error=false --model-control-mode=poll --log-verbose=2

To Reproduce
Baseline + model creation
The following code can be used to confirm the validity of the pipeline as well as the creation of the serialized model. Tested on the NGC Tensorflow 20.11 and 20.12 images.

import nvidia.dali as dali
import nvidia.dali.types as types
import numpy as np
 
batch_size = 1
pipe = dali.pipeline.Pipeline(batch_size=batch_size, num_threads=1, device_id=0)

# Create the pipeline
with pipe:
    x = dali.fn.external_source(device="cpu", name="DALI_X_INPUT")
    y = dali.fn.external_source(device="cpu", name="DALI_Y_INPUT")
    pipe.set_outputs(x, y)

# Build the pipeline and save to file for testing in TRITON
pipe.build()
pipe.serialize(filename="./model_multi_input.dali")

# Input the test data
pipe.feed_input("DALI_X_INPUT", np.array([1200.0]*batch_size, dtype=np.float32))
pipe.feed_input("DALI_Y_INPUT", np.array([1920.0]*batch_size, dtype=np.float32))

# Run the pipeline
pipe_out = pipe.run()

# Check the output data
print(f"X Input: {np.array(pipe_out[0][0])}, Y Input: {np.array(pipe_out[1][0])}")
# Outputs: "X Input: 1200.0, Y Input: 1920.0"

Tests using TRITON
Using the following config.pbtxt:

name: "dali_multi_input"
backend: "dali"
max_batch_size: 1
default_model_filename: "model_multi_input.dali"
input [
  {
    name: "DALI_X_INPUT"
    data_type: TYPE_UINT8
    dims: [ -1 ]
  },
  {
    name: "DALI_Y_INPUT"
    data_type: TYPE_UINT8
    dims: [ -1 ]
  }
]

output [
  {
    name: "DALI_OUTPUT_X"
    data_type: TYPE_UINT8
    dims: [ -1 ]
  },
  {
    name: "DALI_OUTPUT_Y"
    data_type: TYPE_UINT8
    dims: [ -1 ]
  }
]

And the following Client.py running using the python client wheels from 20.11 and 20.12:

import argparse
import numpy as np
import os
from builtins import range
import sys

import tritonclient.grpc as grpcclient
import tritonclient.grpc.model_config_pb2 as model_config
import tritonclient.http as httpclient
from tritonclient.utils import triton_to_np_dtype
from tritonclient.utils import InferenceServerException

FLAGS = None


def parse_model_grpc(model_metadata, model_config):
    """
    Check the configuration of a model to make sure it meets the
    requirements for an image classification network (as expected by
    this client)
    """
    if len(model_metadata.inputs) != 2:
        raise Exception("expecting 2 input, got {}".format(len(model_metadata.inputs)))

    if len(model_config.input) != 2:
        raise Exception(
            "expecting 2 input in model configuration, got {}".format(len(model_config.input)))

    input_metadata = model_metadata.inputs
    output_metadata = model_metadata.outputs

    return (input_metadata, output_metadata, model_config.max_batch_size)


def parse_model_http(model_metadata, model_config):
    """
    Check the configuration of a model to make sure it meets the
    requirements for an image classification network (as expected by
    this client)
    """
    if len(model_metadata['inputs']) != 2:
        raise Exception("expecting 2 input, got {}".format(len(model_metadata['inputs'])))

    if len(model_config['input']) != 2:
        raise Exception(
            "expecting 2 input in model configuration, got {}".format(len(model_config['input'])))

    input_metadata = model_metadata['inputs']
    output_metadata = model_metadata['outputs']

    return (input_metadata, output_metadata, model_config['max_batch_size'])


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('-v',
                        '--verbose',
                        action="store_true",
                        required=False,
                        default=True,
                        help='Enable verbose output')
    parser.add_argument('-m',
                        '--model-name',
                        type=str,
                        required=False,
                        default='dali_multi_input',
                        help='Name of model. Default is preprocess_inception_ensemble.')
    parser.add_argument('-u',
                        '--url',
                        type=str,
                        required=False,
                        default='localhost:7000',
                        help='Inference server URL. Default is localhost:8000.')
    parser.add_argument('-i',
                        '--protocol',
                        type=str,
                        required=False,
                        default='HTTP',
                        help='Protocol (HTTP/gRPC) used to ' +
                        'communicate with inference service. Default is HTTP.')
    FLAGS = parser.parse_args()

    protocol = FLAGS.protocol.lower()

    try:
        if protocol == "grpc":
            # Create gRPC client for communicating with the server
            triton_client = grpcclient.InferenceServerClient(url=FLAGS.url, verbose=FLAGS.verbose)
        else:
            # Create HTTP client for communicating with the server
            triton_client = httpclient.InferenceServerClient(url=FLAGS.url, verbose=FLAGS.verbose)
    except Exception as e:
        print("client creation failed: " + str(e))
        sys.exit(1)

    model_name = FLAGS.model_name

    # Make sure the model matches our requirements, and get some
    # properties of the model that we need for preprocessing
    try:
        model_metadata = triton_client.get_model_metadata(model_name=model_name)
    except InferenceServerException as e:
        print("failed to retrieve the metadata: " + str(e))
        sys.exit(1)

    try:
        model_config = triton_client.get_model_config(model_name=model_name)
    except InferenceServerException as e:
        print("failed to retrieve the config: " + str(e))
        sys.exit(1)

    if FLAGS.protocol.lower() == "grpc":
        input_metadata, output_metadata, batch_size = parse_model_grpc(model_metadata, model_config.config)
    else:
        input_metadata, output_metadata, batch_size = parse_model_http(model_metadata, model_config)

    in_x = np.stack([np.array([400]*batch_size, dtype=np.uint8)], axis=0)
    in_y = np.stack([np.array([640]*batch_size, dtype=np.uint8)], axis=0)

    # Set the input data
    inputs = []
    if FLAGS.protocol.lower() == "grpc":
        inputs.append(grpcclient.InferInput(input_metadata[0].name, in_x.shape, "UINT8"))
        inputs[0].set_data_from_numpy(in_x)
        inputs.append(grpcclient.InferInput(input_metadata[1].name, in_y.shape, "UINT8"))
        inputs[1].set_data_from_numpy(in_y)
    else:
        inputs.append(httpclient.InferInput(input_metadata[0]['name'], in_x.shape, "UINT8"))
        inputs[0].set_data_from_numpy(in_x)
        inputs.append(httpclient.InferInput(input_metadata[1]['name'], in_y.shape, "UINT8"))
        inputs[1].set_data_from_numpy(in_y)


    output_names = [ output.name if FLAGS.protocol.lower() == "grpc" else output['name'] for output in output_metadata ]

    outputs = []
    for output_name in output_names:
        if FLAGS.protocol.lower() == "grpc":
            outputs.append(grpcclient.InferRequestedOutput(output_name))
        else:
            outputs.append(httpclient.InferRequestedOutput(output_name))

#This is where the client fails due to the server crash
    # Send request
    result = triton_client.infer(model_name, inputs, outputs=outputs)

    output_x = result.as_numpy('DALI_OUTPUT_X')
    output_y = result.as_numpy('DALI_OUTPUT_Y')

    print("PASS")

When the client is in HTTP mode the client gives the following error:

**Exception has occurred: HTTPConnectionClosed**
connection closed.

When the client is in gRPC mode the client gives the following error:

**Exception has occurred: InferenceServerException**
[StatusCode.UNAVAILABLE] Socket closed

The server outputs the following (gRPC client) and then just quits:

I1231 11:50:04.685638 1 grpc_server.cc:270] Process for ModelMetadata, rpc_ok=1, 0 step START
I1231 11:50:04.685677 1 grpc_server.cc:225] Ready for RPC 'ModelMetadata', 1
I1231 11:50:04.685684 1 model_repository_manager.cc:496] GetInferenceBackend() 'dali_multi_input' version -1
I1231 11:50:04.685690 1 model_repository_manager.cc:452] VersionStates() 'dali_multi_input'
I1231 11:50:04.685756 1 grpc_server.cc:270] Process for ModelMetadata, rpc_ok=1, 0 step COMPLETE
I1231 11:50:04.685763 1 grpc_server.cc:408] Done for ModelMetadata, 0
I1231 11:50:04.686291 1 grpc_server.cc:270] Process for ModelConfig, rpc_ok=1, 0 step START
I1231 11:50:04.686317 1 grpc_server.cc:225] Ready for RPC 'ModelConfig', 1
I1231 11:50:04.686323 1 model_repository_manager.cc:496] GetInferenceBackend() 'dali_multi_input' version -1
I1231 11:50:04.689018 1 grpc_server.cc:270] Process for ModelConfig, rpc_ok=1, 0 step COMPLETE
I1231 11:50:04.689028 1 grpc_server.cc:408] Done for ModelConfig, 0
I1231 11:50:04.690189 1 grpc_server.cc:3089] Process for ModelInferHandler, rpc_ok=1, 1 step START
I1231 11:50:04.690210 1 grpc_server.cc:3082] New request handler for ModelInferHandler, 4
I1231 11:50:04.690218 1 model_repository_manager.cc:496] GetInferenceBackend() 'dali_multi_input' version -1
I1231 11:50:04.690227 1 model_repository_manager.cc:496] GetInferenceBackend() 'dali_multi_input' version -1
I1231 11:50:04.690246 1 infer_request.cc:502] prepared: [0x0x7fd11c0035a0] request id: , model: dali_multi_input, requested version: -1, actual version: 1, flags: 0x0, correlation id: 0, batch size: 1, priority: 0, timeout (us): 0
original inputs:
[0x0x7fd11c003908] input: DALI_Y_INPUT, type: UINT8, original shape: [1,1], batch + shape: [1,1], shape: [1]
[0x0x7fd11c003828] input: DALI_X_INPUT, type: UINT8, original shape: [1,1], batch + shape: [1,1], shape: [1]
override inputs:
inputs:
[0x0x7fd11c003828] input: DALI_X_INPUT, type: UINT8, original shape: [1,1], batch + shape: [1,1], shape: [1]
[0x0x7fd11c003908] input: DALI_Y_INPUT, type: UINT8, original shape: [1,1], batch + shape: [1,1], shape: [1]
original requested outputs:
DALI_OUTPUT_X
DALI_OUTPUT_Y
requested outputs:
DALI_OUTPUT_X
DALI_OUTPUT_Y

The full TRITON server log: TRITON.log.

Expected behavior
The model should give the same output as the standalone DALI script does, without crashing the server.

How to use multi-GPUs with dali backend?

When I use inception_pipeline.py to generate the DALI model, part of the code is as follows:

@dali.pipeline_def(batch_size=3, num_threads=1, device_id=0)
def pipe():
    images = dali.fn.external_source(device="cpu", name="DALI_INPUT_0")
    images = dali.fn.decoders.image(images, device="mixed", output_type=types.RGB)
    images = dali.fn.resize(images, resize_x=299, resize_y=299)
    images = dali.fn.crop_mirror_normalize(images,
                                           dtype=types.FLOAT,
                                           output_layout="HWC",
                                           crop=(299, 299),
                                           mean=[0.485 * 255, 0.456 * 255, 0.406 * 255],
                                           std=[0.229 * 255, 0.224 * 255, 0.225 * 255])
    return images

It specifies the GPU ID as "0" via the parameter "device_id", so my question is: if I want to use the DALI backend with multiple GPUs, how should I do it? Thanks.

Data overridden problem with dali backend

I met the problem described in triton-inference-server/server#2616.
After switching to nvcr.io/nvidia/tritonserver:21.02-py3 and following the known limitations:

Due to DALI limitations, you might observe unnaturally increased memory consumption when defining instance group for DALI model with higher count than 1. We suggest using default instance group for DALI model.

I found something strange about the DALI backend: the number of repeated face features increases as the number of DALI backend instances decreases. For example:

python grpc client infer with single thread

1 dali backend instances
embedding_array: (11190, 512)
unmatch_num/total_num: 0/11190

python grpc client infer with thread pool (simulate high concurrency situation)

1 dali backend instances
embedding_array: (11190, 512)
unmatch_num/total_num: 7246/11190

2 dali backend instances
embedding_array: (11190, 512)
unmatch_num/total_num: 4040/11190

4 dali backend instances
embedding_array: (11190, 512)
unmatch_num/total_num: 1628/11190

8 dali backend instances
embedding_array: (11190, 512)
unmatch_num/total_num: 64/11190

So I wonder whether there is some kind of data-overriding problem in the DALI backend?

How to use numpy data in DALI

I use your NVIDIA hardware decoder, nvdecode, to decode video and generate data in numpy format. How do I use DALI's GPU-accelerated pre-processing? I use dali.fn.decoders.image, and the error is as follows: Error in thread 1: [/opt/dali/dali/operators/decoder/host/host_decoder.cc:30] Assert on "input.ndim() == 1" failed: Input must be 1D encoded jpeg string.
If I remove the decoder and directly use dali.fn.resize(device='gpu'), an error is reported that data on the CPU cannot be operated on by the GPU.

What should I do? My data.shape is (1620, 1920).
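A possible sketch of the already-decoded path, which skips the decoder entirely and moves the frames to the GPU inside the pipeline; the input name, target size and dtype are assumptions:

import nvidia.dali as dali
import nvidia.dali.fn as fn
import nvidia.dali.types as types

# Frames arrive already decoded (e.g. a (1620, 1920) grayscale array),
# so fn.decoders.image is not used at all.
@dali.pipeline_def(batch_size=8, num_threads=4, device_id=0)
def pipe():
    frames = fn.external_source(device="cpu", name="DALI_INPUT_0",
                                dtype=types.UINT8, layout="HW")
    frames = fn.expand_dims(frames, axes=[2], new_axis_names="C")  # (H, W) -> (H, W, 1)
    frames = frames.gpu()  # move the decoded data to the GPU for the ops below
    frames = fn.resize(frames, resize_x=960, resize_y=810)  # placeholder target size
    return fn.cast(frames, dtype=types.FLOAT)

pipe().serialize(filename="1/model.dali")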

Need to increase performance dali-backend

Hi,
I'm struggling to use dali-backend to increase inference performance.
A plain nvcr.io/nvidia/tritonserver:22.02-py3 image was used for the tests below, under Ubuntu 20.04 + Titan XP + NVIDIA driver 510.54.

First,

The previous version of the preprocessor was built using ONNX Runtime.
I am trying to replace the preprocessor of the ensemble model on tritonserver with dali-backend.
The code below has been verified to be equivalent to the ONNX Runtime version.
But its execution speed is considerably slower than the ONNX Runtime preprocessor, almost 3-4x slower.

@pipeline_def(batch_size=max_batch_size, num_threads=4, device_id=0)
def pipe():
    # input = fn.external_source(source="cpu", name="input_8n", parallel=True)
    input = fn.external_source(device="cpu", name="input_8n", dtype=types.UINT8)#, parallel=True)
    imgG = fn.decoders.image(input, output_type=types.DALIImageType.GRAY) 

    images_32f = fn.cast(imgG, dtype=types.DALIDataType.FLOAT) / 255.0

    input_A = fn.normalize(images_32f, mean=0.5, stddev=1/0.225)
    input_A = fn.expand_dims(input_A, axes=(0))

    image_B = fn.resize(images_32f, size=[640, 360], image_type=types.DALIImageType.GRAY)
    B_norm_c0 = fn.normalize(image_B, mean=0.485, stddev=0.229)
    B_norm_c1 = fn.normalize(image_B, mean=0.456, stddev=0.224)
    B_norm_c2 = fn.normalize(image_B, mean=0.406, stddev=0.225)
    image_B2 = fn.stack( B_norm_c0, B_norm_c1, B_norm_c2 )

    return image_B2, input_A

Q. Are there any tips to increase performance?

Second,

So I also tried to use the GPU more aggressively, using the mixed device argument of decoders.image as below:

input = fn.external_source(device="cpu", name="input_8n", dtype=types.UINT8)
imgG = fn.decoders.image(input, output_type=types.DALIImageType.GRAY, device="mixed") # HWC layout

But it caused a library loading error when starting the tritonserver Docker container.

UNAVAILABLE: Unknown: DALI Backend error: [/opt/dali/dali/pipeline/pipeline.cc:276] Assert on "device_id_ != CPU_ONLY_DEVICE_ID || device == "cpu"" failed: Cannot add a mixed operator decoders__Image with a GPU output, device_id should not be CPU_ONLY_DEVICE_ID. |
Stacktrace (15 entries):
| [frame 0]: /opt/tritonserver/backends/dali/dali/
| [frame 4]: /opt/tritonserver/backends/dali/libtriton_dali.so(+0x3386b) [0x7fb95c2f686b]  
| [frame 5]: /opt/tritonserver/backends/dali/libtriton_dali.so(+0x2c635) [0x7fb95c2ef635]   
| [frame 6]: /opt/tritonserver/backends/dali/libtriton_dali.so(TRITONBACKEND_ModelInstanceInitialize+0x3f6) [0x7fb95c2ded26] 
| [frame 7]: /opt/tritonserver/bin/../lib/libtritonserver.so(+0x307dee) [0x7fba44fcedee] 
| [frame 8]: /opt/tritonserver/bin/../lib/libtritonserver.so(+0x3093b3) [0x7fba44fd03b3] 
| [frame 9]: /opt/tritonserver/bin/../lib/libtritonserver.so(+0x301067) [0x7fba44fc8067]      
| [frame 10]: /opt/tritonserver/bin/../lib/libtritonserver.so(+0x18a7ca) [0x7fba44e517ca] 
| [frame 11]: /opt/tritonserver/bin/../lib/libtritonserver.so(+0x1979b1) [0x7fba44e5e9b1]   
| [frame 12]: /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xd6de4) [0x7fba4481cde4]  
| [frame 13]: /usr/lib/x86_64-linux-gnu/libpthread.so.0(+0x9609) [0x7fba44c9a609] 
| [frame 14]: /usr/lib/x86_64-linux-gnu/libc.so.6(clone+0x43) [0x7fba4450a293]

I tried to use the sample code, but the same errors occur.

import nvidia.dali as dali

@dali.pipeline_def(batch_size=256, num_threads=4, device_id=0)
def pipe():
    images = dali.fn.external_source(device="cpu", name="DALI_INPUT_0")
    images = dali.fn.image_decoder(images, device="mixed")
    images = dali.fn.resize(images, resize_x=224, resize_y=224)
    return images

pipe().serialize(filename="/my/model/repository/path/dali/1/model.dali")

Q. Is there anything I missed using dali-backend with tritonserver for inference?

DALI backend not releasing device memory

Hello!

I was excited about the 16-bit TIFF decoding, but there seems to be a bug where the DALI backend does not release device memory when the model is unloaded. Even when you unload the DALI model and load it again, it consumes more memory than before. Although the usage eventually converges to a fixed number, that number is very high: about 7 GB for batch size 3.

(Screenshot: GPU memory usage over repeated DALI model load/unload cycles.)

In the screenshot you can see that when a DALI model is loaded, memory usage increases, which is fine. The dip is when the model is unloaded; the memory is not released back to the initial minimum. When the model is loaded again, memory usage increases further. I've done this model "reloading" multiple times, and after a while the memory growth stops.

DALI version: 1.22.0dev, but I think the problem persists with older versions too.

from nvidia.dali import pipeline_def
import nvidia.dali
import nvidia.dali.fn as fn
import nvidia.dali.types as types

def set_normalization_value(pixel_value):
    condition = pixel_value > 255.0
    neg_condition = condition ^ True
    return condition * 65535.0 + neg_condition * 255.0


@pipeline_def(
    batch_size=3,
    num_threads=1,
    device_id=0,
    output_dtype=[types.FLOAT],
    output_ndim=[4],  # Dimensions of image, not including batch dimension
)
def decode_pipeline():
    images = fn.external_source(device="cpu", name="input_0", dtype=types.UINT8, ndim=1)
    images_2 = fn.external_source(
        device="cpu", name="input_1", dtype=types.UINT8, ndim=1
    )
    images_3 = fn.external_source(
        device="cpu", name="input_2", dtype=types.UINT8, ndim=1
    )
    images = fn.experimental.decoders.image(
        images,
        device="mixed",
        dtype=types.UINT16,
    )
    images_2 = fn.experimental.decoders.image(
        images_2, device="mixed", dtype=types.UINT16
    )
    images_3 = fn.experimental.decoders.image(
        images_3, device="mixed", dtype=types.UINT16
    )

    images = fn.transpose(images, perm=[2, 0, 1])
    images_2 = fn.transpose(images_2, perm=[2, 0, 1])
    images_3 = fn.transpose(images_3, perm=[2, 0, 1])
    images = fn.cast([images], dtype=types.FLOAT)
    images_2 = fn.cast([images_2], dtype=types.FLOAT)
    images_3 = fn.cast([images_3], dtype=types.FLOAT)
    image_max_value = fn.reductions.max(images)
    normalization_value = set_normalization_value(image_max_value)
    images /= normalization_value
    images_2 /= normalization_value
    images_3 /= normalization_value
    images = fn.stack(*[images, images_2, images_3], axis=0)
    return images


pipe = decode_pipeline()
pipe.serialize(filename="model.dali")

This is how the pipeline was generated. I don't think the pipeline itself is the problem; rather, the DALI backend does not clean up the memory.

To reproduce this problem:

  1. Launch Triton Inference Server with DALI nightly backend
  2. Load DALI model explicitly
  3. Make inference request to the said DALI model
  4. Unload the model
  5. Repeat steps 2.-4.

Is there a way to limit the memory growth (because 7 GB over the baseline is too much) or to fix this issue?
I want to decode batches of 3 images of 5000x10000x3 uint16 pixels (2 bytes each), which is only around 900 MB of raw data.
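
For reference, a minimal reproduction sketch of the load/unload cycle described above, assuming the server runs with --model-control-mode=explicit and the model directory is named "dali_decode" (both assumptions; the input names match the pipeline above):

import numpy as np
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:8001")
encoded = np.fromfile("sample.tiff", dtype=np.uint8)  # any encoded 16-bit TIFF

def make_input(name, data):
    # one encoded image per batch slot; shape assumes max_batch_size > 0 in config.pbtxt
    inp = grpcclient.InferInput(name, [1, data.size], "UINT8")
    inp.set_data_from_numpy(data.reshape(1, -1))
    return inp

for cycle in range(10):
    client.load_model("dali_decode")                                            # step 2
    inputs = [make_input(n, encoded) for n in ("input_0", "input_1", "input_2")]
    client.infer("dali_decode", inputs=inputs)                                  # step 3
    client.unload_model("dali_decode")                                          # step 4
    # watch nvidia-smi between iterations to see whether device memory is returned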

runtime parameters like resize shape from config.pbtxt

I want to write a preprocessing pipeline that I intend to reuse with multiple YOLO models that have different input shapes.

Some have a 512x512 input while others might have 256x256, etc. Is there a way to keep the resize_x and resize_y parameters of dali.fn.resize configurable from config.pbtxt?

I dug through the examples and tried passing these to pipeline_def, but it only accepts the Pipeline initialisation kwargs. I know those kwargs can be passed from config.pbtxt.

I also saw that I can pass resize_x and resize_y as an input and read them through fn.external_source, but is there a way to set these values directly in config.pbtxt instead of taking them from user input (since they depend on the downstream model's input size, which the user may not know)?
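
For reference, one build-time workaround (not a backend feature, just a sketch): keep a single parametric pipeline definition and serialize a separate model.dali per target resolution. The script name and paths below are illustrative assumptions:

# serialize_preprocess.py - one serialized pipeline per downstream input size
import argparse
import nvidia.dali as dali
import nvidia.dali.fn as fn

@dali.pipeline_def(batch_size=32, num_threads=4, device_id=0)
def preprocess(size):
    images = fn.external_source(device="cpu", name="DALI_INPUT_0")
    images = fn.decoders.image(images, device="mixed")
    images = fn.resize(images, resize_x=size, resize_y=size)
    return images

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("size", type=int, help="target edge length, e.g. 256 or 512")
    parser.add_argument("out", type=str, help="where to save the serialized pipeline")
    args = parser.parse_args()
    preprocess(args.size).serialize(filename=args.out)

Each variant then gets its own model directory (e.g. preprocess_512/1/model.dali) and is referenced by the matching ensemble, so config.pbtxt itself stays static.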

Dali logs model configuration even with `--log-verbose "0"` flag

Launching tritonserver with or without --log-verbose doesn't seem to affect the DALI backend. It would be nice to have the same behaviour as the other backends!

Here is the output with --log-verbose "0":

I0528 18:03:34.171215 1 dali_backend.cc:316] TRITONBACKEND_ModelInitialize: dali (version 1)
I0528 18:03:34.171221 1 dali_backend.cc:328] Repository location: /tmp/folderAQX4gi
I0528 18:03:34.171224 1 dali_backend.cc:339] backend state is 'backend state'
I0528 18:03:34.171954 1 dali_backend.cc:126] Loading DALI pipeline from file /tmp/folderAQX4gi/1/model.dali
I0528 18:03:34.171996 1 dali_backend.cc:79] model configuration:
{
    "name": "dali",
    "platform": "",
    "backend": "dali",
    "version_policy": {
        "latest": {
            "num_versions": 1
        }
    },
    "max_batch_size": 1,
    "input": [

While for other backends it looks like:

I0528 18:03:34.211270 1 model_repository_manager.cc:1043] loading: model:1
I0528 18:03:35.426644 1 libtorch.cc:981] TRITONBACKEND_ModelInitialize: model (version 1)
I0528 18:03:35.427339 1 libtorch.cc:1022] TRITONBACKEND_ModelInstanceInitialize: model_0 (device 0)
I0528 18:03:38.321389 1 model_repository_manager.cc:1210] successfully loaded 'model' version 1

Dali Output does not match name or order

Hi, I'd like to report a bug: the order of the outputs doesn't seem to match the config when there are multiple outputs.

# Just a simple pipeline that peeks the image shape, then resizes and normalizes;
# nothing special going on here.
pipeline.set_outputs(resized_images, shapes)  # FP32 and INT64

Here's an example of my config file for outputs:

output: [
    {
        name: "processed_image"
        dims: [256, 256, 3]
        data_type: TYPE_FP32 
    },
    {
        name: "image_shape"
        dims: [3]
        data_type: TYPE_INT64
    }
]
# Image shape becomes processed_image and vice versa

When I call Triton, for some weird reason these two outputs are flipped, whether in an ensemble or even when calling the model directly via the client. Unless the problem is on my side? I've tried a lot of things but just can't seem to get it right.
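
One low-effort diagnostic, purely an assumption rather than a documented behaviour: rename the outputs so that their declaration order, their lexicographic order, and the set_outputs order all coincide, as the other configs in this thread do. If the flipping disappears, the mismatch lies in how outputs get bound to names rather than in the pipeline:

output [
    {
        name: "DALI_OUTPUT_0"   # resized_images, FP32
        dims: [256, 256, 3]
        data_type: TYPE_FP32
    },
    {
        name: "DALI_OUTPUT_1"   # shapes, INT64
        dims: [3]
        data_type: TYPE_INT64
    }
]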

Error in thread 1: [/opt/dali/dali/operators/decoder/nvjpeg/nvjpeg_helper.h:167] [/opt/dali/dali/image/jpeg_mem.cc:441] Assert on "jpeg_finish_decompress(&cinfo)" failed

Decoding failed; how can I catch the error so that the server will not crash?
The error message is as follows:

Error when executing Mixed operator ImageDecoder encountered:
Error in thread 1: [/opt/dali/dali/operators/decoder/nvjpeg/nvjpeg_helper.h:167] [/opt/dali/dali/image/jpeg_mem.cc:441] Assert on "jpeg_finish_decompress(&cinfo)" failed

docker build error

While building the Docker image with Dockerfile.release from the docker directory,
I got an error like the one below.

#8 222.9 [ 65%] Building CXX object Source/CMakeFiles/CMakeLib.dir/cmInstallGenerator.cxx.o
#8 245.6 g++: fatal error: Killed signal terminated program cc1plus
#8 245.6 compilation terminated.
#8 245.6 make[2]: *** [Source/CMakeFiles/CMakeLib.dir/build.make:1266: Source/CMakeFiles/CMakeLib.dir/cmGeneratorTarget.cxx.o] Error 1
#8 245.6 make[2]: *** Waiting for unfinished jobs....
#8 251.4 make[1]: *** [CMakeFiles/Makefile2:2205: Source/CMakeFiles/CMakeLib.dir/all] Error 2
#8 251.4 make: *** [Makefile:183: all] Error 2
------
executor failed running [/bin/sh -c CMAKE_VERSION=3.17 &&     CMAKE_BUILD=3.17.4 &&     wget -nv https://cmake.org/files/v${CMAKE_VERSION}/cmake-${CMAKE_BUILD}.tar.gz &&     tar -xf cmake-${CMAKE_BUILD}.tar.gz &&     cd cmake-${CMAKE_BUILD} &&     ./bootstrap --parallel=$(grep ^processor /proc/cpuinfo | wc -l) -- -DCMAKE_USE_OPENSSL=OFF &&     make -j"$(grep ^processor /proc/cpuinfo | wc -l)" install &&     rm -rf /cmake-${CMAKE_BUILD}]: exit code: 2

Can you show me a way to build that avoids compiling CMake from source,
or update the Dockerfile?

how to generate inputs with different shape as batch to DALI_backend?

For example, the pipeline is as follows; the input layout is HWC:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import nvidia.dali as dali
import nvidia.dali.fn as fn
import nvidia.dali.types as types

pipe = dali.pipeline.Pipeline(batch_size=32, num_threads=8)
with pipe:
    expect_output_size = (120., 120.)
    images = fn.external_source(device='cpu', name="IMAGE_ARRAY", layout='HWC')
    images = images.gpu()
    raw_shapes = fn.shapes(images, dtype=types.UINT32)
    # images = fn.image_decoder(images, device="mixed", output_type=types.RGB)
    images = fn.resize(images, mode='not_larger', max_size=expect_output_size, size=expect_output_size)
    resized_shapes = fn.shapes(images, dtype=types.UINT32)
    ratio = fn.cast(resized_shapes / raw_shapes, dtype=types.FLOAT)
    images = fn.pad(images, axis_names="HW", align=expect_output_size)
    images = fn.crop_mirror_normalize(images, crop=expect_output_size, output_layout='CHW')
    pipe.set_outputs(images, ratio, raw_shapes)

pipe.serialize(filename="1/model.dali")

The pbtxt is as follows:

backend: "dali"
max_batch_size: 32
input [
  {
    name: "IMAGE_ARRAY"
    data_type: TYPE_UINT8
    format: FORMAT_NHWC
    dims: [ -1, -1, 3]
  }
]
output [
   {
    name: "DALI_OUTPUT_0" # image_normalized
    data_type: TYPE_FP32
    dims: [3, 120, 120]
  },
  {
    name: "DALI_OUTPUT_1" # ratio
    data_type: TYPE_FP32
    dims: [3]
  },
  {
    name: "DALI_OUTPUT_2" # origin_shape
    data_type: TYPE_UINT32
    dims: [3]
  }
]
dynamic_batching {
  preferred_batch_size: [ 4, 8, 16, 32 ]
  max_queue_delay_microseconds: 1000
}

If the input images have different shapes, how do I pad the input numpy arrays so they can form a single batch?
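
For reference, a minimal client-side sketch that pads a list of HWC numpy images to a common shape so they can be stacked into a single request. The zero fill value is an assumption; note that padding changes what fn.resize sees, so sending one image per request and relying on dynamic_batching is often the simpler option:

import numpy as np

def pad_to_common_shape(images, fill_value=0):
    """Pad a list of HWC uint8 arrays to the largest H and W, then stack them."""
    max_h = max(img.shape[0] for img in images)
    max_w = max(img.shape[1] for img in images)
    batch = np.full((len(images), max_h, max_w, 3), fill_value, dtype=np.uint8)
    for i, img in enumerate(images):
        h, w = img.shape[:2]
        batch[i, :h, :w, :] = img
    return batch  # shape (N, max_h, max_w, 3), matching dims [-1, -1, 3]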

layout parameter to external_source causes assert error

I pass numpy arrays from my client code into Triton through a DALI pipeline. The following DALI pipeline worked in older versions of Triton/DALI but now throws an assertion error:

@pipeline_def(batch_size=256, num_threads=4, device_id=0)
def pipeline():
    images = fn.external_source(device="cpu", name="DALI_INPUT_0", layout="HWC")
    ...

When Triton starts I get the following error:

E1015 07:50:36.366937 1 dali_model_instance.cc:43] [/opt/dali/dali/pipeline/operator/builtin/external_source.h:567] Assert on "layout_ == batch.GetLayout()" failed: Expected data with layout: "HWC" and got: "".
Stacktrace (21 entries):
[frame 0]: /opt/tritonserver/backends/dali/dali/libdali.so(+0x869b2) [0x7efe0f3e89b2]
[frame 1]: /opt/tritonserver/backends/dali/dali/libdali.so(+0x1e8f98) [0x7efe0f54af98]
[frame 2]: /opt/tritonserver/backends/dali/dali/libdali.so(void dali::Pipeline::SetDataSourceHelper<dali::TensorList<dali::CPUBackend>, dali::CPUBackend>(std::string const&, dali::TensorList<dali::CPUBackend> const&, dali::OperatorBase*, dali::AccessOrder, dali::ExtSrcSettingMode)+0x94) [0x7efe0f5554a4]
[frame 3]: /opt/tritonserver/backends/dali/dali/libdali.so(void dali::Pipeline::SetExternalInputHelper<dali::TensorList<dali::CPUBackend> >(std::string const&, dali::TensorList<dali::CPUBackend> const&, dali::AccessOrder, dali::ExtSrcSettingMode)+0x107) [0x7efe0f55ab97]
[frame 4]: /opt/tritonserver/backends/dali/dali/libdali.so(daliSetExternalInputAsync+0xe8f) [0x7efe0f54705f]
[frame 5]: /opt/tritonserver/backends/dali/dali/libdali.so(daliSetExternalInput+0x1d) [0x7efe0f54764d]
[frame 6]: /opt/tritonserver/backends/dali/libtriton_dali.so(+0x3df29) [0x7efe684c5f29]
[frame 7]: /opt/tritonserver/backends/dali/libtriton_dali.so(+0x3df8c) [0x7efe684c5f8c]
[frame 8]: /opt/tritonserver/backends/dali/libtriton_dali.so(+0x3e202) [0x7efe684c6202]
[frame 9]: /opt/tritonserver/backends/dali/libtriton_dali.so(+0x3bbe0) [0x7efe684c3be0]
[frame 10]: /opt/tritonserver/backends/dali/libtriton_dali.so(+0x3becc) [0x7efe684c3ecc]
[frame 11]: /opt/tritonserver/backends/dali/libtriton_dali.so(+0x2ad05) [0x7efe684b2d05]
[frame 12]: /opt/tritonserver/backends/dali/libtriton_dali.so(+0x2b4a6) [0x7efe684b34a6]
[frame 13]: /opt/tritonserver/backends/dali/libtriton_dali.so(TRITONBACKEND_ModelInstanceExecute+0x190) [0x7efe684a0e00]
[frame 14]: /opt/tritonserver/bin/../lib/libtritonserver.so(+0x11290a) [0x7efe7585990a]
[frame 15]: /opt/tritonserver/bin/../lib/libtritonserver.so(+0x1130b7) [0x7efe7585a0b7]
[frame 16]: /opt/tritonserver/bin/../lib/libtritonserver.so(+0x1e5541) [0x7efe7592c541]
[frame 17]: /opt/tritonserver/bin/../lib/libtritonserver.so(+0x10d287) [0x7efe75854287]
[frame 18]: /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xd6de4) [0x7efe75394de4]
[frame 19]: /usr/lib/x86_64-linux-gnu/libpthread.so.0(+0x8609) [0x7efe7670d609]
[frame 20]: /usr/lib/x86_64-linux-gnu/libc.so.6(clone+0x43) [0x7efe7507f133]

Currently I have to work around this by setting the layout afterwards with fn.reinterpret:

@pipeline_def(batch_size=256, num_threads=4, device_id=0)
def pipeline():
    images = fn.external_source(device="cpu", name="DALI_INPUT_0")
    images = fn.reinterpret(images, layout="HWC")
    ...

The DALI pipeline was serialized using the NGC PyTorch 22.09 container, and I'm trying to load it in the Triton Server 22.09 container. I am not sure which versions I was using when this still worked.

Is there any support for text preprocessing mainly for transformer models

Hi,

I have seen that DALI already has prebuilt functions for image preprocessing, like dali.fn.resize(images, resize_x=299, resize_y=299).
But does it provide any preprocessing functions for text as well, such as performing tokenization on the Triton server side, as is typical for transformer models?

Thanks!

DALI on Jetson Devices

Hello everyone,

I have Jetson Nano and Xavier.

My goal is to use DALI as a preprocessing model in a Triton Inference Server pipeline. I installed the Triton DALI backend, but when I run the example DALI script (https://github.com/triton-inference-server/dali_backend/tree/main/docs/examples/inception_ensemble), I get 'ImportError: No module named nvidia.dali' on the import nvidia.dali as dali line.

How can I add DALI to my Triton Server ensemble pipeline as a preprocessing model?

Thanks for any help

Restart DALI Backend when it Breaks

Hello!

When an image of an unsupported type passes through the DALI pipeline (e.g. a GIF), Triton Server breaks and outputs the following:
Current pipeline object is no longer valid, and then the whole server has to be restarted. Even though we can filter the data before it goes through, I would like to have either an option to not break the pipeline completely and continue with the next sample, or a way to automatically restart the pipeline.

Here is the code sample for my DALI pipeline (in case we can introduce some try/except blocks):

import os

import numpy as np
import nvidia.dali as dali
import nvidia.dali.fn as fn


def setup_dali(
    input_name='DALI_INPUT_0',
    image_dim=[800, 1600],
    batch_size=1,
    num_threads=4,
    device='cpu',
    device_id=0,
    output_dir='./out/',
):

    pipeline = dali.pipeline.Pipeline(
        batch_size=batch_size,
        num_threads=num_threads,
        device_id=device_id
    )

    with pipeline:
        data = fn.external_source(name=input_name, device="cpu")
        # image preprocess
        images = fn.image_decoder(data, device=device)
        images = fn.resize(images, size=image_dim, mode="not_larger", max_size=image_dim)
        images = fn.pad(images, fill_value=0, shape=[image_dim[0], image_dim[1], 1])
        images = fn.transpose(images, perm=[2, 0, 1])
        images = fn.cast(images, dtype=dali.types.FLOAT)
        # input shape
        input_shape = np.float32((image_dim[0], image_dim[1], 1))
        # original shape
        shapes = fn.peek_image_shape(data)
        shapes = fn.cast(shapes, dtype=dali.types.FLOAT)
        # gather outputs
        out = [
            images,
            input_shape,
            shapes
        ]
        pipeline.set_outputs(*out)

    os.makedirs(os.path.dirname(output_dir), exist_ok=True)
    pipeline.serialize(filename=os.path.join(output_dir, 'model.dali'))

Thank you for your time!
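
For reference, a minimal client-side filter sketch, an assumption-level workaround rather than a DALI feature, that rejects payloads which are not JPEG or PNG by checking magic bytes before they are sent to the pipeline:

def is_supported_image(data: bytes) -> bool:
    """Accept only JPEG and PNG payloads; everything else (e.g. GIF) is skipped."""
    if data[:3] == b"\xff\xd8\xff":          # JPEG
        return True
    if data[:8] == b"\x89PNG\r\n\x1a\n":     # PNG
        return True
    return False

# usage on the client: payloads = [p for p in payloads if is_supported_image(p)]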

example resnet50_trt doesn't work when following the steps of "Quick setup"

When I try to run the resnet50_trt example following the steps at "https://github.com/triton-inference-server/dali_backend/tree/main/docs/examples/resnet50_trt", it reports the error:

root@d092a933a2c1:/workspace# python serialize_dali_pipeline.py --save ./model_repository/dali/1/model.dali
Traceback (most recent call last):
File "serialize_dali_pipeline.py", line 34, in
@dali.pipeline_def(batch_size=256, num_threads=4, device_id=0)
AttributeError: module 'nvidia.dali' has no attribute 'pipeline_def'

It looks like a version issue? How can I solve this?
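
For reference, pipeline_def was only added in later DALI releases, so this usually means the nvidia-dali package in the environment used for serialization is too old; upgrading DALI is the proper fix. As a stopgap, a similar pipeline can be built with the explicit Pipeline object, provided the installed DALI already has the fn API (the operators below are illustrative, not the exact resnet50_trt pipeline):

import nvidia.dali as dali
import nvidia.dali.types as types

pipe = dali.pipeline.Pipeline(batch_size=256, num_threads=4, device_id=0)
with pipe:
    images = dali.fn.external_source(device="cpu", name="DALI_INPUT_0")
    images = dali.fn.image_decoder(images, device="mixed", output_type=types.RGB)
    images = dali.fn.resize(images, resize_x=224, resize_y=224)
    pipe.set_outputs(images)

pipe.serialize(filename="./model_repository/dali/1/model.dali")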

Example of using DALI with binary data

Are there any samples of sending images as binary data to the inception_ensemble example without the Triton client, using raw HTTP requests with curl or similar? Do any modifications need to be made (like changing the input data type to string, or the dimensions)?
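
For reference, a hedged sketch of a raw HTTP request using Triton's binary tensor data extension, without the Triton client library. The model name ("ensemble_dali_inception"), the input name ("INPUT") and the UINT8 / dims [-1] layout are assumptions loosely based on the example; adjust them to your actual config.pbtxt:

import json
import requests  # plain HTTP, no tritonclient needed

with open("image.jpg", "rb") as f:
    raw = f.read()

# JSON request header; the raw bytes are appended to the same body.
header = json.dumps({
    "inputs": [{
        "name": "INPUT",
        "shape": [1, len(raw)],
        "datatype": "UINT8",
        "parameters": {"binary_data_size": len(raw)},
    }]
}).encode()

resp = requests.post(
    "http://localhost:8000/v2/models/ensemble_dali_inception/infer",
    data=header + raw,
    headers={"Inference-Header-Content-Length": str(len(header))},
)
print(resp.status_code, resp.text[:200])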

Low throughput with triton perf_client

When I test a simple DALI pipeline, I find that it shows lower throughput and GPU utilization with Triton than with a simple Python script:
roughly 1705 fps drops to 1100 fps, and Triton runs at about 60% GPU utilization.
I don't know what makes the difference.
Setup: 2080 Ti, CUDA 11.2, Triton r20.12.

Test with the Python demo:

from nvidia.dali.pipeline import Pipeline
import nvidia.dali.ops as ops
import nvidia.dali.types as types
from timeit import default_timer as timer

image_dir = "/test_imgs"

class HybridPipeline(Pipeline):
    def __init__(self, batch_size, num_threads, device_id):
        super(HybridPipeline, self).__init__(batch_size, num_threads, device_id, seed = 12)
        self.input = ops.FileReader(file_root=image_dir, random_shuffle=True, initial_fill=21)
        self.decode = ops.ImageDecoder(device='mixed', output_type=types.RGB)
        self.resize = ops.Resize(device='gpu',
                                 interp_type=types.INTERP_LINEAR,
                                 resize_x=512,
                                 resize_y=320)

    def define_graph(self):
        jpegs, labels = self.input()
        images = self.decode(jpegs)
        images = self.resize(images)
        # images are on the GPU
        return images


def speedtest(pipeclass, batch, n_threads):
    pipe = pipeclass(batch, n_threads, 0)
    pipe.build()
    # warmup
    for i in range(5):
        pipe.run()
    # test
    n_test = 200
    t_start = timer()
    for i in range(n_test):
        pipe.run()
    t = timer() - t_start
    print("Speed: {} imgs/s".format((n_test * batch)/t))


if __name__ == "__main__":
    test_batch_size = 64
    speedtest(HybridPipeline, test_batch_size, 4)

Result:

read 17365 files from 1 directories
Speed: 1757.3024810392662 imgs/s

Test with Triton perf_client; I set a large request-rate-range, aiming to hit the throughput limit. The pipeline was serialized with:

import nvidia.dali as dali
import nvidia.dali.types as types
import argparse

INPUT_HEIGHT = 512
INPUT_WIDTH = 320

def main(filename):
    pipe = dali.pipeline.Pipeline(batch_size=64, num_threads=10, device_id=0)
    with pipe:
        images = dali.fn.external_source(device="cpu", name="DALI_INPUT_0")
        images = dali.fn.image_decoder(images, device="mixed", output_type=types.BGR)
        images = dali.fn.resize(images, size=(INPUT_HEIGHT, INPUT_WIDTH))
        pipe.set_outputs(images)
        pipe.serialize(filename=filename)

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description="Serialize pipeline and save it to file")
    parser.add_argument('file_path', type=str, help='Path, where to save serialized pipeline')
    args = parser.parse_args()
    main(args.file_path)

Result:

./perf_client -v -i grpc -m dali_ctdet  --request-rate-range 50  -b 64 -p 5000 -u localhost:8001 --input-data /json_data/test_jpg.json
 Successfully read data for 1 stream/streams with 256 step/steps.
*** Measurement Settings ***
  Batch size: 64
  Measurement window: 5000 msec
  Using uniform distribution on request generation
  Using synchronous calls for inference
  Stabilizing using average latency

Request Rate: 50 inference requests per seconds
  Pass [1] throughput: 1100.8 infer/sec. Avg latency: 250086 usec (std 118550 usec)
  Pass [2] throughput: 1100.8 infer/sec. Avg latency: 233251 usec (std 10702 usec)
  Pass [3] throughput: 1100.8 infer/sec. Avg latency: 234106 usec (std 10974 usec)
  Client: 
    Request count: 86
    Delayed Request Count: 86
    Throughput: 1100.8 infer/sec
    Avg latency: 234106 usec (standard deviation 10974 usec)
    p50 latency: 233919 usec
    p90 latency: 250165 usec
    p95 latency: 253393 usec
    p99 latency: 260765 usec
    Avg gRPC time: 232563 usec (marshal 710 usec + response wait 231850 usec + unmarshal 3 usec)
  Server: 
    Inference count: 6592
    Execution count: 103
    Successful request count: 103
    Avg request latency: 188862 usec (overhead 4 usec + queue 143417 usec + compute input 6 usec + compute infer 38842 usec + compute output 6593 usec)

Inferences/Second vs. Client Average Batch Latency
Request Rate: 50, throughput: 1100.8 infer/sec, latency 234106 usec

dali backend device parameter setting question

The content of the dali.py file is as follows:

import nvidia.dali as dali
from nvidia.dali.plugin.triton import autoserialize
import nvidia.dali.types as types

@autoserialize
@dali.pipeline_def(batch_size=1, num_threads=1, device_id=0)
def pipe():
    images = dali.fn.external_source(device="cpu", name="INPUT_0")
    shape_list = dali.fn.external_source(device="cpu", name="INPUT_1")
    images = dali.fn.decoders.image(images, device="mixed", output_type=types.RGB)  # the decoder output is in HWC layout
    images_converted = dali.fn.color_space_conversion(images, device="gpu", image_type=types.RGB, output_type=types.BGR)
    images = dali.fn.resize(images_converted, device="gpu",
                            resize_y=shape_list[0, 2]*shape_list[0, 0],
                            resize_x=shape_list[0, 3]*shape_list[0, 1])
    images = dali.fn.crop_mirror_normalize(images, device="gpu",
                                           dtype=types.FLOAT,
                                           output_layout="CHW",
                                           scale=1.0/255,
                                           mean=[0.485 * 255, 0.456 * 255, 0.406 * 255],
                                           std=[0.229, 0.224, 0.225])

    return images, shape_list

A peculiar thing I found is that if I do not set the device parameter for the color_space_conversion, resize and crop_mirror_normalize operators, the latency jumps to 90 ms (compared to 40 ms when explicitly setting the device parameter to 'gpu'). I assumed that if the device parameter is not set, the default GPU-to-GPU behavior would be selected, since the inputs of all three operators are already in GPU memory, but the runtime results suggest my assumption is wrong. Why does this happen?
