In the example examples/resnet50_trt, how to load multiple images of different sizes with client.py? (triton-inference-server/dali_backend)

Comments (4)

szalpal commented on May 30, 2024

Hello @Seraph1990 !

If I understood you correctly, you would like to send a batch of images to the server. Generally, there are two approaches that we advise:

1. Use dynamic_batching. Explaining dynamic batching is out of scope for such a short answer, so I encourage you to read the very good tutorial about it in the tritonserver repo. With dynamic batching, the suggested solution is to send a single sample per request, so the code presented in the example you are referring to is sufficient.
2. Use static_batching. If you don't want to use dynamic batching, you need to assemble the batch of images manually. This example shows what you need to do; please refer to it. With this approach we recommend sending encoded JPEGs. Every encoded sample will have a different length, but all of them are 1-dimensional, so simply pad all samples with zeros to the length of the longest buffer, e.g.

batch:
sample 1: [3 4 1 3 5 0 0 0]  <-- padded three zeros
sample 2: [3 5 6 2 1 3 5 6]
sample 3: [3 4 4 6 7 4 3 0]  <-- padded one zero
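
The padding step above can be sketched in a few lines of Python. This is only a minimal illustration using the standard library; the helper name `pad_encoded_batch` is hypothetical and not part of the dali_backend example. A real client would presumably convert the padded rows into a uint8 array of shape [batch, max_len] before sending them.

```python
def pad_encoded_batch(samples):
    """Zero-pad every encoded buffer to the length of the longest one.

    `samples` is a list of bytes objects (e.g. the raw contents of JPEG files).
    Hypothetical helper, written only to illustrate the padding idea.
    """
    if not samples:
        raise ValueError("batch must contain at least one sample")
    max_len = max(len(s) for s in samples)
    # bytes.ljust appends the fill byte on the right until the target length
    return [s.ljust(max_len, b"\x00") for s in samples]


# The same three samples as in the diagram above
batch = pad_encoded_batch([
    bytes([3, 4, 1, 3, 5]),
    bytes([3, 5, 6, 2, 1, 3, 5, 6]),
    bytes([3, 4, 4, 6, 7, 4, 3]),
])
for row in batch:
    print(list(row))
```

In practice each sample would be read from disk (for instance with `open(path, "rb").read()`) rather than written out by hand.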

Please remember that sending decoded images with static batching is generally a bad idea and should be avoided. The reasons include: images of different sizes cannot be placed in a single batch (that's not supported in Triton), DALI's decoding is not leveraged, the network load increases, and many more. As I said before, we recommend using encoded JPEGs with static batching.

Please also refer to a similar question: NVIDIA/DALI#3234. Should you have any more questions, don't hesitate to ask!

Seraph1990 commented on May 30, 2024

@szalpal If I use dynamic_batching, the server itself combines the samples it receives into a batch. How does it do that? Is it similar to your padding method?
Another question: can dynamic_batching and static_batching be used together? If not, which method is recommended in terms of performance?
Thank you!

szalpal commented on May 30, 2024

@Seraph1990

> If I use dynamic_batching, the server itself combines the samples it receives into a batch. How does it do that? Is it similar to your padding method?

Yes, the server combines the batch. It's not really similar to the padding method: the server simply puts samples of the same shape together into a batch, and sends the batch for processing once it is full or the maximum delay time is exceeded.
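
To make the "same shape goes into the same batch" idea concrete, here is a toy model in Python. It is emphatically not Triton's scheduler: it ignores the queue delay, arrival-order constraints, and every other server setting, and the function name `group_by_shape` is hypothetical. It only shows that no padding is involved, because samples are never mixed across shapes.

```python
from collections import defaultdict


def group_by_shape(requests, max_batch_size):
    """Toy model: bucket requests by shape, then cut buckets into batches.

    `requests` is a list of (request_id, shape) tuples.
    Assumption for illustration only; the real server logic is more involved.
    """
    buckets = defaultdict(list)
    for request_id, shape in requests:
        buckets[tuple(shape)].append(request_id)

    batches = []
    for shape, ids in buckets.items():
        # Split each bucket into chunks of at most max_batch_size requests
        for start in range(0, len(ids), max_batch_size):
            batches.append((shape, ids[start:start + max_batch_size]))
    return batches


queue = [("a", (1024,)), ("b", (2048,)), ("c", (1024,)), ("d", (1024,))]
batches = group_by_shape(queue, max_batch_size=2)
for shape, ids in batches:
    print(shape, ids)
```

Here request "b" ends up alone because no other request shares its shape, which is why encoded images of different lengths may batch poorly unless they are padded on the client side.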

> Can dynamic_batching and static_batching be used together? If not, which method is recommended in terms of performance?

They can't be used together. Unfortunately, for the performance question the answer is "it depends". It's hard to formulate a general rule, but you can start with "static_batching is better for optimizing throughput, while dynamic_batching is better for latency".

Everything depends on your use case. E.g. if you design a conversational-AI system, the typical inference scenario is to send chunks of audio data, with less data sent at the beginning of the user's utterance, so that the server can start processing while the user is still speaking. In this case dynamic batching is the way to go. However, when you have a CCTV system that records users and identifies them with face recognition, your bigger concern is throughput, not latency, so it would be better to use static batching.

It's nearly always a matter of checking empirically which way suits you best. This section in the Triton docs shows one possible way to tune dynamic vs. static batching.

Seraph1990 commented on May 30, 2024

@szalpal Thank you very much! In addition, building on the static batching method, I found another way to load multiple encoded images: change the input type to TYPE_STRING, because the length of a TYPE_STRING element is not fixed.
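
A rough sketch of why a string input needs no padding: to my understanding, Triton serializes each element of a BYTES (TYPE_STRING) tensor as a 4-byte little-endian length followed by the raw bytes, and the client library normally does this for you. The standard-library code below only mimics that framing for illustration; both function names are hypothetical and it does not talk to a server.

```python
import struct


def serialize_bytes_elements(elements):
    """Prefix each element with its length as a 4-byte little-endian integer."""
    return b"".join(struct.pack("<I", len(e)) + e for e in elements)


def deserialize_bytes_elements(buf):
    """Inverse of serialize_bytes_elements: split the buffer back into elements."""
    elements, offset = [], 0
    while offset < len(buf):
        (length,) = struct.unpack_from("<I", buf, offset)
        offset += 4
        elements.append(buf[offset:offset + length])
        offset += length
    return elements


# Two stand-ins for encoded images of different lengths (not real JPEG data)
images = [b"\xff\xd8short", b"\xff\xd8a much longer payload"]
wire = serialize_bytes_elements(images)
restored = deserialize_bytes_elements(wire)
print(len(wire), restored == images)
```

Because every element carries its own length, buffers of different sizes can sit in one batch without zero padding.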
