I have an ensemble model (DALI + detection model), and I found that the internal output of DALI is

With a Triton ensemble model, how to make the DALI internal output reside in GPU memory? (triton-inference-server/dali_backend)

Comments (5)

szalpal commented on May 30, 2024

Thank you for pointing that out! Could you tell me which version of dali_backend you are running? #16 fixed issues with choosing the device; this PR will be available in tritonserver-21.02. If you'd like to use the main branch of dali_backend, please follow the Docker build instructions.

If you are using the main branch and still observe CPU allocation of the DALI output, could you run the server with --log_verbose=2 and check the instance groups in the full DALI model configuration that gets logged? Here's what it should look like:

I0208 23:00:49.566997 1 dali_backend.cc:71] Loading DALI pipeline from file /models/dali/1/model.dali
I0208 23:00:49.567063 1 dali_backend.cc:44] model configuration:
{
    "name": "dali",
    "platform": "",
    "backend": "dali",
[...]
    "instance_group": [
        {
            "name": "dali",
            "kind": "KIND_GPU",
            "count": 1,
            "gpus": [
                0
            ],
            "profile": []
        }
    ],
[...]
}
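One way to run that check on the logged configuration is a small Python sketch. It assumes the JSON has been copied out of the log with the `[...]` elisions removed; `gpu_instance_groups_only` is a hypothetical helper for illustration, not part of Triton or dali_backend.

```python
import json


def gpu_instance_groups_only(config_json: str) -> bool:
    """Return True if every instance group in a logged Triton model
    configuration is placed on the GPU (kind == "KIND_GPU")."""
    config = json.loads(config_json)
    groups = config.get("instance_group", [])
    # With no instance groups listed, GPU placement cannot be confirmed,
    # so the helper reports False in that case.
    return bool(groups) and all(g.get("kind") == "KIND_GPU" for g in groups)


# Excerpt of a logged configuration (hypothetical, trimmed to the relevant keys).
logged = """
{
    "name": "dali",
    "backend": "dali",
    "instance_group": [
        {"name": "dali", "kind": "KIND_GPU", "count": 1, "gpus": [0]}
    ]
}
"""
print(gpu_instance_groups_only(logged))  # True
```

Note that this only confirms where the model instance runs; where its outputs are allocated is reported separately in the ensemble scheduler log lines.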

Anyway, your proposal to change the default requested memory type is good. It's available in #21.

uefall commented on May 30, 2024

Sorry for my late reply. I am using this version:

commit 076e98841c976e0f6c55fc360431cfb5bfd7f485 (HEAD -> main, origin/r21.02)
Author: Michał <[email protected]>
Date:   Tue Jan 26 02:43:57 2021 +0100

The --log_verbose=2 output shows:

I0220 02:03:17.687831 1 dali_backend.cc:71] Loading DALI pipeline from file /models/dali_ctdet/1/model.dali
I0220 02:03:17.687940 1 dali_backend.cc:44] model configuration:
{
    "name": "dali_ctdet",
    "platform": "",
    "backend": "dali",
    "version_policy": {
        "latest": {
            "num_versions": 1
        }
    },
    "max_batch_size": 128,
    "input": [
        {
            "name": "DALI_INPUT_0",
            "data_type": "TYPE_UINT8",
            "format": "FORMAT_NONE",
            "dims": [
                -1
            ],
            "is_shape_tensor": false,
            "allow_ragged_batch": false
        }
    ],
    "output": [
        {
            "name": "DALI_OUTPUT_0",
            "data_type": "TYPE_FP32",
            "dims": [
                3,
                512,
                320
            ],
            "label_filename": "",
            "is_shape_tensor": false
        },
        {
            "name": "DALI_OUTPUT_1",
            "data_type": "TYPE_INT64",
            "dims": [
                3
            ],
            "label_filename": "",
            "is_shape_tensor": false
        }
    ],
    "batch_input": [],
    "batch_output": [],
    "optimization": {
        "priority": "PRIORITY_DEFAULT",
        "input_pinned_memory": {
            "enable": true
        },
        "output_pinned_memory": {
            "enable": true
        }
    },
    "instance_group": [
        {
            "name": "dali_ctdet_0",
            "kind": "KIND_GPU",
            "count": 1,
            "gpus": [
                0
            ],
            "profile": []
        }
    ],
    "default_model_filename": "",
    "cc_model_filenames": {},
    "metric_tags": {},
    "parameters": {},
    "model_warmup": []
}
I0220 02:03:17.688125 1 dali_backend.cc:348] TRITONBACKEND_ModelInstanceInitialize: dali_ctdet_0 (GPU device 0)

and the DALI outputs are still allocated with memory types 1 and 2:

I0220 02:08:46.983816 1 infer_response.cc:165] add response output: output: DALI_OUTPUT_0, type: FP32, shape: [64,3,512,320]
I0220 02:08:46.983851 1 pinned_memory_manager.cc:131] pinned memory allocation: size 125829120, addr 0x7feb2e000090
I0220 02:08:46.983855 1 ensemble_scheduler.cc:509] Internal response allocation: DALI_OUTPUT_0, size 125829120, addr 0x7feb2e000090, memory type 1, type id 0
I0220 02:08:46.983860 1 infer_response.cc:165] add response output: output: DALI_OUTPUT_1, type: INT64, shape: [64,3]
I0220 02:08:46.983864 1 ensemble_scheduler.cc:509] Internal response allocation: DALI_OUTPUT_1, size 1536, addr 0x7feb2a000000, memory type 2, type id 0
I0220 02:08:46.983931 1 ensemble_scheduler.cc:524] Internal response release: size 125829120, addr 0x7feb2e000090
I0220 02:08:46.983936 1 ensemble_scheduler.cc:524] Internal response release: size 1536, addr 0x7feb2a000000
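The `memory type` numbers in these lines correspond to Triton's `TRITONSERVER_MemoryType` enum (0 = CPU, 1 = pinned CPU, 2 = GPU), so DALI_OUTPUT_0 landed in pinned host memory while DALI_OUTPUT_1 landed on the GPU. A small sketch that decodes such lines follows; the regular expression is written against the log format shown here and is an assumption, not a stable Triton interface, and `decode_allocations` is a hypothetical helper.

```python
import re

# Values of TRITONSERVER_MemoryType as declared in tritonserver.h.
MEMORY_TYPES = {0: "CPU", 1: "CPU_PINNED", 2: "GPU"}

# Matches the "Internal response allocation" lines printed at --log_verbose=2
# (format copied from the log excerpt; it may differ between Triton versions).
ALLOC_RE = re.compile(
    r"Internal response allocation: (?P<name>\w+), size (?P<size>\d+), "
    r"addr (?P<addr>0x[0-9a-f]+), memory type (?P<mtype>\d+)"
)


def decode_allocations(log_text: str):
    """Yield (output name, size in bytes, memory type name) per allocation line."""
    for m in ALLOC_RE.finditer(log_text):
        mtype = int(m.group("mtype"))
        yield m.group("name"), int(m.group("size")), MEMORY_TYPES.get(mtype, f"unknown({mtype})")


log = """
I0220 02:08:46.983855 1 ensemble_scheduler.cc:509] Internal response allocation: DALI_OUTPUT_0, size 125829120, addr 0x7feb2e000090, memory type 1, type id 0
I0220 02:08:46.983864 1 ensemble_scheduler.cc:509] Internal response allocation: DALI_OUTPUT_1, size 1536, addr 0x7feb2a000000, memory type 2, type id 0
"""
for name, size, where in decode_allocations(log):
    print(f"{name}: {size} bytes -> {where}")
# DALI_OUTPUT_0: 125829120 bytes -> CPU_PINNED
# DALI_OUTPUT_1: 1536 bytes -> GPU
```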

I will update to the latest version and try again. Thank you.

uefall commented on May 30, 2024

I updated the code and tested again, and got the same result.
DALI_OUTPUT_0 is the decoded image data and DALI_OUTPUT_1 is the image shape.
Why is DALI_OUTPUT_1 placed on the GPU while DALI_OUTPUT_0 still remains on the CPU?

I removed the image shape output and left only the image data output; it is still memory type 1:

I0220 06:30:07.825872 1 pinned_memory_manager.cc:131] pinned memory allocation: size 125829120, addr 0x7f664e000090
I0220 06:30:07.825877 1 ensemble_scheduler.cc:509] Internal response allocation: DALI_OUTPUT_0, size 125829120, addr 0x7f664e000090, memory type 1, type id 0
I0220 06:30:07.825922 1 ensemble_scheduler.cc:524] Internal response release: size 125829120, addr 0x7f664e000090
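The allocation sizes in these logs follow from the output shapes in the model configuration: batch × dims × element size (4 bytes for TYPE_FP32, 8 bytes for TYPE_INT64). Working it out shows how lopsided the two outputs are, 120 MiB for the image batch against 1536 bytes for the shapes, which is a plausible reason only the small one was served from GPU memory. `output_bytes` is a hypothetical helper for illustration.

```python
from math import prod

# Bytes per element for the Triton data types used by the DALI outputs.
DTYPE_BYTES = {"TYPE_FP32": 4, "TYPE_INT64": 8}


def output_bytes(batch_size: int, dims: list[int], data_type: str) -> int:
    """Size of one batched output tensor: batch * prod(dims) * element size."""
    return batch_size * prod(dims) * DTYPE_BYTES[data_type]


image = output_bytes(64, [3, 512, 320], "TYPE_FP32")  # DALI_OUTPUT_0 at batch 64
shape = output_bytes(64, [3], "TYPE_INT64")           # DALI_OUTPUT_1 at batch 64
print(image, image / 2**20)  # 125829120 120.0
print(shape)                 # 1536
```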

uefall commented on May 30, 2024

Tested OK with the latest version; maybe I forgot to change the container last time.
I will close this issue.
@szalpal, thank you!

uefall commented on May 30, 2024

The problem was that I used a large batch size for the perf test, and DALI_OUTPUT exceeded the CUDA memory limit:

W0222 10:25:26.739351 1 memory.cc:135] Failed to allocate CUDA memory with byte size 251658240 on GPU 0: CNMEM_STATUS_OUT_OF_MEMORY, falling back to pinned system memory

This message is shown only once, and I missed it.
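The warning describes the fallback: when the CUDA memory pool cannot serve a request, the output is allocated in pinned system memory instead, which shows up as memory type 1. Below is a simplified, hypothetical model of that decision together with the pool size a batch of 128 would need (251658240 bytes is exactly 128 × 3 × 512 × 320 × 4). The pool capacities and helper names are assumptions for illustration, and the real allocator is more involved. On the server side, the pool is sized with tritonserver's `--cuda-memory-pool-byte-size=<device id>:<bytes>` option.

```python
# Simplified, hypothetical model of the fallback reported by the warning:
# try the fixed-size CUDA pool first, otherwise use pinned host memory.


def choose_memory(request_bytes: int, pool_free_bytes: int) -> str:
    """Return where a request of the given size would land in this model."""
    return "GPU" if request_bytes <= pool_free_bytes else "CPU_PINNED"


def pool_bytes_needed(batch_size: int, per_sample_bytes: int) -> int:
    """Bytes needed for one batched output of the given per-sample size."""
    return batch_size * per_sample_bytes


PER_SAMPLE = 3 * 512 * 320 * 4  # one FP32 image of shape [3, 512, 320]
request = pool_bytes_needed(128, PER_SAMPLE)
print(request)                              # 251658240
print(choose_memory(request, 64 * 2**20))   # CPU_PINNED (hypothetical 64 MiB pool)
print(choose_memory(request, 512 * 2**20))  # GPU (hypothetical 512 MiB pool)
```

In practice the pool is shared with other GPU allocations in the ensemble, so a real setting would need headroom beyond this single-output estimate.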
