hi <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="

hello <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-ur

Hi, <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url=

Multiple Cuda Context and multi threading about insightface-rest HOT 4 CLOSED

bhargavravat commented on June 15, 2024

Multiple Cuda Context and multi threading

from insightface-rest.

Comments (4)

SthPhoenix commented on June 15, 2024

Hi! This is a bit out of scope of this API.
What you are looking for can be achieved by running multiple FastAPI workers by changing appropriate param in deploy_trt.sh to number of workers you GPU can afford. Than you can use rest API to process data in multiple threads.

from insightface-rest.

ThiagoMateo commented on June 15, 2024

hello @SthPhoenix i try to create multi thead with your tensorrt project by follow this thread.
but i got a problem when i stop threading.

PyCUDA ERROR: The context stack was not empty upon module cleanup.
-------------------------------------------------------------------
A context was still active when the context stack was being
cleaned up. At this point in our execution, CUDA may already
have been deinitialized, so there is no way we can finish
cleanly. The program will be aborted now.
Use Context.pop() to avoid this problem.

reproduce:
trt loader

class TrtModel(object):
    def __init__(self, model):
        self.cfx = None
        self.engine_file = model
        self.engine = None
        self.inputs = None
        self.outputs = None
        self.bindings = None
        self.stream = None
        self.context = None
        self.input_shapes = None
        self.out_shapes = None
        self.max_batch_size = 1

    def build(self):
        self.cfx = cuda.Device(0).make_context()
        with open(self.engine_file, 'rb') as f, trt.Runtime(TRT_LOGGER) as runtime:
            self.engine = runtime.deserialize_cuda_engine(f.read())
        self.inputs, self.outputs, self.bindings, self.stream, self.input_shapes, self.out_shapes, self.out_names, self.max_batch_size = allocate_buffers(
            self.engine)

        self.context = self.engine.create_execution_context()
        self.context.active_optimization_profile = 0

    def run(self, input, deflatten: bool = True, as_dict=False):
        threading.Thread.__init__(self)
        self.cfx.push()
        # lazy load implementation
        engine = self.engine
        bindings = self.bindings
        inputs = self.inputs
        outputs = self.outputs
        stream = self.stream
        context = self.context
        out_shapes = self.out_shapes
        out_names = self.out_names
        if engine is None:
            self.build()

        input = np.asarray(input)
        batch_size = input.shape[0]
        allocate_place = np.prod(input.shape)
        inputs[0].host[:allocate_place] = input.flatten(order='C').astype(np.float32)
        context.set_binding_shape(0, input.shape)
        trt_outputs = do_inference(
            context, bindings=bindings,
            inputs=inputs, outputs=outputs, stream=stream)
        #Reshape TRT outputs to original shape instead of flattened array
        if deflatten:
            trt_outputs = [output.reshape(shape) for output, shape in zip(trt_outputs, out_shapes)]
        if as_dict:
            return {name: trt_outputs[i] for i, name in enumerate(self.out_names)}
        # self.cfx.pop()
        return [trt_outputs[0][:batch_size]]

    def destroy(self):
        self.cfx.pop()
        del self.cfx
        del self.engine

kill thread code:

det_model = thread_buckets[thread_name]['det_model']
        rec_model = thread_buckets[thread_name]['rec_model']
        processor_thread = thread_buckets[thread_name]['processor_thread']
        processor_thread.kill()
        processor_thread.join()
        det_model.retina.model.rec_model.destroy()
        rec_model.rec_model.destroy()
        del det_model
        del rec_model
        del thread_buckets[thread_name]

from insightface-rest.

SthPhoenix commented on June 15, 2024

Hi, @ThiagoMateo ! I haven't checked my code in such scenario, as I said before it's a bit out of scope of this project. I might check it later, but can't give you any warranty right now.

You can check #18 for now, it seems to be connected to you problem, but I haven't figured out how to neglect GPU RAM overhead.

from insightface-rest.

SthPhoenix commented on June 15, 2024

Closing for now, since problem isn't related to current intended use cases.

from insightface-rest.

Multiple Cuda Context and multi threading about insightface-rest HOT 4 CLOSED

Comments (4)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent