Giter Club home page Giter Club logo

Comments (4)

SthPhoenix avatar SthPhoenix commented on June 15, 2024

Hi! This is a bit out of scope of this API.
What you are looking for can be achieved by running multiple FastAPI workers by changing appropriate param in deploy_trt.sh to number of workers you GPU can afford. Than you can use rest API to process data in multiple threads.

from insightface-rest.

ThiagoMateo avatar ThiagoMateo commented on June 15, 2024

hello @SthPhoenix i try to create multi thead with your tensorrt project by follow this thread.
but i got a problem when i stop threading.

PyCUDA ERROR: The context stack was not empty upon module cleanup.
-------------------------------------------------------------------
A context was still active when the context stack was being
cleaned up. At this point in our execution, CUDA may already
have been deinitialized, so there is no way we can finish
cleanly. The program will be aborted now.
Use Context.pop() to avoid this problem.

reproduce:
trt loader

class TrtModel(object):
    def __init__(self, model):
        self.cfx = None
        self.engine_file = model
        self.engine = None
        self.inputs = None
        self.outputs = None
        self.bindings = None
        self.stream = None
        self.context = None
        self.input_shapes = None
        self.out_shapes = None
        self.max_batch_size = 1

    def build(self):
        self.cfx = cuda.Device(0).make_context()
        with open(self.engine_file, 'rb') as f, trt.Runtime(TRT_LOGGER) as runtime:
            self.engine = runtime.deserialize_cuda_engine(f.read())
        self.inputs, self.outputs, self.bindings, self.stream, self.input_shapes, self.out_shapes, self.out_names, self.max_batch_size = allocate_buffers(
            self.engine)

        self.context = self.engine.create_execution_context()
        self.context.active_optimization_profile = 0

    def run(self, input, deflatten: bool = True, as_dict=False):
        threading.Thread.__init__(self)
        self.cfx.push()
        # lazy load implementation
        engine = self.engine
        bindings = self.bindings
        inputs = self.inputs
        outputs = self.outputs
        stream = self.stream
        context = self.context
        out_shapes = self.out_shapes
        out_names = self.out_names
        if engine is None:
            self.build()

        input = np.asarray(input)
        batch_size = input.shape[0]
        allocate_place = np.prod(input.shape)
        inputs[0].host[:allocate_place] = input.flatten(order='C').astype(np.float32)
        context.set_binding_shape(0, input.shape)
        trt_outputs = do_inference(
            context, bindings=bindings,
            inputs=inputs, outputs=outputs, stream=stream)
        #Reshape TRT outputs to original shape instead of flattened array
        if deflatten:
            trt_outputs = [output.reshape(shape) for output, shape in zip(trt_outputs, out_shapes)]
        if as_dict:
            return {name: trt_outputs[i] for i, name in enumerate(self.out_names)}
        # self.cfx.pop()
        return [trt_outputs[0][:batch_size]]

    def destroy(self):
        self.cfx.pop()
        del self.cfx
        del self.engine

kill thread code:

det_model = thread_buckets[thread_name]['det_model']
        rec_model = thread_buckets[thread_name]['rec_model']
        processor_thread = thread_buckets[thread_name]['processor_thread']
        processor_thread.kill()
        processor_thread.join()
        det_model.retina.model.rec_model.destroy()
        rec_model.rec_model.destroy()
        del det_model
        del rec_model
        del thread_buckets[thread_name]

from insightface-rest.

SthPhoenix avatar SthPhoenix commented on June 15, 2024

Hi, @ThiagoMateo ! I haven't checked my code in such scenario, as I said before it's a bit out of scope of this project. I might check it later, but can't give you any warranty right now.

You can check #18 for now, it seems to be connected to you problem, but I haven't figured out how to neglect GPU RAM overhead.

from insightface-rest.

SthPhoenix avatar SthPhoenix commented on June 15, 2024

Closing for now, since problem isn't related to current intended use cases.

from insightface-rest.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.