
Comments (8)

antinucleon commented on August 16, 2024

I think it will be straightforward to add A10 support; we only need to extend the profiling range a little bit. We will add A10 support this week.

from aitemplate.
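To make "extend the profiling range" concrete, here is a minimal, purely illustrative sketch (not AITemplate's actual code) of how a kernel profiler might enumerate target architectures, so that adding A10 amounts to registering its SM version (sm_86) alongside the A100's sm_80. All names here are hypothetical:

```python
# Hypothetical sketch: a profiler keeps a registry of target SM architectures
# and benchmarks kernel configs for each one. Extending the "profiling range"
# to A10 then means registering sm_86. Illustrative only.

SUPPORTED_ARCHS = {"80": "A100"}  # before the change: sm_80 only

def register_arch(sm: str, name: str) -> None:
    """Add a new target architecture to the profiling registry."""
    SUPPORTED_ARCHS[sm] = name

def profiling_targets() -> list:
    """Return the SM versions the profiler will compile and benchmark for."""
    return sorted(SUPPORTED_ARCHS)

register_arch("86", "A10")  # the proposed extension
print(profiling_targets())  # ['80', '86']
```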

antinucleon commented on August 16, 2024

I enabled A10 support (based on an educated guess about how it should behave) in this PR: #18

To get better performance on A10, though, a bit more extension work is needed.


Purvak-L commented on August 16, 2024

+1 on @MatthieuTPHR's request. A10G GPUs are widely used for inference and are cheaper than A100s. Can you elaborate a bit more on what extension is needed?


antinucleon commented on August 16, 2024

I have enabled A10. Because we don't have access to an A10, I can't say more about performance optimization for it, but it should be OK.


harishprabhala commented on August 16, 2024

> Hello, thanks for this great project.
>
> Following this request, it would be amazing to have support for NVIDIA's latest generation of inference GPUs: the A10G.
>
> They are roughly 2-3x faster than the T4 and very cheap compared to A100s.
>
> On another topic, if we wanted to add support ourselves for this GPU type, or any future GPU from NVIDIA, what would be the process?

Hey @MatthieuTPHR, have you tried the VoltaML stable diffusion library? They support T4 and A10 acceleration, and they claim to have the fastest inference speed for now.


MatthieuToulemont commented on August 16, 2024

Hello @harishprabhala, looking at their metrics, we get faster inference using TensorRT.

TensorRT recently added support for Flash Attention here.

With it, we get 27 it/s on an A10G, compared to the 17 it/s shown in VoltaML's GitHub repo.

On the A100, we get 36 it/s with PyTorch and xFormers too. I haven't benchmarked the TRT model on the A100 yet, though.

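For reference, the relative speedup implied by the numbers quoted above can be computed directly:

```python
# Iterations/second reported in this thread for the A10G.
tensorrt_its = 27.0  # TensorRT with Flash Attention
voltaml_its = 17.0   # figure shown in VoltaML's GitHub repo

speedup = tensorrt_its / voltaml_its
print(f"TensorRT is ~{speedup:.2f}x faster on the A10G")  # ~1.59x
```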

harishprabhala commented on August 16, 2024

Woah didn't know that. Will look into it. Thanks.


harishprabhala commented on August 16, 2024

> Hello @harishprabhala, looking at their metrics, we get faster inference using TensorRT.
>
> TensorRT recently added support for Flash Attention here.
>
> With it, we get 27 it/s on an A10G, compared to the 17 it/s shown in VoltaML's GitHub repo.
>
> On the A100, we get 36 it/s with PyTorch and xFormers too. I haven't benchmarked the TRT model on the A100 yet, though.

Looks like they don't support GPUs with SM < 80. So, technically, for NVIDIA T4, V100, etc., VoltaML could be the fastest.

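The SM < 80 cutoff refers to the GPU's CUDA compute capability: V100 is sm_70, T4 is sm_75, A100 is sm_80, and A10/A10G are sm_86. A minimal sketch of such a gate, using a hardcoded lookup table rather than any project's real API (in real code you would query the driver at runtime, e.g. `torch.cuda.get_device_capability()` in PyTorch), might look like:

```python
# Hypothetical helper: gate Flash Attention kernels on compute capability.
# The SM versions below are the well-known values for these GPUs.
KNOWN_SM = {
    "V100": 70,
    "T4": 75,
    "A100": 80,
    "A10": 86,
    "A10G": 86,
}

def supports_flash_attention(gpu_name: str) -> bool:
    """The kernels discussed in this thread require SM >= 80 (Ampere+)."""
    sm = KNOWN_SM.get(gpu_name)
    if sm is None:
        raise ValueError(f"unknown GPU: {gpu_name}")
    return sm >= 80

print(supports_flash_attention("A10G"))  # True
print(supports_flash_attention("T4"))    # False
```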
