Comments (8)
I think adding A10 support will be straightforward; we only need to extend the profiling range a little. We will add A10 support this week.
from aitemplate.
I enabled A10 (based on an educated guess of how it will output) in this PR: #18
I think getting better performance on A10 will require a little more extension work.
+1 on @MatthieuTPHR's request. A10Gs are widely used for inference and are cheaper than A100s. Can you elaborate a bit more on what extension is needed?
I have enabled A10. Because we don't have access to an A10, I can't guess any further about performance optimization for it. It should be OK.
Hello, thanks for this great project.
Following this request, it would be amazing to have support for NVIDIA's latest generation of inference GPUs: the A10G.
They are roughly 2-3x faster than the T4 and very cheap compared to A100s.
On another topic, if we wanted to add support ourselves for this GPU type, or for any future NVIDIA GPU, what would be the process?
Hey @MatthieuTPHR, have you tried the VoltaML Stable Diffusion library? They support T4 and A10 acceleration, and they claim to have the fastest inference speed for now.
Hello @harishprabhala, looking at their metrics, we get faster inference using TensorRT.
TensorRT recently added support for Flash Attention here.
With it we get 27 it/s on an A10G, compared to the 17 it/s shown in VoltaML's GitHub repo.
On the A100, we get 36 it/s with PyTorch and xformers too. I haven't benchmarked the TRT model on the A100 yet, though.
Woah didn't know that. Will look into it. Thanks.
Looks like TensorRT's Flash Attention doesn't support GPUs with SM < 80. So, technically, for NVIDIA T4, V100, etc., VoltaML could be the fastest.
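For anyone wanting to check which of the GPUs mentioned in this thread clear that SM threshold, here is a minimal sketch. The compute capabilities in the table are NVIDIA's published values; the SM >= 80 cutoff is taken from the comment above and is not verified here, and the helper names `sm_version` and `meets_min_sm` are made up for illustration.

```python
# Published CUDA compute capabilities (major, minor) for the GPUs
# named in this thread.
COMPUTE_CAPABILITY = {
    "V100": (7, 0),
    "T4": (7, 5),
    "A100": (8, 0),
    "A10": (8, 6),
    "A10G": (8, 6),
}


def sm_version(gpu_name: str) -> int:
    """Return the SM version as an integer, e.g. (8, 6) -> 86."""
    major, minor = COMPUTE_CAPABILITY[gpu_name]
    return major * 10 + minor


def meets_min_sm(gpu_name: str, min_sm: int = 80) -> bool:
    """Hypothetical helper: True if the GPU's SM version is at least min_sm.

    The default of 80 is an assumption based on the comment above.
    """
    return sm_version(gpu_name) >= min_sm


if __name__ == "__main__":
    for name in COMPUTE_CAPABILITY:
        print(f"{name}: SM{sm_version(name)}, meets SM>=80: {meets_min_sm(name)}")
```

On a machine with PyTorch and a GPU, `torch.cuda.get_device_capability()` returns the same `(major, minor)` pair for the installed device, so the lookup table could be replaced with that call.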