Comments (4)
Sounds exciting! From my POV on the torch.compile team, I think we'd definitely be very interested in supporting custom passes within Inductor better. This has been something we've been particularly interested in recently, and I suspect that it should be possible (without that much work) to make your lives much easier :)
Hi, does this mean (or something like it) that we won't see fused_add_rms_norm in the model definition (we just write rms_norm and the residual add), and the graph optimization mechanism will recognize the pattern and call the fused kernel (rather than compiling it directly into assembly)?
@gx16377, we don't do this currently, but it is something we are thinking about for future iterations of the optimizer.
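To make the idea in the question concrete, here is a minimal sketch of what "recognize the pattern and call the fused kernel" could look like on a toy graph. It is an illustration only, not vLLM's or Inductor's actual pass API: the `Node` class, the `fuse_add_rms_norm` function, and the graph representation are all hypothetical, and the sketch assumes the norm weight is a graph input rather than a computed value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    op: str          # operation name, e.g. "add" or "rms_norm"
    inputs: tuple    # names of the values this node reads
    outputs: tuple   # names of the values this node produces


def fuse_add_rms_norm(graph):
    """Rewrite add -> rms_norm pairs into a single fused_add_rms_norm node.

    `graph` is a list of Node objects in topological order (toy representation).
    """
    produced = {name: node for node in graph for name in node.outputs}
    fused_for_add = {}   # id(add node) -> replacement fused node
    absorbed = set()     # id(rms_norm node) folded into a fused node

    for node in graph:
        if node.op != "rms_norm":
            continue
        hidden, weight = node.inputs
        add = produced.get(hidden)
        # Match only if the input comes from an add that is not fused yet and
        # the weight is a graph input, so it is available at the add's position.
        if (add is None or add.op != "add" or id(add) in fused_for_add
                or weight in produced):
            continue
        fused_for_add[id(add)] = Node(
            op="fused_add_rms_norm",
            inputs=add.inputs + (weight,),
            # First output is the normalized value, second is the summed residual.
            outputs=node.outputs + add.outputs,
        )
        absorbed.add(id(node))

    # The fused node takes the add's place; the matched rms_norm is dropped.
    return [fused_for_add.get(id(n), n) for n in graph if id(n) not in absorbed]


# The model author writes the unfused ops; the pass substitutes the fused one.
graph = [
    Node("add", ("x", "residual"), ("h",)),
    Node("rms_norm", ("h", "w"), ("y",)),
    Node("linear", ("y", "w_proj"), ("out",)),
]
optimized = fuse_add_rms_norm(graph)
for n in optimized:
    print(n.op, n.inputs, n.outputs)
```

The fused node keeps both output names (`y` and `h`), so later nodes that read either the normalized value or the updated residual still resolve. A real pass would also have to verify dtypes, shapes, and epsilon values before substituting a kernel.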
Sounds exciting! From my POV on the torch.compile team, I think we'd definitely be very interested in supporting custom passes within Inductor better. This has been something we've been particularly interested in recently, and I suspect that it should be possible (without that much work) to make your lives much easier :)
That is great to hear. Ultimately we want this to integrate with Inductor as natively as possible. We'd appreciate whatever help we can get there, and better support for custom passes would be phenomenal.