Comments (5)
@Lissanro wouldn't it be killed by PCI-E latency?
from segmoe.
I think PCI-E latency is only relevant during training (and even then it could be quite good with PCI-E 4.0 or PCI-E 5.0 and a sufficient number of lanes, or NVLink in the case of a pair of 3090 cards).
For inference, PCI-E latency should not matter much: the experts do their work independently once they're fully loaded into VRAM. This is how, for example, running Mixtral (8x7B MoE) at 4-bit or higher quantization is possible with 24GB cards - since it cannot fit in the 24GB of a single card, it gets split across more than one GPU, and speed is comparable to running on a single GPU.
Potentially, it could be even better if parallelism across multiple GPUs is implemented (for the case when one expert is fully allocated on one GPU, another expert on a different GPU, and the gate network decides it needs both). In any case, even a naive sequential implementation (processing experts one by one even if they are on different GPUs) is still better than crashing with OOM, and in terms of speed should be at least comparable to running on a single GPU with more VRAM.
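The sequential dispatch described above can be sketched in a few lines. This is a toy illustration, not segmoe's actual implementation: the `Expert` class, `gate` function, and device labels are all hypothetical stand-ins (a real gate network computes routing logits from hidden states, and moving tensors between GPUs would use `x.to(expert.device)` in PyTorch).

```python
# Minimal sketch of naive sequential MoE expert dispatch across devices.
# All names (Expert, gate, moe_forward) are illustrative, not segmoe's API.

from dataclasses import dataclass


@dataclass
class Expert:
    device: str   # where the expert's weights live, e.g. "cuda:0" (label only here)
    scale: float  # stand-in for the expert's parameters


def expert_forward(expert, x):
    # In a real MoE each expert is a full sub-network; here a toy op.
    return [v * expert.scale for v in x]


def gate(x, num_experts, top_k=2):
    # Toy deterministic gating; a real gate network is learned.
    scores = [(sum(x) * (i + 1)) % num_experts for i in range(num_experts)]
    ranked = sorted(range(num_experts), key=lambda i: -scores[i])
    return ranked[:top_k]


def moe_forward(x, experts):
    # Naive sequential dispatch: process the chosen experts one after
    # another even when they sit on different GPUs. No cross-GPU
    # parallelism, but no OOM either, as the comment above argues.
    chosen = gate(x, len(experts))
    out = [0.0] * len(x)
    for idx in chosen:
        expert = experts[idx]
        # In PyTorch this is where x would be moved: x = x.to(expert.device)
        y = expert_forward(expert, x)
        out = [a + b / len(chosen) for a, b in zip(out, y)]
    return out


# Four experts split across two (labelled) GPUs.
experts = [Expert("cuda:0", 0.5), Expert("cuda:0", 1.5),
           Expert("cuda:1", 2.0), Expert("cuda:1", 3.0)]
print(moe_forward([1.0, 2.0], experts))  # averages the top-2 experts' outputs
```

The parallel variant the comment mentions would instead launch the chosen experts concurrently when they live on different GPUs and sum the results afterwards; the routing logic stays the same.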
Thanks for the suggestion, we are working on optimizing the memory usage, but feel free to create a PR for Multi-GPU usage.
@Warlord-K Hi Admin, would it be possible for the homepage README file to list the GPU requirements or specifications?
@g29times I have added the GPU requirements, thanks for the suggestion!
Related Issues (20)
- Minor mistake in readme HOT 1
- Thank you! + model suggestion HOT 2
- TypeError: SparseMoeBlock.forward() missing 1 required positional argument: 'scale' HOT 5
- Issue with Civitai downloads HOT 2
- Any benefit to implementing this with lycoris/lora instead of full models? HOT 2
- Support local safetensors file HOT 1
- Support Colab and Local Storage HOT 1
- TypeError: no_grad.__init__() on import HOT 3
- positive and negative keywords in .yaml files HOT 2
- Is torch 2.0 mandatory? HOT 2
- [feature] Support StableDiffusionImg2ImgPipeline HOT 1
- Does this work for Stable Cascade? HOT 1
- 77 token limit HOT 5
- Why using negative prompt hidden states as gate weight? HOT 2
- Could you explain the effect of Pos and Neg prompts of each experts? HOT 2
- How to choose the positive prompt and negative prompt? HOT 2
- How to finetune the segmoe and train lora HOT 1
- Got noise image sample
- MoE in the attn heads
- Can you support SD3? HOT 5