Comments (7)
Thanks, I found the solution. This is a bug in PyTorch: torch.utils.data.sampler uses the torch.randperm method, which does not work in a multiprocess environment when passed a value greater than 2**15. So I reduced the size of my dataset, and it works!
from accelerate.
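A minimal sketch of the workaround described above: cap the dataset size so the random sampler's internal torch.randperm call stays below 2**15 elements. The dataset and the exact limit here are illustrative assumptions, not taken from the original thread.

```python
import torch
from torch.utils.data import DataLoader, Subset, TensorDataset

# Illustrative dataset; the thread does not describe the real one.
full_dataset = TensorDataset(torch.arange(100_000).float())

# Keep at most 2**15 - 1 samples, since the comment above reports the hang
# when randperm is asked for more than 2**15 indices in a subprocess.
limit = 2**15 - 1
dataset = Subset(full_dataset, range(min(len(full_dataset), limit)))

# shuffle=True makes DataLoader use a RandomSampler, which calls torch.randperm.
loader = DataLoader(dataset, batch_size=64, shuffle=True, num_workers=2)
print(len(dataset))  # 32767
```

Subsetting is a stopgap rather than a fix; it simply keeps the sampler under the size where the reported hang appears.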
When I use the debug mode of PyCharm, I find that the procedure gets stuck in the dataloader loop.
You may need to set the env variable inside the training function. Can you share your full notebook code?
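A minimal sketch of the suggestion above, assuming accelerate's notebook_launcher workflow. The thread does not name the environment variable, so the one below is purely illustrative.

```python
import os

def training_function():
    # Set environment variables here, inside the function that accelerate's
    # notebook_launcher spawns, so every worker process sees them.
    # TOKENIZERS_PARALLELISM is an illustrative choice; the thread does not
    # say which variable was actually meant.
    os.environ["TOKENIZERS_PARALLELISM"] = "false"
    # ... build the model, dataloaders, and Accelerator, then train ...

# In a notebook one would launch with:
#   from accelerate import notebook_launcher
#   notebook_launcher(training_function, num_processes=2)
training_function()
print(os.environ["TOKENIZERS_PARALLELISM"])  # false
```

Setting the variable before the launch call, in the parent notebook process, may not propagate to the spawned workers, which is why it belongs inside the function.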
Thank you for your prompt reply, but it doesn't work either. And the code above is all I have; I'm trying to standardize my code with torchkeras.
Not sure if we support torchkeras models. Does isinstance(model, torch.nn.Module) return True?
Yes, isinstance(model, torch.nn.Module) returns True. And I think we can treat torchkeras as a normal training program.
Why are we combining accelerate then? I'm not sure those two can exist at the same time. This feels like a torchkeras issue more than an accelerate one.