Comments (3)
cc: @pnunna93
from bitsandbytes.
Hi, I am sorry, but I mistakenly deleted the container I was using for these tests. In any case, I was able to trace the problem to accelerate when using device_map="auto". The same code using device_map="cuda:0" did not hang. I'm now trying to replicate the whole process with a new container.
Now I'm having issues with 8-bit support, but I'm going to post in the #538 thread about that.
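For anyone trying to reproduce the device_map comparison described above, a minimal sketch might look like the following. It assumes the model is loaded through transformers' from_pretrained with accelerate installed (the original script is not shown in this thread); load_with_device_map, devices_used, compare_device_maps and the model id are hypothetical names used only for illustration.

```python
def load_with_device_map(model_id, device_map):
    """Load a causal LM with the given device_map.

    Assumption: transformers and accelerate are installed. The import is
    done lazily so that devices_used() works without them.
    """
    from transformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(model_id, device_map=device_map)


def devices_used(hf_device_map):
    """Return the sorted, de-duplicated devices in a module -> device mapping."""
    return sorted({str(device) for device in hf_device_map.values()})


def compare_device_maps(model_id, device_maps=("auto", "cuda:0")):
    """Load the same model under each device_map and print where it was placed."""
    for device_map in device_maps:
        model = load_with_device_map(model_id, device_map)
        # hf_device_map is read defensively in case it was not recorded.
        placement = getattr(model, "hf_device_map", None)
        summary = devices_used(placement) if placement else "no device map recorded"
        print(f"device_map={device_map!r} -> {summary}")
```

Calling compare_device_maps("your-org/your-model") with a placeholder model id would print the devices each setting ended up using, which may help show whether "auto" placed any modules somewhere other than the single GPU.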
Hi @DavideRossi, could you please share the Python and HIP traces for the script?
For the Python trace, add the following before the line where the script hangs. You can stop the script once the stack trace stops changing.
import faulthandler;faulthandler.dump_traceback_later(10, repeat=True)
For the HIP trace, please run the script with AMD_LOG_LEVEL=3.
Please also share the torch version and your machine details, i.e. the outputs of 'pip show torch' and 'rocminfo'.
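As a slightly more structured variant of the faulthandler one-liner above, the sketch below wraps a suspect call so that stack dumps start just before it and are cancelled afterwards. run_with_stack_dumps and simulated_slow_step are hypothetical helpers written for this illustration; only the faulthandler calls come from the Python standard library.

```python
import faulthandler
import sys
import time


def run_with_stack_dumps(fn, interval_s=10, file=None):
    """Call fn() while dumping all thread stacks every interval_s seconds.

    The dumps go to `file` (default: sys.stderr), which must expose a real
    file descriptor. The timer is cancelled once fn() returns or raises.
    """
    target = sys.stderr if file is None else file
    faulthandler.dump_traceback_later(interval_s, repeat=True, file=target)
    try:
        return fn()
    finally:
        faulthandler.cancel_dump_traceback_later()


def simulated_slow_step(seconds=0.5):
    """Stand-in for the call that hangs (hypothetical, for demonstration)."""
    time.sleep(seconds)
    return "done"
```

For example, run_with_stack_dumps(simulated_slow_step, interval_s=0.1) would write a few stack dumps to stderr and then return. In a real script, the wrapped function would be the call that hangs, and the script could be launched with the environment variable set, e.g. AMD_LOG_LEVEL=3 python your_script.py (the script name is a placeholder).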