System Info An AMD Epyc system with 3 MI210. Quite a complex s

load_in_8bit hangs on ROCm (bitsandbytes issue, open, 3 comments)

DavideRossi commented on June 26, 2024 1

load_in_8bit hangs on ROCm

from bitsandbytes.

Comments (3)

matthewdouglas commented on June 26, 2024 1

cc: @pnunna93

DavideRossi commented on June 26, 2024 1

Hi, I'm sorry, but I mistakenly deleted the container I was using for these tests. In any case, I was able to trace the problem to accelerate when using device_map="auto": the same code with device_map="cuda:0" did not hang. I'm now trying to reproduce the whole process in a new container.
I'm now also having issues with 8-bit support, but I'll post about that in the #538 thread.
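
For readers trying the workaround described above, the following is a minimal sketch of loading a model in 8-bit while pinning it to a single device instead of letting accelerate place the layers. It is not the script from this issue: the helper name load_8bit and the constant DEVICE_MAP_WORKAROUND are made up for illustration, the model ID is whatever you pass in, and it assumes a transformers version that accepts quantization_config=BitsAndBytesConfig(load_in_8bit=True). On ROCm builds of PyTorch the GPU is still addressed as "cuda:0".

```python
# Assumption: "cuda:0" is the device that did not hang in the report above;
# device_map="auto" is the setting that did.
DEVICE_MAP_WORKAROUND = "cuda:0"


def load_8bit(model_id, device_map=DEVICE_MAP_WORKAROUND):
    """Load a causal LM in 8-bit on one explicit device (illustrative helper)."""
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    return AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=BitsAndBytesConfig(load_in_8bit=True),
        device_map=device_map,  # pass "auto" here to reproduce the reported hang
    )
```

Whether this avoids the hang on a given ROCm setup is not confirmed by the thread beyond the single report above.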

pnunna93 commented on June 26, 2024

Hi @DavideRossi, could you please share Python and HIP traces for the script?

For the Python trace, you can add this line before the line where the script hangs. You can stop the script once the stack trace no longer changes.
import faulthandler;faulthandler.dump_traceback_later(10, repeat=True)
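
The same one-liner can be wrapped so that the periodic dump is armed only around the suspect call and cancelled afterwards. The sketch below uses only the standard library; the context manager name hang_watchdog is made up for illustration, and the call inside the with block stands in for whatever line hangs in your script.

```python
import faulthandler
import sys
from contextlib import contextmanager


@contextmanager
def hang_watchdog(interval_s=10.0, file=None):
    """Dump the stacks of all threads every interval_s seconds until the block exits."""
    target = sys.stderr if file is None else file
    # repeat=True keeps dumping, so successive traces can be compared.
    faulthandler.dump_traceback_later(interval_s, repeat=True, file=target)
    try:
        yield
    finally:
        # Stop the timer once the guarded call returns or raises.
        faulthandler.cancel_dump_traceback_later()


# Usage (hypothetical): wrap the call that appears to hang.
# with hang_watchdog(10):
#     model = load_model_somehow()
```

If the dumped trace is identical across several intervals, the frame at the top is where the script is stuck.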

For the HIP trace, please run the script with AMD_LOG_LEVEL=3.
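
One way to capture that trace to a file is to launch the script from a small wrapper that sets the variable and redirects stderr. This is a sketch under two assumptions: the HIP runtime writes its log to stderr, and run_with_hip_trace plus the file names in the usage comment are placeholders, not names from this issue.

```python
import os
import subprocess
import sys


def run_with_hip_trace(cmd, log_path, level="3"):
    """Run cmd with AMD_LOG_LEVEL set and save its stderr to log_path."""
    env = dict(os.environ, AMD_LOG_LEVEL=level)
    with open(log_path, "w") as log:
        # stdout is left attached to the terminal; only stderr goes to the log.
        return subprocess.run(cmd, env=env, stderr=log).returncode


# Usage (hypothetical script name):
# run_with_hip_trace([sys.executable, "repro.py"], "hip_trace.log")
```

Because a hanging script never returns, interrupt it with Ctrl+C once the log stops growing; the file keeps what was written up to that point.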

Please also share the torch version and your machine details, i.e. the outputs of 'pip show torch' and 'rocminfo'.
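
To gather both outputs in one paste, a short helper along these lines may be convenient. It is only a sketch: capture and environment_report are illustrative names, and it assumes rocminfo is on PATH (it reports a note instead of failing when it is not).

```python
import shutil
import subprocess
import sys


def capture(cmd):
    """Return the combined output of cmd, or a note if the executable is missing."""
    if shutil.which(cmd[0]) is None:
        return f"{cmd[0]}: not found on PATH"
    result = subprocess.run(cmd, capture_output=True, text=True)
    return (result.stdout + result.stderr).strip()


def environment_report():
    """Collect the two outputs requested above into one text block."""
    sections = {
        "pip show torch": capture([sys.executable, "-m", "pip", "show", "torch"]),
        "rocminfo": capture(["rocminfo"]),
    }
    return "\n\n".join(f"$ {name}\n{out}" for name, out in sections.items())


# Usage: print(environment_report())
```

Running pip through sys.executable ensures the report describes the same interpreter that runs the hanging script.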
