Comments (11)
I am finally able to import both tensorflow and torch, and both detect my RX580 as a GPU. I ran a benchmark to test TF, and it seems a bit slower than when I ran it with ROCm 3.5.1. I'm not sure whether other issues will come up when running TF or torch code, but I'm sharing my steps for setting up the environment in the hope that they help someone.
I reimaged my OS with Ubuntu 20.04 LTS and followed these steps for the setup:
sudo apt-get update
mkdir rocm5.3
cd rocm5.3/
wget https://repo.radeon.com/amdgpu-install/5.3/ubuntu/focal/amdgpu-install_5.3.50300-1_all.deb
sudo apt-get install ./amdgpu-install_5.3.50300-1_all.deb
amdgpu-install --usecase=rocm,hip,rocmdevtools,opencl,hiplibsdk,mllib,mlsdk --no-dkms
sudo usermod -a -G video $LOGNAME
sudo usermod -a -G render $LOGNAME
sudo reboot
sudo apt install python3-pip
cd rocm5.3
wget https://github.com/xuhuisheng/rocm-gfx803/releases/download/rocm530/hsa-rocr_1.7.0.50300-63.20.04_amd64.deb
wget https://github.com/xuhuisheng/rocm-gfx803/releases/download/rocm530/rocblas_2.45.0.50300-63.20.04_amd64.deb
wget https://github.com/xuhuisheng/rocm-gfx803/releases/download/rocm500/torch-1.11.0a0+git503a092-cp38-cp38-linux_x86_64.whl
wget https://github.com/xuhuisheng/rocm-gfx803/releases/download/rocm500/torchvision-0.12.0a0+2662797-cp38-cp38-linux_x86_64.whl
wget https://github.com/xuhuisheng/rocm-gfx803/releases/download/rocm500/tensorflow_rocm-2.8.0-cp38-cp38-linux_x86_64.whl
sudo dpkg -i hsa-rocr_1.7.0.50300-63.20.04_amd64.deb
sudo dpkg -i rocblas_2.45.0.50300-63.20.04_amd64.deb
pip3 install torch-1.11.0a0+git503a092-cp38-cp38-linux_x86_64.whl
pip3 install torchvision-0.12.0a0+2662797-cp38-cp38-linux_x86_64.whl
pip3 install tensorflow_rocm-2.8.0-cp38-cp38-linux_x86_64.whl
sudo apt install miopen-hip miopengemm libopenblas-dev hipfft rocrand hipsparse rocfft libopenmpi3
pip3 uninstall protobuf
pip3 install protobuf==3.19.0
sudo ln -s /opt/rocm-5.3.0/lib/libroctx64.so.4.1.0 /opt/rocm-5.3.0/lib/libroctx64.so.1
sudo ln -s /opt/rocm-5.3.0/lib/libroctracer64.so.4.1.0 /opt/rocm-5.3.0/lib/libroctracer64.so.1
export LD_LIBRARY_PATH=/opt/rocm-5.3.0/lib/
I actually added LD_LIBRARY_PATH=/opt/rocm-5.3.0/lib/ to my /etc/environment file so that I don't need to export it again every time I reboot my system.
Thanks to @xuhuisheng for providing the patches!
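To confirm the result of the steps above, a small check along these lines may help. It is only a sketch: the function name check_gpu_visibility is made up for illustration, and it uses nothing beyond torch.cuda.is_available() and tf.config.list_physical_devices('GPU'), the two calls mentioned elsewhere in this thread. A framework that fails to import is reported rather than raising.

```python
import importlib


def check_gpu_visibility():
    """Report whether torch and tensorflow can see a GPU.

    Returns a dict with one status string per framework. A framework that
    cannot be imported is reported as "not importable: <reason>".
    (Hypothetical helper, not part of either library.)
    """
    report = {}

    try:
        torch = importlib.import_module("torch")
        # ROCm builds of PyTorch expose the GPU through the torch.cuda API.
        report["torch"] = "GPU visible" if torch.cuda.is_available() else "no GPU"
    except Exception as exc:  # missing package, missing .so, protobuf mismatch, ...
        report["torch"] = f"not importable: {exc}"

    try:
        tf = importlib.import_module("tensorflow")
        gpus = tf.config.list_physical_devices("GPU")
        report["tensorflow"] = "GPU visible" if gpus else "no GPU"
    except Exception as exc:
        report["tensorflow"] = f"not importable: {exc}"

    return report


if __name__ == "__main__":
    for name, status in check_gpu_visibility().items():
        print(f"{name}: {status}")
```

Run it with the same interpreter the wheels were installed into (python3 on Ubuntu 20.04 in the steps above).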
from rocm-gfx803.
libroctx64.so and libroctracer64.so have been renamed. We can create symbolic links for them:
sudo ln -s /opt/rocm-5.3.0/lib/libroctx64.so.4.1.0 /opt/rocm-5.3.0/lib/libroctx64.so.1
sudo ln -s /opt/rocm-5.3.0/lib/libroctracer64.so.4.1.0 /opt/rocm-5.3.0/lib/libroctracer64.so.1
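The same idea can be sketched in Python so that it is safe to re-run. The helper name ensure_soname_link is hypothetical. It only creates the link when the link name does not exist yet, and it fails loudly if the library it should point to is missing. Writing under /opt/rocm-5.3.0/lib would still need root.

```python
import os


def ensure_soname_link(lib_dir, existing_name, link_name):
    """Create lib_dir/link_name -> lib_dir/existing_name if it is missing.

    Returns True if a link was created, False if link_name already exists.
    Raises FileNotFoundError if the existing library is not present.
    (Hypothetical helper for illustration.)
    """
    target = os.path.join(lib_dir, existing_name)
    link = os.path.join(lib_dir, link_name)
    if not os.path.exists(target):
        raise FileNotFoundError(target)
    if os.path.lexists(link):
        # Something (file or link) is already there; leave it alone.
        return False
    os.symlink(target, link)
    return True


# Example call, using the path and names from the commands above (run as root):
#   ensure_soname_link("/opt/rocm-5.3.0/lib", "libroctx64.so.4.1.0", "libroctx64.so.1")
#   ensure_soname_link("/opt/rocm-5.3.0/lib", "libroctracer64.so.4.1.0", "libroctracer64.so.1")
```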
from rocm-gfx803.
Thanks for the reply. I got a bit further. After removing everything with amdgpu-uninstall, I installed specific use cases without DKMS:
amdgpu-install --usecase=rocm,hip,rocmdevtools --no-dkms
Now rocminfo and clinfo print reasonable results.
Then:
- installed python3.8 since Ubuntu 22.04 has 3.10 and the *.whls are built against 3.8
- installed rocblas and the packages from this repo/README.
Ran into a missing lib error:
me@astra:~/Dev/scratch/install_amdgpu/xuhuisheng_patched$ python3.8
Python 3.8.15 (default, Oct 18 2022, 20:33:33)
[GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/me/Dev/env/lib/python3.8/site-packages/torch/__init__.py", line 199, in <module>
    from torch._C import * # noqa: F403
ImportError: libroctx64.so.1: cannot open shared object file: No such file or directory
>>>
me@astra:~/Dev/env/lib$ pip3.8 list | grep torch
torch 1.11.0a0+git503a092
torchvision 0.12.0a0+266279
me@astra:~/Dev/env/lib$ echo $LD_LIBRARY_PATH
/opt/rocm-5.3.0/lib
me@astra:~/Dev/env/lib$ ls /opt/rocm/lib | grep libroctx
libroctx64.so
libroctx64.so.4
libroctx64.so.4.1.0
So was that version of PyTorch built against a different ROCm? Is there a workaround for this?
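One way to see which shared libraries the loader can actually find, before importing torch at all, is to try opening them with ctypes. This is a rough sketch: can_load is a made-up helper name, and the library names are simply the ones that appear in the errors in this thread.

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic loader can open library_name.

    The loader searches LD_LIBRARY_PATH as it was when Python started,
    so export the variable before launching the interpreter.
    """
    try:
        ctypes.CDLL(library_name)
        return True
    except OSError:
        return False


if __name__ == "__main__":
    for name in ("libroctx64.so.1", "libroctracer64.so.1", "libMIOpen.so.1"):
        print(f"{name}: {'ok' if can_load(name) else 'MISSING'}")
```

A library reported as MISSING here would be expected to produce the same "cannot open shared object file" ImportError when torch is imported.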
from rocm-gfx803.
Just to follow up, I think I was able to get PyTorch working on Ubuntu 22.04 as well. I followed basically all of the steps that @tmpuserx summarized in their post, with the only difference being that I built and installed python3.8 manually alongside the default 3.10 that ships with Ubuntu 22.04.
I ran the introductory pytorch mnist example which seemed to run fine. I used nvtop to verify the GPU was being used.
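Since the wheels in this thread are tagged cp38, a quick check that a wheel matches the running interpreter could look like the sketch below. The function names are hypothetical, the parsing assumes the standard wheel file name layout ending in {python tag}-{abi tag}-{platform tag}.whl, and the comparison only makes sense on CPython.

```python
import sys


def wheel_python_tag(wheel_filename):
    """Extract the Python tag (e.g. "cp38") from a wheel file name.

    Assumes the name ends in ...-{python tag}-{abi tag}-{platform tag}.whl.
    """
    stem = wheel_filename
    if stem.endswith(".whl"):
        stem = stem[: -len(".whl")]
    return stem.split("-")[-3]


def interpreter_tag():
    """Tag of the running CPython interpreter, e.g. "cp38" for Python 3.8."""
    return f"cp{sys.version_info.major}{sys.version_info.minor}"


def wheel_matches_interpreter(wheel_filename):
    """True if the wheel's Python tag equals the running interpreter's tag."""
    return wheel_python_tag(wheel_filename) == interpreter_tag()


if __name__ == "__main__":
    wheel = "torch-1.11.0a0+git503a092-cp38-cp38-linux_x86_64.whl"
    print(wheel_python_tag(wheel), interpreter_tag(), wheel_matches_interpreter(wheel))
```

Run under the default python3 on Ubuntu 22.04 (3.10), the tags differ, which is why a separate python3.8 is needed for these wheels.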
from rocm-gfx803.
Please don't install amdgpu-dkms right now; just install rocm-dev and rocm-libs with the upstream amdgpu driver.
There are new bugs around the ROCm 5 driver and roct-thunk-interface that make gfx803 report no device. Older drivers, such as the upstream driver, work fine.
I will update the docs when I have a conclusion. Until then, we can use the upstream driver.
from rocm-gfx803.
I had the same problem, and those commands fixed it. But now I am getting another error when I try to import torch. Any suggestions on how to fix this? Thanks.
import torch
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/mx/.local/lib/python3.8/site-packages/torch/__init__.py", line 199, in <module>
    from torch._C import * # noqa: F403
ImportError: libMIOpen.so.1: cannot open shared object file: No such file or directory
from rocm-gfx803.
OK, I was able to fix the libMIOpen.so.1 error and several other missing .so file errors by installing some packages. Now I am able to import torch, and torch.cuda.is_available() returns True.
But I now have another issue: tf.config.list_physical_devices('GPU') doesn't return any GPU device, even though rocminfo finds my RX580. Any suggestions? Thanks.
Agent 2
Name: gfx803
Uuid: GPU-XX
Marketing Name: Radeon RX 580 Series
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 1
Device Type: GPU
Cache Info:
L1: 16(0x10) KB
Chip ID: 26591(0x67df)
ASIC Revision: 1(0x1)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 1430
BDFID: 256
Internal Node ID: 1
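When comparing what rocminfo reports with what the frameworks see, it can be handy to pull the GPU agents out of that output programmatically. The parser below is a simplified sketch built only from the "Key: value" layout shown above. The function names are hypothetical, and it keeps just the first value seen for each key within an agent.

```python
import re


def parse_rocminfo_agents(text):
    """Split rocminfo-style text into one dict of fields per "Agent N" block."""
    agents = []
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if re.fullmatch(r"Agent \d+", stripped):
            current = {}
            agents.append(current)
        elif current is not None and ":" in stripped:
            key, _, value = stripped.partition(":")
            # Keep the first occurrence so later nested fields do not overwrite it.
            current.setdefault(key.strip(), value.strip())
    return agents


def gpu_agents(text):
    """Return only the agents whose "Device Type" field is GPU."""
    return [a for a in parse_rocminfo_agents(text) if a.get("Device Type") == "GPU"]


if __name__ == "__main__":
    sample = """Agent 2
Name: gfx803
Marketing Name: Radeon RX 580 Series
Device Type: GPU
"""
    for agent in gpu_agents(sample):
        print(agent["Name"], "|", agent["Marketing Name"])
```

To use it on a live system, the text could come from running rocminfo through subprocess, assuming the binary is on PATH.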
from rocm-gfx803.
I had this problem too.
from rocm-gfx803.
@redthing1
The latest ROCm-5.4.1 should solve this issue, please have a try.
from rocm-gfx803.
@redthing1 The latest ROCm-5.4.1 should solve this issue, please have a try.
@xuhuisheng Thank you so much for your work here. It is absolutely invaluable and I am so grateful.
Now I'm able to run pytorch:
❯ python3 -c "import torch; print(torch.cuda.is_available())"
True
from rocm-gfx803.
Don't worry.
My goal is just to let us run small samples such as MNIST on gfx803. More complex workloads such as Stable Diffusion always break on gfx803.
from rocm-gfx803.