Comments (11)

tmpuserx avatar tmpuserx commented on August 28, 2024 7

Finally I am able to import tensorflow and torch, and both detect my RX580 as a GPU. I ran a benchmark to test TF, and it seems a bit slower than the one I ran with ROCm 3.5.1. I am not sure whether other issues will occur when running TF or torch code, but I am sharing my setup steps here in the hope that they help someone.

I reimaged my OS with Ubuntu 20.04 LTS and followed these steps for the setup:

sudo apt-get update
mkdir rocm5.3
cd rocm5.3/
wget https://repo.radeon.com/amdgpu-install/5.3/ubuntu/focal/amdgpu-install_5.3.50300-1_all.deb
sudo apt-get install ./amdgpu-install_5.3.50300-1_all.deb
amdgpu-install --usecase=rocm,hip,rocmdevtools,opencl,hiplibsdk,mllib,mlsdk --no-dkms
sudo usermod -a -G video $LOGNAME
sudo usermod -a -G render $LOGNAME
sudo reboot

sudo apt install python3-pip
cd rocm5.3
wget https://github.com/xuhuisheng/rocm-gfx803/releases/download/rocm530/hsa-rocr_1.7.0.50300-63.20.04_amd64.deb
wget https://github.com/xuhuisheng/rocm-gfx803/releases/download/rocm530/rocblas_2.45.0.50300-63.20.04_amd64.deb
wget https://github.com/xuhuisheng/rocm-gfx803/releases/download/rocm500/torch-1.11.0a0+git503a092-cp38-cp38-linux_x86_64.whl
wget https://github.com/xuhuisheng/rocm-gfx803/releases/download/rocm500/torchvision-0.12.0a0+2662797-cp38-cp38-linux_x86_64.whl
wget https://github.com/xuhuisheng/rocm-gfx803/releases/download/rocm500/tensorflow_rocm-2.8.0-cp38-cp38-linux_x86_64.whl

sudo dpkg -i hsa-rocr_1.7.0.50300-63.20.04_amd64.deb
sudo dpkg -i rocblas_2.45.0.50300-63.20.04_amd64.deb
pip3 install torch-1.11.0a0+git503a092-cp38-cp38-linux_x86_64.whl
pip3 install torchvision-0.12.0a0+2662797-cp38-cp38-linux_x86_64.whl
pip3 install tensorflow_rocm-2.8.0-cp38-cp38-linux_x86_64.whl

sudo apt install miopen-hip miopengemm libopenblas-dev hipfft rocrand hipsparse rocfft libopenmpi3

pip3 uninstall protobuf
pip3 install protobuf==3.19.0

sudo ln -s /opt/rocm-5.3.0/lib/libroctx64.so.4.1.0 /opt/rocm-5.3.0/lib/libroctx64.so.1
sudo ln -s /opt/rocm-5.3.0/lib/libroctracer64.so.4.1.0 /opt/rocm-5.3.0/lib/libroctracer64.so.1

export LD_LIBRARY_PATH=/opt/rocm-5.3.0/lib/

I actually added LD_LIBRARY_PATH=/opt/rocm-5.3.0/lib/ to my /etc/environment file so that I don't need to export it again every time I reboot my system.
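To double-check that the /etc/environment entry is picked up as intended, the file can be parsed and the variable inspected. This is only a sketch: it assumes the file contains plain KEY=VALUE lines, and the helper names parse_environment and has_lib_dir are made up for illustration.

```python
import os


def parse_environment(text):
    """Parse simple KEY=VALUE lines, as commonly found in /etc/environment."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env


def has_lib_dir(env, lib_dir, var="LD_LIBRARY_PATH"):
    """Return True if lib_dir is one of the colon-separated entries of var."""
    entries = [e.rstrip("/") for e in env.get(var, "").split(":") if e]
    return lib_dir.rstrip("/") in entries


if __name__ == "__main__":
    path = "/etc/environment"
    if os.path.exists(path):
        with open(path) as fh:
            env = parse_environment(fh.read())
        # The ROCm path below is the one used in the steps above.
        print("ROCm lib dir configured:", has_lib_dir(env, "/opt/rocm-5.3.0/lib"))
```

A trailing slash on the path is ignored by the comparison, so /opt/rocm-5.3.0/lib/ and /opt/rocm-5.3.0/lib are treated as the same entry.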

Thanks to @xuhuisheng for providing the patches!

from rocm-gfx803.

xuhuisheng avatar xuhuisheng commented on August 28, 2024 2

The libroctx64.so and libroctracer64.so libraries have been renamed. We can create symbolic links for them:

sudo ln -s /opt/rocm-5.3.0/lib/libroctx64.so.4.1.0 /opt/rocm-5.3.0/lib/libroctx64.so.1
sudo ln -s /opt/rocm-5.3.0/lib/libroctracer64.so.4.1.0 /opt/rocm-5.3.0/lib/libroctracer64.so.1
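If the two ln commands are run more than once, the second run fails because the links already exist. A small Python sketch of the same step that can be re-run safely might look like this. The function name ensure_symlink is hypothetical, the paths are the ones from the commands above, and writing to /opt still needs root.

```python
import os


def ensure_symlink(target, link):
    """Create link -> target unless link already exists.

    Returns "created", "exists", or "missing-target".
    """
    if not os.path.exists(target):
        return "missing-target"
    if os.path.lexists(link):
        return "exists"
    os.symlink(target, link)
    return "created"


if __name__ == "__main__":
    lib_dir = "/opt/rocm-5.3.0/lib"  # path taken from the commands above
    for old, new in [
        ("libroctx64.so.4.1.0", "libroctx64.so.1"),
        ("libroctracer64.so.4.1.0", "libroctracer64.so.1"),
    ]:
        try:
            status = ensure_symlink(os.path.join(lib_dir, old),
                                    os.path.join(lib_dir, new))
        except PermissionError:
            status = "permission denied (run as root)"
        print(new, "->", old, ":", status)
```

The script leaves an existing link or file untouched, so it does not overwrite anything a later ROCm package might install under the same name.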

preet avatar preet commented on August 28, 2024 1

Thanks for the reply. I got a bit further: after removing everything with amdgpu-uninstall, I installed specific packages without dkms:

amdgpu-install --usecase=rocm,hip,rocmdevtools --no-dkms

Now rocminfo and clinfo print reasonable results.

Then:

  • installed python3.8, since Ubuntu 22.04 ships 3.10 and the *.whl files are built against 3.8
  • installed rocblas and the packages from this repo's README.
    Ran into a missing lib error:
me@astra:~/Dev/scratch/install_amdgpu/xuhuisheng_patched$ python3.8
Python 3.8.15 (default, Oct 18 2022, 20:33:33) 
[GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/me/Dev/env/lib/python3.8/site-packages/torch/__init__.py", line 199, in <module>
    from torch._C import *  # noqa: F403
ImportError: libroctx64.so.1: cannot open shared object file: No such file or directory
>>> 

me@astra:~/Dev/env/lib$ pip3.8 list | grep torch
torch                        1.11.0a0+git503a092
torchvision                  0.12.0a0+266279

me@astra:~/Dev/env/lib$ echo $LD_LIBRARY_PATH
/opt/rocm-5.3.0/lib

me@astra:~/Dev/env/lib$ ls /opt/rocm/lib | grep libroctx
libroctx64.so
libroctx64.so.4
libroctx64.so.4.1.0

So was that version of pytorch built against a different ROCm version? Is there a workaround for this?
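To see the mismatch at a glance, one can pull the missing library name out of the ImportError text and compare it with the versions that are actually installed. The sketch below does that. The helper names missing_soname and installed_versions are invented for illustration, and /opt/rocm/lib is the directory from the ls command above.

```python
import os
import re


def missing_soname(message):
    """Extract the library name from a 'cannot open shared object file' error."""
    match = re.match(r"\s*(\S+?): cannot open shared object file", message)
    return match.group(1) if match else None


def installed_versions(lib_dir, soname):
    """List files in lib_dir that share the base name of soname (e.g. libroctx64.so)."""
    base = soname.split(".so")[0] + ".so"
    try:
        names = os.listdir(lib_dir)
    except FileNotFoundError:
        return []
    return sorted(n for n in names if n == base or n.startswith(base + "."))


if __name__ == "__main__":
    err = "libroctx64.so.1: cannot open shared object file: No such file or directory"
    name = missing_soname(err)
    print("missing:  ", name)
    print("installed:", installed_versions("/opt/rocm/lib", name))
```

With the listing shown above, this would report libroctx64.so.1 as missing while only the .so, .so.4 and .so.4.1.0 files are installed, which is the situation the question describes.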

preet avatar preet commented on August 28, 2024 1

Just to follow up, I think I was able to get pytorch working on Ubuntu 22.04 as well. I followed basically all of the steps that @tmpuserx summarized in their post, with the only difference being that I built and installed python3.8 manually alongside the default 3.10 that ships with Ubuntu 22.04.
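The reason for the separate python3.8 is the cp38 tag in the wheel file names. A rough way to check a wheel against the running interpreter is to read that tag from the file name, as sketched below. This is a simplified check that only compares the CPython tag (pip itself does a fuller compatibility match), and the function names are made up for the example.

```python
import sys


def wheel_python_tag(filename):
    """Return the python tag (e.g. 'cp38') from a wheel file name."""
    stem = filename[:-len(".whl")] if filename.endswith(".whl") else filename
    parts = stem.split("-")
    # Wheel names end with {python tag}-{abi tag}-{platform tag}.
    if len(parts) < 5:
        raise ValueError(f"not a wheel file name: {filename!r}")
    return parts[-3]


def matches_interpreter(filename, version_info=sys.version_info):
    """True if the wheel's CPython tag equals the given interpreter version."""
    return wheel_python_tag(filename) == f"cp{version_info[0]}{version_info[1]}"


if __name__ == "__main__":
    wheel = "torch-1.11.0a0+git503a092-cp38-cp38-linux_x86_64.whl"
    print(wheel_python_tag(wheel), "matches this interpreter:",
          matches_interpreter(wheel))
```

Run under the default python3.10 of Ubuntu 22.04, the check reports a mismatch for these wheels; run under python3.8 it reports a match.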

I ran the introductory pytorch mnist example which seemed to run fine. I used nvtop to verify the GPU was being used.

xuhuisheng avatar xuhuisheng commented on August 28, 2024

Please don't install amdgpu-dkms right now; just install rocm-dev and rocm-libs with the upstream amdgpu driver.

There are new bugs around the rocm-5 driver and roct-thunk-interface that make gfx803 report no device. An older driver, such as the upstream driver, works fine.

I will update the docs when I reach a conclusion. Until then we can use the upstream driver.

tmpuserx avatar tmpuserx commented on August 28, 2024

I had the same problem and the commands fixed it, but now I get another error when I try to import torch. Any suggestions on how to fix this? Thanks.

import torch
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/mx/.local/lib/python3.8/site-packages/torch/__init__.py", line 199, in <module>
    from torch._C import *  # noqa: F403
ImportError: libMIOpen.so.1: cannot open shared object file: No such file or directory

tmpuserx avatar tmpuserx commented on August 28, 2024

OK, I was able to fix the libMIOpen.so.1 error and several other missing .so file errors by installing some packages. Now I can import torch, and torch.cuda.is_available() returns True.

But now I have another issue: tf.config.list_physical_devices('GPU') does not return any GPU device, although rocminfo finds my RX580. Any suggestions? Thanks.


Agent 2


Name: gfx803
Uuid: GPU-XX
Marketing Name: Radeon RX 580 Series
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 1
Device Type: GPU
Cache Info:
L1: 16(0x10) KB
Chip ID: 26591(0x67df)
ASIC Revision: 1(0x1)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 1430
BDFID: 256
Internal Node ID: 1
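For comparing what the two frameworks see, both checks mentioned above can be put into one small script. This is just a sketch: the function name gpu_status is made up, and any exception, including the ImportError raised for a missing .so file, is caught and reported as text instead of aborting.

```python
def gpu_status():
    """Report whether torch and tensorflow load and whether each sees a GPU."""
    report = {}
    try:
        import torch
        report["torch"] = "GPU visible" if torch.cuda.is_available() else "no GPU"
    except Exception as exc:  # includes ImportError for missing shared libraries
        report["torch"] = f"failed: {exc}"
    try:
        import tensorflow as tf
        gpus = tf.config.list_physical_devices("GPU")
        report["tensorflow"] = f"{len(gpus)} GPU(s)"
    except Exception as exc:
        report["tensorflow"] = f"failed: {exc}"
    return report


if __name__ == "__main__":
    for name, status in gpu_status().items():
        print(f"{name}: {status}")
```

If torch reports a GPU while tensorflow reports 0, the driver and rocminfo side is likely fine and the problem is specific to the TensorFlow build or its libraries.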

redthing1 avatar redthing1 commented on August 28, 2024

I had this problem too.

xuhuisheng avatar xuhuisheng commented on August 28, 2024

@redthing1
The latest ROCm-5.4.1 should solve this issue, please have a try.

redthing1 avatar redthing1 commented on August 28, 2024

Quoting @xuhuisheng: "@redthing1 The latest ROCm-5.4.1 should solve this issue, please have a try."

@xuhuisheng Thank you so much for your work here. It is absolutely invaluable and I am so grateful.

Now I'm able to run pytorch:

❯ python3 -c "import torch; print(torch.cuda.is_available())"
True

xuhuisheng avatar xuhuisheng commented on August 28, 2024

@redthing1

Don't worry.

My goal is just to let us run small samples like mnist on gfx803. More complex workloads like Stable Diffusion always break on gfx803.
