Comments (14)

AndrewAR2 commented on June 3, 2024

Most likely this system has no support for CUDA 11. The build for Ubuntu 18.04 and CUDA 10.2 works. A segmentation fault occurs at program exit and needs further investigation.

from gpu-post.

avive commented on June 3, 2024

Try running nvidia-smi: it reports the CUDA version supported by the driver, and I believe it is 11 with the latest Nvidia driver. The same system worked fine with earlier versions of the lib. These are latest-generation Nvidia GPUs and they should support CUDA 11. This is the output:

Mon May 24 16:06:40 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.73.01    Driver Version: 460.73.01    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   40C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla T4            Off  | 00000000:00:05.0 Off |                    0 |
| N/A   43C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
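
Editor's note: the header of the nvidia-smi output can be checked programmatically. The following is a minimal sketch in Python, assuming the header keeps the "Driver Version: ... CUDA Version: ..." layout shown in this thread; the function names are hypothetical. Note that nvidia-smi reports the highest CUDA version the driver supports, not the version of any installed toolkit or the version the library was built against.

```python
import re

# Header line copied from the nvidia-smi output quoted in this thread.
SAMPLE_HEADER = "| NVIDIA-SMI 460.73.01 Driver Version: 460.73.01 CUDA Version: 11.2 |"


def parse_smi_header(text):
    """Return (driver_version, cuda_version) found in nvidia-smi output, or None."""
    match = re.search(
        r"Driver Version:\s*([\d.]+)\s+CUDA Version:\s*([\d.]+)", text
    )
    if match is None:
        return None
    return match.group(1), match.group(2)


def cuda_major(cuda_version):
    """Major CUDA version as an int, e.g. '11.2' -> 11."""
    return int(cuda_version.split(".")[0])


if __name__ == "__main__":
    driver, cuda = parse_smi_header(SAMPLE_HEADER)
    print(driver, cuda, cuda_major(cuda))  # prints: 460.73.01 11.2 11
```

On a real machine the text would come from running nvidia-smi (for example via subprocess.run) rather than from the hard-coded sample.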

avive commented on June 3, 2024

Looks like the 10GB boot-disk was running out of free space. Increasing disk size.

avive commented on June 3, 2024

I'm tidying up the packages on this box and will try again.

avive commented on June 3, 2024

I reinstalled driver version 460 (recommended for this GPU). nvidia-smi no longer shows errors and reports CUDA version 11.2.

Here's what I get (no CUDA providers are listed):

avive@gpu-post-miner-1:~$ cd latest
avive@gpu-post-miner-1:~/latest$ ls
libgpu-setup.so  test_app
avive@gpu-post-miner-1:~/latest$ ls -la
total 14160
drwxrwxr-x  2 avive avive     4096 May 23 11:05 .
drwxr-xr-x 12 avive avive     4096 May 23 11:05 ..
-rwxrwxr-x  1 avive avive 14117488 May 23 11:04 libgpu-setup.so
-rwxrwxr-x  1 avive avive   364784 May 23 11:04 test_app
avive@gpu-post-miner-1:~/latest$ ./test_app -l
./test_app: symbol lookup error: ./test_app: undefined symbol: spacemesh_api_logging
avive@gpu-post-miner-1:~/latest$ export LD_LIBRARY_PATH=.
avive@gpu-post-miner-1:~/latest$ ./test_app -l
Available POST compute providers:
  0: [CPU] CPU
Segmentation fault (core dumped)
avive@gpu-post-miner-1:~/latest$ 
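
Editor's note: the "undefined symbol" error in the first run suggests that the loader picked up a different copy of the library than the one in the current directory. One way to check whether a specific file exports a symbol is to load it by explicit path. This is a minimal sketch using Python's ctypes; the helper name check_symbol is hypothetical, and the path and symbol name are taken from the transcript above. Be aware that loading a shared library runs its initializers.

```python
import ctypes


def check_symbol(lib_path, symbol):
    """Report whether the library at lib_path loads and exports symbol.

    A path containing a slash is opened as that exact file, bypassing the
    loader's search path. lib_path=None refers to the current process
    (POSIX only).
    """
    try:
        lib = ctypes.CDLL(lib_path)
    except OSError as exc:
        return f"load failed: {exc}"
    # ctypes resolves exported symbols lazily on attribute access and
    # raises AttributeError when the symbol is absent.
    if hasattr(lib, symbol):
        return "ok"
    return "missing symbol"


if __name__ == "__main__":
    # Path and symbol name as they appear in the transcript.
    print(check_symbol("./libgpu-setup.so", "spacemesh_api_logging"))
```

Running the same check against each copy of the library found on the system shows which one matches the API that test_app expects.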

avive commented on June 3, 2024

On the same system, run make test in /home/avive/pos-server. You can see that the older version of the lib used in the Rust test code sees the 2 GPUs just fine. So this is an issue with recent lib builds.

avive commented on June 3, 2024

We need the lib working on Ubuntu 20, not only on 18.

AndrewAR2 commented on June 3, 2024

The Ubuntu 18.04 build works fine on Ubuntu 20, except for the program termination problem.

AndrewAR2 commented on June 3, 2024

I downgraded the CUDA version to 11.2. Now the cards are detected.

avive commented on June 3, 2024

The new library works okay after the change to use the CUDA 11.2 lib for Ubuntu 20. However, there's still a core dump in the test app. Is this a lib bug or a test app code issue? @AndrewAR2

avive@gpu-post-miner-1:~/test$ ./gpu-setup-test -l
Available POST compute providers:
  0: [CUDA] Tesla T4
  1: [CUDA] Tesla T4
  2: [CPU] CPU
Segmentation fault (core dumped)

AndrewAR2 commented on June 3, 2024

This error is due to an old version of libgpu-setup.so in /lib that does not match the current API.
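
Editor's note: a stale copy like this can be located by listing every file with the library's name in the directories the loader may search. This is a minimal sketch in Python; the function name find_library_copies and the DEFAULT_DIRS list are assumptions for illustration (the real search path varies by distribution and also depends on /etc/ld.so.conf and any RPATH in the binary).

```python
import os

# Directories commonly searched by the Linux dynamic loader. This list is an
# assumption for illustration and differs between distributions.
DEFAULT_DIRS = [
    "/lib",
    "/usr/lib",
    "/usr/local/lib",
    "/lib/x86_64-linux-gnu",
    "/usr/lib/x86_64-linux-gnu",
]


def find_library_copies(name, extra_dirs=(), env=None):
    """Return (path, size, mtime) for each distinct copy of `name`.

    Directories from LD_LIBRARY_PATH are checked first, then extra_dirs.
    Empty LD_LIBRARY_PATH entries are skipped in this sketch.
    """
    env = os.environ if env is None else env
    ld_path = [d for d in env.get("LD_LIBRARY_PATH", "").split(":") if d]
    found = []
    seen = set()
    for directory in [*ld_path, *extra_dirs]:
        candidate = os.path.join(directory, name)
        real = os.path.realpath(candidate)
        # Skip missing files and symlinks to a copy already reported.
        if os.path.isfile(candidate) and real not in seen:
            seen.add(real)
            stat = os.stat(candidate)
            found.append((candidate, stat.st_size, stat.st_mtime))
    return found


if __name__ == "__main__":
    for path, size, mtime in find_library_copies("libgpu-setup.so", DEFAULT_DIRS):
        print(f"{path}  {size} bytes  mtime={mtime:.0f}")
```

More than one line of output, with differing sizes or modification times, points to the kind of version mismatch described in this comment.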

avive commented on June 3, 2024

We confirmed this is an issue with the latest lib on CUDA / Ubuntu 20.04 systems.

avive commented on June 3, 2024

@AndrewAR2 is this issue fixed in v0.1.17?

AndrewAR2 commented on June 3, 2024

Yes!
