Comments (14)
Most likely this system has no support for CUDA 11. The build for Ubuntu 18.04 with CUDA 10.2 works. A segmentation fault occurs at program exit and needs further investigation.
from gpu-post.
Try running `nvidia-smi`: it reports the CUDA version supported by the driver, and I believe it is 11 (latest NVIDIA driver). The same system worked fine with earlier versions of the lib. These are latest-generation NVIDIA GPUs and they should support CUDA 11. This is the output:
```
Mon May 24 16:06:40 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.73.01    Driver Version: 460.73.01    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   40C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla T4            Off  | 00000000:00:05.0 Off |                    0 |
| N/A   43C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
```
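The `CUDA Version` in the `nvidia-smi` header is the highest CUDA version the installed driver supports, so it is the number to compare against the CUDA toolkit a library build was made with. The sketch below extracts that value from `nvidia-smi` output and applies a conservative rule that is an assumption, not something stated in this thread or taken from NVIDIA's documentation: accept a build only if its CUDA runtime major.minor is not newer than the driver's. CUDA 11.x minor-version compatibility can relax this rule in some cases, which the sketch does not model. Both function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read nvidia-smi output on stdin and print the
# "CUDA Version" from its header (e.g. "11.2").
driver_cuda_version() {
  sed -n 's/.*CUDA Version: *\([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1
}

# Hypothetical helper implementing an assumed, conservative rule: succeed only
# if the runtime version a library was built with ($1) is not newer than the
# driver's CUDA version ($2). Both are "major.minor" strings.
runtime_fits_driver() {
  awk -v r="$1" -v d="$2" 'BEGIN {
    split(r, a, "."); split(d, b, ".")
    rmaj = a[1] + 0; rmin = a[2] + 0; dmaj = b[1] + 0; dmin = b[2] + 0
    ok = (rmaj < dmaj) || (rmaj == dmaj && rmin <= dmin)
    exit(ok ? 0 : 1)
  }'
}

# Usage (on the machine above):
#   nvidia-smi | driver_cuda_version
#   runtime_fits_driver 11.2 "$(nvidia-smi | driver_cuda_version)" && echo ok
```

On this machine the header reports 11.2, so under this rule a build against CUDA 11.2 passes, which matches the thread's later observation that the CUDA 11.2 build detects both cards.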
It looks like the 10 GB boot disk was running out of free space. I'm increasing the disk size.
I'm tidying up the packages on this box and will try again.
I reinstalled the version 460 drivers (recommended for this GPU); `nvidia-smi` no longer reports errors and shows CUDA version 11.2.
Here's what I get (no CUDA providers):
```
avive@gpu-post-miner-1:~$ cd latest
avive@gpu-post-miner-1:~/latest$ ls
libgpu-setup.so  test_app
avive@gpu-post-miner-1:~/latest$ ls -la
total 14160
drwxrwxr-x  2 avive avive     4096 May 23 11:05 .
drwxr-xr-x 12 avive avive     4096 May 23 11:05 ..
-rwxrwxr-x  1 avive avive 14117488 May 23 11:04 libgpu-setup.so
-rwxrwxr-x  1 avive avive   364784 May 23 11:04 test_app
avive@gpu-post-miner-1:~/latest$ ./test_app -l
./test_app: symbol lookup error: ./test_app: undefined symbol: spacemesh_api_logging
avive@gpu-post-miner-1:~/latest$ export LD_LIBRARY_PATH=.
avive@gpu-post-miner-1:~/latest$ ./test_app -l
Available POST compute providers:
0: [CPU] CPU
Segmentation fault (core dumped)
avive@gpu-post-miner-1:~/latest$
```
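The `undefined symbol: spacemesh_api_logging` error goes away once `LD_LIBRARY_PATH=.` is set, which suggests that without it the dynamic loader resolved `libgpu-setup.so` from some other location holding an older copy. A minimal sketch for checking which copy gets loaded, assuming `test_app` records its dependency under the name `libgpu-setup.so`; `resolved_path` is a hypothetical helper that parses standard `ldd` output.

```shell
#!/bin/sh
# Hypothetical helper: read `ldd` output on stdin and print where library $1
# resolves to ("not found" if the loader cannot locate it). Exits non-zero
# if $1 does not appear in the dependency list at all.
resolved_path() {
  awk -v lib="$1" '
    $1 == lib && $2 == "=>" {
      if ($3 == "not") { print "not found" } else { print $3 }
      found = 1
    }
    END { exit(found ? 0 : 1) }'
}

# Usage (on the machine above), comparing the default search path with
# the current directory taking precedence:
#   ldd ./test_app | resolved_path libgpu-setup.so
#   LD_LIBRARY_PATH=. ldd ./test_app | resolved_path libgpu-setup.so
```

If the two commands print different paths, the first one is the copy that `test_app` picks up when `LD_LIBRARY_PATH` is not set.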
On the same system, run `make test` in `/home/avive/pos-server`. You can see that the older version of the lib used by the Rust test code detects both GPUs just fine, so this is an issue with recent lib builds.
We need the lib to work on Ubuntu 20, not only on 18.
The Ubuntu 18.04 build works fine on Ubuntu 20, except for the crash at program exit.
I downgraded the CUDA version to 11.2. Now the cards are detected.
The new library works after switching to the CUDA 11.2 build for Ubuntu 20; however, the test app still dumps core. Is this a lib bug or a test app issue? @AndrewAR2
```
avive@gpu-post-miner-1:~/test$ ./gpu-setup-test -l
Available POST compute providers:
0: [CUDA] Tesla T4
1: [CUDA] Tesla T4
2: [CPU] CPU
Segmentation fault (core dumped)
```
This crash is caused by an old version of `libgpu-setup.so` in `/lib` that does not match the current API.
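To find such a stale copy, one option is to list every `libgpu-setup.so` registered in the loader cache and check whether each one exports a symbol from the current API, such as `spacemesh_api_logging` from the earlier symbol lookup error. A minimal sketch, assuming GNU `nm` and glibc's `ldconfig -p` output format; `cached_copies` and `exports_symbol` are hypothetical helpers. `ldconfig -p` reports the cache as of the last `ldconfig` run, so a copy placed in `/lib` afterwards may not appear until the cache is rebuilt.

```shell
#!/bin/sh
# Hypothetical helper: read `ldconfig -p` output on stdin and print the path
# of every cached entry whose name is exactly $1.
cached_copies() {
  awk -v lib="$1" '$1 == lib { print $NF }'
}

# Hypothetical helper: succeed if the shared object $1 defines and exports
# the dynamic symbol $2. GNU nm: -D reads the dynamic symbol table and
# --defined-only drops undefined (imported) symbols.
exports_symbol() {
  nm -D --defined-only "$1" 2>/dev/null |
    awk -v sym="$2" '$NF == sym { found = 1 } END { exit(found ? 0 : 1) }'
}

# Usage (on the machine above):
#   ldconfig -p | cached_copies libgpu-setup.so | while read -r path; do
#     if exports_symbol "$path" spacemesh_api_logging; then
#       echo "$path: exports spacemesh_api_logging"
#     else
#       echo "$path: missing spacemesh_api_logging (possibly stale)"
#     fi
#   done
```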
We confirmed this is an issue with the latest lib on CUDA / Ubuntu 20.04 systems.
@AndrewAR2 is this issue fixed in v0.1.17?
Yes!
Related Issues (20)
- Initialization problems on Nvidia with CUDA
- what's the relation with https://github.com/spacemeshos/post-rs?
- Support macOS-arm architecture
- api_internal includes and FreeBSD 12
- Use a testing framework instead of self written test harness
- Use clang-format for consistent code formatting
- Fix gpu-post not returning a found PoW when both ComputeLeafs and ComputePow are set
- Add a linter to build pipeline
- labels-count test super slow on mac m1
- Fix CI for gpu-post
- Windows CI builds using MSVC, but cgo needs a build from mingw64
- The release job for gpu-post is not working as expected
- macOS build is missing libMoltenVK.dylib from archive
- Race condition in global `g_spacemesh_api_abort_flag`
- GPU not listed as option on M1 chip, only CPU
- GPU is not available on Ubuntu 22
- Rename CI from M1 to AppleSilicon or Macos-ARM64
- Make sure that we use SSE2 / AVX extensions for scrypt on CPU provider
- After changing to 8192 iterations and label size to 16B Vulkan api seems to be malfunctioning
- Seems to malfunction on A100-SXM4