
Warning

This repository is deprecated. Please refer to post-rs for the most recent implementation.

GPU Proof of Space-Time Init (aka Smeshing Setup) Library


Current functionality

A C library implementing the POST API setup method for general-purpose CPUs and for CUDA and Vulkan compute processors.

Runtime System Requirements

Windows 10/11, macOS or Linux. One or more of the following:

  • A GPU and drivers with CUDA support (minimum compute capability 5.0, maximum compute capability 9.0), such as a modern Nvidia GPU with Nvidia drivers version R525 or newer.
  • A GPU and drivers with Vulkan 1.3 support, such as modern AMD and Intel GPUs, or an Apple M-series processor.
  • An x86-64 CPU, such as an AMD or Intel CPU.
  • An ARM 64-bit CPU, such as Apple Silicon or Ampere Altra.
  • Both discrete and on-board GPUs are supported as long as they support the minimum CUDA or Vulkan runtime version.

We currently provide release binaries and build instructions for Windows, macOS, and Ubuntu 22.04, but the library can be built on other Linux distributions for use on those systems.

GPU Memory Requirements

Minimum GPU RAM

  • 16 KiB per CUDA core for CUDA
  • 4 MiB per compute unit for Vulkan

Recommended GPU RAM

  • 2080 MiB

Linux Runtime Requirements

On Linux platforms with a hybrid Nvidia GPU setup, use Nvidia driver R525 or newer; older drivers are known to have compatibility issues. Non-hybrid cards are confirmed to work with R520 and older versions.


Build System Requirements

All Platforms

  • For building CUDA support: NVIDIA CUDA Toolkit 11, an NVIDIA GPU with CUDA support, and an Nvidia driver version R525 or newer.
    • If building on Linux, refer to your distribution's preferred installation method, if available.
  • For building Vulkan support: Vulkan SDK 1.3 and a GPU with Vulkan 1.3 runtime support.

Windows

  • Windows 10/11.
  • Microsoft Visual Studio 2022
  • You may also need to install specific versions of the Windows SDK when prompted when attempting to build the library for the first time.

Ubuntu

  • Ubuntu 22.04
  • CMake, GCC 11+

Other Linux Distributions

  • CMake, GCC 11+

macOS

  • Xcode
  • Xcode Command Line Dev Tools
  • CMake, GCC 11+

macOS Dev Env Setup

  1. Install the latest version of Xcode with the command line dev tools.
  2. Download the Vulkan 1.3 SDK installer for macOS from https://vulkan.lunarg.com/sdk/home#mac
  3. Install the Vulkan SDK with the installer.
  4. Change directory to the folder where the SDK is installed (default: $ cd $HOME/VulkanSDK/1.3.xxx) and run the install script with $ sudo ./install_vulkan.py
  5. Add the Vulkan env vars to your .bash_profile file with the root location set to the SDK directory on your hard drive. For example, if Vulkan SDK 1.3.xxx is installed, the env vars should be set like this:
export VULKAN_SDK_VERSION="1.3.xxx"                   # Replace xxx with actual version
export VULKAN_ROOT_LOCATION="$HOME/VulkanSDK/1.3.xxx" # adapt to install location on your machine
export VULKAN_SDK="$VULKAN_ROOT_LOCATION/macOS"
export VK_ICD_FILENAMES="$VULKAN_SDK/share/vulkan/icd.d/MoltenVK_icd.json"
export VK_LAYER_PATH="$VULKAN_SDK/share/vulkan/explicit_layers.d"
export PATH="/usr/local/opt/python/libexec/bin:$VULKAN_SDK/bin:$PATH"
export DYLD_LIBRARY_PATH="$DYLD_LIBRARY_PATH:$VULKAN_SDK/lib/"

Build Configuration

Default build configuration:

Windows and Linux

SPACEMESHCUDA   "Build with CUDA support"   default: ON
SPACEMESHVULKAN "Build with Vulkan support" default: ON

macOS Build Configuration

SPACEMESHCUDA   "Build with CUDA support"   default: OFF
SPACEMESHVULKAN "Build with Vulkan support" default: ON

Building

To build the library with full support for both CUDA and Vulkan on Windows or Linux, use a system with an Nvidia GPU and drivers. Otherwise, turn off CUDA support and build for Vulkan only. Building on macOS supports Vulkan only.

Building on Windows

  1. Open the project folder in Visual Studio 2022: File -> Open -> Folder.
  2. Select the x64-Release project settings.
  3. Build: CMake -> Rebuild All.
  4. Run the tests: CMake -> Debug from Build Folder -> gpu-setup-test.exe

Ubuntu or macOS

If using Vulkan, make sure to clone the zlib submodule:

git submodule update --init

Configure your build using the default configuration:

cmake -B build

To disable CUDA use:

cmake -B build -DSPACEMESHCUDA=OFF

To disable VULKAN use:

cmake -B build -DSPACEMESHVULKAN=OFF

Build the project:

cmake --build build

Run the tests:

./build/test/gpu-setup-test -t
./build/test/gpu-setup-test -u
./build/test/gpu-setup-test -b

Running the Test App

macOS Configuration

  1. Since the test app is not notarized, you may need to allow it via spctl --add /path/to/gpu-setup-test, or by right-clicking it and selecting Open.
  2. Set execute permissions if not already set, e.g., chmod a+x gpu-setup-test
  3. Add the test app's path to the dynamic lib search path, e.g., export DYLD_LIBRARY_PATH=.

Linux Configuration

  1. Set execute permissions if not already set, e.g., chmod a+x gpu-setup-test
  2. Add the test app's path to the dynamic lib search path, e.g., export LD_LIBRARY_PATH=.

Run from the console to print usage:

$ gpu-setup-test
Usage:
--list               or -l                 print available providers
--benchmark          or -b                 run benchmark
--core               or -c                 test the core library use case
--test               or -t                 run basic test
--test-vector-check                        run a CPU test and compare with test-vector
--test-pow           or -tp                test pow computation
--test-leafs-pow     or -tlp               test pow computation while computing leafs
--unit-tests         or -u                 run unit tests
--integration-tests  or -i                 run integration tests
--label-size         or -s <1-256>         set label size [1-256]
--labels-count       or -n <1-32M>         set labels count [up to 32M]
--reference-provider or -r <id>            the result of this provider will be used as a reference [default - CPU]
--print              or -p                 print detailed data comparison report for incorrect results
--pow-diff           or -d <0-256>         count of leading zero bits in target D value [default - 16]
--srand-seed         or -ss <unsigned int> set srand seed value for POW test: 0 - use zero id/seed [default], -1 - use random value
--solution-idx       or -si <unsigned int> set solution index for POW test: index will be compared to be the found solution for Pow [default - unset]

Mixing CUDA and Vulkan

By default, the library does not detect supported Vulkan GPUs if CUDA GPUs are detected. This behavior can be changed using two environment variables:

SPACEMESH_DUAL_ENABLED
 empty or 0 - default behavior
 1 - detect Vulkan GPUs even if CUDA GPUs are detected
SPACEMESH_PROVIDERS_DISABLED
 empty - default behavior
 "cuda" - do not detect CUDA GPUs
 "vulkan" - do not detect Vulkan GPUs

Runtime Providers Recommendations

The library supports multiple compute providers at runtime. For best performance, use the following providers based on your OS and GPU:

OS / GPU   Windows   Linux    macOS
Nvidia     CUDA      CUDA     Vulkan
AMD        Vulkan    Vulkan   Vulkan
Intel      Vulkan    Vulkan   Vulkan
Apple M1   Vulkan    Vulkan   Vulkan

API

Compute leaves and/or pow solution:

int scryptPositions(
   uint32_t provider_id,      // POST compute provider ID
   const uint8_t *id,         // 32 bytes
   uint64_t start_position,   // e.g. 0
   uint64_t end_position,     // e.g. 49,999
   uint32_t hash_len_bits,    // (1...256) for each hash output, the number of prefix bits (not bytes) to copy into the buffer
   const uint8_t *salt,       // 32 bytes
   uint32_t options,          // compute leafs and/or compute pow
   uint8_t *out,              // memory buffer large enough to include hash_len_bits * number of requested hashes
   uint32_t N,                // scrypt N
   uint32_t R,                // scrypt r
   uint32_t P,                // scrypt p
   uint8_t *D,                // Target D for the POW computation. 256 bits.
   uint64_t *idx_solution,    // index of output where output < D if POW compute was on. MAX_UINT64 otherwise.
   uint64_t *hashes_computed, // The number of hashes computed, should be equal to the number of requested hashes.
   uint64_t *hashes_per_sec   // Performance
);

Supported scrypt parameters

The API currently supports only the following scrypt N, R, P parameters:

  • Supported N values: 1 - 28835
  • Supported R values: 1
  • Supported P values: 1

Gets the system's GPU capabilities, e.g. CUDA and/or Vulkan, or NONE:

int stats();

Stops all GPU work and stops filling the passed-in buffer with any more results:

int stop(
 uint32_t ms_timeout   // timeout in milliseconds
);

Returns non-zero if stop in progress:

SPACEMESHAPI int spacemesh_api_stop_inprogress();

Returns POS compute providers info:

SPACEMESHAPI int spacemesh_api_get_providers(
 PostComputeProvider *providers, // out providers info buffer, if NULL - returns count of available providers
 int max_providers               // buffer size
);

Linking

  1. Download release artifacts from a GitHub release in this repo for your platform, or build the artifacts from source.
  2. Copy all artifacts to your project's resources directory. The files should be included in your app's runtime resources.
  3. Use api.h to link the library from your code.

Testing

Integration test of the basic library use case in a Spacemesh full node to generate proof of space and find a pow solution:

./build/test/gpu-setup-test -c -n 100663296 -d 20

Community Benchmarks

Disclaimer: these are community-submitted benchmarks that haven't been verified. Your mileage may vary. The library is also likely to have bugs, is alpha quality, and the gpu-post algorithm is likely to change before the release of the Spacemesh 0.2 testnet.

gpu-setup-test -b -n 2000000
Date Reporter Release Compute Provider OS & CPU Type Driver mh/s
06/21/2021 Obsidian v0.1.20 Geforce RTX 2080ti 11GB @ stock (1350 mhz / 7000 mhz) Windows 10 Pro v20H2, Build 19042.985, Intel i7-6700K @ 4.6ghz (HT enabled: 4c/8t) CUDA NVIDIA 466.11 2.56
06/22/2021 Scerbera v0.1.20 Geforce RTX 2060 SUPER Windows 10 CUDA NVIDIA 466.11 1.7
06/22/2021 Scerbera v0.1.20 AMD Radeon Pro WX 7100 Windows 10 CUDA NVIDIA 466.11 0.88
06/22/2021 Scerbera v0.1.20 RX VEGA 64 - Core Clock 1500 MHz - Memory Clock 960MHz Intel i7-8700K Windows 10 Vulkan Pro 20.Q4 0.9
06/22/2021 Scerbera v0.1.20 WX7100 - Core Clock 1250MHz - Memory Clock 1700 MHz Intel i7-8700K Windows 10 Vulkan Pro 20.Q4 0.87
06/28/2021 cmoetzing v0.1.20 MSI GeForce RTX 2060 VENTUS GP OC - Core Clock 1365MHz - Memory Clock 1750 MHz Ubuntu 20.04 Core i5-11600k CUDA NVIDIA 465.19.01 1.36
06/29/2021 avive v0.1.21 GeForce RTX 3090 Ubuntu 20.04 CUDA Nvidia 460.80 4.97
06/29/2021 avive v0.1.21 GeForce RTX 3080 Ubuntu 20.04 CUDA Nvidia 460.80 4.08
06/30/2021 shanyaa v0.1.21 GeForce RTX 3070 @ 1.9 Ghz core, 6.8 Ghz mem Windows 10 / AMD Ryzen 5800X CUDA Nvidia 466.63 2.7
06/30/2021 shanyaa v0.1.21 GeForce RTX 3070 @ 2 Ghz core, 8.08 Ghz mem Windows 10 / AMD Ryzen 5800X CUDA Nvidia 466.63 3.43
07/01/2021 avive v0.1.21 Nvdia CMP 30HX Ubuntu 20.04.2 LTS CUDA Nvidia 460.80 1.45
07/01/2021 avive v0.1.21 GeForce RTX 2060 Ubuntu 20.04.2 LTS CUDA Nvidia 465.27 1.56
07/01/2021 shanyaa v0.1.21 Intel Iris Xe (integrated graphics) Windows 10 / Intel core i7 1165G7 Vulkan Intel 27.20.100.9565 0.28
07/03/2021 neodied v0.1.21 Radeon 5700XT @ 1333 MHz core, 1824 MHz mem Windows 10 / Intel core i7 9700K Vulkan AMD Radeon Software 21.6.1 1.38
07/03/2021 neodied v0.1.21 Radeon 5700XT @ 2016 MHz core, 1748 MHz mem Windows 10 / Intel core i7 9700K Vulkan AMD Radeon Software 21.6.1 1.87
12/21/2022 lane v0.1.28 Apple M1 (built-in, 8 cores, Metal 3) macOS 13.1 Vulkan N/A 0.15
01/27/2023 lane v0.1.28 Apple M2 Pro (built-in, 16 GPU cores, Metal 3) macOS 13.2 Vulkan N/A 0.56
01/27/2023 nj v0.1.28 Apple M2 Pro (built-in, 19 GPU cores, Metal 3) macOS 13.2 Vulkan N/A 0.57

Prerelease Benchmarks

Scrypt Benchmarks (n=512, r=1, p=1) 1 byte per leaf, batch size leaves per API call.

Date Reporter impl cpu / gpu Host OS notes kh/s mh/s x factor over 1 4ghz cpu native thread x factor over 12 4ghz cpu native threads
11/19/2019 ae go-scrypt mbp + Intel i9 @ 2.9ghz - 1 core OS X go scrypt crypto lib (not scrypt-jane) 7 0.01 1 1
11/19/2019 ae sm-scrypt Ryzen 5 2600x @ 4ghz - 1 core Windows 10 scrypt-jane c code 7 0.01 1 1
11/19/2019 ae sm-scrypt Nvidia Geforce RTX 2070 8GB Windows 10 pre-optimized prototype 1,920 1.92 290 24.17
11/19/2019 ae sm-scrypt AMD Radeon RX 580 Windows 10 pre-optimized prototype 500 0.50 76 6.29
11/19/2019 ar sm-scrypt Nvidia GTX 1060 6G Windows 10 pre-optimized prototype 979 0.98 148 12.32
11/19/2019 ar sm-scrypt AMD Radeon 570 4GB Windows 10 pre-optimized prototype 355 0.36 54 4.47
11/12/2019 ae sm-scrypt AMD Radeon RX 580 Windows 10 optimized prototype 926 0.93 140 11.65
11/12/2019 ae sm-scrypt AMD Radeon RX 580 Ubuntu 18.0.4.3 LTS optimized prototype 893 0.89 135 11.24
11/12/2019 ae sm-scrypt Nvidia Geforce RTX 2070 8GB Ubuntu 19.10 LTS optimized prototype 1,923 1.92 292 24.37
01/22/2020 seagiv sm-scrypt Nvidia GTX 1060 6G Windows 10 vulkan pre-optimized prototype 276
01/22/2020 seagiv sm-scrypt AMD Radeon 570 4GB Windows 10 vulkan pre-optimized prototype 269
01/27/2020 seagiv sm-scrypt Nvidia GTX 1060 6G Windows 10 vulkan optimized prototype 642
01/27/2020 seagiv sm-scrypt AMD Radeon 570 4GB Windows 10 vulkan optimized prototype 966
01/29/2020 seagiv sm-scrypt AMD Radeon Pro 555x 4GB macOS 10.14.6 vulkan optimized prototype 266
01/31/2020 avive sm-scrypt AMD Radeon Pro 560x 4GB macOS 10.14.6 vulkan optimized prototype 406
01/31/2020 avive sm-scrypt Intel(R) UHD Graphics 630 1536MB macOS 10.14.6 vulkan optimized prototype 53
05/06/2020 avive sm-scrypt AMD Radeon RX 580 Windows 10 vulkan optimized prototype 1,074 1.074
09/08/2020 avive sm-scrypt Nvidia Tesla V 100 (16GB) Ubuntu 20.04 NVIDIA-SMI 450.51.06 CUDA Version: 11.0 CUDA optimized prototype 4,166 4.166
09/08/2020 avive sm-scrypt Nvidia Tesla T4 (16GB) Ubuntu 20.04 NVIDIA-SMI 450.51.06 CUDA Version: 11.0 CUDA optimized prototype 1,252 1.252
09/08/2020 avive sm-scrypt Nvidia Tesla P100-PCIE (32GB) Ubuntu 20.04 NVIDIA-SMI 450.51.06 CUDA Version: 11.0 CUDA optimized prototype 2,083 2.083
09/08/2020 avive sm-scrypt Nvidia Tesla P4 (32GB) Ubuntu 20.04 NVIDIA-SMI 450.51.06 CUDA Version: 11.0 CUDA optimized prototype 757 0.75
04/04/2020 avive sm-scrypt Apple M1 MacOS 11.2 vulkan optimized prototype 214 0.214
04/21/2020 avive sm-scrypt Nvidia RTX 2070 Super, 8GB Ubuntu 20.04, Driver 460.73.01 CUDA optimized prototype 2,038 2.038

3rd Party Vulkan and CUDA Benchmarks

The library performance on a GPU depends on the GPU's CUDA and Vulkan performance. The following benchmarks are available from geekbench:

Contributors

andrewar2, avive, cmoetzing, fasmat, ihortk, liathoffman, lrettig, menzels, moshababo, pigmej, poszu, seagiv, valar999


Issues

Cuda SDK 10.2 build dep on Linux

On Ubuntu, following Nvidia's instructions for installing CUDA SDK 10.2 causes the latest SDK 11 to be installed instead, and the project can't build with the latest CUDA SDK.
Attempted all methods mentioned here: https://developer.nvidia.com/cuda-10.2-download-archive?target_os=Linux&target_arch=x86_64&target_distro=Ubuntu&target_version=1804&target_type=debnetwork

To resolve, we need to provide clear instructions on how to set up CUDA SDK 10.2 on Ubuntu, or support building the lib with CUDA SDK 11.x (ideal).

Tests fail on RTX 3080

Linux binaries from release v0.1.20.

ubuntu@xxxx:~$ ./gpu-setup-test -t
Test LEAFS: Label size: 8, count 131072, buffer 0.1M
CPU: 131072 hashes, 5059 h/s
GeForce RTX 3080: 131072 hashes, 236592057 h/s
ZEROS result
WRONG result for label size 8 from provider 0 [GeForce RTX 3080]
ubuntu@xxxx:~$ ./gpu-setup-test -u
[GeForce RTX 3080]: hash test WRONG
ubuntu@xxxx:~$ nvidia-smi
Tue Jun 29 09:13:39 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.80       Driver Version: 460.80       CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 3080    Off  | 00000000:00:05.0 Off |                  N/A |
|  0%   29C    P0     1W / 320W |      0MiB / 10018MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Update readme

For all 3 platforms:

  • Build instructions
  • Vulkan sdk version
  • Runtime instructions

Support macOS-arm architecture

Motivation

Provide the library as a native macOS-arm lib. e.g. Macs with Mx CPU/GPU.

Overview

  • Add macOS-arm compilation target.
  • Update CI tests to run on macOS-arm arch (and not in macOS-amd emulation).
  • Update release artifacts to include macOS-arm.

Build automation Requirements

We'd like to automate building releases of the library for os x, linux and windows. Building the library for a platform requires the build to run on that platform with one or more GPU compute SDKs.

For Windows builds, we need to include the CUDA and VULKAN sdks in the build host env and support a build workflow which is pure CLI with cmake and doesn't require Visual Studio.

For OS X, we need to include the VULKAN sdk in the build env host.

Ideally a cloud build can be triggered from github. For example, by creating a new release, PR or tag or by triggering a custom Github Action.

The build system should build the library binaries and its c header for all 3 platforms using a CI service such as Github Actions or Travis, and save the binary artifacts in a location that the build system of projects that use the libraries can pull them and build releases with them.

We would like the artifacts to be saved in a github branch and as assets of a github release.

For example, in a desktop gpu-init app build workflow, we should be able to specify the version or tag of the library to build with, and the app's build system should be able to pull the appropriate binary artifacts and header file(s) so the library can be linked into the app.

This is work in progress - Please share some thoughts on this.

@noamnelke @ilans @moshababo @lrettig

Requirements for Phase II (WIP)

  1. Implement the library with Vulkan compute engine for os x, linux and windows on amd, nvidia and Intel gpus.

  2. Benchmark on all supported OSs and many major cards.

  3. If Vulkan performance can be on par with the CUDA implementation, then implement with Vulkan for all platforms (os x, linux, windows) and for all Vulkan-supported GPUs (including Intel GPUs); otherwise, add OS X support via Vulkan only. Optimized API implementation for (n, 1, 1) - similar to phase I requirements.

  4. Vulkan performance target is between the OpenCL and the Cuda implementation and no less than 80% of Cuda.

  5. For OS X - If Vulkan performance is not good compared to Cuda then fallback to implement Cuda for Nvidia GPUs and OpenCL for Intel + AMD GPUs.

  6. All new implementations: optimize for (n=variable, r=1, p=1) scrypt params.

References

http://www.duskborn.com/posts/a-simple-vulkan-compute-example/

Long tests in test app

It would be nice if tests that accept labelsCount in the test app did not truncate to the max labels per compute iteration, but instead performed multiple iterations of scryptPositions() with indexes to compute the user-provided labelsCount.

GPU is not available on Ubuntu 22

Originally reported here: spacemeshos/smapp#1035

The user reported that neither the internal Intel GPU nor the Nvidia GPU is available on his machine (running Ubuntu 22).
Previously, on Smapp 0.2.8 (go-spacemesh 0.2.18-beta.0 and gpu-post 0.1.24), it worked well.
So I assume we broke something in the newer versions.

Cuda error

Platform: GCP N1 instance with an Nvidia Tesla V100 (16 GB), Ubuntu 20.04.
Nvidia driver installed using this info: https://cloud.google.com/compute/docs/gpus/install-drivers-gpu.
Driver version 450.51.06. CUDA version 11.0.
The error occurs when running the test app, which works fine on an Nvidia GeForce (Driver 440.100, CUDA version 10.2). You can get it from the release assets.

./gpu-setup-test
[2020-09-07 09:13:13] CL Platform 0 vendor: NVIDIA Corporation
[2020-09-07 09:13:13] CL Platform 0 name: NVIDIA CUDA
[2020-09-07 09:13:13] CL Platform 0 version: OpenCL 1.2 CUDA 11.0.197
[2020-09-07 09:13:13] Platform 0 devices: 1
[2020-09-07 09:13:13] initCpu() finished.
Performance: 976 (250000 positions in 256.15s)
[2020-09-07 09:17:29] Init GPU thread for GPU 0, platform GPU 0, pci [0:4]
[2020-09-07 09:17:30] GPU #0: 8 hashes / 0.5 MB per warp.
[2020-09-07 09:17:30] GPU #0: using launch configuration t1003x4
[2020-09-07 09:17:30] initCuda() finished.
[2020-09-07 09:17:30] GPU #0: cudaError 13 (invalid device symbol) (keccak.cu line 1056)

[2020-09-07 09:17:30] GPU #0: cudaError 13 (invalid device symbol) (keccak.cu line 1059)

We need to make sure that CUDA gpu-compute works with the latest Nvidia drivers and CUDA runtime 11, as well as older drivers with CUDA runtime 10 support.

pow solution not found in expected trials (ubuntu 20.04/Cuda)

Repro: run pos-server make test on a system with multiple CUDA GPU providers.

Results: the first job that uses the 2nd provider fails to find a pow solution in the expected number of pow-only iterations.

Expected: find a pow solution, when it is not found in the leaves computation phase, within the expected number of trials.

With D = 0x000000fffff... (3 zero bytes prefix), the expected number of trials to find a pow solution is 16,777,216: p for 3 leading zero bytes is 0.0000000596 ((1/256)^3), so with E = 1/p we get 16,777,216 expected trials.

Crashes on MacBookPro running benchmarks

Description

Got crashes when running ./gpu-setup-test -b on two MacBook Pros.
If I run ./gpu-setup-test -l it works well.

The bug is not reproduced on M1, or on @avive's MacBook Pro, which has the same configuration as the second MBP (see below) but with macOS 11.5.2. So probably something is wrong with the Vulkan driver / macOS version.

Environments

Using release: https://github.com/spacemeshos/go-spacemesh/releases/tag/v0.2.2-beta.1
Against the Devnet 205.

  1. MacBook Pro (Retina, 15-inch, Mid 2014)
    macOS 10.13.6 (High Sierra)
    Intel Iris Pro 1536 MB
    NVIDIA GeForce GT 750M (unsupported. CUDA Driver Version: 418.163 GPU Driver Version: 355.11.10.10.40.102)

  2. MacBook Pro (15-inch, 2018)
    macOS 10.13.6 (High Sierra)
    Intel UHD Graphics 630 1536 MB
    Radeon Pro 560X 4096 MB

Console output

The output for both MacBooks looks pretty much the same (same stack trace), so here is one of them:

$ ./gpu-setup-test -l
Available POST compute providers:
  1: [VULKAN] Intel Iris Pro Graphics
  2: [CPU] CPU

$ ./gpu-setup-test -b
Benchmark: Label size: 8, count 131072, buffer 0.1M
2021-09-29 13:55:32.084 gpu-setup-test[9404:7235970] -[MTLComputePipelineDescriptorInternal setMaxTotalThreadsPerThreadgroup:]: unrecognized selector sent to instance 0x7fd73ac36850
2021-09-29 13:55:32.092 gpu-setup-test[9404:7235970] *** Terminating app due to uncaught exception 'NSInvalidArgumentException', reason: '-[MTLComputePipelineDescriptorInternal setMaxTotalThreadsPerThreadgroup:]: unrecognized selector sent to instance 0x7fd73ac36850'
*** First throw call stack:
(
        0   CoreFoundation                      0x00007fff3065fbbb __exceptionPreprocess + 171
        1   libobjc.A.dylib                     0x00007fff5797ec76 objc_exception_throw + 48
        2   CoreFoundation                      0x00007fff306f8814 -[NSObject(NSObject) doesNotRecognizeSelector:] + 132
        3   CoreFoundation                      0x00007fff305d54a0 ___forwarding___ + 1456
        4   CoreFoundation                      0x00007fff305d4e68 _CF_forwarding_prep_0 + 120
        5   libMoltenVK.dylib                   0x000000010acd51a6 _ZN18MVKComputePipelineC2EP9MVKDeviceP16MVKPipelineCacheP11MVKPipelinePK27VkComputePipelineCreateInfo + 406
        6   libMoltenVK.dylib                   0x000000010acbec87 _ZN9MVKDevice15createPipelinesI18MVKComputePipeline27VkComputePipelineCreateInfoEE8VkResultP17VkPipelineCache_TjPKT0_PK21VkAllocationCallbacksPP12VkPipeline_T + 167
        7   libMoltenVK.dylib                   0x000000010ac571d9 vkCreateComputePipelines + 313
        8   libgpu-setup.dylib                  0x0000000109fdc720 loadShader + 656
        9   libgpu-setup.dylib                  0x0000000109fdaeea _ZL14vulkan_prepareP9cgpu_infojjjjbb + 2186
        10  libgpu-setup.dylib                  0x0000000109fd93dd _ZL23vulkan_scrypt_positionsP9cgpu_infoPhyyjjS1_jjjPyP7timevalS4_S2_ + 125
        11  libgpu-setup.dylib                  0x0000000109fd5f2a scryptPositions + 394
        12  gpu-setup-test                      0x0000000109fb7d07 _Z12do_benchmarkii + 775
        13  gpu-setup-test                      0x0000000109fbb501 main + 2673
        14  libdyld.dylib                       0x00007fff5859c015 start + 1
)
libc++abi.dylib: terminating with uncaught exception of type NSException
Abort trap: 6  

hashLenBits' 1-7 bit range is inconsistent

Calling scryptPositions with various values in the 1-7 bit range for the hashLenBits param produces inconsistent outputs among different providers and sometimes crashes the process. This should be fixed along with the task of extending the range up to 512 bits (currently limited to 8).

Unexpected number of computed hashes (CUDA)

Using a CUDA provider, when the library is used to find labels and a pow solution via the options, and a pow solution is found in a library call while the lib is also computing the leaves, the returned total of hashes computed by the lib is 0 instead of the number of requested leaves.
Reproduced with this release: https://github.com/spacemeshos/gpu-post/actions/runs/863441063

@moshababo - fyi - I see this on CUDA only, not on CPU or Vulkan providers. It might be the issue you reported before. As a band-aid until this is fixed, set the POW D difficulty param so that additional computations just for POW are needed after the leaves computation.

After changing to 8192 iterations and label size to 16B, the Vulkan API seems to be malfunctioning

2023/04/03 14:32:54 	INFO	initialization: datadir: ./nj-pos_gpu_512_8192, number of units: 1, max file size: 20000000, number of labels per unit: 10
2023/04/03 14:32:54 	INFO	initialization: files layout: number of files: 1, number of labels per file: 20000000, last file number of labels: 10
2023/04/03 14:32:54 	INFO	initialization: starting to write file #0; target number of labels: 10, start position: 0
2023/04/03 14:32:54 	DEBUG	initialization: file #0 current position: 0, remaining: 10
2023/04/03 14:32:54 	INFO	initialization: file #0, found nonce: 4560
2023/04/03 14:32:54 	INFO	initialization: file #0 completed; number of labels written: 10
2023/04/03 14:32:54 	INFO	cli: initialization completed
nj ~/workspace/post develop* $ go run cmd/postcli/main.go --maxFileSize 20000000 -numUnits 1 -provider 1  -labelsPerUnit 10 -datadir ./nj-pos_gpu_512_8192_2 -commitmentAtxId 509289c0b880f6dfa1eec73ce7bfd7e382e751ea8f22364d93deef116d60be55 -id e39e21509a554a7d182efb50069fcf7f5ad1be1b65954d15952269190c6698e0
22023/04/03 14:33:10 	INFO	initialization: datadir: ./nj-pos_gpu_512_8192_, number of units: 1, max file size: 20000000, number of labels per unit: 10
2023/04/03 14:33:10 	INFO	initialization: files layout: number of files: 1, number of labels per file: 20000000, last file number of labels: 10
2023/04/03 14:33:10 	INFO	initialization: starting to write file #0; target number of labels: 10, start position: 0
2023/04/03 14:33:10 	DEBUG	initialization: file #0 current position: 0, remaining: 10
2023/04/03 14:33:10 	INFO	initialization: file #0, found nonce: 5072
2023/04/03 14:33:10 	INFO	initialization: file #0 completed; number of labels written: 10
2023/04/03 14:33:10 	INFO	cli: initialization completed
nj ~/workspace/post develop* $ hexdump -n 64 ./nj-pos_gpu_512_8192/postdata_0.bin
0000000 45ce 2770 030a 23ae 4d00
000000a
nj ~/workspace/post develop* $ hexdump -n 64 ./nj-pos_gpu_512_8192/postdata_0.bin
0000000 45ce 2770 030a 23ae 4d00
000000a
nj ~/workspace/post develop* $ hexdump -n 64 ./nj-pos_gpu_512_8192_2/postdata_0.bin
0000000 da23 dd7b a5e1 5900 ed38
000000a

All of the above should have the same result.

It works fine on the CPU.

Testapp core-dump when listing providers on ubuntu 20.04 (CUDA)

List available providers with the test app from this Linux release's artifacts: https://github.com/spacemeshos/gpu-post/actions/runs/863441063 on Ubuntu 20.04 with one or more Nvidia GPUs supporting CUDA.
Result: the lib can't see Nvidia GPUs that should be available.
Expected: be able to use the GPUs as providers.
Nvidia Driver Version: 460.73.01, CUDA Version: 11.2.
Previous versions of the lib and the test app work on the same system.

~/latest$ echo $LD_LIBRARY_PATH
.
ls -la
total 14160
-rwxrwxr-x  1 avive avive 14117488 May 23 11:04 libgpu-setup.so
-rwxrwxr-x  1 avive avive   364784 May 23 11:04 test_app
./test_app -l
Available POST compute providers:
  0: [CPU] CPU
Segmentation fault (core dumped)

Support Linux-arm architecture

Motivation

Allow users running modern Linux-arm (with appropriate CUDA or Vulkan hardware and drivers) to use the library and to create post data.

Overview

  • Build the lib for linux-arm64 arch.
  • Update CI to run all builds and tests on linux-arm64.
  • Update docs regarding supported linux arm systems.

Fix CI for gpu-post

At the moment there are multiple issues with the CI of gpu-post:

  • macOS-m1 build fails because the build agent seems to no longer be available
  • linux-arm64 build fails because the build agent seems to no longer be available
  • windows build fails because Windows 2016 is no longer supported by GH
    • windows build fails regularly because the download of CUDA & Vulkan is unreliable
  • linux build(s) fail on installing outdated tools and libraries (CUDA, Vulkan, etc.)

#68 fixes the following:

  • macOS-m1 build agent was found and reactivated
  • Windows build was upgraded to Windows 2022
    • Downloads of tools are now done with a custom script and should be more reliable. Downloads are cached between builds
  • Linux build was updated to the most recent Ubuntu and up-to-date tools

#70 fixes the arm64 build:

  • Fix bug in install tools script
  • Fix incorrect linking order for arm64 build

Race condition in global `g_spacemesh_api_abort_flag`

g_spacemesh_api_abort_flag (defined here https://github.com/spacemeshos/gpu-post/blob/develop/src/api_internal.cpp#L15) is used as a termination condition to stop calculation of hashes early (e.g. here https://github.com/spacemeshos/gpu-post/blob/develop/src/cuda/driver-cuda.cpp#L220).

This global variable isn't protected from concurrent access, so stopping a running process is inherently unsafe at the moment. This should be fixed by protecting g_spacemesh_api_abort_flag with mutexes.

[Metrics] v0.1.20 Test Results

Posting this here as a feedback thread. Information includes system details and benchmark results for v0.1.20.
Discord nick: Obsidian (Jᴧgᴧ)

=================================
System:
OS: Windows 10 Pro v20H2, Build 19042.985
CPU: Intel i7-6700K @ 4.6ghz (HT enabled: 4c/8t)
GPU: Nvidia RTX 2080ti 11GB @ stock (1350mhz / 7000mhz)
Video Driver: Nvidia 466.11
CUDA Driver: Nvidia CUDA 11.3.70

=================================
Geekbench 5:
Compute API: CUDA
Compute Benchmark: 169601

=================================
GPU-Post v0.1.20:
Test param: .\gpu-setup-test -t
3-run median result: NVIDIA GeForce RTX 2080 Ti: 131072 hashes, 1198602 h/s

Test param: .\gpu-setup-test -b -n 2000000
5-run median result: 2560603

Initialization problems on Nvidia with CUDA

While working with testnet-03 data we realized that some proofs are invalid. More debugging showed that the post data generated by GPUs was actually invalid.

All GPUs were launched with: go run cmd/postcli/main.go --maxFileSize 1073741824 -provider 1 -labelsPerUnit 536870912 -numUnits 8 -datadir ~/miner_0 -commitmentAtxId 66ec687158c86167efec0fcad4b9a6f3e98db07cc065a595a5b979f8a20dc7e2 -id 8411931ece38b5abfb611f3bb1155a7eecd9c0e6c26672b567f7b1629c70a924

CPU

0000000 9e4d f14c 363c f401 661f eb93 1700 335d
0000010 eb2a a43d 8329 69b7 6b77 d4d2 9fc0 d249
0000020 a2fd 2f57 e4c9 6f6e 30c2 59ea 0a15 be16
0000030 3b1e b0ce 1513 084c d854 c0e1 c43e 39e7
0000040 4ccc fd38 f6dc 7f49 c98a ba08 5f06 9b55
0000050 2ab1 8484 599c 131d f051 684e 4b84 7607
0000060 8e2b e3f3 004e e87b cd21 34c7 7860 977f
0000070 a7f1 24d0 5b61 1128 0e5e 02a6 9e93 b842
0000080

3060

[email protected]:~/miner_0$ hexdump -n 128 postdata_0.bin
0000000 9e4d f14c 363c f401 661f eb93 1700 335d
0000010 eb2a a43d 8329 69b7 6b77 d4d2 9fc0 d249
0000020 a2fd 2f57 e4c9 6f6e 30c2 59ea 0a15 be16
0000030 3b1e b0ce 1513 084c d854 c0e1 c43e 39e7
0000040 4ccc fd38 f6dc 7f49 c98a ba08 5f06 9b55
0000050 2ab1 8484 599c 131d f051 684e 4b84 7607
0000060 8e2b e3f3 004e e87b cd21 34c7 7860 977f
0000070 a7f1 24d0 5b61 1128 0e5e 02a6 9e93 b842
0000080
[email protected]~$ md5sum miner_0/postdata_0.bin
58702380f687fb9a65ecb691d80ff9cd  miner_0/postdata_0.bin

3090

1

[email protected]:~/miner_0$ hexdump -n 128 postdata_0.bin
0000000 ac29 8238 6dcc 5f67 da12 1e2a 4e4e a3d4
0000010 0934 59aa 4df8 aad3 101d fb26 5c27 5a37
0000020 3e06 37d0 8325 0114 8ffa 0a6a f33d 8666
0000030 9be6 65e0 0d49 e80a f95c 6f46 d5f3 beb4
0000040 5857 5c2e 5371 883f 188d 5ce7 32ab 6dab
0000050 cd62 9505 50a3 01d0 fff4 54b9 d65e 00d3
0000060 27ab 8274 5593 1305 dddd 5b47 ffa7 0551
0000070 210b 081c 44cc ad48 79e5 d3b0 bfd6 1686
0000080

2


[email protected]:~/miner_0$ hexdump -n 128 postdata_0.bin
0000000 ac29 8238 6dcc 5f67 da12 1e2a 4e4e a3d4
0000010 0934 59aa 4df8 aad3 101d fb26 5c27 5a37
0000020 3e06 37d0 8325 0114 8ffa 0a6a f33d 8666
0000030 9be6 65e0 0d49 e80a f95c 6f46 d5f3 beb4
0000040 5857 5c2e 5371 883f 188d 5ce7 32ab 6dab
0000050 cd62 9505 50a3 01d0 fff4 54b9 d65e 00d3
0000060 27ab 8274 5593 1305 dddd 5b47 ffa7 0551
0000070 210b 081c 44cc ad48 79e5 d3b0 bfd6 1686
0000080

3 (batch size 1)

0000000 ac29 8238 6dcc 5f67 da12 1e2a 4e4e a3d4
0000010 0934 59aa 4df8 aad3 101d fb26 5c27 5a37
0000020 3e06 37d0 8325 0114 8ffa 0a6a f33d 8666
0000030 9be6 65e0 0d49 e80a f95c 6f46 d5f3 beb4
0000040 5857 5c2e 5371 883f 188d 5ce7 32ab 6dab
0000050 cd62 9505 50a3 01d0 fff4 54b9 d65e 00d3
0000060 27ab 8274 5593 1305 dddd 5b47 ffa7 0551
0000070 210b 081c 44cc ad48 79e5 d3b0 bfd6 1686
0000080

Some more info: the 3090 produced files that matched only at the beginning.

[email protected]:~/miner_0$ md5sum *.bin
02f047fdedd6d95e19f07d0251e50735  postdata_0.bin
a975d3efe2f8d1d51a0379fc5b4c1ff6  postdata_1.bin
003d2562cdf7a09d73090eb06b774963  postdata_2.bin
98644b5f5be6fa282f64fdaba2fc204b  postdata_3.bin
34dfc5ede040633eb0bb88dd8203db27  postdata_4.bin
[email protected]:~/miner_0$ md5sum *.bin
605c8142c390916a7cf0a0cff1fdd9a4  postdata_0.bin
9ad584b50bd50fffa0dd83833f46ad4b  postdata_1.bin
63e9e16bfa2d486bccd9ddb4041e4f90  postdata_2.bin
f074b2a7373df283048724fc62e9b05c  postdata_3.bin
5a8e68b77da0f93aa8c33aeac47831ac  postdata_4.bin
[email protected]:~/miner_0$ md5sum postdata_0.bin
7b74a522abd41c1ae41ccf1c895e58de  postdata_0.bin

3070

[email protected]:~$ hexdump -n 128 miner_0/postdata_0.bin
0000000 9e4d f14c 363c f401 661f eb93 1700 335d
0000010 eb2a a43d 8329 69b7 6b77 d4d2 9fc0 d249
0000020 a2fd 2f57 e4c9 6f6e 30c2 59ea 0a15 be16
0000030 3b1e b0ce 1513 084c d854 c0e1 c43e 39e7
0000040 4ccc fd38 f6dc 7f49 c98a ba08 5f06 9b55
0000050 2ab1 8484 599c 131d f051 684e 4b84 7607
0000060 8e2b e3f3 004e e87b cd21 34c7 7860 977f
0000070 a7f1 24d0 5b61 1128 0e5e 02a6 9e93 b842
0000080
[email protected]:~$ md5sum miner_0/postdata_0.bin
58702380f687fb9a65ecb691d80ff9cd  miner_0/postdata_0.bin

3050 Ti laptop

0000000 9e4d f14c 363c f401 661f eb93 1700 335d
0000010 eb2a a43d 8329 69b7 6b77 d4d2 9fc0 d249
0000020 a2fd 2f57 e4c9 6f6e 30c2 59ea 0a15 be16
0000030 3b1e b0ce 1513 084c d854 c0e1 c43e 39e7
0000040 4ccc fd38 f6dc 7f49 c98a ba08 5f06 9b55
0000050 2ab1 8484 599c 131d f051 684e 4b84 7607
0000060 8e2b e3f3 004e e87b cd21 34c7 7860 977f
0000070 a7f1 24d0 5b61 1128 0e5e 02a6 9e93 b842
0000080
md5sum miner_0/postdata_0.bin
58702380f687fb9a65ecb691d80ff9cd

Quick summary

At this point it's clear that the data becomes mangled somehow.
It seems to work correctly on GPUs with up to 8 GB of VRAM.

Tests were done with different batch sizes and different systems.
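To pin down where two init files diverge, `cmp` reports the first differing byte offset. A minimal sketch with stand-in files (a real run would compare the CPU- and GPU-generated postdata_0.bin files shown above):

```shell
# Create two stand-in files that differ at a single byte, then compare.
mkdir -p /tmp/postcmp
printf 'abcdefgh' > /tmp/postcmp/cpu.bin
printf 'abcdefXh' > /tmp/postcmp/gpu.bin
# cmp prints the 1-based offset of the first mismatch
cmp /tmp/postcmp/cpu.bin /tmp/postcmp/gpu.bin || echo "files diverge"
```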

OS X blocks usage of Vulkan dynamic library

When running the test app on OS X 10.15.7 I get a security warning about the Vulkan lib with only the option to cancel.
After allowing the library from the Security & Privacy prefs the warning isn't displayed anymore.
Adding this as a reference - we need to do what we can to enable linking with the Vulkan lib without users having to go through this when it's installed via Smapp. Maybe linking this lib from a notarized Smapp app might resolve this. More research is needed.

@IlyaVi


GPU not listed as option on M1 chip, only CPU

The RPC call SmesherService.PostSetupComputeProviders responds with only the CPU option.
On the latest develop version (30d4f2956) of go-spacemesh only the CPU option is available to select for smeshing; the GPU is not in the list.
Steps to reproduce

  1. Run the node (./build/go-spacemesh --config ./build/config.toml -d ./build/testenetdir --pprof-server), pre-built from develop 30d4f2956
  2. Check the RPC call PostSetupComputeProviders

Updated docs: GPU Post on Linux with AMD cards

Reminder to update the docs about linux home smeshers with AMD cards. They must install the amd-pro driver to get Vulkan runtime support. The standard AMD gpu linux drivers don't seem to support Vulkan, at least not on Ubuntu.

Test app improvements

This is a review of the testapp in the variable output size branch:

  • Document Usage
  • Uncomment bench for cpu provider
  • Add provider compute engine id to bench output
  • Uncomment sanity test and compare results between all providers.

Use clang-format for consistent code formatting

At the moment the code is inconsistently formatted. I suggest using a code formatting tool like clang-format to get consistently formatted code.

Maybe we can also add a test to the build pipeline to verify that the code is formatted correctly before building it.
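As a starting point, a minimal `.clang-format` could be checked into the repository root (the style values below are suggestions, not settings the project has agreed on):

```yaml
# .clang-format (suggested starting point)
BasedOnStyle: LLVM
IndentWidth: 4
ColumnLimit: 120
SortIncludes: true
```

A CI check could then run `clang-format --dry-run --Werror` over the sources and fail the build on formatting drift.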

Don't require Vulkan SDK installation

We'd like to have a script that installs only the Vulkan 1.2 runtime components required to run the library's Vulkan provider code on OS X. This script will be added to the Smapp installation process so users don't have to install the full Vulkan SDK or run a separate installer to do a gpu-post setup using a Vulkan provider.

Motivation: The current runtime is installed via python script as part of the Vulkan SDK which installs many more artifacts besides runtime components. e.g. iOS SDK, dev tools, ~230MB of demo apps... We'd like to avoid making the Smapp install huge due to the sheer size of the Vulkan SDK installer and we don't want to slow down smapp install due to the time it takes to install all of these components which are not needed for the core functionality.

Looks like this is not going to be very complicated as the current sdk installer just copies over a bunch of files and sets some env vars: https://vulkan.lunarg.com/doc/sdk/1.2.154.0/mac/getting_started.html

@IlyaVi

Windows CI builds using MSVC, but cgo needs a build from mingw64

The CI build uses MSVC to build gpu-post and creates a *.dll and *.lib file as output. cgo as of Go 1.19 cannot link against *.lib files, but needs *.so libraries built with mingw64. The reason the current build works at all is that an old build of gpu-post has been committed to the https://github.com/spacemeshos/post repository that cgo links against.

Update the CI build:

The release job for gpu-post is not working as expected

The release job defined in build.yml doesn't work as expected. It only runs when the push that triggered the Github action happened on master and included a new git tag.

Instead, the release job should be triggered on: release (see https://docs.github.com/en/actions/using-workflows/events-that-trigger-workflows#release), re-use the existing jobs to build the library for all supported platforms (see also https://docs.github.com/en/actions/using-workflows/reusing-workflows#calling-a-reusable-workflow), and then attach the artifacts produced by the build jobs to the release.
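A sketch of what such a workflow could look like (job names, paths, and the third-party upload action are assumptions, not the project's actual configuration):

```yaml
name: Release
on:
  release:
    types: [published]
jobs:
  build:
    # Re-use the existing multi-platform build jobs as a reusable workflow
    uses: ./.github/workflows/build.yml
  upload:
    needs: build
    runs-on: ubuntu-latest
    steps:
      - uses: actions/download-artifact@v3
      - name: Attach build artifacts to the release
        uses: softprops/action-gh-release@v1  # one common third-party choice
        with:
          files: artifacts/**
```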

Add a linter to build pipeline

This project is missing a linter that checks the code for issues. A commonly used linter for C/C++ is clang-tidy.

It should check the code during every build and prevent tests from running if it finds issues.

Add to OS X build instructions to readme

  1. Install the latest version of Xcode with the command-line dev tools.
  2. Download the Vulkan 1.2 SDK installer for OS X from https://vulkan.lunarg.com/sdk/home#mac
  3. Copy the Vulkan SDK from the Vulkan installer volume to a directory on your hard drive.
  4. Install the SDK from your hard drive and not from the installer volume by running $ sudo ./install_vulkan.py.
  5. Add the Vulkan env vars to your .bash_profile file with the root location set to the SDK directory on your drive. For example, if Vulkan SDK 1.2.154 is installed then the env vars should be set like this:
export VULKAN_SDK_VERSION="1.2.154.0"
export VULKAN_ROOT_LOCATION="$HOME/dev/vulkan-sdk-1.2.154"
export VULKAN_SDK="$VULKAN_ROOT_LOCATION/macOS"
export VK_ICD_FILENAMES="$VULKAN_SDK/share/vulkan/icd.d/MoltenVK_icd.json"
export VK_LAYER_PATH="$VULKAN_SDK/share/vulkan/explicit_layers.d"
export PATH="/usr/local/opt/python/libexec/bin:$VULKAN_SDK/bin:$PATH"
export DYLD_LIBRARY_PATH="$DYLD_LIBRARY_PATH:$VULKAN_SDK/lib/"

For more info see: https://vulkan.lunarg.com/doc/sdk/latest/mac/getting_started.html

Seems to malfunction on A100-SXM4

CPU output:

[email protected]:~/workspace/post$ hexdump -n 128 ./nj-pos_cpu/postdata_0.bin
0000000 18e6 e3e9 31f3 a10a 3f86 bd9a cddb 040c
0000010 6ba3 d9a5 c796 35c8 2b21 49d2 b856 226d
0000020 ac55 b76f 8c6c d7a2 92a2 0f6b 6302 c515
0000030 96f1 eb6d 39e1 d79a ac01 6242 eb9d 4d36
0000040 e7f6 796d a033 a2ad 08a8 e1aa b7d2 88f0
0000050 0409 94f8 2dd4 1062 2c08 e9d3 649e 38d9
0000060 1b9c 7537 b50a a75d 0085 395a e5d3 a553
0000070 3987 95e1 4801 b2db a060 505d 029b 93e8
0000080

gpu output:

0000000 0000 0000 0000 0000 0000 0000 0000 0000
*

drivers 525 cuda 12.0.1

Add tests to CI pipeline

CI should run all unit and integration tests when creating a new release via github actions:

  • Phase I - use CPU provider only on build server.
  • Phase II - test CUDA and Vulkan providers using dedicated cloud instances with GPUs.

For Phase II we want to test on the following system configurations to get good coverage:

  1. Windows CUDA
  2. Windows Vulkan
  3. Ubuntu 20.04 CUDA (Nvidia gpu)
  4. Ubuntu 20.04 Vulkan (AMD gpu)
  5. macOS Vulkan (M1 gpu)

api_internal includes and FreeBSD 12

api_internal fails to compile on FreeBSD 12.2 due to issues with included headers: alloca() is provided by stdlib.h instead of alloca.h (which doesn't exist); and the endian functions are already defined in sys/endian.h (included by stdint.h). A simple fix would be to check for the __FreeBSD__ macro and not include alloca.h or define the endianness functions when set. I can submit a PR, if that's an acceptable solution.

Out of memory error

Running the unit tests (test app with the -u flag) causes macOS to run out of application memory.
This happens on systems with 8 GB or 16 GB of RAM.
To reproduce, run ./gpu-setup-test -u.

Investigate sdk support for ARM systems

Look into creating libs for Windows 10, macOS, and Linux ARM CPUs in addition to amd64. We should add these build targets to the lib if the CUDA and Vulkan libs support them.

Review Readme changes

Review the updated readme instructions, especially the CUDA 9 configuration and the CUDA_TOOLKIT_ROOT_DIR cmake option.
