schernykh / randomx_opencl

RandomX OpenCL implementation

License: GNU General Public License v3.0

C 59.18% C++ 17.52% Assembly 23.22% Makefile 0.08%

randomx_opencl's Introduction

This repository contains a full RandomX OpenCL implementation (portable code for all GPUs and optimized code for AMD Vega GPUs). The latest version of RandomX (1.1.0 as of August 30th, 2019) is supported.

Note: it's only a benchmark/testing tool, not an actual miner. RandomX hashrate is expected to improve somewhat in the future thanks to further optimizations.

GPUs tested so far:

Model	CryptonightR H/s	RandomX H/s	Relative speed	Comment
AMD Radeon VII (stock)	3125	1500	48%	JIT compiled mode, 150W
AMD Vega 64 (1700/1100 MHz)	2200	1225	55.7%	JIT compiled mode, 285W
AMD Vega 64 (1100/800 MHz)	1023	845	82.6%	JIT compiled mode, 115W
AMD Vega 64 (1700/1100 MHz)	2200	163	7.4%	VM interpreted mode
AMD Vega FE (stock)	2150	980	45.6%	JIT compiled mode (intensity 4096)
AMD Radeon RX 560 4GB (1400/2200 MHz)	495	260	52.5%	JIT compiled mode (intensity 896)
AMD Radeon RX470/570 4GB	930-950	400-410	43%	JIT compiled mode, 50W
AMD Radeon RX480/580 4GB	960-1000	470	47%	JIT compiled mode, 60W
GeForce GTX 1080 Ti (2037/11800 MHz)	927	601	64.8%	VM interpreted mode
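
The "Relative speed" column is simply the RandomX hashrate divided by the CryptonightR hashrate. A minimal sketch of that arithmetic, plus a hashes-per-watt figure for the rows that list power draw; the function names are made up for illustration and are not part of this repository:

```cpp
// RandomX hashrate as a percentage of the CryptonightR hashrate.
double relative_speed_percent(double cryptonight_r_hs, double randomx_hs) {
    return 100.0 * randomx_hs / cryptonight_r_hs;
}

// Efficiency in hashes per second per watt of reported power draw.
double hashes_per_watt(double hashrate_hs, double watts) {
    return hashrate_hs / watts;
}
```

With the table's numbers, the Vega 64 at 1100/800 MHz works out to roughly 7.3 H/s per watt, against roughly 4.3 H/s per watt at 1700/1100 MHz.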

Building on Windows

Install Visual Studio 2017 Community and CLRadeonExtender
Add CLRadeonExtender's bin directory to PATH environment variable
Open .sln file in Visual Studio and build it

Building on Ubuntu

Install prerequisites: sudo apt install git cmake build-essential
If you want to try JIT compiled code for Vega or Polaris GPUs, install amdgpu-pro drivers with OpenCL enabled (run the install script like this: ./amdgpu-pro-install --opencl=pal)
Download CLRadeonExtender and copy clrxasm to /usr/local/bin
Then run commands:

git clone --recursive https://github.com/SChernykh/RandomX_OpenCL
cd RandomX_OpenCL/RandomX
mkdir build && cd build
cmake -DARCH=native ..
make
cd ../../RandomX_OpenCL
make

Donations

If you'd like to support further development/optimization of RandomX miners (both CPU and AMD/NVIDIA), you're welcome to send any amount of XMR to the following address:

44MnN1f3Eto8DZYUWuE5XZNUtE3vcRzt2j6PzqWpPau34e6Cf4fAxt6X2MBmrm6F9YMEiMNjN6W4Shn4pLcfNAja621jwyg

randomx_opencl's Issues

Radeon VII error -11

I get the following error on my Radeon VII, Win 10 (1903), Adrenalin 19.5.2:

Compiling base_kernels.bin...done
Compiling randomx_init.bin...done
Compiling randomx_run_gfx803.bin...clBuildProgram failed: error -11
Error: AMD HSA Code Object loading failed.

These are the options that I use:

RandomX_OpenCL.exe --mine --intensity 1984

Full output:

`PS C:\Users\user\Downloads\RandomX_OpenCL-windows-x64-v1.0.4-8> .\RandomX_OpenCL.exe --mine --intensity 1984
Initializing GPU #0 on OpenCL platform #0

Device name: gfx906
Device vendor: Advanced Micro Devices, Inc.
Global memory: 16192 MB
Local memory: 32 KB
Clock speed: 1802 MHz
Compute units: 60
OpenCL version: OpenCL 2.0 AMD-APP (2841.5)
Driver version: 2841.5 (PAL,HSAIL)
Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_khr_gl_depth_images cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_dx9_media_sharing cl_khr_image2d_from_buffer cl_khr_spir cl_khr_subgroups cl_khr_gl_event cl_khr_depth_images cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_amd_liquid_flash cl_amd_planar_yuv

Compiling base_kernels.bin...done
Compiling randomx_init.bin...done
Compiling randomx_run_gfx803.bin...clBuildProgram failed: error -11
Error: AMD HSA Code Object loading failed.`
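
For readers decoding the numeric codes in these reports: -11 is CL_BUILD_PROGRAM_FAILURE in the Khronos CL/cl.h header. Note that the log above shows the device as gfx906 while the binary being loaded is named randomx_run_gfx803.bin, which may be why the code object is rejected. A small lookup sketch for the codes that appear on this page; the helper name is hypothetical, only the numeric values come from the OpenCL headers:

```cpp
#include <string>

// Map the OpenCL status codes seen in these issue reports to their names.
// Hypothetical helper; the values match the definitions in CL/cl.h.
std::string cl_error_name(int code) {
    switch (code) {
        case 0:   return "CL_SUCCESS";
        case -11: return "CL_BUILD_PROGRAM_FAILURE";
        case -30: return "CL_INVALID_VALUE";
        case -61: return "CL_INVALID_BUFFER_SIZE";
        default:  return "unknown (" + std::to_string(code) + ")";
    }
}
```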

Polaris test results

Here are my initial results testing on a few RX470/570/480/580 gpus:

System 1:

OS: Win10 1903
CPU: i3-7350k
Driver: 19.5.2

System 2:

OS: Win10 1903
CPU: e5-2360L
Driver: 18.5.1

GPUs:
Various RX470/570/480/580, all with 4GB mem. Clocks varied in a range from 1150-1250 MHz for core and 1900-2100 MHz for mem. Voltage varied from 875mV to 950mV

CN/R hashrate:
RX470/570 ~930-950h/s (typical range, extremes down to 880h/s, up to 980h/s)
RX480/580 ~960-1000h/s
GPU only power varies from 65W to 80W (depending on the gpu)

EDIT: CN/R hashrates and power were using TeamRedMiner. XMRig-AMD hashrates are 3-5% slower and power use is 5-10% higher.

RandomX hashrate:
RX470/570: 400-410 h/s
RX480/580: 470h/s
GPU only power was around 50-60W

Results were achieved using: --intensity 832 without --dataset_host.
--dataset_host results in much lower hashrate on riser gpus (260 h/s)
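
The intensity figure can be sanity-checked against card memory. A rough sketch, assuming each hash in flight needs one 2 MiB RandomX scratchpad and that the 2080 MiB dataset lives on the GPU unless --dataset_host is used. The names below are made up for illustration, and the tool's real allocations may include additional buffers:

```cpp
#include <cstdint>

constexpr std::uint64_t kMiB = 1024ull * 1024ull;
constexpr std::uint64_t kScratchpadBytes = 2 * kMiB;   // RandomX scratchpad per hash
constexpr std::uint64_t kDatasetBytes = 2080 * kMiB;   // RandomX full dataset

// Approximate GPU memory needed for a given intensity (assumption: one
// scratchpad per unit of intensity, plus the dataset when it is not on the host).
std::uint64_t estimated_gpu_bytes(std::uint64_t intensity, bool dataset_on_host) {
    std::uint64_t total = intensity * kScratchpadBytes;
    if (!dataset_on_host) {
        total += kDatasetBytes;
    }
    return total;
}
```

Under these assumptions, intensity 832 comes to about 3744 MiB with the dataset on the GPU, which fits a 4GB card with some headroom.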

Unable to run benchmark on older Nvidia 3100m (Integrated) video card

Hi,

Thanks for taking the time to read this. I realize the video card in this notebook PC is rather old. The Nvidia graphics chip was fairly high end when I got it, but it only supports SM/compute capability <= 1.2. I've tested the card with clinfo, and the drivers look fine. I'm not sure whether a card this old is supported; this is more of a test to see how far I can push the old chip, and I realize I won't get any decent mining performance out of it. I probably wouldn't have filed this as an issue given the age of the card, but I haven't seen a compatibility matrix yet, and the release comments seem to indicate that it should work with any GPU that supports OpenCL.

I'm happy to provide more details if needed.

sysinfo.txt

clinfo output:
`>clinfo.exe
Number of platforms 1
Platform Name NVIDIA CUDA
Platform Vendor NVIDIA Corporation
Platform Version OpenCL 1.1 CUDA 6.5.51
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_d3d9_sharing cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_copy_opts
Platform Extensions function suffix NV

Platform Name NVIDIA CUDA
Number of devices 1
Device Name NVS 3100M
Device Vendor NVIDIA Corporation
Device Vendor ID 0x10de
Device Version OpenCL 1.0 CUDA
Driver Version 342.00
Device Type GPU
Device Topology (NV) <printDeviceInfo:22: get CL_DEVICE_PCI_DOMAIN_ID_NV : error -30>
Device Profile FULL_PROFILE
Device Available Yes
Compiler Available Yes
Max compute units 2
Max clock frequency 1468MHz
Compute Capability (NV) 1.2
Max work item dimensions 3
Max work item sizes 512x512x64
Max work group size 512
Warp size (NV) 32
Preferred / native vector sizes
char 1 / 1
short 1 / 1
int 1 / 1
long 1 / 1
half 0 / 0 (n/a)
float 1 / 1
double 0 / 0 (n/a)
Half-precision Floating-point support (n/a)
Single-precision Floating-point support (core)
Denormals No
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Correctly-rounded divide and sqrt operations No
Double-precision Floating-point support (n/a)
Address bits 32, Little-Endian
Global memory size 268435456 (256MiB)
Error Correction support No
Max memory allocation 134217728 (128MiB)
Integrated memory (NV) No
Minimum alignment for any data type 128 bytes
Alignment of base address 2048 bits (256 bytes)
Global Memory cache type None
Image support Yes
Max number of samplers per kernel 16
Max 2D image size 4096x16383 pixels
Max 3D image size 2048x2048x2048 pixels
Max number of read image args 128
Max number of write image args 8
Local memory type Local
Local memory size 16384 (16KiB)
Registers per block (NV) 16384
Max number of constant args 9
Max constant buffer size 65536 (64KiB)
Max size of kernel argument 4352 (4.25KiB)
Queue properties
Out-of-order execution Yes
Profiling Yes
Profiling timer resolution 1000ns
Execution capabilities
Run OpenCL kernels Yes
Run native kernels No
Kernel execution timeout (NV) Yes
Concurrent copy and kernel execution (NV) Yes
Number of async copy engines 1
Device Extensions cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_d3d9_sharing cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_copy_opts cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics

NULL platform behavior
clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...) No platform
clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...) No platform
clCreateContext(NULL, ...) [default] No platform
clCreateContext(NULL, ...) [other] Success [NV]
clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT) No platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU) No platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU) No platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR) No platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM) No platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL) No platform
`
I've tried running both with and without the "--portable" switch; both cases result in an error. With "--portable", the error states that SM 1.3 or higher is required, which is a little more helpful:

ptxas application ptx input, line 3430; error : Instruction 'mul' requires SM 1.3 or higher, or map_f64_to_f32 directive

Without "--portable", the error seems to indicate that the GPU is missing the cl_khr_fp64 extension:
`RandomX_OpenCL-windows-x64-v1.1.0>RandomX_OpenCL.exe --mine --validate --platform 0 --device_id 0 --dataset_host --intensity 224
Initializing GPU #0 on OpenCL platform #0

Device name: NVS 3100M
Device vendor: NVIDIA Corporation
Global memory: 256 MB
Local memory: 16 KB
Clock speed: 1468 MHz
Compute units: 2
OpenCL version: OpenCL 1.0 CUDA
Driver version: 342.00
Extensions: cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_d3d9_sharing cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_copy_opts cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics

Compiling base_kernels.bin...done
Compiling randomx_init.bin...clBuildProgram failed: error -11
:81:1: error: must specify '#pragma OPENCL EXTENSION cl_khr_fp64: enable' before using 'double'
double getSmallPositiveFloatBits(const ulong entropy)
^`
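
Whether a device advertises double-precision support can be read straight from the "Extensions:" line the tool prints. A minimal sketch of such a check, assuming the extension list is the usual space-separated string; the function name is hypothetical and this is not how the repository itself performs the check:

```cpp
#include <sstream>
#include <string>

// Return true if `name` appears as a whole token in a space-separated
// OpenCL extension string (whole-token match, not a substring search).
bool has_extension(const std::string& extensions, const std::string& name) {
    std::istringstream tokens(extensions);
    std::string token;
    while (tokens >> token) {
        if (token == name) {
            return true;
        }
    }
    return false;
}
```

Applied to the NVS 3100M extension list above, a lookup for cl_khr_fp64 comes back false, which matches the compiler complaining about 'double'.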

hi

works to undermine monero on bcu1525

port to Linux

I just finished porting/compiling on Linux. How can I contribute the changes? A Vega 64 managed to achieve 1052 H/s with about 5% over-clocking from stock clocks. How about porting to Polaris?

Error -61

I'm getting this on my rx 550's.
clCreateBuffer failed (c:\users\user\documents\github\randomx_opencl\randomx_opencl\miner.cpp, line 112): error -61
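
Error -61 is CL_INVALID_BUFFER_SIZE. Per the OpenCL specification, clCreateBuffer reports it when the requested size is zero, and implementations may also report it when the size exceeds the device's CL_DEVICE_MAX_MEM_ALLOC_SIZE. One plausible cause here, not confirmed in this thread, is a buffer larger than the card allows in a single allocation. A sketch of that condition; the function name is made up for illustration:

```cpp
#include <cstdint>

// Mirrors the documented condition for CL_INVALID_BUFFER_SIZE:
// a buffer must be non-empty and no larger than the device's
// maximum single allocation (CL_DEVICE_MAX_MEM_ALLOC_SIZE).
bool buffer_size_acceptable(std::uint64_t requested_bytes,
                            std::uint64_t max_alloc_bytes) {
    return requested_bytes != 0 && requested_bytes <= max_alloc_bytes;
}
```

If this is the cause, lowering --intensity would shrink the buffers that scale with it.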

fatal error: CL/cl.h: No such file or directory

I have the OpenCL headers installed at this path:

root@z820 /opt/rocm/opencl/include # tree
.
├── CL
│   ├── cl_ext.h
│   ├── cl_gl_ext.h
│   ├── cl_gl.h
│   ├── cl.h
│   ├── cl.hpp
│   ├── cl_platform.h
│   └── opencl.h
└── opencl-c.h

Other CMake-based projects work when I pass options like these:
cmake .... -DOpenCL_INCLUDE_DIR=/opt/rocm/opencl/include -DOpenCL_LIBRARY=/opt/rocm/opencl/lib/x86_64/libOpenCL.so ....
But here I get this error:

root@z820 /usr/src/RandomX_OpenCL/RandomX_OpenCL # make
clrxasm GCNASM/randomx_run_gfx803.asm -o randomx_run_gfx803.bin
clrxasm GCNASM/randomx_run_gfx900.asm -o randomx_run_gfx900.bin
g++ *.cpp -O3 -lOpenCL -lpthread ../RandomX/build/librandomx.a -o opencl_test
In file included from miner.cpp:26:
opencl_helpers.h:29:10: fatal error: CL/cl.h: No such file or directory
 #include <CL/cl.h>
          ^~~~~~~~~
compilation terminated.
In file included from opencl_helpers.cpp:23:
opencl_helpers.h:29:10: fatal error: CL/cl.h: No such file or directory
 #include <CL/cl.h>
          ^~~~~~~~~
compilation terminated.
In file included from tests.cpp:24:
opencl_helpers.h:29:10: fatal error: CL/cl.h: No such file or directory
 #include <CL/cl.h>
          ^~~~~~~~~
compilation terminated.
make: *** [makefile:4: release] Error 1

How do I pass the needed path to make?
Thanks.
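
The g++ line in the make output above carries no -I or -L flags, so a header under /opt/rocm/opencl/include is not on the default search path. The usual GCC remedies are adding -I<include dir> and -L<library dir> to that command, or setting the CPLUS_INCLUDE_PATH and LIBRARY_PATH environment variables before running make. A small probe to confirm that a given set of flags makes CL/cl.h visible, written as a sketch; the file name in the comment and the helper name are hypothetical:

```cpp
// Compile with the same include flags you intend to give the real build,
// for example:  g++ -std=c++17 -I/opt/rocm/opencl/include -c check_cl.cpp
#if defined(__has_include)
#  if __has_include(<CL/cl.h>)
#    define HAVE_CL_HEADER 1
#  endif
#endif
#ifndef HAVE_CL_HEADER
#  define HAVE_CL_HEADER 0
#endif

// True when the preprocessor could locate <CL/cl.h> with the given flags.
constexpr bool cl_header_visible() {
    return HAVE_CL_HEADER != 0;
}
```

A static_assert(cl_header_visible(), "CL/cl.h not found") placed after it turns a wrong include path into an immediate, readable compile error.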

question: hashes randomx-benchmark vs xmrig what is different?

I ran the benchmark and got, for example, this:

root@z820 /opt/xmrig # ./randomx-benchmark --jit --mine --largePages --threads 1224
RandomX benchmark v1.1.0
 - full memory mode (2080 MiB)
 - JIT compiled mode 
 - hardware AES mode
 - large pages mode
Initializing (1 thread) ...
Memory initialized in 29.6792 s
Initializing 1224 virtual machine(s) ...
Running benchmark (1000 nonces) ...
Calculated result: 10b649a3f15c7c7f88277812f2e74b337a0f20ce909af09199cccb960771cfa1
Reference result:  10b649a3f15c7c7f88277812f2e74b337a0f20ce909af09199cccb960771cfa1
Performance: 2687.36 hashes per second

Should I get the same hashrate in xmrig?
And what is the difference between the hashes reported by xmrig and the ones reported here by randomx-benchmark?

OpenCL fails to compile on Mac M1

System is a Mac Mini M1 with 16 GB memory, building in arm64 mode

I'm using the XMRig miner with OpenCL enabled for RandomX. It failed with this error:

[SNIP]
[2021-05-04 12:38:53.070] cpu use argon2 implementation default
[2021-05-04 12:38:53.070] randomx init dataset algo rx/0 (8 threads) seed 7d658366894bda88...
[2021-05-04 12:38:53.070] randomx allocated 2336 MB (2080+256) huge pages 0% 0/1168 +JIT (0 ms)
[2021-05-04 12:39:01.629] randomx dataset ready (8558 ms)
[2021-05-04 12:39:01.631] opencl use profile rx (1 thread) scratchpad 2048 KB
| # | GPU | BUS ID | INTENSITY | WSIZE | MEMORY | NAME
| 0 | 0 | n/a | 128 | 8 | 256 | Apple M1
[2021-05-04 12:39:01.632] opencl GPU #0 compiling...
UNSUPPORTED (log once): buildComputeProgram: cl2Metal failed
[2021-05-04 12:39:01.821] opencl error CL_BUILD_PROGRAM_FAILURE when calling clBuildProgram
BUILD LOG:
Compilation failed:

program_source:1736:1: warning: comparison of integers of different signs: 'uint32_t' (aka 'unsigned int') and 'int'
update_max(latency,(last_memory_op_slot+WORKERS_PER_HASH)/WORKERS_PER_HASH);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
program_source:1347:56: note: expanded from macro 'update_max'
#define update_max(value, next_value) do { if ((value) < (next_value)) (value) = (next_value); } while (0)
~~~~~ ^ ~~~~~~~~~~
program_source:1759:1: warning: comparison of integers of different signs: 'int32_t' (aka 'int') and 'unsigned int'
update_max(first_allowed_slot,latency*WORKERS_PER_HASH);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
program_source:1347:56: note: expanded from macro 'update_max'
#define update_max(value, next_value) do { if ((value) < (next_value)) (value) = (next_value); } while (0)
~~~~~~
[2021-05-04 12:39:01.822] opencl thread #0 self-test failed
[2021-05-04 12:39:01.822] opencl disabled (failed to start threads)

My question is twofold:

The signed vs unsigned comparison doesn't appear to be significant, therefore I should be able to solve this by casting the uint32_t to int32_t without any side effects, right?
The source code module appears to be randomx_vm.cl but if I change this module to add the casts and rebuild, the new code is not used. Even if I first remove the contents of the build directory.
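
On the first question: the build log lists these comparisons as warnings, and the reported failure is "cl2Metal failed", so the casts may not change the outcome. As for side effects, a mixed signed/unsigned comparison converts the signed operand to unsigned, which only misbehaves when the signed value is negative; a cast to int32_t only misbehaves when the unsigned value exceeds INT32_MAX. A sketch in host-side C++ rather than OpenCL C, with hypothetical helper names, showing the difference and a widened comparison that avoids both cases:

```cpp
#include <cstdint>

// What `a < b` does implicitly for int32_t vs uint32_t on common platforms:
// the signed operand is converted to unsigned, so -1 becomes 4294967295.
inline bool naive_less(std::int32_t a, std::uint32_t b) {
    return static_cast<std::uint32_t>(a) < b;
}

// Widening both operands to int64_t preserves the mathematical ordering.
inline bool safe_less(std::int32_t a, std::uint32_t b) {
    return static_cast<std::int64_t>(a) < static_cast<std::int64_t>(b);
}

// Hypothetical typed variant of the update_max macro from the log.
// Assumes next_value fits in int32_t, as slot and latency counts should.
inline void update_max_i32(std::int32_t& value, std::uint32_t next_value) {
    if (safe_less(value, next_value)) {
        value = static_cast<std::int32_t>(next_value);
    }
}
```

If the values involved are always small and non-negative, the plain casts and the original comparison give the same result.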
