preda / gpuowl

GPU Mersenne primality test.

License: GNU General Public License v3.0

Makefile 0.44% C++ 73.84% C 22.14% Python 3.17% Shell 0.29% CMake 0.12%

opencl gpu-computing gpgpu lucas-lehmer mersenne-numbers

gpuowl's Issues

Exception 9: gpu_error INVALID_VALUE

gpuowl/clwrap.cpp

Line 193 in ea4953e

CHECK2(err, "clCreateProgramWithSource");

multiple instances of gpuowl

When running multiple instances of gpuowl, it may happen that one instance gets stuck, and the only way to stop it is to reboot the machine.

Probably the first instance gets stuck when the second instance is launched.

To be precise, the two instances are launched with different -device GPU numbers.

compile error

Running make, I get these errors:

g++ -std=c++17 -O2 -DREV=\"`git rev-parse --short HEAD``git diff-files --quiet || echo -mod`\" -Wall Pm1Plan.cpp GmpUtil.cpp Worktodo.cpp common.cpp gpuowl.cpp Gpu.cpp clwrap.cpp Task.cpp checkpoint.cpp timeutil.cpp Args.cpp Primes.cpp state.cpp Signal.cpp FFTConfig.cpp -o openowl -lOpenCL -lgmp -pthread -L/opt/rocm/opencl/lib/x86_64 -L/opt/amdgpu-pro/lib/x86_64-linux-gnu -L/c/Windows/System32 -L.
Gpu.cpp: In function ‘void logTimeKernels(std::initializer_list<Kernel*>)’:
Gpu.cpp:153:14: error: expected unqualified-id before ‘[’ token
   for (auto& [stats, name]: infos) {
              ^
Gpu.cpp:153:14: error: expected ‘;’ before ‘[’ token
Gpu.cpp:153:15: error: ‘stats’ was not declared in this scope
   for (auto& [stats, name]: infos) {
               ^~~~~
Gpu.cpp:153:22: error: ‘name’ was not declared in this scope
   for (auto& [stats, name]: infos) {
                      ^~~~
Gpu.cpp: In lambda function:
Gpu.cpp:153:27: error: expected ‘{’ before ‘:’ token
   for (auto& [stats, name]: infos) {
                           ^
Gpu.cpp: In function ‘void logTimeKernels(std::initializer_list<Kernel*>)’:
Gpu.cpp:153:27: error: expected ‘;’ before ‘:’ token
Gpu.cpp:153:27: error: expected primary-expression before ‘:’ token
Gpu.cpp:153:27: error: expected ‘)’ before ‘:’ token
Gpu.cpp:153:27: error: expected primary-expression before ‘:’ token
Gpu.cpp: In member function ‘std::__cxx11::string Gpu::factorPM1(u32, const Args&, u32, u32)’:
Gpu.cpp:610:8: error: expected unqualified-id before ‘[’ token
   auto [block, nPrimes, allSelected] = makePm1Plan(D, B1, B2);
        ^
Gpu.cpp:611:17: error: ‘allSelected’ was not declared in this scope
   u32 nBlocks = allSelected.size();
                 ^~~~~~~~~~~
Gpu.cpp:612:70: error: ‘block’ was not declared in this scope
   log("%u P-1 stage2: %u blocks starting at block %u\n", E, nBlocks, block);
                                                                      ^~~~~
Gpu.cpp:681:39: error: unable to deduce ‘auto&&’ from ‘allSelected’
   for (const vector<bool>& selected : allSelected) {
                                       ^~~~~~~~~~~
Gpu.cpp:708:61: error: ‘nPrimes’ was not declared in this scope
         float percent = (nPrimesDone + nBlocksDone) / float(nPrimes + nBlocks) * 100;
                                                             ^~~~~~~
Task.cpp: In member function ‘bool Task::execute(const Args&)’:
Task.cpp:62:10: error: expected unqualified-id before ‘[’ token
     auto [isPrime, res64] = gpu->isPrimePRP(exponent, args);
          ^
Task.cpp:63:33: error: ‘isPrime’ was not declared in this scope
     return writeResultPRP(args, isPrime, res64, fftSize);
                                 ^~~~~~~
Task.cpp:63:42: error: ‘res64’ was not declared in this scope
     return writeResultPRP(args, isPrime, res64, fftSize);
                                          ^~~~~
Makefile:9: recipe for target 'openowl' failed
make: *** [openowl] Error 1
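Every failing line uses a C++17 structured binding (`auto [isPrime, res64] = ...`), even though the command passes -std=c++17. GCC added structured bindings in version 7, so the likely cause is an older g++. This is an inference, because the log does not show the compiler version. The minimal sketch below contrasts the C++17 form with a pre-C++17 equivalent using `std::tie`. `fakePRP` is a hypothetical stand-in for `Gpu::isPrimePRP` and returns made-up values.

```cpp
#include <cassert>
#include <cstdint>
#include <tuple>
#include <utility>

// Hypothetical stand-in for Gpu::isPrimePRP(): returns (isPrime, res64).
// The values are fake and only serve to exercise both unpacking styles.
std::pair<bool, std::uint64_t> fakePRP(std::uint32_t exponent) {
  return {exponent % 2 == 1, std::uint64_t(exponent) * 3};
}

// C++17 structured binding, as used in Task.cpp (needs GCC >= 7).
std::uint64_t residueCxx17(std::uint32_t e) {
  auto [isPrime, res64] = fakePRP(e);
  return isPrime ? res64 : 0;
}

// Pre-C++17 equivalent: declare the variables first, then unpack with std::tie.
std::uint64_t residueCxx11(std::uint32_t e) {
  bool isPrime;
  std::uint64_t res64;
  std::tie(isPrime, res64) = fakePRP(e);
  return isPrime ? res64 : 0;
}
```

Both functions compute the same result. Upgrading g++ is usually simpler than rewriting every binding.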

Runtime error: assertion failed

gpuowl: kernel.h:57: Kernel::Kernel(cl_program ... Assertion ... workSize % groupSize == 0 ... failed

I have compiled the master branch and got this error at runtime. The version displayed by gpuowl is 2.1

System Information:

Debian testing
gcc 7.3.0

Issued make tf

Usage: ./tf []
OpenCL compilation error -11 (args -DNCLASS=60060u -DSPECIAL_PRIMES=32u -DNPRIMES=262176u -DLDS_WORDS=8192u -cl-std=CL2.0 -save-temps=t0/tf -I. -cl-fast-relaxed-math -cl-std=CL2.0 )
File for dumping source cl isn't opened
error: unable to open output file 't0/tf_0_Ellesmere.i': 'No such file or directory'
1 error generated.

error: Clang front-end compilation failed!
Frontend phase failed compilation.
Error: Compiling CL to IR

error -44 (sieve)

Hang at program start

If there is a problem with the gpu, then gpuowl will hang and go into a "D" state (uninterruptible sleep). In this situation the only way to stop the program is to reset the machine.

Accumulating checkpoint files *.owl

We need a way to manage checkpoint files. They accumulate in the gpuowl directory, which grows huge after some time of continuous work. In some scenarios the directory must be copied back and forth over the network, so its growing size has an increasing impact.
A script that removes old checkpoints based on the date could be very useful.
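Here is a minimal C++17 sketch of the date-based cleanup, assuming checkpoints are the `*.owl` files directly inside one directory. The build already links `<filesystem>` (`-lstdc++fs`). The function names and the age threshold are hypothetical. The sketch only lists candidates; deleting them is left to the caller.

```cpp
#include <cassert>
#include <chrono>
#include <filesystem>
#include <vector>

namespace fs = std::filesystem;

// True if `p` is a .owl checkpoint whose modification time `mtime`
// lies more than `maxAge` before `now`. Kept separate from the directory
// walk so the rule can be checked without touching the disk.
bool isStaleCheckpoint(const fs::path& p, fs::file_time_type mtime,
                       fs::file_time_type now, std::chrono::hours maxAge) {
  return p.extension() == ".owl" && now - mtime > maxAge;
}

// Lists (does not delete) stale checkpoints directly inside `dir`.
// The caller can review the list and then call fs::remove() on each entry.
std::vector<fs::path> staleCheckpoints(const fs::path& dir, std::chrono::hours maxAge) {
  const auto now = fs::file_time_type::clock::now();
  std::vector<fs::path> stale;
  for (const auto& entry : fs::directory_iterator(dir)) {
    if (entry.is_regular_file() &&
        isStaleCheckpoint(entry.path(), entry.last_write_time(), now, maxAge)) {
      stale.push_back(entry.path());
    }
  }
  return stale;
}
```

Listing first and deleting second keeps the destructive step explicit. That matters for directories that are also synced over the network.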

ROCm 2.10: warning: argument unused during compilation: '-I .'

warning: argument unused during compilation: '-I .'

ROCm 2.10 is also slower than the previous version.

Debian: SCons

https://wiki.debian.org/UpstreamGuide#SCons

"Please don't use SCons. It is hard to use it correctly. For instance SCons is designed to ignore environment variables such as CFLAGS (unless your add code for this). It also does not support DESTDIR out of the box. As an upstream you have to explicitly add code for that (or Debian has to patch). Support for SONAMEs (library versioning) is also absent. The general observation is that many projects, that use SCons, do not have a working install target. Since projects work around these limitations individually there is no way to just use a SCons project in Debian, but more work is required to invoke it correctly. "

compiler flag -fconcepts needed with gcc 8.1

Hi,

I compiled the code in the master branch today, with gcc 8.1.0 on Arch Linux, getting the following error, and several like it.

in file included from args.h:4,
                 from gpuowl.cpp:5:
clwrap.h:247:41: error: use of ‘auto’ in parameter declaration only available with -fconcepts [-Werror]
 void setArg(cl_kernel k, int pos, const auto &value) { CHECK(clSetKernelArg(k, pos, sizeof(value), &value)); }
                                         ^~~~

Adding -fconcepts to the g++ line solved the problem, and gpuowl appears to work, i.e. runs without errors. I don't know how other gcc versions handle this.

I have a GeForce GTX 960, and found that gpuowl works with the flag -tail split but not without it, for the exponent 75000001.

Cheers,
Fredrik
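`const auto &value` in a function parameter is an abbreviated function template. As the error message says, GCC 8 accepts it in this mode only with -fconcepts. A portable alternative to adding the flag is to spell out the template parameter. In the minimal sketch below, a hypothetical `FakeKernel`/`fakeSetKernelArg` pair replaces `cl_kernel`/`clSetKernelArg`, so it builds without OpenCL headers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a cl_kernel: records the byte size of each argument.
struct FakeKernel { std::vector<std::size_t> argSizes; };

// Hypothetical stand-in for clSetKernelArg(kernel, index, size, value).
void fakeSetKernelArg(FakeKernel& k, int pos, std::size_t size, const void* /*value*/) {
  if (k.argSizes.size() <= static_cast<std::size_t>(pos)) k.argSizes.resize(pos + 1);
  k.argSizes[pos] = size;
}

// Standard C++17 form of `setArg(cl_kernel k, int pos, const auto &value)`:
// an explicit template parameter replaces the `auto` parameter,
// so no -fconcepts flag is needed.
template <typename T>
void setArg(FakeKernel& k, int pos, const T& value) {
  fakeSetKernelArg(k, pos, sizeof(value), &value);
}
```

Call sites such as `setArg(k, 0, someBuffer)` stay unchanged, because `T` is deduced exactly as `auto` was.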

ROCm 2.7 - PRP performance drop

Warning: do not update to ROCm 2.7; performance is poor (again, I would say). Radeon VII timing went from 908 to 990 us/sq.

primenet.py - when there's no blank line at the end of worktodo

The next assignment is appended to the previous line, like:
PFactor=N/A,1,2,9255193,-1,77,2PFactor=N/A,1,2,9751933,-1,77,2
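primenet.py is Python, but the fix is the same in any language: check the last character before appending. The minimal C++ sketch below works on the worktodo contents as a string; file I/O is omitted and the function name is hypothetical.

```cpp
#include <cassert>
#include <string>

// Appends one assignment line to the worktodo text. If the existing text does
// not already end with '\n', one is added first, so two assignments never end
// up glued together on a single line.
std::string appendAssignment(std::string worktodo, const std::string& line) {
  if (!worktodo.empty() && worktodo.back() != '\n') worktodo += '\n';
  worktodo += line;
  worktodo += '\n';  // always leave the file newline-terminated
  return worktodo;
}
```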

Debianization issues

There are a number of issues for debianization to be successful:

excerpt of chat from debian-welcome:

selroc> ok for debian/install, the program requires some .cl files to be in the same directory
[10:12:30] yeah, usually upstream build systems will also have an install system
[10:12:40] hmm, ok
[10:13:00] /usr/bin isn't the best location for .cl files, can it load those from a different directory?
[10:13:06] no
[10:13:12] not now
[10:13:45] I can work with the original programmer to make it so
[10:14:10] hmm, ok. it should be fine for a local package but for the package to get into Debian, you would need to be able to load them from say /usr/share/gpuowl/*.cl instead
[10:14:10] I try to get in touch with him
[10:16:04] btw, since this is C++ code, it would be a good idea for him to run cppcheck on it to find any accidental errors
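To load .cl files from /usr/share/gpuowl, as the Debian reviewer suggests, the program needs to search more than one directory. Below is a minimal C++17 sketch of such a search. The function name and the search order are assumptions, not existing gpuowl behavior.

```cpp
#include <cassert>
#include <filesystem>
#include <optional>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Returns the path of `fileName` in the first directory of `searchDirs`
// that contains it as a regular file, or std::nullopt if none does.
std::optional<fs::path> findKernelSource(const std::string& fileName,
                                         const std::vector<fs::path>& searchDirs) {
  for (const auto& dir : searchDirs) {
    fs::path candidate = dir / fileName;
    std::error_code ec;  // unreadable directories are skipped, not fatal
    if (fs::is_regular_file(candidate, ec)) return candidate;
  }
  return std::nullopt;
}
```

For example, `findKernelSource("gpuowl.cl", {".", "/usr/share/gpuowl"})` would prefer a local copy and fall back to the packaged one.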

Fix Makefile for Ubuntu 18.04

The <filesystem> library is not found. Fix:

CXX=g++-8

v5.0, assertion raised

gpuowl.log:

2018-11-03 09:09:37 gpuowl 5.0--mod
2018-11-03 09:10:28 gpuowl 5.0--mod
2018-11-03 09:10:28 0 -user selroc -cpu 0 -device 0
2018-11-03 09:10:28 0 756839 FFT 512K: Width 64x8, Height 64x8; 1.44 bits/word
2018-11-03 09:10:28 0 using long carry kernels
2018-11-03 09:10:29 0 gfx803-36x1360-@4a:0.0 Ellesmere [Radeon RX 470/480]
2018-11-03 09:10:30 0 OpenCL compilation in 1085 ms, with "-DEXP=756839u -DWIDTH=512u -DSMALL_HEIGHT=512u -DMIDDLE=1u -I. -cl-fast-relaxed-math -cl-std=CL2.0 "
2018-11-03 09:10:30 0 756839.owl not found, starting from the beginning.
2018-11-03 09:10:31 0 756839 OK 800 0.11%; 0.53 ms/sq, 0 MULs; ETA 0d 00:07; 24ac239d8eb8ffa2 (check 0.24s)
2018-11-03 09:10:36 0 756839 10000 1.32%; 0.53 ms/sq, 0 MULs; ETA 0d 00:07; e0f756a0e6b027cf
2018-11-03 09:10:41 0 756839 20000 2.64%; 0.53 ms/sq, 0 MULs; ETA 0d 00:07; c24d9712d700c29e
2018-11-03 09:10:46 0 756839 30000 3.96%; 0.53 ms/sq, 0 MULs; ETA 0d 00:06; ef92f116fa7b7853
2018-11-03 09:10:52 0 756839 40000 5.28%; 0.53 ms/sq, 0 MULs; ETA 0d 00:06; 56bee347346be732
2018-11-03 09:10:57 0 756839 50000 6.60%; 0.53 ms/sq, 0 MULs; ETA 0d 00:06; 88a1922073d97c57
2018-11-03 09:11:02 0 756839 60000 7.92%; 0.53 ms/sq, 0 MULs; ETA 0d 00:06; 2dd5ee5cdfe0c62a
2018-11-03 09:11:08 0 756839 70000 9.24%; 0.53 ms/sq, 0 MULs; ETA 0d 00:06; 70439075d84ca857
2018-11-03 09:11:13 0 756839 80000 10.57%; 0.53 ms/sq, 0 MULs; ETA 0d 00:06; 953b2f1c170a9def
2018-11-03 09:11:18 0 756839 90000 11.89%; 0.53 ms/sq, 0 MULs; ETA 0d 00:06; 0e275a89b9c39b27
2018-11-03 09:11:24 0 756839 100000 13.21%; 0.53 ms/sq, 0 MULs; ETA 0d 00:06; 046a3e1ad36681e9
2018-11-03 09:11:29 0 756839 110000 14.53%; 0.53 ms/sq, 0 MULs; ETA 0d 00:06; fad6fff7757f9a66
2018-11-03 09:11:34 0 756839 120000 15.85%; 0.53 ms/sq, 0 MULs; ETA 0d 00:06; b9e7f5cc6fc13dc0
2018-11-03 09:11:39 0 756839 130000 17.17%; 0.53 ms/sq, 0 MULs; ETA 0d 00:06; 79b53436131c503b
2018-11-03 09:11:45 0 756839 140000 18.49%; 0.53 ms/sq, 0 MULs; ETA 0d 00:05; 04d1642ce8add525
2018-11-03 09:11:50 0 756839 150000 19.81%; 0.53 ms/sq, 0 MULs; ETA 0d 00:05; dde1480d8d123ee9
2018-11-03 09:11:56 0 756839 EE 160000 21.13%; 0.53 ms/sq, 0 MULs; ETA 0d 00:05; 76f44754c8e05f8c (check 0.23s)
2018-11-03 09:11:56 0 756839.owl loaded: k 800, B1 0, block 400, res64 24ac239d8eb8ffa2, stage 1, baseBits 0
2018-11-03 09:12:01 0 756839 10000 1.32%; 0.56 ms/sq, 0 MULs; ETA 0d 00:07; e0f756a0e6b027cf
2018-11-03 09:12:06 0 756839 20000 2.64%; 0.53 ms/sq, 0 MULs; ETA 0d 00:07; c24d9712d700c29e
2018-11-03 09:12:11 0 Stopping, please wait..
2018-11-03 09:12:11 0 756839 OK 28800 3.80%; 0.53 ms/sq, 0 MULs; ETA 0d 00:06; 7f586f2ac3569dbe (check 0.23s)
2018-11-03 09:12:11 0 Exiting because "stop requested"
2018-11-03 09:12:11 0 Bye
2018-11-03 09:12:23 gpuowl 5.0--mod
2018-11-03 09:12:23 0 -user selroc -fft +1 -cpu 0 -device 0
2018-11-03 09:12:23 0 756839 FFT 1024K: Width 256x4, Height 64x8; 0.72 bits/word
2018-11-03 09:12:23 0 using long carry kernels
2018-11-03 09:12:23 0 gfx803-36x1360-@4a:0.0 Ellesmere [Radeon RX 470/480]
2018-11-03 09:12:24 0 OpenCL compilation in 1031 ms, with "-DEXP=756839u -DWIDTH=1024u -DSMALL_HEIGHT=512u -DMIDDLE=1u -I. -cl-fast-relaxed-math -cl-std=CL2.0 "
2018-11-03 09:12:24 0 756839.owl loaded: k 28800, B1 0, block 400, res64 7f586f2ac3569dbe, stage 1, baseBits 0

*** Assertion raised ***

gpuowl/state.cpp

Line 124 in d00285c

assert(w >= 0 && w < (1 << len));

World record PRP tests

gpuowl/primenet.py

Line 99 in b1aa242

PRP_WORLD_RECORD = 152

Segmentation fault gpuOwl with kernel 5.3.0-rc5

After installing the new kernel 5.3.0-rc5, gpuowl gives a segmentation fault, even after recompiling.

Please add cpu name or device number to output lines

In case one wants to run multiple instances of gpuowl, this would distinguish the output of each instance.

Radeon VII, Severe error: probably ROCm related

2019-09-07 12:06:44 90348611    33410000 36.98%;  886 us/sq; ETA 0d 14:01; bac38bb8e27196e5
2019-09-07 12:06:53 90348611    33420000 36.99%;  886 us/sq; ETA 0d 14:01; 5dc04e6cd38ab191
2019-09-07 12:07:02 90348611    33430000 37.00%;  887 us/sq; ETA 0d 14:01; b91d6d315cae4932
Queue at 0x7f23e803a000 inactivated due to async error:
        HSA_STATUS_ERROR_ILLEGAL_INSTRUCTION:  The agent attempted to execute an illegal shader instruction.

This needs a reboot.

2^31 overflow core dump when testing large exponents

This only affects exponents that are in no way practical to test, so it's low priority; in practice it only limits benchmarking.

amdcube@amdcube:~/gpuowl$ ~/prime/bin/gpuowl/gpuowl -prp 2147483647
2019-05-19 12:24:39 gpuowl v6.5-25-gc48d46f
2019-05-19 12:24:39 Note: no config.txt file found
2019-05-19 12:24:39 config: -prp 2147483647 
2019-05-19 12:24:39 2147483647 FFT 147456K: Width 512x8, Height 256x8, Middle 9; 14.22 bits/word
2019-05-19 12:24:39 using long carry kernels
2019-05-19 12:24:42 OpenCL compilation in 2643 ms, with "-DEXP=2147483647u -DWIDTH=4096u -DSMALL_HEIGHT=2048u -DMIDDLE=9u  -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-05-19 12:24:53 2147483647.owl not found, starting from the beginning.
2019-05-19 12:27:47 2147483647 OK     2000  0.00%; 40.61 ms/sq; ETA 1009d 09:59; fb12c8169932aa03 (check 43.48s)
^C2019-05-19 12:28:28 Stopping, please wait..
2019-05-19 12:29:11 2147483647 OK     3000  0.00%; 40.85 ms/sq; ETA 1015d 04:58; 81a7712dcf35f074 (check 43.54s)
2019-05-19 12:29:11 Exiting because "stop requested"
2019-05-19 12:29:11 Bye

2^31-1 works fine

amdcube@amdcube:~/gpuowl$ ~/prime/bin/gpuowl/gpuowl -prp 2147483648
2019-05-19 12:29:34 gpuowl v6.5-25-gc48d46f
2019-05-19 12:29:34 Note: no config.txt file found
2019-05-19 12:29:34 config: -prp 2147483648 
2019-05-19 12:29:34 2147483648 FFT 147456K: Width 512x8, Height 256x8, Middle 9; 14.22 bits/word
2019-05-19 12:29:34 using long carry kernels
2019-05-19 12:29:37 OpenCL compilation in 2623 ms, with "-DEXP=2147483648u -DWIDTH=4096u -DSMALL_HEIGHT=2048u -DMIDDLE=9u  -I. -cl-fast-relaxed-math -cl-std=CL2.0"
gpuowl: state.cpp:146: std::pair<std::vector<double>, std::vector<double> > genWeights(int, int, int): Assertion `bits == baseBits || bits == baseBits + 1' failed.
Aborted (core dumped)
amdcube@amdcube:~/gpuowl$ ~/prime/bin/gpuowl/gpuowl -prp 2147483649
2019-05-19 12:29:48 gpuowl v6.5-25-gc48d46f
2019-05-19 12:29:48 Note: no config.txt file found
2019-05-19 12:29:48 config: -prp 2147483649 
2019-05-19 12:29:48 2147483649 FFT 147456K: Width 512x8, Height 256x8, Middle 9; 14.22 bits/word
2019-05-19 12:29:48 using long carry kernels
2019-05-19 12:29:51 OpenCL compilation in 2663 ms, with "-DEXP=2147483649u -DWIDTH=4096u -DSMALL_HEIGHT=2048u -DMIDDLE=9u  -I. -cl-fast-relaxed-math -cl-std=CL2.0"
gpuowl: state.cpp:146: std::pair<std::vector<double>, std::vector<double> > genWeights(int, int, int): Assertion `bits == baseBits || bits == baseBits + 1' failed.
Aborted (core dumped)

2^31 and 2^31+1 fail.
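The assertion message shows `genWeights(int, int, int)`. With a 32-bit int, 2^31-1 is exactly INT_MAX, while 2^31 and 2^31+1 cannot be represented. That matches the failures if the exponent reaches `genWeights` as an int, which is inferred from the signature and not confirmed. Below is a minimal sketch of an up-front range check that would fail with a clear message instead of an assertion; the function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// True if `exponent` can be passed to an int parameter without changing value.
// 2147483647 (2^31-1) passes; 2147483648 and 2147483649 do not.
bool exponentFitsInt(std::uint64_t exponent) {
  return exponent <= static_cast<std::uint64_t>(std::numeric_limits<int>::max());
}
```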

Adaptive step too frequent GEC

Running on Nvidia cards

When trying gpuowl on a Nvidia GTX 960, I get the following error:

gpuOwL v1.10-41616da GPU Mersenne primality checker
GeForce GTX 960-8x1278- 

OpenCL compilation in 591 ms, with " -DEXP=51001001u -DWIDTH=1024u -DHEIGHT=2048u -DLOG_NWORDS=22u -DFP_DP=1  -I. -cl-fast-relaxed-math "
error -19
gpuowl: clwrap.h:294: std::__cxx11::string getKernelArgName(cl_kernel, int): Assertion `check(clGetKernelArgInfo(k, pos, 0x119A, sizeof(buf), buf, &size))' failed.
Aborted (core dumped)

This can be solved by adding -cl-kernel-arg-info to the compiler argument string in clwrap.h, line 213.
From the OpenCL clGetKernelArgInfo documentation:

Kernel argument information is only available if the program object associated with kernel is created with clCreateProgramWithSource and the program executable is built with the -cl-kernel-arg-info option specified in options argument to clBuildProgram or clCompileProgram.

After this change, ./gpuowl -longTail works for me (on Arch Linux). Without -longTail it fails with error -9999 (fftW), which is an nvidia code for "Illegal read or write to a buffer" - maybe the program runs out of resources.

Some output for benchmarking:

OpenCL compilation in 1 ms, with " -DEXP=51001001u -DWIDTH=1024u -DHEIGHT=2048u -DLOG_NWORDS=22u -DFP_DP=1  -I. -cl-fast-relaxed-math -cl-kernel-arg-info"
Note: using long (not-fused) carry kernels
Note: using long (not-fused) tail kernels
PRP-3: FFT 4M (1024 * 2048 * 2) of 51001001 (12.16 bits/word) [2018-02-01 15:43:27 CET]
Starting at iteration 4500
OK     4500 / 51001001 [ 0.01%], 0.00 ms/it; ETA 0d 00:00; 4f675a45d12e6787 [15:43:34]
OK     5000 / 51001001 [ 0.01%], 11.64 ms/it; ETA 6d 20:50; 9a26470bab3f13b4 [15:43:47]
OK     6000 / 51001001 [ 0.01%], 11.68 ms/it; ETA 6d 21:28; 52c02feb24eecd14 [15:44:05]
OK    10000 / 51001001 [ 0.02%], 11.66 ms/it; ETA 6d 21:12; a32d9a0f25ae04bb [15:44:58]
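Below is a minimal sketch of the suggested fix as a pure string helper with a hypothetical name. It adds `-cl-kernel-arg-info` to the build-options string once. The result would then be passed as the `options` argument of `clBuildProgram`.

```cpp
#include <cassert>
#include <string>

// Appends -cl-kernel-arg-info to an OpenCL build-options string unless it is
// already present; clGetKernelArgInfo() only works for programs built with it.
std::string withKernelArgInfo(std::string options) {
  const std::string flag = "-cl-kernel-arg-info";
  if (options.find(flag) == std::string::npos) {
    if (!options.empty() && options.back() != ' ') options += ' ';
    options += flag;
  }
  return options;
}
```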

error: work group size exceeds the maximum default — Win10 64bit

After struggling with g++, I finally got the executable built, but launching it throws an error:

.\gpuowl.exe -device 0

gpuOwL v2.0-dbc5a01-mod GPU Mersenne primality checker
Pitcairn-16x 860-@1:0.0 AMD Radeon HD 7800 Series
Note: using long carry and fused tail kernels
OpenCL compilation error -11 (args  -DEXP=2976221u  -I. -cl-fast-relaxed-math -cl-kernel-arg-info )
".\gpuowl.cl", line 34: warning: OpenCL extension is now part of core
  #pragma OPENCL EXTENSION cl_khr_fp64 : enable
                           ^

".\gpuowl.cl", line 454: error: work group size exceeds the maximum default
          value for the selected device
  KERNEL(512) fft4K(P(T2) io, Trig smallTrig) {
  ^

".\gpuowl.cl", line 619: error: work group size exceeds the maximum default
          value for the selected device
  KERNEL(512) square(P(T2) io, Trig bigTrig)  { csquare(512, 4096, 625, io, bigTrig); }
  ^

".\gpuowl.cl", line 621: error: work group size exceeds the maximum default
          value for the selected device
  KERNEL(512) multiply(P(T2) io, CP(T2) in, Trig bigTrig)  { cmul(512, 4096, 625, io, in, bigTrig); }
  ^

".\gpuowl.cl", line 663: error: work group size exceeds the maximum default
          value for the selected device
  KERNEL(512) autoConv(P(T2) io, Trig smallTrig, P(T2) bigTrig) {
  ^

4 errors detected in the compilation of "C:\Users\\AppData\Local\Temp\OCL2284T1.cl".
Frontend phase failed compilation.


Bye

It does seem to work on my Intel HD Graphics, though (I let it run for a minute while experimenting with -device), but obviously I want to run it on a proper graphics card.

Could you help me to get your program to hunt for a prime?
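The compiler rejects `KERNEL(512)` because 512 exceeds the device's default maximum work-group size. gpuowl hard-codes these sizes in gpuowl.cl, so nothing on the host side is a drop-in fix. Still, here is a sketch of the host-side check involved. The hypothetical helper picks the largest power-of-two group size that satisfies three conditions:

- it is within the requested size;
- it is within the device limit (as reported by the OpenCL `CL_DEVICE_MAX_WORK_GROUP_SIZE` query);
- it divides the global work size evenly, the same `workSize % groupSize == 0` condition as the kernel.h assertion in an earlier issue.

```cpp
#include <cassert>
#include <cstddef>

// Largest power of two that is <= requested, <= deviceMax and divides workSize.
// Returns 0 only when the effective limit is 0.
std::size_t pickGroupSize(std::size_t workSize, std::size_t requested, std::size_t deviceMax) {
  const std::size_t limit = requested < deviceMax ? requested : deviceMax;
  if (limit == 0) return 0;
  std::size_t g = 1;
  while (g * 2 <= limit) g *= 2;     // largest power of two <= limit
  while (workSize % g != 0) g /= 2;  // shrink until it divides workSize (g == 1 always does)
  return g;
}
```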

build error on msys2/windows

GmpUtil.cpp: In function 'mpz_class {anonymous}::powerSmooth(u32, u32)':
GmpUtil.cpp:26:28: error: call of overloaded '__gmp_expr()' is ambiguous
26 | mpz_class a{u64(exp) << 8}; // boost 2s.
| ^
In file included from GmpUtil.h:6,
from GmpUtil.cpp:3:
C:/msys64/mingw64/include/gmpxx.h:1502:3: note: candidate: '__gmp_expr<__mpz_struct [1], __mpz_struct [1]>::__gmp_expr(double)'
1502 | __GMPXX_DEFINE_ARITHMETIC_CONSTRUCTORS
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
C:/msys64/mingw64/include/gmpxx.h:1502:3: note: candidate: '__gmp_expr<__mpz_struct [1], __mpz_struct [1]>::__gmp_expr(float)'
1502 | __GMPXX_DEFINE_ARITHMETIC_CONSTRUCTORS
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
C:/msys64/mingw64/include/gmpxx.h:1502:3: note: candidate: '__gmp_expr<__mpz_struct [1], __mpz_struct [1]>::__gmp_expr(long unsigned int)'
1502 | __GMPXX_DEFINE_ARITHMETIC_CONSTRUCTORS
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
C:/msys64/mingw64/include/gmpxx.h:1502:3: note: candidate: '__gmp_expr<__mpz_struct [1], __mpz_struct [1]>::__gmp_expr(long int)'
1502 | __GMPXX_DEFINE_ARITHMETIC_CONSTRUCTORS
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
C:/msys64/mingw64/include/gmpxx.h:1502:3: note: candidate: '__gmp_expr<__mpz_struct [1], __mpz_struct [1]>::__gmp_expr(short unsigned int)'
1502 | __GMPXX_DEFINE_ARITHMETIC_CONSTRUCTORS
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
C:/msys64/mingw64/include/gmpxx.h:1502:3: note: candidate: '__gmp_expr<__mpz_struct [1], __mpz_struct [1]>::__gmp_expr(short int)'
1502 | __GMPXX_DEFINE_ARITHMETIC_CONSTRUCTORS
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
C:/msys64/mingw64/include/gmpxx.h:1502:3: note: candidate: '__gmp_expr<__mpz_struct [1], __mpz_struct [1]>::__gmp_expr(unsigned int)'
1502 | __GMPXX_DEFINE_ARITHMETIC_CONSTRUCTORS
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
C:/msys64/mingw64/include/gmpxx.h:1502:3: note: candidate: '__gmp_expr<__mpz_struct [1], __mpz_struct [1]>::__gmp_expr(int)'
1502 | __GMPXX_DEFINE_ARITHMETIC_CONSTRUCTORS
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
C:/msys64/mingw64/include/gmpxx.h:1502:3: note: candidate: '__gmp_expr<__mpz_struct [1], __mpz_struct [1]>::__gmp_expr(unsigned char)'
1502 | __GMPXX_DEFINE_ARITHMETIC_CONSTRUCTORS
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
C:/msys64/mingw64/include/gmpxx.h:1502:3: note: candidate: '__gmp_expr<__mpz_struct [1], __mpz_struct [1]>::__gmp_expr(signed char)'
1502 | __GMPXX_DEFINE_ARITHMETIC_CONSTRUCTORS
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
C:/msys64/mingw64/include/gmpxx.h:1492:3: note: candidate: '__gmp_expr<__mpz_struct [1], __mpz_struct [1]>::__gmp_expr(__gmp_expr<__mpz_struct [1], __mpz_struct [1]>&&)'
1492 | __gmp_expr(__gmp_expr &&z)
| ^~~~~~~~~~
C:/msys64/mingw64/include/gmpxx.h:1490:3: note: candidate: '__gmp_expr<__mpz_struct [1], __mpz_struct [1]>::__gmp_expr(const __gmp_expr<__mpz_struct [1], __mpz_struct [1]>&)'
1490 | __gmp_expr(const __gmp_expr &z) { mpz_init_set(mp, z.mp); }
| ^~~~~~~~~~
make: *** [Makefile:30: GmpUtil.o] Error 1

self-test was removed from gpuowl; what else can be tested? Please provide proper instructions

Hi @preda

I have gone through the README file and would like more details about the different validations I can run.
Please update the README file with clearer instructions.

Assertion failed against prime 521, 0.00 bits/word

gpuowl-OpenCL 3.4--mod
FFT 512K: Width 512 (64x8), Height 512 (64x8); 0.00 bits/word
Note: using long carry kernels
Ellesmere-36x1360-@A:0.0 Radeon RX 580 Series
OpenCL compilation in 952 ms, with " -DEXP=521u -DWIDTH=512u -DSMALL_HEIGHT=512u -DMIDDLE=1u -I. -cl-fast-relaxed-math "
[2018-07-21 17:23:47 CEST] PRP M(521), FFT 512K, 0.00 bits/word

openowl: LowGpu.h:67: ....... failed.
Aborted.

Code for gpu information

Not really an issue, more of a possible addition: I have adapted this chunk of code from another library; it returns various GPU properties.
https://github.com/valeriob01/gpuinfo

The purpose of this addition is to make gpuowl a trusted client by returning exact GPU names and other information, so that they can appear in the JSON result sent back to the server.

PM1 stalls just before completion

Please see attached log files.
Both run on Vega 56 (two different cards) and i5-3550 (two different CPUs) with amdgpu-pro 19.20 on Ubuntu 18.04 with recent or latest GpuOwl.
PM1-143791129.log
PM1-143792009.log

P.S.: running the same software and Vega 56 on a Xeon X5675 instead, the first P-1 run always completes, but the second hangs at the end of Stage 1. My workaround in this case is to put only one entry at a time in worktodo.txt and restart GpuOwl from scratch for each exponent.

gpuowl-wrap.cl is not generated automatically

It is now necessary to run "make gpuowl-wrap.cl" before "make".

The line with "load" information disappeared

Gpuowl does not show the line with the "load" information.

Memory leak P-1

Each time a new test starts it uses a little system memory and doesn't seem to free it afterwards. Encountered when doing many small P-1 tests, it took ~150 tests to fill 16GB of memory so it's unlikely to be encountered under normal use.

terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
Aborted (core dumped)

OpenCL build program failure on Radeon V

gpuowl -dir /home/sel/gpuowl -use MERGED_MIDDLE -user selroc -block 1000 -log 10000 -cpu R7c -device 0
2019-12-10 14:46:11 gpuowl v6.11-82-gdb9ce44
2019-12-10 14:46:11 Note: no config.txt file found
2019-12-10 14:46:11 config: -dir /home/sel/gpuowl -use MERGED_MIDDLE -user selroc -block 1000 -log 10000 -cpu R7c -device 0 
2019-12-10 14:46:11 98563771 FFT 5632K: Width 256x4, Height 64x4, Middle 11; 17.09 bits/word
2019-12-10 14:46:11 OpenCL args "-DEXP=98563771u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=11u -DWEIGHT_STEP=0x1.e0dea836fdc34p+0 -DIWEIGHT_STEP=0x1.1092a0edb09cep-1 -DWEIGHT_BIGSTEP=0x1.306fe0a31b715p+0 -DIWEIGHT_BIGSTEP=0x1.ae89f995ad3adp-1 -DAMDGPU=1 -DMERGED_MIDDLE=1  -I. -cl-fast-relaxed-math -cl-std=CL2.0"
/tmp/AMD_2175_18/t_2175_20.cl:13:9: warning: GpuOwl requires OpenCL 200, found 120 [-W#pragma-messages]
#pragma message "GpuOwl requires OpenCL 200, found " STR(__OPENCL_VERSION__)
        ^
/tmp/AMD_2175_18/t_2175_20.cl:14:2: error: OpenCL >= 2.0 required
#error OpenCL >= 2.0 required
 ^
1 warning and 1 error generated.
2019-12-10 14:46:11 OpenCL compilation error -11 (args -DEXP=98563771u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=11u -DWEIGHT_STEP=0x1.e0dea836fdc34p+0 -DIWEIGHT_STEP=0x1.1092a0edb09cep-1 -DWEIGHT_BIGSTEP=0x1.306fe0a31b715p+0 -DIWEIGHT_BIGSTEP=0x1.ae89f995ad3adp-1 -DAMDGPU=1 -DMERGED_MIDDLE=1  -I. -cl-fast-relaxed-math -cl-std=CL2.0)
2019-12-10 14:46:11 Error: Failed to compile opencl source (from CL to LLVM IR).

2019-12-10 14:46:11 Exception gpu_error: BUILD_PROGRAM_FAILURE clBuildProgram at clwrap.cpp:234 build
2019-12-10 14:46:11 Bye

Sometimes lines are skipped when calculating the next M

Description

Several lines were recorded in 'worktodo.txt'.
After completing the calculation of the first number, the program may delete the first character of the next line (I noticed this for "DoubleCheck=") or its second character (I noticed this for "Test="). The program then skips that work line and proceeds to the following one.

Version

gpuOwL version: v1.9-

OS type [version]: Windows x64 [6.1.7601]

Lines are skipped

worktodo.txt line '[Worker #1]
worktodo.txt line 'Tst=77002949
worktodo.txt line 'oubleCheck=0,60004433,76,1
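One way to make sure neighbouring lines survive intact is to rebuild the worktodo text from whole lines when removing a finished assignment. The minimal sketch below works on the file contents as a string; reading and rewriting the file is omitted. It tolerates Windows CRLF line endings, and the function name is hypothetical. It does not claim to reproduce how gpuOwL v1.9 edits the file.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Removes the first line equal to `doneLine` (ignoring a trailing '\r') and
// copies every other line through unchanged, so no line can lose characters.
std::string removeDoneLine(const std::string& text, const std::string& doneLine) {
  std::istringstream in(text);
  std::string line, out;
  bool removed = false;
  while (std::getline(in, line)) {
    std::string key = line;
    if (!key.empty() && key.back() == '\r') key.pop_back();  // CRLF files
    if (!removed && key == doneLine) { removed = true; continue; }
    out += line;
    out += '\n';
  }
  return out;
}
```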

Proposal for improvement of checkpoint recovery mechanism

gpuowl/checkpoint.cpp

Line 61 in f34ad18

If the checkpoint is invalid, load *-prev.owl, and overwrite the last checkpoint file.
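Below is a minimal C++17 sketch of the proposed fallback. The real checkpoint validation is not shown here, so it is passed in as a callback. Both the function name and the callback are hypothetical.

```cpp
#include <cassert>
#include <filesystem>
#include <functional>
#include <optional>
#include <system_error>

namespace fs = std::filesystem;

// Returns `current` if it is valid. Otherwise, if `prev` is valid, copies `prev`
// over `current` (so the bad file is not picked up again) and returns `prev`.
// Returns std::nullopt if neither file is valid. `isValid` is a placeholder.
std::optional<fs::path> pickCheckpoint(const fs::path& current, const fs::path& prev,
                                       const std::function<bool(const fs::path&)>& isValid) {
  if (isValid(current)) return current;
  if (!isValid(prev)) return std::nullopt;
  std::error_code ec;  // a failed copy is ignored; resuming from `prev` still works
  fs::copy_file(prev, current, fs::copy_options::overwrite_existing, ec);
  return prev;
}
```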

invalid savefiles after blackout

2019-12-09 03:42:18 OpenCL compilation in 1.56 s
2019-12-09 03:42:18 '/home/xxx/gpuowl/98563771/98563771.owl' invalid
2019-12-09 03:42:18 '/home/xxx/gpuowl/98563771/98563771-old.owl' invalid
2019-12-09 03:42:18 Exiting because "invalid savefiles found, investigate why

I thought this was an extremely rare condition, but it happened again.
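The log does not say why both savefiles were invalid. A common mitigation for power loss, offered as an assumption rather than a diagnosis, is to write to a temporary file and rename it over the old checkpoint only once the write is complete. Below is a minimal C++17 sketch; `saveAtomically` is a hypothetical name.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Writes `data` to `target` via "<target>.tmp" followed by a rename. On POSIX,
// rename() within one filesystem replaces the target atomically, so a reader
// sees either the old file or the complete new one. Making the data durable
// across a power cut additionally needs fsync(), which standard C++ lacks.
bool saveAtomically(const fs::path& target, const std::string& data) {
  fs::path tmp = target;
  tmp += ".tmp";
  {
    std::ofstream out(tmp, std::ios::binary | std::ios::trunc);
    if (!out.write(data.data(), static_cast<std::streamsize>(data.size()))) return false;
    out.flush();
    if (!out) return false;
  }  // file closed here, before the rename
  std::error_code ec;
  fs::rename(tmp, target, ec);
  return !ec;
}
```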

Error on Ubuntu/amdgpu-pro 18.50

2019-05-23 14:40:05 Note: no config.txt file found
2019-05-23 14:40:05 config: -prp 82589933 -device 0
2019-05-23 14:40:05 82589933 FFT 4608K: Width 256x4, Height 64x4, Middle 9; 17.50 bits/word
2019-05-23 14:40:05 using short carry kernels
2019-05-23 14:40:09 OpenCL args "-DEXP=82589933u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=9u -DFRAC=9280343354015947889ul -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-05-23 14:40:10 OpenCL compilation error -11 (args -DEXP=82589933u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=9u -DFRAC=9280343354015947889ul -I. -cl-fast-relaxed-math -cl-std=CL2.0)
2019-05-23 14:40:10 /tmp/OCL1986T0.cl:183:3: error: implicit declaration of function '__asm' is invalid in C99
X2(u[0], u[2]);
^
/tmp/OCL1986T0.cl:150:2: note: expanded from macro 'X2'
__asm( "v_add_f64 %0, %1, -%2\n" : "=v" (b.x) : "v" (t.x), "v" (b.x));
^
/tmp/OCL1986T0.cl:183:3: error: expected ')'
/tmp/OCL1986T0.cl:150:35: note: expanded from macro 'X2'
__asm( "v_add_f64 %0, %1, -%2\n" : "=v" (b.x) : "v" (t.x), "v" (b.x));
^
/tmp/OCL1986T0.cl:183:3: note: to match this '('
/tmp/OCL1986T0.cl:150:7: note: expanded from macro 'X2'
__asm( "v_add_f64 %0, %1, -%2\n" : "=v" (b.x) : "v" (t.x), "v" (b.x));
^
/tmp/OCL1986T0.cl:183:3: error: expected ')'
X2(u[0], u[2]);
^
/tmp/OCL1986T0.cl:151:35: note: expanded from macro 'X2'
__asm( "v_add_f64 %0, %1, -%2\n" : "=v" (b.y) : "v" (t.y), "v" (b.y));
^
/tmp/OCL1986T0.cl:183:3: note: to match this '('
/tmp/OCL1986T0.cl:151:7: note: expanded from macro 'X2'
__asm( "v_add_f64 %0, %1, -%2\n" : "=v" (b.y) : "v" (t.y), "v" (b.y));
^
/tmp/OCL1986T0.cl:184:3: error: expected ')'
X2_mul_t4(u[1], u[3]);
^
/tmp/OCL1986T0.cl:172:35: note: expanded from macro 'X2_mul_t4'
__asm( "v_add_f64 %0, %1, -%2\n" : "=v" (t.x) : "v" (b.x), "v" (t.x));
^
/tmp/OCL1986T0.cl:184:3: note: to match this '('
/tmp/OCL1986T0.cl:172:7: note: expanded from macro 'X2_mul_t4'
__asm( "v_add_f64 %0, %1, -%2\n" : "=v" (t.x) : "v" (b.x), "v" (t.x));
^
/tmp/OCL1986T0.cl:184:3: error: expected ')'
X2_mul_t4(u[1], u[3]);
^
/tmp/OCL1986T0.cl:173:35: note: expanded from macro 'X2_mul_t4'
__asm( "v_add_f64 %0, %1, -%2\n" : "=v" (b.x) : "v" (t.y), "v" (b.y));
^
/tmp/OCL1986T0.cl:184:3: note: to match this '('
/tmp/OCL1986T0.cl:173:7: note: expanded from macro 'X2_mul_t4'
__asm( "v_add_f64 %0, %1, -%2\n" : "=v" (b.x) : "v" (t.y), 2019-05-23 14:40:10 Exception 9gpu_error: BUILD_PROGRAM_FAILURE clBuildProgram at clwrap.cpp:215 build
2019-05-23 14:40:10 Bye

Some sanity checks fail

cppcheck --enable=all --quiet .

[GCD.cpp:38]: (warning) Assert statement calls a function which may have desired side effects: 'isOngoing'.
[GCD.h:11]: (style) The class 'GCD' does not have a constructor.
[kernel.h:31]: (warning) Member variable 'Kernel::timeSum' is not initialized in the constructor.
[kernel.h:31]: (warning) Member variable 'Kernel::nCalls' is not initialized in the constructor.
[clwrap.h:78]: (style) Class 'Queue' has a constructor with 1 argument that is not explicit.
[Primes.h:18]: (style) Class 'Primes' has a constructor with 1 argument that is not explicit.
[./Result.cpp:24]: (information) Skipping configuration 'REV' since the value of 'REV' is unknown. Use -D if you want to check it. You can use -U to skip it explicitly.
[Worktodo.cpp:26]: (warning) %d in format string (no. 2) requires 'int *' but the argument type is 'unsigned int *'.
[common.cpp:16]: (warning) Return value of function fopen() is not used.
[common.cpp:16]: (error) Return value of allocation function 'fopen' is not stored.
[./gpuowl.cpp:13]: (information) Skipping configuration 'REV' since the value of 'REV' is unknown. Use -D if you want to check it. You can use -U to skip it explicitly.
(information) Cppcheck cannot find all the include files (use --check-config for details)
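The Worktodo.cpp warning is fixed by making the conversion match the variable type: `%u` for `unsigned int *`. Below is a minimal sketch. The `"Test=%u"` pattern and the function name are illustrative; this is not the actual Worktodo.cpp line.

```cpp
#include <cassert>
#include <cstdio>

// %u matches an `unsigned int *` argument, which is what cppcheck expects;
// %d would require `int *`.
bool parseTestLine(const char* line, unsigned* exponent) {
  return std::sscanf(line, "Test=%u", exponent) == 1;
}
```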

README.md out of sync

It states that gpuowl uses only 8M and 16M FFT lengths, but this is no longer the case.

No version string

/gpuowl# make
echo "git describe --long --dirty" > version.inc
fatal: No names found, cannot describe anything.
echo Version: cat version.inc
Version: ""
g++ -Wall -O2 -std=c++17 -Wall Pm1Plan.cpp GmpUtil.cpp Worktodo.cpp common.cpp gpuowl.cpp Gpu.cpp clwrap.cpp Task.cpp checkpoint.cpp timeutil.cpp Args.cpp state.cpp Signal.cpp FFTConfig.cpp -o gpuowl -lOpenCL -lgmp -lstdc++fs -pthread -L/opt/rocm/opencl/lib/x86_64 -L/opt/amdgpu-pro/lib/x86_64-linux-gnu -L/c/Windows/System32 -L.

FFT selection

An exponent > 86.5M started with a 4M FFT produced errors at around 20%. The same exponent restarted with a 5M FFT now appears to be going OK.
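The bits/word figure in the gpuowl logs is the exponent divided by the FFT length in words, where K means 1024 words. For example, 98563771 / (5632 * 1024) ≈ 17.09 and 82589933 / (4608 * 1024) ≈ 17.50, matching the logs above. Below is a small helper (hypothetical name) that reproduces this arithmetic.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Bits per FFT word = exponent / (FFT size in K * 1024 words).
double bitsPerWord(std::uint64_t exponent, std::uint64_t fftSizeK) {
  return static_cast<double>(exponent) / static_cast<double>(fftSizeK * 1024);
}
```

For an exponent of about 86.5M, this gives roughly 20.6 bits/word with a 4M (4096K) FFT and roughly 16.5 with a 5M (5120K) FFT. Fewer bits per word leaves more floating-point headroom, which is consistent with the 5M run behaving better.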

warning: array index 1 is past the end of the array

gpuowl -prp 44497 -device 0
2019-06-28 09:21:57 gpuowl v6.5-82-g77b45a4
2019-06-28 09:21:57 Note: no config.txt file found
2019-06-28 09:21:57 config: -prp 44497 -device 0
2019-06-28 09:21:57 44497 FFT 8K: Width 8x8, Height 8x8; 5.43 bits/word
2019-06-28 09:21:57 using long carry kernels
2019-06-28 09:21:58 OpenCL args "-DEXP=44497u -DWIDTH=64u -DSMALL_HEIGHT=64u -DMIDDLE=1u -DWEIGHT_STEP=0x1.7b92f0a414e05p+0 -DIWEIGHT_STEP=0x1.59503de66e177p-1 -DWEIGHT_BIGSTEP=0x1.d5818dcfba487p+0 -DIWEIGHT_BIGSTEP=0x1.172b83c7d517bp-1 -I. -cl-fast-relaxed-math -cl-std=CL2.0"
/tmp/AMD_1217_35/t_1217_37.cl:1267:34: warning: array index 1 is past the end of the array (which contains 1 element) [-Warray-bounds]
steps[i] = mul(steps[i-1], steps[1]);
^ ~
/tmp/AMD_1217_35/t_1217_37.cl:1262:3: note: array 'steps' declared here
T2 steps[MIDDLE];
^
1 warning generated.
2019-06-28 09:21:59 OpenCL compilation in 1422 ms
2019-06-28 09:21:59 44497.owl loaded: k 2000, block 1000, res64 020904e660c53abb
2019-06-28 09:22:00 44497 OK 4000 8.89%; 121 us/sq; ETA 0d 00:00; 4d7b13d03f9c5720 (check 0.12s)
2019-06-28 09:22:02 44497 20000 44.44%; 121 us/sq; ETA 0d 00:00; e0fc41c8eadc4e96
2019-06-28 09:22:04 44497 40000 88.89%; 121 us/sq; ETA 0d 00:00; 9b4920985d079c24
2019-06-28 09:22:05 PP 44496 / 44497, fffffffffffffffc
2019-06-28 09:22:05 44497 OK 45000 100.00%; 121 us/sq; ETA 0d 00:00; 5ad3f1cd9c12bc86 (check 0.12s)
2019-06-28 09:22:05 {"exponent":"44497", "worktype":"PRP-3", "status":"P", "program":{"name":"gpuowl", "version":"v6.5-82-g77b45a4"}, "timestamp":"2019-06-28 07:22:05 UTC", "fft-length":8192, "res64":"fffffffffffffffc", "residue-type":4}
2019-06-28 09:22:05 Bye
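With -DMIDDLE=1u, `steps` becomes a one-element array, but the kernel still contains a `steps[1]` access. The log does not show the loop header. If that line sits in a loop that runs only for i >= 2, the read never executes and the warning is harmless. A compile-time guard still silences it: in the OpenCL source this would be an `#if MIDDLE > 1` block, because MIDDLE is a -D define.

The following C++ analogue uses `if constexpr`. It replaces `T2`/`mul` with `double` and plain multiplication, and the `steps[0] = 1` starting value is a hypothetical choice for illustration.

```cpp
#include <array>
#include <cstddef>

// Fill steps[i] = base^i for a compile-time size N (the analogue of MIDDLE).
template <std::size_t N>
std::array<double, N> powerSteps(double base) {
  std::array<double, N> steps{};
  steps[0] = 1.0;                          // base^0
  if constexpr (N > 1) {
    // Only compiled when the array really has a second element,
    // so N == 1 never contains an out-of-bounds steps[1].
    steps[1] = base;                       // base^1
    for (std::size_t i = 2; i < N; ++i) {
      steps[i] = steps[i - 1] * steps[1];  // same shape as the kernel line
    }
  }
  return steps;
}
```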

Please edit README.md for v1.9

I saw the application on the forum. Unfortunately, it is not clear which parameters are required to run the application.

The "selftest" parameter remained in the description, but it was removed from the program:

gpuowl/args.h

Lines 54 to 55 in cb09cb2

"-selftest : perform self tests from 'selftest.txt'\n"

" Self-test mode does not load/save checkpoints, worktodo.txt or results.txt.\n"

It would be great if the description could be updated.
Do the errors below affect the calculations, or can I ignore them? The "legacy" parameter produces the same error. (However, with it the run is estimated to finish in 6 days, which is 1 day less than without the parameter.)

gpuOwL v1.9- GPU Mersenne primality checker
AMD Radeon (TM) R7 370 Series 16 @1:0.0, Pitcairn 1015MHz [win7-x64]
OpenCL compilation error -11 (args -I. -cl-fast-relaxed-math -cl-std=CL2.0 -DEXP=77002949u -DWIDTH=1024u -DHEIGHT=2048u -DLOG_NWORDS=22u -DFP_DP=1 )
An invalid option was specified.

".\gpuowl.cl", line 67: warning: OpenCL extension is now part of core
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
OpenCL compilation in 1762 ms, with "-I. -cl-fast-relaxed-math -DEXP=77002949u -DWIDTH=1024u -DHEIGHT=2048u -DLOG_NWORDS=22u -DFP_DP=1 "
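Error -11 is CL_BUILD_PROGRAM_FAILURE, and the build log blames an invalid option. The second compilation omits `-cl-std=CL2.0` and succeeds, which is consistent with a driver that does not accept the OpenCL 2.0 option.

Below is a hedged sketch of building that fallback option string by removing the one flag. `withoutOption` is a hypothetical helper. A real retry would pass its result to clBuildProgram again.

```cpp
#include <sstream>
#include <string>

// Return `options` with every whitespace-separated token equal to `flag` removed.
std::string withoutOption(const std::string &options, const std::string &flag) {
  std::istringstream in(options);
  std::string token, out;
  while (in >> token) {
    if (token == flag) { continue; }  // drop the unsupported option
    if (!out.empty()) { out += ' '; }
    out += token;
  }
  return out;
}
```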

Version string

./gpuowl -h
2019-04-14 13:01:47 gpuowl 005297a
2019-04-14 13:01:47 config: -h

ROCm smi lib function to get gpu UUID

https://github.com/RadeonOpenCompute/rocm_smi_lib/blob/51a91da5820ce16ef8e6ce14e4086af80f33ecc0/src/rocm_smi.cc#L2368

That's it.

Wrong results for small primes

Tested 44497 and 86243, both exponents of known Mersenne primes.
The result is "C" (composite).
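When a known prime comes back as "C", a reference result for tiny exponents can help tell a logic error apart from a precision problem.

Below is a plain Lucas-Lehmer test. It is an independent check, not gpuowl's PRP-3 code path. It is limited to odd prime p <= 61 so that squaring fits in `unsigned __int128`, a GCC/Clang extension. Exponents as large as 44497 and 86243 need multi-precision arithmetic such as GMP.

```cpp
#include <cstdint>

// Lucas-Lehmer test for M_p = 2^p - 1, valid for odd prime p with 3 <= p <= 61.
// M_p is prime iff s_{p-2} == 0, where s_0 = 4 and s_{k+1} = s_k^2 - 2 (mod M_p).
bool lucasLehmer(unsigned p) {
  const std::uint64_t m = (std::uint64_t{1} << p) - 1;
  std::uint64_t s = 4;
  for (unsigned k = 0; k < p - 2; ++k) {
    unsigned __int128 sq = static_cast<unsigned __int128>(s) * s;  // < 2^122
    s = static_cast<std::uint64_t>(sq % m);
    s = (s + m - 2) % m;  // subtract 2 modulo m without unsigned underflow
  }
  return s == 0;
}
```

For example, p = 13, 31 and 61 give Mersenne primes, while p = 11, 23 and 29 do not.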

Status of RX5700XT tests

Now that the firmware files are in place, the GPU is finally working. However, the Radeon RX5700XT does not work with gpuOwl for PRP, although it works with Mfakto for trial factoring. A gpuOwl PRP run always reports errors (EE lines) and quits after 3 consecutive errors.

automating work fetch

Hello, I would like to know how I can automate fetching work in gpuowl.

primenet.py error

gpuowl/primenet.py

Line 85 in 5e5b30d

 worktype = workTypes[options.work] if options.work in workTypes else int(options.work) 

TypeError: int() argument must be a string

High CPU usage with cudaowl

Hi,

I'm running cudaowl with Arch Linux and Cuda 9.20, on a GTX 960. The CPU usage of cudaowl stays close to 100% constantly.

I found that the CPU usage can be reduced significantly, to 2.6%, by adding a cudaDeviceSynchronize(); call in CudaGpu.h, line 225. This is at the end of the for loop in modSqLoop(). I guess this has something to do with cuda busy-waiting for the kernels to finish. With an explicit synchronization call, the CPU code goes to sleep instead (you set the cudaDeviceScheduleBlockingSync flag).

The synchronization has a small performance cost, however: time per iteration increases from 18.85 ms to 19.10 ms. On the other hand, the power consumption of the whole computer drops by about 30 W (with no other significant CPU load), so for me the trade-off seems worth it. I'm testing M(90000881), FFT 4860K, 18.08 bits/word.

Greetings,
Fredrik
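The trade-off above is between spinning on the CPU while the GPU works and sleeping until the GPU signals completion. It can be sketched in plain C++ without CUDA. In this illustration a worker thread stands in for a kernel launch, which is an assumption.

- `waitSpinning` polls a flag and keeps a core busy.
- `waitBlocking` sleeps on a condition variable. This is roughly the behaviour that cudaDeviceScheduleBlockingSync together with cudaDeviceSynchronize() asks the CUDA runtime for.

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Completion flag plus condition variable, signalled when the fake "kernel" ends.
struct Completion {
  std::mutex mtx;
  std::condition_variable cv;
  std::atomic<bool> done{false};

  void signal() {
    { std::lock_guard<std::mutex> lock(mtx); done = true; }
    cv.notify_all();
  }
};

// Busy-wait: reacts immediately but burns a CPU core while waiting.
void waitSpinning(Completion &c) {
  while (!c.done.load()) { /* spin */ }
}

// Blocking wait: the thread sleeps until signal() wakes it.
void waitBlocking(Completion &c) {
  std::unique_lock<std::mutex> lock(c.mtx);
  c.cv.wait(lock, [&c] { return c.done.load(); });
}

// Run a simulated 20 ms "kernel" on another thread and wait with the chosen strategy.
bool runOnce(bool blocking) {
  Completion c;
  std::thread worker([&c] {
    std::this_thread::sleep_for(std::chrono::milliseconds(20));
    c.signal();
  });
  if (blocking) { waitBlocking(c); } else { waitSpinning(c); }
  worker.join();
  return c.done.load();
}
```

With the spinning version, the waiting thread shows near 100% CPU for the whole wait, which matches the cudaowl behaviour reported above. The blocking version trades a small wake-up latency for an idle core.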
