preda / gpuowl
GPU Mersenne primality test.
License: GNU General Public License v3.0
Line 193 in ea4953e
When running multiple instances of gpuowl, one instance may get stuck, and the only way to stop it is to reboot the machine.
Probably the first running instance gets stuck when the second instance is launched.
To be precise, the two instances are launched with different -device GPU numbers.
Line 29 in b1aa242
Shouldn't it be 'int' ?
Running make, I get these errors.
g++ -std=c++17 -O2 -DREV=\"`git rev-parse --short HEAD``git diff-files --quiet || echo -mod`\" -Wall Pm1Plan.cpp GmpUtil.cpp Worktodo.cpp common.cpp gpuowl.cpp Gpu.cpp clwrap.cpp Task.cpp checkpoint.cpp timeutil.cpp Args.cpp Primes.cpp state.cpp Signal.cpp FFTConfig.cpp -o openowl -lOpenCL -lgmp -pthread -L/opt/rocm/opencl/lib/x86_64 -L/opt/amdgpu-pro/lib/x86_64-linux-gnu -L/c/Windows/System32 -L.
Gpu.cpp: In function ‘void logTimeKernels(std::initializer_list<Kernel*>)’:
Gpu.cpp:153:14: error: expected unqualified-id before ‘[’ token
for (auto& [stats, name]: infos) {
^
Gpu.cpp:153:14: error: expected ‘;’ before ‘[’ token
Gpu.cpp:153:15: error: ‘stats’ was not declared in this scope
for (auto& [stats, name]: infos) {
^~~~~
Gpu.cpp:153:22: error: ‘name’ was not declared in this scope
for (auto& [stats, name]: infos) {
^~~~
Gpu.cpp: In lambda function:
Gpu.cpp:153:27: error: expected ‘{’ before ‘:’ token
for (auto& [stats, name]: infos) {
^
Gpu.cpp: In function ‘void logTimeKernels(std::initializer_list<Kernel*>)’:
Gpu.cpp:153:27: error: expected ‘;’ before ‘:’ token
Gpu.cpp:153:27: error: expected primary-expression before ‘:’ token
Gpu.cpp:153:27: error: expected ‘)’ before ‘:’ token
Gpu.cpp:153:27: error: expected primary-expression before ‘:’ token
Gpu.cpp: In member function ‘std::__cxx11::string Gpu::factorPM1(u32, const Args&, u32, u32)’:
Gpu.cpp:610:8: error: expected unqualified-id before ‘[’ token
auto [block, nPrimes, allSelected] = makePm1Plan(D, B1, B2);
^
Gpu.cpp:611:17: error: ‘allSelected’ was not declared in this scope
u32 nBlocks = allSelected.size();
^~~~~~~~~~~
Gpu.cpp:612:70: error: ‘block’ was not declared in this scope
log("%u P-1 stage2: %u blocks starting at block %u\n", E, nBlocks, block);
^~~~~
Gpu.cpp:681:39: error: unable to deduce ‘auto&&’ from ‘allSelected’
for (const vector<bool>& selected : allSelected) {
^~~~~~~~~~~
Gpu.cpp:708:61: error: ‘nPrimes’ was not declared in this scope
float percent = (nPrimesDone + nBlocksDone) / float(nPrimes + nBlocks) * 100;
^~~~~~~
Task.cpp: In member function ‘bool Task::execute(const Args&)’:
Task.cpp:62:10: error: expected unqualified-id before ‘[’ token
auto [isPrime, res64] = gpu->isPrimePRP(exponent, args);
^
Task.cpp:63:33: error: ‘isPrime’ was not declared in this scope
return writeResultPRP(args, isPrime, res64, fftSize);
^~~~~~~
Task.cpp:63:42: error: ‘res64’ was not declared in this scope
return writeResultPRP(args, isPrime, res64, fftSize);
^~~~~
Makefile:9: recipe for target 'openowl' failed
make: *** [openowl] Error 1
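Every error in the build log above points at a structured binding (`auto [a, b] = ...`). That is a C++17 feature which GCC supports from version 7 onwards, so the log is consistent with an older g++ being used. A minimal sketch of the same logic written both ways follows; `fakePrpResult` is a hypothetical stand-in that returns a fixed pair, not the gpuowl function.

```cpp
#include <cstdint>
#include <tuple>
#include <utility>

// Hypothetical stand-in for a function that returns two results at once.
std::pair<bool, std::uint64_t> fakePrpResult() {
  return {true, 0xfffffffffffffffcULL};
}

// C++17 structured bindings: needs GCC 7 or newer.
std::uint64_t res64WithBindings() {
  auto [isPrime, res64] = fakePrpResult();
  return isPrime ? res64 : 0;
}

// Same logic with std::tie, which pre-C++17 compilers also accept.
std::uint64_t res64WithTie() {
  bool isPrime = false;
  std::uint64_t res64 = 0;
  std::tie(isPrime, res64) = fakePrpResult();
  return isPrime ? res64 : 0;
}
```

Upgrading the compiler is likely the simpler route; the `std::tie` form is shown only to illustrate what the binding expands to.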
gpuowl: kernel.h:57: Kernel::Kernel(cl_program ... Assertion ... workSize % groupSize == 0 ... failed
I have compiled the master branch and got this error at runtime. The version displayed by gpuowl is 2.1
System Information:
Usage: ./tf []
OpenCL compilation error -11 (args -DNCLASS=60060u -DSPECIAL_PRIMES=32u -DNPRIMES=262176u -DLDS_WORDS=8192u -cl-std=CL2.0 -save-temps=t0/tf -I. -cl-fast-relaxed-math -cl-std=CL2.0 )
File for dumping source cl isn't opened
error: unable to open output file 't0/tf_0_Ellesmere.i': 'No such file or directory'
1 error generated.
error: Clang front-end compilation failed!
Frontend phase failed compilation.
Error: Compiling CL to IR
error -44 (sieve)
If there is a problem with the gpu, then gpuowl will hang and go into a "D" state (uninterruptible sleep). In this situation the only way to stop the program is to reset the machine.
We need a way to manage checkpoint files. They accumulate in the gpuowl directory, which becomes huge after some time of continuous work. In some scenarios the directory must be copied back and forth over the network, and the growing directory size has an increasing impact.
A script that removes old checkpoints based on the date could be very useful.
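A rough sketch of such a date-based cleanup, using C++17 `std::filesystem`. The `.owl` extension is taken from the logs in other reports; the function names, the age threshold and the decision about which files are safe to delete are assumptions for illustration. The sketch only lists candidates and leaves the deletion to the caller.

```cpp
#include <chrono>
#include <filesystem>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// True when a file last written at `mtime` is older than `maxAge` relative to `now`.
bool isStale(fs::file_time_type mtime, fs::file_time_type now, std::chrono::hours maxAge) {
  return now - mtime > maxAge;
}

// Collects regular files in `dir` with extension `ext` that are older than `maxAge`.
// Nothing is deleted here; the caller decides what to do with the list.
std::vector<fs::path> staleCheckpoints(const fs::path& dir, std::chrono::hours maxAge,
                                       const std::string& ext = ".owl") {
  std::vector<fs::path> stale;
  const auto now = fs::file_time_type::clock::now();
  for (const auto& entry : fs::directory_iterator(dir)) {
    if (entry.is_regular_file() && entry.path().extension().string() == ext &&
        isStale(entry.last_write_time(), now, maxAge)) {
      stale.push_back(entry.path());
    }
  }
  return stale;
}
```

A caller could then loop over `staleCheckpoints(dir, std::chrono::hours(24 * 7))` and call `fs::remove` on each path, after excluding the checkpoint that is still needed to resume.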
warning: argument unused during compilation: '-I .'
and ROCm is slower than previous version.
https://wiki.debian.org/UpstreamGuide#SCons
"Please don't use SCons. It is hard to use it correctly. For instance SCons is designed to ignore environment variables such as CFLAGS (unless your add code for this). It also does not support DESTDIR out of the box. As an upstream you have to explicitly add code for that (or Debian has to patch). Support for SONAMEs (library versioning) is also absent. The general observation is that many projects, that use SCons, do not have a working install target. Since projects work around these limitations individually there is no way to just use a SCons project in Debian, but more work is required to invoke it correctly. "
Hi,
I compiled the code in the master branch today, with gcc 8.1.0 on Arch Linux, getting the following error, and several like it.
in file included from args.h:4,
from gpuowl.cpp:5:
clwrap.h:247:41: error: use of ‘auto’ in parameter declaration only available with -fconcepts [-Werror]
void setArg(cl_kernel k, int pos, const auto &value) { CHECK(clSetKernelArg(k, pos, sizeof(value), &value)); }
^~~~
Adding -fconcepts to the g++ line solved the problem, and gpuowl appears to work, i.e. runs without errors. I don't know how other gcc versions handle this.
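For reference, `auto` in a function parameter was a GCC extension at the time and only became standard in C++20. An explicit template parameter expresses the same thing without `-fconcepts`. A minimal sketch, where `RecordedArg` and `recordArg` are hypothetical stand-ins for an API that, like clSetKernelArg, takes a size and a pointer:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for an API that takes a size and a pointer to the value.
struct RecordedArg {
  std::size_t size;
  const void* data;
};

// Explicit template parameter: valid in C++11 and later, no -fconcepts required.
// The extension form would be: RecordedArg recordArg(const auto& value)
template <typename T>
RecordedArg recordArg(const T& value) {
  return RecordedArg{sizeof(value), &value};
}
```

Whether the project prefers this spelling over requiring a newer compiler flag is a choice for the maintainer.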
I have a GeForce GTX 960, and found that gpuowl works with the flag -tail split but not without, for the exponent 75000001.
Cheers,
Fredrik
Warning: do not update to ROCm 2.7. Performance is poor (again, I would say): Radeon VII timing went from 908 to 990 us/sq.
The next assignment is appended to the previous line, like this:
PFactor=N/A,1,2,9255193,-1,77,2PFactor=N/A,1,2,9751933,-1,77,2
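The fused line above is what one would expect if a new assignment is appended to a file whose last line has no trailing newline. A small sketch of a guard against that, operating on the file contents as a string; the function name is hypothetical and this is not the code that writes worktodo.txt.

```cpp
#include <string>

// Returns `contents` with `line` appended as its own line.
// A newline is inserted first when the existing text does not already end with one.
std::string appendAssignment(std::string contents, const std::string& line) {
  if (!contents.empty() && contents.back() != '\n') {
    contents += '\n';
  }
  contents += line;
  contents += '\n';
  return contents;
}
```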
There are a number of issues for debianization to be successful:
excerpt of chat from debian-welcome:
selroc> ok for debian/install, the program requires some .cl files to be in the same directory
[10:12:30] yeah, usually upstream build systems will also have an install system
[10:12:40] hmm, ok
[10:13:00] /usr/bin isn't the best location for .cl files, can it load those from a different directory?
[10:13:06] no
[10:13:12] not now
[10:13:45] I can work with the original programmer to make it so
[10:14:10] hmm, ok. it should be fine for a local package but for the package to get into Debian, you would need to be able to load them from say /usr/share/gpuowl/*.cl instead
[10:14:10] I try to get in touch with him
[10:16:04] btw, since this is C++ code, it would be a good idea for him to run cppcheck on it to find any accidental errors
The <filesystem> lib not found. Fix:
CXX=g++-8
2018-11-03 09:09:37 gpuowl 5.0--mod
2018-11-03 09:10:28 gpuowl 5.0--mod
2018-11-03 09:10:28 0 -user selroc -cpu 0 -device 0
2018-11-03 09:10:28 0 756839 FFT 512K: Width 64x8, Height 64x8; 1.44 bits/word
2018-11-03 09:10:28 0 using long carry kernels
2018-11-03 09:10:29 0 gfx803-36x1360-@4a:0.0 Ellesmere [Radeon RX 470/480]
2018-11-03 09:10:30 0 OpenCL compilation in 1085 ms, with "-DEXP=756839u -DWIDTH=512u -DSMALL_HEIGHT=512u -DMIDDLE=1u -I. -cl-fast-relaxed-math -cl-std=CL2.0 "
2018-11-03 09:10:30 0 756839.owl not found, starting from the beginning.
2018-11-03 09:10:31 0 756839 OK 800 0.11%; 0.53 ms/sq, 0 MULs; ETA 0d 00:07; 24ac239d8eb8ffa2 (check 0.24s)
2018-11-03 09:10:36 0 756839 10000 1.32%; 0.53 ms/sq, 0 MULs; ETA 0d 00:07; e0f756a0e6b027cf
2018-11-03 09:10:41 0 756839 20000 2.64%; 0.53 ms/sq, 0 MULs; ETA 0d 00:07; c24d9712d700c29e
2018-11-03 09:10:46 0 756839 30000 3.96%; 0.53 ms/sq, 0 MULs; ETA 0d 00:06; ef92f116fa7b7853
2018-11-03 09:10:52 0 756839 40000 5.28%; 0.53 ms/sq, 0 MULs; ETA 0d 00:06; 56bee347346be732
2018-11-03 09:10:57 0 756839 50000 6.60%; 0.53 ms/sq, 0 MULs; ETA 0d 00:06; 88a1922073d97c57
2018-11-03 09:11:02 0 756839 60000 7.92%; 0.53 ms/sq, 0 MULs; ETA 0d 00:06; 2dd5ee5cdfe0c62a
2018-11-03 09:11:08 0 756839 70000 9.24%; 0.53 ms/sq, 0 MULs; ETA 0d 00:06; 70439075d84ca857
2018-11-03 09:11:13 0 756839 80000 10.57%; 0.53 ms/sq, 0 MULs; ETA 0d 00:06; 953b2f1c170a9def
2018-11-03 09:11:18 0 756839 90000 11.89%; 0.53 ms/sq, 0 MULs; ETA 0d 00:06; 0e275a89b9c39b27
2018-11-03 09:11:24 0 756839 100000 13.21%; 0.53 ms/sq, 0 MULs; ETA 0d 00:06; 046a3e1ad36681e9
2018-11-03 09:11:29 0 756839 110000 14.53%; 0.53 ms/sq, 0 MULs; ETA 0d 00:06; fad6fff7757f9a66
2018-11-03 09:11:34 0 756839 120000 15.85%; 0.53 ms/sq, 0 MULs; ETA 0d 00:06; b9e7f5cc6fc13dc0
2018-11-03 09:11:39 0 756839 130000 17.17%; 0.53 ms/sq, 0 MULs; ETA 0d 00:06; 79b53436131c503b
2018-11-03 09:11:45 0 756839 140000 18.49%; 0.53 ms/sq, 0 MULs; ETA 0d 00:05; 04d1642ce8add525
2018-11-03 09:11:50 0 756839 150000 19.81%; 0.53 ms/sq, 0 MULs; ETA 0d 00:05; dde1480d8d123ee9
2018-11-03 09:11:56 0 756839 EE 160000 21.13%; 0.53 ms/sq, 0 MULs; ETA 0d 00:05; 76f44754c8e05f8c (check 0.23s)
2018-11-03 09:11:56 0 756839.owl loaded: k 800, B1 0, block 400, res64 24ac239d8eb8ffa2, stage 1, baseBits 0
2018-11-03 09:12:01 0 756839 10000 1.32%; 0.56 ms/sq, 0 MULs; ETA 0d 00:07; e0f756a0e6b027cf
2018-11-03 09:12:06 0 756839 20000 2.64%; 0.53 ms/sq, 0 MULs; ETA 0d 00:07; c24d9712d700c29e
2018-11-03 09:12:11 0 Stopping, please wait..
2018-11-03 09:12:11 0 756839 OK 28800 3.80%; 0.53 ms/sq, 0 MULs; ETA 0d 00:06; 7f586f2ac3569dbe (check 0.23s)
2018-11-03 09:12:11 0 Exiting because "stop requested"
2018-11-03 09:12:11 0 Bye
2018-11-03 09:12:23 gpuowl 5.0--mod
2018-11-03 09:12:23 0 -user selroc -fft +1 -cpu 0 -device 0
2018-11-03 09:12:23 0 756839 FFT 1024K: Width 256x4, Height 64x8; 0.72 bits/word
2018-11-03 09:12:23 0 using long carry kernels
2018-11-03 09:12:23 0 gfx803-36x1360-@4a:0.0 Ellesmere [Radeon RX 470/480]
2018-11-03 09:12:24 0 OpenCL compilation in 1031 ms, with "-DEXP=756839u -DWIDTH=1024u -DSMALL_HEIGHT=512u -DMIDDLE=1u -I. -cl-fast-relaxed-math -cl-std=CL2.0 "
2018-11-03 09:12:24 0 756839.owl loaded: k 28800, B1 0, block 400, res64 7f586f2ac3569dbe, stage 1, baseBits 0
*** Assertion raised ***
Line 124 in d00285c
Line 99 in b1aa242
PRP_WORLD_RECORD = 152
Installed new kernel 5.3.0-rc5, gpuowl gives segmentation fault, even after recompiling.
In case one wants to run multiple instances of gpuowl, this makes it possible to distinguish the output of each instance.
2019-09-07 12:06:44 90348611 33410000 36.98%; 886 us/sq; ETA 0d 14:01; bac38bb8e27196e5
2019-09-07 12:06:53 90348611 33420000 36.99%; 886 us/sq; ETA 0d 14:01; 5dc04e6cd38ab191
2019-09-07 12:07:02 90348611 33430000 37.00%; 887 us/sq; ETA 0d 14:01; b91d6d315cae4932
Queue at 0x7f23e803a000 inactivated due to async error:
HSA_STATUS_ERROR_ILLEGAL_INSTRUCTION: The agent attempted to execute an illegal shader instruction.
This needs reboot.
Only affects exponents that are in no way practical to test so it's low priority, it only practically limits benchmarking.
amdcube@amdcube:~/gpuowl$ ~/prime/bin/gpuowl/gpuowl -prp 2147483647
2019-05-19 12:24:39 gpuowl v6.5-25-gc48d46f
2019-05-19 12:24:39 Note: no config.txt file found
2019-05-19 12:24:39 config: -prp 2147483647
2019-05-19 12:24:39 2147483647 FFT 147456K: Width 512x8, Height 256x8, Middle 9; 14.22 bits/word
2019-05-19 12:24:39 using long carry kernels
2019-05-19 12:24:42 OpenCL compilation in 2643 ms, with "-DEXP=2147483647u -DWIDTH=4096u -DSMALL_HEIGHT=2048u -DMIDDLE=9u -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-05-19 12:24:53 2147483647.owl not found, starting from the beginning.
2019-05-19 12:27:47 2147483647 OK 2000 0.00%; 40.61 ms/sq; ETA 1009d 09:59; fb12c8169932aa03 (check 43.48s)
^C2019-05-19 12:28:28 Stopping, please wait..
2019-05-19 12:29:11 2147483647 OK 3000 0.00%; 40.85 ms/sq; ETA 1015d 04:58; 81a7712dcf35f074 (check 43.54s)
2019-05-19 12:29:11 Exiting because "stop requested"
2019-05-19 12:29:11 Bye
2^31-1 works fine
amdcube@amdcube:~/gpuowl$ ~/prime/bin/gpuowl/gpuowl -prp 2147483648
2019-05-19 12:29:34 gpuowl v6.5-25-gc48d46f
2019-05-19 12:29:34 Note: no config.txt file found
2019-05-19 12:29:34 config: -prp 2147483648
2019-05-19 12:29:34 2147483648 FFT 147456K: Width 512x8, Height 256x8, Middle 9; 14.22 bits/word
2019-05-19 12:29:34 using long carry kernels
2019-05-19 12:29:37 OpenCL compilation in 2623 ms, with "-DEXP=2147483648u -DWIDTH=4096u -DSMALL_HEIGHT=2048u -DMIDDLE=9u -I. -cl-fast-relaxed-math -cl-std=CL2.0"
gpuowl: state.cpp:146: std::pair<std::vector<double>, std::vector<double> > genWeights(int, int, int): Assertion `bits == baseBits || bits == baseBits + 1' failed.
Aborted (core dumped)
amdcube@amdcube:~/gpuowl$ ~/prime/bin/gpuowl/gpuowl -prp 2147483649
2019-05-19 12:29:48 gpuowl v6.5-25-gc48d46f
2019-05-19 12:29:48 Note: no config.txt file found
2019-05-19 12:29:48 config: -prp 2147483649
2019-05-19 12:29:48 2147483649 FFT 147456K: Width 512x8, Height 256x8, Middle 9; 14.22 bits/word
2019-05-19 12:29:48 using long carry kernels
2019-05-19 12:29:51 OpenCL compilation in 2663 ms, with "-DEXP=2147483649u -DWIDTH=4096u -DSMALL_HEIGHT=2048u -DMIDDLE=9u -I. -cl-fast-relaxed-math -cl-std=CL2.0"
gpuowl: state.cpp:146: std::pair<std::vector<double>, std::vector<double> > genWeights(int, int, int): Assertion `bits == baseBits || bits == baseBits + 1' failed.
Aborted (core dumped)
2^31 and 2^31+1 fail.
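The boundary sits exactly at the largest signed 32-bit value, 2^31 - 1 = 2147483647, and the failing assertion is in `genWeights(int, int, int)`. One possible explanation, which is an assumption and not confirmed in this report, is that the exponent passes through a signed `int` somewhere. A sketch of a range check that would reject such exponents up front (the function name is hypothetical):

```cpp
#include <cstdint>
#include <limits>

// True when the exponent can be stored in a signed 32-bit int without wrapping.
bool fitsInInt32(std::uint64_t exponent) {
  return exponent <= static_cast<std::uint64_t>(std::numeric_limits<std::int32_t>::max());
}
```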
When trying gpuowl on a Nvidia GTX 960, I get the following error:
gpuOwL v1.10-41616da GPU Mersenne primality checker
GeForce GTX 960-8x1278-
OpenCL compilation in 591 ms, with " -DEXP=51001001u -DWIDTH=1024u -DHEIGHT=2048u -DLOG_NWORDS=22u -DFP_DP=1 -I. -cl-fast-relaxed-math "
error -19
gpuowl: clwrap.h:294: std::__cxx11::string getKernelArgName(cl_kernel, int): Assertion `check(clGetKernelArgInfo(k, pos, 0x119A, sizeof(buf), buf, &size))' failed.
Aborted (core dumped)
This can be solved by adding -cl-kernel-arg-info to the compiler argument string in clwrap.h, line 213.
From here
Kernel argument information is only available if the program object associated with kernel is created with clCreateProgramWithSource and the program executable is built with the -cl-kernel-arg-info option specified in options argument to clBuildProgram or clCompileProgram.
After this change, ./gpuowl -longTail works for me (on Arch Linux). Without -longTail it fails with error -9999 (fftW), which is an nvidia code for "Illegal read or write to a buffer" - maybe the program runs out of resources.
Some output for benchmarking:
OpenCL compilation in 1 ms, with " -DEXP=51001001u -DWIDTH=1024u -DHEIGHT=2048u -DLOG_NWORDS=22u -DFP_DP=1 -I. -cl-fast-relaxed-math -cl-kernel-arg-info"
Note: using long (not-fused) carry kernels
Note: using long (not-fused) tail kernels
PRP-3: FFT 4M (1024 * 2048 * 2) of 51001001 (12.16 bits/word) [2018-02-01 15:43:27 CET]
Starting at iteration 4500
OK 4500 / 51001001 [ 0.01%], 0.00 ms/it; ETA 0d 00:00; 4f675a45d12e6787 [15:43:34]
OK 5000 / 51001001 [ 0.01%], 11.64 ms/it; ETA 6d 20:50; 9a26470bab3f13b4 [15:43:47]
OK 6000 / 51001001 [ 0.01%], 11.68 ms/it; ETA 6d 21:28; 52c02feb24eecd14 [15:44:05]
OK 10000 / 51001001 [ 0.02%], 11.66 ms/it; ETA 6d 21:12; a32d9a0f25ae04bb [15:44:58]
After hassling around with g++ I finally got the executable built but launching throws an error.
.\gpuowl.exe -device 0
gpuOwL v2.0-dbc5a01-mod GPU Mersenne primality checker
Pitcairn-16x 860-@1:0.0 AMD Radeon HD 7800 Series
Note: using long carry and fused tail kernels
OpenCL compilation error -11 (args -DEXP=2976221u -I. -cl-fast-relaxed-math -cl-kernel-arg-info )
".\gpuowl.cl", line 34: warning: OpenCL extension is now part of core
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
^
".\gpuowl.cl", line 454: error: work group size exceeds the maximum default
value for the selected device
KERNEL(512) fft4K(P(T2) io, Trig smallTrig) {
^
".\gpuowl.cl", line 619: error: work group size exceeds the maximum default
value for the selected device
KERNEL(512) square(P(T2) io, Trig bigTrig) { csquare(512, 4096, 625, io, bigTrig); }
^
".\gpuowl.cl", line 621: error: work group size exceeds the maximum default
value for the selected device
KERNEL(512) multiply(P(T2) io, CP(T2) in, Trig bigTrig) { cmul(512, 4096, 625, io, in, bigTrig); }
^
".\gpuowl.cl", line 663: error: work group size exceeds the maximum default
value for the selected device
KERNEL(512) autoConv(P(T2) io, Trig smallTrig, P(T2) bigTrig) {
^
4 errors detected in the compilation of "C:\Users\\AppData\Local\Temp\OCL2284T1.cl".
Frontend phase failed compilation.
Bye
It does seem to work on my Intel HD Graphics though (I let it run for 1 minute while experimenting with -device), but obviously I want to run it on a proper graphics card.
Could you help me to get your program to hunt for a prime?
`GmpUtil.cpp: In function 'mpz_class {anonymous}::powerSmooth(u32, u32)':
GmpUtil.cpp:26:28: error: call of overloaded '__gmp_expr()' is ambiguous
26 | mpz_class a{u64(exp) << 8}; // boost 2s.
| ^
In file included from GmpUtil.h:6,
from GmpUtil.cpp:3:
C:/msys64/mingw64/include/gmpxx.h:1502:3: note: candidate: '__gmp_expr<__mpz_struct [1], __mpz_struct [1]>::__gmp_expr(double)'
1502 | __GMPXX_DEFINE_ARITHMETIC_CONSTRUCTORS
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
C:/msys64/mingw64/include/gmpxx.h:1502:3: note: candidate: '__gmp_expr<__mpz_struct [1], __mpz_struct [1]>::__gmp_expr(float)'
1502 | __GMPXX_DEFINE_ARITHMETIC_CONSTRUCTORS
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
C:/msys64/mingw64/include/gmpxx.h:1502:3: note: candidate: '__gmp_expr<__mpz_struct [1], __mpz_struct [1]>::__gmp_expr(long unsigned int)'
1502 | __GMPXX_DEFINE_ARITHMETIC_CONSTRUCTORS
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
C:/msys64/mingw64/include/gmpxx.h:1502:3: note: candidate: '__gmp_expr<__mpz_struct [1], __mpz_struct [1]>::__gmp_expr(long int)'
1502 | __GMPXX_DEFINE_ARITHMETIC_CONSTRUCTORS
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
C:/msys64/mingw64/include/gmpxx.h:1502:3: note: candidate: '__gmp_expr<__mpz_struct [1], __mpz_struct [1]>::__gmp_expr(short unsigned int)'
1502 | __GMPXX_DEFINE_ARITHMETIC_CONSTRUCTORS
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
C:/msys64/mingw64/include/gmpxx.h:1502:3: note: candidate: '__gmp_expr<__mpz_struct [1], __mpz_struct [1]>::__gmp_expr(short int)'
1502 | __GMPXX_DEFINE_ARITHMETIC_CONSTRUCTORS
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
C:/msys64/mingw64/include/gmpxx.h:1502:3: note: candidate: '__gmp_expr<__mpz_struct [1], __mpz_struct [1]>::__gmp_expr(unsigned int)'
1502 | __GMPXX_DEFINE_ARITHMETIC_CONSTRUCTORS
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
C:/msys64/mingw64/include/gmpxx.h:1502:3: note: candidate: '__gmp_expr<__mpz_struct [1], __mpz_struct [1]>::__gmp_expr(int)'
1502 | __GMPXX_DEFINE_ARITHMETIC_CONSTRUCTORS
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
C:/msys64/mingw64/include/gmpxx.h:1502:3: note: candidate: '__gmp_expr<__mpz_struct [1], __mpz_struct [1]>::__gmp_expr(unsigned char)'
1502 | __GMPXX_DEFINE_ARITHMETIC_CONSTRUCTORS
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
C:/msys64/mingw64/include/gmpxx.h:1502:3: note: candidate: '__gmp_expr<__mpz_struct [1], __mpz_struct [1]>::__gmp_expr(signed char)'
1502 | __GMPXX_DEFINE_ARITHMETIC_CONSTRUCTORS
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
C:/msys64/mingw64/include/gmpxx.h:1492:3: note: candidate: '__gmp_expr<__mpz_struct [1], __mpz_struct [1]>::__gmp_expr(__gmp_expr<__mpz_struct [1], __mpz_struct [1]>&&)'
1492 | __gmp_expr(__gmp_expr &&z)
| ^~~~~~~~~~
C:/msys64/mingw64/include/gmpxx.h:1490:3: note: candidate: '__gmp_expr<__mpz_struct [1], __mpz_struct [1]>::__gmp_expr(const __gmp_expr<__mpz_struct [1], __mpz_struct [1]>&)'
1490 | __gmp_expr(const __gmp_expr &z) { mpz_init_set(mp, z.mp); }
| ^~~~~~~~~~
make: *** [Makefile:30: GmpUtil.o] Error 1
`
Hi @preda
I have gone through the readme file and would like to know more about the different validations I can do.
Please update the readme file with clearer instructions.
gpuowl-OpenCL 3.4--mod
FFT 512K: Width 512 (64x8), Height 512 (64x8); 0.00 bits/word
Note: using long carry kernels
Ellesmere-36x1360-@A:0.0 Radeon RX 580 Series
OpenCL compilation in 952 ms, with " -DEXP=521u -DWIDTH=512u -DSMALL_HEIGHT=512u -DMIDDLE=1u -I. -cl-fast-relaxed-math "
[2018-07-21 17:23:47 CEST] PRP M(521), FFT 512K, 0.00 bits/word
openowl: LowGpu.h:67: ....... failed.
Aborted.
Not really an issue, more a possible addition: I have adapted this chunk of code from another library. It returns various GPU properties.
https://github.com/valeriob01/gpuinfo
The sense of this addition is towards making gpuowl a trusted client, by returning exact GPU names and other information, so that they can appear inside the JSON result sent back to the server.
Please see attached log files.
Both run on Vega 56 (two different cards) and i5-3550 (two different CPUs) with amdgpu-pro 19.20 on Ubuntu 18.04 with recent or latest GpuOwl.
PM1-143791129.log
PM1-143792009.log
P.S. -- running the same software and Vega 56 instead on Xeon X5675, the first P-1 run always completes but the second hangs at the end of Stage 1. My workaround in this case is to put only one entry at a time in worktodo.txt and restart GpuOwl from scratch for each exponent.
It is now necessary to run "make gpuowl-wrap.cl" before "make".
Each time a new test starts it uses a little system memory and doesn't seem to free it afterwards. Encountered when doing many small P-1 tests, it took ~150 tests to fill 16GB of memory so it's unlikely to be encountered under normal use.
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Aborted (core dumped)
gpuowl -dir /home/sel/gpuowl -use MERGED_MIDDLE -user selroc -block 1000 -log 10000 -cpu R7c -device 0
2019-12-10 14:46:11 gpuowl v6.11-82-gdb9ce44
2019-12-10 14:46:11 Note: no config.txt file found
2019-12-10 14:46:11 config: -dir /home/sel/gpuowl -use MERGED_MIDDLE -user selroc -block 1000 -log 10000 -cpu R7c -device 0
2019-12-10 14:46:11 98563771 FFT 5632K: Width 256x4, Height 64x4, Middle 11; 17.09 bits/word
2019-12-10 14:46:11 OpenCL args "-DEXP=98563771u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=11u -DWEIGHT_STEP=0x1.e0dea836fdc34p+0 -DIWEIGHT_STEP=0x1.1092a0edb09cep-1 -DWEIGHT_BIGSTEP=0x1.306fe0a31b715p+0 -DIWEIGHT_BIGSTEP=0x1.ae89f995ad3adp-1 -DAMDGPU=1 -DMERGED_MIDDLE=1 -I. -cl-fast-relaxed-math -cl-std=CL2.0"
/tmp/AMD_2175_18/t_2175_20.cl:13:9: warning: GpuOwl requires OpenCL 200, found 120 [-W#pragma-messages]
#pragma message "GpuOwl requires OpenCL 200, found " STR(__OPENCL_VERSION__)
^
/tmp/AMD_2175_18/t_2175_20.cl:14:2: error: OpenCL >= 2.0 required
#error OpenCL >= 2.0 required
^
1 warning and 1 error generated.
2019-12-10 14:46:11 OpenCL compilation error -11 (args -DEXP=98563771u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=11u -DWEIGHT_STEP=0x1.e0dea836fdc34p+0 -DIWEIGHT_STEP=0x1.1092a0edb09cep-1 -DWEIGHT_BIGSTEP=0x1.306fe0a31b715p+0 -DIWEIGHT_BIGSTEP=0x1.ae89f995ad3adp-1 -DAMDGPU=1 -DMERGED_MIDDLE=1 -I. -cl-fast-relaxed-math -cl-std=CL2.0)
2019-12-10 14:46:11 Error: Failed to compile opencl source (from CL to LLVM IR).
2019-12-10 14:46:11 Exception gpu_error: BUILD_PROGRAM_FAILURE clBuildProgram at clwrap.cpp:234 build
2019-12-10 14:46:11 Bye
The file 'worktodo.txt' contained the lines shown below.
After completing the calculation for the first number, the program may delete the first character of the next line (I observed this for "DoubleCheck=") or the second character of the line (I observed this for "Test="), after which the program skips the damaged work line and proceeds to the next one.
gpuOwL version: v1.9-
OS type [version]: Windows x64 [6.1.7601]
worktodo.txt line '[Worker #1]
worktodo.txt line 'Tst=77002949
worktodo.txt line 'oubleCheck=0,60004433,76,1
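Losing one character from the following line is the kind of symptom an off-by-one in line-terminator handling can produce, for instance with CRLF endings on Windows. That cause is a guess and not confirmed here. A sketch of removing a completed first line so that the whole terminator goes with it; the function name is hypothetical.

```cpp
#include <string>

// Removes the first line of `text`, including its full terminator ("\n" or "\r\n").
std::string dropFirstLine(const std::string& text) {
  const std::string::size_type newline = text.find('\n');
  if (newline == std::string::npos) {
    return std::string();  // Only one line: nothing remains.
  }
  return text.substr(newline + 1);
}
```

Cutting just after the `\n` handles both endings, because in a CRLF file the `\r` precedes the `\n` and is removed along with the line.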
Line 61 in f34ad18
If the checkpoint is invalid, load *-prev.owl, and overwrite the last checkpoint file.
2019-12-09 03:42:18 OpenCL compilation in 1.56 s
2019-12-09 03:42:18 '/home/xxx/gpuowl/98563771/98563771.owl' invalid
2019-12-09 03:42:18 '/home/xxx/gpuowl/98563771/98563771-old.owl' invalid
2019-12-09 03:42:18 Exiting because "invalid savefiles found, investigate why
I thought this was an extremely rare condition, but it happened again.
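The fallback requested above can be sketched as trying candidate savefiles in order and taking the first one that validates. The file names in the usage below come from the log and the request; the function name and the `isValid` callback are hypothetical, and real validation would have to read the file.

```cpp
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Returns the first candidate that passes `isValid`, or std::nullopt when none does.
std::optional<std::string> firstValidCheckpoint(
    const std::vector<std::string>& candidates,
    const std::function<bool(const std::string&)>& isValid) {
  for (const std::string& name : candidates) {
    if (isValid(name)) {
      return name;
    }
  }
  return std::nullopt;
}
```

Called with {"98563771.owl", "98563771-old.owl", "98563771-prev.owl"}, this would return the -prev file when the first two are invalid, and the caller could then overwrite the broken main checkpoint from it.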
2019-05-23 14:40:05 Note: no config.txt file found
2019-05-23 14:40:05 config: -prp 82589933 -device 0
2019-05-23 14:40:05 82589933 FFT 4608K: Width 256x4, Height 64x4, Middle 9; 17.50 bits/word
2019-05-23 14:40:05 using short carry kernels
2019-05-23 14:40:09 OpenCL args "-DEXP=82589933u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=9u -DFRAC=9280343354015947889ul -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-05-23 14:40:10 OpenCL compilation error -11 (args -DEXP=82589933u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=9u -DFRAC=9280343354015947889ul -I. -cl-fast-relaxed-math -cl-std=CL2.0)
2019-05-23 14:40:10 /tmp/OCL1986T0.cl:183:3: error: implicit declaration of function '__asm' is invalid in C99
X2(u[0], u[2]);
^
/tmp/OCL1986T0.cl:150:2: note: expanded from macro 'X2'
__asm( "v_add_f64 %0, %1, -%2\n" : "=v" (b.x) : "v" (t.x), "v" (b.x));
^
/tmp/OCL1986T0.cl:183:3: error: expected ')'
/tmp/OCL1986T0.cl:150:35: note: expanded from macro 'X2'
__asm( "v_add_f64 %0, %1, -%2\n" : "=v" (b.x) : "v" (t.x), "v" (b.x));
^
/tmp/OCL1986T0.cl:183:3: note: to match this '('
/tmp/OCL1986T0.cl:150:7: note: expanded from macro 'X2'
__asm( "v_add_f64 %0, %1, -%2\n" : "=v" (b.x) : "v" (t.x), "v" (b.x));
^
/tmp/OCL1986T0.cl:183:3: error: expected ')'
X2(u[0], u[2]);
^
/tmp/OCL1986T0.cl:151:35: note: expanded from macro 'X2'
__asm( "v_add_f64 %0, %1, -%2\n" : "=v" (b.y) : "v" (t.y), "v" (b.y));
^
/tmp/OCL1986T0.cl:183:3: note: to match this '('
/tmp/OCL1986T0.cl:151:7: note: expanded from macro 'X2'
__asm( "v_add_f64 %0, %1, -%2\n" : "=v" (b.y) : "v" (t.y), "v" (b.y));
^
/tmp/OCL1986T0.cl:184:3: error: expected ')'
X2_mul_t4(u[1], u[3]);
^
/tmp/OCL1986T0.cl:172:35: note: expanded from macro 'X2_mul_t4'
__asm( "v_add_f64 %0, %1, -%2\n" : "=v" (t.x) : "v" (b.x), "v" (t.x));
^
/tmp/OCL1986T0.cl:184:3: note: to match this '('
/tmp/OCL1986T0.cl:172:7: note: expanded from macro 'X2_mul_t4'
__asm( "v_add_f64 %0, %1, -%2\n" : "=v" (t.x) : "v" (b.x), "v" (t.x));
^
/tmp/OCL1986T0.cl:184:3: error: expected ')'
X2_mul_t4(u[1], u[3]);
^
/tmp/OCL1986T0.cl:173:35: note: expanded from macro 'X2_mul_t4'
__asm( "v_add_f64 %0, %1, -%2\n" : "=v" (b.x) : "v" (t.y), "v" (b.y));
^
/tmp/OCL1986T0.cl:184:3: note: to match this '('
/tmp/OCL1986T0.cl:173:7: note: expanded from macro 'X2_mul_t4'
__asm( "v_add_f64 %0, %1, -%2\n" : "=v" (b.x) : "v" (t.y),
2019-05-23 14:40:10 Exception 9gpu_error: BUILD_PROGRAM_FAILURE clBuildProgram at clwrap.cpp:215 build
2019-05-23 14:40:10 Bye
[GCD.cpp:38]: (warning) Assert statement calls a function which may have desired side effects: 'isOngoing'.
[GCD.h:11]: (style) The class 'GCD' does not have a constructor.
[kernel.h:31]: (warning) Member variable 'Kernel::timeSum' is not initialized in the constructor.
[kernel.h:31]: (warning) Member variable 'Kernel::nCalls' is not initialized in the constructor.
[clwrap.h:78]: (style) Class 'Queue' has a constructor with 1 argument that is not explicit.
[Primes.h:18]: (style) Class 'Primes' has a constructor with 1 argument that is not explicit.
[./Result.cpp:24]: (information) Skipping configuration 'REV' since the value of 'REV' is unknown. Use -D if you want to check it. You can use -U to skip it explicitly.
[Worktodo.cpp:26]: (warning) %d in format string (no. 2) requires 'int *' but the argument type is 'unsigned int *'.
[common.cpp:16]: (warning) Return value of function fopen() is not used.
[common.cpp:16]: (error) Return value of allocation function 'fopen' is not stored.
[./gpuowl.cpp:13]: (information) Skipping configuration 'REV' since the value of 'REV' is unknown. Use -D if you want to check it. You can use -U to skip it explicitly.
(information) Cppcheck cannot find all the include files (use --check-config for details)
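The format-string warning for Worktodo.cpp has a mechanical fix: use `%u` when the argument is an `unsigned int *`. A minimal sketch with a hypothetical parser for a "Test=" line; this is not the actual Worktodo.cpp code.

```cpp
#include <cstdio>

// Parses an unsigned exponent from text such as "Test=77002949".
// "%u" matches the `unsigned*` argument; "%d" would expect an `int*`.
bool parseExponent(const char* line, unsigned* exponent) {
  return std::sscanf(line, "Test=%u", exponent) == 1;
}
```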
It states that gpuowl uses only 8M and 16M FFT lengths, but this is no longer the case.
/gpuowl# make
echo "git describe --long --dirty
" > version.inc
fatal: No names found, cannot describe anything.
echo Version: cat version.inc
Version: ""
g++ -Wall -O2 -std=c++17 -Wall Pm1Plan.cpp GmpUtil.cpp Worktodo.cpp common.cpp gpuowl.cpp Gpu.cpp clwrap.cpp Task.cpp checkpoint.cpp timeutil.cpp Args.cpp state.cpp Signal.cpp FFTConfig.cpp -o gpuowl -lOpenCL -lgmp -lstdc++fs -pthread -L/opt/rocm/opencl/lib/x86_64 -L/opt/amdgpu-pro/lib/x86_64-linux-gnu -L/c/Windows/System32 -L.
An exponent > 86.5M started with a 4M FFT produced errors at around 20%. The same exponent restarted with a 5M FFT now appears to be running OK.
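The "bits/word" figure in the gpuowl logs quoted in other reports matches the exponent divided by the FFT size in words (for example 98563771 with FFT 5632K is logged as 17.09). By the same arithmetic an 86.5M exponent comes to roughly 20.6 bits/word at 4M and roughly 16.5 at 5M, which fits the observation that the smaller FFT produced errors. A sketch of the calculation; the function name is hypothetical and the formula is inferred from the logs.

```cpp
#include <cstdint>

// Average number of exponent bits per FFT word, with the FFT size given in K (1024) words.
double bitsPerWord(std::uint32_t exponent, std::uint32_t fftSizeK) {
  return static_cast<double>(exponent) / (static_cast<double>(fftSizeK) * 1024.0);
}
```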
gpuowl -prp 44497 -device 0
2019-06-28 09:21:57 gpuowl v6.5-82-g77b45a4
2019-06-28 09:21:57 Note: no config.txt file found
2019-06-28 09:21:57 config: -prp 44497 -device 0
2019-06-28 09:21:57 44497 FFT 8K: Width 8x8, Height 8x8; 5.43 bits/word
2019-06-28 09:21:57 using long carry kernels
2019-06-28 09:21:58 OpenCL args "-DEXP=44497u -DWIDTH=64u -DSMALL_HEIGHT=64u -DMIDDLE=1u -DWEIGHT_STEP=0x1.7b92f0a414e05p+0 -DIWEIGHT_STEP=0x1.59503de66e177p-1 -DWEIGHT_BIGSTEP=0x1.d5818dcfba487p+0 -DIWEIGHT_BIGSTEP=0x1.172b83c7d517bp-1 -I. -cl-fast-relaxed-math -cl-std=CL2.0"
/tmp/AMD_1217_35/t_1217_37.cl:1267:34: warning: array index 1 is past the end of the array (which contains 1 element) [-Warray-bounds]
steps[i] = mul(steps[i-1], steps[1]);
^ ~
/tmp/AMD_1217_35/t_1217_37.cl:1262:3: note: array 'steps' declared here
T2 steps[MIDDLE];
^
1 warning generated.
2019-06-28 09:21:59 OpenCL compilation in 1422 ms
2019-06-28 09:21:59 44497.owl loaded: k 2000, block 1000, res64 020904e660c53abb
2019-06-28 09:22:00 44497 OK 4000 8.89%; 121 us/sq; ETA 0d 00:00; 4d7b13d03f9c5720 (check 0.12s)
2019-06-28 09:22:02 44497 20000 44.44%; 121 us/sq; ETA 0d 00:00; e0fc41c8eadc4e96
2019-06-28 09:22:04 44497 40000 88.89%; 121 us/sq; ETA 0d 00:00; 9b4920985d079c24
2019-06-28 09:22:05 PP 44496 / 44497, fffffffffffffffc
2019-06-28 09:22:05 44497 OK 45000 100.00%; 121 us/sq; ETA 0d 00:00; 5ad3f1cd9c12bc86 (check 0.12s)
2019-06-28 09:22:05 {"exponent":"44497", "worktype":"PRP-3", "status":"P", "program":{"name":"gpuowl", "version":"v6.5-82-g77b45a4"}, "timestamp":"2019-06-28 07:22:05 UTC", "fft-length":8192, "res64":"fffffffffffffffc", "residue-type":4}
2019-06-28 09:22:05 Bye
I saw the application on the forum. Unfortunately, it is not clear which parameters are required to run the application.
The "selftest" parameter remained in the description, but it was removed from the program:
Lines 54 to 55 in cb09cb2
Do the errors affect the calculations, or can I ignore them? The "legacy" parameter produces the same error. (However, with it the estimated run time is 6 days, which is 1 day less than without the parameter.)
gpuOwL v1.9- GPU Mersenne primality checker
AMD Radeon (TM) R7 370 Series 16 @1:0.0, Pitcairn 1015MHz [win7-x64]
OpenCL compilation error -11 (args -I. -cl-fast-relaxed-math -cl-std=CL2.0 -DEX
P=77002949u -DWIDTH=1024u -DHEIGHT=2048u -DLOG_NWORDS=22u -DFP_DP=1 )
An invalid option was specified.
".\gpuowl.cl", line 67: warning: OpenCL extension is now part of core
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
OpenCL compilation in 1762 ms, with "-I. -cl-fast-relaxed-math -DEXP=77002949u
-DWIDTH=1024u -DHEIGHT=2048u -DLOG_NWORDS=22u -DFP_DP=1 "
./gpuowl -h
2019-04-14 13:01:47 gpuowl 005297a
2019-04-14 13:01:47 config: -h
Tested 44497 and 86243.
The result is "C".
Now that the firmware files are in place, the GPU is finally working. The Radeon RX5700XT does not work with gpuOwl for PRP, but works with Mfakto for Trial Factoring. GpuOwl doing PRP always gives errors (EE lines), and after 3 consecutive errors it quits.
Hello, I would like to know how I can automate the work fetch in gpuowl.
Line 85 in 5e5b30d
TypeError: int() argument must be a string
Hi,
I'm running cudaowl with Arch Linux and Cuda 9.20, on a GTX 960. The CPU usage of cudaowl stays close to 100% constantly.
I found that the CPU usage can be reduced significantly, to 2.6%, by adding a cudaDeviceSynchronize(); call in CudaGpu.h, line 225. This is at the end of the for loop in modSqLoop(). I guess this has something to do with cuda busy-waiting for the kernels to finish. With an explicit synchronization call, the CPU code goes to sleep instead (you set the cudaDeviceScheduleBlockingSync flag).
The synchronization has a small performance impact however, time per iteration increases from 18.85 ms to 19.10 ms. On the other hand, the power consumption of the whole computer drops by about 30W (no other significant CPU load), so the impact seems worth it for me. I'm testing M(90000881), FFT 4860K, 18.08 bits/word.
Greetings,
Fredrik