Comments (16)
2019-09-07 12:06:44 90348611 33410000 36.98%; 886 us/sq; ETA 0d 14:01; bac38bb8e27196e5
2019-09-07 12:06:53 90348611 33420000 36.99%; 886 us/sq; ETA 0d 14:01; 5dc04e6cd38ab191
2019-09-07 12:07:02 90348611 33430000 37.00%; 887 us/sq; ETA 0d 14:01; b91d6d315cae4932
Queue at 0x7f23e803a000 inactivated due to async error: HSA_STATUS_ERROR_ILLEGAL_INSTRUCTION: The agent attempted to execute an illegal shader instruction.
Recovering from this requires a reboot.
I don't know whether gpuowl registers an error in this case; I don't think so. This is a severe error that blocks the program. The nErrors indicator can only capture certain events, which I would describe as less severe than this one.
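For anyone who wants to watch for this kind of failure automatically, here is a minimal Python sketch that scans log text in the format shown above. It is not part of gpuowl: the function names and the `SAMPLE` constant are made up for illustration, and the assumption that an all-zero residue would be printed as sixteen `0` characters in the last field is a guess that has not been verified against gpuowl's output.

```python
import re

# Matches progress records shaped like the ones in the report above:
# date time exponent iteration percent%; N us/sq; ETA Nd HH:MM; residue
LINE_RE = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<exponent>\d+) (?P<iteration>\d+) (?P<percent>[\d.]+)%; "
    r"(?P<us_per_sq>\d+) us/sq; ETA (?P<eta>\d+d \d{2}:\d{2}); "
    r"(?P<residue>[0-9a-f]{16})"
)

# Two of the records quoted in the report, used as sample input.
SAMPLE = (
    "2019-09-07 12:06:44 90348611 33410000 36.98%; 886 us/sq; ETA 0d 14:01; bac38bb8e27196e5\n"
    "2019-09-07 12:06:53 90348611 33420000 36.99%; 886 us/sq; ETA 0d 14:01; 5dc04e6cd38ab191\n"
)


def parse_progress(text):
    """Return one dict of string fields per progress record found in text."""
    return [m.groupdict() for m in LINE_RE.finditer(text)]


def all_zero_records(records):
    """Return the records whose residue field consists only of '0' digits.

    Assumption: an all-zero residue appears as 0000000000000000 in the log.
    """
    return [r for r in records if set(r["residue"]) == {"0"}]


def has_hsa_error(text):
    """Return True if the text mentions any HSA_STATUS_ERROR_* status."""
    return "HSA_STATUS_ERROR_" in text
```

Calling `parse_progress` on a whole log file's contents and passing the result to `all_zero_records` gives the iterations worth a closer look; `has_hsa_error` flags the fatal case described here.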
from gpuowl.
Hi, I've never encountered this error myself; probably I'll have to wait until I can repro.
from gpuowl.
Hi Mihai, this was a one-time error that I never reproduced myself, but I have two Radeon VII cards and both show the same computation errors, including the all-zero residue error. I have also tested them on separate, different mainboards and on both Debian and Ubuntu; the computation errors are common to all setups. It seems to me that the dealer got a batch of buggy Radeon VII cards.
from gpuowl.
I don't know; I also don't see the all-zero residue. Many things could be causing it. We need more information.
from gpuowl.
One thing I can say is that each all-zero occurrence coincides with a page fault.
from gpuowl.
Also, more information: when the all-zero error occurs, it is repeated over and over until the next Gerbicz check, which fails; on reload the error may then disappear. It may later reappear at random. I have also seen 3 consecutive errors, which make gpuowl exit. I have observed this behaviour carefully, and the error rate tends to increase with temperature. By cooling the GPU very well I can keep the error to a minimum, but I still cannot eliminate it reliably.
Tested with two different Radeon VII cards on two different systems (mainboard, CPU, RAM and hard disk all different).
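To illustrate why an all-zero residue persists until the next Gerbicz check and is then rolled back, here is a small Python sketch of the Gerbicz error check on a repeated-squaring loop. It is a toy model under stated assumptions, not gpuowl's implementation: the function name and the `corrupt_at` fault-injection parameter are hypothetical, and the check runs at every block boundary for simplicity. It relies on the identity d_t = base * d_(t-1)^(2^block) mod n, where d_t is the product of the residues at block boundaries.

```python
def prp_with_gerbicz(n, iterations, block, base=3, corrupt_at=None):
    """Square base `iterations` times mod n, checking every `block` steps.

    corrupt_at (hypothetical, for demonstration): iteration at which the
    residue is forced to zero once, imitating a transient hardware error.
    Returns (final_residue, number_of_rollbacks).
    """
    x = base % n          # x holds base^(2^i) mod n
    d = x                 # running product of x at block boundaries
    saved = (0, x, d)     # last state that passed the check
    i = 0
    rollbacks = 0
    while i < iterations:
        x = x * x % n
        i += 1
        if corrupt_at == i:
            x = 0                 # zero stays zero under further squaring
            corrupt_at = None     # the fault is transient: inject it once
        if i % block == 0:
            d_prev = d
            d = d * x % n
            # Gerbicz identity: d must equal base * d_prev^(2^block) mod n
            expected = base * pow(d_prev, 1 << block, n) % n
            if d != expected:
                i, x, d = saved   # reload the last verified state
                rollbacks += 1
            else:
                saved = (i, x, d)
    return x, rollbacks


# Example: exponent 89 (2**89 - 1 is a Mersenne prime), fault at iteration 37.
n = 2**89 - 1
clean = prp_with_gerbicz(n, 89, 10)
faulty = prp_with_gerbicz(n, 89, 10, corrupt_at=37)
```

In this model the zero injected at iteration 37 survives every squaring up to the check at iteration 40, the check fails, and the loop resumes from the state saved at iteration 30, which matches the behaviour described above. Iterations after the last full block are not covered by a check in this simplified version.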
from gpuowl.
It just happened again on the dual Radeon VII system. The failed GPU is idle now and gpuowl has been killed, but the other GPU is still working and computing. I thought the error was more severe; still, I need to reboot to restart the failed GPU.
from gpuowl.
Are you using PCIe risers?
from gpuowl.
Are you using PCIe risers?
No. ROCm doesn't support PCIe risers. Risers are a thing of the past for me.
Maybe the source of errors is some other component involved in the computation.
from gpuowl.
However, the Radeon VII is the only GPU model that shows these errors. The other GPUs I have, RX580 and Verga64, have never shown a single error...
from gpuowl.
I typed one r too many; that's Vega64!
Well, I will investigate if the RAM is suffering from being too near the CPU cooler fan.
This is a new account I created to divide my work.
from gpuowl.
I went ahead and installed Debian 10.1 with ROCm 2.8. This seems to have reduced the errors greatly, and the all-zero residue error has not occurred so far.
from gpuowl.
I will just use mprime stress test to verify the RAM.
from gpuowl.
Are you overclocking the GPU RAM, or undervolting? If so, maybe that is too aggressive.
from gpuowl.
Are you overclocking the GPU RAM, or undervolting? If so, maybe that is too aggressive.
The irony is that I never touch the voltage/clock settings; I have simply found a way to cool the GPU very well. With Debian 10.1 things are going better: the number of errors has dropped by 90%.
from gpuowl.