Comments (6)
Hi, we require jaxlib version 0.1.69 to be able to use CUDA unified memory for running long sequences. If you don't need this, you can probably run with 0.1.68, but the older version might be related to the illegal address error that you are seeing. How long was the sequence you were trying to run?
Some of the other open issues about CUDA versions might also be of help.
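To make the version requirement above easy to check, here is a minimal sketch that compares a jaxlib version string against 0.1.69. The helper names `parse_release` and `supports_unified_memory` are hypothetical and not part of AlphaFold or JAX, and the parsing only handles plain dotted releases with an optional local suffix such as `+cuda110`.

```python
import re

# Minimum jaxlib release named in the comment above.
MIN_JAXLIB = (0, 1, 69)


def parse_release(version: str) -> tuple:
    """Return the numeric release part of a version string as a tuple of ints.

    A local suffix such as "+cuda110" is ignored.
    """
    release = version.split("+", 1)[0]
    if re.fullmatch(r"\d+(\.\d+)*", release) is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in release.split("."))


def supports_unified_memory(jaxlib_version: str) -> bool:
    """Hypothetical check: is this jaxlib at least the stated minimum?"""
    return parse_release(jaxlib_version) >= MIN_JAXLIB
```

The installed version string can be read with `importlib.metadata.version("jaxlib")` and passed to `supports_unified_memory`.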
from alphafold.
I was benchmarking with the CASP14 targets. T1026 (172 residues) raised the issue.
I realized that some of the targets still hit CUDA_ERROR_ILLEGAL_ADDRESS, even though I used jax==0.2.17 and jaxlib==0.1.68+cuda110. Those targets ran fine on CPUs.
My system information:
- NVIDIA driver: 450.36.06
- CUDA version: 11.0
- jax: 0.2.17
- jaxlib: 0.1.68+cuda110
- tensorflow: 2.5.0
That's a very small protein, so I'm surprised it's an issue. What GPU are you using? Is it possible to try using the Dockerfile?
You could try disabling unified memory by commenting out these two lines in your script, if you have them:
https://github.com/deepmind/alphafold/blob/main/docker/run_docker.py#L171-L172
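For a non-Docker setup, the same toggle can be sketched as below. This assumes the two linked lines are the ones that set the `TF_FORCE_UNIFIED_MEMORY` and `XLA_PYTHON_CLIENT_MEM_FRACTION` environment variables for the container; the line numbers on `main` may have shifted, so check the file. The helper `inference_env` is hypothetical and only builds an environment dictionary.

```python
import os

# Assumed to be the two settings that enable unified memory in run_docker.py.
UNIFIED_MEMORY_VARS = {
    "TF_FORCE_UNIFIED_MEMORY": "1",
    "XLA_PYTHON_CLIENT_MEM_FRACTION": "4.0",
}


def inference_env(use_unified_memory: bool, base_env=None) -> dict:
    """Return a copy of the environment with unified memory on or off.

    base_env defaults to the current process environment; it is not modified.
    """
    env = dict(os.environ if base_env is None else base_env)
    if use_unified_memory:
        env.update(UNIFIED_MEMORY_VARS)
    else:
        # Equivalent to commenting the two lines out: drop both variables.
        for name in UNIFIED_MEMORY_VARS:
            env.pop(name, None)
    return env
```

The returned dictionary can be passed as `env=` to `subprocess.run` when launching the prediction script, so both settings can be compared without editing the script.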
I tested with Quadro RTX 6000 and RTX 2080Ti.
I have tested the following configurations:
(1) jaxlib==0.1.68+cuda110, jax==0.2.17, cudatoolkit=11.0.3 for my custom non-Docker version
(2) jaxlib==0.1.69+cuda110, jax==0.2.17, cudatoolkit=11.0.3 for my custom non-Docker version
(3) (1) or (2), with the two unified memory lines commented out
(4) the same as (2), but with a docker container (the original one)
There was no issue with (4). So there may be some differences between my non-Docker version and the original Docker version, even though I thought I had set up my custom non-Docker version with exactly the same library versions. I will try again.
@huhlim Did you solve your CUDA_ERROR_ILLEGAL_ADDRESS problems? I just ran ~100 proteins from an internal sample, and this cropped up for me in some cases. As I investigate, it would be helpful if you could follow up here with anything you learned and/or how you resolved your problem. (I am using Docker on an A100.)
@christroat I could not fully resolve the issue. Turning off the jax.jit compilation of the models (in the initialization of the RunModel class in alphafold/model/model.py) reduced the chance of the error but did not eliminate it. I have not had the issue with my Docker system, so I guess my problem is related to our cluster setup. Unfortunately, I gave up on tackling the issue.
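For anyone who wants to try the same experiment, a minimal sketch of making compilation optional is shown below. It is not the actual RunModel code; `maybe_jit` is a hypothetical helper that could wrap whatever function is currently passed to `jax.jit`.

```python
def maybe_jit(fn, use_jit: bool = True):
    """Return jax.jit(fn) when use_jit is True, otherwise fn unchanged."""
    if not use_jit:
        # Eager path: the function runs op by op without being compiled as a whole.
        return fn
    import jax  # imported lazily; only needed on the compiled path
    return jax.jit(fn)
```

JAX also provides the `jax.disable_jit()` context manager, which turns off compilation for code run inside it without editing the call sites.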