kalininalab / alphafold_non_docker
AlphaFold2 non-docker setup
Dear author:
I followed the steps to configure the environment (CPU-only), but in the end run_alphafold.sh reported an error:
/lustre/user/lulab/gaojd/whr/software/miniconda3/envs/alphafold/lib/python3.8/site-packages/absl/flags/validators.py:203: UserWarning: Flag --preset has a non-None default value; therefore, mark_flag_as_required will pass even if flag is not specified in the command line!
warnings.warn(
I0811 03:41:17.217879 140270426367808 templates.py:837] Using precomputed obsolete pdbs ./DOWNLOAD_DIR/pdb_mmcif/obsolete.dat.
I0811 03:41:18.356215 140270426367808 tpu_client.py:54] Starting the local TPU driver.
I0811 03:41:18.357059 140270426367808 xla_bridge.py:214] Unable to initialize backend 'tpu_driver': Not found: Unable to find driver in registry given worker: local://
2021-08-11 03:41:18.358851: W external/org_tensorflow/tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2021-08-11 03:41:18.358934: W external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOWN ERROR (303)
I0811 03:41:18.359137 140270426367808 xla_bridge.py:214] Unable to initialize backend 'gpu': Failed precondition: No visible GPU devices.
I0811 03:41:18.359338 140270426367808 xla_bridge.py:214] Unable to initialize backend 'tpu': Invalid argument: TpuPlatform is not available.
W0811 03:41:18.359463 140270426367808 xla_bridge.py:217] No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
I0811 03:41:19.314316 140270426367808 run_alphafold.py:260] Have 1 models: ['model_1']
I0811 03:41:19.314738 140270426367808 run_alphafold.py:273] Using random seed 4975129475860990710 for the data pipeline
I0811 03:41:19.336503 140270426367808 jackhmmer.py:130] Launching subprocess "/lustre/user/lulab/gaojd/whr/software/miniconda3/envs/alphafold/bin/jackhmmer -o /dev/null -A /tmp/tmp_1s6fhn/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 ./example/query.fasta ./DOWNLOAD_DIR/uniref90/uniref90.fasta"
I0811 03:41:19.413159 140270426367808 utils.py:36] Started Jackhmmer (uniref90.fasta) query
I0811 03:50:05.036837 140270426367808 utils.py:40] Finished Jackhmmer (uniref90.fasta) query in 525.623 seconds
I0811 03:50:05.040448 140270426367808 jackhmmer.py:130] Launching subprocess "/lustre/user/lulab/gaojd/whr/software/miniconda3/envs/alphafold/bin/jackhmmer -o /dev/null -A /tmp/tmpqrhjvvgw/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 ./example/query.fasta ./DOWNLOAD_DIR/mgnify/mgy_clusters.fa"
I0811 03:50:05.241362 140270426367808 utils.py:36] Started Jackhmmer (mgy_clusters.fa) query
I0811 04:01:40.166194 140270426367808 utils.py:40] Finished Jackhmmer (mgy_clusters.fa) query in 694.879 seconds
I0811 04:01:40.621608 140270426367808 hhsearch.py:76] Launching subprocess "/lustre/user/lulab/gaojd/whr/software/miniconda3/envs/alphafold/bin/hhsearch -i /tmp/tmp996yhgxj/query.a3m -o /tmp/tmp996yhgxj/output.hhr -maxseq 1000000 -d ./DOWNLOAD_DIR/pdb70/pdb70"
I0811 04:01:40.838742 140270426367808 utils.py:36] Started HHsearch query
I0811 04:12:59.336633 140270426367808 utils.py:40] Finished HHsearch query in 678.436 seconds
I0811 04:12:59.917971 140270426367808 hhblits.py:128] Launching subprocess "/lustre/user/lulab/gaojd/whr/software/miniconda3/envs/alphafold/bin/hhblits -i ./example/query.fasta -cpu 4 -oa3m /tmp/tmpyemalf6z/output.a3m -o /dev/null -n 3 -e 0.001 -maxseq 1000000 -realign_max 100000 -maxfilt 100000 -min_prefilter_hits 1000 -d ./DOWNLOAD_DIR/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt -d ./DOWNLOAD_DIR/uniclust30/uniclust30_2018_08/uniclust30_2018_08"
I0811 04:13:00.089679 140270426367808 utils.py:36] Started HHblits query
I0811 04:13:23.778954 140270426367808 utils.py:40] Finished HHblits query in 23.689 seconds
E0811 04:13:23.779619 140270426367808 hhblits.py:138] HHblits failed. HHblits stderr begin:
E0811 04:13:23.779794 140270426367808 hhblits.py:141] - 04:13:23.681 ERROR: Could find neither hhm_db nor a3m_db!
E0811 04:13:23.779950 140270426367808 hhblits.py:142] HHblits stderr end
Traceback (most recent call last):
File "/lustre/user/lulab/gaojd/whr/alphafold/run_alphafold.py", line 303, in <module>
app.run(main)
File "/lustre/user/lulab/gaojd/whr/software/miniconda3/envs/alphafold/lib/python3.8/site-packages/absl/app.py", line 312, in run
_run_main(main, args)
File "/lustre/user/lulab/gaojd/whr/software/miniconda3/envs/alphafold/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main
sys.exit(main(argv))
File "/lustre/user/lulab/gaojd/whr/alphafold/run_alphafold.py", line 277, in main
predict_structure(
File "/lustre/user/lulab/gaojd/whr/alphafold/run_alphafold.py", line 127, in predict_structure
feature_dict = data_pipeline.process(
File "/lustre/user/lulab/gaojd/whr/alphafold/alphafold/data/pipeline.py", line 170, in process
hhblits_bfd_uniclust_result = self.hhblits_bfd_uniclust_runner.query(
File "/lustre/user/lulab/gaojd/whr/alphafold/alphafold/data/tools/hhblits.py", line 143, in query
raise RuntimeError('HHblits failed\nstdout:\n%s\n\nstderr:\n%s\n' % (
RuntimeError: HHblits failed
stdout:
stderr:
I don't know what caused this, so I would appreciate some help.
Thanks!
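For what it's worth, HHblits prints "Could find neither hhm_db nor a3m_db!" when the files that make up the database prefix passed via -d are missing, which usually points at an incomplete BFD download or extraction. A minimal sketch to check, assuming the standard HH-suite file layout (missing_bfd_files is a hypothetical helper, not part of AlphaFold):

```python
# HHblits expects six index/data files next to the database prefix
# passed via -d; if any are absent it cannot find hhm_db/a3m_db.
import os

BFD_SUFFIXES = ("_a3m.ffdata", "_a3m.ffindex",
                "_hhm.ffdata", "_hhm.ffindex",
                "_cs219.ffdata", "_cs219.ffindex")

def missing_bfd_files(prefix):
    """Return the expected HHblits database files that are absent."""
    return [prefix + s for s in BFD_SUFFIXES if not os.path.isfile(prefix + s)]

# Prefix taken from the log above; adjust to your download location.
prefix = "./DOWNLOAD_DIR/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt"
for path in missing_bfd_files(prefix):
    print("missing:", path)
```

If anything is listed as missing, re-downloading and re-extracting the BFD archive is the usual fix.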
Hi,
There was a recent release of AlphaFold, v2.3.0, which, among other things, improves the GPU memory efficiency of some parts of the computation.
https://github.com/deepmind/alphafold/releases/tag/v2.3.0
There don't seem to be any (?) changes to the CLI parameters, so would a simple 'git pull' suffice to update the alphafold_non_docker installation?
The only parameter update I can see is:
Thanks in advance,
I am receiving the following error when trying to run the example. Do you have any idea? ./example/query.fasta
is definitely a string.
alphafold$ bash run_alphafold.sh -d ./alphafold_data/ -o ./dummy_test/ -m model_1 -f ./example/query.fasta -t 2021-07-27
File "/home/user/alphafold/run_alphafold.py", line 96
fasta_path: str,
^
SyntaxError: invalid syntax
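The "fasta_path: str" at the caret is a Python 3 function annotation, so this exact SyntaxError usually means run_alphafold.py was started with Python 2 (e.g. the alphafold conda env, which ships Python 3.8, was not active). A minimal sketch to confirm (the predict stub is hypothetical, just to show the syntax parses):

```python
# Verify the interpreter is Python 3 before rerunning run_alphafold.sh.
import sys

print(sys.version.split()[0])
assert sys.version_info[0] == 3, "activate the alphafold conda env first"

# The annotated signature itself is valid Python 3:
def predict(fasta_path: str) -> None:
    pass

print("annotation syntax OK")
```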
Hi! I installed AlphaFold following the non-docker option, using the reduced version of the databases (reduced_dbs mode), and I get this error:
bash run_alphafold.sh -d /home/k.ruiz/alphafold_data -o /home/k.ruiz/rnaseq/alphafold/output -f /home/k.ruiz/rnaseq/alphafold/input/MSTRG.4643.1_3_RBP3.fasta -t 2020-05-14
I0725 12:53:28.340466 140062189004608 templates.py:857] Using precomputed obsolete pdbs /home/k.ruiz/alphafold_data/pdb_mmcif/obsolete.dat.
E0725 12:53:28.343733 140062189004608 hhblits.py:82] Could not find HHBlits database /home/k.ruiz/alphafold_data/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt
Traceback (most recent call last):
File "/home/k.ruiz/alphafold-2.2.0/run_alphafold.py", line 422, in <module>
app.run(main)
File "/home/k.ruiz/miniconda3/envs/alphafold/lib/python3.8/site-packages/absl/app.py", line 312, in run
_run_main(main, args)
File "/home/k.ruiz/miniconda3/envs/alphafold/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main
sys.exit(main(argv))
File "/home/k.ruiz/alphafold-2.2.0/run_alphafold.py", line 338, in main
monomer_data_pipeline = pipeline.DataPipeline(
File "/home/k.ruiz/alphafold-2.2.0/alphafold/data/pipeline.py", line 138, in __init__
self.hhblits_bfd_uniclust_runner = hhblits.HHBlits(
File "/home/k.ruiz/alphafold-2.2.0/alphafold/data/tools/hhblits.py", line 83, in __init__
raise ValueError(f'Could not find HHBlits database {database_path}')
ValueError: Could not find HHBlits database /home/k.ruiz/alphafold_data/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt
However, when I try to edit the run_alphafold.py file following this thread, lines 76 and 77 look different from the ones mentioned there; my run_alphafold.py looks like:
flags.DEFINE_string('uniclust30_database_path', None, 'Path to the Uniclust30 '
'database for use by HHblits.')
Is there any other solution?
Thanks!
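With the reduced databases, run_alphafold.py needs to be told so: in AlphaFold 2.1/2.2 the relevant flags are --db_preset=reduced_dbs plus --small_bfd_database_path (check how your wrapper script exposes them), rather than editing flag definitions by hand. A sketch that guesses which preset matches what is actually on disk (infer_db_preset is a hypothetical helper; file names follow the standard download layout):

```python
# Decide whether the data dir matches full_dbs (full BFD) or
# reduced_dbs (small BFD), since mixing them triggers the error above.
import os

def infer_db_preset(data_dir):
    full_bfd = os.path.join(
        data_dir, "bfd",
        "bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_hhm.ffindex")
    small_bfd = os.path.join(
        data_dir, "small_bfd", "bfd-first_non_consensus_sequences.fasta")
    if os.path.isfile(full_bfd):
        return "full_dbs"
    if os.path.isfile(small_bfd):
        return "reduced_dbs"
    return "unknown"

print(infer_db_preset("/home/k.ruiz/alphafold_data"))
```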
Hello, kalininalab support!
I am trying AlphaFold 2.1.1 with this non-docker setup. Thank you very much for providing it; it helps us set up on an HPC cluster.
The problem I ran into is that the test run can't find CIFs:
(alphafold-2.1.1) [user@cdragon096 alphafold]$ bash run_alphafold.sh -d $HOME/alphafold/alphafold_data -o ./dummy_test/ -m model_1 -f $HOME/alphafold/alphafold_non_docker/example/query.fasta -t 2020-05-14 -g False
Unknown model preset! Using default ('monomer')
E1118 12:13:41.684854 46912496434880 templates.py:837] Could not find CIFs in $HOME/alphafold/alphafold_data/pdb_mmcif/mmcif_files
Traceback (most recent call last):
File "/home/ryao/alphafold/run_alphafold.py", line 427, in <module>
app.run(main)
File "/risapps/rhel7/python/3.7.3/envs/alphafold-2.1.1/lib/python3.8/site-packages/absl/app.py", line 312, in run
_run_main(main, args)
File "/risapps/rhel7/python/3.7.3/envs/alphafold-2.1.1/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main
sys.exit(main(argv))
File "/home/ryao/alphafold/run_alphafold.py", line 341, in main
template_featurizer = templates.HhsearchHitFeaturizer(
File "$HOME/alphafold/alphafold/data/templates.py", line 838, in __init__
raise ValueError(f'Could not find CIFs in {self._mmcif_dir}')
ValueError: Could not find CIFs in $HOME/alphafold_data/pdb_mmcif/mmcif_files
(alphafold-2.1.1)
Would you please advise if I am missing anything?
Regards,
Rong
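This ValueError is raised when no .cif files are found under pdb_mmcif/mmcif_files; the literal "$HOME" in the traceback also suggests the shell variable may not have been expanded when the path was built. A minimal sketch to check the expanded path (count_cifs is a hypothetical helper):

```python
# Count the mmCIF template files AlphaFold would find in the data dir.
import glob
import os

def count_cifs(mmcif_dir):
    return len(glob.glob(os.path.join(mmcif_dir, "*.cif")))

mmcif_dir = os.path.expanduser("~/alphafold/alphafold_data/pdb_mmcif/mmcif_files")
print(count_cifs(mmcif_dir))
```

A count of 0 means the pdb_mmcif download/extraction did not complete, or the path is wrong.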
Please change the "Download MGnify database" section in download_db.sh to the following:
# Download MGnify database
echo "Downloading MGnify database"
mgnify_filename="mgy_clusters_2018_12.fa.gz"
wget -P "$mgnify" "https://storage.googleapis.com/alphafold-databases/casp14_versions/${mgnify_filename}"
(cd "$mgnify" && gunzip "$mgnify/$mgnify_filename")
File "/home/ngayatri/alphafold/alphafold/data/templates.py", line 137, in _parse_obsolete
with open(obsolete_file_path) as f:
IsADirectoryError: [Errno 21] Is a directory: '/home/ngayatri/mmcif_ob1/rsync.rcsb.org/pub/pdb/data/structures/obsolete/mmCIF'
I downloaded the data with wget from rsync.rcsb.org/pub/pdb/data/structures/obsolete/mmCIF.
Can you help me with this error? What exactly is it trying to find?
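The traceback shows open() being called on the path, so --obsolete_pdbs_path must point at the plain-text obsolete.dat file (the list of superseded PDB IDs, which the official download script fetches from ftp.wwpdb.org under pub/pdb/data/status/), not at the directory tree of obsolete mmCIF structures downloaded here. A sketch check (check_obsolete_path is a hypothetical helper):

```python
# Distinguish the obsolete.dat file templates.py expects from the
# obsolete-structures directory that was downloaded instead.
import os

def check_obsolete_path(path):
    if os.path.isdir(path):
        return "is a directory - point --obsolete_pdbs_path at obsolete.dat"
    if os.path.isfile(path):
        return "ok"
    return "does not exist"

print(check_obsolete_path(os.path.expanduser(
    "~/mmcif_ob1/rsync.rcsb.org/pub/pdb/data/structures/obsolete/mmCIF")))
```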
As was pointed out on Twitter, aria2c is a non-standard package, especially in HPC environments. Adding it to the conda env, and updating the manual so that the DB download step already runs inside the activated conda env, might be a nice move.
Hello,
I had a problem running AlphaFold. The first two hours were very smooth, and I think the MSA part finished in that time. However, after it showed:
I0905 13:06:56.466166 140453353674560 model.py:175] Output shape was {'distogram': {'bin_edges': (63,), 'logits': (691, 691, 64)}, 'experimentally_resolved': {'logits': (691, 37)}, 'masked_msa': {'logits': (252, 691, 22)}, 'predicted_aligned_error': (691, 691), 'predicted_lddt': {'logits': (691, 50)}, 'structure_module': {'final_atom_mask': (691, 37), 'final_atom_positions': (691, 37, 3)}, 'plddt': (691,), 'aligned_confidence_probs': (691, 691, 64), 'max_predicted_aligned_error': (), 'ptm': (), 'iptm': (), 'ranking_confidence': ()}
I0905 13:06:56.467109 140453353674560 run_alphafold.py:202] Total JAX model model_1_multimer_v2_pred_0 on VHVL predict time (includes compilation time, see --benchmark): 246.2s
this step takes forever. I checked the CPU, memory, and GPU usage:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
35488 dell 20 0 69.9g 4.8g 594148 R 100.0 3.8 1591:11 python /h+
total used free shared buff/cache available
Mem: 128357 6557 1730 106 120069 121081
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.43.04 Driver Version: 515.43.04 CUDA Version: 11.7 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:3B:00.0 Off | N/A |
| 30% 33C P2 101W / 320W | 5886MiB / 10240MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 NVIDIA GeForce ... Off | 00000000:5E:00.0 Off | N/A |
| 30% 25C P0 88W / 320W | 0MiB / 10240MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 2 NVIDIA GeForce ... Off | 00000000:B1:00.0 Off | N/A |
| 30% 25C P0 89W / 320W | 0MiB / 10240MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 3 NVIDIA GeForce ... Off | 00000000:D9:00.0 Off | N/A |
| 30% 25C P0 94W / 320W | 0MiB / 10240MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 35488 C python 1020MiB |
+-----------------------------------------------------------------------------+
The GPU memory usage is not very high, since I have seen people's A100s use over 20000 MiB, and the GPU utilization is only 0-1%. I'm not sure whether the graphics driver/CUDA/cuDNN/JAX versions are mismatched (driver version: 515.43.04, CUDA version: 11.7, cuDNN version: 8.4.1.50, jaxlib version: 0.3.15+cuda11.cudnn82, Python version: 3.8). I didn't see any error in the log; it just didn't move on for over 30 hours. I also ran 'conda activate alphafold' and tested in python3:
import torch
print(torch.cuda.is_available())
True
from torch.backends import cudnn
print(cudnn.is_available())
True
It seems that CUDA and cuDNN work, so I'm confused. Has anyone had this problem before, and could you please advise how to solve it? Thanks a lot for your guidance.
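One thing worth checking: torch seeing CUDA says nothing about JAX, which is what AlphaFold actually runs on; a CPU-only jaxlib wheel would behave exactly like this (process alive at 100% CPU, GPU idle). A sketch using only the stdlib plus an optional jax install (jax_gpu_status is a hypothetical helper):

```python
# Report whether the jaxlib in this environment exposes any GPU devices.
import importlib.util

def jax_gpu_status():
    if importlib.util.find_spec("jax") is None:
        return "jax not installed"
    import jax
    devices = [str(d).lower() for d in jax.local_devices()]
    return "gpu visible" if any("gpu" in d or "cuda" in d for d in devices) else "cpu only"

print(jax_gpu_status())
```

If this reports "cpu only" inside the alphafold env, reinstalling the jaxlib wheel that matches your CUDA/cuDNN versions is the likely fix.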
Could someone do me a favor?
When I run AF2, it fails with an HHblits error.
script:
python ./run_alphafold.py --fasta_paths=XXX.fas
results:
/home/linlab/.local/lib/python3.8/site-packages/absl/flags/_validators.py:203: UserWarning: Flag --output_dir has a non-None default value; therefore, mark_flag_as_required will pass even if flag is not specified in the command line!
warnings.warn(
/home/linlab/.local/lib/python3.8/site-packages/absl/flags/_validators.py:203: UserWarning: Flag --model_names has a non-None default value; therefore, mark_flag_as_required will pass even if flag is not specified in the command line!
warnings.warn(
/home/linlab/.local/lib/python3.8/site-packages/absl/flags/_validators.py:203: UserWarning: Flag --data_dir has a non-None default value; therefore, mark_flag_as_required will pass even if flag is not specified in the command line!
warnings.warn(
/home/linlab/.local/lib/python3.8/site-packages/absl/flags/_validators.py:203: UserWarning: Flag --preset has a non-None default value; therefore, mark_flag_as_required will pass even if flag is not specified in the command line!
warnings.warn(
/home/linlab/.local/lib/python3.8/site-packages/absl/flags/_validators.py:203: UserWarning: Flag --uniref90_database_path has a non-None default value; therefore, mark_flag_as_required will pass even if flag is not specified in the command line!
warnings.warn(
/home/linlab/.local/lib/python3.8/site-packages/absl/flags/_validators.py:203: UserWarning: Flag --mgnify_database_path has a non-None default value; therefore, mark_flag_as_required will pass even if flag is not specified in the command line!
warnings.warn(
/home/linlab/.local/lib/python3.8/site-packages/absl/flags/_validators.py:203: UserWarning: Flag --uniclust30_database_path has a non-None default value; therefore, mark_flag_as_required will pass even if flag is not specified in the command line!
warnings.warn(
/home/linlab/.local/lib/python3.8/site-packages/absl/flags/_validators.py:203: UserWarning: Flag --bfd_database_path has a non-None default value; therefore, mark_flag_as_required will pass even if flag is not specified in the command line!
warnings.warn(
/home/linlab/.local/lib/python3.8/site-packages/absl/flags/_validators.py:203: UserWarning: Flag --pdb70_database_path has a non-None default value; therefore, mark_flag_as_required will pass even if flag is not specified in the command line!
warnings.warn(
/home/linlab/.local/lib/python3.8/site-packages/absl/flags/_validators.py:203: UserWarning: Flag --template_mmcif_dir has a non-None default value; therefore, mark_flag_as_required will pass even if flag is not specified in the command line!
warnings.warn(
/home/linlab/.local/lib/python3.8/site-packages/absl/flags/_validators.py:203: UserWarning: Flag --max_template_date has a non-None default value; therefore, mark_flag_as_required will pass even if flag is not specified in the command line!
warnings.warn(
/home/linlab/.local/lib/python3.8/site-packages/absl/flags/_validators.py:203: UserWarning: Flag --obsolete_pdbs_path has a non-None default value; therefore, mark_flag_as_required will pass even if flag is not specified in the command line!
warnings.warn(
I0827 14:21:24.269093 140066848117952 templates.py:880] Using precomputed obsolete pdbs /data1/AF2_Database/pdb_mmcif/obsolete.dat.
I0827 14:21:26.650146 140066848117952 xla_bridge.py:236] Unable to initialize backend 'tpu_driver': Not found: Unable to find driver in registry given worker:
I0827 14:21:27.050851 140066848117952 xla_bridge.py:236] Unable to initialize backend 'tpu': Invalid argument: TpuPlatform is not available.
I0827 14:21:38.779491 140066848117952 run_alphafold.py:293] Have 5 models: ['model_1', 'model_2', 'model_3', 'model_4', 'model_5']
I0827 14:21:38.779775 140066848117952 run_alphafold.py:306] Using random seed 8466485706823161682 for the data pipeline
I0827 14:21:38.780414 140066848117952 pipeline.py:130] query uniref90
I0827 14:21:38.780694 140066848117952 jackhmmer.py:119] Launching subprocess "jackhmmer -o /dev/null -A /tmp/tmpng40rd11/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 GARS_insertion.fas /data1/AF2_Database/uniref90/uniref90.fasta"
I0827 14:21:38.840389 140066848117952 utils.py:36] Started Jackhmmer (uniref90.fasta) query
I0827 14:28:35.052657 140066848117952 utils.py:40] Finished Jackhmmer (uniref90.fasta) query in 416.212 seconds
I0827 14:28:35.053834 140066848117952 pipeline.py:141] query mgnify
I0827 14:28:35.054065 140066848117952 jackhmmer.py:119] Launching subprocess "jackhmmer -o /dev/null -A /tmp/tmp48u78l3n/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 GARS_insertion.fas /data1/AF2_Database/mgnify/mgy_clusters.fa"
I0827 14:28:35.103244 140066848117952 utils.py:36] Started Jackhmmer (mgy_clusters.fa) query
I0827 14:36:06.724052 140066848117952 utils.py:40] Finished Jackhmmer (mgy_clusters.fa) query in 451.621 seconds
I0827 14:36:06.727063 140066848117952 pipeline.py:153] query mgnify
I0827 14:36:06.727788 140066848117952 hhblits.py:128] Launching subprocess "hhblits -i GARS_insertion.fas -cpu 4 -oa3m /tmp/tmptyl5pi44/output.a3m -o /dev/null -n 3 -e 0.001 -maxseq 1000000 -realign_max 100000 -maxfilt 100000 -min_prefilter_hits 1000 -d /data1/AF2_Database/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt -d /data1/AF2_Database/uniclust30/uniclust30_2018_08/uniclust30_2018_08"
I0827 14:36:06.858779 140066848117952 utils.py:36] Started HHblits query
I0827 14:38:43.267431 140066848117952 utils.py:40] Finished HHblits query in 156.408 seconds
E0827 14:38:43.267613 140066848117952 hhblits.py:138] HHblits failed. HHblits stderr begin:
E0827 14:38:43.267664 140066848117952 hhblits.py:141] - 14:36:31.573 INFO: Searching 65983866 column state sequences.
E0827 14:38:43.267698 140066848117952 hhblits.py:141] - 14:36:32.498 INFO: Searching 15161831 column state sequences.
E0827 14:38:43.267728 140066848117952 hhblits.py:141] - 14:36:32.569 INFO: GARS_insertion.fas is in A2M, A3M or FASTA format
E0827 14:38:43.267756 140066848117952 hhblits.py:141] - 14:36:32.569 INFO: Iteration 1
E0827 14:38:43.267784 140066848117952 hhblits.py:141] - 14:36:32.607 INFO: Prefiltering database
E0827 14:38:43.267811 140066848117952 hhblits.py:141] - 14:37:27.399 INFO: HMMs passed 1st prefilter (gapless profile-profile alignment) : 735240
E0827 14:38:43.267838 140066848117952 hhblits.py:141] - 14:38:40.391 INFO: HMMs passed 1st prefilter (gapless profile-profile alignment) : 198439
E0827 14:38:43.267866 140066848117952 hhblits.py:141] - 14:38:41.184 INFO: HMMs passed 2nd prefilter (gapped profile-profile alignment) : 2000
E0827 14:38:43.267893 140066848117952 hhblits.py:141] - 14:38:41.184 INFO: HMMs passed 2nd prefilter and not found in previous iterations : 2000
E0827 14:38:43.267920 140066848117952 hhblits.py:141] - 14:38:41.184 INFO: Scoring 2000 HMMs using HMM-HMM Viterbi alignment
E0827 14:38:43.267946 140066848117952 hhblits.py:141] - 14:38:41.286 INFO: Alternative alignment: 0
E0827 14:38:43.267974 140066848117952 hhblits.py:142] HHblits stderr end
Traceback (most recent call last):
File "../run_alphafold.py", line 338, in <module>
app.run(main)
File "/home/linlab/.local/lib/python3.8/site-packages/absl/app.py", line 312, in run
_run_main(main, args)
File "/home/linlab/.local/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main
sys.exit(main(argv))
File "../run_alphafold.py", line 310, in main
predict_structure(
File "../run_alphafold.py", line 170, in predict_structure
feature_dict = data_pipeline.process(
File "/home/linlab/Applications/alphafold/alphafold/data/pipeline.py", line 154, in process
hhblits_bfd_uniclust_result = self.hhblits_bfd_uniclust_runner.query(
File "/home/linlab/Applications/alphafold/alphafold/data/tools/hhblits.py", line 143, in query
raise RuntimeError('HHblits failed\nstdout:\n%s\n\nstderr:\n%s\n' % (
RuntimeError: HHblits failed
stdout:
stderr:
14:36:31.573 INFO: Searching 65983866 column state sequences.
14:36:32.498 INFO: Searching 15161831 column state sequences.
14:36:32.569 INFO: GARS_insertion.fas is in A2M, A3M or FASTA format
14:36:32.569 INFO: Iteration 1
14:36:32.607 INFO: Prefiltering database
14:37:27.399 INFO: HMMs passed 1st prefilter (gapless profile-profile alignment) : 735240
14:38:40.391 INFO: HMMs passed 1st prefilter (gapless profile-profile alignment) : 198439
14:38:41.184 INFO: HMMs passed 2nd prefilter (gapped profile-profile alignment) : 2000
14:38:41.184 INFO: HMMs passed 2nd prefilter and not found in previous iterations : 2000
14:38:41.184 INFO: Scoring 2000 HMMs using HMM-HMM Viterbi alignment
14:38:41.286 INFO: Alternative alignment: 0
Hi! Thanks for your work. I have successfully run AlphaFold without docker following your tutorial, but something seems wrong with the output PDB file:
The B-factor field, which stores the pLDDT confidence measure, is always 0 in my tests based on the example query.fasta.
I saw the same result in your dummy_test/query/ranked_0.pdb.
I then tried the AlphaFold Colab notebook demo for validation, and its result seems fine.
So is there any way to fix this?
Thanks a lot.
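In the standard PDB format the B-factor occupies columns 61-66 of each ATOM record, which is where AlphaFold writes the per-residue pLDDT. A stdlib-only sketch to read those values back and confirm whether they are really all zero (read_plddt is a hypothetical helper; the ATOM line below is a fabricated example):

```python
# Extract the B-factor (pLDDT) column from ATOM/HETATM records.
def read_plddt(pdb_lines):
    values = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            # B-factor: columns 61-66 (0-based slice 60:66).
            values.append(float(line[60:66]))
    return values

example = ["ATOM      1  N   MET A   1      11.104   6.134  -6.504  1.00 92.50           N"]
print(read_plddt(example))  # -> [92.5]
```

Running this over ranked_0.pdb shows immediately whether every residue's pLDDT is truly 0 or just displayed that way by a viewer.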
Hi,
I am trying alphafold_non_docker on a small-ish GPU (2 GB) with a small test protein (the same as in ColabFold). I am getting this indecipherable error; hopefully someone can shed light on what's happening:
I1116 11:22:05.455623 139818632742720 model.py:131] Running predict with shape(feat) = {'aatype': (4, 59), 'residue_index': (4, 59), 'seq_length': (4,), 'template_aatype': (4, 4, 59), 'template_all_atom_masks': (4, 4, 59, 37), 'template_all_atom_positions': (4, 4, 59, 37, 3), 'template_sum_probs': (4, 4, 1), 'is_distillation': (4,), 'seq_mask': (4, 59), 'msa_mask': (4, 508, 59), 'msa_row_mask': (4
, 508), 'random_crop_to_size_seed': (4, 2), 'template_mask': (4, 4), 'template_pseudo_beta': (4, 4, 59, 3), 'template_pseudo_beta_mask': (4, 4, 59), 'atom14_atom_exists': (4, 59, 14), 'residx_atom14_to_atom37': (4, 59, 14), 'residx_atom37_to_atom14': (4, 59, 37), 'atom37_atom_exists': (4, 59, 37), 'extra_msa': (4, 5120, 59), 'extra_msa_mask': (4, 5120, 59), 'extra_msa_row_mask': (4, 5120), 'bert_m
ask': (4, 508, 59), 'true_msa': (4, 508, 59), 'extra_has_deletion': (4, 5120, 59), 'extra_deletion_value': (4, 5120, 59), 'msa_feat': (4, 508, 59, 49), 'target_feat': (4, 59, 22)}
Traceback (most recent call last):
File "/home/user/alphafold/run_alphafold.py", line 310, in <module>
app.run(main)
File "/home/user/miniconda3/envs/alphafold/lib/python3.8/site-packages/absl/app.py", line 312, in run
_run_main(main, args)
File "/home/user/miniconda3/envs/alphafold/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main
sys.exit(main(argv))
File "/home/user/alphafold/run_alphafold.py", line 284, in main
predict_structure(
File "/home/user/alphafold/run_alphafold.py", line 149, in predict_structure
prediction_result = model_runner.predict(processed_feature_dict)
File "/home/user/alphafold/alphafold/model/model.py", line 133, in predict
result = self.apply(self.params, jax.random.PRNGKey(0), feat)
File "/home/user/miniconda3/envs/alphafold/lib/python3.8/site-packages/haiku/_src/transform.py", line 125, in apply_fn
out, state = f.apply(params, {}, *args, **kwargs)
File "/home/user/miniconda3/envs/alphafold/lib/python3.8/site-packages/haiku/_src/transform.py", line 313, in apply_fn
out = f(*args, **kwargs)
File "/home/user/alphafold/alphafold/model/model.py", line 59, in _forward_fn
return model(
File "/home/user/miniconda3/envs/alphafold/lib/python3.8/site-packages/haiku/_src/module.py", line 428, in wrapped
out = f(*args, **kwargs)
File "/home/user/miniconda3/envs/alphafold/lib/python3.8/site-packages/haiku/_src/module.py", line 279, in run_interceptors
return bound_method(*args, **kwargs)
File "/home/user/alphafold/alphafold/model/modules.py", line 376, in __call__
_, prev = hk.while_loop(
File "/home/user/miniconda3/envs/alphafold/lib/python3.8/site-packages/haiku/_src/stateful.py", line 610, in while_loop
val, state = jax.lax.while_loop(pure_cond_fun, pure_body_fun, init_val)
File "/home/user/miniconda3/envs/alphafold/lib/python3.8/site-packages/haiku/_src/stateful.py", line 605, in pure_body_fun
val = body_fun(val)
File "/home/user/alphafold/alphafold/model/modules.py", line 369, in <lambda>
get_prev(do_call(x[1], recycle_idx=x[0],
File "/home/user/alphafold/alphafold/model/modules.py", line 337, in do_call
return impl(
File "/home/user/miniconda3/envs/alphafold/lib/python3.8/site-packages/haiku/_src/module.py", line 428, in wrapped
out = f(*args, **kwargs)
File "/home/user/miniconda3/envs/alphafold/lib/python3.8/site-packages/haiku/_src/module.py", line 279, in run_interceptors
return bound_method(*args, **kwargs)
File "/home/user/alphafold/alphafold/model/modules.py", line 161, in __call__
representations = evoformer_module(batch0, is_training)
File "/home/user/miniconda3/envs/alphafold/lib/python3.8/site-packages/haiku/_src/module.py", line 428, in wrapped
out = f(*args, **kwargs)
File "/home/user/miniconda3/envs/alphafold/lib/python3.8/site-packages/haiku/_src/module.py", line 279, in run_interceptors
return bound_method(*args, **kwargs)
File "/home/user/alphafold/alphafold/model/modules.py", line 1764, in __call__
template_pair_representation = TemplateEmbedding(c.template, gc)(
File "/home/user/miniconda3/envs/alphafold/lib/python3.8/site-packages/haiku/_src/module.py", line 428, in wrapped
out = f(*args, **kwargs)
File "/home/user/miniconda3/envs/alphafold/lib/python3.8/site-packages/haiku/_src/module.py", line 279, in run_interceptors
return bound_method(*args, **kwargs)
File "/home/user/alphafold/alphafold/model/modules.py", line 2059, in __call__
template_pair_representation = mapping.sharded_map(map_fn, in_axes=0)(
File "/home/user/alphafold/alphafold/model/mapping.py", line 182, in mapped_fn
outputs, _ = hk.scan(scan_iteration, outputs, slice_starts)
File "/home/user/miniconda3/envs/alphafold/lib/python3.8/site-packages/haiku/_src/stateful.py", line 504, in scan
(carry, state), ys = jax.lax.scan(
File "/home/user/miniconda3/envs/alphafold/lib/python3.8/site-packages/haiku/_src/stateful.py", line 487, in stateful_fun
carry, out = f(carry, x)
File "/home/user/alphafold/alphafold/model/mapping.py", line 171, in scan_iteration
new_outputs = compute_shard(outputs, i, shard_size)
File "/home/user/alphafold/alphafold/model/mapping.py", line 165, in compute_shard
slice_out = apply_fun_to_slice(slice_start, slice_size)
File "/home/user/alphafold/alphafold/model/mapping.py", line 138, in apply_fun_to_slice
return fun(*input_slice)
File "/home/user/miniconda3/envs/alphafold/lib/python3.8/site-packages/haiku/_src/stateful.py", line 567, in mapped_fun
out, state = mapped_pure_fun(args, state)
File "/home/user/miniconda3/envs/alphafold/lib/python3.8/site-packages/haiku/_src/stateful.py", line 558, in pure_fun
out = fun(*args)
File "/home/user/alphafold/alphafold/model/modules.py", line 2057, in map_fn
return template_embedder(query_embedding, batch, mask_2d, is_training)
File "/home/user/miniconda3/envs/alphafold/lib/python3.8/site-packages/haiku/_src/module.py", line 428, in wrapped
out = f(*args, **kwargs)
File "/home/user/miniconda3/envs/alphafold/lib/python3.8/site-packages/haiku/_src/module.py", line 279, in run_interceptors
return bound_method(*args, **kwargs)
File "/home/user/alphafold/alphafold/model/modules.py", line 1963, in __call__
quaternion=quat_affine.rot_to_quat(rot, unstack_inputs=True),
File "/home/user/alphafold/alphafold/model/quat_affine.py", line 113, in rot_to_quat
_, qs = jnp.linalg.eigh(k)
File "/home/user/miniconda3/envs/alphafold/lib/python3.8/site-packages/jax/_src/numpy/linalg.py", line 313, in eigh
v, w = lax_linalg.eigh(a, lower=lower, symmetrize_input=symmetrize_input)
jax._src.source_info_util.JaxStackTraceBeforeTransformation: RuntimeError: cuSolver internal error
The preceding stack trace is the source of the JAX operation that, once transformed by JAX, triggered the following exception.
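Assuming the cuSolver failure comes from exhausting the 2 GB of GPU memory: as far as I know, AlphaFold's Docker entrypoint sets TF_FORCE_UNIFIED_MEMORY=1 and XLA_PYTHON_CLIENT_MEM_FRACTION=4.0 so JAX can spill to host RAM, and setting the same variables before jax is imported (or exporting them in the shell before run_alphafold.sh) may help on such a small card. A sketch:

```python
# Set XLA/TF memory variables before any jax import, mirroring what the
# official AlphaFold Docker runner does (values are its defaults).
import os

os.environ["TF_FORCE_UNIFIED_MEMORY"] = "1"
os.environ["XLA_PYTHON_CLIENT_MEM_FRACTION"] = "4.0"
# Alternatively, disable preallocation entirely:
# os.environ["XLA_PYTHON_CLIENT_PREALLOCATE"] = "false"

print(os.environ["XLA_PYTHON_CLIENT_MEM_FRACTION"])
```

These only raise the ceiling; a 2 GB card may still be too small for some targets.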
Hello, it runs with no problem, but I have a question: how can I run it with all 5 models?
For the new AlphaFold version, the parameters must be downloaded from:
https://storage.googleapis.com/alphafold/alphafold_params_2022-03-02.tar
It should be corrected in:
https://github.com/kalininalab/alphafold_non_docker/blob/main/download_db.sh
Thanks!
Dear author:
I followed the README and ran the following command (a CPU version):
$ conda activate alphafold
(alphafold) [ryao@cdragon267 ryao]$ cd alphafold
(alphafold) [ryao@cdragon267 alphafold]$ bash run_alphafold.sh -d ./alphafold_data -o ./dummy_test/ -m model_1 -f ./alphafold_non_docker/example/query.fasta -t 2020-05-14 -g False
/risapps/rhel7/python/3.7.3/envs/alphafold/lib/python3.8/site-packages/absl/flags/_validators.py:203: UserWarning: Flag --preset has a non-None default value; therefore, mark_flag_as_required will pass even if flag is not specified in the command line!
warnings.warn(
I0810 15:31:03.155832 46912496434880 templates.py:836] Using precomputed obsolete pdbs ./alphafold_data/pdb_mmcif/obsolete.dat.
I0810 15:31:03.363498 46912496434880 tpu_client.py:54] Starting the local TPU driver.
I0810 15:31:03.373189 46912496434880 xla_bridge.py:231] Unable to initialize backend 'tpu_driver': Not found: Unable to find driver in registry given worker: local://
2021-08-10 15:31:03.374934: W external/org_tensorflow/tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /cm/local/apps/gcc/7.2.0/lib:/cm/local/apps/gcc/7.2.0/lib64:/rissched/lsf/10.1/linux3.10-glibc2.17-x86_64/lib
2021-08-10 15:31:03.374958: W external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOWN ERROR (303)
I0810 15:31:03.375049 46912496434880 xla_bridge.py:231] Unable to initialize backend 'gpu': Failed precondition: No visible GPU devices.
I0810 15:31:03.375171 46912496434880 xla_bridge.py:231] Unable to initialize backend 'tpu': Invalid argument: TpuPlatform is not available.
W0810 15:31:03.375225 46912496434880 xla_bridge.py:234] No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
I0810 15:31:03.970467 46912496434880 run_alphafold.py:259] Have 1 models: ['model_1']
I0810 15:31:03.970602 46912496434880 run_alphafold.py:272] Using random seed 2888980253009115914 for the data pipeline
I0810 15:31:03.976739 46912496434880 jackhmmer.py:130] Launching subprocess "/risapps/rhel7/python/3.7.3/envs/alphafold/bin/jackhmmer -o /dev/null -A /tmp/tmpg1fput7i/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 ./alphafold_non_docker/example/query.fasta ./alphafold_data/uniref90/uniref90.fasta"
I0810 15:31:03.989789 46912496434880 utils.py:36] Started Jackhmmer (uniref90.fasta) query
I0810 15:38:11.871857 46912496434880 utils.py:40] Finished Jackhmmer (uniref90.fasta) query in 427.882 seconds
I0810 15:38:11.872416 46912496434880 jackhmmer.py:130] Launching subprocess "/risapps/rhel7/python/3.7.3/envs/alphafold/bin/jackhmmer -o /dev/null -A /tmp/tmpslj920ny/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 ./alphafold_non_docker/example/query.fasta ./alphafold_data/mgnify/mgy_clusters.fa"
I0810 15:38:11.894569 46912496434880 utils.py:36] Started Jackhmmer (mgy_clusters.fa) query
I0810 15:47:25.491852 46912496434880 utils.py:40] Finished Jackhmmer (mgy_clusters.fa) query in 553.597 seconds
I0810 15:47:25.492514 46912496434880 hhsearch.py:76] Launching subprocess "/risapps/rhel7/python/3.7.3/envs/alphafold/bin/hhsearch -i /tmp/tmplmbbdtny/query.a3m -o /tmp/tmplmbbdtny/output.hhr -maxseq 1000000 -d ./alphafold_data/pdb70/pdb70"
I0810 15:47:25.510776 46912496434880 utils.py:36] Started HHsearch query
I0810 15:48:42.909016 46912496434880 utils.py:40] Finished HHsearch query in 77.398 seconds
I0810 15:48:42.939602 46912496434880 hhblits.py:128] Launching subprocess "/risapps/rhel7/python/3.7.3/envs/alphafold/bin/hhblits -i ./alphafold_non_docker/example/query.fasta -cpu 4 -oa3m /tmp/tmp5sk1ch3o/output.a3m -o /dev/null -n 3 -e 0.001 -maxseq 1000000 -realign_max 100000 -maxfilt 100000 -min_prefilter_hits 1000 -d ./alphafold_data/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt -d ./alphafold_data/uniclust30/uniclust30_2018_08/uniclust30_2018_08"
I0810 15:48:42.958906 46912496434880 utils.py:36] Started HHblits query
(alphafold) [ryao@cdragon267 alphafold]$ Traceback (most recent call last):
File "/rsrch3/home/itops/ryao/alphafold/run_alphafold.py", line 302, in
app.run(main)
File "/risapps/rhel7/python/3.7.3/envs/alphafold/lib/python3.8/site-packages/absl/app.py", line 312, in run
_run_main(main, args)
File "/risapps/rhel7/python/3.7.3/envs/alphafold/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main
sys.exit(main(argv))
File "/rsrch3/home/itops/ryao/alphafold/run_alphafold.py", line 276, in main
predict_structure(
File "/rsrch3/home/itops/ryao/alphafold/run_alphafold.py", line 126, in predict_structure
feature_dict = data_pipeline.process(
File "/rsrch3/home/itops/ryao/alphafold/alphafold/data/pipeline.py", line 173, in process
hhblits_bfd_uniclust_result = self.hhblits_bfd_uniclust_runner.query(
File "/rsrch3/home/itops/ryao/alphafold/alphafold/data/tools/hhblits.py", line 133, in query
stdout, stderr = process.communicate()
File "/risapps/rhel7/python/3.7.3/envs/alphafold/lib/python3.8/subprocess.py", line 1024, in communicate
stdout, stderr = self._communicate(input, endtime, timeout)
File "/risapps/rhel7/python/3.7.3/envs/alphafold/lib/python3.8/subprocess.py", line 1866, in _communicate
ready = selector.select(timeout)
File "/risapps/rhel7/python/3.7.3/envs/alphafold/lib/python3.8/selectors.py", line 415, in select
fd_event_list = self._selector.poll(timeout)
KeyboardInterrupt
It exited. I ran this command in an HPC environment on a compute node. Could you suggest a possible cause for this?
Thanks!
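Not a definitive answer, but one plausible cause for the trace above: a KeyboardInterrupt inside the HHblits subprocess wait often means the job was killed externally (walltime or out-of-memory on the compute node). A quick Linux-only resource sanity check before launching:

```shell
# Report total memory on the node; the full-BFD HHblits search is
# memory-hungry (the exact requirement depends on the databases used).
mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
echo "MemTotal: $((mem_kb / 1024 / 1024)) GiB"
```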
Is any Jax version compatible with CUDA 10.2?
Dear author,
I have been getting an error in HHblits and I'm wondering if you might understand what is wrong. I tried to run the script with use_gpu=False (although I couldn't work out how this setting is passed on in the shell script).
Here is the log.
I0816 11:44:49.701374 47500092145344 hhblits.py:128] Launching subprocess "/exports/cmvm/eddie/eb/groups/EEID_Mareks_IBV/members/roslin_bioinformatics/2021-07-23-_9707_EEID_AlphaFold_setup/
conda/alphafold/bin/hhblits -i /exports/cmvm/eddie/eb/groups/EEID_Mareks_IBV/members/roslin_bioinformatics/2021-07-23-_9707_EEID_AlphaFold_setup/data/example_PB1F2/AFH41240.1.fasta -cpu 16
-oa3m /tmp/tmps6dyg_bz/output.a3m -o /dev/null -n 3 -e 0.001 -maxseq 1000000 -realign_max 100000 -maxfilt 100000 -min_prefilter_hits 1000 -d /exports/cmvm/eddie/eb/groups/EEID_Mareks_IBV/me
mbers/roslin_bioinformatics/2021-07-23-_9707_EEID_AlphaFold_setup/databases/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt -d /exports/cmvm/eddie/eb/groups/EEID_Mareks_IBV/mem
bers/roslin_bioinformatics/2021-07-23-_9707_EEID_AlphaFold_setup/databases/uniclust30/uniclust30_2018_08/uniclust30_2018_08"
I0816 11:44:49.762680 47500092145344 utils.py:36] Started HHblits query
I0816 20:54:02.312248 47500092145344 utils.py:40] Finished HHblits query in 32952.549 seconds
E0816 20:54:02.322368 47500092145344 hhblits.py:138] HHblits failed. HHblits stderr begin:
E0816 20:54:02.322453 47500092145344 hhblits.py:141] - 11:45:38.035 INFO: Searching 65983866 column state sequences.
E0816 20:54:02.322493 47500092145344 hhblits.py:141] - 11:45:38.954 INFO: Searching 15161831 column state sequences.
E0816 20:54:02.322529 47500092145344 hhblits.py:141] - 11:45:39.035 INFO: /exports/cmvm/eddie/eb/groups/EEID_Mareks_IBV/members/roslin_bioinformatics/2021-07-23-_9707_EEID_AlphaFold_setup/d
ata/example_PB1F2/AFH41240.1.fasta is in A2M, A3M or FASTA format
E0816 20:54:02.322567 47500092145344 hhblits.py:141] - 11:45:39.041 INFO: Iteration 1
E0816 20:54:02.322600 47500092145344 hhblits.py:141] - 11:45:39.072 INFO: Prefiltering database
E0816 20:54:02.322632 47500092145344 hhblits.py:141] - 19:15:38.345 INFO: HMMs passed 1st prefilter (gapless profile-profile alignment) : 378332
E0816 20:54:02.322664 47500092145344 hhblits.py:141] - 20:54:00.555 INFO: HMMs passed 1st prefilter (gapless profile-profile alignment) : 141755
E0816 20:54:02.322696 47500092145344 hhblits.py:141] - 20:54:00.751 INFO: HMMs passed 2nd prefilter (gapped profile-profile alignment) : 2000
E0816 20:54:02.322729 47500092145344 hhblits.py:141] - 20:54:00.751 INFO: HMMs passed 2nd prefilter and not found in previous iterations : 2000
E0816 20:54:02.322766 47500092145344 hhblits.py:141] - 20:54:00.751 INFO: Scoring 2000 HMMs using HMM-HMM Viterbi alignment
E0816 20:54:02.322798 47500092145344 hhblits.py:141] - 20:54:01.122 INFO: Alternative alignment: 0
E0816 20:54:02.322830 47500092145344 hhblits.py:142] HHblits stderr end
Traceback (most recent call last):
File "/exports/cmvm/eddie/eb/groups/EEID_Mareks_IBV/members/roslin_bioinformatics/2021-07-23-_9707_EEID_AlphaFold_setup/alphafold/run_alphafold.py", line 302, in <module>
app.run(main)
File "/exports/cmvm/eddie/eb/groups/EEID_Mareks_IBV/members/roslin_bioinformatics/2021-07-23-_9707_EEID_AlphaFold_setup/conda/alphafold/lib/python3.8/site-packages/absl/app.py", line 312, in run
_run_main(main, args)
File "/exports/cmvm/eddie/eb/groups/EEID_Mareks_IBV/members/roslin_bioinformatics/2021-07-23-_9707_EEID_AlphaFold_setup/conda/alphafold/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main
sys.exit(main(argv))
File "/exports/cmvm/eddie/eb/groups/EEID_Mareks_IBV/members/roslin_bioinformatics/2021-07-23-_9707_EEID_AlphaFold_setup/alphafold/run_alphafold.py", line 276, in main
predict_structure(
File "/exports/cmvm/eddie/eb/groups/EEID_Mareks_IBV/members/roslin_bioinformatics/2021-07-23-_9707_EEID_AlphaFold_setup/alphafold/run_alphafold.py", line 126, in predict_structure
feature_dict = data_pipeline.process(
File "/exports/cmvm/eddie/eb/groups/EEID_Mareks_IBV/members/roslin_bioinformatics/2021-07-23-_9707_EEID_AlphaFold_setup/alphafold/alphafold/data/pipeline.py", line 178, in process
hhblits_bfd_uniclust_result = self.hhblits_bfd_uniclust_runner.query(
File "/exports/cmvm/eddie/eb/groups/EEID_Mareks_IBV/members/roslin_bioinformatics/2021-07-23-_9707_EEID_AlphaFold_setup/alphafold/alphafold/data/tools/hhblits.py", line 143, in query
raise RuntimeError('HHblits failed\nstdout:\n%s\n\nstderr:\n%s\n' % (
RuntimeError: HHblits failed
stdout:
stderr:
- 11:45:38.035 INFO: Searching 65983866 column state sequences.
- 11:45:38.954 INFO: Searching 15161831 column state sequences.
- 11:45:39.035 INFO: /exports/cmvm/eddie/eb/groups/EEID_Mareks_IBV/members/roslin_bioinformatics/2021-07-23-_9707_EEID_AlphaFold_setup/data/example_PB1F2/AFH41240.1.fasta is in A2M, A3M or FASTA format
- 11:45:39.041 INFO: Iteration 1
- 11:45:39.072 INFO: Prefiltering database
- 19:15:38.345 INFO: HMMs passed 1st prefilter (gapless profile-profile alignment) : 378332
- 20:54:00.555 INFO: HMMs passed 1st prefilter (gapless profile-profile alignment) : 141755
- 20:54:00.751 INFO: HMMs passed 2nd prefilter (gapped profile-profile alignment) : 2000
- 20:54:00.751 INFO: HMMs passed 2nd prefilter and not found in previous iterations : 2000
- 20:54:00.751 INFO: Scoring 2000 HMMs using HMM-HMM Viterbi alignment
- 20:54:01.122 INFO: Alternative alignment: 0
Thank you for your help and for making your non-docker alphafold solution!
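One hedged suggestion for the failure above: HHblits dying after the Viterbi step with no explicit error sometimes traces back to a truncated or corrupted database download, so verifying file integrity is a cheap first step. A checksum-verification sketch (expected_md5 here is a toy value, the md5 of the string "test"; substitute the published checksum for your download):

```shell
# Toy demonstration of the verification pattern; expected_md5 is hypothetical.
expected_md5="098f6bcd4621d373cade4e832627b4f6"
actual_md5=$(printf 'test' | md5sum | awk '{print $1}')
if [ "$actual_md5" = "$expected_md5" ]; then
  echo "checksum OK"
else
  echo "checksum MISMATCH - re-download the file"
fi
```

Checking free disk space under /tmp and the database directory is a similarly cheap second step, since HHblits writes intermediate files there.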
Hi, it seems the index file for the CUDA jaxlib wheels has changed to https://storage.googleapis.com/jax-releases/jax_cuda_releases.html according to google/jax#11087 (comment).
Since a non-docker setup doesn't have Docker's cgroup limiting capabilities, it would be nice to have finer control over the relaxation step performed with OpenMM.
The default AF script uses OpenMM with the CPU platform, which by default consumes all CPUs. The behavior can be tuned with the OPENMM_CPU_THREADS
environment variable, which can be set up similarly to the GPU selection process.
For our local version I added the following lines:
c)
openmm_threads=$OPTARG
;;
<...>
if [[ "$openmm_threads" ]] ; then
export OPENMM_CPU_THREADS=${openmm_threads}
fi
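The guard above can be exercised on its own; a self-contained sketch (the -c option letter is the poster's local choice, not part of the upstream script):

```shell
# Mirror the snippet's logic: only export when a thread count was supplied.
openmm_threads=8          # would come from getopts: -c 8
if [ -n "$openmm_threads" ]; then
  export OPENMM_CPU_THREADS=${openmm_threads}
fi
echo "OPENMM_CPU_THREADS=${OPENMM_CPU_THREADS:-unset}"
```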
Hi all,
I am facing the below error while running alphafold.
File "/mnt/Alphafold/No_doc/alphafold/run_alphafold.py", line 427, in
app.run(main)
File "/home/skhatri/anaconda3/envs/alphafold/lib/python3.8/site-packages/absl/app.py", line 312, in run
_run_main(main, args)
File "/home/skhatri/anaconda3/envs/alphafold/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main
sys.exit(main(argv))
File "/mnt/Alphafold/No_doc/alphafold/run_alphafold.py", line 403, in main
predict_structure(
File "/mnt/Alphafold/No_doc/alphafold/run_alphafold.py", line 166, in predict_structure
feature_dict = data_pipeline.process(
File "/mnt/Alphafold/No_doc/alphafold/alphafold/data/pipeline_multimer.py", line 266, in process
chain_features = self._process_single_chain(
File "/mnt/Alphafold/No_doc/alphafold/alphafold/data/pipeline_multimer.py", line 212, in _process_single_chain
chain_features = self._monomer_data_pipeline.process(
File "/mnt/Alphafold/No_doc/alphafold/alphafold/data/pipeline.py", line 170, in process
msa_for_templates = parsers.deduplicate_stockholm_msa(
File "/mnt/Alphafold/No_doc/alphafold/alphafold/data/parsers.py", line 350, in deduplicate_stockholm_msa
query_align = next(iter(sequence_dict.values()))
StopIteration
I don't understand where the issue is, as it runs fine with other sequences in AlphaFold-Multimer. I use the default input method for multimer.
Please help!
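One hedged debugging angle for the trace above: StopIteration in deduplicate_stockholm_msa fires when the Stockholm MSA comes back empty for a chain, which can happen when a chain's sequence contains characters the search tools reject. A quick input check (the residue alphabet below is the standard 20 plus X; adjust if you intentionally use other codes):

```shell
# Flag any character outside the standard amino-acid alphabet in a FASTA.
fasta="query.fasta"
printf '>chainA\nMKTAYIAKQR\n' > "$fasta"   # toy input for demonstration
bad=$(grep -v '^>' "$fasta" | tr -d '\n' | tr -d 'ACDEFGHIKLMNPQRSTVWYX')
if [ -n "$bad" ]; then
  echo "non-standard characters: $bad"
else
  echo "sequence looks clean"
fi
rm -f "$fasta"
```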
I'm wondering why you use pip instead of conda/mamba for the following install:
pip install absl-py==0.13.0 biopython==1.79 chex==0.0.7 dm-haiku==0.0.4 dm-tree==0.1.6 immutabledict==2.0.0 jax==0.2.14 ml-collections==0.1.0 numpy==1.19.5 scipy==1.7.0 tensorflow==2.5.0 pandas==1.3.4 tensorflow-cpu==2.5.0
Have you tried: https://conda-forge.org/blog/posts/2021-11-03-tensorflow-gpu/
Hello, I ran the monomer prediction without any problems, but the multimer prediction failed with an error, even though I have checked that my directory structure is consistent with the official one. However, I got an error when downloading the uniprot file with download_all_data.sh, so I later downloaded uniprot and pdb_seqres separately. Could that be the reason?
The following is the error content.
(alphafold) bash run_alphafold.sh -d /home/fsd/afdata/ -o /home/fsd/afoutput/ -f /h
I1023 16:33:13.358484 47290920877760 templates.py:857] Using precomputed obsolete pd
I1023 16:33:14.708593 47290920877760 tpu_client.py:54] Starting the local TPU driver
I1023 16:33:14.709099 47290920877760 xla_bridge.py:212] Unable to initialize backend
I1023 16:33:15.603161 47290920877760 xla_bridge.py:212] Unable to initialize backend
I1023 16:33:23.785693 47290920877760 run_alphafold.py:376] Have 25 models: ['model_1pred_3', 'model_1_multimer_v2_pred_4', 'model_2_multimer_v2_pred_0', 'model_2_multim, 'model_3_multimer_v2_pred_0', 'model_3_multimer_v2_pred_1', 'model_3_multimer_v2_pl_4_multimer_v2_pred_1', 'model_4_multimer_v2_pred_2', 'model_4_multimer_v2_pred_3',timer_v2_pred_2', 'model_5_multimer_v2_pred_3', 'model_5_multimer_v2_pred_4']
I1023 16:33:23.785909 47290920877760 run_alphafold.py:393] Using random seed 2717477
I1023 16:33:23.786309 47290920877760 run_alphafold.py:161] Predicting multimer
I1023 16:33:23.959514 47290920877760 pipeline_multimer.py:210] Running monomer pipel
I1023 16:33:23.960077 47290920877760 jackhmmer.py:133] Launching subprocess "/home/f05 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 /tmp/tmpmu_z1046.fasta
I1023 16:33:23.991682 47290920877760 utils.py:36] Started Jackhmmer (uniref90.fasta)
I1023 16:39:41.744481 47290920877760 utils.py:40] Finished Jackhmmer (uniref90.fasta
I1023 16:39:42.184468 47290920877760 jackhmmer.py:133] Launching subprocess "/home/f05 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 /tmp/tmpmu_z1046.fasta
I1023 16:39:42.207394 47290920877760 utils.py:36] Started Jackhmmer (mgy_clusters_20
I1023 16:46:42.247396 47290920877760 utils.py:40] Finished Jackhmmer (mgy_clusters_2
I1023 16:46:43.379640 47290920877760 hmmbuild.py:121] Launching subprocess ['/home/fpjatb3j5u/query.msa']
I1023 16:46:43.421558 47290920877760 utils.py:36] Started hmmbuild query
I1023 16:46:44.075075 47290920877760 hmmbuild.py:128] hmmbuild stdout:
# hmmbuild :: profile HMM construction from multiple sequence alignments
# HMMER 3.3.2 (Nov 2020); http://hmmer.org/
# Copyright (C) 2020 Howard Hughes Medical Institute.
# Freely distributed under the BSD open source license.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# input alignment file: /tmp/tmpjatb3j5u/query.msa
# output HMM file: /tmp/tmpjatb3j5u/output.hmm
# input alignment is asserted as: protein
# model architecture construction: hand-specified by RF annotation
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# idx name nseq alen mlen eff_nseq re/pos description
#---- -------------------- ----- ----- ----- -------- ------ -----------
1 query 9218 799 191 7.65 0.590
# CPU time: 0.58u 0.07s 00:00:00.64 Elapsed: 00:00:00.64
stderr:
I1023 16:46:44.075536 47290920877760 utils.py:40] Finished hmmbuild query in 0.654 s
I1023 16:46:44.081163 47290920877760 hmmsearch.py:103] Launching sub-process ['/home-F3', '0.1', '--incE', '100', '-E', '100', '--domE', '100', '--incdomE', '100', '-A'.txt']
I1023 16:46:44.144675 47290920877760 utils.py:36] Started hmmsearch (pdb_seqres.txt)
I1023 16:46:49.798766 47290920877760 utils.py:40] Finished hmmsearch (pdb_seqres.txt
Traceback (most recent call last):
File "/home/fsd/alphafold-2.2.0/run_alphafold.py", line 422, in <module>
app.run(main)
File "/home/fsd/anaconda3/envs/alphafold/lib/python3.8/site-packages/absl/app.py",
_run_main(main, args)
File "/home/fsd/anaconda3/envs/alphafold/lib/python3.8/site-packages/absl/app.py",
sys.exit(main(argv))
File "/home/fsd/alphafold-2.2.0/run_alphafold.py", line 398, in main
predict_structure(
File "/home/fsd/alphafold-2.2.0/run_alphafold.py", line 172, in predict_structure
feature_dict = data_pipeline.process(
File "/home/fsd/alphafold-2.2.0/alphafold/data/pipeline_multimer.py", line 264, in
chain_features = self._process_single_chain(
File "/home/fsd/alphafold-2.2.0/alphafold/data/pipeline_multimer.py", line 212, in
chain_features = self._monomer_data_pipeline.process(
File "/home/fsd/alphafold-2.2.0/alphafold/data/pipeline.py", line 185, in process
pdb_templates_result = self.template_searcher.query(msa_for_templates)
File "/home/fsd/alphafold-2.2.0/alphafold/data/tools/hmmsearch.py", line 79, in qu
return self.query_with_hmm(hmm)
File "/home/fsd/alphafold-2.2.0/alphafold/data/tools/hmmsearch.py", line 112, in q
raise RuntimeError(
RuntimeError: hmmsearch failed:
stdout:
# hmmsearch :: search profile(s) against a sequence database
# HMMER 3.3.2 (Nov 2020); http://hmmer.org/
# Copyright (C) 2020 Howard Hughes Medical Institute.
# Freely distributed under the BSD open source license.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query HMM file: /tmp/tmp2_sps4u7/query.hmm
# target sequence database: /home/fsd/afdata//pdb_seqres/pdb_seqres.txt
# MSA of all hits saved to file: /tmp/tmp2_sps4u7/output.sto
# show alignments in output: no
# sequence reporting threshold: E-value <= 100
# domain reporting threshold: E-value <= 100
# sequence inclusion threshold: E-value <= 100
# domain inclusion threshold: E-value <= 100
# MSV filter P threshold: <= 0.1
# Vit filter P threshold: <= 0.1
# Fwd filter P threshold: <= 0.1
# number of worker threads: 8
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Query: query [M=191]
stderr:
Parse failed (sequence file /home/fsd/afdata//pdb_seqres/pdb_seqres.txt):
Line 1364572: illegal character 0
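For the "illegal character 0" parse failure above: a NUL byte at that line strongly suggests the pdb_seqres.txt download was corrupted. Re-running the pdb_seqres download step is the safest fix; as a stopgap, the NUL bytes can be stripped, sketched here on a toy file:

```shell
# Create a toy file containing a NUL byte, then strip NULs the way one
# might clean a corrupted pdb_seqres.txt (re-downloading is preferable,
# since a corrupted file may have other damage too).
printf 'ABC\000DEF\n' > corrupted.txt
tr -d '\000' < corrupted.txt > cleaned.txt
wc -c < corrupted.txt   # 8 bytes (includes the NUL)
wc -c < cleaned.txt     # 7 bytes (NUL removed)
rm -f corrupted.txt cleaned.txt
```

For the real file the equivalent would be `tr -d '\000' < pdb_seqres.txt > pdb_seqres.clean.txt`.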
Facing an issue while running AlphaFold v2.2 with jax==0.2.14, jaxlib==0.3.10 and dm-haiku==0.0.4:
Traceback (most recent call last):
File "/home/datafiles/alphafold_data/alphafold/run_alphafold.py", line 33, in
from alphafold.model import data
File "/home/datafiles/alphafold_data/alphafold/alphafold/model/data.py", line 21, in
import haiku as hk
File "/home/conda1/anaconda3/envs/alphafold/lib/python3.8/site-packages/haiku/init.py", line 17, in
from haiku import data_structures
File "/home/conda1/anaconda3/envs/alphafold/lib/python3.8/site-packages/haiku/data_structures.py", line 17, in
from haiku._src.data_structures import to_immutable_dict
File "/home/conda1/anaconda3/envs/alphafold/lib/python3.8/site-packages/haiku/_src/data_structures.py", line 30, in
from haiku._src import utils
File "/home/conda1/anaconda3/envs/alphafold/lib/python3.8/site-packages/haiku/_src/utils.py", line 24, in
import jax
File "/home/conda1/anaconda3/envs/alphafold/lib/python3.8/site-packages/jax/init.py", line 108, in
from .experimental.maps import soft_pmap
File "/home/conda1/anaconda3/envs/alphafold/lib/python3.8/site-packages/jax/experimental/maps.py", line 25, in
from .. import numpy as jnp
File "/home/conda1/anaconda3/envs/alphafold/lib/python3.8/site-packages/jax/numpy/init.py", line 16, in
from . import fft
File "/home/conda1/anaconda3/envs/alphafold/lib/python3.8/site-packages/jax/numpy/fft.py", line 17, in
from jax._src.numpy.fft import (
File "/home/conda1/anaconda3/envs/alphafold/lib/python3.8/site-packages/jax/_src/numpy/fft.py", line 19, in
from jax import lax
File "/home/conda1/anaconda3/envs/alphafold/lib/python3.8/site-packages/jax/lax/init.py", line 330, in
from jax._src.lax.fft import (
File "/home/conda1/anaconda3/envs/alphafold/lib/python3.8/site-packages/jax/_src/lax/fft.py", line 144, in
xla.backend_specific_translations['cpu'][fft_p] = pocketfft.pocketfft
AttributeError: module 'jaxlib.pocketfft' has no attribute 'pocketfft'
Please help
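The AttributeError above looks like a jax/jaxlib version mismatch: jax 0.2.14 expects the old jaxlib pocketfft interface, which newer jaxlib releases no longer expose. A hedged dependency pin, treated as a config fragment (exact wheel availability depends on your CUDA version, so verify against the jax-releases index):

```shell
# Option 1: pin jaxlib back to a release contemporary with jax 0.2.14
pip install jax==0.2.14 "jaxlib==0.1.69+cuda111" -f https://storage.googleapis.com/jax-releases/jax_releases.html
# Option 2: move jax forward to match jaxlib 0.3.10 (this may require a
# newer AlphaFold requirements set)
# pip install jax==0.3.10
```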
Hi,
We are running the bash run_alphafold.sh script in our non-docker installation via this repo (thanks for the efforts, if you leave us a name, address and t-shirt size, we will send you a t-shirt!).
Question: in the older version of the bash script, I was able to specify "model_1" for AF to run only 1 of the 5 models. Can I do the same in v2.2.0? And what is model_4 called now? Thanks in advance.
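A hedged answer sketch: in the v2.x scripts, -m no longer selects individual models but a model preset, and all five models of that preset run, so there is no standalone "model_4" to name (this is based on the v2.x run_alphafold interface; confirm against your script's usage text):

```shell
# Validate a preset name before calling the script; the preset list
# reflects the v2.x AlphaFold flags.
preset="monomer"
case " monomer monomer_casp14 monomer_ptm multimer " in
  *" $preset "*) echo "preset OK: $preset" ;;
  *)             echo "unknown preset: $preset" ;;
esac
# bash run_alphafold.sh -d "$DATA_DIR" -o "$OUT_DIR" -f query.fasta -t 2021-11-01 -m "$preset"
```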
Hi guys,
This is not per se an issue, but I've just used your framework to run non-docker AlphaFold2 with the "full_dbs" preset, for which I expected the per-model .pkl output files to contain the PAE matrix and predicted TM-score, which I did not get. My understanding was that running the "full_dbs" preset runs the pTM network instead of the casp14 one?!
Can anyone check this or confirm how to run the pTM version of AlphaFold2 with your main python script, please?!
Thanks,
David
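A hedged pointer for the question above: the database preset ("full_dbs") only controls the genetic search, not the network; PAE and pTM come from the pTM model variants. With the older interface that would mean naming the _ptm models explicitly (an assumption based on the AF 2.0-era model names model_1_ptm through model_5_ptm):

```shell
# Build the pTM model list; the naming follows the AF 2.0-era parameters.
ptm_models=$(printf 'model_%d_ptm,' 1 2 3 4 5)
ptm_models=${ptm_models%,}
echo "$ptm_models"
# bash run_alphafold.sh -d "$DATA_DIR" -o "$OUT_DIR" -f query.fasta \
#   -t 2020-05-14 -m "$ptm_models"
```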
Everything was done following the guideline except the jax installation, because it would throw the error ValueError: jaxlib is version 0.1.69, but this version of jax requires version 0.1.74.
We therefore used pip3 install --upgrade jax jaxlib>=0.1.69+cuda111 -f https://storage.googleapis.com/jax-releases/jax_releases.html
to update it.
Then python run_alphafold_test.py
ran with no problem; however, bash run_alphafold.sh -d ../database-dir/ -o ../work-dir/ -f ../work-dir/T1050.fasta -t 2020-05-14
threw the error shown below:
$ bash run_alphafold.sh -d ../database-dir/ -o ../work-dir/ -f ../work-dir/T1050.fasta -t 2020-05-14
2022-01-24 11:28:29.659453: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
I0124 11:28:31.304372 140384586483520 templates.py:857] Using precomputed obsolete pdbs ../database-dir//pdb_mmcif/obsolete.dat.
I0124 11:28:31.476207 140384586483520 xla_bridge.py:244] Unable to initialize backend 'tpu_driver': NOT_FOUND: Unable to find driver in registry given worker:
I0124 11:28:31.641233 140384586483520 xla_bridge.py:244] Unable to initialize backend 'tpu': INVALID_ARGUMENT: TpuPlatform is not available.
I0124 11:28:37.127628 140384586483520 run_alphafold.py:384] Have 5 models: ['model_1', 'model_2', 'model_3', 'model_4', 'model_5']
I0124 11:28:37.127787 140384586483520 run_alphafold.py:397] Using random seed 324245886155445948 for the data pipeline
I0124 11:28:37.127987 140384586483520 run_alphafold.py:150] Predicting T1050
I0124 11:28:37.128321 140384586483520 jackhmmer.py:130] Launching subprocess "/home/aaron/bin/miniconda3/envs/alphafold_conda/bin/jackhmmer -o /dev/null -A /tmp/tmpomb2yx3m/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 ../work-dir/T1050.fasta ../database-dir//uniref90/uniref90.fasta"
I0124 11:28:37.304794 140384586483520 utils.py:36] Started Jackhmmer (uniref90.fasta) query
I0124 11:33:27.010351 140384586483520 utils.py:40] Finished Jackhmmer (uniref90.fasta) query in 289.705 seconds
I0124 11:33:35.188940 140384586483520 jackhmmer.py:130] Launching subprocess "/home/aaron/bin/miniconda3/envs/alphafold_conda/bin/jackhmmer -o /dev/null -A /tmp/tmp_s62488h/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 ../work-dir/T1050.fasta ../database-dir//mgnify/mgy_clusters_2018_12.fa"
I0124 11:33:35.408562 140384586483520 utils.py:36] Started Jackhmmer (mgy_clusters_2018_12.fa) query
I0124 11:38:51.483255 140384586483520 utils.py:40] Finished Jackhmmer (mgy_clusters_2018_12.fa) query in 316.074 seconds
I0124 11:39:20.663236 140384586483520 hhsearch.py:85] Launching subprocess "/home/aaron/bin/miniconda3/envs/alphafold_conda/bin/hhsearch -i /tmp/tmp_t0wr8md/query.a3m -o /tmp/tmp_t0wr8md/output.hhr -maxseq 1000000 -d ../database-dir//pdb70/pdb70"
I0124 11:39:20.892679 140384586483520 utils.py:36] Started HHsearch query
I0124 11:40:41.805876 140384586483520 utils.py:40] Finished HHsearch query in 80.913 seconds
I0124 11:42:19.997202 140384586483520 hhblits.py:128] Launching subprocess "/home/aaron/bin/miniconda3/envs/alphafold_conda/bin/hhblits -i ../work-dir/T1050.fasta -cpu 4 -oa3m /tmp/tmpdlpm19oi/output.a3m -o /dev/null -n 3 -e 0.001 -maxseq 1000000 -realign_max 100000 -maxfilt 100000 -min_prefilter_hits 1000 -d ../database-dir//bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt -d ../database-dir//uniclust30/uniclust30_2018_08/uniclust30_2018_08"
I0124 11:42:20.263012 140384586483520 utils.py:36] Started HHblits query
I0124 12:37:39.983841 140384586483520 utils.py:40] Finished HHblits query in 3319.720 seconds
I0124 12:37:40.498109 140384586483520 templates.py:878] Searching for template for: MASQSYLFKHLEVSDGLSNNSVNTIYKDRDGFMWFGTTTGLNRYDGYTFKIYQHAENEPGSLPDNYITDIVEMPDGRFWINTARGYVLFDKERDYFITDVTGFMKNLESWGVPEQVFVDREGNTWLSVAGEGCYRYKEGGKRLFFSYTEHSLPEYGVTQMAECSDGILLIYNTGLLVCLDRATLAIKWQSDEIKKYIPGGKTIELSLFVDRDNCIWAYSLMGIWAYDCGTKSWRTDLTGIWSSRPDVIIHAVAQDIEGRIWVGKDYDGIDVLEKETGKVTSLVAHDDNGRSLPHNTIYDLYADRDGVMWVGTYKKGVSYYSESIFKFNMYEWGDITCIEQADEDRLWLGTNDHGILLWNRSTGKAEPFWRDAEGQLPNPVVSMLKSKDGKLWVGTFNGGLYCMNGSQVRSYKEGTGNALASNNVWALVEDDKGRIWIASLGGGLQCLEPLSGTFETYTSNNSALLENNVTSLCWVDDNTLFFGTASQGVGTMDMRTREIKKIQGQSDSMKLSNDAVNHVYKDSRGLVWIATREGLNVYDTRRHMFLDLFPVVEAKGNFIAAITEDQERNMWVSTSRKVIRVTVASDGKGSYLFDSRAYNSEDGLQNCDFNQRSIKTLHNGIIAIGGLYGVNIFAPDHIRYNKMLPNVMFTGLSLFDEAVKVGQSYGGRVLIEKELNDVENVEFDYKQNIFSVSFASDNYNLPEKTQYMYKLEGFNNDWLTLPVGVHNVTFTNLAPGKYVLRVKAINSDGYVGIKEATLGIVVNPPFKLAAALQHHHHHH
I0124 12:37:42.310647 140384586483520 templates.py:267] Found an exact template match 4a2m_B.
I0124 12:37:44.813723 140384586483520 templates.py:267] Found an exact template match 4a2l_F.
I0124 12:37:46.607977 140384586483520 templates.py:267] Found an exact template match 3v9f_B.
I0124 12:37:47.402505 140384586483520 templates.py:267] Found an exact template match 3va6_A.
I0124 12:37:48.522100 140384586483520 templates.py:267] Found an exact template match 3ott_B.
I0124 12:37:48.918042 140384586483520 templates.py:267] Found an exact template match 5m11_A.
I0124 12:37:48.945314 140384586483520 templates.py:267] Found an exact template match 4a2m_B.
I0124 12:37:48.974633 140384586483520 templates.py:267] Found an exact template match 4a2l_F.
I0124 12:37:49.003993 140384586483520 templates.py:267] Found an exact template match 4a2m_B.
I0124 12:37:49.033177 140384586483520 templates.py:267] Found an exact template match 4a2l_F.
I0124 12:37:49.062488 140384586483520 templates.py:267] Found an exact template match 5m11_A.
I0124 12:37:49.089267 140384586483520 templates.py:267] Found an exact template match 3v9f_B.
I0124 12:37:49.118885 140384586483520 templates.py:267] Found an exact template match 3ott_B.
I0124 12:37:49.148563 140384586483520 templates.py:267] Found an exact template match 3va6_A.
I0124 12:37:49.178556 140384586483520 templates.py:267] Found an exact template match 3ott_B.
I0124 12:37:49.207692 140384586483520 templates.py:267] Found an exact template match 3va6_A.
I0124 12:37:49.237412 140384586483520 templates.py:267] Found an exact template match 5m11_A.
I0124 12:37:49.264744 140384586483520 templates.py:267] Found an exact template match 4a2m_B.
I0124 12:37:49.293939 140384586483520 templates.py:267] Found an exact template match 4a2l_F.
I0124 12:37:49.322692 140384586483520 templates.py:267] Found an exact template match 3v9f_B.
I0124 12:37:51.470793 140384586483520 pipeline.py:221] Uniref90 MSA size: 10000 sequences.
I0124 12:37:51.470931 140384586483520 pipeline.py:222] BFD MSA size: 4966 sequences.
I0124 12:37:51.470967 140384586483520 pipeline.py:223] MGnify MSA size: 501 sequences.
I0124 12:37:51.471006 140384586483520 pipeline.py:224] Final (deduplicated) MSA size: 15406 sequences.
I0124 12:37:51.471178 140384586483520 pipeline.py:226] Total number of templates (NB: this can include bad templates and is later filtered to top 4): 20.
I0124 12:37:52.176468 140384586483520 run_alphafold.py:185] Running model model_1 on T1050
2022-01-24 12:37:54.502811: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2022-01-24 12:37:54.504019: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:02:00.0 name: RTX A6000 computeCapability: 8.6
coreClock: 1.8GHz coreCount: 84 deviceMemorySize: 47.54GiB deviceMemoryBandwidth: 715.34GiB/s
2022-01-24 12:37:54.504059: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2022-01-24 12:37:54.505758: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2022-01-24 12:37:54.505845: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
2022-01-24 12:37:54.505870: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
2022-01-24 12:37:54.506045: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
2022-01-24 12:37:54.507774: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusolver.so.11
2022-01-24 12:37:54.508212: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.11
2022-01-24 12:37:54.508344: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2022-01-24 12:37:54.510542: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2022-01-24 12:37:54.559168: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-01-24 12:37:54.563017: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:02:00.0 name: RTX A6000 computeCapability: 8.6
coreClock: 1.8GHz coreCount: 84 deviceMemorySize: 47.54GiB deviceMemoryBandwidth: 715.34GiB/s
2022-01-24 12:37:54.565146: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2022-01-24 12:37:54.565185: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2022-01-24 12:37:54.626365: E tensorflow/core/common_runtime/session.cc:91] Failed to create session: Internal: CUDA runtime implicit initialization on GPU:0 failed. Status: unrecognized error code
2022-01-24 12:37:54.626381: E tensorflow/c/c_api.cc:2193] Internal: CUDA runtime implicit initialization on GPU:0 failed. Status: unrecognized error code
Traceback (most recent call last):
File "/mnt/disk4T/alphafold-project/alphafold_conda/run_alphafold.py", line 427, in <module>
app.run(main)
File "/home/aaron/bin/miniconda3/envs/alphafold_conda/lib/python3.8/site-packages/absl/app.py", line 312, in run
_run_main(main, args)
File "/home/aaron/bin/miniconda3/envs/alphafold_conda/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main
sys.exit(main(argv))
File "/mnt/disk4T/alphafold-project/alphafold_conda/run_alphafold.py", line 403, in main
predict_structure(
File "/mnt/disk4T/alphafold-project/alphafold_conda/run_alphafold.py", line 188, in predict_structure
processed_feature_dict = model_runner.process_features(
File "/mnt/disk4T/alphafold-project/alphafold_conda/alphafold/model/model.py", line 131, in process_features
return features.np_example_to_features(
File "/mnt/disk4T/alphafold-project/alphafold_conda/alphafold/model/features.py", line 101, in np_example_to_features
with tf.Session(graph=tf_graph) as sess:
File "/home/aaron/bin/miniconda3/envs/alphafold_conda/lib/python3.8/site-packages/tensorflow/python/client/session.py", line 1596, in __init__
super(Session, self).__init__(target, graph, config=config)
File "/home/aaron/bin/miniconda3/envs/alphafold_conda/lib/python3.8/site-packages/tensorflow/python/client/session.py", line 711, in __init__
self._session = tf_session.TF_NewSessionRef(self._graph._c_graph, opts)
tensorflow.python.framework.errors_impl.InternalError: CUDA runtime implicit initialization on GPU:0 failed. Status: unrecognized error code
The code runs on Ubuntu 20.04, and nvidia-smi reports:
Mon Jan 24 15:10:12 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.106.00 Driver Version: 460.106.00 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 RTX A6000 On | 00000000:02:00.0 On | Off |
| 30% 30C P8 23W / 300W | 206MiB / 48676MiB | 1% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1778 G /usr/lib/xorg/Xorg 147MiB |
| 0 N/A N/A 13768 G /usr/bin/gnome-shell 32MiB |
| 0 N/A N/A 969147 G ...nlogin/bin/sunloginclient 6MiB |
| 0 N/A N/A 1048268 G ...AAAAAAAAA= --shared-files 17MiB |
+-----------------------------------------------------------------------------+
This could be unrelated to this repo and instead be just some sort of drivers issue, but I'll post the error just in case someone can help.
We've installed this repo in an Ubuntu 21.04 Laptop with Thunderbolt and an eGPU with 2 Nvidia Quadro P1000 cards.
We kick off two parallel jobs, one on node 0 and another one on node 1, and they mostly go well, but after a few minutes/hours, sometimes one of the jobs gets stuck with the error below:
Any ideas welcome, thanks!
I0930 07:04:52.753646 140368369108800 model.py:131] Running predict with shape(feat) = {'aatype': (4, 245), 'residue_index': (4, 245), 'seq_length': (4,), 'template_aatype': (4, 4, 245), 'template_all_atom_masks': (4, 4, 245, 37), 'template_all_atom_positions': (4, 4, 245, 37, 3), 'template_sum_probs': (4, 4, 1), 'is_distillation': (4,), 'seq_mask': (4, 245), 'msa_mask': (4, 508, 245), 'msa_row_mask': (4, 508), 'random_crop_to_size_seed': (4, 2), 'template_mask': (4, 4), 'template_pseudo_beta': (4, 4, 245, 3), 'template_pseudo_beta_mask': (4, 4, 245), 'atom14_atom_exists': (4, 245, 14), 'residx_atom14_to_atom37': (4, 245, 14), 'residx_atom37_to_atom14': (4, 245, 37), 'atom37_atom_exists': (4, 245, 37), 'extra_msa': (4, 5120, 245), 'extra_msa_mask': (4, 5120, 245), 'extra_msa_row_mask': (4, 5120), 'bert_mask': (4, 508, 245), 'true_msa': (4, 508, 245), 'extra_has_deletion': (4, 5120, 245), 'extra_deletion_value': (4, 5120, 245), 'msa_feat': (4, 508, 245, 49), 'target_feat': (4, 245, 22)}
2021-09-30 07:06:51.976947: E external/org_tensorflow/tensorflow/compiler/xla/pjrt/pjrt_stream_executor_client.cc:2040] Execution of replica 0 failed: Internal: CUBLAS_STATUS_EXECUTION_FAILED
Traceback (most recent call last):
File "/home/user/alphafold/run_alphafold.py", line 310, in <module>
app.run(main)
File "/home/user/miniconda3/envs/alphafold/lib/python3.8/site-packages/absl/app.py", line 312, in run
_run_main(main, args)
File "/home/user/miniconda3/envs/alphafold/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main
sys.exit(main(argv))
File "/home/user/alphafold/run_alphafold.py", line 284, in main
predict_structure(
File "/home/user/alphafold/run_alphafold.py", line 149, in predict_structure
prediction_result = model_runner.predict(processed_feature_dict)
File "/home/user/alphafold/alphafold/model/model.py", line 133, in predict
result = self.apply(self.params, jax.random.PRNGKey(0), feat)
File "/home/user/miniconda3/envs/alphafold/lib/python3.8/site-packages/jax/_src/traceback_util.py", line 162, in reraise_with_filtered_traceback
return fun(*args, **kwargs)
File "/home/user/miniconda3/envs/alphafold/lib/python3.8/site-packages/jax/_src/api.py", line 411, in cache_miss
out_flat = xla.xla_call(
File "/home/user/miniconda3/envs/alphafold/lib/python3.8/site-packages/jax/core.py", line 1618, in bind
return call_bind(self, fun, *args, **params)
File "/home/user/miniconda3/envs/alphafold/lib/python3.8/site-packages/jax/core.py", line 1609, in call_bind
outs = primitive.process(top_trace, fun, tracers, params)
File "/home/user/miniconda3/envs/alphafold/lib/python3.8/site-packages/jax/core.py", line 1621, in process
return trace.process_call(self, fun, tracers, params)
File "/home/user/miniconda3/envs/alphafold/lib/python3.8/site-packages/jax/core.py", line 615, in process_call
return primitive.impl(f, *tracers, **params)
File "/home/user/miniconda3/envs/alphafold/lib/python3.8/site-packages/jax/interpreters/xla.py", line 625, in _xla_call_impl
out = compiled_fun(*args)
File "/home/user/miniconda3/envs/alphafold/lib/python3.8/site-packages/jax/interpreters/xla.py", line 960, in _execute_compiled
out_bufs = compiled.execute(input_bufs)
jax._src.traceback_util.UnfilteredStackTrace: RuntimeError: Internal: CUBLAS_STATUS_EXECUTION_FAILED
The stack trace below excludes JAX-internal frames.
The preceding is the original exception that occurred, unmodified.
--------------------
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/user/alphafold/run_alphafold.py", line 310, in <module>
app.run(main)
File "/home/user/miniconda3/envs/alphafold/lib/python3.8/site-packages/absl/app.py", line 312, in run
_run_main(main, args)
File "/home/user/miniconda3/envs/alphafold/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main
sys.exit(main(argv))
File "/home/user/alphafold/run_alphafold.py", line 284, in main
predict_structure(
File "/home/user/alphafold/run_alphafold.py", line 149, in predict_structure
prediction_result = model_runner.predict(processed_feature_dict)
File "/home/user/alphafold/alphafold/model/model.py", line 133, in predict
result = self.apply(self.params, jax.random.PRNGKey(0), feat)
File "/home/user/miniconda3/envs/alphafold/lib/python3.8/site-packages/jax/interpreters/xla.py", line 960, in _execute_compiled
out_bufs = compiled.execute(input_bufs)
RuntimeError: Internal: CUBLAS_STATUS_EXECUTION_FAILED
2021-09-30 07:06:53.025348: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:1039] could not synchronize on CUDA context: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered :: *** Begin stack trace ***
PyDict_SetItem
_PyModule_ClearDict
PyImport_Cleanup
Py_FinalizeEx
Py_RunMain
Py_BytesMain
__libc_start_main
*** End stack trace ***
2021-09-30 07:06:53.025585: F external/org_tensorflow/tensorflow/compiler/xla/service/gpu/gpu_executable.cc:99] Check failed: pair.first->SynchronizeAllActivity()
Fatal Python error: Aborted
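Not a fix for the CUBLAS error itself, but for reference: one way to keep the two parallel jobs from ever touching the same eGPU card is to pin each process with `CUDA_VISIBLE_DEVICES`, so each JAX runtime only ever sees its own device. This is a sketch only; `DATA_DIR`, the FASTA names, the output dirs, and the `-t` date are all placeholders.

```shell
# Sketch: pin each parallel run_alphafold.sh job to its own GPU.
# All paths and file names below are placeholders.
DATA_DIR=/path/to/alphafold_DBs
CUDA_VISIBLE_DEVICES=0 ./run_alphafold.sh -d "$DATA_DIR" -o out0 -f seq0.fasta -t 2021-09-30 &
CUDA_VISIBLE_DEVICES=1 ./run_alphafold.sh -d "$DATA_DIR" -o out1 -f seq1.fasta -t 2021-09-30 &
wait   # block until both background jobs finish
```

With this setup, neither process can allocate on the other's card, which rules out one common cause of intermittent CUDA_ERROR_ILLEGAL_ADDRESS crashes in parallel runs.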
This line:
pip install --upgrade jax==0.2.14 jaxlib==0.1.69+cuda111 -f https://storage.googleapis.com/jax-releases/jax_releases.html
generates an error.
This issue was solved here:
google-deepmind/alphafold#510
with this:
pip install --upgrade jax==0.2.14 jaxlib==0.1.69+cuda111 -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
Shouldn't it be changed?
I've set up alphafold_non_docker and it appears to be running properly, but my tests have taken over 5 days without finishing yet, so I think something's probably going wrong - hopefully someone will have an idea! One is a multimer run of a relatively small antibody variable region (129 H residues, 108 L residues), and the other is a single-chain antigen with ~500 residues. I'm running on nodes with K80 GPUs and 125 GB memory. As an example, I'll attach the output of the antibody run to this post rather than copying it over (since it's quite long).
The corresponding command for the antibody run is: ./run_alphafold.sh -d /dartfs/rc/lab/G/Grigoryanlab/library/AlphaFoldEtc/alphafold_DBs/ -o /dartfs/rc/lab/G/Grigoryanlab/home/coy/Dartmouth_PhD_Repo/antibodyTestMC4/ -f /dartfs/rc/lab/G/Grigoryanlab/library/AlphaFoldEtc/antibodyTestMC.fasta -t 2021-10-04 -m multimer
My best guess as to why it's taking so long is that it's calling model.py more than it's supposed to? As far as I can tell, the README from alphafold indicates there should be 5 models (one from each seed) in the output, but I'm getting output that seems to indicate model.py has been called 23 times already, and there will probably be 25 models total when it finishes, according to the pattern of the models being produced. Here's a ls of my output directory for the antibody test:
features.pkl relaxed_model_5_multimer_v2_pred_4.pdb
msas result_model_1_multimer_v2_pred_0.pkl
ranked_0.pdb result_model_1_multimer_v2_pred_1.pkl
ranked_10.pdb result_model_1_multimer_v2_pred_2.pkl
ranked_11.pdb result_model_1_multimer_v2_pred_3.pkl
ranked_12.pdb result_model_1_multimer_v2_pred_4.pkl
ranked_13.pdb result_model_2_multimer_v2_pred_0.pkl
ranked_14.pdb result_model_2_multimer_v2_pred_1.pkl
ranked_15.pdb result_model_2_multimer_v2_pred_2.pkl
ranked_16.pdb result_model_2_multimer_v2_pred_3.pkl
ranked_17.pdb result_model_2_multimer_v2_pred_4.pkl
ranked_18.pdb result_model_3_multimer_v2_pred_0.pkl
ranked_19.pdb result_model_3_multimer_v2_pred_1.pkl
ranked_1.pdb result_model_3_multimer_v2_pred_2.pkl
ranked_20.pdb result_model_3_multimer_v2_pred_3.pkl
ranked_21.pdb result_model_3_multimer_v2_pred_4.pkl
ranked_22.pdb result_model_4_multimer_v2_pred_0.pkl
ranked_23.pdb result_model_4_multimer_v2_pred_1.pkl
ranked_24.pdb result_model_4_multimer_v2_pred_2.pkl
ranked_2.pdb result_model_4_multimer_v2_pred_3.pkl
ranked_3.pdb result_model_4_multimer_v2_pred_4.pkl
ranked_4.pdb result_model_5_multimer_v2_pred_0.pkl
ranked_5.pdb result_model_5_multimer_v2_pred_1.pkl
ranked_6.pdb result_model_5_multimer_v2_pred_2.pkl
ranked_7.pdb result_model_5_multimer_v2_pred_3.pkl
ranked_8.pdb result_model_5_multimer_v2_pred_4.pkl
ranked_9.pdb timings.json
ranking_debug.json unrelaxed_model_1_multimer_v2_pred_0.pdb
relaxed_model_1_multimer_v2_pred_0.pdb unrelaxed_model_1_multimer_v2_pred_1.pdb
relaxed_model_1_multimer_v2_pred_1.pdb unrelaxed_model_1_multimer_v2_pred_2.pdb
relaxed_model_1_multimer_v2_pred_2.pdb unrelaxed_model_1_multimer_v2_pred_3.pdb
relaxed_model_1_multimer_v2_pred_3.pdb unrelaxed_model_1_multimer_v2_pred_4.pdb
relaxed_model_1_multimer_v2_pred_4.pdb unrelaxed_model_2_multimer_v2_pred_0.pdb
relaxed_model_2_multimer_v2_pred_0.pdb unrelaxed_model_2_multimer_v2_pred_1.pdb
relaxed_model_2_multimer_v2_pred_1.pdb unrelaxed_model_2_multimer_v2_pred_2.pdb
relaxed_model_2_multimer_v2_pred_2.pdb unrelaxed_model_2_multimer_v2_pred_3.pdb
relaxed_model_2_multimer_v2_pred_3.pdb unrelaxed_model_2_multimer_v2_pred_4.pdb
relaxed_model_2_multimer_v2_pred_4.pdb unrelaxed_model_3_multimer_v2_pred_0.pdb
relaxed_model_3_multimer_v2_pred_0.pdb unrelaxed_model_3_multimer_v2_pred_1.pdb
relaxed_model_3_multimer_v2_pred_1.pdb unrelaxed_model_3_multimer_v2_pred_2.pdb
relaxed_model_3_multimer_v2_pred_2.pdb unrelaxed_model_3_multimer_v2_pred_3.pdb
relaxed_model_3_multimer_v2_pred_3.pdb unrelaxed_model_3_multimer_v2_pred_4.pdb
relaxed_model_3_multimer_v2_pred_4.pdb unrelaxed_model_4_multimer_v2_pred_0.pdb
relaxed_model_4_multimer_v2_pred_0.pdb unrelaxed_model_4_multimer_v2_pred_1.pdb
relaxed_model_4_multimer_v2_pred_1.pdb unrelaxed_model_4_multimer_v2_pred_2.pdb
relaxed_model_4_multimer_v2_pred_2.pdb unrelaxed_model_4_multimer_v2_pred_3.pdb
relaxed_model_4_multimer_v2_pred_3.pdb unrelaxed_model_4_multimer_v2_pred_4.pdb
relaxed_model_4_multimer_v2_pred_4.pdb unrelaxed_model_5_multimer_v2_pred_0.pdb
relaxed_model_5_multimer_v2_pred_0.pdb unrelaxed_model_5_multimer_v2_pred_1.pdb
relaxed_model_5_multimer_v2_pred_1.pdb unrelaxed_model_5_multimer_v2_pred_2.pdb
relaxed_model_5_multimer_v2_pred_2.pdb unrelaxed_model_5_multimer_v2_pred_3.pdb
relaxed_model_5_multimer_v2_pred_3.pdb unrelaxed_model_5_multimer_v2_pred_4.pdb
As you can see, the pattern for the output files is "relaxed_model_{1-5}_multimer_v2_pred_{0-4}.pdb". I'm not sure what the two sets of numbers indicate; I'd assume one of them distinguishes models that come from the same starting seed, but I'm not sure what the other set indicates, or which of the two positions marks a shared seed. Apologies if this is documented somewhere and I've missed it! Thanks so much for any help on how to make this run faster, and on whether the output is correct.
EDIT: I've since tried running the antibody test on CPUs only (no GPUs, i.e. with the -e false -g false flags appended to the aforementioned command) and it takes ~16 hours. The GPU test recently finished and took 6 days in total! I was able to request 200 GB of memory for the CPU test but only 125 GB for the GPU test, which might indicate memory is the limiting factor? I also updated the ls of the output dir above to show the final output files.
EDIT2: Large complexes (~2000 aa) take a very long time on CPU - about a month - and even longer on GPU. With the same amount of memory, the GPU runs take longer than the CPU runs.
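One knob worth trying for large inputs: the official AlphaFold Docker launcher exports the following JAX/TensorFlow variables so the GPU can spill into host RAM instead of failing or falling back. Exporting them before calling run_alphafold.sh is a hedged suggestion for the non-docker setup, not something this repo documents; only the variable names themselves come from the upstream Docker run.

```shell
# Unified-memory settings used by the official AlphaFold Docker launcher;
# whether they help a given non-docker setup is untested (an assumption).
export TF_FORCE_UNIFIED_MEMORY=1          # let TF/XLA use CUDA unified memory
export XLA_PYTHON_CLIENT_MEM_FRACTION=4.0 # allow JAX to address up to 4x GPU memory
```

With 16 GB K80s and large complexes, running out of device memory is a plausible reason the GPU runs end up slower than CPU runs, so this is the first thing to rule out.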
At the moment, the export of CUDA_VISIBLE_DEVICES=0 is hardcoded in the script,
alphafold_non_docker/run_alphafold.sh
Line 97 in 46ee72c
which is quite misleading in conjunction with the
alphafold_non_docker/run_alphafold.sh
Line 18 in 46ee72c
Moreover, the NVIDIA_VISIBLE_DEVICES environment variable applies only to containers and is not needed in the non-docker setup.
alphafold_non_docker/run_alphafold.sh
Line 99 in 46ee72c
This part can be modified like this to work as initially intended.
# Export ENVIRONMENT variables (change me if required)
if [[ "$use_gpu" == true ]] ; then
    export CUDA_VISIBLE_DEVICES=0
    if [[ "$gpu_devices" ]] ; then
        export CUDA_VISIBLE_DEVICES=$gpu_devices
    fi
fi
I hope this will be helpful!
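For what it's worth, the same logic can be collapsed into one export using a default expansion; behaviour should be identical (a sketch against the same variable names the script already uses):

```shell
# Equivalent sketch: use $gpu_devices when set, otherwise fall back to GPU 0.
if [[ "$use_gpu" == true ]] ; then
    export CUDA_VISIBLE_DEVICES="${gpu_devices:-0}"
fi
```

Either form ensures CUDA_VISIBLE_DEVICES is only touched when -g true is in effect.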
I installed alphafold_non_docker step by step, but I got the following error:
(alphafold) [root@ecs alphafold-2.2.0]# bash run_alphafold.sh -d ./alphafold_data -o ./dummy_test/ -f ./example/query.fasta -t 2020-05-14 -g False
E0704 11:23:00.766234 139973713471296 hhsearch.py:56] Could not find HHsearch database ./alphafold_data//pdb70/pdb70
Traceback (most recent call last):
File "/root/alphafold-2.2.0/run_alphafold.py", line 422, in <module>
app.run(main)
File "/root/miniconda3/envs/alphafold/lib/python3.8/site-packages/absl/app.py", line 312, in run
_run_main(main, args)
File "/root/miniconda3/envs/alphafold/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main
sys.exit(main(argv))
File "/root/alphafold-2.2.0/run_alphafold.py", line 327, in main
template_searcher = hhsearch.HHSearch(
File "/root/alphafold-2.2.0/alphafold/data/tools/hhsearch.py", line 57, in __init__
raise ValueError(f'Could not find HHsearch database {database_path}')
ValueError: Could not find HHsearch database ./alphafold_data/pdb70/pdb70
How can I resolve this problem? I sincerely need your help. Thank you!
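This error usually means the pdb70 download is missing or incomplete: the -d argument passed to hhsearch is a file-name prefix, not a file, so ./alphafold_data/pdb70/pdb70 must expand to files such as pdb70_a3m.ffdata and pdb70_hhm.ffindex. A small diagnostic sketch (the helper name is made up):

```shell
# check_db: report whether any files with the given hh-suite prefix exist.
check_db() {
    ls "$1"_* >/dev/null 2>&1 && echo "ok: $1" || echo "missing: $1"
}
check_db ./alphafold_data/pdb70/pdb70   # prints "missing: ..." if pdb70 is not downloaded
```

If it prints "missing", re-run the pdb70 part of the database download before retrying run_alphafold.sh.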
It seemed to be running... until it wasn't, although it produced some data in the meantime.
I was running it on an AWS instance with 60 cores, 477 GiB RAM and 8 GPUs.
I have pasted the output logs below.
Any idea what the problem could be?
Thank you
I ran it with this sample FASTA file as query.fasta:
>T1050 A7LXT1, Bacteroides Ovatus, 779 residues|
MASQSYLFKHLEVSDGLSNNSVNTIYKDRDGFMWFGTTTGLNRYDGYTFKIYQHAENEPGSLPDNYITDIVEMPDGRFWINTARGYVLFDKERDYFITDVTGFMKNLESWGVPEQVFVDREGNTWLSVAGEGCYRYKEGGKRLFFSYTEHSLPEYGVTQMAECSDGILLIYNTGLLVCLDRATLAIKWQSDEIKKYIPGGKTIELSLFVDRDNCIWAYSLMGIWAYDCGTKSWRTDLTGIWSSRPDVIIHAVAQDIEGRIWVGKDYDGIDVLEKETGKVTSLVAHDDNGRSLPHNTIYDLYADRDGVMWVGTYKKGVSYYSESIFKFNMYEWGDITCIEQADEDRLWLGTNDHGILLWNRSTGKAEPFWRDAEGQLPNPVVSMLKSKDGKLWVGTFNGGLYCMNGSQVRSYKEGTGNALASNNVWALVEDDKGRIWIASLGGGLQCLEPLSGTFETYTSNNSALLENNVTSLCWVDDNTLFFGTASQGVGTMDMRTREIKKIQGQSDSMKLSNDAVNHVYKDSRGLVWIATREGLNVYDTRRHMFLDLFPVVEAKGNFIAAITEDQERNMWVSTSRKVIRVTVASDGKGSYLFDSRAYNSEDGLQNCDFNQRSIKTLHNGIIAIGGLYGVNIFAPDHIRYNKMLPNVMFTGLSLFDEAVKVGQSYGGRVLIEKELNDVENVEFDYKQNIFSVSFASDNYNLPEKTQYMYKLEGFNNDWLTLPVGVHNVTFTNLAPGKYVLRVKAINSDGYVGIKEATLGIVVNPPFKLAAALQHHHHHH
The data generated:
ubuntu@run-62387ab63902662cbe274d7c-4d7kq:/mnt$ tree -sh /mnt/example/
/mnt/example/
├── [4.0K] dummy_test
│   └── [4.0K] query
│       └── [4.0K] msas
│           ├── [3.4M] mgnify_hits.sto
│           └── [ 72M] uniref90_hits.sto
└── [ 830] query.fasta_
3 directories, 3 files
ubuntu@run-62387ab63902662cbe274d7c-4d7kq:/app/alphafold$ sudo ./run_alphafold.sh -d /domino/datasets/af_download_data/ -o /mnt/example/dummy_test -f /mnt/example/query.fasta -t 2022-03-21
/opt/conda/lib/python3.7/site-packages/absl/flags/_validators.py:206: UserWarning: Flag --use_gpu_relax has a non-None default value; therefore, mark_flag_as_required will pass even if flag is not specified in the command line!
'command line!' % flag_name)
I0321 13:29:02.076551 139820712200000 templates.py:857] Using precomputed obsolete pdbs /domino/datasets/af_download_data//pdb_mmcif/obsolete.dat.
I0321 13:29:03.170220 139820712200000 tpu_client.py:54] Starting the local TPU driver.
I0321 13:29:03.171494 139820712200000 xla_bridge.py:212] Unable to initialize backend 'tpu_driver': Not found: Unable to find driver in registry given worker: local://
I0321 13:29:05.166625 139820712200000 xla_bridge.py:212] Unable to initialize backend 'tpu': Invalid argument: TpuPlatform is not available.
I0321 13:29:21.223274 139820712200000 run_alphafold.py:384] Have 5 models: ['model_1_pred_0', 'model_2_pred_0', 'model_3_pred_0', 'model_4_pred_0', 'model_5_pred_0']
I0321 13:29:21.223868 139820712200000 run_alphafold.py:400] Using random seed 1019557854010524627 for the data pipeline
I0321 13:29:21.224538 139820712200000 run_alphafold.py:168] Predicting query
I0321 13:29:21.225994 139820712200000 jackhmmer.py:133] Launching subprocess "/usr/bin/jackhmmer -o /dev/null -A /tmp/tmpa75vmfip/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 /mnt/example/query.fasta /domino/datasets/af_download_data//uniref90/uniref90.fasta"
I0321 13:29:21.309449 139820712200000 utils.py:36] Started Jackhmmer (uniref90.fasta) query
I0321 13:37:13.182801 139820712200000 utils.py:40] Finished Jackhmmer (uniref90.fasta) query in 471.873 seconds
I0321 13:37:19.727575 139820712200000 jackhmmer.py:133] Launching subprocess "/usr/bin/jackhmmer -o /dev/null -A /tmp/tmpq_3sjpki/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 /mnt/example/query.fasta /domino/datasets/af_download_data//mgnify/mgy_clusters_2018_12.fa"
I0321 13:37:19.829512 139820712200000 utils.py:36] Started Jackhmmer (mgy_clusters_2018_12.fa) query
I0321 13:44:58.966890 139820712200000 utils.py:40] Finished Jackhmmer (mgy_clusters_2018_12.fa) query in 459.137 seconds
I0321 13:45:22.831639 139820712200000 hhsearch.py:85] Launching subprocess "/usr/bin/hhsearch -i /tmp/tmpxw1gqa3o/query.a3m -o /tmp/tmpxw1gqa3o/output.hhr -maxseq 1000000 -d /domino/datasets/af_download_data//pdb70/pdb70"
I0321 13:45:22.918177 139820712200000 utils.py:36] Started HHsearch query
I0321 13:45:23.270786 139820712200000 utils.py:40] Finished HHsearch query in 0.352 seconds
Traceback (most recent call last):
File "/app/alphafold/run_alphafold.py", line 429, in <module>
app.run(main)
File "/opt/conda/lib/python3.7/site-packages/absl/app.py", line 312, in run
_run_main(main, args)
File "/opt/conda/lib/python3.7/site-packages/absl/app.py", line 258, in _run_main
sys.exit(main(argv))
File "/app/alphafold/run_alphafold.py", line 413, in main
random_seed=random_seed)
File "/app/alphafold/run_alphafold.py", line 181, in predict_structure
msa_output_dir=msa_output_dir)
File "/app/alphafold/alphafold/data/pipeline.py", line 188, in process
pdb_templates_result = self.template_searcher.query(uniref90_msa_as_a3m)
File "/app/alphafold/alphafold/data/tools/hhsearch.py", line 96, in query
stdout.decode('utf-8'), stderr[:100_000].decode('utf-8')))
RuntimeError: HHSearch failed:
stdout:
stderr:
I'm consistently running into this issue when trying to run the non-docker AF2 install. I followed the directions and everything installed just fine. I downloaded run_alphafold.sh, and when I tried to run the script I consistently got the following error:
FATAL Flags parsing error: flag --use_gpu_relax=None: Flag --use_gpu_relax must have a value other than None.
Pass --helpshort or --helpfull to see help on flags.
I don't know where this setting is coming from, since it's not part of the script. I'm running Ubuntu 20.04.3 LTS and CUDA 11.5 (V11.5.119), and nvidia-smi returns the following:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 495.29.05 Driver Version: 495.29.05 CUDA Version: 11.5 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA RTX A4000 On | 00000000:19:00.0 Off | Off |
| 41% 34C P8 8W / 140W | 13MiB / 16117MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 NVIDIA RTX A4000 On | 00000000:1A:00.0 Off | Off |
| 41% 36C P8 7W / 140W | 13MiB / 16117MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 2 NVIDIA RTX A4000 On | 00000000:67:00.0 Off | Off |
| 41% 34C P8 6W / 140W | 13MiB / 16117MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 3 NVIDIA RTX A4000 On | 00000000:68:00.0 On | Off |
| 41% 37C P8 9W / 140W | 352MiB / 16116MiB | 1% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
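For what it's worth, --use_gpu_relax was only introduced in AlphaFold 2.1, so this FATAL usually means the run_alphafold.sh in use and the run_alphafold.py it calls come from different versions and the flag is never set. A hedged workaround, assuming an AlphaFold >= 2.1 checkout, is to make sure the flag is passed explicitly; shown here as a standalone call with placeholder arguments rather than the script's exact command line.

```shell
# Sketch only: pass --use_gpu_relax explicitly to run_alphafold.py.
# --fasta_paths/--output_dir values are placeholders.
python run_alphafold.py \
    --fasta_paths=query.fasta \
    --output_dir=./out \
    --use_gpu_relax=true   # use false on CPU-only nodes
```

The cleaner fix is to update run_alphafold.sh to the version matching your AlphaFold checkout so the two always agree on the flag set.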
Hi there, I tried to run multimer mode (monomer mode worked well), but encountered the following errors:
context 0x5618aefca000: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
2022-02-07 22:20:55.956803: E external/org_tensorflow/tensorflow/stream_executor/stream.cc:310] failed to allocate stream during initialization
2022-02-07 22:20:55.956815: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:618] unable to add host callback: CUDA_ERROR_INVALID_HANDLE: invalid resource handle
2022-02-07 22:20:55.956811: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:618] unable to add host callback: CUDA_ERROR_INVALID_HANDLE: invalid resource handle
2022-02-07 22:20:55.956826: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_event.cc:29] Error polling for event status: failed to query event: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
2022-02-07 22:20:55.956834: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:699] could not allocate CUDA stream for context 0x5618aefca000: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
I checked tensorflow, CUDA, etc. and it all seems to have been installed correctly. One thing I did notice is that only GPU 0 was used - even though I specified 0,1,2,3 in the commands, and also all 4 GPUs were visible. Are these issues related, and how do I resolve them?
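On the multi-GPU point: JAX runs a jit-compiled computation on a single device unless the code explicitly shards it, so with CUDA_VISIBLE_DEVICES=0,1,2,3 all four cards are visible but AlphaFold's model still executes on the first one; that part is expected behaviour rather than a bug. To confirm what JAX actually sees (a diagnostic one-liner, assuming the alphafold conda env is active):

```shell
# Lists the devices JAX can see and the backend it defaults to;
# AlphaFold's prediction will run on the first GPU only.
python3 -c "import jax; print(jax.devices()); print(jax.default_backend())"
```

The CUDA_ERROR_ILLEGAL_ADDRESS crash is therefore most likely a separate problem (memory or driver related) rather than a consequence of the unused GPUs.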
Hello author,
In the database path section of run_alphafold.sh, should
pdb70_database_path="$data_dir/pdb70/pdb70"
uniclust30_database_path="$data_dir/uniclust30/uniclust30_2018_08/uniclust30_2018_08"
instead be:
pdb70_database_path="$data_dir/pdb70"
uniclust30_database_path="$data_dir/uniclust30/uniclust30_2018_08"
Thank you!
Rong
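For context on why the name is repeated: hh-suite database paths in AlphaFold are file-name prefixes rather than directories, so the shipped values appear to be the intended ones. A sketch of the layout this implies (file names follow the standard AlphaFold database download; listed for illustration):

```shell
# The prefix "$data_dir/pdb70/pdb70" is expected to match files like
#   $data_dir/pdb70/pdb70_a3m.ffdata
#   $data_dir/pdb70/pdb70_hhm.ffindex
# so the repeated directory/file name in run_alphafold.sh looks intentional:
pdb70_database_path="$data_dir/pdb70/pdb70"
uniclust30_database_path="$data_dir/uniclust30/uniclust30_2018_08/uniclust30_2018_08"
```

Dropping the final component would make hhsearch/hhblits look for files named pdb70_* directly under $data_dir, which do not exist.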
I was able to run AlphaFold on CPU successfully,
but when I try to run the same on GPU I get the error below:
Traceback (most recent call last):
File "/home/ngayatri/alphafold/run_alphafold.py", line 310, in <module>
app.run(main)
File "/home/ngayatri/miniconda3/envs/alphafold/lib/python3.8/site-packages/absl/app.py", line 312, in run
_run_main(main, args)
File "/home/ngayatri/miniconda3/envs/alphafold/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main
sys.exit(main(argv))
File "/home/ngayatri/alphafold/run_alphafold.py", line 284, in main
predict_structure(
File "/home/ngayatri/alphafold/run_alphafold.py", line 149, in predict_structure
prediction_result = model_runner.predict(processed_feature_dict)
File "/home/ngayatri/alphafold/alphafold/model/model.py", line 133, in predict
result = self.apply(self.params, jax.random.PRNGKey(0), feat)
File "/home/ngayatri/miniconda3/envs/alphafold/lib/python3.8/site-packages/jaxlib/cusolver.py", line 281, in syevd
lwork, opaque = cusolver_kernels.build_syevj_descriptor(
RuntimeError: cuSolver internal error
I have checked the CUDA version as well; it is present.
Thank you
I have installed alphafold2 using the non-docker method on HPC, I am running the script using GPU (V100 with 16 GB of memory).
For a sequence of around 200 amino acids, it is taking around 8 hours for structure determination.
In Alphafold2 paper (https://www.nature.com/articles/s41586-021-03819-2.pdf), it is quoted that "Representative timings for the neural network using a single model on V100 GPU are 4.8 min with 256 residues, 9.2 min with 384 residues and 18 h at 2,500 residues".
I have pasted the output logs below.
Any idea why it is running so slowly in my case, although I am using the same GPU?
I0412 09:13:32.372322 140040161777472 tpu_client.py:54] Starting the local TPU driver.
I0412 09:13:32.539564 140040161777472 xla_bridge.py:212] Unable to initialize backend 'tpu_driver': Not found: Unable to find driver in registry given worker: local://
I0412 09:13:32.890057 140040161777472 xla_bridge.py:212] Unable to initialize backend 'tpu': Invalid argument: TpuPlatform is not available.
I0412 09:13:38.549945 140040161777472 run_alphafold.py:376] Have 5 models: ['model_1_pred_0', 'model_2_pred_0', 'model_3_pred_0', 'model_4_pred_0', 'model_5_pred_0']
I0412 09:13:38.550210 140040161777472 run_alphafold.py:393] Using random seed 1060063058774185674 for the data pipeline
I0412 09:13:38.550483 140040161777472 run_alphafold.py:161] Predicting A0A016TJD3
I0412 09:13:38.566384 140040161777472 jackhmmer.py:133] Launching subprocess "/home/laddhadi/.conda/envs/alphafold/bin/jackhmmer -o /dev/null -A /tmp/tmpgcj2ht0b/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 /scratch/laddhadi/fasta_files/A0A016TJD3.txt /scratch/laddhadi/alphafold_data//uniref90/uniref90.fasta"
I0412 09:13:38.641884 140040161777472 utils.py:36] Started Jackhmmer (uniref90.fasta) query
I0412 09:20:02.565351 140040161777472 utils.py:40] Finished Jackhmmer (uniref90.fasta) query in 383.923 seconds
I0412 09:20:02.754225 140040161777472 jackhmmer.py:133] Launching subprocess "/home/laddhadi/.conda/envs/alphafold/bin/jackhmmer -o /dev/null -A /tmp/tmpzagxg5ib/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 /scratch/laddhadi/fasta_files/A0A016TJD3.txt /scratch/laddhadi/alphafold_data//mgnify/mgy_clusters_2018_12.fa"
I0412 09:20:02.801197 140040161777472 utils.py:36] Started Jackhmmer (mgy_clusters_2018_12.fa) query
I0412 09:27:44.650721 140040161777472 utils.py:40] Finished Jackhmmer (mgy_clusters_2018_12.fa) query in 461.849 seconds
I0412 09:27:46.027261 140040161777472 hhsearch.py:85] Launching subprocess "/home/laddhadi/.conda/envs/alphafold/bin/hhsearch -i /tmp/tmp7u0wxikv/query.a3m -o /tmp/tmp7u0wxikv/output.hhr -maxseq 1000000 -d /scratch/laddhadi/alphafold_data//pdb70/pdb70"
I0412 09:27:46.121132 140040161777472 utils.py:36] Started HHsearch query
I0412 09:34:32.186318 140040161777472 utils.py:40] Finished HHsearch query in 406.065 seconds
I0412 09:34:32.942785 140040161777472 hhblits.py:128] Launching subprocess "/home/laddhadi/.conda/envs/alphafold/bin/hhblits -i /scratch/laddhadi/fasta_files/A0A016TJD3.txt -cpu 4 -oa3m /tmp/tmp_07sbvqa/output.a3m -o /dev/null -n 3 -e 0.001 -maxseq 1000000 -realign_max 100000 -maxfilt 100000 -min_prefilter_hits 1000 -d /scratch/laddhadi/alphafold_data//bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt -d /scratch/laddhadi/alphafold_data//uniclust30/uniclust30_2018_08/uniclust30_2018_08"
I0412 09:34:33.030554 140040161777472 utils.py:36] Started HHblits query
I0409 18:56:30.223437 139865793787712 utils.py:40] Finished HHblits query in 16437.296 seconds
I0409 18:56:30.833186 139865793787712 templates.py:878] Searching for template for: MEAGGVADSLLSGACVLFTLGMFSSGLSDLRHMRMTRSVDNVQFLPFLTTDINNLSWLSYGALKGDGTLIIVNSVGAMLQTLYILVYLHYCPRKRGVLLQTAALLGVLLLGFGYFWLLVPDLEARLQWLGLFCSVFTISMYLSPLADLAKVIQTKSAQHFSFSLTIATLLASASWTLYGFRLKDPYITVPNFPGIVTSFIRLWLFWKYSQKPARNSQLLQT
I0409 18:56:30.946721 139865793787712 templates.py:267] Found an exact template match 5xpd_A.
I0409 18:56:31.633340 139865793787712 templates.py:267] Found an exact template match 5ctg_B.
I0409 18:56:31.643278 139865793787712 templates.py:267] Found an exact template match 5ctg_B.
I0409 18:56:31.652832 139865793787712 templates.py:267] Found an exact template match 5xpd_A.
I0409 18:56:31.664968 139865793787712 templates.py:267] Found an exact template match 5ctg_B.
I0409 18:56:31.674515 139865793787712 templates.py:267] Found an exact template match 5xpd_A.
I0409 18:56:32.157502 139865793787712 templates.py:267] Found an exact template match 4rng_D.
I0409 18:56:32.281505 139865793787712 templates.py:267] Found an exact template match 4x5m_B.
I0409 18:56:32.459235 139865793787712 templates.py:267] Found an exact template match 4qnd_A.
I0409 18:56:32.464134 139865793787712 templates.py:267] Found an exact template match 4qnd_A.
I0409 18:56:32.468961 139865793787712 templates.py:267] Found an exact template match 4rng_D.
I0409 18:56:32.473401 139865793787712 templates.py:267] Found an exact template match 4x5m_B.
I0409 18:56:32.609357 139865793787712 templates.py:267] Found an exact template match 5uhq_A.
I0409 18:56:32.688276 139865793787712 templates.py:267] Found an exact template match 4qnc_B.
I0409 18:56:32.692711 139865793787712 templates.py:267] Found an exact template match 5uhq_A.
I0409 18:56:32.697443 139865793787712 templates.py:267] Found an exact template match 4qnc_B.
I0409 18:56:32.701975 139865793787712 templates.py:267] Found an exact template match 5uhq_D.
I0409 18:56:32.706474 139865793787712 templates.py:267] Found an exact template match 5uhq_D.
I0409 18:56:32.710904 139865793787712 templates.py:718] hit 5j4i_B did not pass prefilter: Proportion of residues aligned to query too small. Align ratio: 0.04524886877828054.
I0409 18:56:32.710995 139865793787712 templates.py:912] Skipped invalid hit 5J4I_B Arginine/agmatine antiporter; AdiC, Transporter, Membrane Protein, Transport; 2.207A {Escherichia coli O157:H7}, error: None, warning: None
I0409 18:56:32.711062 139865793787712 templates.py:718] hit 3ob6_B did not pass prefilter: Proportion of residues aligned to query too small. Align ratio: 0.04072398190045249.
I0409 18:56:32.711109 139865793787712 templates.py:912] Skipped invalid hit 3OB6_B AdiC protein; Amino acid antiporter, Arginine, Membrane; HET: ARG; 3.0A {Escherichia coli}, error: None, warning: None
I0409 18:56:33.124143 139865793787712 pipeline.py:234] Uniref90 MSA size: 5963 sequences.
I0409 18:56:33.124355 139865793787712 pipeline.py:235] BFD MSA size: 1450 sequences.
I0409 18:56:33.124414 139865793787712 pipeline.py:236] MGnify MSA size: 135 sequences.
I0409 18:56:33.124469 139865793787712 pipeline.py:237] Final (deduplicated) MSA size: 7472 sequences.
I0409 18:56:33.124686 139865793787712 pipeline.py:239] Total number of templates (NB: this can include bad templates and is later filtered to top 4): 18.
I0409 18:56:33.153060 139865793787712 run_alphafold.py:190] Running model model_1_pred_0 on sweet_metazoan_F7D9S0
2022-04-09 18:56:37.475985: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2022-04-09 18:56:37.493497: W tensorflow/stream_executor/cuda/cuda_driver.cc:326] failed call to cuInit: UNKNOWN ERROR (303)
I0409 18:56:43.118809 139865793787712 model.py:165] Running predict with shape(feat) = {'aatype': (4, 221), 'residue_index': (4, 221), 'seq_length': (4,), 'template_aatype': (4, 4, 221), 'template_all_atom_masks': (4, 4, 221, 37), 'template_all_atom_positions': (4, 4, 221, 37, 3), 'template_sum_probs': (4, 4, 1), 'is_distillation': (4,), 'seq_mask': (4, 221), 'msa_mask': (4, 508, 221), 'msa_row_mask': (4, 508), 'random_crop_to_size_seed': (4, 2), 'template_mask': (4, 4), 'template_pseudo_beta': (4, 4, 221, 3), 'template_pseudo_beta_mask': (4, 4, 221), 'atom14_atom_exists': (4, 221, 14), 'residx_atom14_to_atom37': (4, 221, 14), 'residx_atom37_to_atom14': (4, 221, 37), 'atom37_atom_exists': (4, 221, 37), 'extra_msa': (4, 5120, 221), 'extra_msa_mask': (4, 5120, 221), 'extra_msa_row_mask': (4, 5120), 'bert_mask': (4, 508, 221), 'true_msa': (4, 508, 221), 'extra_has_deletion': (4, 5120, 221), 'extra_deletion_value': (4, 5120, 221), 'msa_feat': (4, 508, 221, 49), 'target_feat': (4, 221, 22)}
2022-04-09 19:00:32.892367: E external/org_tensorflow/tensorflow/compiler/xla/service/slow_operation_alarm.cc:55]
********************************
Very slow compile? If you want to file a bug, run with envvar XLA_FLAGS=--xla_dump_to=/tmp/foo and attach the results.
Compiling module jit_apply_fn.149819
********************************
I0409 19:27:21.522277 139865793787712 model.py:175] Output shape was {'distogram': {'bin_edges': (63,), 'logits': (221, 221, 64)}, 'experimentally_resolved': {'logits': (221, 37)}, 'masked_msa': {'logits': (508, 221, 23)}, 'predicted_lddt': {'logits': (221, 50)}, 'structure_module': {'final_atom_mask': (221, 37), 'final_atom_positions': (221, 37, 3)}, 'plddt': (221,), 'ranking_confidence': ()}
I0409 19:27:21.522848 139865793787712 run_alphafold.py:202] Total JAX model model_1_pred_0 on sweet_metazoan_F7D9S0 predict time (includes compilation time, see --benchmark): 1838.4s
I0409 19:27:26.533854 139865793787712 amber_minimize.py:177] alterations info: {'nonstandard_residues': [], 'removed_heterogens': set(), 'missing_residues': {}, 'missing_heavy_atoms': {}, 'missing_terminals': {<Residue 220 (THR) of chain 0>: ['OXT']}, 'Se_in_MET': [], 'removed_chains': {0: []}}
I0409 19:27:28.535824 139865793787712 amber_minimize.py:407] Minimizing protein, attempt 1 of 100.
I0409 19:27:29.179506 139865793787712 amber_minimize.py:68] Restraining 1742 / 3532 particles.
I0409 19:27:49.272111 139865793787712 amber_minimize.py:177] alterations info: {'nonstandard_residues': [], 'removed_heterogens': set(), 'missing_residues': {}, 'missing_heavy_atoms': {}, 'missing_terminals': {}, 'Se_in_MET': [], 'removed_chains': {0: []}}
I0409 19:27:53.467640 139865793787712 amber_minimize.py:497] Iteration completed: Einit 5198.28 Efinal -4264.63 Time 19.18 s num residue violations 0 num residue exclusions 0
I0409 19:27:55.588373 139865793787712 amber_minimize.py:177] alterations info: {'nonstandard_residues': [], 'removed_heterogens': set(), 'missing_residues': {}, 'missing_heavy_atoms': {}, 'missing_terminals': {<Residue 220 (THR) of chain 0>: ['OXT']}, 'Se_in_MET': [], 'removed_chains': {0: []}}
I0409 19:27:56.617614 139865793787712 run_alphafold.py:190] Running model model_2_pred_0 on sweet_metazoan_F7D9S0
I0409 19:27:59.180104 139865793787712 model.py:165] Running predict with shape(feat) = {'aatype': (4, 221), 'residue_index': (4, 221), 'seq_length': (4,), 'template_aatype': (4, 4, 221), 'template_all_atom_masks': (4, 4, 221, 37), 'template_all_atom_positions': (4, 4, 221, 37, 3), 'template_sum_probs': (4, 4, 1), 'is_distillation': (4,), 'seq_mask': (4, 221), 'msa_mask': (4, 508, 221), 'msa_row_mask': (4, 508), 'random_crop_to_size_seed': (4, 2), 'template_mask': (4, 4), 'template_pseudo_beta': (4, 4, 221, 3), 'template_pseudo_beta_mask': (4, 4, 221), 'atom14_atom_exists': (4, 221, 14), 'residx_atom14_to_atom37': (4, 221, 14), 'residx_atom37_to_atom14': (4, 221, 37), 'atom37_atom_exists': (4, 221, 37), 'extra_msa': (4, 1024, 221), 'extra_msa_mask': (4, 1024, 221), 'extra_msa_row_mask': (4, 1024), 'bert_mask': (4, 508, 221), 'true_msa': (4, 508, 221), 'extra_has_deletion': (4, 1024, 221), 'extra_deletion_value': (4, 1024, 221), 'msa_feat': (4, 508, 221, 49), 'target_feat': (4, 221, 22)}
2022-04-09 19:31:45.701855: E external/org_tensorflow/tensorflow/compiler/xla/service/slow_operation_alarm.cc:55]
********************************
Very slow compile? If you want to file a bug, run with envvar XLA_FLAGS=--xla_dump_to=/tmp/foo and attach the results.
Compiling module jit_apply_fn__1.149819
********************************
I0409 19:54:31.859675 139865793787712 model.py:175] Output shape was {'distogram': {'bin_edges': (63,), 'logits': (221, 221, 64)}, 'experimentally_resolved': {'logits': (221, 37)}, 'masked_msa': {'logits': (508, 221, 23)}, 'predicted_lddt': {'logits': (221, 50)}, 'structure_module': {'final_atom_mask': (221, 37), 'final_atom_positions': (221, 37, 3)}, 'plddt': (221,), 'ranking_confidence': ()}
I0409 19:54:31.860479 139865793787712 run_alphafold.py:202] Total JAX model model_2_pred_0 on sweet_metazoan_F7D9S0 predict time (includes compilation time, see --benchmark): 1592.7s
I0409 19:54:35.127964 139865793787712 amber_minimize.py:177] alterations info: {'nonstandard_residues': [], 'removed_heterogens': set(), 'missing_residues': {}, 'missing_heavy_atoms': {}, 'missing_terminals': {<Residue 220 (THR) of chain 0>: ['OXT']}, 'Se_in_MET': [], 'removed_chains': {0: []}}
I0409 19:54:35.358972 139865793787712 amber_minimize.py:407] Minimizing protein, attempt 1 of 100.
I0409 19:54:35.695668 139865793787712 amber_minimize.py:68] Restraining 1742 / 3532 particles.
I0409 19:54:50.386341 139865793787712 amber_minimize.py:177] alterations info: {'nonstandard_residues': [], 'removed_heterogens': set(), 'missing_residues': {}, 'missing_heavy_atoms': {}, 'missing_terminals': {}, 'Se_in_MET': [], 'removed_chains': {0: []}}
I0409 19:54:50.959881 139865793787712 amber_minimize.py:497] Iteration completed: Einit 5700.67 Efinal -4254.93 Time 13.62 s num residue violations 0 num residue exclusions 0
I0409 19:54:54.265672 139865793787712 amber_minimize.py:177] alterations info: {'nonstandard_residues': [], 'removed_heterogens': set(), 'missing_residues': {}, 'missing_heavy_atoms': {}, 'missing_terminals': {<Residue 220 (THR) of chain 0>: ['OXT']}, 'Se_in_MET': [], 'removed_chains': {0: []}}
I0409 19:55:01.368335 139865793787712 run_alphafold.py:190] Running model model_3_pred_0 on sweet_metazoan_F7D9S0
I0409 19:55:03.042567 139865793787712 model.py:165] Running predict with shape(feat) = {'aatype': (4, 221), 'residue_index': (4, 221), 'seq_length': (4,), 'is_distillation': (4,), 'seq_mask': (4, 221), 'msa_mask': (4, 512, 221), 'msa_row_mask': (4, 512), 'random_crop_to_size_seed': (4, 2), 'atom14_atom_exists': (4, 221, 14), 'residx_atom14_to_atom37': (4, 221, 14), 'residx_atom37_to_atom14': (4, 221, 37), 'atom37_atom_exists': (4, 221, 37), 'extra_msa': (4, 5120, 221), 'extra_msa_mask': (4, 5120, 221), 'extra_msa_row_mask': (4, 5120), 'bert_mask': (4, 512, 221), 'true_msa': (4, 512, 221), 'extra_has_deletion': (4, 5120, 221), 'extra_deletion_value': (4, 5120, 221), 'msa_feat': (4, 512, 221, 49), 'target_feat': (4, 221, 22)}
2022-04-09 19:58:10.149797: E external/org_tensorflow/tensorflow/compiler/xla/service/slow_operation_alarm.cc:55]
********************************
Very slow compile? If you want to file a bug, run with envvar XLA_FLAGS=--xla_dump_to=/tmp/foo and attach the results.
Compiling module jit_apply_fn__2.110442
********************************
I0409 20:22:45.222151 139865793787712 model.py:175] Output shape was {'distogram': {'bin_edges': (63,), 'logits': (221, 221, 64)}, 'experimentally_resolved': {'logits': (221, 37)}, 'masked_msa': {'logits': (512, 221, 23)}, 'predicted_lddt': {'logits': (221, 50)}, 'structure_module': {'final_atom_mask': (221, 37), 'final_atom_positions': (221, 37, 3)}, 'plddt': (221,), 'ranking_confidence': ()}
I0409 20:22:45.222905 139865793787712 run_alphafold.py:202] Total JAX model model_3_pred_0 on sweet_metazoan_F7D9S0 predict time (includes compilation time, see --benchmark): 1662.2s
I0409 20:22:47.995997 139865793787712 amber_minimize.py:177] alterations info: {'nonstandard_residues': [], 'removed_heterogens': set(), 'missing_residues': {}, 'missing_heavy_atoms': {}, 'missing_terminals': {<Residue 220 (THR) of chain 0>: ['OXT']}, 'Se_in_MET': [], 'removed_chains': {0: []}}
I0409 20:22:48.224998 139865793787712 amber_minimize.py:407] Minimizing protein, attempt 1 of 100.
I0409 20:22:48.562712 139865793787712 amber_minimize.py:68] Restraining 1742 / 3532 particles.
I0409 20:23:07.021081 139865793787712 amber_minimize.py:177] alterations info: {'nonstandard_residues': [], 'removed_heterogens': set(), 'missing_residues': {}, 'missing_heavy_atoms': {}, 'missing_terminals': {}, 'Se_in_MET': [], 'removed_chains': {0: []}}
I0409 20:23:07.633584 139865793787712 amber_minimize.py:497] Iteration completed: Einit 5711.85 Efinal -4302.03 Time 17.35 s num residue violations 0 num residue exclusions 0
I0409 20:23:10.685475 139865793787712 amber_minimize.py:177] alterations info: {'nonstandard_residues': [], 'removed_heterogens': set(), 'missing_residues': {}, 'missing_heavy_atoms': {}, 'missing_terminals': {<Residue 220 (THR) of chain 0>: ['OXT']}, 'Se_in_MET': [], 'removed_chains': {0: []}}
I0409 20:23:11.315222 139865793787712 run_alphafold.py:190] Running model model_4_pred_0 on sweet_metazoan_F7D9S0
I0409 20:23:13.004297 139865793787712 model.py:165] Running predict with shape(feat) = {'aatype': (4, 221), 'residue_index': (4, 221), 'seq_length': (4,), 'is_distillation': (4,), 'seq_mask': (4, 221), 'msa_mask': (4, 512, 221), 'msa_row_mask': (4, 512), 'random_crop_to_size_seed': (4, 2), 'atom14_atom_exists': (4, 221, 14), 'residx_atom14_to_atom37': (4, 221, 14), 'residx_atom37_to_atom14': (4, 221, 37), 'atom37_atom_exists': (4, 221, 37), 'extra_msa': (4, 5120, 221), 'extra_msa_mask': (4, 5120, 221), 'extra_msa_row_mask': (4, 5120), 'bert_mask': (4, 512, 221), 'true_msa': (4, 512, 221), 'extra_has_deletion': (4, 5120, 221), 'extra_deletion_value': (4, 5120, 221), 'msa_feat': (4, 512, 221, 49), 'target_feat': (4, 221, 22)}
I0409 20:50:54.579662 139865793787712 model.py:175] Output shape was {'distogram': {'bin_edges': (63,), 'logits': (221, 221, 64)}, 'experimentally_resolved': {'logits': (221, 37)}, 'masked_msa': {'logits': (512, 221, 23)}, 'predicted_lddt': {'logits': (221, 50)}, 'structure_module': {'final_atom_mask': (221, 37), 'final_atom_positions': (221, 37, 3)}, 'plddt': (221,), 'ranking_confidence': ()}
I0409 20:50:54.580522 139865793787712 run_alphafold.py:202] Total JAX model model_4_pred_0 on sweet_metazoan_F7D9S0 predict time (includes compilation time, see --benchmark): 1661.6s
I0409 20:50:57.246246 139865793787712 amber_minimize.py:177] alterations info: {'nonstandard_residues': [], 'removed_heterogens': set(), 'missing_residues': {}, 'missing_heavy_atoms': {}, 'missing_terminals': {<Residue 220 (THR) of chain 0>: ['OXT']}, 'Se_in_MET': [], 'removed_chains': {0: []}}
I0409 20:50:57.478736 139865793787712 amber_minimize.py:407] Minimizing protein, attempt 1 of 100.
I0409 20:50:57.835792 139865793787712 amber_minimize.py:68] Restraining 1742 / 3532 particles.
I0409 20:51:16.007959 139865793787712 amber_minimize.py:177] alterations info: {'nonstandard_residues': [], 'removed_heterogens': set(), 'missing_residues': {}, 'missing_heavy_atoms': {}, 'missing_terminals': {}, 'Se_in_MET': [], 'removed_chains': {0: []}}
I0409 20:51:16.629822 139865793787712 amber_minimize.py:497] Iteration completed: Einit 5922.10 Efinal -4334.91 Time 15.28 s num residue violations 0 num residue exclusions 0
I0409 20:51:18.662594 139865793787712 amber_minimize.py:177] alterations info: {'nonstandard_residues': [], 'removed_heterogens': set(), 'missing_residues': {}, 'missing_heavy_atoms': {}, 'missing_terminals': {<Residue 220 (THR) of chain 0>: ['OXT']}, 'Se_in_MET': [], 'removed_chains': {0: []}}
I0409 20:51:19.123664 139865793787712 run_alphafold.py:190] Running model model_5_pred_0 on sweet_metazoan_F7D9S0
I0409 20:51:21.032859 139865793787712 model.py:165] Running predict with shape(feat) = {'aatype': (4, 221), 'residue_index': (4, 221), 'seq_length': (4,), 'is_distillation': (4,), 'seq_mask': (4, 221), 'msa_mask': (4, 512, 221), 'msa_row_mask': (4, 512), 'random_crop_to_size_seed': (4, 2), 'atom14_atom_exists': (4, 221, 14), 'residx_atom14_to_atom37': (4, 221, 14), 'residx_atom37_to_atom14': (4, 221, 37), 'atom37_atom_exists': (4, 221, 37), 'extra_msa': (4, 1024, 221), 'extra_msa_mask': (4, 1024, 221), 'extra_msa_row_mask': (4, 1024), 'bert_mask': (4, 512, 221), 'true_msa': (4, 512, 221), 'extra_has_deletion': (4, 1024, 221), 'extra_deletion_value': (4, 1024, 221), 'msa_feat': (4, 512, 221, 49), 'target_feat': (4, 221, 22)}
2022-04-09 20:54:27.613558: E external/org_tensorflow/tensorflow/compiler/xla/service/slow_operation_alarm.cc:55]
********************************
Very slow compile? If you want to file a bug, run with envvar XLA_FLAGS=--xla_dump_to=/tmp/foo and attach the results.
Compiling module jit_apply_fn__4.110442
********************************
I0409 21:14:56.327582 139865793787712 model.py:175] Output shape was {'distogram': {'bin_edges': (63,), 'logits': (221, 221, 64)}, 'experimentally_resolved': {'logits': (221, 37)}, 'masked_msa': {'logits': (512, 221, 23)}, 'predicted_lddt': {'logits': (221, 50)}, 'structure_module': {'final_atom_mask': (221, 37), 'final_atom_positions': (221, 37, 3)}, 'plddt': (221,), 'ranking_confidence': ()}
I0409 21:14:56.328574 139865793787712 run_alphafold.py:202] Total JAX model model_5_pred_0 on sweet_metazoan_F7D9S0 predict time (includes compilation time, see --benchmark): 1415.3s
I0409 21:14:58.848648 139865793787712 amber_minimize.py:177] alterations info: {'nonstandard_residues': [], 'removed_heterogens': set(), 'missing_residues': {}, 'missing_heavy_atoms': {}, 'missing_terminals': {<Residue 220 (THR) of chain 0>: ['OXT']}, 'Se_in_MET': [], 'removed_chains': {0: []}}
I0409 21:14:59.077749 139865793787712 amber_minimize.py:407] Minimizing protein, attempt 1 of 100.
I0409 21:14:59.415221 139865793787712 amber_minimize.py:68] Restraining 1742 / 3532 particles.
I0409 21:15:20.303997 139865793787712 amber_minimize.py:177] alterations info: {'nonstandard_residues': [], 'removed_heterogens': set(), 'missing_residues': {}, 'missing_heavy_atoms': {}, 'missing_terminals': {}, 'Se_in_MET': [], 'removed_chains': {0: []}}
I0409 21:15:20.877890 139865793787712 amber_minimize.py:497] Iteration completed: Einit 5946.69 Efinal -4265.18 Time 18.49 s num residue violations 0 num residue exclusions 0
I0409 21:15:22.856115 139865793787712 amber_minimize.py:177] alterations info: {'nonstandard_residues': [], 'removed_heterogens': set(), 'missing_residues': {}, 'missing_heavy_atoms': {}, 'missing_terminals': {<Residue 220 (THR) of chain 0>: ['OXT']}, 'Se_in_MET': [], 'removed_chains': {0: []}}
I0409 21:15:23.404444 139865793787712 run_alphafold.py:271] Final timings for sweet_metazoan_F7D9S0: {'features': 17525.107759714127, 'process_features_model_1_pred_0': 9.964778423309326, 'predict_and_compile_model_1_pred_0': 1838.4048104286194, 'relax_model_1_pred_0': 34.420247316360474, 'process_features_model_2_pred_0': 2.56180739402771, 'predict_and_compile_model_2_pred_0': 1592.6808323860168, 'relax_model_2_pred_0': 22.32586359977722, 'process_features_model_3_pred_0': 1.6738569736480713, 'predict_and_compile_model_3_pred_0': 1662.180498123169, 'relax_model_3_pred_0': 25.30202007293701, 'process_features_model_4_pred_0': 1.6887319087982178, 'predict_and_compile_model_4_pred_0': 1661.5763659477234, 'relax_model_4_pred_0': 24.005033254623413, 'process_features_model_5_pred_0': 1.9084506034851074, 'predict_and_compile_model_5_pred_0': 1415.2962276935577, 'relax_model_5_pred_0': 26.46107578277588}
Components of the .out file generated by the HPC job:
SLURM_JOB_NAME = af_g_new1
SLURM_JOB_NODELIST = gpu001
SLURM_JOB_UID = 15585
SLURM_JOB_PARTITION = gpu
SLURM_TASK_PID = 33813
SLURM_CPUS_ON_NODE = 40
SLURM_NTASKS = 40
Contents of the timings.json output file:
{
"features": 26155.667644023895,
"process_features_model_1_pred_0": 9.124100923538208,
"predict_and_compile_model_1_pred_0": 1900.6409442424774,
"relax_model_1_pred_0": 35.85753798484802,
"process_features_model_2_pred_0": 2.5820021629333496,
"predict_and_compile_model_2_pred_0": 1627.4828066825867,
"relax_model_2_pred_0": 25.265653133392334,
"process_features_model_3_pred_0": 1.7214865684509277,
"predict_and_compile_model_3_pred_0": 1700.9324979782104,
"relax_model_3_pred_0": 35.66818594932556,
"process_features_model_4_pred_0": 1.80977201461792,
"predict_and_compile_model_4_pred_0": 1702.664762020111,
"relax_model_4_pred_0": 25.467517852783203,
"process_features_model_5_pred_0": 1.7546741962432861,
"predict_and_compile_model_5_pred_0": 1457.488024711609,
"relax_model_5_pred_0": 25.19716238975525
}
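A timings.json like the one above can be summarized per pipeline stage with a few lines of standard-library Python. This is a hedged sketch: splitting keys on "_model_" is an assumption about the key naming scheme seen in this log, not an official AlphaFold utility.

```python
import json

def summarize(timings):
    """Group timing entries by pipeline stage and sum the seconds per stage."""
    totals = {}
    for key, seconds in timings.items():
        stage = key.split("_model_")[0]  # "features", "predict_and_compile", "relax"
        totals[stage] = totals.get(stage, 0.0) + seconds
    return totals

# small inline sample with the same key shapes as the timings.json above
sample = json.loads('{"features": 100.0, "predict_and_compile_model_1_pred_0": 50.0,'
                    ' "relax_model_1_pred_0": 5.0, "relax_model_2_pred_0": 7.0}')
print(summarize(sample))
```

In the log above, this grouping makes it easy to see that feature generation (the CPU-bound MSA stage) dominated the total runtime.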
Thank you,
Aditi
sudo ./run_alphafold.sh -d /home/panfulu/alphafold_database -o ./ -f P00519-2.fasta -t 2020-05-14
/var/tmp/sclLAKGtd: line 8: ./run_alphafold.sh: Permission denied
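The "Permission denied" here is a shell-level error, not an AlphaFold one: the script lacks the execute bit (or sits on a noexec filesystem). A stand-in demonstration with a hypothetical demo.sh:

```shell
# create a small script without the execute bit set
printf '#!/bin/bash\necho ok\n' > demo.sh
# ./demo.sh would fail here with "Permission denied"
chmod +x demo.sh   # add the execute bit
./demo.sh
# alternatively, invoke the interpreter directly, which needs no execute bit:
# bash run_alphafold.sh -d /home/panfulu/alphafold_database -o ./ -f P00519-2.fasta -t 2020-05-14
```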
Hello,
On line 77, you forgot the $ for the uniprot and pdb_seqres arguments to mkdir:
mkdir "$params" "$mgnify" "$pdb70" "$pdb_mmcif" "$mmcif_download_dir" "$mmcif_files" "$uniclust30" "$uniref90" "uniprot" "pdb_seqres"
should be
mkdir "$params" "$mgnify" "$pdb70" "$pdb_mmcif" "$mmcif_download_dir" "$mmcif_files" "$uniclust30" "$uniref90" "$uniprot" "$pdb_seqres"
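To see the difference the missing $ makes, here is a small stand-in demonstration (the path values are hypothetical):

```shell
uniprot="$PWD/db/uniprot"   # hypothetical configured path
mkdir -p "uniprot"          # wrong: creates a literal directory named "uniprot" here
mkdir -p "$uniprot"         # right: expands to the configured path under db/
ls -d uniprot db/uniprot
```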
Greetings,
David
Hi, after the installation I get the following error:
Traceback (most recent call last):
File "/gpfs/home/js12009/Projects/Projects/AlphaFold_TEST/AlphaFold_2_2_0/Basic/run_alphafold.py", line 31, in
from alphafold.data import pipeline
File "/gpfs/share/apps/miniconda3/gpu/4.9.2/envs/alphafold2.2/lib/python3.8/site-packages/alphafold/data/pipeline.py", line 26, in
from alphafold.data import templates
File "/gpfs/share/apps/miniconda3/gpu/4.9.2/envs/alphafold2.2/lib/python3.8/site-packages/alphafold/data/templates.py", line 31, in
from alphafold.data.tools import kalign
ModuleNotFoundError: No module named 'alphafold.data.tools'
Is there a Python or conda package I am missing from the installation?
The imports from the previous steps:
from alphafold.common import protein
from alphafold.common import residue_constants
seem to work.
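When some subpackages of an installed package import fine and others do not, it usually means part of the package tree never reached site-packages. A hedged, standard-library-only diagnostic: list which submodules are actually installed. If "alphafold.data.tools" is missing from the listing for "alphafold.data", reinstalling the alphafold package is a likely fix.

```python
import pkgutil

def list_submodules(pkg_name):
    """Return the importable submodules/subpackages of an installed package."""
    pkg = __import__(pkg_name, fromlist=[""])
    return sorted(m.name for m in pkgutil.iter_modules(pkg.__path__, pkg_name + "."))

# e.g. list_submodules("alphafold.data") should include "alphafold.data.tools";
# demo on a stdlib package so the sketch runs anywhere:
print(list_submodules("email"))
```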
First - thanks for the recipe!
During deployment, in the dependencies section, I ran into a small problem:
https://github.com/kalininalab/alphafold_non_docker#install-dependencies
conda install -y -c anaconda cudnn==8.2.1
conda install -y -c bioconda hmmer hhsuite==3.3.0 kalign2
conda install -y -c conda-forge openmm==7.5.1 cudatoolkit==11.0.3 pdbfixer
Installation of cudnn already pulls in cudatoolkit as a dependency:
added / updated specs:
- cudnn==8.2.1
The following NEW packages will be INSTALLED:
cudatoolkit pkgs/main/linux-64::cudatoolkit-11.3.1-h2bc3f7f_2
cudnn pkgs/main/linux-64::cudnn-8.2.1-cuda11.3_0
which later conflicts with the
conda install -y -c conda-forge openmm==7.5.1 cudatoolkit==11.0.3 pdbfixer
It seems the proper way would be to take both cudnn and cudatoolkit from the same channel, conda-forge. I wonder, though, where the cudnn dependency came from, because it is not mentioned in the reference Dockerfile.
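One untested suggestion following the observation above: request cudnn and cudatoolkit from the same channel in a single solve, so conda cannot pick conflicting CUDA builds across channels (this assumes a compatible cudnn 8.2.1 build exists on conda-forge):

```shell
# single solve, single channel, so the CUDA stack stays consistent
conda install -y -c conda-forge cudatoolkit==11.0.3 cudnn==8.2.1 openmm==7.5.1 pdbfixer
```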
I followed the instructions and successfully installed AlphaFold on a cluster. It partially works, but only one GPU gets used.
I added some code to the scripts. The logs showed TensorFlow did discover 2 GPUs, but nvidia-smi revealed that data and computation occupied GPU 0 while GPU 1 sat idle.
Here is the log:
$HOME/.local/lib/python3.8/site-packages/absl/flags/_validators.py:203: UserWarning: Flag --preset has a non-None default value; therefore, mark_flag_as_required will pass even if flag is not specified in the command line!
warnings.warn(
I0813 13:45:04.035152 140042046441280 templates.py:837] Using precomputed obsolete pdbs $DATA/pdb_mmcif/obsolete.dat.
I0813 13:45:05.206957 140042046441280 tpu_client.py:54] Starting the local TPU driver.
I0813 13:45:05.239395 140042046441280 xla_bridge.py:214] Unable to initialize backend 'tpu_driver': Not found: Unable to find driver in registry given worker: local://
I0813 13:45:05.629722 140042046441280 xla_bridge.py:214] Unable to initialize backend 'tpu': Invalid argument: TpuPlatform is not available.
I0813 13:45:14.966193 140042046441280 run_alphafold.ano.py:284] Have 5 models: ['model_1', 'model_2', 'model_3', 'model_4', 'model_5']
I0813 13:45:14.966413 140042046441280 run_alphafold.ano.py:297] Using random seed 8606097073378666681 for the data pipeline
I0813 13:45:15.419880 140042046441280 run_alphafold.ano.py:155] Running model model_1
2021-08-13 13:46:07.772502: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 6942677504 exceeds 10% of free system memory.
I0813 13:46:09.743540 140042046441280 model.py:145] Running predict with shape(feat) = {'aatype': (32, 2179), 'residue_index': (32, 2179), 'seq_length': (32,), 'template_aatype': (32, 4, 2179), 'template_all_atom_masks': (32, 4, 2179, 37), 'template_all_atom_positions': (32, 4, 2179, 37, 3), 'template_sum_probs': (32, 4, 1), 'is_distillation': (32,), 'seq_mask': (32, 2179), 'msa_mask': (32, 508, 2179), 'msa_row_mask': (32, 508), 'random_crop_to_size_seed': (32, 2), 'template_mask': (32, 4), 'template_pseudo_beta': (32, 4, 2179, 3), 'template_pseudo_beta_mask': (32, 4, 2179), 'atom14_atom_exists': (32, 2179, 14), 'residx_atom14_to_atom37': (32, 2179, 14), 'residx_atom37_to_atom14': (32, 2179, 37), 'atom37_atom_exists': (32, 2179, 37), 'extra_msa': (32, 5120, 2179), 'extra_msa_mask': (32, 5120, 2179), 'extra_msa_row_mask': (32, 5120), 'bert_mask': (32, 508, 2179), 'true_msa': (32, 508, 2179), 'extra_has_deletion': (32, 5120, 2179), 'extra_deletion_value': (32, 5120, 2179), 'msa_feat': (32, 508, 2179, 49), 'target_feat': (32, 2179, 22)}
2021-08-13 13:49:42.988439: W external/org_tensorflow/tensorflow/core/common_runtime/bfc_allocator.cc:457] Allocator (GPU_0_bfc) ran out of memory trying to allocate 39.13GiB (rounded to 42012920064)requested by op
2021-08-13 13:49:42.991276: W external/org_tensorflow/tensorflow/core/common_runtime/bfc_allocator.cc:468] *******************************************************_____________________________________________
2021-08-13 13:49:42.991431: E external/org_tensorflow/tensorflow/compiler/xla/pjrt/pjrt_stream_executor_client.cc:2040] Execution of replica 0 failed: Resource exhausted: Out of memory while trying to allocate 42012919928 bytes.
visible gpus [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:1', device_type='GPU')]
visible gpus [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:1', device_type='GPU')]
visible gpus [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:1', device_type='GPU')]
visible gpus [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:1', device_type='GPU')]
visible gpus [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:1', device_type='GPU')]
running process_features
2021-08-13 13:45:15 running: process_features
Traceback (most recent call last):
File "run_alphafold.ano.py", line 328, in <module>
app.run(main)
File "$HOME/.local/lib/python3.8/site-packages/absl/app.py", line 312, in run
_run_main(main, args)
File "$HOME/.local/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main
sys.exit(main(argv))
File "run_alphafold.ano.py", line 301, in main
predict_structure(
File "run_alphafold.ano.py", line 162, in predict_structure
prediction_result = model_runner.predict(processed_feature_dict)
File "$HOME/alphafold/alphafold-2.0/alphafold/model/model.py", line 147, in predict
result = self.apply(self.params, jax.random.PRNGKey(0), feat)
File "$HOME/.conda/envs/alphafold/lib/python3.8/site-packages/jax/_src/traceback_util.py", line 183, in reraise_with_filtered_traceback
return fun(*args, **kwargs)
File "$HOME/.conda/envs/alphafold/lib/python3.8/site-packages/jax/_src/api.py", line 399, in cache_miss
out_flat = xla.xla_call(
File "$HOME/.conda/envs/alphafold/lib/python3.8/site-packages/jax/core.py", line 1561, in bind
return call_bind(self, fun, *args, **params)
File "$HOME/.conda/envs/alphafold/lib/python3.8/site-packages/jax/core.py", line 1552, in call_bind
outs = primitive.process(top_trace, fun, tracers, params)
File "$HOME/.conda/envs/alphafold/lib/python3.8/site-packages/jax/core.py", line 1564, in process
return trace.process_call(self, fun, tracers, params)
File "$HOME/.conda/envs/alphafold/lib/python3.8/site-packages/jax/core.py", line 607, in process_call
return primitive.impl(f, *tracers, **params)
File "$HOME/.conda/envs/alphafold/lib/python3.8/site-packages/jax/interpreters/xla.py", line 610, in _xla_call_impl
return compiled_fun(*args)
File "$HOME/.conda/envs/alphafold/lib/python3.8/site-packages/jax/interpreters/xla.py", line 898, in _execute_compiled
out_bufs = compiled.execute(input_bufs)
jax._src.traceback_util.UnfilteredStackTrace: RuntimeError: Resource exhausted: Out of memory while trying to allocate 42012919928 bytes.
The stack trace below excludes JAX-internal frames.
The preceding is the original exception that occurred, unmodified.
--------------------
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "run_alphafold.ano.py", line 328, in <module>
app.run(main)
File "$HOME/.local/lib/python3.8/site-packages/absl/app.py", line 312, in run
_run_main(main, args)
File "$HOME/.local/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main
sys.exit(main(argv))
File "run_alphafold.ano.py", line 301, in main
predict_structure(
File "run_alphafold.ano.py", line 162, in predict_structure
prediction_result = model_runner.predict(processed_feature_dict)
File "$HOME/alphafold/alphafold-2.0/alphafold/model/model.py", line 147, in predict
result = self.apply(self.params, jax.random.PRNGKey(0), feat)
File "$HOME/.conda/envs/alphafold/lib/python3.8/site-packages/jax/interpreters/xla.py", line 898, in _execute_compiled
out_bufs = compiled.execute(input_bufs)
RuntimeError: Resource exhausted: Out of memory while trying to allocate 42012919928 bytes.
Here is the nvidia-smi output:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03 Driver Version: 460.32.03 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla V100-SXM2... On | 00000000:00:0A.0 Off | Off |
| N/A 39C P0 67W / 300W | 29754MiB / 32510MiB | 62% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 Tesla V100-SXM2... On | 00000000:00:0B.0 Off | Off |
| N/A 39C P0 55W / 300W | 496MiB / 32510MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 52643 C python 29749MiB |
| 1 N/A N/A 52643 C python 491MiB |
+-----------------------------------------------------------------------------+
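The idle second GPU and the out-of-memory failure in the log above are related: JAX runs a single prediction on one device, so the second GPU only ever holds a small context, and a 2179-residue sequence can need more memory (here ~42 GiB) than one 32 GiB V100 offers. A sketch of the environment tweaks commonly used for this; the unified-memory values are the ones DeepMind's Docker setup exports, but verify them against your run_alphafold.sh:

```shell
export CUDA_VISIBLE_DEVICES=0              # pin the run to one GPU; a single
                                           # prediction is not split across GPUs
export TF_FORCE_UNIFIED_MEMORY=1           # let JAX spill GPU allocations to host RAM
export XLA_PYTHON_CLIENT_MEM_FRACTION=4.0  # allow up to 4x the GPU memory pool
```

These must be set in the environment before the Python process that imports JAX starts.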
I am trying to run the non-docker version of AlphaFold2 from this repo. I succeeded on an AWS GPU instance with 16 GB of GPU RAM; for the proteins I am inputting, utilisation peaks at around 3 GB according to nvidia-smi while AlphaFold2 is running.
I am now trying the same on a laptop with an NVIDIA GPU that has 4 GB of RAM (see info below), but so far I am unable to make the same run_alphafold command see the GPU. Any ideas?
I0820 11:22:14.030323 140191155447616 templates.py:836] Using precomputed obsolete pdbs /bfx_share1/quick_share/alphafold2/db/pdb_mmcif/obsolete.dat.
I0820 11:22:14.230584 140191155447616 xla_bridge.py:236] Unable to initialize backend 'tpu_driver': Not found: Unable to find driver in registry given worker:
2021-08-20 11:22:14.253180: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:271] failed call to cuInit: CUDA_ERROR_UNKNOWN: unknown error
I0820 11:22:14.253384 140191155447616 xla_bridge.py:236] Unable to initialize backend 'gpu': Failed precondition: No visible GPU devices.
I0820 11:22:14.253819 140191155447616 xla_bridge.py:236] Unable to initialize backend 'tpu': Invalid argument: TpuPlatform is not available.
W0820 11:22:14.253926 140191155447616 xla_bridge.py:240] No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
I0820 11:22:15.007403 140191155447616 run_alphafold.py:259] Have 1 models: ['model_1']
I0820 11:22:15.007551 140191155447616 run_alphafold.py:272] Using random seed 3180855101326110185 for the data pipeline
I0820 11:22:15.008080 140191155447616 jackhmmer.py:130] Launching subprocess "/home/user/miniconda3/envs/alphafold/bin/jackhmmer -o /dev/null -A /tmp/tmpjdujyngs/output.sto --noali --F1 0.0005 --F2 5e-05 --F
3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 /home/user/alphafold/CL-1384189538793.fasta /bfx_share1/quick_share/alphafold2/db/uniref90/uniref90.fasta"
I0820 11:22:15.019448 140191155447616 utils.py:36] Started Jackhmmer (uniref90.fasta) query
I0820 11:22:16.779201 140191155447616 utils.py:40] Finished Jackhmmer (uniref90.fasta) query in 1.760 seconds
I0820 11:22:16.786322 140191155447616 jackhmmer.py:130] Launching subprocess "/home/user/miniconda3/envs/alphafold/bin/jackhmmer -o /dev/null -A /tmp/tmpvmikh78k/output.sto --noali --F1 0.0005 --F2 5e-05 --F
3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 /home/user/alphafold/CL-1384189538793.fasta /bfx_share1/quick_share/alphafold2/db/mgnify/mgy_clusters.fa"
I0820 11:22:16.797401 140191155447616 utils.py:36] Started Jackhmmer (mgy_clusters.fa) query
$ ubuntu-drivers devices
== /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0 ==
modalias : pci:v000010DEd00001F91sv000017AAsd00003A41bc03sc00i00
vendor : NVIDIA Corporation
model : TU117M [GeForce GTX 1650 Mobile / Max-Q]
driver : nvidia-driver-460-server - distro non-free
driver : nvidia-driver-450-server - distro non-free
driver : nvidia-driver-470 - distro non-free recommended
driver : nvidia-driver-460 - distro non-free
driver : nvidia-driver-418-server - distro non-free
driver : xserver-xorg-video-nouveau - distro free builtin
$ nvidia-smi
Fri Aug 20 11:22:38 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02 Driver Version: 470.57.02 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... On | 00000000:01:00.0 Off | N/A |
| N/A 38C P8 3W / N/A | 148MiB / 3903MiB | 1% E. Process |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1331 G /usr/lib/xorg/Xorg 55MiB |
| 0 N/A N/A 1373 G /usr/bin/sddm-greeter 88MiB |
+-----------------------------------------------------------------------------+
Hi,
I installed alphafold_non_docker following this repository, and everything seemed to go smoothly during installation. When I run `bash run_alphafold.sh`, a "Couldn't get ptxas version string" error occurs. Is there any way to fix this issue?
2021-11-20 11:04:09.946403: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 184848424 exceeds 10% of free system memory.
2021-11-20 11:04:10.060729: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 184848424 exceeds 10% of free system memory.
2021-11-20 11:04:10.171764: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 184848424 exceeds 10% of free system memory.
I1120 11:04:10.312324 139757326907200 model.py:165] Running predict with shape(feat) = {'aatype': (4, 173), 'residue_index': (4, 173), 'seq_length': (4,), 'template_aatype': (4, 4, 173), 'template_all_atom_masks': (4, 4, 173, 37), 'template_all_atom_positions': (4, 4, 173, 37, 3), 'template_sum_probs': (4, 4, 1), 'is_distillation': (4,), 'seq_mask': (4, 173), 'msa_mask': (4, 508, 173), 'msa_row_mask': (4, 508), 'random_crop_to_size_seed': (4, 2), 'template_mask': (4, 4), 'template_pseudo_beta': (4, 4, 173, 3), 'template_pseudo_beta_mask': (4, 4, 173), 'atom14_atom_exists': (4, 173, 14), 'residx_atom14_to_atom37': (4, 173, 14), 'residx_atom37_to_atom14': (4, 173, 37), 'atom37_atom_exists': (4, 173, 37), 'extra_msa': (4, 5120, 173), 'extra_msa_mask': (4, 5120, 173), 'extra_msa_row_mask': (4, 5120), 'bert_mask': (4, 508, 173), 'true_msa': (4, 508, 173), 'extra_has_deletion': (4, 5120, 173), 'extra_deletion_value': (4, 5120, 173), 'msa_feat': (4, 508, 173, 49), 'target_feat': (4, 173, 22)}
2021-11-20 11:04:10.349723: W external/org_tensorflow/tensorflow/stream_executor/gpu/asm_compiler.cc:81] Couldn't get ptxas version string: Internal: Couldn't invoke ptxas --version
2021-11-20 11:04:10.350581: F external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:479] ptxas returned an error during compilation of ptx to sass: 'Internal: Failed to launch ptxas' If the error message indicates that a file could not be written, please verify that sufficient filesystem space is provided.
Fatal Python error: Aborted
Thread 0x00007f1bc9d33740 (most recent call first):
File "/mnt/mpathb/alphafold2/miniconda3/envs/alphafold/lib/python3.8/site-packages/jax/interpreters/xla.py", line 474 in backend_compile
File "/mnt/mpathb/alphafold2/miniconda3/envs/alphafold/lib/python3.8/site-packages/jax/interpreters/xla.py", line 863 in compile_or_get_cached
File "/mnt/mpathb/alphafold2/miniconda3/envs/alphafold/lib/python3.8/site-packages/jax/interpreters/xla.py", line 921 in from_xla_computation
File "/mnt/mpathb/alphafold2/miniconda3/envs/alphafold/lib/python3.8/site-packages/jax/interpreters/xla.py", line 892 in compile
File "/mnt/mpathb/alphafold2/miniconda3/envs/alphafold/lib/python3.8/site-packages/jax/interpreters/xla.py", line 759 in _xla_callable_uncached
File "/mnt/mpathb/alphafold2/miniconda3/envs/alphafold/lib/python3.8/site-packages/jax/interpreters/xla.py", line 439 in xla_primitive_callable
File "/mnt/mpathb/alphafold2/miniconda3/envs/alphafold/lib/python3.8/site-packages/jax/_src/util.py", line 180 in cached
File "/mnt/mpathb/alphafold2/miniconda3/envs/alphafold/lib/python3.8/site-packages/jax/_src/util.py", line 187 in wrapper
File "/mnt/mpathb/alphafold2/miniconda3/envs/alphafold/lib/python3.8/site-packages/jax/interpreters/xla.py", line 416 in apply_primitive
File "/mnt/mpathb/alphafold2/miniconda3/envs/alphafold/lib/python3.8/site-packages/jax/core.py", line 624 in process_primitive
File "/mnt/mpathb/alphafold2/miniconda3/envs/alphafold/lib/python3.8/site-packages/jax/core.py", line 272 in bind
File "/mnt/mpathb/alphafold2/miniconda3/envs/alphafold/lib/python3.8/site-packages/jax/_src/lax/lax.py", line 408 in shift_right_logical
File "/mnt/mpathb/alphafold2/miniconda3/envs/alphafold/lib/python3.8/site-packages/jax/_src/prng.py", line 240 in threefry_seed
File "/mnt/mpathb/alphafold2/miniconda3/envs/alphafold/lib/python3.8/site-packages/jax/_src/prng.py", line 202 in seed_with_impl
File "/mnt/mpathb/alphafold2/miniconda3/envs/alphafold/lib/python3.8/site-packages/jax/_src/random.py", line 122 in PRNGKey
File "/mnt/mpathb/alphafold2/alphafold/alphafold/model/model.py", line 167 in predict
File "/mnt/mpathb/alphafold2/alphafold/run_alphafold.py", line 193 in predict_structure
File "/mnt/mpathb/alphafold2/alphafold/run_alphafold.py", line 403 in main
File "/mnt/mpathb/alphafold2/miniconda3/envs/alphafold/lib/python3.8/site-packages/absl/app.py", line 258 in _run_main
File "/mnt/mpathb/alphafold2/miniconda3/envs/alphafold/lib/python3.8/site-packages/absl/app.py", line 312 in run
File "/mnt/mpathb/alphafold2/alphafold/run_alphafold.py", line 427 in <module>
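A sketch of a possible fix, assuming the usual root cause: XLA shells out to the `ptxas` binary at runtime, and the crash above means it cannot find or launch it. The `/usr/local/cuda` path and the conda package in the comment are assumptions; adjust them to your install.

```shell
# Possible fix sketch (paths are assumptions): make a real `ptxas` visible
# to the process that runs run_alphafold.sh.
export PATH=/usr/local/cuda/bin:$PATH        # if a system CUDA toolkit exists
command -v ptxas || echo "ptxas still not found -- install a full CUDA toolkit"
# Alternatively, install one into the conda env itself, e.g.:
#   conda install -y -c nvidia cuda-nvcc
```

After `ptxas --version` succeeds in the same shell, rerun `run_alphafold.sh`.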
I have been working to get both monomer and multimer predictions running on my school's HPC, but I keep running into the same error with amber relaxation:
Error initializing CUDA: CUDA error (34) at /home/conda/feedstock_root/build_artifacts/openmm_1622798701405/work/platforms/cuda/src/CudaContext.cpp:138
Traceback (most recent call last):
File "/gpfs/share/apps/miniconda3/gpu/4.9.2/envs/alphafold220/alphafold-2.2.0/run_alphafold.py", line 422, in <module>
app.run(main)
File "/gpfs/share/apps/miniconda3/gpu/4.9.2/envs/alphafold220/lib/python3.8/site-packages/absl/app.py", line 312, in run
_run_main(main, args)
File "/gpfs/share/apps/miniconda3/gpu/4.9.2/envs/alphafold220/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main
sys.exit(main(argv))
File "/gpfs/share/apps/miniconda3/gpu/4.9.2/envs/alphafold220/alphafold-2.2.0/run_alphafold.py", line 398, in main
predict_structure(
File "/gpfs/share/apps/miniconda3/gpu/4.9.2/envs/alphafold220/alphafold-2.2.0/run_alphafold.py", line 242, in predict_structure
relaxed_pdb_str, _, _ = amber_relaxer.process(prot=unrelaxed_protein)
File "/gpfs/share/apps/miniconda3/gpu/4.9.2/envs/alphafold220/alphafold-2.2.0/alphafold/relax/relax.py", line 61, in process
out = amber_minimize.run_pipeline(
File "/gpfs/share/apps/miniconda3/gpu/4.9.2/envs/alphafold220/alphafold-2.2.0/alphafold/relax/amber_minimize.py", line 475, in run_pipeline
ret = _run_one_iteration(
File "/gpfs/share/apps/miniconda3/gpu/4.9.2/envs/alphafold220/alphafold-2.2.0/alphafold/relax/amber_minimize.py", line 419, in _run_one_iteration
raise ValueError(f"Minimization failed after {max_attempts} attempts.")
ValueError: Minimization failed after 100 attempts.
This error can be avoided in monomer predictions by setting the "-r" flag to false, but setting it to false in multimer mode doesn't change anything; I receive the same error as before.
More generally, amber relaxation does not work in either mode, so if there is a solution, please let me know!
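One hedged diagnostic for the `CUDA error (34)` above: in recent CUDA releases, error 34 is `CUDA_ERROR_STUB_LIBRARY`, i.e. the process loaded the stub `libcuda.so` that ships with the toolkit (under a `stubs/` directory) instead of the real driver, which is common on HPC nodes. The helper below only illustrates the check; the example paths are made up.

```shell
# Returns 0 if any LD_LIBRARY_PATH entry looks like a CUDA stub directory.
check_stub_path() {
    printf '%s' "$1" | tr ':' '\n' | grep -q '/stubs'
}
if check_stub_path "${LD_LIBRARY_PATH:-}"; then
    echo "stub libcuda directory on LD_LIBRARY_PATH -- drop it on GPU nodes"
else
    echo "no stub directory on LD_LIBRARY_PATH"
fi
# If GPU relaxation cannot work on this node, upstream run_alphafold.py
# (AlphaFold >= 2.1) accepts --use_gpu_relax=false to relax on CPU instead.
```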
I am getting the following error:
ValueError: Could not find CIFs in /path/to/mmcif_files
I have checked and the files are present; is there a way around this?
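For reference, AlphaFold looks for `*.cif` files directly inside the directory given as `template_mmcif_dir`; two common causes of this error are pointing at `pdb_mmcif` instead of `pdb_mmcif/mmcif_files`, and files that are still gzipped. A throwaway-directory sketch of the check (the demo paths are made up; run the same two `find`s on your real directory):

```shell
# Demo in a throwaway dir -- the layout mimics a partially extracted download.
demo=$(mktemp -d)
touch "$demo/1abc.cif" "$demo/2xyz.cif"
gzip -c "$demo/2xyz.cif" > "$demo/2xyz.cif.gz"   # simulate an undecompressed file
n_cif=$(find "$demo" -maxdepth 1 -name '*.cif' | wc -l)
n_gz=$(find "$demo" -maxdepth 1 -name '*.cif.gz' | wc -l)
echo "plain .cif files: $n_cif, still gzipped: $n_gz"
```

If the second count is non-zero on your machine, decompress the files before rerunning.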
We've now installed alphafold_non_docker on a Linux system with an NVIDIA Quadro P1000 (4 GB), but the system also has a 2 GB NVIDIA card that appears as device 0 in `nvidia-smi`.
When attempting to use the bash script with `-a 1`, it actually used the smaller card and ran out of memory, which is expected for the input protein, which peaks at 3 GB of RAM on another computer where this works successfully.
When attempting without the `-a` flag, or with `-a 0`, it runs on the 4 GB device, which is listed as device 1 in `nvidia-smi`. It runs for a while, but at the prediction step it crashes with this error:
You do not need to update to CUDA 9.2.88; cherry-picking the ptxas binary is sufficient.
2021-08-31 12:14:16.286331: W external/org_tensorflow/tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusolver.so.11'; dlerror: libcusolver.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/lib/oracle/12.2/client64/lib/:/usr/lib/oracle/12.2/client64
Traceback (most recent call last):
File "/home/user/alphafold/run_alphafold.py", line 302, in <module>
app.run(main)
File "/data/miniconda3/envs/alphafold/lib/python3.8/site-packages/absl/app.py", line 312, in run
_run_main(main, args)
File "/data/miniconda3/envs/alphafold/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main
sys.exit(main(argv))
File "/home/user/alphafold/run_alphafold.py", line 276, in main
predict_structure(
File "/home/user/alphafold/run_alphafold.py", line 148, in predict_structure
prediction_result = model_runner.predict(processed_feature_dict)
File "/home/user/alphafold/alphafold/model/model.py", line 133, in predict
result = self.apply(self.params, jax.random.PRNGKey(0), feat)
File "/data/miniconda3/envs/alphafold/lib/python3.8/site-packages/jax/_src/traceback_util.py", line 162, in reraise_with_filtered_traceback
return fun(*args, **kwargs)
File "/data/miniconda3/envs/alphafold/lib/python3.8/site-packages/jax/_src/api.py", line 405, in cache_miss
out_flat = xla.xla_call(
File "/data/miniconda3/envs/alphafold/lib/python3.8/site-packages/jax/core.py", line 1614, in bind
return call_bind(self, fun, *args, **params)
File "/data/miniconda3/envs/alphafold/lib/python3.8/site-packages/jax/core.py", line 1605, in call_bind
outs = primitive.process(top_trace, fun, tracers, params)
File "/data/miniconda3/envs/alphafold/lib/python3.8/site-packages/jax/core.py", line 1617, in process
return trace.process_call(self, fun, tracers, params)
File "/data/miniconda3/envs/alphafold/lib/python3.8/site-packages/jax/core.py", line 613, in process_call
return primitive.impl(f, *tracers, **params)
File "/data/miniconda3/envs/alphafold/lib/python3.8/site-packages/jax/interpreters/xla.py", line 619, in _xla_call_impl
compiled_fun = _xla_callable(fun, device, backend, name, donated_invars,
File "/data/miniconda3/envs/alphafold/lib/python3.8/site-packages/jax/linear_util.py", line 262, in memoized_fun
ans = call(fun, *args)
File "/data/miniconda3/envs/alphafold/lib/python3.8/site-packages/jax/interpreters/xla.py", line 752, in _xla_callable
out_nodes = jaxpr_subcomp(
File "/data/miniconda3/envs/alphafold/lib/python3.8/site-packages/jax/interpreters/xla.py", line 487, in jaxpr_subcomp
ans = rule(c, axis_env, extend_name_stack(name_stack, eqn.primitive.name),
File "/data/miniconda3/envs/alphafold/lib/python3.8/site-packages/jax/_src/lax/control_flow.py", line 350, in _while_loop_translation_rule
new_z = xla.jaxpr_subcomp(body_c, body_jaxpr.jaxpr, backend, axis_env,
File "/data/miniconda3/envs/alphafold/lib/python3.8/site-packages/jax/interpreters/xla.py", line 487, in jaxpr_subcomp
ans = rule(c, axis_env, extend_name_stack(name_stack, eqn.primitive.name),
File "/data/miniconda3/envs/alphafold/lib/python3.8/site-packages/jax/interpreters/xla.py", line 1060, in f
outs = jaxpr_subcomp(c, jaxpr, backend, axis_env, _xla_consts(c, consts),
File "/data/miniconda3/envs/alphafold/lib/python3.8/site-packages/jax/interpreters/xla.py", line 487, in jaxpr_subcomp
ans = rule(c, axis_env, extend_name_stack(name_stack, eqn.primitive.name),
File "/data/miniconda3/envs/alphafold/lib/python3.8/site-packages/jax/_src/lax/control_flow.py", line 350, in _while_loop_translation_rule
new_z = xla.jaxpr_subcomp(body_c, body_jaxpr.jaxpr, backend, axis_env,
File "/data/miniconda3/envs/alphafold/lib/python3.8/site-packages/jax/interpreters/xla.py", line 478, in jaxpr_subcomp
ans = rule(c, *in_nodes, **eqn.params)
File "/data/miniconda3/envs/alphafold/lib/python3.8/site-packages/jax/_src/lax/linalg.py", line 503, in _eigh_cpu_gpu_translation_rule
v, w, info = syevd_impl(c, operand, lower=lower)
File "/data/miniconda3/envs/alphafold/lib/python3.8/site-packages/jaxlib/cusolver.py", line 281, in syevd
lwork, opaque = cusolver_kernels.build_syevj_descriptor(
jax._src.traceback_util.UnfilteredStackTrace: RuntimeError: cuSolver internal error
...
This is with the usual `sudo apt-get install nvidia-drivers-460` plus `sudo apt-get install nvidia-cuda-toolkit` method. Rebooting and sorting out the 'Secure Boot' malarkey was needed for this laptop.
EDIT: just to make sure that the smaller card wasn't a problem, we removed it from the computer and rebooted. Only the larger 4 GB card appeared in `nvidia-smi`; however, the issue remained as described above when trying to run alphafold.
Any ideas what this libcusolver issue could be due to?
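Two hedged things to try. First, CUDA enumerates GPUs "fastest first" by default while `nvidia-smi` uses PCI-bus order, which would explain `-a 1` selecting the smaller card; the standard CUDA environment variables below force the two numberings to agree. Second, for the `libcusolver.so.11` error: that library ships with CUDA 11 toolkits, not with the driver, so check whether it exists on the machine at all (the search roots below are assumptions).

```shell
# Make CUDA's device numbering match nvidia-smi's, then pick the 4 GB card
# (both variables are standard CUDA env vars):
export CUDA_DEVICE_ORDER=PCI_BUS_ID
export CUDA_VISIBLE_DEVICES=1
# Look for a CUDA 11 cusolver library anywhere obvious:
find /usr /opt -name 'libcusolver.so*' 2>/dev/null | head -n 3
```

If the `find` comes back empty, installing `cudatoolkit` 11.x into the conda env (or a full CUDA 11 toolkit system-wide) should provide it.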
Hi! Thank you for this non-docker implementation!
I'm trying to generate multiple models with alphafold; it is my understanding that this requires the `-m` option.
I can get the script to work if I pass model_1 to `-m`, but if I try a comma-separated list with more than model_1 (or even any name other than model_1), I get the following error:
Traceback (most recent call last):
File "/apps/gb/AlphaFold/2.0.0/alphafold/run_alphafold.py", line 303, in <module>
app.run(main)
File "/apps/gb/AlphaFold/2.0.0/lib/python3.7/site-packages/absl/app.py", line 312, in run
_run_main(main, args)
File "/apps/gb/AlphaFold/2.0.0/lib/python3.7/site-packages/absl/app.py", line 258, in _run_main
sys.exit(main(argv))
File "/apps/gb/AlphaFold/2.0.0/alphafold/run_alphafold.py", line 253, in main
model_config = config.model_config(model_name)
File "/apps/gb/AlphaFold/2.0.0/alphafold/alphafold/model/config.py", line 32, in model_config
raise ValueError(f'Invalid model name {name}.')
ValueError: Invalid model name model_2.
Here's the command I'm running:
bash $ROOTALPHAFOLD/alphafold/run_alphafold.sh -d /db/AlphaFold -o ./test5/ -m model_1, model_2 -f ./fasta.fa -t 2020-05-14 -b True
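The space after the comma is likely the culprit: the shell splits `model_1, model_2` into two words, so the option parser never receives the full list as one value. Exactly how the stray word is then consumed depends on the script's parsing, but removing the space is the fix. A small demonstration (the commented invocation is the corrected form of the command above):

```shell
# With a space after the comma, -m only receives "model_1,":
set -- -m model_1, model_2
broken_value=$2                 # "model_1," -- model_2 is a stray extra word
# Without the space, the whole list reaches -m:
set -- -m model_1,model_2
fixed_value=$2
echo "broken: $broken_value / fixed: $fixed_value"
# Corrected invocation (no space after the comma):
# bash $ROOTALPHAFOLD/alphafold/run_alphafold.sh -d /db/AlphaFold -o ./test5/ \
#     -m model_1,model_2 -f ./fasta.fa -t 2020-05-14 -b True
```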
Updating jax via:
pip install --upgrade jax jaxlib==0.1.69+cuda111 -f https://storage.googleapis.com/jax-releases/jax_releases.html
results in jax being upgraded to 0.2.26:
Collecting contextlib2
Using cached contextlib2-21.6.0-py2.py3-none-any.whl (13 kB)
Collecting PyYAML
Using cached PyYAML-6.0-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (701 kB)
Installing collected packages: PyYAML, contextlib2, ml-collections, jax
Attempting uninstall: jax
Found existing installation: jax 0.2.26
Uninstalling jax-0.2.26:
Successfully uninstalled jax-0.2.26
Successfully installed PyYAML-6.0 contextlib2-21.6.0 jax-0.2.14 ml-collections-0.1.0
...which causes the following error:
ValueError: jaxlib is version 0.1.69, but this version of jax requires version 0.1.74.
If jax must stay at `jax==0.2.14` (or at least < 0.2.26), then it appears that one must run `pip install jax==0.2.14` after running `pip install --upgrade jax`.
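A workaround sketch, using the pins mentioned in this thread (treat them as assumptions for other CUDA setups): request both packages in a single resolver pass, so pip cannot float `jax` to a version that demands a newer `jaxlib`.

```shell
# Pin jax and jaxlib together so pip keeps the pair compatible:
pip install --upgrade "jax==0.2.14" "jaxlib==0.1.69+cuda111" \
    -f https://storage.googleapis.com/jax-releases/jax_releases.html
# Confirm the pair that actually got installed:
python -c "import jax, jaxlib; print(jax.__version__, jaxlib.__version__)"
```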
The CUDA version reported by the driver is 11.7.
The cudatoolkit version is 11.7.
I0922 20:54:25.670873 139785783146304 amber_minimize.py:407] Minimizing protein, attempt 98 of 100.
I0922 20:54:27.095493 139785783146304 amber_minimize.py:68] Restraining 5073 / 10077 particles.
I0922 20:54:27.823423 139785783146304 amber_minimize.py:417] Error compiling program: nvrtc: error: invalid value for --gpu-architecture (-arch)
I0922 20:54:27.830594 139785783146304 amber_minimize.py:407] Minimizing protein, attempt 99 of 100.
I0922 20:54:28.679339 139785783146304 amber_minimize.py:68] Restraining 5073 / 10077 particles.
I0922 20:54:29.414243 139785783146304 amber_minimize.py:417] Error compiling program: nvrtc: error: invalid value for --gpu-architecture (-arch)
I0922 20:54:29.421343 139785783146304 amber_minimize.py:407] Minimizing protein, attempt 100 of 100.
I0922 20:54:30.978424 139785783146304 amber_minimize.py:68] Restraining 5073 / 10077 particles.
I0922 20:54:31.718168 139785783146304 amber_minimize.py:417] Error compiling program: nvrtc: error: invalid value for --gpu-architecture (-arch)
Traceback (most recent call last):
File "/22t/chenhx/software/alphafold-2.2.0/run_alphafold.py", line 422, in <module>
app.run(main)
File "/22t/chenhx/miniconda3/envs/testalphafold/lib/python3.8/site-packages/absl/app.py", line 312, in run
_run_main(main, args)
File "/22t/chenhx/miniconda3/envs/testalphafold/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main
sys.exit(main(argv))
File "/22t/chenhx/software/alphafold-2.2.0/run_alphafold.py", line 398, in main
predict_structure(
File "/22t/chenhx/software/alphafold-2.2.0/run_alphafold.py", line 242, in predict_structure
relaxed_pdb_str, _, _ = amber_relaxer.process(prot=unrelaxed_protein)
File "/22t/chenhx/software/alphafold-2.2.0/alphafold/relax/relax.py", line 61, in process
out = amber_minimize.run_pipeline(
File "/22t/chenhx/software/alphafold-2.2.0/alphafold/relax/amber_minimize.py", line 475, in run_pipeline
ret = _run_one_iteration(
File "/22t/chenhx/software/alphafold-2.2.0/alphafold/relax/amber_minimize.py", line 419, in _run_one_iteration
raise ValueError(f"Minimization failed after {max_attempts} attempts.")
ValueError: Minimization failed after 100 attempts.
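A hedged pointer for the `invalid value for --gpu-architecture` loop above: OpenMM compiles its CUDA kernels with the nvrtc bundled in the env's `cudatoolkit`, and nvrtc rejects architecture values newer than itself (for example, an sm_89 Ada card needs nvrtc from CUDA >= 11.8). The helper below only illustrates how a compute capability maps to the architecture number in that flag; the `nvidia-smi` query and the conda commands in the comments are the real checks, and the package/version there are assumptions.

```shell
# "8.9" -> "89": the number embedded in nvrtc's --gpu-architecture value;
# an nvrtc older than the GPU architecture rejects it as "invalid value".
cap_to_arch_num() {
    printf '%s' "$1" | tr -d '.'
}
echo "arch number for capability 8.9: $(cap_to_arch_num 8.9)"
# On the real machine, compare:
#   nvidia-smi --query-gpu=compute_cap --format=csv,noheader   # GPU capability
#   conda list cudatoolkit                                     # env's nvrtc version
# and if the toolkit predates the GPU, try e.g.:
#   conda install -c conda-forge cudatoolkit=11.8
```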