
dl_binder_design

This repo contains the scripts described in the paper Improving de novo Protein Binder Design with Deep Learning.


Third Party Source Code

This repository provides a copy of Brian Coventry's silent tools, which are also available separately here. It also provides a wrapper to Justas Dauparas's ProteinMPNN code; some of the code in the wrapper class is adapted directly from ProteinMPNN. Finally, this repository provides a version of the AlphaFold2 source code with the "initial guess" modifications described in this paper. The AF2 source code is provided with the original DeepMind license at the top of each file.

Setup

Conda Environment

I have split the single conda env that this repo used to use (dl_binder_design.yml; still provided in <base_dir>/include for anyone interested) into two smaller, easier-to-install environments. The old environment required the PyTorch, JAX, TensorFlow, and PyRosetta packages to all be compatible with one another, which is difficult to achieve. The new environments should be easier to install, and I have also added import tests so that it is easier and faster to check that the installation has been successful.

Both of these environments require PyRosetta, which requires a license that is free to academics and available here. This license will give you the USERNAME and PASSWORD referenced below. If you do not provide this USERNAME and PASSWORD, you will get a CondaHTTPError when you attempt to run the installation.

The steps to install the environments are as follows:

  • Ensure that you have the Anaconda or Miniconda package manager
  • Ensure that you have the PyRosetta channel included in your ~/.condarc
  • Your ~/.condarc should look something like this:
channels: 
- https://USERNAME:[email protected]
- conda-forge
- defaults

Install ProteinMPNN-FastRelax Environment

  • Navigate to <base_dir>/include
  • Run conda env create -f proteinmpnn_fastrelax.yml
  • Test the environment by activating it and running python importtests/proteinmpnn_importtest.py. If you encounter an error in this script, something has gone wrong with your installation; if the script prints that the tests pass, the environment is installed correctly. The script will also test whether it can access a GPU. Running ProteinMPNN on a GPU is not recommended, as it is only marginally faster than running on CPU and FastRelax cannot take advantage of the GPU anyway.

Install AlphaFold2 Environment

  • Navigate to <base_dir>/include
  • Run conda env create -f af2_binder_design.yml
  • Test the environment by activating it and running python importtests/af2_importtest.py. Run this script from a node that has access to the GPU you wish to use for AF2 inference; it will print a message about whether it was able to find and use the GPU on your node. If this script hits an error before printing anything, the installation has not been done correctly.

Troubleshooting AF2 GPU Compatibility

Getting a conda environment that recognizes and can run AF2 on your GPU is one of the more difficult parts of this process. Because of the many different GPUs out there, it is not possible for us to provide one .yml file that will work with all GPUs. We provide a CUDA 11.1-compatible env (dl_binder_design.yml) and a CUDA 12-compatible env (af2_binder_design.yml). For other versions of the CUDA driver, you may need to change which CUDA version is installed in the conda env; this can be done by changing the CUDA version in this line of af2_binder_design.yml: - jax[cuda12_pip].
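For example, the relevant pip line in af2_binder_design.yml could be edited as sketched below (the exact extra name for your CUDA version depends on the JAX release pinned in the file, so check the JAX installation documentation for your driver before editing):

```yaml
# pip section of af2_binder_design.yml
- jax[cuda12_pip]        # as shipped: CUDA 12 wheels
# For a CUDA 11 driver you would instead use something like:
# - jax[cuda11_pip]
```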

NOTE: This conda environment can only accommodate NVIDIA GPUs at this point in time.

Clone ProteinMPNN

This repo requires the code from ProteinMPNN to work. It expects this code to be in the mpnn_fr directory, so we can simply clone it there:

  • Navigate to <base_dir>/mpnn_fr
  • Run git clone https://github.com/dauparas/ProteinMPNN.git

Silent Tools

The scripts contained in this repository work with a type of file called silent files. These are essentially a bunch of compressed .pdb files all stored in one file. Working with silent files is convenient and saves a lot of disk space when dealing with many thousands of structures.

Brian Coventry wrote a bunch of really nice commandline tools (called silent_tools) to manipulate silent files. These tools are included in this repository but may also be downloaded separately from this GitHub repo.

The two commands that allow you to go from pdb to silent file and back are the following:

pdbs to silent: <base_dir>/silentfrompdbs *.pdb > my_designs.silent

silent to pdbs: <base_dir>/silentextract all_structs.silent

NOTE: Some silent tools require PyRosetta and will fail if run in a Python environment without access to PyRosetta.

Download AlphaFold2 Model Weights

The scripts in this repository expect AF2 weights to be in <base_dir>/model_weights/params and will fail to run if the weights are not there. If you already have the AF2 params_model_1_ptm.npz weights downloaded, you may simply copy them to <base_dir>/model_weights/params or create a symlink. If you do not have these weights, you will have to download them from DeepMind's repository; this can be done as follows:

cd <base_dir>/af2_initial_guess
mkdir -p model_weights/params && cd model_weights/params
wget https://storage.googleapis.com/alphafold/alphafold_params_2022-12-06.tar
tar --extract --verbose --file=alphafold_params_2022-12-06.tar 
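As a quick sanity check before running predictions, you can verify that the expected weights file is present. This is a minimal sketch (the params_model_1_ptm.npz filename comes from the DeepMind weights tarball referenced above; the helper name is hypothetical):

```python
from pathlib import Path

def check_af2_weights(base_dir: str) -> Path:
    """Return the path to the ptm model weights, raising if the file is missing."""
    params = Path(base_dir) / "model_weights" / "params" / "params_model_1_ptm.npz"
    if not params.is_file():
        raise FileNotFoundError(
            f"AF2 weights not found at {params}; download them as shown above."
        )
    return params
```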

Inference

Summary

The binder design pipeline requires protein binder backbones as input. The recommended way to generate these backbones is RFdiffusion, which will give you a directory of .pdb files. These files can be turned into silent files, which are more memory-efficient and easier to work with than a directory of pdb files (and your system administrator will thank you for using them). Alternatively, you can use directories of .pdb files as-is with this pipeline, using either the -pdbdir flag alone or the -pdbdir flag in combination with the -runlist flag.
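The two-step flow described above can be sketched as command lines built in Python. This is a hypothetical driver: the script paths and flags are the ones documented in this README, but base_dir and the silent-file name are placeholders you would substitute for your own setup:

```python
def build_pipeline_commands(base_dir: str, silent_file: str):
    """Build the ProteinMPNN-FastRelax and AF2 commands for one silent file."""
    mpnn_cmd = [
        f"{base_dir}/mpnn_fr/dl_interface_design.py",
        "-silent", silent_file,    # designs from RFdiffusion, packed as a silent file
    ]
    af2_cmd = [
        f"{base_dir}/af2_initial_guess/predict.py",
        "-silent", "out.silent",   # the ProteinMPNN step writes out.silent by default
    ]
    return [mpnn_cmd, af2_cmd]
```

The returned lists can be passed to subprocess.run (one per step) or written out as a job script.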

Example Commands

Example commands demonstrating how to run each of these scripts with different types of input can be found here:

<base_dir>/examples

ProteinMPNN-FastRelax Binder Design

Here is an example of how to run ProteinMPNN-FastRelax with a silent file of designs. This will use the default of 1 FastRelax cycle and 1 ProteinMPNN sequence per round (NOTE: running with -relax_cycles > 0 and -seqs_per_struct > 1 is disabled, as it leads to an explosion in the number of FastRelax trajectories being run and is probably a bad idea):

<base_dir>/mpnn_fr/dl_interface_design.py -silent my_designs.silent

This will create a file titled out.silent containing your designs. This file can be fed directly to AF2 interface prediction.

With the refactor, this script is now able to read and write both PDB files and silent files. I have also added more informative argument messages, which can be accessed by running:

<base_dir>/mpnn_fr/dl_interface_design.py -h

NOTE: This script expects your binder design to be the first chain it receives. Unlike the AF2 interface script, it is robust to non-unique residue indices. NOTE 2: The outputs of this script do not have relaxed sidechains (sidechains are not input to AF2, so it is not worth the computation to relax them), so the structures will look strange if you visualize them in PyMOL. This is perfectly normal; the structures will look better after being run through AF2.

Running ProteinMPNN with Fixed Residues

If you used RFdiffusion to generate your binder designs and would like to fix a region, you can use the following command to add 'FIXED' labels to your pdbs, which will be recognized by the ProteinMPNN scripts. Thanks to Preetham Venkatesh for writing this!

python <base_dir>/helper_scripts/addFIXEDlabels.py --pdbdir /dir/of/pdbs --trbdir /dir/of/trbs --verbose

These pdb files can be collected into a silent file (or just used as PDB files) and run through the ProteinMPNN script which will detect the FIXED labels and keep those sequence positions fixed.
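For illustration, the fixed positions are communicated through per-residue labels written into the PDB files. The sketch below shows one plausible way such labels could be emitted; it is an assumption that the labels take the form of Rosetta PDBinfo-LABEL remark lines, so inspect the output of addFIXEDlabels.py on your own files for the exact format:

```python
def fixed_label_remarks(residue_numbers):
    """Build PDBinfo-LABEL remark lines marking residues as FIXED (assumed format)."""
    # ASSUMPTION: Rosetta-style per-residue label remarks; verify against real output.
    return [f"REMARK PDBinfo-LABEL:{i:5d} FIXED" for i in residue_numbers]
```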

AlphaFold2 Complex Prediction

Running the interface prediction script is simple:

<base_dir>/af2_initial_guess/predict.py -silent my_designs.silent

This will create a file titled out.silent containing the AF2 predictions of your designs. It will also output a file titled out.sc with the scores of the designs; pae_interaction is the score that was most predictive in the experiments performed in the paper.
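Once out.sc exists, you will typically want to rank designs by pae_interaction (lower is better). A hedged sketch follows; it assumes out.sc is whitespace-delimited with a header row containing pae_interaction and description columns, so check your own file's layout first:

```python
def rank_by_pae(sc_path: str):
    """Return (pae_interaction, description) pairs sorted best (lowest) first."""
    with open(sc_path) as fh:
        rows = [line.split() for line in fh if line.strip()]
    header = rows[0]
    pae_col = header.index("pae_interaction")   # assumed column name
    desc_col = header.index("description")      # assumed column name
    scored = [(float(r[pae_col]), r[desc_col]) for r in rows[1:]]
    return sorted(scored)
```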

With the refactor, this script is now able to read and write PDB and silent files as well as perform both monomer and complex predictions. The arguments for these can be listed by running:

<base_dir>/af2_initial_guess/predict.py -h

NOTE: This script expects your binder design to be the first chain it receives. The binder will be predicted from single sequence and with an initial guess; the target chains will be fixed to the input structure. The script also expects your residue indices to be unique, i.e. your binder and target cannot both start with residue 1.

Troubleshooting

One of the most common errors that people have been having is one that looks like this:

Struct with tag SAMETAG failed in 0 seconds with error: <class 'EXCEPTION'>

Where SAMETAG and EXCEPTION can be many different things. What is happening here is that the main loops of both scripts provided here are wrapped in a try-catch block: the script tries to run each design and, if an error occurs, notes which design failed and continues to the next. This error catching is convenient when running production-scale design campaigns but is a nuisance for debugging, since the messages are not very informative.
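The behavior described above amounts to a loop like the following (a simplified sketch, not the actual script code; process_one stands in for the per-design work):

```python
import time

def run_all(designs, process_one, debug=False):
    """Process each (tag, design) pair; in production mode, log failures and keep going."""
    for tag, design in designs:
        start = time.time()
        if debug:
            process_one(design)   # let the exception propagate with a full traceback
            continue
        try:
            process_one(design)
        except Exception as err:
            elapsed = int(time.time() - start)
            print(f"Struct with tag {tag} failed in {elapsed} seconds "
                  f"with error: {type(err)}")
```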

If you hit this error, I recommend running the same command that yielded the error but with the -debug flag added. This flag makes the script run without the try-catch block, so errors print with the standard verbose, easier-to-debug messages.

dl_binder_design's People

Contributors

nrbennet


dl_binder_design's Issues

runlist no output

Hi,

According to dl_interface_design.py --help, it should be possible to select specific pdb files with the -runlist flag.
However, I seem to have problems getting any output from this command.

I run it like this:

dl_interface_design.py -pdbdir indir -runlist indir/my.pdb -outpdbdir outdir

stdout shows no errors and the run finishes after a few seconds, but no output is written to outdir.

Am I using the syntax wrongly or is this a bug?

//Jesper

<class 'IndexError'> when using dl_interface_design.py

Hi,
I installed the model and ran dl_interface_design.py on a pdb generated by RFdiffusion, then got this error:
Attempting pose: /home/fyy/Desktop/input/pdb/design_enzyme_0.pdb
Struct with tag /home/fyy/Desktop/input/pdb/design_enzyme_0.pdb failed in 0 seconds with error: <class 'IndexError'>

I changed the pdb and the result was the same. What can I do to solve this?

How to use the addFIXEDlabels.py script, what is the meaning for trb

Hi @nrbennet

I was attempting to design a binder protein with some fixed residue identities in a region, so I want to use this script to label that region. However, I still don't understand how to use it properly. In the help information, this script has two options, --pdbdir and --trbdir. Could you tell me the exact meaning of these options? What does trb represent? In my understanding, maybe I should provide an expression like "A41-40" to stand for residues 41 and 40 in chain A, which should be fixed during design. I guess this script adds a phrase like "FIXED" into pdb files. Please help me; any suggestion is appreciated.

Best regards,
Ning

How to set seqs_per_struct >=2 when relax_cycles =1 ?

Hi, great work, and thanks for sharing the method!
I have one question: it seems that seqs_per_struct must be 1 if relax_cycles > 0, as defined in dl_interface_design.py.
How can I use seqs_per_struct >= 2 when relax is enabled?

tensorflow 2.12.1 requires jax>=0.3.15

Hello,

If I strictly follow the yml file, I would encounter the version conflict error:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tensorflow 2.12.1 requires jax>=0.3.15, but you have jax 0.2.19 which is incompatible.

The conda install has already installed the latest tf 2.12.*, so the jax version specified in the pip section won't be compatible. I am trying to figure out whether I need to downgrade the tf version, but wanted to submit this issue in the meantime in case there's any official guidance.

Thanks in advance,
Frank

Structure with tag designed_tag failed in 0 seconds with error: <class 'IndexError'>

I am attempting to run the dl_interface_design.py tool on a set of designs I generated using RFdiffusion.

Attempting pose: test_binder_0
Struct with tag test_binder_0 failed in 0 seconds with error: <class 'IndexError'>

When I try to run the predict.py on the same file, I get
Test_binder_0 has already been processed. Skipping

Please can I get some help resolving this issue?
Each pdb file only contains one sequence for each peptide.

Adapting initial guess without Rosetta

Hi,
I was wondering if it would be possible to implement the initial guess without relying on Rosetta capabilities (perhaps by translating the work to Biopython?). Thanks so much!

More than two chains?

Hi, I was wondering if this implementation has the architecture to handle more than two chains, for the case where I would like to filter my binders against an oligomer target? I feel like it would be difficult to properly model the inter-chain relationships. Thanks so much and would love to talk more about this!

Conda environment setup does not work

  1. The issue is regarding (from the GitHub page):
    "Your ~/.condarc should look something like this:
    channels:
  2. My ~/.condarc looks as follows:
    (base) [brysting@seneca include]$ conda config --show channels
    channels:
  3. Yet, when I set up the environment, I get an error:

(base) [brysting@seneca include]$ conda env create -f dl_binder_design_original.yml

Platform: linux-64
Collecting package metadata (repodata.json): done
Solving environment: done

Downloading and Extracting Packages

CondaHTTPError: HTTP 401 UNAUTHORIZED for url <https://conda.graylab.jhu.edu/linux-64/pyrosetta-2023.33 release.9c16e13-py39_0.tar.bz2>
Elapsed: 00:00.327765

An HTTP error occurred when trying to retrieve this URL.
HTTP errors are often intermittent, and a simple retry will get you on your way.

Minimized AF2 predictions

Hello,

I am doing some testing of the code with several hundred designs of 7-15 residue peptidic binders targeting a GPCR. I used hotspots to define the binding site during the RFdiffusion prediction.
During the AF2 step, I found major clashes in almost all of them. The lowest pae_interaction value I have so far is 12.4. Are the AF2 predictions minimized/relaxed?

Thanks in advance.

Guidance Needed: Specifying GPU for AF2_initial_guess

I am currently working with the AF2_initial_guess application on an Ubuntu machine equipped with multiple GPUs. I am interested in designating a specific GPU for running this application to optimize performance.

Could anyone provide detailed instructions or point me towards relevant resources that could assist me in achieving this setup? Your help would be greatly appreciated.

Thank you in advance for your time and assistance.
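A common way to pin a process to one GPU (standard NVIDIA/CUDA behavior, not specific to this repo) is the CUDA_VISIBLE_DEVICES environment variable, set before the framework initializes its GPU backend; a sketch:

```python
import os

# Expose only GPU index 1 to this process; JAX/TensorFlow will then see it as
# device 0. This must be set before the GPU backend is initialized, so place it
# before any framework imports (or export it in the shell before launching).
os.environ["CUDA_VISIBLE_DEVICES"] = "1"
```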

Running with alternate AF2 models

Hi Nate,

Thanks for making this available! I was able to get it up and running on my own system very easily and it seems to be working well. I was wondering if you have explored using other alphafold models. Based on the code, it seems like it is set up to only use the model_1_ptm model.

  • Do you have a sense of whether there's a benefit to using all five of the monomer_ptm models?
  • What about the multimer model? I'm curious if it would work better, but there is also the issue that it was trained on some of the complexes I'm testing, which confounds the analysis

Lastly, I've noticed that the runtimes reported in the out.sc file can vary from ~5 s to > 90 s. Is this something you've seen too?

Possible relax protocol issue

I suspect maybe there is an issue with the relax protocol XML file we use.
See the attached test script using pyrosetta, it gives an error:

ERROR: Assertion `chain_index <= pose.conformation().num_chains()` failed.
ERROR:: Exit from: /home/benchmark/rosetta/source/src/core/select/residue_selector/ChainSelector.cc line: 158

However, if I run the rosetta command line, there is no error:
rosetta/2020.08.61146/main/source/bin/relax.static.linuxgccrelease -parser:protocol Relax.xml -in:file:s 1crn.pdb1 -corrections::beta_nov16 true -overwrite

Wonder if you can help check if the issue is with the Relax.xml (which is the same as the RosettaFastRelaxUtil.xml file used in dl_binder_design) or I should file the issue with pyrosetta. Thanks a lot!

debug.zip

Run dl_binder_design with already generated backbones

Dear all,
is it possible to run the dl_binder_design pipeline using a backbone and/or a series of sequences already generated with ProteinMPNN? How can I input the target pdb if I already have those generated backbones/sequences?

Thanks a lot!

Marco

Why can't GPU be used for inference?

When I run interfaceAF2predict.py and dl_interface_design.py to run inference, I found only the CPU being used. I'm sure CUDA has been installed by the .yml file, and the GPU works fine in other programs. So how can I use my GPU for inference?

Segmentation fault: pyrosetta & tensorflow

Hi all,

You weren't kidding about the tensorflow/GPU issues... it's a pain.

I have tried many combinations. I can install tensorflow and have it recognise the GPU, however importing tensorflow and then pyrosetta causes an immediate segmentation fault. Alternatively, importing pyrosetta and then tensorflow causes a floating point exception. I have tracked the issue down to those two imports. It seems agnostic to python 3.8, 3.9, 3.10 and 3.11.

A minimally working example could probably be made by installing pyrosetta and tensorflow.

mamba create -n error_env
mamba install pyrosetta
python -m pip install "tensorflow[and-cuda]"
Python 3.9.18 | packaged by conda-forge | (main, Aug 30 2023, 03:49:32) 
[GCC 12.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow.compat.v1 as tf
2023-11-07 14:50:01.994451: I tensorflow/core/util/port.cc:111] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-11-07 14:50:02.022545: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2023-11-07 14:50:02.022586: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2023-11-07 14:50:02.022607: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-11-07 14:50:02.028142: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
>>> tf.config.list_physical_devices('GPU')
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:1', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:2', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:3', device_type='GPU')]
>>> import pyrosetta
Segmentation fault (core dumped)

(Note: I have tried export TF_ENABLE_ONEDNN_OPTS=0, however this does nothing).

Reversing the imports:

>>> import pyrosetta
>>> import tensorflow.compat.v1 as tf
2023-11-07 14:51:10.878175: I tensorflow/core/util/port.cc:111] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-11-07 14:51:10.909619: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2023-11-07 14:51:10.909652: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2023-11-07 14:51:10.909675: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
Floating point exception (core dumped)

Environment:

# packages in environment at /usr/local/programs/miniconda/envs/dl_binder_design:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       2_gnu    conda-forge
absl-py                   2.0.0              pyhd8ed1ab_0    conda-forge
astunparse                1.6.3                    pypi_0    pypi
biopython                 1.81             py39hd1e30aa_1    conda-forge
blas                      2.16                        mkl    conda-forge
bzip2                     1.0.8                hd590300_5    conda-forge
ca-certificates           2023.7.22            hbcca054_0    conda-forge
cachetools                5.3.2                    pypi_0    pypi
certifi                   2023.7.22                pypi_0    pypi
charset-normalizer        3.3.2                    pypi_0    pypi
contextlib2               21.6.0             pyhd8ed1ab_0    conda-forge
cudatoolkit               11.1.74              h6bb024c_0    nvidia
dm-haiku                  0.0.5                    pypi_0    pypi
dm-tree                   0.1.6                    pypi_0    pypi
flatbuffers               23.5.26                  pypi_0    pypi
gast                      0.5.4                    pypi_0    pypi
google-auth               2.23.4                   pypi_0    pypi
google-auth-oauthlib      1.0.0                    pypi_0    pypi
google-pasta              0.2.0                    pypi_0    pypi
idna                      3.4                      pypi_0    pypi
importlib-metadata        6.8.0                    pypi_0    pypi
intel-openmp              2023.1.0         hdb19cb5_46305  
jax                       0.2.19                   pypi_0    pypi
jaxlib                    0.1.70+cuda111           pypi_0    pypi
jmp                       0.0.4                    pypi_0    pypi
keras                     2.14.0                   pypi_0    pypi
ld_impl_linux-64          2.40                 h41732ed_0    conda-forge
libblas                   3.8.0                    16_mkl    conda-forge
libcblas                  3.8.0                    16_mkl    conda-forge
libclang                  16.0.6                   pypi_0    pypi
libffi                    3.4.2                h7f98852_5    conda-forge
libgcc-ng                 13.2.0               h807b86a_2    conda-forge
libgfortran-ng            7.5.0               h14aa051_20    conda-forge
libgfortran4              7.5.0               h14aa051_20    conda-forge
libgomp                   13.2.0               h807b86a_2    conda-forge
liblapack                 3.8.0                    16_mkl    conda-forge
liblapacke                3.8.0                    16_mkl    conda-forge
libnsl                    2.0.1                hd590300_0    conda-forge
libsqlite                 3.44.0               h2797004_0    conda-forge
libstdcxx-ng              13.2.0               h7e041cc_2    conda-forge
libuuid                   2.38.1               h0b41bf4_0    conda-forge
libuv                     1.46.0               hd590300_0    conda-forge
libzlib                   1.2.13               hd590300_5    conda-forge
markdown                  3.5.1                    pypi_0    pypi
markupsafe                2.1.3                    pypi_0    pypi
mkl                       2020.2                      256  
ml-collections            0.1.1              pyhd8ed1ab_0    conda-forge
ml-dtypes                 0.2.0                    pypi_0    pypi
ncurses                   6.4                  h59595ed_2    conda-forge
ninja                     1.11.1               h924138e_0    conda-forge
numpy                     1.22.4           py39hc58783e_0    conda-forge
nvidia-cublas-cu11        11.11.3.6                pypi_0    pypi
nvidia-cuda-cupti-cu11    11.8.87                  pypi_0    pypi
nvidia-cuda-nvcc-cu11     11.8.89                  pypi_0    pypi
nvidia-cuda-runtime-cu11  11.8.89                  pypi_0    pypi
nvidia-cudnn-cu11         8.7.0.84                 pypi_0    pypi
nvidia-cufft-cu11         10.9.0.58                pypi_0    pypi
nvidia-curand-cu11        10.3.0.86                pypi_0    pypi
nvidia-cusolver-cu11      11.4.1.48                pypi_0    pypi
nvidia-cusparse-cu11      11.7.5.86                pypi_0    pypi
nvidia-nccl-cu11          2.16.5                   pypi_0    pypi
oauthlib                  3.2.2                    pypi_0    pypi
openssl                   3.1.4                hd590300_0    conda-forge
opt-einsum                3.3.0                    pypi_0    pypi
pip                       23.3.1             pyhd8ed1ab_0    conda-forge
pyasn1                    0.5.0                    pypi_0    pypi
pyasn1-modules            0.3.0                    pypi_0    pypi
pyrosetta                 2023.44+release.7762b42          py39_0    https://conda.graylab.jhu.edu
python                    3.9.18          h0755675_0_cpython    conda-forge
python_abi                3.9                      4_cp39    conda-forge
pytorch                   1.9.1           py3.9_cuda11.1_cudnn8.0.5_0    pytorch
pyyaml                    6.0.1            py39hd1e30aa_1    conda-forge
readline                  8.2                  h8228510_1    conda-forge
requests                  2.31.0                   pypi_0    pypi
requests-oauthlib         1.3.1                    pypi_0    pypi
rsa                       4.9                      pypi_0    pypi
setuptools                68.2.2             pyhd8ed1ab_0    conda-forge
six                       1.16.0             pyh6c4a22f_0    conda-forge
tabulate                  0.9.0                    pypi_0    pypi
tensorboard               2.14.1                   pypi_0    pypi
tensorboard-data-server   0.7.2                    pypi_0    pypi
tensorflow                2.14.0                   pypi_0    pypi
tensorflow-estimator      2.14.0                   pypi_0    pypi
tensorflow-io-gcs-filesystem 0.34.0                   pypi_0    pypi
tensorrt                  8.5.3.1                  pypi_0    pypi
termcolor                 2.3.0                    pypi_0    pypi
tk                        8.6.13          noxft_h4845f30_101    conda-forge
typing_extensions         4.8.0              pyha770c72_0    conda-forge
tzdata                    2023c                h71feb2d_0    conda-forge
urllib3                   2.0.7                    pypi_0    pypi
werkzeug                  3.0.1                    pypi_0    pypi
wheel                     0.41.3             pyhd8ed1ab_0    conda-forge
wrapt                     1.14.1                   pypi_0    pypi
xz                        5.2.6                h166bdaf_0    conda-forge
yaml                      0.2.5                h7f98852_2    conda-forge
zipp                      3.17.0                   pypi_0    pypi
zlib                      1.2.13               hd590300_5    conda-forge

I'm fairly happy that tensorflow is working correctly and the CUDA versions are correct. My system is running Driver Version: 510.39.01 CUDA Version: 11.6

>>> import tensorflow.compat.v1 as tf
>>> tf.sysconfig.get_build_info()
OrderedDict([('cpu_compiler', '/usr/lib/llvm-16/bin/clang'), ('cuda_compute_capabilities', ['sm_35', 'sm_50', 'sm_60', 'sm_70', 'sm_75', 'compute_80']), ('cuda_version', '11.8'), ('cudnn_version', '8'), ('is_cuda_build', True), ('is_rocm_build', False), ('is_tensorrt_build', True)])

Conversely, if I install a CPU only version of tensorflow, both pyrosetta and tensorflow cooperate - however inference is very slow (as expected).

I have tried on a system with GTX3090s (Driver 510.39.01, CUDA 11.6) and a system with an A6000 (Driver 535.54.03, CUDA 12.2).

The issue is similar: when the GPU is not detected, i.e. tf.config.list_physical_devices('GPU') reports an empty list [], pyrosetta and tensorflow cooperate. As soon as tf reports physical devices, I encounter segmentation faults. I have tried tf-nightly, the most recent pyrosetta, and various Python versions.

Any help at all would be very, very appreciated.

Kindly,
Charles

Is "AF2 initial guess" supposed to have access to the standard AF2 databases?

Hello,

My installation seems to be running as expected and it produced some beautifully-folded peptide binders with low pae_interaction scores for my project. My question is how is the "AF2 initial guess" pipeline able to generate a predicted structure without access to all the same databases that are normally required to run AF2? i.e.:

    bfd/                                   # ~ 1.8 TB
    mgnify/                                # ~ 64 GB
    params/                                # ~ 3.5 GB
    pdb70/                                 # ~ 56 GB
    pdb_mmcif/                             # ~ 206 GB
    uniclust30/                            # ~ 87 GB
    uniref90/                              # ~ 59 GB

Did I miss an important part of the installation? Is the predict.py script somehow running the full AF2 installation on my machine? Or am I just ignorant of how your implementation of AF2 works?
I just want to make sure that my dl_binder_design installation is configured properly before I put too much trust into the pae_interaction scores and spend money on peptide orders.

Thank you,

Robert Szabla

Question about AlphaFold initial guess

Hi, I saw that the initial guess uses the AF2 pTM model rather than the multimer model for prediction. I wonder whether you have tested the multimer model plus the initial position of the designed complex for in silico validation?

error with af2_binder_design.yml - ImportError: cannot import name 'SCOPData' from 'Bio.Data' - wrong version of biopython

As of 19 Feb 2024, af2_binder_design.yml installs biopython 1.83, resulting in the following error during the import test: ImportError: cannot import name 'SCOPData' from 'Bio.Data' (~/.conda/envs/af2_binder_design/lib/python3.11/site-packages/Bio/Data/__init__.py)

Downgrading from biopython==1.83 to biopython==1.81 resolved the issue and resulted in Found a GPU! This environment passes all import tests. (The 'Bio.Data.SCOPData' module is deprecated as of 1.83.)

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Wed_Nov_22_10:17:15_PST_2023
Cuda compilation tools, release 12.3, V12.3.107
Build cuda_12.3.r12.3/compiler.33567101_0

example script for inputs/pdbs/design_ppi_0.pdb

In the examples directory, there is a file design_ppi_0.pdb with chain A as all GLY and a protein target as chain B, which corresponds to the output from RFdiffusion.

Do you have a script available that will run MPNN on chain A and then run alphafold on it afterwards? Or two separate scripts to achieve the same result?

My environment is fine and runs all of the example scripts so I am only seeking an RFdiffusion pipeline solution.
thank you
logan donaldson

af2_complex running error

/scratch/valiente/anaconda3/envs/dl_binder_design/lib/python3.9/site-packages/Bio/Data/SCOPData.py:18: BiopythonDeprecationWarning: The 'Bio.Data.SCOPData' module will be deprecated in a future release of Biopython in favor of 'Bio.Data.PDBData'.
warnings.warn(
/scratch/valiente/dl_binder_design/af2_initial_guess/af2_util.py:14: UserWarning: Import of 'rosetta' as a top-level module is deprecated and may be removed in 2018, import via 'pyrosetta.rosetta'.
from rosetta import *
No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
WARNING:tensorflow:From /scratch/valiente/dl_binder_design/af2_initial_guess/alphafold/model/tf/input_pipeline.py:151: calling map_fn (from tensorflow.python.ops.map_fn) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Use fn_output_signature instead
PyRosetta-4 2023 [Rosetta PyRosetta4.conda.linux.cxx11thread.serialization.CentOS.python39.Release 2023.33+release.9c16e13c3cc4d3ef76e5869e0a5b44da70cff686 2023-08-18T11:32:04] retrieved from: http://www.pyrosetta.org
(C) Copyright Rosetta Commons Member Institutions. Created in JHU by Sergey Lyskov and PyRosetta Team.
////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////
WARNING! No GPU detected running AF2 on CPU
////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////

Processing struct with tag: design_ppi_0_dldesign_0_cycle1
The distance between residues 95 and 96 is 57.33 A > limit 3.0 A.
I'm going to insert a chainbreak after residue 95
Running model_1_ptm
Traceback (most recent call last):
  File "/scratch/valiente/dl_binder_design/af2_initial_guess/predict.py", line 546, in <module>
    if args.debug: af2_runner.process_struct(pdb)
  File "/scratch/valiente/dl_binder_design/af2_initial_guess/predict.py", line 282, in process_struct
    prediction_result = self.model_runner.apply( self.model_runner.params,
  File "/scratch/valiente/anaconda3/envs/dl_binder_design/lib/python3.9/site-packages/jax/_src/traceback_util.py", line 166, in reraise_with_filtered_traceback
    return fun(*args, **kwargs)
  File "/scratch/valiente/anaconda3/envs/dl_binder_design/lib/python3.9/site-packages/jax/_src/pjit.py", line 253, in cache_miss
    outs, out_flat, out_tree, args_flat, jaxpr = _python_pjit_helper(
  File "/scratch/valiente/anaconda3/envs/dl_binder_design/lib/python3.9/site-packages/jax/_src/pjit.py", line 161, in _python_pjit_helper
    args_flat, _, params, in_tree, out_tree, _ = infer_params_fn(
  File "/scratch/valiente/anaconda3/envs/dl_binder_design/lib/python3.9/site-packages/jax/_src/api.py", line 324, in infer_params
    return pjit.common_infer_params(pjit_info_args, *args, **kwargs)
  File "/scratch/valiente/anaconda3/envs/dl_binder_design/lib/python3.9/site-packages/jax/_src/pjit.py", line 491, in common_infer_params
    jaxpr, consts, canonicalized_out_shardings_flat = _pjit_jaxpr(
  File "/scratch/valiente/anaconda3/envs/dl_binder_design/lib/python3.9/site-packages/jax/_src/pjit.py", line 969, in _pjit_jaxpr
    jaxpr, final_consts, out_type = _create_pjit_jaxpr(
  File "/scratch/valiente/anaconda3/envs/dl_binder_design/lib/python3.9/site-packages/jax/_src/linear_util.py", line 345, in memoized_fun
    ans = call(fun, *args)
  File "/scratch/valiente/anaconda3/envs/dl_binder_design/lib/python3.9/site-packages/jax/_src/pjit.py", line 922, in _create_pjit_jaxpr
    jaxpr, global_out_avals, consts = pe.trace_to_jaxpr_dynamic(
  File "/scratch/valiente/anaconda3/envs/dl_binder_design/lib/python3.9/site-packages/jax/_src/profiler.py", line 314, in wrapper
    return func(*args, **kwargs)
  File "/scratch/valiente/anaconda3/envs/dl_binder_design/lib/python3.9/site-packages/jax/_src/interpreters/partial_eval.py", line 2155, in trace_to_jaxpr_dynamic
    jaxpr, out_avals, consts = trace_to_subjaxpr_dynamic(
  File "/scratch/valiente/anaconda3/envs/dl_binder_design/lib/python3.9/site-packages/jax/_src/interpreters/partial_eval.py", line 2177, in trace_to_subjaxpr_dynamic
    ans = fun.call_wrapped(*in_tracers)
  File "/scratch/valiente/anaconda3/envs/dl_binder_design/lib/python3.9/site-packages/jax/_src/linear_util.py", line 188, in call_wrapped
    ans = self.f(*args, **dict(self.params, **kwargs))
  File "/scratch/valiente/anaconda3/envs/dl_binder_design/lib/python3.9/site-packages/haiku/_src/transform.py", line 127, in apply_fn
    out, state = f.apply(params, {}, *args, **kwargs)
  File "/scratch/valiente/anaconda3/envs/dl_binder_design/lib/python3.9/site-packages/haiku/_src/transform.py", line 383, in apply_fn
    out = f(*args, **kwargs)
  File "/scratch/valiente/dl_binder_design/af2_initial_guess/alphafold/model/model.py", line 60, in _forward_fn
    return model(
  File "/scratch/valiente/anaconda3/envs/dl_binder_design/lib/python3.9/site-packages/haiku/_src/module.py", line 428, in wrapped
    out = f(*args, **kwargs)
  File "/scratch/valiente/anaconda3/envs/dl_binder_design/lib/python3.9/site-packages/haiku/_src/module.py", line 279, in run_interceptors
    return bound_method(*args, **kwargs)
  File "/scratch/valiente/dl_binder_design/af2_initial_guess/alphafold/model/modules.py", line 385, in __call__
    _, prev = hk.while_loop(
  File "/scratch/valiente/anaconda3/envs/dl_binder_design/lib/python3.9/site-packages/haiku/_src/stateful.py", line 621, in while_loop
    val, state = jax.lax.while_loop(pure_cond_fun, pure_body_fun, init_val)
  File "/scratch/valiente/anaconda3/envs/dl_binder_design/lib/python3.9/site-packages/jax/_src/traceback_util.py", line 166, in reraise_with_filtered_traceback
    return fun(*args, **kwargs)
  File "/scratch/valiente/anaconda3/envs/dl_binder_design/lib/python3.9/site-packages/jax/_src/lax/control_flow/loops.py", line 1210, in while_loop
    init_vals, init_avals, body_jaxpr, in_tree, *rest = _create_jaxpr(init_val)
  File "/scratch/valiente/anaconda3/envs/dl_binder_design/lib/python3.9/site-packages/jax/_src/lax/control_flow/loops.py", line 1193, in _create_jaxpr
    body_jaxpr, body_consts, body_tree = _initial_style_jaxpr(
  File "/scratch/valiente/anaconda3/envs/dl_binder_design/lib/python3.9/site-packages/jax/_src/lax/control_flow/common.py", line 65, in _initial_style_jaxpr
    jaxpr, consts, out_tree = _initial_style_open_jaxpr(
  File "/scratch/valiente/anaconda3/envs/dl_binder_design/lib/python3.9/site-packages/jax/_src/lax/control_flow/common.py", line 59, in _initial_style_open_jaxpr
    jaxpr, _, consts = pe.trace_to_jaxpr_dynamic(wrapped_fun, in_avals, debug)
  File "/scratch/valiente/anaconda3/envs/dl_binder_design/lib/python3.9/site-packages/jax/_src/profiler.py", line 314, in wrapper
    return func(*args, **kwargs)
  File "/scratch/valiente/anaconda3/envs/dl_binder_design/lib/python3.9/site-packages/jax/_src/interpreters/partial_eval.py", line 2155, in trace_to_jaxpr_dynamic
    jaxpr, out_avals, consts = trace_to_subjaxpr_dynamic(
  File "/scratch/valiente/anaconda3/envs/dl_binder_design/lib/python3.9/site-packages/jax/_src/interpreters/partial_eval.py", line 2177, in trace_to_subjaxpr_dynamic
    ans = fun.call_wrapped(*in_tracers)
  File "/scratch/valiente/anaconda3/envs/dl_binder_design/lib/python3.9/site-packages/jax/_src/linear_util.py", line 188, in call_wrapped
    ans = self.f(*args, **dict(self.params, **kwargs))
  File "/scratch/valiente/anaconda3/envs/dl_binder_design/lib/python3.9/site-packages/haiku/_src/stateful.py", line 616, in pure_body_fun
    val = body_fun(val)
  File "/scratch/valiente/dl_binder_design/af2_initial_guess/alphafold/model/modules.py", line 377, in <lambda>
    get_prev(do_call(x[1], recycle_idx=x[0],
  File "/scratch/valiente/dl_binder_design/af2_initial_guess/alphafold/model/modules.py", line 339, in do_call
    return impl(
  File "/scratch/valiente/anaconda3/envs/dl_binder_design/lib/python3.9/site-packages/haiku/_src/module.py", line 428, in wrapped
    out = f(*args, **kwargs)
  File "/scratch/valiente/anaconda3/envs/dl_binder_design/lib/python3.9/site-packages/haiku/_src/module.py", line 279, in run_interceptors
    return bound_method(*args, **kwargs)
  File "/scratch/valiente/dl_binder_design/af2_initial_guess/alphafold/model/modules.py", line 165, in __call__
    representations = evoformer_module(batch0, is_training)
  File "/scratch/valiente/anaconda3/envs/dl_binder_design/lib/python3.9/site-packages/haiku/_src/module.py", line 428, in wrapped
    out = f(*args, **kwargs)
  File "/scratch/valiente/anaconda3/envs/dl_binder_design/lib/python3.9/site-packages/haiku/_src/module.py", line 279, in run_interceptors
    return bound_method(*args, **kwargs)
  File "/scratch/valiente/dl_binder_design/af2_initial_guess/alphafold/model/modules.py", line 1771, in __call__
    template_pair_representation = TemplateEmbedding(c.template, gc)(
  File "/scratch/valiente/anaconda3/envs/dl_binder_design/lib/python3.9/site-packages/haiku/_src/module.py", line 428, in wrapped
    out = f(*args, **kwargs)
  File "/scratch/valiente/anaconda3/envs/dl_binder_design/lib/python3.9/site-packages/haiku/_src/module.py", line 279, in run_interceptors
    return bound_method(*args, **kwargs)
  File "/scratch/valiente/dl_binder_design/af2_initial_guess/alphafold/model/modules.py", line 2066, in __call__
    template_pair_representation = mapping.sharded_map(map_fn, in_axes=0)(
  File "/scratch/valiente/dl_binder_design/af2_initial_guess/alphafold/model/mapping.py", line 141, in mapped_fn
    remainder_shape_dtype = hk.eval_shape(
  File "/scratch/valiente/anaconda3/envs/dl_binder_design/lib/python3.9/site-packages/haiku/_src/stateful.py", line 688, in eval_shape
    out_shape = jax.eval_shape(stateless_fun, internal_state(), *args, **kwargs)
  File "/scratch/valiente/anaconda3/envs/dl_binder_design/lib/python3.9/site-packages/jax/_src/traceback_util.py", line 166, in reraise_with_filtered_traceback
    return fun(*args, **kwargs)
  File "/scratch/valiente/anaconda3/envs/dl_binder_design/lib/python3.9/site-packages/jax/_src/api.py", line 2807, in eval_shape
    out = pe.abstract_eval_fun(wrapped_fun.call_wrapped,
  File "/scratch/valiente/anaconda3/envs/dl_binder_design/lib/python3.9/site-packages/jax/_src/interpreters/partial_eval.py", line 670, in abstract_eval_fun
    _, avals_out, _ = trace_to_jaxpr_dynamic(
  File "/scratch/valiente/anaconda3/envs/dl_binder_design/lib/python3.9/site-packages/jax/_src/profiler.py", line 314, in wrapper
    return func(*args, **kwargs)
  File "/scratch/valiente/anaconda3/envs/dl_binder_design/lib/python3.9/site-packages/jax/_src/interpreters/partial_eval.py", line 2155, in trace_to_jaxpr_dynamic
    jaxpr, out_avals, consts = trace_to_subjaxpr_dynamic(
  File "/scratch/valiente/anaconda3/envs/dl_binder_design/lib/python3.9/site-packages/jax/_src/interpreters/partial_eval.py", line 2177, in trace_to_subjaxpr_dynamic
    ans = fun.call_wrapped(*in_tracers)
  File "/scratch/valiente/anaconda3/envs/dl_binder_design/lib/python3.9/site-packages/jax/_src/linear_util.py", line 188, in call_wrapped
    ans = self.f(*args, **dict(self.params, **kwargs))
  File "/scratch/valiente/anaconda3/envs/dl_binder_design/lib/python3.9/site-packages/jax/_src/linear_util.py", line 188, in call_wrapped
    ans = self.f(*args, **dict(self.params, **kwargs))
  File "/scratch/valiente/anaconda3/envs/dl_binder_design/lib/python3.9/site-packages/haiku/_src/stateful.py", line 684, in stateless_fun
    out = fun(*args, **kwargs)
  File "/scratch/valiente/dl_binder_design/af2_initial_guess/alphafold/model/mapping.py", line 139, in apply_fun_to_slice
    return fun(*input_slice)
  File "/scratch/valiente/anaconda3/envs/dl_binder_design/lib/python3.9/site-packages/haiku/_src/stateful.py", line 576, in mapped_fun
    out, state = mapped_pure_fun(args, state)
  File "/scratch/valiente/anaconda3/envs/dl_binder_design/lib/python3.9/site-packages/jax/_src/traceback_util.py", line 166, in reraise_with_filtered_traceback
    return fun(*args, **kwargs)
  File "/scratch/valiente/anaconda3/envs/dl_binder_design/lib/python3.9/site-packages/jax/_src/api.py", line 1258, in vmap_f
    out_flat = batching.batch(
  File "/scratch/valiente/anaconda3/envs/dl_binder_design/lib/python3.9/site-packages/jax/_src/linear_util.py", line 188, in call_wrapped
    ans = self.f(*args, **dict(self.params, **kwargs))
  File "/scratch/valiente/anaconda3/envs/dl_binder_design/lib/python3.9/site-packages/haiku/_src/stateful.py", line 568, in pure_fun
    state_out = difference(state_in, internal_state())
  File "/scratch/valiente/anaconda3/envs/dl_binder_design/lib/python3.9/site-packages/haiku/_src/stateful.py", line 313, in difference
    params_after = jax.tree_multimap(functools.partial(if_changed, is_new_param),
  File "/scratch/valiente/anaconda3/envs/dl_binder_design/lib/python3.9/site-packages/jax/_src/deprecations.py", line 53, in __getattr__
    raise AttributeError(f"module {module!r} has no attribute {name!r}")
jax._src.traceback_util.UnfilteredStackTrace: AttributeError: module 'jax' has no attribute 'tree_multimap'

The stack trace below excludes JAX-internal frames.
The preceding is the original exception that occurred, unmodified.

Some questions about pdb_interfaceAF2predict.py

Hello. Your work is very impressive, and I have some questions.
I would like to use the predicted "predicted_aligned_error" value to pick the most accurate models out of a pool of complex conformations. However, when I ran it, the script seemed to handle only a single input chain. How can I get the "predicted_aligned_error" of the whole complex?
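For reference, a complex-level error can be derived from the full PAE matrix by averaging the two inter-chain blocks; this matches my reading of how the pae_interaction score is defined in the paper, but treat it as a sketch rather than the repo's exact code (it assumes a square pae matrix and a known binder length):

```python
import numpy as np

def pae_interaction(pae: np.ndarray, binderlen: int) -> float:
    """Mean predicted aligned error over the binder/target interface blocks."""
    inter1 = pae[:binderlen, binderlen:].mean()  # binder residues scored against target frames
    inter2 = pae[binderlen:, :binderlen].mean()  # target residues scored against binder frames
    return float((inter1 + inter2) / 2)

# toy 4x4 matrix: a 2-residue binder followed by a 2-residue target
pae = np.array([[ 0.,  0., 10., 10.],
                [ 0.,  0., 10., 10.],
                [20., 20.,  0.,  0.],
                [20., 20.,  0.,  0.]])
print(pae_interaction(pae, 2))  # 15.0
```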

for symmetric multimers

Hi,

I generated a hexameric multimer using RFdiffusion.

And then I wanted to validate whether the designed hexamer will actually assemble as designed.

So, I tried to follow the AF2 initial guess protocol mentioned in the supplementary data of the RFdiffusion paper as follows,

/opt/tools/dl_binder_design/af2_initial_guess/predict.py -silent des.silent

but there was an error like this,

Struct with tag run_1_0_mpnn failed in 0 seconds with error: <class 'Exception'>

Is this code currently unable to handle symmetric multimers?

Sincerely,

Jongseo

af2_complex_rmsd score?

Apologies if I missed this somewhere, but is there an easy way to get the RMSD following the execution of the dl_binder_design/af2_initial_guess/predict.py script? Or does that require running another piece of the workflow? I see the various plddt and pae outputs in the .silent file but not the af2_complex_rmsd metric discussed in the methods. Thanks!
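If your version of predict.py does not write that metric, one workaround is to compute an RMSD yourself from the design and prediction coordinates using a standard Kabsch superposition. This is a generic sketch, not the repo's implementation — it assumes you have already extracted matched (N, 3) CA coordinate arrays for the region you care about:

```python
import numpy as np

def kabsch_rmsd(P: np.ndarray, Q: np.ndarray) -> float:
    """RMSD between two (N, 3) coordinate sets after optimal superposition of P onto Q."""
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    U, _, Vt = np.linalg.svd(Pc.T @ Qc)             # covariance SVD
    d = np.sign(np.linalg.det(Vt.T @ U.T))          # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T         # optimal rotation
    return float(np.sqrt(((Pc @ R.T - Qc) ** 2).sum(axis=1).mean()))

# sanity check: a rotated + translated copy superposes back to ~0 RMSD
rng = np.random.default_rng(0)
P = rng.normal(size=(25, 3))
theta = 0.7
rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                [np.sin(theta),  np.cos(theta), 0.0],
                [0.0, 0.0, 1.0]])
Q = P @ rot.T + np.array([1.0, 2.0, 3.0])
print(round(kabsch_rmsd(P, Q), 6))  # 0.0
```

To mimic a binder RMSD, superpose on target CAs and evaluate the residual on binder CAs.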

interfaceAF2predict.py

Hey nrbennet,
Really amazing work !
When I run interfaceAF2predict.py it returns the error: FileNotFoundError: [Errno 2] No such file or directory: '/projects/ml/alphafold/params/params_model_1_ptm.npz'

Any suggestion would be appreciated !

IndexError in dl_interface_design.py

Hello,
I'm encountering an issue/need assistance and I'm hoping to get your guidance.

I have activated the dl_binder_design environment and run command like:
~/autodl-tmp/dl_binder_design/mpnn_fr/dl_interface_design.py -silent lhyal_binder.silent -output_intermediates -checkpoint_path ~/autodl-tmp/dl_binder_design/mpnn_fr/ProteinMPNN/vanilla_model_weights/v_48_020.pt
The .silent file was generated from the .pdb files output by RFdiffusion.

But I get the following error:

core.pack.guidance_scoreterms.approximate_buried_unsat_penalty: Rough mem use: 31640 bytes
protocols.relax.FastRelax: CMD: accept_to_best  726.868  22.7618  0.596962  0.55
core.pack.guidance_scoreterms.approximate_buried_unsat_penalty: Building hbond graph
core.pack.guidance_scoreterms.approximate_buried_unsat_penalty: Hbond graph has: 266 edges requiring: 173528 bytes
core.chemical.AtomICoor: [ WARNING ] IcoorAtomID::atom_id(): Cannot get atom_id for POLYMER_LOWER of residue LEU 93.  Returning BOGUS ID instead.
core.conformation.Residue: [ WARNING ] missing an atom: 93  H   that depends on a nonexistent polymer connection! 
core.conformation.Residue: [ WARNING ]  --> generating it using idealized coordinates.
core.scoring.atomic_depth.AtomicDepth: actual boxlength 151, box[132*151*136], resolution  0.504

core.pack.guidance_scoreterms.approximate_buried_unsat_penalty: Rough mem use: 31640 bytes
protocols.relax.FastRelax: CMD: endrepeat  726.868  22.7618  0.596962  0.55
core.pack.guidance_scoreterms.approximate_buried_unsat_penalty: Building hbond graph
core.pack.guidance_scoreterms.approximate_buried_unsat_penalty: Hbond graph has: 266 edges requiring: 173528 bytes
core.chemical.AtomICoor: [ WARNING ] IcoorAtomID::atom_id(): Cannot get atom_id for POLYMER_LOWER of residue LEU 93.  Returning BOGUS ID instead.
core.conformation.Residue: [ WARNING ] missing an atom: 93  H   that depends on a nonexistent polymer connection! 
core.conformation.Residue: [ WARNING ]  --> generating it using idealized coordinates.
core.scoring.atomic_depth.AtomicDepth: actual boxlength 151, box[132*151*136], resolution  0.504

core.pack.guidance_scoreterms.approximate_buried_unsat_penalty: Rough mem use: 31640 bytes
protocols::checkpoint: Deleting checkpoints of FastRelax
MPNN generated 1 sequences in 1 seconds
core.io.silent.SilentFileData: [ WARNING ] renamed tag lhyal_binder_9_dldesign_0_cycle0 to lhyal_binder_9_dldesign_0_cycle0_1    (SilentStruct with lhyal_binder_9_dldesign_0_cycle0 already exists!)
lhyal_binder_9 reported success. 1 designs generated in 141 seconds
Attempting pose: lhyal_pocket_0
Traceback (most recent call last):
  File "/root/autodl-tmp/dl_binder_design/mpnn_fr/dl_interface_design.py", line 256, in <module>
    main( pdb, silent_structure, mpnn_model, sfd_in, sfd_out )
  File "/root/autodl-tmp/dl_binder_design/mpnn_fr/dl_interface_design.py", line 199, in main
    dl_design( pose, pdb, silent_structure, mpnn_model, sfd_out )
  File "/root/autodl-tmp/dl_binder_design/mpnn_fr/dl_interface_design.py", line 163, in dl_design
    chains = get_chains( pose )
  File "/root/autodl-tmp/dl_binder_design/mpnn_fr/dl_interface_design.py", line 115, in get_chains
    endB = endA + pose.split_by_chain()[2].size()
IndexError
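That IndexError comes from get_chains() asking for a second chain (pose.split_by_chain()[2]) when the pose has only one — dl_interface_design.py expects a binder chain A plus a target chain B. A quick PyRosetta-free way to sanity-check inputs before converting them to a silent file (a sketch; it just counts distinct chain IDs on ATOM/HETATM records):

```python
def pdb_chain_ids(pdb_text: str) -> list:
    """Distinct chain IDs on ATOM/HETATM records, in order of first appearance."""
    chains = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            cid = line[21]  # chain ID column (22) in the fixed-width PDB format
            if cid not in chains:
                chains.append(cid)
    return chains

two_chain = (
    "ATOM      1  CA  GLY A   1      11.104  13.207   2.100  1.00  0.00           C\n"
    "ATOM      2  CA  ALA B   1       8.120  13.207   2.100  1.00  0.00           C\n"
)
print(pdb_chain_ids(two_chain))  # ['A', 'B']
```

Any input that does not report exactly two chains will hit this IndexError.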

Is it possible to inpaint a missing part of a protein and design sequences using dl_binder_design?

Hi all,

I am inpainting and repairing part of a protein. Previously, this was performed using RFDesign (https://github.com/RosettaCommons/RFDesign). Now the repo suggests using RFdiffusion instead.
So I inpainted 14 residues of a protein using RFdiffusion and generated a bunch of structures. They look beautiful.
Now I need to generate AA sequences for the structures using the ProteinMPNN-FastRelax protocol, as suggested by the RFdiffusion repo.

After feeding the pdb files to dl_interface_design.py, I got error messages like this:
PyRosetta-4 2023 [Rosetta PyRosetta4.conda.linux.cxx11thread.serialization.CentOS.python39.Release 2023.38+release.52c4cf62073872b07f5cf0623533318b147b5910 2023-09-19T15:49:21] retrieved from: http://www.pyrosetta.org
(C) Copyright Rosetta Commons Member Institutions. Created in JHU by Sergey Lyskov and PyRosetta Team.
Found GPU will run ProteinMPNN on GPU
Attempting pose: /dssg/home/acct-clsljd/clsljd/bin/RFdiffusion/examples/example_outputs/YbtE/0925/8.pdb
Struct with tag /dssg/home/acct-clsljd/clsljd/bin/RFdiffusion/examples/example_outputs/YbtE/0925/8.pdb failed in 1 seconds with error: <class 'IndexError'>

These are the commands I used to run the program:
module load miniconda3
source activate dl_binder_design

python /dssg/home/acct-clsljd/clsljd/bin/dl_binder_design/mpnn_fr/dl_interface_design.py \
    -pdbdir /dssg/home/acct-clsljd/clsljd/bin/RFdiffusion/examples/example_outputs/YbtE/0925 \
    -outpdbdir /dssg/home/acct-clsljd/clsljd/bin/RFdiffusion/examples/Design/YbtE/0925

Thanks for reading this issue thread.

ml_dtypes > 0.2 breaks the af2_binder_design environment

I created the af2_binder_design conda env from af2_binder_design.yml. Conda solved and installed the env with no obvious errors.
But when I try running python importtests/af2_importtest.py in the environment, I get the error:

AttributeError: module 'ml_dtypes' has no attribute 'float8_e4m3b11'. Did you mean: 'float8_e4m3fn'?

After some digging, I found that float8_e4m3b11 was deprecated in version 0.2.0 of ml_dtypes and removed in version 0.3.0.

I can confirm that adding - ml_dtypes==0.2.0 to - pip: in the af2_binder_design.yml file fixes this problem.

Rob
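For anyone else applying Rob's fix, the pin goes under the pip section of include/af2_binder_design.yml (the surrounding entries shown here are placeholders — keep your file's existing contents):

```yaml
dependencies:
  # ...existing conda dependencies...
  - pip:
      # ...existing pip dependencies...
      - ml_dtypes==0.2.0   # float8_e4m3b11 was deprecated in 0.2.0 and removed in 0.3.0
```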

Error when using silent tools

Hi,
After installation according to the yml files provided, I ran ./silent_tools/silentfrompdbs *.pdb > DmKHC.silent, and strange errors appeared, mainly complaining "Cannot find file ... H3i.am1bcc.fa.mm.params". The path to the H3i params seems quite strange:

(screenshot of the error omitted)

Sampling Temperature

In the RFdiffusion paper, the sampling temperature for binder design is set to 0.0001, and on this GitHub page it is also set to 0.0001. When not performing binder design, the default of 0.1 is used. Is the recommended temperature for binder design still 0.0001?

issue with silentfrompdbs

In case anybody else runs into the issue where silentfrompdbs gives the following error:

basic.io.database: [ WARNING ] Unable to locate database file chemical/residue_type_sets/fa_standard/residue_types//home/bcov/from/jason/H3i.am1bcc.fa.mm.params

This appears to be the result of a hard-coded path that presumably worked on the development machine. It looks like an old issue seen in other RoseTTAFold deployments (https://www.rosettacommons.org/node/11552). Changing the original line 53 from:

    $jd2_program -l $tmp_list -out:file:silent $tmp_file -out:file:silent_struct_type binary -extra_res_fa /home/bcov/from/jason/H3i.am1bcc.fa.mm.params 1>&2

to:

    $jd2_program -l $tmp_list -out:file:silent $tmp_file -out:file:silent_struct_type binary

Seems to allow the program to complete, but I'm not sure of the potential downstream ramifications of this change.
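To make the edit reproducible, here is the same change expressed as a sed substitution, demonstrated on the offending line itself so you can see exactly what gets removed (to apply it for real, run sed -i.bak with the same expression on silent_tools/silentfrompdbs):

```shell
# Line 53 of silentfrompdbs, with the machine-specific -extra_res_fa flag stripped.
orig='$jd2_program -l $tmp_list -out:file:silent $tmp_file -out:file:silent_struct_type binary -extra_res_fa /home/bcov/from/jason/H3i.am1bcc.fa.mm.params 1>&2'
fixed=$(printf '%s\n' "$orig" | sed 's| -extra_res_fa [^ ]*H3i\.am1bcc\.fa\.mm\.params||')
printf '%s\n' "$fixed"
```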

Running without silent tools?

Is it possible to run the pipeline without silent tools in order to run in a Python environment without access to PyRosetta?

-checkpoint_path

Hi,
I followed the instructions and get the following error:
dl_interface_design.py: error: the following arguments are required: -checkpoint_path
What should I do?
Thank you!

Fixing residues of binder chain in ProteinMPNN

Hello!

I am using motif scaffolding in RFdiffusion to incorporate a binding motif into a de novo scaffold; however, it seems that ProteinMPNN does not fix the motif sequence and redesigns it. Is there an argument for fixing residues by number?

I've checked dl_interface_design.py for arguments, but only found "-fix_FIXED_res" for residues labeled "FIXED" (in the PDB, I guess), but I don't understand where this label would come from.

Thank you!
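For what it's worth, the FIXED label is read from Rosetta-style PDBinfo-LABEL remark lines in the input PDB — RFdiffusion does not write them itself, and recent versions of this repo ship a helper (helper_scripts/addFIXEDlabels.py, if your copy includes it) that derives them from RFdiffusion's .trb files. A minimal sketch of what those remarks look like, assuming you already know the pose-numbered indices of the residues to fix (the exact column widths here are from memory; compare against a file produced by the helper before relying on them):

```python
def fixed_label_remarks(residue_indices):
    """Rosetta PDBinfo-LABEL remarks marking residues as FIXED (pose numbering)."""
    return [f"REMARK PDBinfo-LABEL:{i:>5} FIXED" for i in residue_indices]

def add_fixed_labels(pdb_text, residue_indices):
    """Prepend FIXED remarks to a PDB so -fix_FIXED_res can pick them up."""
    return "\n".join(fixed_label_remarks(residue_indices)) + "\n" + pdb_text

print(fixed_label_remarks([3, 17])[0])  # REMARK PDBinfo-LABEL:    3 FIXED
```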

af2_complex_error

How can i solve this error?
/scratch/valiente/anaconda3/envs/dl_binder_design/lib/python3.9/site-packages/Bio/Data/SCOPData.py:18: BiopythonDeprecationWarning: The 'Bio.Data.SCOPData' module will be deprecated in a future release of Biopython in favor of 'Bio.Data.PDBData'.
warnings.warn(
/scratch/valiente/dl_binder_design/af2_initial_guess/af2_util.py:14: UserWarning: Import of 'rosetta' as a top-level module is deprecated and may be removed in 2018, import via 'pyrosetta.rosetta'.
from rosetta import *
No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
WARNING:tensorflow:From /scratch/valiente/dl_binder_design/af2_initial_guess/alphafold/model/tf/input_pipeline.py:151: calling map_fn (from tensorflow.python.ops.map_fn) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Use fn_output_signature instead
PyRosetta-4 2023 [Rosetta PyRosetta4.conda.linux.cxx11thread.serialization.CentOS.python39.Release 2023.33+release.9c16e13c3cc4d3ef76e5869e0a5b44da70cff686 2023-08-18T11:32:04] retrieved from: http://www.pyrosetta.org
(C) Copyright Rosetta Commons Member Institutions. Created in JHU by Sergey Lyskov and PyRosetta Team.
////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////
WARNING! No GPU detected running AF2 on CPU
////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////

Processing struct with tag: design_ppi_2_dldesign_0_cycle1
The distance between residues 81 and 82 is 54.63 A > limit 3.0 A.
I'm going to insert a chainbreak after residue 81
Running model_1_ptm
Struct with tag /scratch/valiente/trabajo/2018-2019/Philip_M_Kim/RFdifussion_new_test/protein_mp_fr_out/design_ppi_2_dldesign_0_cycle1.pdb failed in 78 seconds with error: <class 'AttributeError'>
Processing struct with tag: design_ppi_1_dldesign_0_cycle1
The distance between residues 74 and 75 is 24.13 A > limit 3.0 A.
I'm going to insert a chainbreak after residue 74
Running model_1_ptm
Struct with tag /scratch/valiente/trabajo/2018-2019/Philip_M_Kim/RFdifussion_new_test/protein_mp_fr_out/design_ppi_1_dldesign_0_cycle1.pdb failed in 5 seconds with error: <class 'AttributeError'>
Processing struct with tag: design_ppi_4_dldesign_0_cycle1
The distance between residues 91 and 92 is 39.92 A > limit 3.0 A.
I'm going to insert a chainbreak after residue 91
Running model_1_ptm
Struct with tag /scratch/valiente/trabajo/2018-2019/Philip_M_Kim/RFdifussion_new_test/protein_mp_fr_out/design_ppi_4_dldesign_0_cycle1.pdb failed in 5 seconds with error: <class 'AttributeError'>
Processing struct with tag: design_ppi_9_dldesign_0_cycle1
The distance between residues 91 and 92 is 55.87 A > limit 3.0 A.
I'm going to insert a chainbreak after residue 91
Running model_1_ptm
Struct with tag /scratch/valiente/trabajo/2018-2019/Philip_M_Kim/RFdifussion_new_test/protein_mp_fr_out/design_ppi_9_dldesign_0_cycle1.pdb failed in 5 seconds with error: <class 'AttributeError'>
Processing struct with tag: design_ppi_3_dldesign_0_cycle1
The distance between residues 78 and 79 is 62.85 A > limit 3.0 A.
I'm going to insert a chainbreak after residue 78
Running model_1_ptm
Struct with tag /scratch/valiente/trabajo/2018-2019/Philip_M_Kim/RFdifussion_new_test/protein_mp_fr_out/design_ppi_3_dldesign_0_cycle1.pdb failed in 5 seconds with error: <class 'AttributeError'>
Processing struct with tag: design_ppi_0_dldesign_0_cycle1
The distance between residues 95 and 96 is 56.01 A > limit 3.0 A.
I'm going to insert a chainbreak after residue 95
Running model_1_ptm
Struct with tag /scratch/valiente/trabajo/2018-2019/Philip_M_Kim/RFdifussion_new_test/protein_mp_fr_out/design_ppi_0_dldesign_0_cycle1.pdb failed in 5 seconds with error: <class 'AttributeError'>
Processing struct with tag: design_ppi_6_dldesign_0_cycle1
The distance between residues 93 and 94 is 30.94 A > limit 3.0 A.
I'm going to insert a chainbreak after residue 93
Running model_1_ptm
Struct with tag /scratch/valiente/trabajo/2018-2019/Philip_M_Kim/RFdifussion_new_test/protein_mp_fr_out/design_ppi_6_dldesign_0_cycle1.pdb failed in 5 seconds with error: <class 'AttributeError'>
Processing struct with tag: design_ppi_5_dldesign_0_cycle1
The distance between residues 96 and 97 is 69.17 A > limit 3.0 A.
I'm going to insert a chainbreak after residue 96
Running model_1_ptm
Struct with tag /scratch/valiente/trabajo/2018-2019/Philip_M_Kim/RFdifussion_new_test/protein_mp_fr_out/design_ppi_5_dldesign_0_cycle1.pdb failed in 5 seconds with error: <class 'AttributeError'>
Processing struct with tag: design_ppi_7_dldesign_0_cycle1
The distance between residues 71 and 72 is 30.48 A > limit 3.0 A.
I'm going to insert a chainbreak after residue 71
Running model_1_ptm
Struct with tag /scratch/valiente/trabajo/2018-2019/Philip_M_Kim/RFdifussion_new_test/protein_mp_fr_out/design_ppi_7_dldesign_0_cycle1.pdb failed in 5 seconds with error: <class 'AttributeError'>
Processing struct with tag: design_ppi_8_dldesign_0_cycle1
The distance between residues 96 and 97 is 36.09 A > limit 3.0 A.
I'm going to insert a chainbreak after residue 96
Running model_1_ptm
Struct with tag /scratch/valiente/trabajo/2018-2019/Philip_M_Kim/RFdifussion_new_test/protein_mp_fr_out/design_ppi_8_dldesign_0_cycle1.pdb failed in 5 seconds with error: <class 'AttributeError'>

Segmentation fault (core dumped) for CUDA12

I have only installed the environment for af2_binder_design.yml, and encountered an issue when running predict.py.
Segmentation fault (core dumped)
The environment is as follows: NVIDIA-SMI 535.129.03, Driver Version 535.129.03, CUDA Version 12.2
jax 0.4.23
jaxlib 0.4.23+cuda12.cudnn89
May I ask how I can solve this problem? Thank you very much

for interfaceAF2predict.py

Hi,

thank you for your useful repository.

In interfaceAF2predict.py, at line 480,

I think with open(checkpoint_filename, 'a') as f: is missing.

sincerely,

jongseo

Any reason why cysteine is omitted by default in ProteinMPNN? Or is that just an accident?

Hi,

I noticed the AA Cysteine is omitted by default in the ProteinMPNN design in dl_interface_design.py

parser.add_argument( "-omit_AAs", type=str, default='CX', help='A string of all residue types (one letter case-insensitive) that you would not like to use for design. Letters not corresponding to residue types will be ignored (default: CX)' )

https://github.com/nrbennet/dl_binder_design/blob/bfab591db9a70c9c4eaf449c30ce4a1edcb7c989/mpnn_fr/dl_interface_design.py#L64C1-L64C58

Was this left there by accident or is there a reason for this?

Thanks

segmentation fault (core dumped) when running predict.py in af2_binder_design

I ran the import test for af2_binder_design and it passed, but when I run predict.py I get a segmentation fault. I narrowed it down to running out of memory when importing packages, specifically jax.numpy and pyrosetta, but I am on a 40 GB GPU and I have tried as much as 128 GB of memory on my university's HPC. How do I overcome this memory issue?

My jax version is 0.4.23 build pypi_0, channel pypi
jaxlib 0.4.23+cuda12.cudnn89, pypi_0, pypi
jmp 0.0.4 pypi_0, pypi
pyrosetta 2024.01+release.00b7914, py311_0, graylab

Thanks in advance for your help!

How to use the '-fix_FIXED_res' parameter in the dl_interface_design_multi_seq.py

Hi Nate,

As far as I know, in the 2-chain hallucination results, the positions of the residues we want to fix are not static in the generated PDB files, they change, and we can see the exact positions of these fixed residues in the trb files. So how should we pass these positions to '-fix_FIXED_res'? As you know, we usually design a lot of proteins and these results are converted to a silent file, and there is no information about these fixed residues in the silent file.

sincerely,

Lei yang

differenct result of the AF2 initial guess for the same protein

Hi,

I attempted to create a pyrosetta-free version of this tool. The modified code worked well, but I encountered an issue.

For structure modeling, the original tool uses PyRosetta's dump_pdb to generate side chains for the ProteinMPNN-designed sequence (line 125 in af2_util.py). I replaced dump_pdb with OpenMM's addResidue, obtained full-atom structures, and the initial guess ran fine.

However, the pAE value is approximately 7-8 for the structure originating from dump_pdb, but about 27-28 for the one originating from addResidue. The only differences between the two structures were the coordinates of the side-chain atoms; the backbone atoms were identical.

Moreover, when I substituted a structure whose side chains were re-packed with Rosetta, the resulting pAE value was again around 27-28, indicating a non-binder.

I am curious whether the PyRosetta-generated side-chain coordinates are essential for the AF2 initial guess.

Deep Learning Binder Results

Hi @nrbennet ,
I was attempting to design insulin binders of my own (following the example specifications from RFdiffusion) and then running the backbones through MPNN-FastRelax and AF2 initial guess. I compared a potential binder against a benchmark insulin binder from the supplementary material of Improving de novo protein binder design with deep learning, and even though the target template was identical, I got confusing results:

InsulinR_mb:
{'plddt_total': 95.02760208110635, 'plddt_binder': 91.0824370734358, 'plddt_target': 96.7371735844303, 'pae_binder': 2.7015252, 'pae_target': 2.4785695, 'pae_interaction': 4.80579948425293, 'time': 146.24850199604407}

design_ppi_scaffolded_6_dldesign_4:
{'plddt_total': 52.28481075101907, 'plddt_binder': 94.86918808372161, 'plddt_target': 33.83158057351464, 'pae_binder': 1.7483437, 'pae_target': 19.690062, 'pae_interaction': 26.69976043701172, 'time': 16.96811721706763}

I am curious how you would read these outputs: shouldn't the pLDDT of the target be relatively high in both cases, given that the same template is in use (the template atom positions are identical)? Does this indicate that my binder fails when recapitulated and degrades the accuracy of the target structure? Thanks, I would appreciate the help!
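For context, pae_interaction is the headline metric in this pipeline, and a cutoff of pae_interaction < 10 is the commonly used pass threshold from the paper; a plddt_target that collapses to ~34 together with a high pae_interaction usually means AF2 could not confidently place the target relative to the binder, even with identical template coordinates. A small sketch of the kind of filter typically applied to these score dicts (values copied from the two designs above):

```python
# Filter AF2 initial-guess score dicts by the commonly used
# pae_interaction < 10 cutoff (lower is better).
scores = [
    {"name": "InsulinR_mb", "pae_interaction": 4.80579948425293},
    {"name": "design_ppi_scaffolded_6_dldesign_4", "pae_interaction": 26.69976043701172},
]

passing = [s["name"] for s in scores if s["pae_interaction"] < 10.0]
print(passing)  # → ['InsulinR_mb']
```

By this criterion the benchmark design passes and the new design does not, independent of the per-chain pLDDT values.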

Issue with silent file index

Hi! I am using this as part of the RFdiffusion protocol. I am currently running into the following error when ProteinMPNN runs via dl_interface_design.py:

core.io.silent.SilentFileData: Finished reading 1 structures from temp.silent
No GPU found, running MPNN on CPU
Attempting pose: temp
Traceback (most recent call last):
  File "/home/user/Software/dl_binder_design/mpnn_fr/dl_interface_design.py", line 226, in <module>
    main( pdb, silent_structure, mpnn_model, sfd_in, sfd_out )
  File "/home/user/Software/dl_binder_design/mpnn_fr/dl_interface_design.py", line 169, in main
    dl_design( pose, pdb, silent_structure, mpnn_model, sfd_out )
  File "/home/user/Software/dl_binder_design/mpnn_fr/dl_interface_design.py", line 133, in dl_design
    chains = get_chains( pose )
  File "/home/user/Software/dl_binder_design/mpnn_fr/dl_interface_design.py", line 114, in get_chains
    endB = endA + pose.split_by_chain()[2].size()
IndexError

I assume it is having some problem reading my silent file? Is there an example silent file I could use to test whether the problem is my silent file or my environment?
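The traceback points at pose.split_by_chain()[2], which fails when the pose contains only one chain: dl_interface_design.py expects a two-chain pose (binder plus target). A quick plain-Python sanity check (no PyRosetta needed) that can be run on the PDBs before packing them into a silent file:

```python
def chain_ids(pdb_path):
    """Return the distinct chain IDs (PDB column 22) in ATOM/HETATM records."""
    ids = []
    with open(pdb_path) as f:
        for line in f:
            if line.startswith(("ATOM", "HETATM")):
                cid = line[21]
                if cid not in ids:
                    ids.append(cid)
    return ids

# A structure destined for dl_interface_design.py should report two
# chains, e.g. ['A', 'B']; a single-chain PDB reproduces the IndexError.
```

If any input reports only one chain ID, that PDB (not the environment) is the likely culprit.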

GPU error: Segmentation fault (core dumped)

Hi experts,

Thanks for your nice work on binder design. It's pretty good. I have attempted to use this code and have a question: when running the AF2 predict.py script, I see a warning like "warning: Linking two modules of different target triples: 'LLVMDialectModule' is 'nvptx64-nvidia-gpulibs' whereas '' is 'nvptx64-nvidia-cuda'", followed by "Segmentation fault (core dumped)". I don't know whether the warning is harmless. The error possibly suggests that GPU memory is insufficient; my GPU is an RTX 3090 with 24 GB of memory. Is there an option to control memory use, like a batch-size option or something similar? Your suggestions would be very helpful. Thanks in advance.

Sincerely,
Ning
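Regarding an option to limit memory use: JAX preallocates roughly 90% of GPU memory at startup by default, which can collide with other allocators (e.g. TensorFlow or PyRosetta in the same process) on a 24 GB card. These standard XLA environment variables, exported before launching predict.py, control that behavior (a sketch of a common mitigation, not a guaranteed fix; the target-triple linker warning itself is usually harmless):

```shell
# Don't let JAX grab ~90% of GPU memory up front; allocate on demand instead.
export XLA_PYTHON_CLIENT_PREALLOCATE=false
# Or cap the preallocated fraction (pick one strategy or the other).
export XLA_PYTHON_CLIENT_MEM_FRACTION=0.80
# Then run predict.py as usual, e.g.:
# python af2_initial_guess/predict.py ...
```

Preallocation behavior only takes effect if the variables are set before JAX is first imported, so exporting them in the job script (not inside Python after imports) is the safe pattern.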

Floating point exception (core dumped) when testing the conda env from af2_binder_design.yml

Hi @nrbennet ,

I'm trying to install the library in a Python 3.11 / Ubuntu 22.04 environment; however, when I run importtests/af2_importtest.py, I get a "Floating point exception (core dumped)" error.
After adding debug logging statements to the test script, the error seems to come from the
import tensorflow.compat.v1 as tf statement in dl_binder_design/af2_initial_guess/alphafold/model/model.py, line 18.

Have you seen this issue before?

Could you maybe share with us a requirements.txt file created by pip freeze with a frozen set of compatible Python dependencies?

Thanks,
Daniel
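Until a pinned requirements file lands in the repo, anyone with a working install can generate one themselves; running this inside the activated af2_binder_design conda environment captures the exact versions in the standard pip freeze format:

```shell
# Export exact versions of every package in the currently active environment.
python -m pip freeze > requirements.txt
```

Comparing such a file from a working machine against the broken environment is often the fastest way to spot an incompatible tensorflow/jaxlib pin.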

Majority of the sequence is composed of lysines

Hi,

I noticed that the predicted sequence output consists mostly of lysines (K) for my structures. I am predicting sequences for binder outputs from RFdiffusion.

Is this expected? Is there anything to be done to counter this?

Thanks

Warnings from dl_interface_design.py

Hi,

I tried to use dl_interface_design.py to design new sequences for my binder, and I get warnings like these:

'core.chemical.AtomICoor: [ WARNING ] IcoorAtomID::atom_id(): Cannot get atom_id for POLYMER_LOWER of residue LEU 72. Returning BOGUS ID instead.'

'core.conformation.Residue: [ WARNING ] missing an atom: 72 H that depends on a nonexistent polymer connection! '

Is this normal?

sincerely,

Lei yang

pyjd2 in silentfrompdb

Hi, I am trying to use silentfrompdbs from silent_tools; however, the following error appeared:

./include/silent_tools/silentfrompdbs: line 51: pyjd2: command not found
./include/silent_tools/silentfrompdbs: line 53: silentls: command not found
./include/silent_tools/silentfrompdbs: line 53: silentrename: command not found
rm: cannot remove 'tmp_4T902UGUZSohZ.silent': No such file or directory
rm: cannot remove 'tmp_4T902UGUZSohZ.silent.idx': No such file or directory

It seems that pyjd2 could not be found by the silentfrompdbs script; any suggestions are appreciated. Thanks.
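The errors suggest silentfrompdbs invokes its helper scripts (pyjd2, silentls, silentrename) by bare name, so the directory containing them must be on your PATH; calling silentfrompdbs by relative path is not enough. A sketch (the checkout location is a placeholder for your own install; if pyjd2 lives in a different directory of the repo, add that directory as well):

```shell
# Placeholder path: point this at your own dl_binder_design checkout.
export PATH="$PATH:$HOME/dl_binder_design/include/silent_tools"
# Afterwards silentfrompdbs can resolve silentls / silentrename by name, e.g.:
# silentfrompdbs *.pdb > my_designs.silent
```

Adding the export to ~/.bashrc (or the job script) makes the fix persistent across sessions.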
