Giter Club home page Giter Club logo

Comments (13)

msuruzhon avatar msuruzhon commented on June 12, 2024 1

This error is due to some artifacts in the code back when I was refactoring that class. I have created some new commits which you can download from the essexlab/label/dev conda channel under the developmental 1.2.1 version. That being said, I have found another issue with the alignment, so the best bet until i fix it for this case is to turn off the recursive bond breaking algorithm (which is supposed to grow the MCS structure to a bigger physical one) during the first part of the alignment, namely:

from ProtoCaller.Ensemble import Perturbation
pert = Perturbation(ligands_proto[0], ligands_proto[1])
pert.alignToReference(protein_proto.ligand_ref, mcs_parameters=dict(break_recursively=False))
pert.alignToEachOther(mcs_parameters=dict(keep_stereo=False))
morphs = [pert]

This molecule seems to present a lot of problems for the software, so this makes it a very useful test case when the bugs are finally addressed. I will probably add it to the unit tests.

from protocaller.

msuruzhon avatar msuruzhon commented on June 12, 2024

Hi, the code works for me in version 1.2.0, and I also have 8 GB RAM. That being said, it does take a long time and a lot of memory due to the structural differences between the reference crystal ligand and the ligands of interest. This means that the maximum common substructure algorithm needs a lot of computational power to come up with structures that might be obvious to us, or even worse, find another solution to the problem that we are not interested in. That's why I would highly recommend engineering a custom reference ligand (with e.g. Avogadro) which is more similar to your ligands of interest and the code shouldn't struggle in this case.

Otherwise, I suspect that I could optimise the memory consumption and/or the speed of the code in the case of a relatively small common substructure but that part of the code is very complicated and I don't know when I will get the time to do that, and I am also conscious about introducing new bugs to that part of the code.

from protocaller.

kexul avatar kexul commented on June 12, 2024

Hi @msuruzhon , how about use a core-constrained docking here. In Merck KGaa's paper , they switched from flexible ligand alignment to core-constrained docking in most cases. I've found a nice implement by rdkit and rdock.

from protocaller.

kexul avatar kexul commented on June 12, 2024

Here is my alignment result generate by protocaller 1.2.0. Should it be considered as a good alignment?
image

By the way , this is the result of older version of protocaller.
image

Here are the .gro files.
morph.zip

from protocaller.

msuruzhon avatar msuruzhon commented on June 12, 2024

This alignment is an artifact of the current MCS algorithm not handling the (known) case where the stereochemical label of an atom changes between molecules, resulting in a diminished MCS, and therefore bad alignment. It seems that this case should be handled properly, but it will take some time for me to implement it. Until then, a practical solution for that is to create a perturbation "manually" by discarding stereochemical mapping, e.g.:

from ProtoCaller.Ensemble import Perturbation
pert = Perturbation(ligands_proto[0], ligands_proto[1])
pert.alignToReference(protein_proto.ligand_ref)
pert.alignToEachOther(mcs_parameters=dict(keep_stereo=False))
morphs = [pert]

and then pass the morphs to the Ensemble constructor. This should hopefully work for now until I implement a fix at some point.

As for docking, this is certainly an interesting functionality, but it's unfortunately not something I can devote much attention to. If you want to experiment with it, you can certainly extract the parametrised complex_template object from the parametrised protein and save it as a pdb, which you can then use for docking. Afterwards, you can use the resulting ligand coordinates to create the Ligand object, and use the ligand as its own reference, so that no alignment is actually done. I hope this makes sense somewhat. If you manage to figure out how to do a nice workflow, you are welcome to create a pull request for me to review in the future.

from protocaller.

kexul avatar kexul commented on June 12, 2024

I'm glad to implement a docking based workflow, I'll create a pull request as soon as possible.

By the way, the code to disable stereo mapping raise an error here:

INFO:numexpr.utils:NumExpr defaulting to 4 threads.
INFO:root:Downloading ligand files from the Protein Data Bank...
INFO:root:Parametrising ligand ligand0...
WARNING:root:Cannot parametrise unprotonated ligand. Protonating first with default parameters...
INFO:root:Running OpenBabel... 
INFO:root:Running OpenBabel... 
INFO:root:Running antechamber... 
INFO:root:Running parmchk2... 
INFO:root:Running tleap... 
INFO:root:Parametrising ligand ligand1...
WARNING:root:Cannot parametrise unprotonated ligand. Protonating first with default parameters...
INFO:root:Running OpenBabel... 
INFO:root:Running OpenBabel... 
INFO:root:Running antechamber... 
INFO:root:Running parmchk2... 
INFO:root:Running tleap... 
INFO:root:Running PDB2PQR... 
INFO:root:Running OpenBabel... 
INFO:root:Parametrising original crystal system...
INFO:root:Running tleap... 
INFO:root:Running tleap... 
INFO:root:Creating morph ligand0~ligand1...
INFO:root:Morph ligand0~ligand1 is already aligned to a reference
INFO:root:Morph ligand0~ligand1 is already aligned to a reference
Traceback (most recent call last):
  File "prepare.py", line 49, in <module>
    prepare_system(ligands)
  File "prepare.py", line 43, in prepare_system
    system.prepareComplexes()
  File "/root/miniconda3/envs/uii/lib/python3.7/site-packages/ProtoCaller/Ensemble/__init__.py", line 203, in prepareComplexes
    morph_BSS, mcs =  morph.alignAndCreateMorph(self.protein.ligand_ref)
TypeError: cannot unpack non-iterable NoneType object

Here is my code:

import logging
import os
logging.basicConfig(level=logging.INFO)

from ProtoCaller.Utils.fileio import Dir
from ProtoCaller.Ensemble import Ligand, Protein, Ensemble, Perturbation

def prepare_system(ligands) -> None: 
    """
    pdb_file: cocrystal file name 
    ligands: list of ligands file name
    """
    with Dir('SYSTEM', overwrite=True):
        # create a protein from its PDB code and the residue number of the ligand
        # we are going to use for mapping
        #lig = Ligand('/proto/ref.pdb', minimise=False, protonated=True)
        protein_proto = Protein('3eyg', ligand_ref='1')

        ligands_proto = []
        for idx, item in enumerate(ligands):
            li = Ligand(item, protonated=False, minimise=False, name=f"ligand{idx}", workdir="Ligands")
            ligands_proto.append(li)

        # create the morphs from the ligands
        pert = Perturbation(ligands_proto[0], ligands_proto[1])
        pert.alignToReference(protein_proto.ligand_ref)
        pert.alignToEachOther(mcs_parameters=dict(keep_stereo=False))
        morphs = [pert]

        # create a system from the protein and the morphs and set up some default
        # settings regarding system preparation
        system = Ensemble("GROMACS", protein=protein_proto, morphs=morphs,
                          box_length_complex=7, ligand_ff="gaff2",
                          workdir=protein_proto.workdir.path)

        # only keep the reference ligand and keep all crystallographic waters
        system.protein.filter(ligands=None, waters="all")
        # protonate the protein using PDB2PQR and add missing atoms / residues if
        # needed
        system.protein.prepare()
        # prepare the complex and solvated leg starting structures and save them as
        # GROMACS input
        system.prepareComplexes()

if __name__ == '__main__':
    ligands = ['/proto/mola.pdb', 
            '/proto/molb.pdb']

    prepare_system(ligands)

from protocaller.

kexul avatar kexul commented on June 12, 2024

Hi, I updated my protocaller and used the code you provided, the alignment seems perfect now! Thank you so much!

But, it seems like, some abnormal bonds showed in the ligand in morph.gro and complex_final.gro.
For example: atom 46 and 13; atom 46 and 14.
image
The following mdrun failed with LINCS warning complaining about atom distance.

Here is my output files.
3eyg.zip

Is it reasonable? Is there any clue to diagnose the morph.gro and complex_final.gro ? Since the alignment file was temporary file that were removed in the output folder in the normal pipeline.

from protocaller.

msuruzhon avatar msuruzhon commented on June 12, 2024

This is strange, because I get the following alignment when I run it:
snap

These are the output files + the input script:
Files.zip

Maybe you can find the differences between your script and mine? Otherwise this doesn't make any sense.

Also, when you call the alignment method manually, it should create aligned PDB and INPCRD files in the folder you are calling it from, so you can visualise them.

from protocaller.

kexul avatar kexul commented on June 12, 2024

Hi, @msuruzhon , I've checked the difference between our results. The alignment result of ligand1, i.e. molb was similar.
But, in my result, atom 30 was close to atom 9 and atom 42, their distances were 0.9 angstrom. In your result, the same residual was close to the heterocycle, but the closest distance, i.e. the distance between atom 31 and atom 10 was 1.4 angstrom.

image

image

Here is my result of alignment.
alignment.zip

Most of our code are identical, except you parametise the ligands before createing the system.
Here is my code used to generate the system:

import logging
import os
logging.basicConfig(level=logging.INFO)

from ProtoCaller.Utils.fileio import Dir
from ProtoCaller.Ensemble import Ligand, Protein, Ensemble, Perturbation

def prepare_system(ligands) -> None: 
    """
    pdb_file: cocrystal file name 
    ligands: list of ligands file name
    """
    with Dir('SYSTEM', overwrite=True):
        # create a protein from its PDB code and the residue number of the ligand
        # we are going to use for mapping
        #lig = Ligand('/proto/ref.pdb', minimise=False, protonated=True)
        protein_proto = Protein('3eyg', ligand_ref='1')

        ligands_proto = []
        for idx, item in enumerate(ligands):
            li = Ligand(item, protonated=False, minimise=False, name=f"ligand{idx}", workdir="Ligands")
            ligands_proto.append(li)

        # create the morphs from the ligands
        pert = Perturbation(ligands_proto[0], ligands_proto[1])
        pert.alignToReference(protein_proto.ligand_ref, mcs_parameters=dict(break_recursively=False))
        pert.alignToEachOther(mcs_parameters=dict(keep_stereo=False))
        morphs = [pert]

        # create a system from the protein and the morphs and set up some default
        # settings regarding system preparation
        system = Ensemble("GROMACS", protein=protein_proto, morphs=morphs,
                          box_length_complex=7, ligand_ff="gaff2",
                          workdir=protein_proto.workdir.path)

        # only keep the reference ligand and keep all crystallographic waters
        system.protein.filter(ligands=None, waters="all")
        # protonate the protein using PDB2PQR and add missing atoms / residues if
        # needed
        system.protein.prepare()
        # prepare the complex and solvated leg starting structures and save them as
        # GROMACS input
        system.prepareComplexes()

if __name__ == '__main__':
    ligands = ['/proto/mola.pdb', 
            '/proto/molb.pdb']

    prepare_system(ligands)

I think the difference may due to the randomness of rdkit minimization outside of the MCS part.

from protocaller.

msuruzhon avatar msuruzhon commented on June 12, 2024

It does seem to be the case that the minimisation algorithm introduces randomness - I also get different results from different runs. I guess that's another argument for designing custom reference ligands, as alignment can't always be trusted.

from protocaller.

kexul avatar kexul commented on June 12, 2024

Another bad case where default parameters will not work correctly.
test_align.zip

Here is the minimum code to reproduce, it will consume all of memory in my machine.

from ProtoCaller.Ensemble import Ligand, Perturbation

lig1 = Ligand('mol810_protonated.sdf', protonated=True, minimise=False,
                      parametrised_files=['mol810.prmtop', 'mol810.inpcrd'])
lig2 = Ligand('mol812_protonated.sdf', protonated=True, minimise=False,
              parametrised_files=['mol812.prmtop', 'mol812.inpcrd'])
pert = Perturbation(lig1, lig2)
pert.alignToEachOther()

When using mcs_parameters=dict(break_recursively=False), it will work correctly.

from protocaller.

msuruzhon avatar msuruzhon commented on June 12, 2024

This won't be straightforward to fix, and I am considering turning off the recursive bond breaking algorithm by default until it is viable.

from protocaller.

kexul avatar kexul commented on June 12, 2024

Hi @msuruzhon , recently, I tried the rmsdAlign function of BioSimSpace, it produced some good results and may be a good fallback plan when ProtoCaller couldn't do well in some cases.

It's easy to implement it thanks to the modular design of ProtoCaller, here is the code snippet I've used:

class BSSPerturbation(Perturbation):
    def __init__(self, ligand1, ligand2, name=None):
        Perturbation.__init__(self, ligand1, ligand2, name)

    def alignAndCreateMorph(self, ref):
        _runexternal.runExternal(f'obabel {ref.protonated_filename} -O ref.pdb')
        ref_bss = _BSS.IO.readMolecules('ref.pdb').getMolecule(0)

        lig1_bss = _BSS.IO.readMolecules(self.ligand1.parametrised_files).getMolecule(0)
        lig2_bss = _BSS.IO.readMolecules(self.ligand2.parametrised_files).getMolecule(0)
        lig1_aligned = _BSS.Align.rmsdAlign(lig1_bss, ref_bss)
        lig2_aligned = _BSS.Align.rmsdAlign(lig2_bss, lig1_aligned)
        merged = _BSS.Align.merge(lig1_aligned, lig2_aligned)
        return merged, None

from protocaller.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.