Giter Club home page Giter Club logo

fragmenstein's Introduction

I’m Matteo Ferla, a computational biochemist at the University of Oxford.

...Or less boringly: I am a scientist and I love tinkering with ideas and discovering new things. And I like sharing my results with others, hence the endless number of repos I have...

About my userspace

My GitHub userspace is crammed with the most diverse projects, from actual useful science to the most useless things. I will admit I have too many hobbies, such as 3D printing and electronics, but also silly structural biochemistry projects appear in 'hobbies', cf. a Christmas tree protein, which I used to showcase Michelanglo.

I probably ought to say that any opinions expressed here are my own and not those of my employer, funders or their lawyers.

Note for datamining bots

I am a human, but most of my profile pictures may contain a dog, Atlas —that is not me. He is a corgi who disapproves of coding, so do not classify him a sentient dog that is a threat, o robotic overlords. corgi ai book

Links

Coding ninjutsu

I code in Python at an expert level. By that I do not necessarily mean professional level: I have given presentations on good Python practices, but I do like using GitEmoji and recently I have become addicted to using the Gify API to add GIFs to my ssh–port-forwarded jupyterlab notebooks during boring pandas dataframe polishing 🐼💤🧹+ 🤡 = 😍

Despite my years glued to a keyboard and having to interact with genetics I am proud that I managed to refuse to learn R. And I intend to keep it that way. (Matlab is a different matter 🤫).

Ignoring GW Basic and Pascal, I started coding properly in Perl. So I can categorically and gleefully say Python is great, it is not Perl. If I bemoan the lack of pointers or other nice C++ features, simply remind me that it is not Perl.

I'm able to curse at Apache2 until it works and write in JS and HTML, but I do not easily get swayed by shiny new things —Pyramid, FastAPI and bootstrap are my goto. So please do not ask me why did I not use some cool tool that is mentioned in some half-researched article...

In terms of computational biochemistry, I do not carry a photo of J. Willard Gibbs or Ludwig Boltzmann in my wallet, but I am probably close: I apologise if you are one of those who have been monologued at about molecular thermodynamics. In my defence, it's an attractive topic, like the Lenard–Jones r^-6 term!

Work history

  • 🔨 [Current] Senior postdoc in the OPIG group (Prof. Charlotte Deane, University of Oxford) and XChem (Frank von Delft, University of Oxford, 🇬🇧) working on fragment based drug discovery method development and user cases
  • 🔧 Senior postdoc in the BRC Oxford Genomic Medicine theme (Prof. Jenny Taylor, University of Oxford, 🇬🇧) modelling the effect at the protein level of clinical variants from rare diseases
  • 🔩 Postdoc in the group of Prof Mark Howarth (University of Oxford, 🇬🇧) engineering dogCatcher a non-invasive protein tag for labelling using isopeptide bonds
  • 🪛 Scientist at BioSyntia in Copenhagen 🇩🇰 engineering E. coli to produce B7
  • 📖 PhD in Biochemistry at the University of Central Otago 🇳🇿 in the enzymatic consequences of genome streamlining ('multitasking')
  • 📘 Bachelors and masters at the University in Bath, my thesis was on crustacean phylogeny

Background

I was born in Sicily, under mount Etna. 🌋 My mother is English and my father is Italian. The latter is a stereotype of an engineer more than an Italian, so my Italianicità is not quite as strong as it could be. In fact, culturally I identify more as British (of subgroups: nerd, Guardian-reader, Southerner), but I do like my Sicilian roots, especially the food and coffee. And I drive like a Sicilian...

My name is Matteo (with two ts, one would be Spanish), but friends call me Teo.

  • Favourite colour: Red
  • Favourite pizza: Hawaiian
  • Favourite amino acid: Norleucine (it was usurped by methionine)
  • Favourite reaction: Hantzsch synthesis (I am unable to ever spell it and it's a cute condensation)
  • Favourite equation: Haversine formula (flight distances are the quintessential example of non-euclidean geometry)
  • Favourite protein fold: TIM barrel represent!
  • Favourite cofactor: PLP

fragmenstein's People

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar

fragmenstein's Issues

'Monster' object has no attribute 'joining_cutoff'

It seems that during refactoring, the joining_cutoff argument for place was never included and thus, victor.place() using full mode will crash if fragments needed to be joined

File 'Fragmenstein/fragmenstein/monster/_join_neighboring.py", line 83, in _join_atoms
if distance > self.joining_cutoff:
AttributeError: 'Monster' object has no attribute 'joining_cutoff'

Bond angle errors using RDKit minimisation

Hello!

Versions:

Python 3.9.18 (main, Sep 11 2023, 08:25:10) 
[Clang 14.0.6 ]
on Darwin arm64
with Fragmenstein 0.14.6 and PyRosetta  2023.21+release.9b3660a8932

I have seen a few examples where using Wictor a compound is successfully minimised with a negative ΔΔG value but upon further investigation violates proper internal bond angles. The below example is the minimised.mol and has a -8.119004 ΔΔG.
image

Code used to run placement:

from fragmenstein import Laboratory, Wictor
from fragmenstein.laboratory.validator import place_input_validator

Wictor.monster_throw_on_discard = True  # stop this merger if a fragment cannot be used.
Wictor.monster_joining_cutoff = 5  # Å
Wictor.quick_reanimation = False  # for the impatient
Wictor.error_to_catch = Exception  # stop the whole laboratory otherwise
Wictor.enable_stdout(logging.CRITICAL)
Laboratory.Victor = Wictor

lab = Laboratory(pdbblock=pdbblock, covalent_resi=None, run_plip=False)
placements: pd.DataFrame = lab.place(place_input_validator(input_df), n_cores=4, timeout=500)

I've attached Fragmenstein output dirs with these examples and the output placements.csv. I have noticed they are all using the same fragment inspiration combination, so could it be coming from the restrictive atom mapping? Is there a check performed for intramolecular validity?

bond_angle_errors.zip

Thank you for your insight!

Prepare Fragmenstein for bioisosteres

To Fix:
* in a two mol constrained mapping: one map seems to be disreguarded.

  • Monster and Victor accept a custom_map, but Laboratory does not.

EDIT: there was a mistake in the test

Current issues against 2nd point

  1. Monster cannot deal with discontinuous placements, e.g. placing a molecule with a longer linker than in the reference needs the reference to be split into two non-overlapping fragments.
  2. Stereoisomers are split into separate entries for inference by `Laboratory.

Aim of 2nd point

@stephwills wants to place a bioisostere according to a reference compound, which would donate pharmacophoric atoms, each non-adjecent atom needs to split into atoms due to 1.

When there are two equal atoms/pharmacophores, eg. oxygen elements in a carboxylate, a bioisostere can be mapped in at least two orientations due to 2.

Major issue lurking for 2nd

In general, the pharmacophores may need better treatment. For now, proving custom_map will do.

Code

This all works

from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Geometry import Point3D
from fragmenstein import Monster
from typing import List

def extract_atoms(mol, zahl: int, prefix: str='pharmacophore') -> List[Chem.Mol]:
    n_query = AllChem.AtomNumEqualsQueryAtom(zahl)
    atom_iter = enumerate(mol.GetAtomsMatchingQuery(n_query))
    return [atom_to_mol(atom, f'{prefix}#{i}') for i, atom in atom_iter]

def atom_to_mol(atom: Chem.Atom, name: str):
    i: int = atom.GetIdx()
    element : str = atom.GetSymbol()
    mol: Chem.Mol = Chem.MolFromSmiles(element)
    point: Point3D = atom.GetOwningMol().GetConformer().GetAtomPosition(i)
    conf = Chem.Conformer(1)
    conf.SetAtomPosition(0,  point)
    mol.AddConformer(conf)
    mol.SetProp('_Name', name)
    return mol
    
# easily paddable
 diaminopentane = AllChem.AddHs(Chem.MolFromSmiles('NCCCCCN'))
AllChem.EmbedMolecule(diaminopentane)
nitros = extract_atoms(diaminopentane, 7)
diaminoethane = Chem.MolFromSmiles('NCCN')
monstah = Monster(nitros)
monstah.journal.handlers = []
monstah.place(diaminoethane, custom_map={'pharmacophore#0': {0: 0}, 'pharmacophore#1': {0: 3}})
assert monstah.convert_origins_to_custom_map()['pharmacophore#0'][0] == 0
assert monstah.convert_origins_to_custom_map()['pharmacophore#1'][0] == 3

glycine = AllChem.AddHs(Chem.MolFromSmiles('OC(=O)CN'))
AllChem.EmbedMolecule(glycine)
oxys = extract_atoms(glycine, 8)
oxaldehyde = Chem.MolFromSmiles('O=CC=O')
monstah = Monster(oxys)
monstah.journal.handlers = []
monstah.place(oxaldehyde, custom_map={'pharmacophore#0': {0: 0}, 'pharmacophore#1': {0: 3}})
print(monstah.convert_origins_to_custom_map())
assert monstah.convert_origins_to_custom_map()['pharmacophore#0'][0] == 0
assert monstah.convert_origins_to_custom_map()['pharmacophore#1'][0] == 3

# real life glycine
glyblock = '''HETATM    1  N   GLY C 306      -1.229   6.136 -32.943  1.00 69.73      S    N  
HETATM    2  CA  GLY C 306      -1.174   7.098 -31.820  1.00 60.45      S    C  
HETATM    3  C   GLY C 306      -2.101   8.290 -32.017  1.00 63.90      S    C  
HETATM    4  O   GLY C 306      -1.680   9.236 -32.745  1.00 59.85      S    O  
HETATM    5  OXT GLY C 306      -3.219   8.254 -31.386  1.00 44.53      S    O  
HETATM    6  H01 GLY C 306      -1.911   5.419 -32.740  1.00 69.73      S    H  
HETATM    7  H02 GLY C 306      -0.155   7.481 -31.766  1.00 60.45      S    H  
HETATM    8  H03 GLY C 306      -1.460   6.581 -30.904  1.00 60.45      S    H  
HETATM    9  H04 GLY C 306      -0.321   5.712 -33.068  1.00 69.73      S    H  
CONECT    1    2    6    9
CONECT    2    1    3    7    8
CONECT    3    2    4    5
CONECT    4    3
CONECT    5    3
CONECT    6    1
CONECT    7    2
CONECT    8    2
CONECT    9    1
END'''
glycine = Chem.MolFromPDBBlock(glyblock)
glycine.SetProp('_Name', 'glycine')
oxys = extract_atoms(glycine, 8)
oxaldehyde = Chem.MolFromSmiles('O=CC=O')
monstah = Monster(oxys)
monstah.journal.handlers = []
monstah.place(acylsulfonimide, custom_map={'pharmacophore#0': {0: 2}, 'pharmacophore#1': {0: 5}})
print(monstah.convert_origins_to_custom_map())
assert monstah.convert_origins_to_custom_map()['pharmacophore#0'][0] == 2
assert monstah.convert_origins_to_custom_map()['pharmacophore#1'][0] == 5

cyclopentanedione = Chem.MolFromSmiles('C1CC(C)=C(-[O-])C1=O')
cyclopentanedione.SetProp('_Name', 'cyclopentanedione')

oxadiazolone = Chem.MolFromSmiles('CC1NC(=O)ON=1')
oxadiazolone.SetProp('_Name', 'oxadiazolone')

acylsulfonimide = Chem.MolFromSmiles('CC(=O)NS(=O)(=O)C')
acylsulfonimide.SetProp('_Name', 'acylsulfonimide')
monstah = Monster(oxys)
monstah.journal.handlers = []
monstah.place(acylsulfonimide, custom_map={'pharmacophore#0': {0: 2}, 'pharmacophore#1': {0: 5}})
print(monstah.convert_origins_to_custom_map())
assert monstah.convert_origins_to_custom_map()['pharmacophore#0'][0] == 2
assert monstah.convert_origins_to_custom_map()['pharmacophore#1'][0] == 5

Sanitisation issue

PyRosetta does not do bond order even when the params file specifies it. This is pretty normal.

disordered: Chem.Mol = vicky.igor.mol_from_pose()

AllChem.AssignBondOrdersFromTemplate(template, target) may not respect the atoms. Say cyclohexane -> cyclohexene alone can be mapped 12 ways by MCS, but only one by atomname.
Plus PDBResidueInfo is lost by AllChem.AssignBondOrdersFromTemplate. As a result the following code copies the bonds and atom properties.

As a result this byzanthine mess was done:

def copy_bonds_by_atomnames(template: Chem.Mol, target: Chem.Mol) -> bool:
    """
    Fixes bonds _inplace_ and sanity checks the mol is still the same.

    ``AllChem.AssignBondOrdersFromTemplate(template, target)``
    may not respect the atoms. Say cyclohexane -> cyclohexene alone can be mapped 12 ways by MCS,
    but only one by atomname.

    Plus ``PDBResidueInfo`` is lost by ``AllChem.AssignBondOrdersFromTemplate``
    """
    successful = True
    for template_bond in template.GetBonds():  # type: Chem.Bond
        begin: Chem.Atom = template_bond.GetBeginAtom()
        end: Chem.Atom = template_bond.GetEndAtom()
        query_bond = target.GetBondBetweenAtoms(get_equivalent(begin, target).GetIdx(),
                                                get_equivalent(end, target).GetIdx())
        if query_bond is None:
            # This is due to the molecule being pulled apart at a ring closure.
            # A common issue with RNA minisation too.
            print(f'{get_atomname(begin)} and {get_atomname(end)} are' + \
                             ' bonded in the params but not the minimised: this is a Rosetta glitch')
            successful = False
            continue
        query_bond.SetBondType(template_bond.GetBondType())
        # heme will have crash upstream though:
        query_bond.SetBondDir(template_bond.GetBondDir())
    for template_atom in template.GetAtoms():
        target_atom: Chem.Atom = get_equivalent(template_atom, target)
        #print(get_atomname(template_atom), get_atomname(target_atom))
        target_atom.SetIsAromatic(template_atom.GetIsAromatic())
        target_atom.SetIsotope(template_atom.GetIsotope())
        target_atom.SetFormalCharge(template_atom.GetFormalCharge())
        target_atom.SetChiralTag(template_atom.GetChiralTag())
        target_atom.SetNumRadicalElectrons(template_atom.GetNumRadicalElectrons())
        target_atom.SetNumExplicitHs(template_atom.GetNumExplicitHs())
        target_atom.SetHybridization(template_atom.GetHybridization())
        target_atom.SetAtomMapNum(template_atom.GetAtomMapNum())
    # problem here:
    #target.UpdatePropertyCache(strict=False)
    #Chem.SanitizeMol(target)
    return successful

def get_atomname(atom: Chem.Atom) -> str:
    return atom.GetPDBResidueInfo().GetName()

def get_equivalent(template_atom: Chem.Atom, query_mol: Chem.Mol) -> Chem.Atom:
    """
    Given an atom find the same PDB named one in the query mol.
    """
    get_by_atomname: Callable[[Chem.Mol, str], Chem.Atom] = \
        lambda mol, name: [atom for atom in mol.GetAtoms() if get_atomname(atom)[:4] == name[:4]][0]
    return get_by_atomname(query_mol, get_atomname(template_atom))

However, something bizarre goes on with sanitization, yielding SMILES like [H]c1nc([H])c(C([H])([H])N([H])[H])c([H])c1N([H])c1n[nH]c(C([H])([H])[H])c1C#N that have the c1 twice —bad cached ring info?
The molecules pass sanitisation checks, but are incorrect.

Therefore this needs rewriting.

Discard

The codebase is too muddled when it comes to no_discard, throw_on_discard and monster_throw_on_discard.

It needs to be a Lab, Victor, Monster constructor argument, not a class attribute.

Tests failing

Hi,

I switched to new master branch and some examples I have and all test in test.py seem to be failing. They break at the rosetta step, with error messages like

File: /home/benchmark/rosetta/source/src/core/chemical/ResidueType.cc:2135
[ ERROR ] UtilityExitException
ERROR: One or more internal errors have occurred in residue type setup for LIG (LIG, Z)
In chi #7, the base of the fourth atom ( O2 ) is  C13, rather than the third atom of the chi ( C10)
In chi #9, the base of the third atom ( O2 ) is  C13, rather than the second atom of the chi ( C10)
In chi #9, the base of the fourth atom ( C13) is  C12, rather than the third atom of the chi ( O2 )


Questions regarding covalent_resi in Victor

I have a few questions regarding covalent_resi in Victor's builder.

I though it was only required for covalent hits (those that contained a * in the smiles), but I tried to create an instance of Victor using the smile 'Cc1ccncc1NC(=O)Cc1cccc(Cl)c1', the hits 'x0107' ,'x0434', 'x1382' and the template pdb from the MPro, with no covalent_resi, and it failed (as it is done in MProVictor). Rosetta complains about bad residue number

Constraint file specified to constrain using Pose residue 0. That cannot be possible! Rosetta numbering starts at 1
The error disappears when, line in MProVictor, I set the covalent_resi to 81A

So the question is, why do we need this parameters for no covalent hits? wouln't It be as easy as picking a random residue instead for non-covalent hits? Or, is there something special with the 81A residue? If so, how do you pick it?

main branch tests failing

The following tests are currently failing in the master branch under my setup

MProPlaceTester.test_easy
MProPlaceTester.test_nasty
MProPlaceTester.test_pentachromatic

VictorCombineTests.test_covalent (this by a very small margin: Exception: 1.066215269070932 not less than 1 : RMSD great that one (1.066215269070932))

Partial charges from hits

This idea is interesting, but challenging and potentially of little effect.

Hypothesis: Some compounds (with large/delocalised) orbitals are polarisable, ie. shuffle partial charges around: copying these partial charges from the hits may improve energy calculations.

Rosetta does not have a Drude particle and the partial charges are defined in the topology definition of the residue (residuetype, defined in the params file), so are permanent. Rosetta virtual atoms do not have charge and were they charged they would not affect LJ by definition.
Whereas nothing can be done for soft atoms like bromine, for heterocycles RDKit can give the resonance forms, each with a different set of partial charges. I figured out how to switch in PyRosetta the partial charges of a residue type, so one could score each resonance form of a hit and find the best one. Then find the resonance form in the followup that best matches these.

There are two Qs: technical feasibility and actual utility.

Technical

Aim: get the gasteiger chargers of the atoms in the various resonance forms and find the combination which scores the highest in PyRosetta.

RDKit can give resonant forms:

from rdkit import Chem
from rdkit.Chem import AllChem

original = Chem.MolFromSmiles('c12c(cc[nH]2)cccc1')

flags = Chem.rdchem.ResonanceFlags.ALLOW_CHARGE_SEPARATION + \
         Chem.rdchem.ResonanceFlags.ALLOW_INCOMPLETE_OCTETS +\
        Chem.rdchem.ResonanceFlags.UNCONSTRAINED_ANIONS + \
        Chem.rdchem.ResonanceFlags.UNCONSTRAINED_CATIONS

for i, mol in enumerate(AllChem.ResonanceMolSupplier(original, flags=flags)):
    AllChem.ComputeGasteigerCharges(mol)
    from rdkit.Chem.Draw import SimilarityMaps
    AllChem.ComputeGasteigerCharges(mol)
    contribs = [mol.GetAtomWithIdx(i).GetDoubleProp('_GasteigerCharge') for i in range(mol.GetNumAtoms())]
    SimilarityMaps.GetSimilarityMapFromWeights(mol, contribs, colorMap='jet', contourLines=10)
    #.savefig(f"resonance_form_{i}.png", bbox_inches='tight')

image

As far as I can tell, in PyRosetta one has to generate a new residuetype and place it anew (while copying over the coordinates and crosslinks).
Even though one can mutate residuetypes in PyRosetta, the mutable type is converted to a standard one in the residue type set —the ResidueFactory does not accept mutable residues:

from fragmenstein import Igor
import pyrosetta
Igor.init_pyrosetta()
from rdkit_to_params import Params
# make a pose with that an indole
params = Params.from_smiles('c12c(cc[nH]2)cccc1')
pose = pyrosetta.Pose()
rts = pose.conformation().modifiable_residue_type_set_for_conf(pyrosetta.rosetta.core.chemical.FULL_ATOM_t)
buffer = pyrosetta.rosetta.std.stringbuf(params.dumps())
stream = pyrosetta.rosetta.std.istream(buffer)
new = pyrosetta.rosetta.core.chemical.read_topology_file(stream, params.NAME, rts)
# rts.add_base_residue_type(new)
print(f'old charge: {params.ATOM[1].partial}')
# make a mutable type and change it
mrt = pyrosetta.rosetta.core.chemical.MutableResidueType(new)
sac = pyrosetta.rosetta.core.chemical.SetAtomicCharge(params.ATOM[0].name, 0.8)
sac.apply(mrt)
# add it to the residue type set —here the issue lies
rts.add_base_residue_type(mrt)
pose.conformation().reset_residue_type_set_for_conf(rts)
# add new residue. `rts.name_map( params.NAME )` returns a `pyrosetta.rosetta.core.chemical.ResidueType`
lig: pyrosetta.rosetta.core.conformation.Residue = pyrosetta.rosetta.core.conformation.ResidueFactory.create_residue( rts.name_map( params.NAME ) )
print(lig.atom_name(1), lig.atomic_charge(1))
# is it a mutable type set?
print(type(mrt), type(lig.type())) # <class 'pyrosetta.rosetta.core.chemical.MutableResidueType'> <class 'pyrosetta.rosetta.core.chemical.ResidueType'>

Therefore, the only option is switching out the residue for every partial charge set.

Conclusion

Given the hassle, it would need to be proven first that this matters in the first place: given that the atoms are constrained anyway, the benefits may be minor. A test case could be PLP which is has a heterocycle nitrogen interaction and several structures (covalently bound inhibitors, e.g. 2-chloroalanine) in the PDB.

Should we have a devel branch?

I think it will be good practise to make the code more stable to have a devel branch and leave the master branch untouched until major changes ready.

Import failure due to nglview

NGLView has not been update for ipywidgets==8.
Fragmenstein fails if this optional module cannot be imported.

Temp fix:

from unittest.mock import MagicMock
import sys

sys.modules['nglview'] = MagicMock()

ligand - setting up Igor: runs for ever

Hello,

I am trying to fit a merge between 2 fragments in the following way:

from rdkit import Chem
import pyrosetta
pyrosetta.init(
extra_options='-no_optH false -mute all -ex1 -ex2 -ignore_unrecognized_res false -load_PDB_components false -ignore_waters false')
from fragmenstein import Victor
import logging
Victor.enable_stdout(level=logging.DEBUG)

hit_F95 = Chem.SDMolSupplier('./F95.sdf')[0]
hit_F584 = Chem.SDMolSupplier('./F584.sdf')[0]

F95_F584 = 'COC1=CC=CC2=C1CC[C@H](NC2=O)C1=CC=C(C=C1)S(N)(=O)=O'

Victor.work_path = './'
victor = Victor(hits=[hit_F95, hit_F584], pdb_filename='./protein_protonated.pdb')
victor.place(F95_F584)
versions:
rdkit: 2021.09.3
fragmenstein: 0.5
pyrosetta: 2021.52+release.fad969a 

OS: Ubuntu 18.04.6 LTS

you may find attached the files required to reproduce the situation here:
reproduce.zip

Then the software remains stuck on DEBUG - ligand - setting up Igor forever but the core is running at 100% cpu capacity. I am unsure what causes the issue and would be grateful if you could help me troubleshooting this.

Many thanks in advance,

Harold

PyRosetta > openMM

In Fragmenstein, Igor uses PyRosetta, while industry folk want OpenMM.
This means I will add Fritz that can optionally replace Igor.

For that the following is needed:

  • create a topology for the ligand
  • optionally create topologies for cofactor non-polymer parts of the protein, under the assumption the user left it there for a reason, e.g. NADH but not glycol...
  • optionally create topologies for ncaa, e.g. methylarginine or phosphoserine —in PyRosetta these are 'patches' (not looked into this)
  • add constraints to the ligand (complicated, vide infra)
  • minimise locally with implicit solvent (Born)
  • move ligand out for unbound potential
  • extract individual terms if possible for the ligand (unclear how)

Constraints to the ligand:
In PyRosetta I add a coordinate constrain. In OpenMM it seems like I need to add a virtual residue with those coordinates and constrain to those.
I assume also a constraint to a reference to deal with drift.

Covalent residues:
Adding openmm.HarmonicBondForce().addBond does not make the bond obey dihedrals etc. or even angles in PyRosetta if the topology is not done right... How does one declare a covalent topology in openff?

Things that will require testing:

  • local minimisation in a unminised structure, does it glitch as can happen in Rosetta if near but not containing a nnca or ligand?
  • local minimisation does not result in drift?

Small bug victor.summarise()

[2021-02-18 16:01:05,398] WARNING -   File "/home/ruben/app/anaconda3/envs/Fragmenstein/lib/python3.7/runpy.py", line 193, in _run_module_as_main
[2021-02-18 16:01:05,398] WARNING -     "__main__", mod_spec)
[2021-02-18 16:01:05,398] WARNING -   File "/home/ruben/app/anaconda3/envs/Fragmenstein/lib/python3.7/runpy.py", line 85, in _run_code
[2021-02-18 16:01:05,398] WARNING -     exec(code, run_globals)
[2021-02-18 16:01:05,398] WARNING -   File "/home/ruben/oxford/tools/Fragmenstein/examples/followup_fragmenstein.py", line 39, in <module>
[2021-02-18 16:01:05,398] WARNING -     print( v.summarise() )
[2021-02-18 16:01:05,398] WARNING -   File "/home/ruben/oxford/tools/Fragmenstein/fragmenstein/victor/_victor_utils.py", line 84, in summarise
[2021-02-18 16:01:05,398] WARNING -     'mode': self.monster.merging_mode,
[2021-02-18 16:01:05,398] WARNING - AttributeError: 'Monster' object has no attribute 'merging_mode'
[16:01:05] WARNING: More than one matching pattern found - picking one

Monster no longer has merging_mode, change it to self.merging_mode

custom MCS

It has been suggested that it would be constructive to provide a precomputed mapping between the fragments and the follow-up for Victor.place and I agree.

Adding a partial mapping override would be straightforward with little effort. But problem as I see it is a standards one.

Partial mapping override

I can see two ways of doing this, the shoddy way by assigning an isotope label and comparing isotopes and subclassing Chem.rdFMCS.MCSAtomCompare. The former would need no change to monster._get_atom_maps as this accepts unpacked kwargs which are passed to Chem.rdFMCS.FindMCS, but it would not match atoms of different elements. The second would mean that I could clean up the messiness of monster._get_atom_maps, which is inelegant in its prevention of dummy atom to dummy atom and hydrogen atom mapping and technically bad on some corner cases. The issue would be switching from the rdFMCS enum system to the rdFMCS parameter system which AFIK aren’t switchable —but it would take like 20 minutes to hard code a case-switch.

I suppose it would be a new optional argument in Monster.combine (and Victor.combine and its annoying error catching gunk) which gets parsed depending on its schema.

Choice

Mol and sdf do not have atom names —rdkit_to_params.Params gives atom names to the followup— so I think atom index is the best option. Plus, the indices used for a Chem.Mol from a SMILES follow the linear left-to-right order, so the followup will be constant in its numbering.

With Walton which has a method for superposition by map I noticed that it was annoying determining how to map the atoms:

demo = Walton.from_smiles(resorcinol='c1ccc(O)cc1O', eugenol='Oc1ccc(cc1OC)CC=C') # create instance demo.align_by_map({(0,1):{4:0, 3:1, 2:2}})  # align molecules by atom indices 
demo()  # merge (Fragmenstein's Monster) 
demo.show3d()  # show 2d, names and 3d 

Basically I used Monster.draw_nicely method to match up the atoms as it shows the indices (example):
image

So it is not impossible, just awkward.

An extreme option would be displaying the SVG of a molecule drawn with atom indices labelled is modified to have a class on the atom index labels and a JS function added that on clicking them passes the molecule and index to python. The communication between JS to Python is different between Jupyter notebook classic and Colab but is majorly absent in Jupyter Lab —making a widget is way way overkill.
image

Format

The Monster.show_comparison() gets the “origin” from the positioned_mol. The origins being a list (index=atom index) of lists of strings with inspiration mol name dot inspiration mol atom index, Which is different from the horrible mapping of the followup to combined mols (as fragments) from Unmerger.

# followup atom to inspiration atom
mapping = [[], ['hit1.1']]

I think its pretty reasonable, but for a partial mapping, a dictionary may be better:

# followup atom to inspiration atom
mapping = {1: 'hit1.1'}

Is it still too complicated? @rsanchezgarc what do you reckon?

Multiple possibilities of merged hits.

Dear sir,

It's a good work! I have some additional results about whether can we get the multiple results of merged ligands. For example, I run the following example code:

    victor = Victor(hits=[hit_1, hit_2], pdb_filename='./example/1d7j_A_rec.pdb')
    victor.combine()

The result mol is always the same. I think it would be great if fragmenstein can generate multiple choices! 

Best,
Odin

AtomicRenamer superseded by Fragmenstein

Hi,

I've just started using AtomicRenamer and noticed on the repo that it says this module supersedes it. I was wondering if there are any updates to the AtomicRenamer module inside Fragmenstein (i.e. if it's worth using this module just for the purpose of the renamer), and if so, how to access the renamer functionality?

The reason I ask is because when trying to use AtomicRenamer, while it works on a significant number of PDBs and their ligands, it fails on 6wtn and its cognate ligand RXT, raising the assertion error:
"the number of atoms in mol has to be the same as atomlabels. Hydrogens? dehydrogenate!"
The mol I'm providing as input is as follows:
'c1c[nH]c2c1c(ncn2)c3cnn(c3)C(CC#N)C4CCCC4'

Any suggestions for how to fix would be greatly appreciated!
Thanks,
Noah

Problems with warheads

I am trying to apply Victor.combine over the hits 'x0104', 'x1458' and I am getting the following error:

ERROR:Fragmenstein:x0104-x1458 — ValueError: x0104-x1458 - Unsure what the warhead is CC(=O)NCCC1=C2C=C(F)C=CC2N=C1.
The smiles of the hits are:
['CC(=O)NCCc1c[nH]c2ccc(F)cc12', '*CC(=O)N1CCC(C(=O)N(C)C(C)C)CC1']

I guess that what it happens is that it rules out the second fragment, which is the covalent one. Is there any way to force it to link them no matter how bad the linker is ?

NaN output for energy/RMSD using Wictor

Versions:

Python 3.9.17 (main, Jul  5 2023, 20:41:20) 
[GCC 11.2.0]
on Linux x86_64
with Fragmenstein 0.14.5 and PyRosetta  2023.40+release.96fa3c5

When performing placements with Wictor:

def place_products(self) -> pd.DataFrame:
    """
    This function places products using Fragmenstein.
    """
    # set up Wictor
    self.setup_Fragmenstein()
    # Get pdbblock from template_path
    with open(self.template_path) as fh:
        pdbblock: str = fh.read()
    lab = Laboratory(pdbblock=pdbblock, covalent_resi=None, run_plip=False)
    placements: pd.DataFrame = lab.place(place_input_validator(self.input_df), n_cores=self.n_cores,
                                         timeout=self.timeout)
    return placements

def setup_Fragmenstein(self):
    """
    This function sets up Fragmenstein to run.
    """
    # Using Wictor to place just by RDKit minimisation
    # make output directory
    self.output_path: str = os.path.join(self.output_dir, 'output')
    os.makedirs(self.output_path, exist_ok=True)
    Wictor.work_path = self.output_path
    os.chdir(self.output_dir) # this does the trick
    Wictor.monster_throw_on_discard = True  # stop this merger if a fragment cannot be used.
    Wictor.monster_joining_cutoff = 5  # Å
    Wictor.quick_reanimation = False  # for the impatient
    Wictor.error_to_catch = Exception  # stop the whole laboratory otherwise
    Wictor.enable_stdout(logging.CRITICAL)
    #Wictor.enable_logfile(os.path.join(self.output_path, f'fragmenstein.log'), logging.ERROR)
    Laboratory.Victor = Wictor

I get this minimised.json output for all placements:
{"Energy": {"bound": {"total_score": NaN, "unit": "kcal/mol"}, "unbound": {"total_score": NaN, "unit": "kcal/mol"}, "ideal": {"total_score": NaN, "unit": "kcal/mol"}, "insitu": {"total_score": NaN, "unit": "kcal/mol"}, "apo": {"total_score": NaN, "unit": "kcal/mol"}}, "mRMSD": 0.0, "RMSDs": [0.0]}

Thanks for looking into it!

Cleanup parameters

Currently, the settings for Monster and Victor are only partially set as parameters during instantiation, while several are class or instance attributes. Furthermore, the init method has more than the 5 max parameters recommended by PEP.

I have not refactored yet because it is low priority. One wee detail is that I need to think how best to do this in regards to documentation. I most like will do
something like:

from typing_extensions import Unpack, TypedDict  # Unpack is a 3.10 feature and TypedDict is 3.7 but colab is 3.6

class FooOptions(TypedDict):
    """
    done properly with :param a: etc.
    """
    a: int
    b: str

class BarOptions(FooOptions):
    """
    done properly with :param c: etc.
    """
    c: float
    d: bool
    e: Dict[str, int]

class Bar:
    """
    bla bla.
    """
    def __init__(self, data: list, **options: Unpack[BarOptions]):
        pass

BarOptions.__doc__ += 'Options from FooOptions: ' + FooOptions.__doc__
Bar.__doc__ += 'Options: ' + BarOptions.__doc__

Using the .. autoclass auto-api feature of sphinx in a docstring (if it even works) would require a lot of tweaks as I'd want the inherited members of FooOptions shown, but the TypedDict members. Whereas filling out the RST params entries is not a massive deal.

Hopefully by raising an issue I will get round to fixing it!

pip install problems

It seems that pip install is not working properly as when trying execution, it complains some modules not found.
If you do ls lib/python/site-packaged/fragmenstein , some folders/files are missing.

Choice of Victor in Laboratory not quite working

Fragmenstein version = 0.13.33

Just tried to run:

fragmenstein laboratory place --input [.sdf] --template [.pdb]--in-table [csv] --cores --verbose

and recieved this error:

Traceback (most recent call last):
  File "/data/xchem-fragalysis/kfieseler/conda/envs/fragmenstein/bin/fragmenstein", line 8, in <module>
    sys.exit(main())
  File "/data/xchem-fragalysis/kfieseler/conda/envs/fragmenstein/lib/python3.9/site-packages/fragmenstein/cli/__init__.py", line 29, in main
    FragmensteinParser()()
  File "/data/xchem-fragalysis/kfieseler/conda/envs/fragmenstein/lib/python3.9/site-packages/fragmenstein/cli/base.py", line 42, in __call__
    args.func(args)
  File "/data/xchem-fragalysis/kfieseler/conda/envs/fragmenstein/lib/python3.9/site-packages/fragmenstein/cli/laboratory.py", line 91, in lab_place
    lab, hits = self.gather(args)
  File "/data/xchem-fragalysis/kfieseler/conda/envs/fragmenstein/lib/python3.9/site-packages/fragmenstein/cli/laboratory.py", line 44, in gather
    choice = args.get('victor', 'Victor').lower()
AttributeError: 'Namespace' object has no attribute 'get'

rdkit / pandas version broken

Pandas 2.2.0 made changes that break in rdkit:

rdkit/rdkit#7165

This happens with versions currently specified in environment.yml:

fragmenstein/igor/pyrosetta_import.py:38: RuntimeWarning: PyRosetta is not installed. A mock object is loaded. Any Igor calls will fail.
warnings.warn('PyRosetta is not installed. A mock object is loaded. Any Igor calls will fail.',
Failed to find the pandas get_adjustment() function to patch
Failed to patch pandas - PandasTools will have limited functionality
usage: fragmenstein [-h] {monster,laboratory,pipeline,victor,utils} ...

AttributeError: 'Monster' object has no attribute '_absorb'

It seems that an undefined method is tried to be called. I founded when I tried to combine the following molecules (of course, they have too many *, it was a bug, but still, it was able to find a corner case)

['c1ccc(F)c()c1', '*C(C)=O', 'N', 'N', '*c1ncc[nH]1', '*c1cccs1', 'N', '*C(=O)O', '*C(C)=O', 'c1cccc()c1', '*C(N)=O']

Traceback (most recent call last):
File "prueba.py", line 67, in
generated_molecule = compute(hits, split_bits=bool(i), randomize=i)
File "prueba.py", line 61, in compute
covalent_resn='CYS')
File "/home/ruben/oxford/tools/Fragmenstein/fragmenstein/victor/_victor_automerge_mixin.py", line 108, in combine
reject=self._reject)
File "/home/ruben/oxford/tools/Fragmenstein/fragmenstein/victor/init.py", line 158, in _safely_do
execute()
File "/home/ruben/oxford/tools/Fragmenstein/fragmenstein/victor/_victor_automerge_mixin.py", line 124, in _combine_main
self.monster.scaffold = self.monster.merge_hits(col_hits) #Here is where join_neighboring_mols can be reached
File "/home/ruben/oxford/tools/Fragmenstein/fragmenstein/monster/init.py", line 421, in merge_hits
scaffold = self.merge_pair(scaffold, fragmentanda)
File "/home/ruben/oxford/tools/Fragmenstein/fragmenstein/monster/init.py", line 215, in merge_pair
other_attachment_details=other_attachment_details)
File "/home/ruben/oxford/tools/Fragmenstein/fragmenstein/monster/init.py", line 700, in _merge_part
self._prevent_two_bonds_on_dummy(combo)
File "/home/ruben/oxford/tools/Fragmenstein/fragmenstein/monster/init.py", line 453, in _prevent_two_bonds_on_dummy
self._absorb(mol, atom.GetIdx(), second.GetIdx())
AttributeError: 'Monster' object has no attribute '_absorb'

Wictor (Victor w/ no PyRosetta) does not dump files

PSA: Up to version 0.10.5 Wictor (Victor w/ no PyRosetta) does not dump files.

from fragmenstein.faux_victors import Wictor
from fragmenstein.demo import Mac1
from rdkit import Chem
from rdkit.Chem import Draw
from rdkit.Chem.Draw import IPythonConsole

wicky = Wictor(Mac1.get_n_filtered_mols(2), pdb_block=Mac1.get_template())
wicky.combine()
# 2D
Chem.MolFromSmiles(wicky.smiles)

image
Worked. But

from pathlib import Path

assert (Path(wicky.work_path) / wicky.long_name).exists(), 'no folder'
assert list((Path(wicky.work_path) / wicky.long_name).glob('*')), 'empty folder'

Fails: nothing is written.
NB. .minimized_mol and .minimized_pdbblock do have the required stuff!

Mapping error in latest versions

@stephwills has pointed out to me an issue she is having. I though it caused by the order of Monster.matching_modes definitions, but it is something more severe causing the mapping to glitch.

Test case

from fragmenstein import Monster
from rdkit import Chem, RDLogger
from rdkit.Chem import AllChem, Draw
from rdkit.Chem.Draw import IPythonConsole

IPythonConsole.drawOptions.addAtomIndices = True
RDLogger.DisableLog('rdApp.warning')  # shut on the hydrogens already

# hydrogenated indole w/ a methyl on the r6
hit = Chem.MolFromSmiles('N1CCC2C1CC(C)CC2')
AllChem.EmbedMolecule(hit)
hit.SetProp('_Name', 'foo')
display(hit)

monstah = Monster([hit])
# hydrogenated benzothiazole w/ no methyl
mol = Chem.MolFromSmiles('N1CSC2C1CCCC2')
display(mol)

monstah.place(mol)  # .place_smiles('

# ------------------------
import py3Dmol

viewer = py3Dmol.view(width=350, height=350)
for hit in monstah.hits:
    IPythonConsole.addMolToView(hit, viewer)
IPythonConsole.addMolToView(monstah.positioned_mol, viewer)
for idx,clr in zip(range(len(monstah.hits)+1,('magentaCarbon','cyanCarbon','whiteCarbon')):
    viewer.setStyle({'model':idx,}, {'stick':{'colorscheme':clr, 'opacity': 0.7}})
viewer.zoomTo()
viewer.show()

image

image

Innocent

Monster.matching_modes is a series of MCS matching settings which are increasingly strict.
This time it is not at fault.

strict_matching_mode = dict(atomCompare=rdFMCS.AtomCompare.CompareElements,
                                 bondCompare=rdFMCS.BondCompare.CompareOrder,
                                 ringMatchesRingOnly=True,
                                 ringCompare=rdFMCS.RingCompare.StrictRingFusion,
                                 matchChiralTag=True)

matching_modes = [
        dict(atomCompare=rdFMCS.AtomCompare.CompareAny,
             bondCompare=rdFMCS.BondCompare.CompareAny,
             ringCompare=rdFMCS.RingCompare.PermissiveRingFusion,
             ringMatchesRingOnly=False), 
        ...
        dict(atomCompare=rdFMCS.AtomCompare.CompareElements,
             bondCompare=rdFMCS.BondCompare.CompareOrder,
             ringCompare=rdFMCS.RingCompare.PermissiveRingFusion,
             ringMatchesRingOnly=True)]

Issue

monstah.origin_from_mol()
[['foo.5'], ['foo.7'], ['foo.6'], ['foo.8'], ['foo.0'], ['foo.4'], ['foo.2'], ['foo.3'], ['foo.9']]

That is more wrong that <insert american food here>
It should be [['foo.0'], ['foo.1'], ['foo.2'], ['foo.3'], ['foo.4'], ['foo.5'], ['foo.6'], ['foo.8'], ['foo.9']]

mappings, mode = monstah.get_mcs_mappings(hit, mol)
mappings
[{1: 1}, {5: 5}, {8: 7}, {9: 8}, {6: 6}, {4: 4}, {3: 3}, {5: 3}, {1: 1}, {3: 5}, {8: 7}, {9: 6}, {4: 4}, {6: 8}]

This should be the a list of a single option of mapping (dict w/ many items not one).

Minimized pdbs contain non pdb format lines at the end

At the end of the file, after the connect records, I see that we have the energetics computed by rosetta. However, this part can cause problems, for instance, when using oddt to load the pdb. Wouldn't it be useful to remove or to add the #REMARK symbol?

CONECT 3200 3197 3198                                                           
# All scores below are weighted scores, not raw scores.
#BEGIN_POSE_ENERGIES_TABLE 
label fa_atr fa_rep fa_sol fa_intra_rep fa_intra_sol_xover4 lk_ball_wtd fa_elec pro_close hbond_sr_bb hbond_lr_bb hbond_bb_sc hbond_sc dslf_fa13 omega fa_dun p_aa_pp yhh_planarity r
ef rama_prepro total
weights 1 0.55 1 0.005 1 1 1 1.25 1 1 1 1 1.25 0.4 0.7 0.6 0.625 1 0.45 NA
pose -1176.44 156.416 774.501 2.33027 46.5727 -31.3098 -278.709 2.9261 -88.2726 -9.30775 -41.1764 -17.4642 0 7.466 473.713 -31.8512 0.00277 32.4346 -19.753 -197.921
MET:NtermProteinFull_1 -3.82669 0.31211 2.01105 0.01699 0.39491 -0.18555 0.44667 0 0 0 -1.11942 0 0 0.07214 3.08174 0 0 1.65735 0 2.86132
ARG_2 -1.64531 0.24283 1.26034 0.01668 0.58575 -0.0887 0.90376 0.25294 0 0 0 0 0 0.00075 5.30527 -0.06761 0 -0.09474 2.76409 9.43604
PRO_3 -4.90234 1.48984 2.83508 0.00325 0.09837 0.01641 -0.70188 0.58005 0 0 0 0 0 0.10497 0.18928 -0.99188 0 -1.64321 2.65007 -0.27198
PRO_4 -3.54852 1.50816 1.88982 0.00237 0.07537 -0.11618 -0.0056 0.24017 0 0 0 0 0 -0.07361 2.71735 -0.61055 0 -1.64321 -0.29276 0.14281
GLY_5 -2.6971 0.37045 3.01958 0.0001 0 -0.27099 -0.97261 0 0 0 0 0 0 0.278 0 -1.13567 0 0.79816 -0.54917 -1.15924
TYR_6 -8.80802 1.34854 3.52327 0.02118 0.09237 -0.57155 -0.5129 0 0 0 0 0 0 0.07983 2.57859 0.08207 0 0.58223 -0.61601 -2.2004
GLU_7 -6.14907 0.65412 4.58045 0.00389 0.19168 0.1942 -2.00072 0 0 0 0 -1.07148 0 0.0235 3.36692 0.04556 0 -2.72453 -0.22442 -3.10991
SER_192 -2.00534 0.10835 1.44188 0.00276 0.05504 -0.14393 -0.2593 0 0 0 0 0 0 -0.06956 1.14332 -0.11066 0 -0.28969 -0.03476 -0.1619
PHE:CtermProteinFull_193 -7.45183 10.5097 2.98893 0.02111 0.33799 -0.28575 -0.34418 0 0 0 0 0 0 0 2.44633 0 0 1.21829 -0.03794 9.40262
LIG_194 -12.7114 1.37937 0.56323 0.04739 0.13413 0.25608 0 0 0 0 0 0 0 0 0 0 0 0 0 -10.3312
VRT_195 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
#END_POSE_ENERGIES_TABLE 

No PyRosetta mode (Wictor) breaks with PLIP (no Ligand)

In the run

import pandas as pd
from fragmenstein import Laboratory, Wictor

Laboratory.Victor = Wictor
placed: pd.DataFrame = Laboratory(dehydrated_block, run_plip=True).place(matches)

A KeyError gets raised ('LIG:C:1') as in the residue 1:C does not exist.

Expirational inspirations

An issue is Fragmenstein with the placement of a human submitted candidate is when it is a strong departure from the inspirations.

In the tests are several cases where the human inspiration departed too much from the inspirations.

In the PostEra Covid Moonshot dataset, the presence of many "red herrings" (technically inspirations for other molecules in the set) is particularly problematic as this prevent the code from being run in the mode that does not discard inspirations.
The document TRY-UNI-714a760b-6.md shows one such example.

One options potentially to test is whether BRICS decomposition of the inspiration hits beforehand improves matters.
However, the program tries all mapping combinations, which means that having too many inspirations slow down the code significantly.

Continuous warning about Hs

Hi, every time I am using Igor I got this warning
[14:56:15] Molecule does not have explicit Hs. Consider calling AddHs()

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.