
teachopencadd's Issues

PLIP installation error for T016 Protein Ligand Interactions

I have openbabel installed, but I still get this error message on my cmd prompt:

Collecting plip
Using cached plip-2.2.2-py3-none-any.whl (93 kB)
Requirement already satisfied: numpy in c:\users\que\anaconda3\envs\introductiontopython\lib\site-packages (from plip) (1.21.0)
Collecting lxml
Using cached lxml-4.6.3-cp38-cp38-win_amd64.whl (3.5 MB)
Collecting openbabel
Using cached openbabel-3.1.1.1.tar.gz (82 kB)
Building wheels for collected packages: openbabel
Building wheel for openbabel (setup.py) ... error
ERROR: Command errored out with exit status 1:
command: 'C:\Users\QUE\anaconda3\envs\introductiontopython\python.exe' -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\Users\QUE\AppData\Local\Temp\pip-install-eovmndz7\openbabel_265f2b5a89f44697b473157c7eb430db\setup.py'"'"'; file='"'"'C:\Users\QUE\AppData\Local\Temp\pip-install-eovmndz7\openbabel_265f2b5a89f44697b473157c7eb430db\setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(file) if os.path.exists(file) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' bdist_wheel -d 'C:\Users\QUE\AppData\Local\Temp\pip-wheel-qwam7u83'
cwd: C:\Users\QUE\AppData\Local\Temp\pip-install-eovmndz7\openbabel_265f2b5a89f44697b473157c7eb430db
Complete output (15 lines):
running bdist_wheel
running build
running build_ext
Warning: invalid version number '3.1.1.1'.
Guessing Open Babel location:

  • include_dirs: ['C:\Users\QUE\anaconda3\envs\introductiontopython\include', 'C:\Users\QUE\anaconda3\envs\introductiontopython\include', '/usr/local/include/openbabel3']
  • library_dirs: ['C:\Users\QUE\anaconda3\envs\introductiontopython\libs', 'C:\Users\QUE\anaconda3\envs\introductiontopython\PCbuild\amd64', '/usr/local/lib']
    building 'openbabel._openbabel' extension
    swigging openbabel\openbabel-python.i to openbabel\openbabel-python_wrap.cpp
    swig.exe -python -c++ -small -O -templatereduce -naturalvar -IC:\Users\QUE\anaconda3\envs\introductiontopython\include -IC:\Users\QUE\anaconda3\envs\introductiontopython\include -I/usr/local/include/openbabel3 -o openbabel\openbabel-python_wrap.cpp openbabel\openbabel-python.i

Error: SWIG failed. Is Open Babel installed?
You may need to manually specify the location of Open Babel include and library directories. For example:
python setup.py build_ext -I/usr/local/include/openbabel3 -L/usr/local/lib
python setup.py install

I tried to git clone plip and then open it in a Jupyter notebook. I ended up with this error :(

ModuleNotFoundError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_5344/3038562561.py in
9 import matplotlib.pyplot as plt
10 from matplotlib import colors
---> 11 from plip.structure.preparation import PDBComplex
12 from plip.exchange.report import BindingSiteReport
13

ModuleNotFoundError: No module named 'plip.structure'

I have tried searching for a fix online, but I couldn't really find much.
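A hedged suggestion (an assumption on my part, not an official fix): the pip install fails because pip tries to compile Open Babel's Python bindings from source on Windows, which needs SWIG and the Open Babel headers. Installing plip and openbabel as pre-built packages from conda-forge avoids that compile step:

# run inside the activated conda environment
conda install -c conda-forge plip openbabel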

T019: No openmmforcefields under Windows

T019 does not run under Windows.

Our CI fails with: ModuleNotFoundError: No module named 'openff'

Why?

- openmm
# depends on openff-toolkit->ambertools -> not available on Windows yet!
# - openmmforcefields

See also here: #74 (comment)

openmmforcefields may be causing the problem; check the progress of this issue: openmm/openmmforcefields#163

Do we notify users?

Yes, the talktorial contains a disclaimer linking back to this issue:

Also, note that this talktorial will not run on Windows for the time being (check progress in this issue).

What do we do regarding our CI?

Remove T019 from the Windows CI.
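A hedged sketch of one way to do this (the conftest.py approach and the notebook path are assumptions, not necessarily how our CI is actually wired): pytest can be told at collection time to ignore the notebook on Windows.

# conftest.py -- skip the T019 notebook when tests run on Windows
import sys

collect_ignore = []
if sys.platform.startswith("win"):
    # hypothetical path; adjust to wherever the T019 notebook lives in the repo
    collect_ignore.append("teachopencadd/talktorials/T019_md_simulation/talktorial.ipynb")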

Add branding header to each notebook

We might benefit from adding a standardized, custom header to each notebook to reflect the origin of the project, how to contribute back, how to help us develop it further, and so on. I don't have a specific idea for the looks, but Markdown cells allow arbitrary HTML, so I think we have plenty of creative freedom there.

Some ideas:

  • Star the project widget
  • Add the logo
  • Links to repo / documentation
  • Citation instructions
  • Links to other interesting software projects (ours, mainly)

Is there a way to get a common cut-off value?

"T007 · Ligand-based screening: machine learning" was very helpful for us. Thanks for sharing this article.

I just have one question.

We are using ChEMBL IC50 data for other targets (cells), but our data does not have comments that divide the compounds into "active" and "inactive".

Therefore, we also have to split the data based on the pIC50 value, and I would like to know how you arrived at the cut-off of 6.3 used in the talktorial you shared above.

I will wait for your reply.

Thank you again.

  • Kim Hyeon Ki
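A hedged answer sketch (the 500 nM threshold below is an illustrative assumption, not an official recommendation): a pIC50 cut-off is usually just an IC50 activity threshold converted to the negative decadic logarithm of the molar concentration, and 6.3 corresponds to an IC50 of roughly 500 nM.

import math

ic50_cutoff_nM = 500                                   # assumed activity threshold
pic50_cutoff = -math.log10(ic50_cutoff_nM * 1e-9)      # pIC50 = -log10(IC50 in mol/l)
print(round(pic50_cutoff, 2))                          # 6.3

# label the dataset accordingly (df and "pIC50" are hypothetical names):
# df["active"] = df["pIC50"] >= pic50_cutoff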

RDKit and pypdb are pinned to old releases - update relevant notebooks

Last nightly contained a change in the Python 3.9 branch: T002 is reporting a different subset of compounds (just one entry, though).

https://github.com/volkamerlab/teachopencadd/runs/1461859605?check_suite_focus=true#step:8:47

Might be temporary (if the database was changing, it should affect all branches), so we'll wait until tomorrow to see if that's the case.

This nightly also reported a bug in RDKit (py39), in the cairo code to export images. Might be related to the same error as above (Image has changed API?). We'll wait too.

T5: clusters are not sorted by size by default

Cell 17:

print ('Ten molecules from second largest cluster:')
# Draw molecules
Draw.MolsToGridImage([mols[i][0] for i in clusters[1][:10]], 
                     legends=[mols[i][1] for i in clusters[1][:10]], 
                     molsPerRow=5)

However, the clusters returned by Butina.ClusterData(distance_matr,len(fps),cutoff,isDistData=True) are not sorted by default, i.e. we cannot guarantee that clusters[1] is indeed the second largest cluster.

In the talktorial it does happen that (at least) the first two clusters are correctly ordered, but when I used a different target, the second cluster only had one element while others had more. Anyway, this is easily checked by just listing the lengths of the clusters. Moreover, the docs do not claim that they are ordered.
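A minimal sketch of the fix, reusing the notebook's variables (distance_matr, fps, cutoff): sort the Butina output by cluster size before indexing it, so that clusters[1] is guaranteed to be the second largest cluster.

from rdkit.ML.Cluster import Butina

clusters = Butina.ClusterData(distance_matr, len(fps), cutoff, isDistData=True)
clusters = sorted(clusters, key=len, reverse=True)    # largest cluster first
print([len(cluster) for cluster in clusters[:5]])     # quick sanity check of the ordering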

T008: PDB API for chemicals changed

The PDB API changed again, and the change may not be reflected in biotite yet.

Talktorial T008 throws the following error (cell 9):

RequestError: Error 400: Invalid request to the [ text ] service: search is not enabled on [ chem_comp.formula_weight ] attribute

As far as I understand, they split the search service text into text and text_chem:
https://search.rcsb.org/#search-services

They also split the search-attribute web pages (the links need updating in the talktorial as well).

The old API (I think):

{
  "query": {
    "type": "terminal",
    "service": "text",
    "parameters": {
      "attribute": "chem_comp.formula_weight",
      "operator": "greater",
      "value": 100
    }
  },
  "return_type": "entry"
}

The new API:

{
  "query": {
    "type": "terminal",
    "service": "text_chem",
    "parameters": {
      "attribute": "chem_comp.formula_weight",
      "operator": "greater",
      "value": 100
    }
  },
  "return_type": "entry"
}
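A hedged way to verify the updated query outside biotite (the endpoint URL is an assumption based on the search-services page linked above): post the new JSON directly to the RCSB search API with requests and check that the 400 error is gone.

import requests

query = {
    "query": {
        "type": "terminal",
        "service": "text_chem",
        "parameters": {
            "attribute": "chem_comp.formula_weight",
            "operator": "greater",
            "value": 100,
        },
    },
    "return_type": "entry",
}

response = requests.post("https://search.rcsb.org/rcsbsearch/v2/query", json=query)
print(response.status_code)                    # expect 200 instead of 400
print(response.json().get("total_count"))      # number of matching entries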

Talktorial template

How to start

Replace example names as needed:

git clone https://github.com/volkamerlab/teachopencadd.git
cd teachopencadd/
# or... git pull on master
git checkout -b ab-099-title  # ab = your initials; 099 = your talktorial index; title = short talktorial title 
git commit --allow-empty -m "Start branch"
git push --set-upstream origin ab-099-title

# Now go to the suggested URL and create the PR using the suggested template
# If the template is not there, copy it from this issue: https://github.com/volkamerlab/TeachOpenCADD/issues/41

# Back to the CLI to set up the environment
conda env create -f devtools/conda-envs/test_env.yml
conda activate teachopencadd
jupyter labextension install @ijmbarr/jupyterlab_spellchecker
jupyter lab

Now you can start working on your talktorial!

Render the website locally

Once your talktorial is ready, check how it renders in HTML.

cd docs
make html
cp -r talktorials/images/ _build/html/talktorials/images
cd _build/html
# open index.html on your browser:
xdg-open index.html
# or under Windows: explorer.exe index.html
# or under macOS: open index.html

If your notebook does not appear, you need to add an nblink forwarder in docs/notebooks. Copy an existing one and update the paths accordingly!

PR template

Create a PR using this template. Ping us as reviewer once you'd like feedback on the talktorial.

https://github.com/volkamerlab/teachopencadd/blob/master/.github/PULL_REQUEST_TEMPLATE/talktorial_review.md

Jupyter Lab 3

This is going to be released soon, which enables dynamic extensions. This will allow us to provide a package that works right away without having to worry about jupyter labextension install bla bla bla. Leaving this here so I don't forget to update the environment and recipe :)

T1: Fetch data by ChEMBL version?

Whenever a new ChEMBL version is released, the compound/bioactivity dataset fetched in T1 will change and affect all downstream notebooks.

When doing our packaging/refactoring of all notebooks, it would be great to find out how to fetch data by ChEMBL version (e.g. frozen to the ChEMBL version/dataset shown in our TeachOpenCADD publication).

Get help:
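A minimal related sketch (my assumption, not the talktorial's code): the ChEMBL web services always serve the latest release, but the status endpoint reports which release that is, so a notebook could at least fail loudly when the data no longer comes from the release it was written against.

import requests

status = requests.get("https://www.ebi.ac.uk/chembl/api/data/status.json").json()
print(status["chembl_db_version"])      # e.g. "ChEMBL_27"
# assert status["chembl_db_version"] == "ChEMBL_27", "Data is not from the frozen ChEMBL release"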

Anaconda versions?

The main README.md refers to testing with anaconda2 (python 2.x?) but the environment.yml requires python > 3.6 if I'm reading it correctly. Maybe the README is out of date?

In this day and age I'd highly recommend making sure everything uses python 3.

T008: Align Complexes doesn't work

I am also having issues with this line; could this be looked into? I tried using Jupyter notebooks and Google Colab but still couldn't get around it.

results = align(complexes, method=METHODS["mda"])

This is the error I keep getting when I run the code:

AttributeError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_2872/2813988720.py in
----> 1 results = align(complexes, method=METHODS["mda"])

~\anaconda3\envs\introductiontopython\lib\site-packages\opencadd\structure\superposition\api.py in align(structures, method, **kwargs)
40 results = []
41 for mobile in mobiles:
---> 42 result = aligner.calculate([reference, mobile])
43 results.append(result)
44

~\anaconda3\envs\introductiontopython\lib\site-packages\opencadd\structure\superposition\engines\base.py in calculate(self, structures, *args, **kwargs)
29 """
30 assert len(structures) == 2
---> 31 return self._calculate(structures, *args, **kwargs)
32
33 def _calculate(self, structures, *args, **kwargs):

~\anaconda3\envs\introductiontopython\lib\site-packages\opencadd\structure\superposition\engines\mda.py in _calculate(self, structures, *args, **kwargs)
117
118 # Get matching atoms
--> 119 selection, alignment = self.matching_selection(*structures)
120 ref_atoms = ref_universe.select_atoms(selection["reference"])
121 mobile_atoms = mob_universe.select_atoms(selection["mobile"])

~\anaconda3\envs\introductiontopython\lib\site-packages\opencadd\structure\superposition\engines\mda.py in matching_selection(self, reference, mobile)
176 fasta["ref"], fasta["mob"], *_empty = alignment.get_gapped_sequences()
177 fasta.write("temp.fasta")
--> 178 selection = fasta2select(
179 "temp.fasta",
180 ref_resids=ref_resids,

~\anaconda3\envs\introductiontopython\lib\site-packages\opencadd\structure\superposition\sequences.py in fasta2select(fastafilename, ref_resids, ref_segids, target_resids, target_segids, backbone_selection)
115
116 """
--> 117 protein_gapped = Bio.Alphabet.Gapped(Bio.Alphabet.IUPAC.protein)
118 with open(fastafilename) as fasta:
119 alignment = Bio.AlignIO.read(fasta, "fasta", alphabet=protein_gapped)

AttributeError: module 'Bio' has no attribute 'Alphabet'
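A hedged workaround (an assumption on my part, not an official fix): Bio.Alphabet was removed in Biopython 1.78, and opencadd's sequence-alignment helper still imports it, so pinning an older Biopython should make the alignment run again until opencadd is updated.

# run inside the activated conda environment
conda install -c conda-forge "biopython<1.78"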

Make repo less heavy?

The repo is already quite big (0.5 GB). When someone can allocate time, we should look into options for making it lighter.

Issue on T009 · Ligand-based pharmacophores

The command feature_factory = AllChem.BuildFeatureFactory(str(Path(RDConfig.RDDataDir) / "BaseFeatures.fdef")) is not working. It gives the error message:

OSError                                   Traceback (most recent call last)
<ipython-input-17-9f1fb722f467> in <module>()
----> 1 feature_factory = AllChem.BuildFeatureFactory(str(Path(RDConfig.RDDataDir) / "BaseFeatures.fdef"))

OSError: File: /opt/anaconda1anaconda2anaconda3/share/RDKit/Data/BaseFeatures.fdef could not be opened.

I worked around it with the command feature_factory = os.path.join(RDConfig.RDDataDir, 'BaseFeatures.fdef'),
but the following command then also gives an error message:

AttributeError                            Traceback (most recent call last)
<ipython-input-21-3c1dd36efe52> in <module>()
----> 1 features = feature_factory.GetFeaturesForMol(str(example_molecule))
      2 print(f"Number of features found: {len(features)}")

AttributeError: 'str' object has no attribute 'GetFeaturesForMol'

The command list(feature_factory.GetFeatureDefs().keys()) also gives the error message:

AttributeError                            Traceback (most recent call last)
<ipython-input-18-d24aee5309ac> in <module>()
----> 1 list(feature_factory.GetFeatureDefs().keys())

AttributeError: 'str' object has no attribute 'GetFeatureDefs'

Please help me solve these issues. Thanks.
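A hedged answer sketch: the original OSError points at a broken conda placeholder path inside the installed RDKit data directory, and the later AttributeErrors appear because the workaround assigns the .fdef path string to feature_factory instead of an actual factory object. Building the factory from the path and then calling its methods on an RDKit Mol (not on str(...)) should work:

import os
from rdkit import RDConfig
from rdkit.Chem import ChemicalFeatures

fdef_path = os.path.join(RDConfig.RDDataDir, "BaseFeatures.fdef")
feature_factory = ChemicalFeatures.BuildFeatureFactory(fdef_path)

print(list(feature_factory.GetFeatureDefs().keys())[:5])
# features = feature_factory.GetFeaturesForMol(example_molecule)   # pass the Mol object, not str(example_molecule)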

Update `pypdb`-related notebooks T008 and T010

The pypdb-related notebooks must be updated since the old API does not work anymore due to changes in the RCSB API.

From pypdb's GH page:

As of November 2020, pypdb is undergoing significant refactoring in order to accommodate changes to the RCSB PDB API and extend functionality. We regret any breaking changes that occur along the way. The previous version of pypdb is available here; however, it will no longer function due to the RCSB API being changed.

Check out the new API:
https://github.com/williamgilpin/pypdb/blob/master/demos/demos.ipynb


Example issue in T010:

Input:

# Set up query dictionary
search_dict = pypdb.make_query("STI")
search_dict

Output:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-13-521e3a31d168> in <module>
      1 # Set up query dictionary
----> 2 search_dict = pypdb.make_query("STI")
      3 search_dict
AttributeError: module 'pypdb' has no attribute 'make_query'
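A hedged sketch of the replacement call (treat the class and method names as assumptions and double-check them against the pypdb demos notebook linked above): the refactored pypdb exposes a Query object instead of make_query.

from pypdb import Query

# replaces the removed pypdb.make_query("STI") pattern
results = Query("STI").search()
print(results[:5])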

T015_protein_ligand_docking using OPAL: AttributeError: 'NoneType' object has no attribute 'attrs'

When I run the code below:

%time result = step_03_opal(PROTEIN, smiles[:1], COMPLEX)

I get the following error:
AttributeError Traceback (most recent call last)
in

~\AppData\Local\Temp/ipykernel_6344/2733412323.py in step_03_opal(protein, smiles, pdbcomplex)
14 """
15 prepared_protein = opal_prepare_protein(protein)
---> 16 center, radius = dogsite_scorer_guess_binding_site(pdbcomplex)
17 size = [radius] * 3 # Vina supports non-cubic boxes, but we will use a cube for simplicity
18 for i, smile in enumerate(smiles):

~\AppData\Local\Temp/ipykernel_6344/3358257068.py in dogsite_scorer_guess_binding_site(protein)
131 job_location = dogsite_scorer_submit_with_pdbid(protein)
132 elif protein.endswith(".pdb"):
--> 133 job_location = dogsite_scorer_submit_with_custom_pdb(protein)
134 else:
135 raise ValueError("protein must be a PDB ID or a path to a .pdb file!")

~\AppData\Local\Temp/ipykernel_6344/3358257068.py in dogsite_scorer_submit_with_custom_pdb(pdbfile)
78 # 2. Get internal location id
79 html = BeautifulSoup(r.text)
---> 80 pdb_id = html.find("input", {"name": "dogsite[pdbCode]"}).attrs["value"]
81
82 # 3. Get the internal job ID

AttributeError: 'NoneType' object has no attribute 'attrs'

I hope you don't mind my many issues. I'm really trying to understand and see how I can make maximum use of these talktorials in my current research work.
Thanks and cheers for the good work! @dominiquesydow and @jaimergp and the whole Volkamerlab team
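A hedged debugging sketch (an assumption, not the talktorial's code; r stands for the requests response returned by the DoGSiteScorer upload step in the helper above): the AttributeError means BeautifulSoup could not find the expected <input name="dogsite[pdbCode]"> field, usually because the PDB upload failed or the web page layout changed. Inspecting the response before dereferencing the tag shows which of the two it is.

from bs4 import BeautifulSoup

html = BeautifulSoup(r.text, "html.parser")
tag = html.find("input", {"name": "dogsite[pdbCode]"})
if tag is None:
    print(r.status_code)      # did the upload succeed at all?
    print(r.text[:500])       # what did the server actually return?
else:
    pdb_id = tag.attrs["value"]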

Notebook Ligand_based_pharmacophores

The notebook T009_Ligand_based_pharmacophores is not working. Cell [5] has the list molecules, but it should be the list molecule.
However, if I correct this, the mol_file cannot be read because its type is None.

Do you know if there is another issue in this notebook?

T016: Nitrogen is recognized differently on Windows vs. Unix

CI error message

[gw0] win32 -- Python 3.6.13 C:\Miniconda\envs\teachopencadd\python.exe
Notebook cell execution failed
Cell 15: Cell outputs differ

Input:
create_df_from_binding_site(interactions_by_site[selected_site], interaction_type="hbond")
# NBVAL_CHECK_OUTPUT

Traceback:
 mismatch 'text/html'

 assert reference_output == test_output failed:

  '<div>\n<styl...able>\n</div>' == '<div>\n<styl...able>\n</div>'
  Skipping 1084 identical leading characters in diff, use -v to show
  -      <td>N2</td>
  ?           ^
  +      <td>Nar</td>
  ?           ^^
          <td>(13.371, 34.064, 15.005)</td>
          <td>(10.667, 33.654, 16.145)</td>
        </tr>
      </tbody>
    </table>
    </div>

Why?

T016 fails due to an output diff: a nitrogen atom is recognized differently on Windows than on Unix (?). This might have to do with different package versions being installed.

From #74 (comment)

OpenBabel is used to identify hydrogen bond donor and acceptor atoms. Halogen atoms are excluded from this group and treated separately (see below).

From https://plip-tool.biotec.tu-dresden.de/plip-web/plip/help

Atom types in OpenBabel:

EXTTYP  [n]         Nar
EXTTYP  [$(N=*)]        N2

From https://github.com/openbabel/openbabel/blob/master/data/atomtyp.txt
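A hedged workaround sketch: if the Nar/N2 typing difference turns out to be platform noise rather than a real bug, the strict output comparison for that cell can be relaxed by swapping nbval's marker comment (NBVAL_IGNORE_OUTPUT still executes the cell but skips the output diff).

create_df_from_binding_site(interactions_by_site[selected_site], interaction_type="hbond")
# NBVAL_IGNORE_OUTPUT   (instead of # NBVAL_CHECK_OUTPUT)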

T001: Frozen bioactivity dataset is not frozen

In T001, we thought we had frozen the bioactivity dataset (by checking for activity IDs in ChEMBL 27), but it does not seem to work.

Version on master branch:
Number of bioactivities queried for EGFR in this notebook: 7178
Number of bioactivities after ChEMBL 27 intersection: 7178

Running this notebook today:
Number of bioactivities queried for EGFR in this notebook: 8817
Number of bioactivities after ChEMBL 27 intersection: 8031 (I would expect 7178)

@jaimergp, I am sorry to bother you with this.
Do you understand why our intersection with the chembl27_activities.npz.zip does not produce stable results?

If not: since I do not have the time to debug this (and you probably don't either), my suggestion is to remove the chembl27_activities.npz.zip freezing step and instead freeze the final output dataset output_df from this notebook, to ensure stable outputs in all downstream talktorials (T002-T007).
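A minimal sketch of that suggestion (the file name is an assumption): write the final dataframe out once, commit the file, and have the downstream talktorials read the frozen copy instead of re-querying ChEMBL.

import pandas as pd

# run once in T001 and commit the resulting file
output_df.to_csv("EGFR_compounds_frozen.csv", index=False)

# downstream talktorials (T002-T007) then load the frozen dataset
output_df = pd.read_csv("EGFR_compounds_frozen.csv")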

Talktorial 3 Unwanted Substructure Brenk Error

Hi @dominiquesydow
Cell In [10] comes up with the following error when I run it. It worked fine about two months ago, when I had actually just been introduced to TeachOpenCADD.

ValueError                                Traceback (most recent call last)
<ipython-input-10-3809d0362bf6> in <module>
      1 Chem.Draw.MolsToGridImage(
      2     list(substructures.head(3).rdkit_molecule),
----> 3     legends=list(substructures.head(3).name),
      4 )

/srv/conda/envs/notebook/lib/python3.7/site-packages/rdkit/Chem/Draw/IPythonConsole.py in ShowMols(mols, maxMols, **kwargs)
    197   if not "drawOptions" in kwargs:
    198     kwargs["drawOptions"] = drawOptions
--> 199   res = fn(mols, **kwargs)
    200   if kwargs['useSVG']:
    201     return SVG(res)

/srv/conda/envs/notebook/lib/python3.7/site-packages/rdkit/Chem/Draw/__init__.py in MolsToGridImage(mols, molsPerRow, subImgSize, legends, highlightAtomLists, highlightBondLists, useSVG, returnPNG, **kwargs)
    611     return _MolsToGridImage(mols, molsPerRow=molsPerRow, subImgSize=subImgSize, legends=legends,
    612                             highlightAtomLists=highlightAtomLists,
--> 613                             highlightBondLists=highlightBondLists, returnPNG=returnPNG, **kwargs)
    614 
    615 

/srv/conda/envs/notebook/lib/python3.7/site-packages/rdkit/Chem/Draw/__init__.py in _MolsToGridImage(mols, molsPerRow, subImgSize, legends, highlightAtomLists, highlightBondLists, drawOptions, returnPNG, **kwargs)
    553           del kwargs[k]
    554     d2d.DrawMolecules(list(mols), legends=legends or None, highlightAtoms=highlightAtomLists,
--> 555                       highlightBonds=highlightBondLists, **kwargs)
    556     d2d.FinishDrawing()
    557     if not returnPNG:

ValueError: bad query type1

Talktorial 10 - Input Cell #10

Should be,
pdb_ids = list(set(found_pbd_ids + found_pbd_ids2))

and not,
pdb_ids = list(set(found_pbd_ids + found_pbd_ids))

Talktorial 4 - mistake in calculating experimental EF

In the function print_data_ef in cell 46, we need to convert the input percentage perc_ranked_dataset into a fraction to successfully compare it with the values in enrich_df.

Also, in cell 47, when using the concatenated dataframe enrich_df, the function print_data_ef takes the last line to find the relevant fraction, so it will always choose whichever similarity measure was second in the concatenation (in cell 43). Instead, I think we should just compute the EF for each similarity measure separately, as sketched below.
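A hedged illustrative sketch of both points (all column names here are hypothetical, not the talktorial's actual code): convert the percentage before comparing, and compute the experimental EF per similarity measure rather than from the last line of the concatenated dataframe.

perc_ranked_dataset = 5                       # user input, in percent
fraction = perc_ranked_dataset / 100          # compare on the same scale as enrich_df

# hypothetical column names, for illustration only
for measure, subset in enrich_df.groupby("similarity_measure"):
    row = subset[subset["ranked_fraction"] <= fraction].iloc[-1]
    print(measure, row["enrichment_factor"])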

TalkTorial 1 - T1_ChEMBL - Step 24

Dear all,

Being a new user, I am struggling with some of the steps of this amazing talktorial.
At step 24, I do not manage to transform my SMILES structures into molecules within the template:
PandasTools.AddMoleculeColumnToFrame(output_df, smilesCol='smiles')
Any suggestions?
Thanks
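A hedged checklist sketch (an assumption, not an official answer): AddMoleculeColumnToFrame needs the exact name of the column holding the SMILES strings, and it silently leaves empty entries for SMILES that RDKit cannot parse, so checking both usually narrows the problem down.

from rdkit.Chem import PandasTools

print(output_df.columns)                               # is the SMILES column really called "smiles"?
PandasTools.AddMoleculeColumnToFrame(output_df, smilesCol="smiles", molCol="ROMol")
print(output_df["ROMol"].isna().sum())                 # how many SMILES could not be parsed?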

Include new talktorials to binder setup

Upon merging #74, make sure the new talktorials run with the Binder setup; add notes to those that may not run there (e.g. the MD simulation).


This issue is prompted by @Carlbullish's question in #124 (comment):

And if you don't mind me asking: running the Binder for talktorial 10 and the talktorials after it is not yet implemented for successful execution, as compared to talktorials 1-10?

  • [Resolved, see comments below] Talktorial 010 seems to run fine on Binder; @Carlbullish, did you run into trouble?
  • Talktorials >010 will be included in Binder once we have fixed some final technical issues in #74.

Issue on tutorial T010_binding_site_comparison

I found it difficult to run tutorial T010 because it seems the MDAnalysis package is not installed properly, or the numpy package has an issue. Please see the error message below:

ValueError                                Traceback (most recent call last)
<ipython-input-5-359cda61cea2> in <module>()
      1 import opencadd
----> 2 from opencadd.structure.core import Structure
      3 from opencadd.structure.superposition.api import align, METHODS
      4 from opencadd.structure.superposition.engines.mda import MDAnalysisAligner
      5 from teachopencadd.utils import seed_everything

5 frames
/usr/local/lib/python3.7/site-packages/MDAnalysis/lib/util.py in <module>()
    215 from ..exceptions import StreamWarning, DuplicateWarning
    216 try:
--> 217     from ._cutil import unique_int_1d
    218 except ImportError:
    219     raise ImportError("MDAnalysis not installed properly. "

MDAnalysis/lib/_cutil.pyx in init MDAnalysis.lib._cutil()

ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject
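A hedged workaround (an assumption on my part, not an official fix): this ValueError usually means MDAnalysis was compiled against a different NumPy ABI than the NumPy version that ends up installed. Reinstalling the pair so their binaries match often resolves it:

pip install --upgrade --force-reinstall numpy MDAnalysis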

CI testing

Implement GH Actions to run all notebooks regularly with nbval or similar.

Talktorial index

This index gives an overview of the available TeachOpenCADD talktorials in our upcoming release (with comments on their previous index in TeachOpenCADD v1).

  • 000_template # new
  • 001_query_chembl # T1
  • 002_compound_adme # T2
  • 003_compound_unwanted_substructures # T3
  • 004_compound_similarity # T4
  • 005_compound_clustering # T5
  • 006_compound_maximum_common_substructures # T6
  • 007_compound_activity_machine_learning # T7
  • 008_query_pdb # T8
  • 009_compound_ensemble_pharmacophores # T9
  • 010_binding_site_comparison # T10, fix issue
  • 011_query_online_api_webservices # T11
  • 012_query_pubchem # T11a (second part)
  • 013_query_klifs # T11a (first part); could mention at the end opencadd.databases.klifs/klifs_utils
  • 014_binding_site_detection # T12 (not on TeachOpenCADD yet, but here), we could extract binding site stuff from T11b and merge it with T12 as stand-alone notebook prior to docking
  • 015_protein_ligand_docking # T11b
  • 016_protein_ligand_interactions # T11c
  • 017_python_jupyter_introduction # Revamp AI in Medicine

Timestamp DB queries for input data consistency

If we query the databases often, the input data will change each time (especially with T1), so all the downstream notebooks will be slightly modified. We could filter out data after an arbitrary date (let's say 1/1/2020) to reduce this noise.

When this is addressed, make sure to review the text parts where some output is mentioned (as in "the first result shows a molecule named X").

Talktorial 11b: Problems connecting to OPAL web services

Hi.
Thank you very much for putting together these talktorials. It's awesome!
I'm having issues connecting to the OPAL web services. The line client = Client("http://nbcr-222.ucsd.edu/opal2/services/vina_1.1.2?wsdl") in the function opal_run_docking(protein, ligand, center, size, stream_output=True) is giving me an ExpatError and a SAXParseException.
Any idea on how I could fix that?
Thanks again.

ExpatError Traceback (most recent call last)
~/anaconda3/envs/teachopencadd/lib/python3.6/xml/sax/expatreader.py in feed(self, data, isFinal)
216 # except when invoked from close.
--> 217 self._parser.Parse(data, isFinal)
218 except expat.error as e:

ExpatError: syntax error: line 1, column 0

During handling of the above exception, another exception occurred:

SAXParseException Traceback (most recent call last)
in
1 from suds.client import Client
----> 2 client = Client("http://nbcr-222.ucsd.edu/opal2/services/vina_1.1.2?wsdl")
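A hedged debugging sketch (an assumption, not an official fix): an ExpatError from suds usually means the URL returned something that is not XML, for example an HTML error page or an empty reply because the OPAL server is down. Fetching the WSDL directly first shows what the server is actually sending back.

import requests

url = "http://nbcr-222.ucsd.edu/opal2/services/vina_1.1.2?wsdl"
response = requests.get(url, timeout=30)
print(response.status_code)
print(response.text[:200])      # a working service should return an XML/WSDL document here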

T015: Docking with python API

Currently our T015 docking talktorial uses smina for docking, which works very well and is installable via conda-forge, but it does not have a Python API. Recently, AutoDock Vina released a new version that has a Python API (installation instructions). We could think about moving to this package in the future.
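A hedged sketch of what docking with the AutoDock Vina Python API looks like (method names follow the Vina 1.2 documentation; the file names, box centre, and box size are placeholder assumptions):

from vina import Vina

v = Vina(sf_name="vina")
v.set_receptor("protein.pdbqt")            # prepared receptor (placeholder file name)
v.set_ligand_from_file("ligand.pdbqt")     # prepared ligand (placeholder file name)
v.compute_vina_maps(center=[15.0, 53.0, 16.0], box_size=[20, 20, 20])
v.dock(exhaustiveness=8, n_poses=5)
v.write_poses("docked_ligand.pdbqt", n_poses=5)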

Add support for Google Colab

We currently offer running the notebooks

  • locally via Jupyter Lab
  • remotely via Binder

In the future, add support for Google Colab.

For each notebook, add cells at the beginning to install the dependencies from our teachopencadd environment, e.g. like this (see the discussion in #129 (comment)):

!pip install condacolab
import condacolab
condacolab.install()
!wget https://raw.githubusercontent.com/volkamerlab/TeachOpenCADD/master/devtools/other-conda-envs/users_env.yml
!mamba env update -n base -f users_env.yml 

However, some dependencies seem to cause problems:

  • T010: numpy version does not match with mdanalysis: #148

Installing TeachOpenCADD on google colab

I am trying to install TeachOpenCADD on Google Colab using the instructions on the web page:

https://projects.volkamerlab.org/teachopencadd/installing.html

After installing conda in the notebook, the command below gives the following error:

!conda env create -f https://raw.githubusercontent.com/volkamerlab/TeachOpenCADD/master/environment.yml

CondaHTTPError: HTTP 404 NOT FOUND for url https://raw.githubusercontent.com/volkamerlab/TeachOpenCADD/master/environment.yml
Elapsed: 00:00.133251

An HTTP error occurred when trying to retrieve this URL.
The URL does not exist.

Could you give the right URL, please?

I cannot import the packages from teachopencadd. Thanks.

CI: don't forget adding more notebooks to `treebeard.yaml`

As we open PRs, these need to add their new notebooks to treebeard.yaml. This won't be needed once the existing notebooks in the tree have been processed; it involves minor changes to the CI pipelines so that we just ls the directory instead of cherry-picking which notebooks have to undergo testing.

Broken links
