azimuth's People

Contributors

elibol, jjc2718, nfusi, robertsami, v-ketian, zhmz90

azimuth's Issues

Azimuth tests failed with wrong scikit-learn version.

Azimuth requires scikit-learn >=0.17.1, <0.18.1, but pip install azimuth may install scikit-learn 0.18.1, so the tests fail when you run nosetests:

c:\python27\lib\site-packages\sklearn\cross_validation.py:44: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)
c:\python27\lib\site-packages\sklearn\grid_search.py:43: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. This module will be removed in 0.20.
  DeprecationWarning)
EE
======================================================================
ERROR: test_predictions_nopos (test_saved_models.SavedModelTests)
----------------------------------------------------------------------
Traceback (most recent call last):
...
  File "sklearn\tree\_tree.pyx", line 632, in sklearn.tree._tree.Tree.__setstate__ (sklearn\tree\_tree.c:8125)
KeyError: 'max_depth'
-------------------- >> begin captured stdout << ---------------------
No model file specified, using V3_model_nopos

--------------------- >> end captured stdout << ----------------------

======================================================================
ERROR: test_predictions_pos (test_saved_models.SavedModelTests)
----------------------------------------------------------------------
Traceback (most recent call last):
 ...
  File "sklearn\tree\_tree.pyx", line 632, in sklearn.tree._tree.Tree.__setstate__ (sklearn\tree\_tree.c:8125)
KeyError: 'max_depth'
-------------------- >> begin captured stdout << ---------------------
No model file specified, using V3_model_full

--------------------- >> end captured stdout << ----------------------

----------------------------------------------------------------------
Ran 2 tests in 0.813s

FAILED (errors=2)
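A workaround, given the version bounds stated above, is to pin scikit-learn explicitly before installing azimuth (a sketch; adjust to your environment):

```shell
# Keep scikit-learn inside the range azimuth supports so pip does not
# pull in 0.18.x, which breaks loading the saved models.
pip install 'scikit-learn>=0.17.1,<0.18.1'
pip install azimuth
```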

DeprecationWarning from sklearn

I know I can suppress warnings on the client side, but it would be better to get rid of them at the source.

import azimuth.model_comparison as mc,numpy as np
/opt/apps/tools/anaconda2-2.5.0/lib/python2.7/site-packages/sklearn/cross_validation.py:44: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)
/opt/apps/tools/anaconda2-2.5.0/lib/python2.7/site-packages/sklearn/grid_search.py:43: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. This module will be removed in 0.20.
  DeprecationWarning)
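Until they are removed at the source, the warnings can be silenced around the import on the client side. A sketch (the `warnings.warn` call is a stand-in for the noisy sklearn imports, which are commented out so the snippet is self-contained):

```python
import warnings

# Client-side suppression only; the request above is to remove the
# warnings at the source. Silence DeprecationWarning for the import:
with warnings.catch_warnings():
    warnings.simplefilter("ignore", DeprecationWarning)
    # import azimuth.model_comparison as mc  # noisy sklearn imports happen here
    warnings.warn("stand-in for sklearn's warning", DeprecationWarning)  # suppressed
```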

Negative guide scores

Hello! We've noticed that occasionally, guides that we score will get a negative predicted score from azimuth. Here's an example:

> from azimuth.model_comparison import predict
> import numpy
> predict(numpy.array(['AACTGATTTCTGGCGTTTTCTTTCTGGCTC']), numpy.array([8905]), numpy.array([96]))
No model file specified, using V3_model_full
array([-0.04603427])

Empirically, it looks like negative scores are more likely to happen with peptide percentages closer to 100. However, from the Azimuth documentation (and the general understanding of CRISPR on-target scores), it looks like scores are expected to be between 0.0 and 1.0. Is this possibly a bug in the scoring system?

If it's helpful, this does not happen if the peptide percentage is set to 95 or less.

Thanks!
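If a pipeline needs scores strictly in [0, 1], a pragmatic client-side step (a sketch; it masks rather than fixes whatever produces the negative value) is to clip the predictions:

```python
import numpy as np

# Hypothetical raw azimuth output like the one reported above.
raw = np.array([-0.04603427, 0.55, 0.96])

# Clip into the documented [0, 1] range; in-range values are unchanged.
clipped = np.clip(raw, 0.0, 1.0)
# clipped[0] becomes 0.0 while 0.55 and 0.96 pass through untouched
```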

Support Python 3

Azimuth doesn't seem to be Python 3 ready. Could this be done?

Issues with using/re-training saved models

Hi,

I'm trying to compute Rule Set 2 scores using model_comparison.predict on the nopos model.

If I download the latest version from github, the pickle files don't seem to load:

ValueError: Buffer dtype mismatch, expected 'SIZE_t' but got 'long long'

So I run model_comparison.py as instructed on the main github page to reproduce the two models.
It prints a LOT of warnings:

WARNING: trimming max_index_to use down to length of string=30

but does eventually produce two model files.

However, if I use those files, I get a different answer from the value you report in the README that comes with the Rule Set 2 Calculator: 0.5909 vs. 0.5656.

Looking at the latest test_saved_models.py, it seems you expect the new value. So I'm wondering what the reason is for the change. Which one corresponds to the method used in your paper? I'm guessing the older one? If so, how do I reproduce it?

(Sorry, I didn't mean to submit this issue when I did, and I have since resolved some issues with versioning, so now it's a question instead!)

Thanks,
Felicity

nosetests KeyError: max_depth

Fixed: I had missed reading this section, which solved my issue.

Generating new model .pickle files

Sometimes the pre-computed .pickle files in the saved_models directory are incompatible with other versions of scikit-learn. You can re-train the files saved_models/V3_model_full.pickle and saved_models/V3_model_nopos.pickle by running python model_comparison.py (which will overwrite the saved models). You can check that the resulting models match the models we precomputed by running python test_saved_models.py within the tests directory.

Using:
sklearn version 0.18.1
python version 2.7.6

I believe that the error arises because the pickled models were built using sklearn 0.17, and it seems like the API for trees changed slightly with sklearn 0.18.
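If that diagnosis is right, a small runtime guard can fail fast with a clearer message than the KeyError. This is a sketch; the assumption (taken from this report) is that the shipped pickles only load under scikit-learn 0.17.x:

```python
def sklearn_pickles_compatible(version_string):
    """Heuristic guard. Assumption from this report: the shipped V3
    pickles were built with scikit-learn 0.17.x, and the tree
    __setstate__ layout changed in 0.18 (KeyError: 'max_depth')."""
    major, minor = (int(part) for part in version_string.split(".")[:2])
    return (major, minor) == (0, 17)

# Before unpickling saved_models/V3_model_*.pickle:
# import sklearn
# if not sklearn_pickles_compatible(sklearn.__version__):
#     raise RuntimeError("retrain with 'python model_comparison.py' "
#                        "or install scikit-learn <0.18")
```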

Please let me know if there is any other helpful information I can provide.

$ nosetests

/afs/csail.mit.edu/u/m/maxwshen/.local/lib/python2.7/site-packages/sklearn/cross_validation.py:44: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)
/afs/csail.mit.edu/u/m/maxwshen/.local/lib/python2.7/site-packages/sklearn/grid_search.py:43: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. This module will be removed in 0.20.
  DeprecationWarning)
EE
======================================================================
ERROR: test_predictions_nopos (test_saved_models.SavedModelTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/cluster/mshen/tools/Azimuth/azimuth/tests/test_saved_models.py", line 17, in test_predictions_nopos
    predictions = azimuth.model_comparison.predict(np.array(df['guide'].values), None, None)
  File "/cluster/mshen/tools/Azimuth/azimuth/model_comparison.py", line 550, in predict
    model, learn_options = pickle.load(f)
  File "/usr/lib/python2.7/pickle.py", line 1378, in load
    return Unpickler(file).load()
  File "/usr/lib/python2.7/pickle.py", line 858, in load
    dispatch[key](self)
  File "/usr/lib/python2.7/pickle.py", line 1217, in load_build
    setstate(state)
  File "sklearn/tree/_tree.pyx", line 632, in sklearn.tree._tree.Tree.__setstate__ (sklearn/tree/_tree.c:8128)
KeyError: 'max_depth'
-------------------- >> begin captured stdout << ---------------------
No model file specified, using V3_model_nopos

--------------------- >> end captured stdout << ----------------------

======================================================================
ERROR: test_predictions_pos (test_saved_models.SavedModelTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/cluster/mshen/tools/Azimuth/azimuth/tests/test_saved_models.py", line 22, in test_predictions_pos
    predictions = azimuth.model_comparison.predict(np.array(df['guide'].values), np.array(df['AA cut'].values), np.array(df['Percent peptide'].values))
  File "/cluster/mshen/tools/Azimuth/azimuth/model_comparison.py", line 550, in predict
    model, learn_options = pickle.load(f)
  File "/usr/lib/python2.7/pickle.py", line 1378, in load
    return Unpickler(file).load()
  File "/usr/lib/python2.7/pickle.py", line 858, in load
    dispatch[key](self)
  File "/usr/lib/python2.7/pickle.py", line 1217, in load_build
    setstate(state)
  File "sklearn/tree/_tree.pyx", line 632, in sklearn.tree._tree.Tree.__setstate__ (sklearn/tree/_tree.c:8128)
KeyError: 'max_depth'
-------------------- >> begin captured stdout << ---------------------
No model file specified, using V3_model_full

--------------------- >> end captured stdout << ----------------------

----------------------------------------------------------------------
Ran 2 tests in 4.093s

FAILED (errors=2)

How to install

It was very hard to get Azimuth 2.0 installed in Python 2.7, so I thought I'd share my experience with others.

  1. Do not use the Docker image that was created back in 2017. The results generated by the Docker image don't match those generated by the GPP Web portal (https://portals.broadinstitute.org/gpp/public/).

  2. Use the following instructions to install it in Python 2.7; the results match the GPP Web portal:

conda create --name azimuth python=2.7
conda activate azimuth
conda install biopython
conda install scikit-learn=0.17.1
pip install 'numpy==1.12.1'
pip install azimuth

running nosetests failed

When I run nosetests on azimuth, it fails with the following output:

# nosetests
/Library/Python/2.7/site-packages/sklearn/cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)
/Library/Python/2.7/site-packages/sklearn/grid_search.py:42: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. This module will be removed in 0.20.
  DeprecationWarning)
F
======================================================================
FAIL: test_predictions (azimuth.tests.test_saved_models.SavedModelTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Library/Python/2.7/site-packages/azimuth/tests/test_saved_models.py", line 19, in test_predictions
    self.assertTrue(np.allclose(predictions, df['Stable prediction'].values, atol=1e-3))
AssertionError: False is not true
-------------------- >> begin captured stdout << ---------------------
No model file specified, using V3_model_nopos
--------------------- >> end captured stdout << ----------------------
----------------------------------------------------------------------
Ran 1 test in 3.481s

FAILED (failures=1)

azimuth Python package and azimuth web service yield different scores

This script reproduces the problem.

Also, the scores obtained by that script from the azimuth Python package seem to agree with those produced by https://crispr.ml when submitting ENSG00000100823 (APEX1), and with those produced by GPP sgRNA Designer when submitting gene ID 328 (also APEX1), indicating that both servers ignore the parameters "Target Cut Length" and "Target Cut %" when calculating the "On-Target Efficacy Score". This limitation is not obvious from the documentation.

Discrepancy in location of PAM [featurization.py]

http://www.nature.com/nbt/journal/v32/n12/fig_tab/nbt.3026_F3.html shows the PAM at positions 24-27 within the 30-mer (using zero-based Python indexing) and the length-20 targeting sequence at positions 4-24. Azimuth's pam_audit code uses this definition, since it requires a 'GG' at seq[25:27]. nucleotide_features_dictionary also uses this definition.

However, countGC assumes the targeting sequence is at positions 5-25: return len(s[5:25].replace('A', '').replace('T', '')). Tm_feature also assumes this: featarray[i,1] = Tm.Tm_staluc(seq[20:25], rna=rna) #5nts immediately proximal of the NGG PAM. These seem incorrect. Could you clarify in the documentation whether the targeting sequence is at positions 4-24 or 5-25?
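The off-by-one can be made concrete on a synthetic 30-mer, using the two conventions described above (the sequence itself is a made-up example):

```python
# Illustrate the two indexing conventions on a 30-mer (0-based, end-exclusive):
# 4 nt context + 20 nt guide + NGG PAM + 3 nt context, per the figure layout.
seq = "ACGT" + "A" * 20 + "CGG" + "TTT"  # hypothetical 30-mer, guide = 20 x 'A'
assert len(seq) == 30

guide_fig3 = seq[4:24]     # convention used by pam_audit / nucleotide_features_dictionary
guide_countgc = seq[5:25]  # convention assumed by countGC and Tm_feature
pam = seq[24:27]           # 'NGG'; pam_audit checks seq[25:27] == 'GG'

# The shifted slice loses the guide's first base and gains the PAM's 'N':
assert guide_fig3 != guide_countgc
assert pam.endswith("GG")
```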

Thanks for making this tool available to the community!

ImportError: No module named cross_validation

Hi Author,
I am new to Python, but I know the problem is related to the renaming and deprecation of the cross_validation sub-module to model_selection.
However, I don't know how to fix it.
Looking forward to your reply.
Yao
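A generic shim for the 0.18 rename looks like this (the helper is illustrative, not azimuth API). Note that even with such a shim, azimuth itself still imports the old names internally, so the practical fix remains pinning scikit-learn to the range azimuth declares:

```python
import importlib

def import_first(module_names):
    """Return the first importable module from module_names.

    Illustrative helper for the scikit-learn 0.18 rename:
    sklearn.cross_validation became sklearn.model_selection.
    """
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError("none of %r could be imported" % (module_names,))

# Usage against whichever scikit-learn is installed:
# cv = import_first(["sklearn.cross_validation", "sklearn.model_selection"])
```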

ValueError: 'R' is not in list

Hi,
I always get the following error when running Azimuth.

  File "/home/Azimuth-2.0/azimuth/model_comparison.py", line 559, in predict
    feature_sets = feat.featurize_data(Xdf, learn_options, pandas.DataFrame(), gene_position, pam_audit=pam_audit, length_audit=length_audit)
  File "/home/Azimuth-2.0/azimuth/features/featurization.py", line 31, in featurize_data
    get_all_order_nuc_features(data['30mer'], feature_sets, learn_options, learn_options["order"], max_index_to_use=30, quiet=quiet)
  File "/home/Azimuth-2.0/azimuth/features/featurization.py", line 153, in get_all_order_nuc_features
    include_pos_independent=True, max_index_to_use=max_index_to_use, prefix=prefix)
  File "/home/Azimuth-2.0/azimuth/features/featurization.py", line 423, in apply_nucleotide_features
    feat_pd = seq_data_frame.apply(nucleotide_features, args=(order, max_index_to_use, prefix, 'pos_dependent'))
  File "/home/lib/python2.7/site-packages/pandas/core/series.py", line 3194, in apply
    mapped = lib.map_infer(values, f, convert=convert_dtype)
  File "pandas/_libs/src/inference.pyx", line 1472, in pandas._libs.lib.map_infer
  File "/home/lib/python2.7/site-packages/pandas/core/series.py", line 3181, in <lambda>
    f = lambda x: func(x, *args, **kwds)
  File "/home/Azimuth-2.0/azimuth/features/featurization.py", line 468, in nucleotide_features
    features_pos_dependent[alphabet.index(nucl) + (position*len(alphabet))] = 1.0
ValueError: 'R' is not in list

Sometimes I also get the error ValueError: 'K' is not in list.
I have searched via Google but found no solution.
How can I solve this problem? Thanks.
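The traceback shows the one-hot featurizer indexing into a fixed nucleotide alphabet, so sequences containing IUPAC ambiguity codes such as R (A/G) or K (G/T) cannot be encoded. A pre-filter sketch (the helper name is hypothetical, not azimuth API):

```python
import re

VALID_30MER = re.compile(r"^[ACGT]{30}$")

def filter_unambiguous(seqs):
    """Split sequences into (kept, dropped), dropping any 30-mer that
    contains IUPAC ambiguity codes (R = A/G, K = G/T, ...), which the
    one-hot featurization cannot encode."""
    kept, dropped = [], []
    for s in seqs:
        (kept if VALID_30MER.match(s.upper()) else dropped).append(s)
    return kept, dropped

kept, dropped = filter_unambiguous(["A" * 30, "A" * 29 + "R"])
# kept holds the clean 30-mer; dropped holds the one containing 'R'
```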

Unable to re-train

Attempts to run model_comparison.py to re-train for scikit-learn>=0.17.1 fail with the traceback below. There is no upper bound on the scikit-learn version, so pip install azimuth installed scikit_learn==0.19.1. What is the solution (and why is xlrd not a required package)?

[~]# python /Library/Python/2.7/site-packages/azimuth/model_comparison.py
/Library/Python/2.7/site-packages/sklearn/cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)
/Library/Python/2.7/site-packages/sklearn/grid_search.py:42: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. This module will be removed in 0.20.
  DeprecationWarning)
Traceback (most recent call last):
  File "/Library/Python/2.7/site-packages/azimuth/model_comparison.py", line 609, in <module>
    save_final_model_V3(filename='saved_models/V3_model_nopos.pickle', include_position=False)
  File "/Library/Python/2.7/site-packages/azimuth/model_comparison.py", line 468, in save_final_model_V3
    'train_genes': azimuth.load_data.get_V3_genes(),
  File "/Library/Python/2.7/site-packages/azimuth/load_data.py", line 466, in get_V3_genes
    target_genes = np.concatenate((get_V1_genes(data_fileV1), get_V2_genes(data_fileV2)))
  File "/Library/Python/2.7/site-packages/azimuth/load_data.py", line 456, in get_V1_genes
    annotations, gene_position, target_genes, Xdf, Y = read_V1_data(data_file, learn_options=None)
  File "/Library/Python/2.7/site-packages/azimuth/load_data.py", line 132, in read_V1_data
    human_data = pandas.read_excel(data_file, sheetname=0, index_col=[0, 1])
  File "/Library/Python/2.7/site-packages/pandas/io/excel.py", line 203, in read_excel
    io = ExcelFile(io, engine=engine)
  File "/Library/Python/2.7/site-packages/pandas/io/excel.py", line 232, in __init__
    import xlrd # throw an ImportError if we need to
ImportError: No module named xlrd
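Installing the missing Excel reader resolves this particular traceback; xlrd is an optional pandas dependency that is only imported when read_excel is called, which is presumably why it is not a required package. A sketch:

```shell
# Install pandas' Excel engine, then retry the retraining step.
pip install xlrd
python /Library/Python/2.7/site-packages/azimuth/model_comparison.py
```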

add other PAMs

It looks like you hardcoded the NGG PAM. What about CRISPR systems with other PAMs, like Cpf1?

explain why 30nt are needed

Since the typical gRNA length is 20 nt, I wonder why 30 nt are needed and how those nucleotides should be defined (I could take the gRNA plus 10 nt before it, or the gRNA plus 20 nt after it).
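For reference, the layout implied by the figure cited in the PAM-discrepancy issue above is 4 nt of upstream context, the 20 nt guide, the 3 nt NGG PAM, and 3 nt of downstream context. A sketch of assembling such a 30-mer from a + strand sequence (the function and variable names are illustrative, not azimuth API):

```python
def build_30mer(genome, guide_start):
    """Assemble the 30-mer: 4 nt upstream context, the 20 nt guide,
    the 3 nt NGG PAM, and 3 nt downstream context.

    guide_start is the 0-based index of the guide's first base on the
    + strand of `genome` (a plain string).
    """
    start = guide_start - 4
    end = guide_start + 20 + 3 + 3
    if start < 0 or end > len(genome):
        raise ValueError("not enough flanking context for a 30-mer")
    mer = genome[start:end]
    assert len(mer) == 30
    return mer

# Example: guide of 20 'A's starting at index 10 of a synthetic sequence.
genome = "T" * 10 + "A" * 20 + "CGG" + "AAA" + "TTTT"
mer = build_30mer(genome, 10)
```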
