
dragonn's Introduction


The dragonn package implements Deep RegulAtory GenOmic Neural Networks (DragoNNs) for predictive modeling of regulatory genomics, nucleotide-resolution feature discovery, and simulations for systematic development and benchmarking.


Installation

To install the latest released version of DragoNN, first install the Anaconda Python distribution. Then run:

conda install dragonn -c kundajelab

DragoNN is compatible with Python 2 and Python 3. Some optional features, such as DeepLIFT and MOE, are compatible with Python 2 only.

15 seconds to your first DragoNN model

The dragonn package provides a simple command line interface to train DragoNN models, test them, and predict on sequence data. Train an example model by running:

dragonn train --pos-sequences examples/example_pos_sequences.fa --neg-sequences examples/example_neg_sequences.fa --prefix training_example

This will store an architecture file, training_example.arch.json, with the model architecture and a weights file, training_example.weights.h5, with the parameters of the trained model. Test the model by running:

dragonn test --pos-sequences examples/example_pos_sequences.fa --neg-sequences examples/example_neg_sequences.fa --arch-file training_example.arch.json --weights-file training_example.weights.h5

This will print the model's test performance metrics. Model predictions on sequence data can be obtained by running:

dragonn predict --sequences examples/example_pos_sequences.fa --arch-file training_example.arch.json --weights-file training_example.weights.h5 --output-file example_predictions.txt

This will store the model predictions for sequences in example_pos_sequences.fa in the output file example_predictions.txt. Interpret sequence data with a dragonn model by running:

dragonn interpret --sequences examples/example_pos_sequences.fa --arch-file training_example.arch.json --weights-file training_example.weights.h5 --prefix example_interpretation

This will write the most important subsequence in each input sequence, along with its location in the input sequence, to the file example_interpretation.task_0.important_sequences.txt. Note: by default, only examples with a predicted positive class probability >0.5 are interpreted. Examples at or below this threshold yield an important subsequence of Ns with location -1. This default can be changed with the flag --pos-threshold.
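The thresholding rule described above can be sketched in a few lines of Python. This is only an illustration of the documented behaviour, not dragonn's implementation: the function names `select_for_interpretation` and `placeholder_result` are hypothetical and do not exist in the package.

```python
# Illustrative sketch only; these helpers are hypothetical and not part of dragonn.

def select_for_interpretation(probabilities, pos_threshold=0.5):
    """Return indices of examples whose predicted positive-class
    probability is strictly greater than pos_threshold."""
    return [i for i, p in enumerate(probabilities) if p > pos_threshold]


def placeholder_result(subsequence_length):
    """Mimic the documented output for skipped examples:
    a subsequence made of Ns with location -1."""
    return "N" * subsequence_length, -1


probs = [0.91, 0.50, 0.12, 0.77]
selected = select_for_interpretation(probs)  # examples 0 and 3 pass the default threshold
skipped = placeholder_result(5)              # what a skipped example would report
```

Lowering `pos_threshold` in this sketch plays the same role as passing a smaller value to --pos-threshold on the command line.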

We encourage DragoNN users to share models in the Kipoi Genomics Model Zoo. Enjoy!

DragoNN paper supplement

We provide trained models, data, and code in the paper supplement to reproduce results in the DragoNN manuscript.

dragonn's People

Contributors

agitter, alexandari, annashcherbina, avantishri, chrisprobert, dependabot[bot], jisraeli, kiminsigne, omoindrot, wainberg, ziwei-75


dragonn's Issues

Roadmap

Here are some of the key features that we should add based on feedback from kundajelab members and people who have attended the dragonn workshops so far:

  • A dragonn interpret command that will interpret fasta data with a trained model using deeplift/ISM and return a homer-style output file, with the most important subsequence(s) for each sequence and corresponding location(s) in the sequence
  • Unit tests. We have some preliminary unit tests that check for reproducible performance on simulations. These are failing due to irreproducibility issues in the simulations that we haven't worked through yet. Let's temporarily replace them with tests that check that the main classes of models (single task, multi task, etc.) run to completion, and add them to the Travis script.
  • More models in the model zoo, especially models presented in the primer

If you have any suggestions, please share your thoughts. This is our opportunity to make a big contribution to deep learning for genomics!
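To make the first roadmap item concrete, here is a minimal sketch of in silico mutagenesis (ISM) that reports the most important subsequence and its location. It assumes a toy scoring function in place of a trained model; `toy_score`, `in_silico_mutagenesis`, and `most_important_window` are hypothetical names and say nothing about how a dragonn interpret command is or would be implemented.

```python
BASES = "ACGT"


def toy_score(seq):
    """Hypothetical stand-in for a trained model: counts 'GATA' occurrences."""
    return float(sum(1 for i in range(len(seq) - 3) if seq[i:i + 4] == "GATA"))


def in_silico_mutagenesis(seq, score_fn):
    """For each position, return the largest drop in score caused by
    substituting any of the three other bases at that position."""
    ref = score_fn(seq)
    importance = []
    for i, base in enumerate(seq):
        drops = [ref - score_fn(seq[:i] + alt + seq[i + 1:])
                 for alt in BASES if alt != base]
        importance.append(max(drops))
    return importance


def most_important_window(importance, width):
    """Return (start, summed importance) of the highest-scoring window."""
    starts = range(len(importance) - width + 1)
    best = max(starts, key=lambda s: sum(importance[s:s + width]))
    return best, sum(importance[best:best + width])


seq = "CCGATACC"
imp = in_silico_mutagenesis(seq, toy_score)
start, total = most_important_window(imp, 4)
important_subsequence = seq[start:start + 4]
```

With a real model, `score_fn` would wrap the model's prediction for one task, and the window width would be a user-chosen parameter.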

Minor grammar suggestions for installation of DragoNN AWS instance

Hi there,

On this page (http://kundajelab.github.io/dragonn/cloud_resources.html), the step "Alternatively, you can run the Dragonn jupyter notebook by executing the following commands: sudo su passwd ubuntu enter your desired password when prompted." suggests that "sudo su passwd ubuntu" is a one-line command. For whatever reason, this step only worked when I ran "sudo su" and "passwd ubuntu" as two separate commands. Maybe it was just my system, but future users might run into this as well.

Additionally, in the README of the image, the outlined steps fail to mention that the notebook to be launched requires an account "ubuntu" with an associated "passwd".

Errors in installing Dragonn:

Hi Johnny,

It looks like Dragonn is not so easy to install. We tried installing Dragonn following the steps on GitHub, but it failed in both Conda environments.

$ which python
~/anaconda2/bin/python
$ python -V
Python 2.7.13 :: Anaconda custom (64-bit)
$ conda install dragonn -c kundajelab     # it works OK, no error returns.
$ dragonn
Traceback (most recent call last):
  File "/home/tangb/anaconda2/bin/dragonn", line 6, in <module>
    from pkg_resources import load_entry_point
  File "/home/tangb/anaconda2/lib/python2.7/site-packages/pkg_resources/__init__.py", line 3138, in <module>
    @_call_aside
  File "/home/tangb/anaconda2/lib/python2.7/site-packages/pkg_resources/__init__.py", line 3122, in _call_aside
    f(*args, **kwargs)
  File "/home/tangb/anaconda2/lib/python2.7/site-packages/pkg_resources/__init__.py", line 3151, in _initialize_master_working_set
    working_set = WorkingSet._build_master()
  File "/home/tangb/anaconda2/lib/python2.7/site-packages/pkg_resources/__init__.py", line 666, in _build_master
    return cls._build_from_requirements(__requires__)
  File "/home/tangb/anaconda2/lib/python2.7/site-packages/pkg_resources/__init__.py", line 679, in _build_from_requirements
    dists = ws.resolve(reqs, Environment())
  File "/home/tangb/anaconda2/lib/python2.7/site-packages/pkg_resources/__init__.py", line 867, in resolve
    raise DistributionNotFound(req, requirers)
pkg_resources.DistributionNotFound: The 'pydot_ng==1.0.0' distribution was not found and is required by dragonn

The problem keeps occurring. Then I tried installing it with Anaconda 3 on another machine:

$ python -V
Python 3.6.3 :: Anaconda custom (64-bit)
$ conda install dragonn -c kundajelab
Fetching package metadata ...............
Solving package specifications:
PackageNotFoundError: Packages missing in current channels:
  - dragonn -> keras ==0.3.2

We have searched for the packages in the following channels:
  - https://conda.anaconda.org/kundajelab/win-64
  - https://conda.anaconda.org/kundajelab/noarch
  - https://repo.continuum.io/pkgs/main/win-64
  - https://repo.continuum.io/pkgs/main/noarch
  - https://repo.continuum.io/pkgs/free/win-64
  - https://repo.continuum.io/pkgs/free/noarch
  - https://repo.continuum.io/pkgs/r/win-64
  - https://repo.continuum.io/pkgs/r/noarch
  - https://repo.continuum.io/pkgs/pro/win-64
  - https://repo.continuum.io/pkgs/pro/noarch
  - https://repo.continuum.io/pkgs/msys2/win-64
  - https://repo.continuum.io/pkgs/msys2/noarch

$ conda update keras
Fetching package metadata .............
Solving package specifications: .
All requested packages already installed.
packages in environment at C:\ProgramData\Anaconda3:
keras                     2.0.8            py36h65e7a35_0

But the subsequent installation still failed. Could you please help check the errors? Thanks a lot for your help.

Trouble executing DragoNN

Thanks for making this package available! However, I am having some trouble running it on an academic computing cluster where I have an account.

First, I tried to install it through python setup.py install. That appeared to work, but then I ran into the error below when trying to execute:

b-an01 [~/dragonn]$ dragonn
Traceback (most recent call last):
  File "/pfs/nobackup/home/m/mikaelhu/anaconda2/bin/dragonn", line 11, in <module>
    load_entry_point('dragonn==0.1.3', 'console_scripts', 'dragonn')()
  File "/pfs/nobackup/home/m/mikaelhu/anaconda2/lib/python2.7/site-packages/setuptools-27.2.0-py2.7.egg/pkg_resources/__init__.py", line 565, in load_entry_point
  File "/pfs/nobackup/home/m/mikaelhu/anaconda2/lib/python2.7/site-packages/setuptools-27.2.0-py2.7.egg/pkg_resources/__init__.py", line 2598, in load_entry_point
  File "/pfs/nobackup/home/m/mikaelhu/anaconda2/lib/python2.7/site-packages/setuptools-27.2.0-py2.7.egg/pkg_resources/__init__.py", line 2258, in load
  File "/pfs/nobackup/home/m/mikaelhu/anaconda2/lib/python2.7/site-packages/setuptools-27.2.0-py2.7.egg/pkg_resources/__init__.py", line 2264, in resolve
  File "build/bdist.linux-x86_64/egg/dragonn/__main__.py", line 5, in <module>
  File "build/bdist.linux-x86_64/egg/dragonn/utils.py", line 6, in <module>
  File "build/bdist.linux-x86_64/egg/simdna/__init__.py", line 3, in <module>
  File "/pfs/nobackup/home/m/mikaelhu/anaconda2/lib/python2.7/site-packages/setuptools-27.2.0-py2.7.egg/pkg_resources/__init__.py", line 1203, in resource_filename
  File "/pfs/nobackup/home/m/mikaelhu/anaconda2/lib/python2.7/site-packages/setuptools-27.2.0-py2.7.egg/pkg_resources/__init__.py", line 1716, in get_resource_filename
  File "/pfs/nobackup/home/m/mikaelhu/anaconda2/lib/python2.7/site-packages/setuptools-27.2.0-py2.7.egg/pkg_resources/__init__.py", line 1737, in _extract_resource
KeyError: 'simdna/resources/HOCOMOCOv10_HUMAN_mono_homer_format_0.001.motif.gz'

On the other hand, using the Anaconda installer by conda install dragonn -c kundajelab, I run into this when executing:

(conda_env_dragonn) b-an01 [~/dragonn]$ dragonn
Traceback (most recent call last):
  File "/pfs/nobackup/home/m/mikaelhu/anaconda2/envs/conda_env_dragonn/bin/dragonn", line 6, in <module>
    from pkg_resources import load_entry_point
  File "/pfs/nobackup/home/m/mikaelhu/anaconda2/envs/conda_env_dragonn/lib/python2.7/site-packages/setuptools-27.2.0-py2.7.egg/pkg_resources/__init__.py", line 2985, in <module>
  File "/pfs/nobackup/home/m/mikaelhu/anaconda2/envs/conda_env_dragonn/lib/python2.7/site-packages/setuptools-27.2.0-py2.7.egg/pkg_resources/__init__.py", line 2971, in _call_aside
  File "/pfs/nobackup/home/m/mikaelhu/anaconda2/envs/conda_env_dragonn/lib/python2.7/site-packages/setuptools-27.2.0-py2.7.egg/pkg_resources/__init__.py", line 2998, in _initialize_master_working_set
  File "/pfs/nobackup/home/m/mikaelhu/anaconda2/envs/conda_env_dragonn/lib/python2.7/site-packages/setuptools-27.2.0-py2.7.egg/pkg_resources/__init__.py", line 662, in _build_master
  File "/pfs/nobackup/home/m/mikaelhu/anaconda2/envs/conda_env_dragonn/lib/python2.7/site-packages/setuptools-27.2.0-py2.7.egg/pkg_resources/__init__.py", line 675, in _build_from_requirements
  File "/pfs/nobackup/home/m/mikaelhu/anaconda2/envs/conda_env_dragonn/lib/python2.7/site-packages/setuptools-27.2.0-py2.7.egg/pkg_resources/__init__.py", line 854, in resolve
pkg_resources.DistributionNotFound: The 'pydot_ng==1.0.0' distribution was not found and is required by dragonn

I used a Python 2.7 virtual environment because it seems DragoNN has extra functionalities with 2.7 as opposed to 3.x.
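A quick way to confirm which pinned requirement is missing is to query the installed distributions directly. The sketch below uses `importlib.metadata` from the standard library, which requires Python 3.8 or later, so it would not run in the Python 2.7 environments shown above (there, `pkg_resources` plays this role). The helper names are hypothetical, and the pin is copied from the traceback rather than from dragonn's setup files.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def missing_requirements(requirements):
    """requirements maps distribution name -> exact version (or None for any).
    Returns a list of human-readable problems."""
    problems = []
    for name, wanted in requirements.items():
        found = installed_version(name)
        if found is None:
            problems.append(f"{name}: not installed")
        elif wanted is not None and found != wanted:
            problems.append(f"{name}: found {found}, expected {wanted}")
    return problems


# Pin taken from the error message above.
report = missing_requirements({"pydot_ng": "1.0.0"})
```

An empty `report` would mean the pinned requirement is satisfied in the current environment.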

typo in hyperparameter_search.py

I believe there is a typo in the __init__ function for class HyperparameterSearcher. The default metric='auPRG' should be metric='auPRC'. I was running the simple_motif_detection.py script and received this key error:

Traceback (most recent call last):
  File "simple_motif_finding.py", line 89, in <module>
    searcher.search(num_hyperparameter_trials)
  File "build/bdist.macosx-10.9-x86_64/egg/dragonn/hyperparameter_search.py", line 117, in search
  File "build/bdist.macosx-10.9-x86_64/egg/dragonn/models.py", line 36, in score
  File "build/bdist.macosx-10.9-x86_64/egg/dragonn/metrics.py", line 73, in __getitem__
KeyError: 'auPRG'
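For readers unfamiliar with the metric named in this issue, auPRC is the area under the precision-recall curve. The sketch below computes it with the step-wise average precision estimate in plain Python and stores it under the key 'auPRC'. It is an independent illustration, not the code in dragonn's metrics.py, and it ignores tied scores for simplicity.

```python
def average_precision(labels, scores):
    """Step-wise estimate of the area under the precision-recall curve.
    labels are 0/1, scores are predicted probabilities; ties are not handled."""
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    ap = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            ap += hits / rank  # precision at each recalled positive
    return ap / total_pos


labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
results = {"auPRC": average_precision(labels, scores)}

# Looking up a key that was never stored raises KeyError, as in the traceback.
has_auprg = "auPRG" in results
```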

workshop_tutorial.ipynb throwing ValueError

Went through the workshop_tutorial.ipynb without changing anything, but am getting a ValueError.

This is the line initiating the error:

one_filter_dragonn = get_SequenceDNN(one_filter_dragonn_parameters)


ValueError Traceback (most recent call last)
in ()
----> 1 one_filter_dragonn = get_SequenceDNN(one_filter_dragonn_parameters)

/Users/jeffrey/anaconda2/lib/python2.7/site-packages/dragonn/tutorial_utils.pyc in get_SequenceDNN(SequenceDNN_parameters)
80
81 def get_SequenceDNN(SequenceDNN_parameters):
---> 82 return SequenceDNN(**SequenceDNN_parameters)
83
84

/Users/jeffrey/anaconda2/lib/python2.7/site-packages/dragonn/models.pyc in __init__(self, seq_length, use_deep_CNN, use_RNN, num_tasks, num_filters, conv_width, num_filters_2, conv_width_2, num_filters_3, conv_width_3, pool_width, L1, dropout, GRU_size, TDD_size, verbose)
129 nb_filter=num_filters, nb_row=4,
130 nb_col=conv_width, activation='linear',
--> 131 init='he_normal', input_shape=self.input_shape))
132 self.model.add(Activation('relu'))
133 self.model.add(Dropout(dropout))

/Users/jeffrey/anaconda2/lib/python2.7/site-packages/Keras-0.3.2-py2.7.egg/keras/layers/convolutional.py in __init__(self, nb_filter, nb_row, nb_col, init, activation, weights, border_mode, subsample, dim_ordering, W_regularizer, b_regularizer, activity_regularizer, W_constraint, b_constraint, **kwargs)
253 self.initial_weights = weights
254 self.input = K.placeholder(ndim=4)
--> 255 super(Convolution2D, self).__init__(**kwargs)
256
257 def build(self):

/Users/jeffrey/anaconda2/lib/python2.7/site-packages/Keras-0.3.2-py2.7.egg/keras/layers/core.py in __init__(self, **kwargs)
49 self.set_input_shape(tuple(kwargs['batch_input_shape']))
50 elif 'input_shape' in kwargs:
---> 51 self.set_input_shape((None,) + tuple(kwargs['input_shape']))
52 self.trainable = True
53 if 'trainable' in kwargs:

/Users/jeffrey/anaconda2/lib/python2.7/site-packages/Keras-0.3.2-py2.7.egg/keras/layers/core.py in set_input_shape(self, input_shape)
155 self._input_shape = input_shape
156 self.input = K.placeholder(shape=self._input_shape)
--> 157 self.build()
158
159 @property

/Users/jeffrey/anaconda2/lib/python2.7/site-packages/Keras-0.3.2-py2.7.egg/keras/layers/convolutional.py in build(self)
264 else:
265 raise Exception('Invalid dim_ordering: ' + self.dim_ordering)
--> 266 self.W = self.init(self.W_shape)
267 self.b = K.zeros((self.nb_filter,))
268 self.trainable_weights = [self.W, self.b]

/Users/jeffrey/anaconda2/lib/python2.7/site-packages/Keras-0.3.2-py2.7.egg/keras/initializations.py in he_normal(shape, name)
46 ''' Reference: He et al., http://arxiv.org/abs/1502.01852
47 '''
---> 48 fan_in, fan_out = get_fans(shape)
49 s = np.sqrt(2. / fan_in)
50 return normal(shape, s, name=name)

/Users/jeffrey/anaconda2/lib/python2.7/site-packages/Keras-0.3.2-py2.7.egg/keras/initializations.py in get_fans(shape)
5
6 def get_fans(shape):
----> 7 fan_in = shape[0] if len(shape) == 2 else np.prod(shape[1:])
8 fan_out = shape[1] if len(shape) == 2 else shape[0]
9 return fan_in, fan_out

/Users/jeffrey/anaconda2/lib/python2.7/site-packages/numpy/core/fromnumeric.pyc in prod(a, axis, dtype, out, keepdims)
2564
2565 return _methods._prod(a, axis=axis, dtype=dtype,
-> 2566 out=out, **kwargs)
2567
2568

/Users/jeffrey/anaconda2/lib/python2.7/site-packages/numpy/core/_methods.pyc in _prod(a, axis, dtype, out, keepdims)
33
34 def _prod(a, axis=None, dtype=None, out=None, keepdims=False):
---> 35 return umr_prod(a, axis, dtype, out, keepdims)
36
37 def _any(a, axis=None, dtype=None, out=None, keepdims=False):

ValueError: setting an array element with a sequence.

Any help would be appreciated.
Thanks 

DragoNN with multiple GPUs

Hello,

Great framework!
I was wondering if it's possible to make DragoNN work with multiple GPUs?
When I try to run the example script with the Theano flags like this:
THEANO_FLAGS='contexts=dev0->cuda0;dev1->cuda1,floatX=float32' python simple_motif_detection.py

I get the following error (repeated multiple times):

ERROR (theano.gof.opt): Optimization failure due to: LocalOptGroup(local_abstractconv_cudnn,local_abstractconv_gw_cudnn,local_abstractconv_gi_cudnn,local_abstractconv_gemm,local_abstractconv3d_gemm,local_abstractconv_gradweights_gemm,local_abstractconv3d_gradweights_gemm,local_abstractconv_gradinputs_gemm,local_abstractconv3d_gradinputs_gemm)
ERROR (theano.gof.opt): node: AbstractConv2d{convdim=2, border_mode='valid', subsample=(1, 1), filter_flip=True, imshp=(None, 1, 4, 500), kshp=(45, 1, 4, 10), filter_dilation=(1, 1)}(Assert{msg='AbstractConv shape mismatch: shape of image does not match given imshp.'}.0, Assert{msg='AbstractConv shape mismatch: shape of filters does not match given kshp.'}.0)
ERROR (theano.gof.opt): TRACEBACK:
ERROR (theano.gof.opt): Traceback (most recent call last):
  File "/isdata/nalcaraz/Programs/anaconda2/lib/python2.7/site-packages/Theano-0.9.0b1-py2.7.egg/theano/gof/opt.py", line 1964, in process_node
    replacements = lopt.transform(node)
  File "/isdata/nalcaraz/Programs/anaconda2/lib/python2.7/site-packages/Theano-0.9.0b1-py2.7.egg/theano/gof/opt.py", line 1316, in transform
    new_repl = opt.transform(node)
  File "/isdata/nalcaraz/Programs/anaconda2/lib/python2.7/site-packages/Theano-0.9.0b1-py2.7.egg/theano/gpuarray/dnn.py", line 2641, in local_abstractconv_cudnn
    ctx = infer_context_name(*node.inputs)
  File "/isdata/nalcaraz/Programs/anaconda2/lib/python2.7/site-packages/Theano-0.9.0b1-py2.7.egg/theano/gpuarray/basic_ops.py", line 122, in infer_context_name
    raise ValueError("Could not infer context from inputs")
ValueError: Could not infer context from inputs

It works fine when I use just one GPU.
I'm using dragonn version 0.1.3 and
Theano 0.9.0beta1.

Multiple Motif Detection in DragoNN Simulated Data

Hello,

While I was working on transcription factor binding sites and motif detection, I came across your DragoNN toolkit and GitHub profile. It is very informative and useful. I aim to develop a deep learning model for multiple motif recognition. For this, I intend to use your simulation data accessible via the following link: https://github.com/kundajelab/dragonn/blob/master/paper_supplement/simulation_data/GC_fraction0.4max_num_motifs3min_num_motifs0motif_names%5B'CTCF_known1'%2C%20'ZNF143_known2'%2C%20'SIX5_known1'%5Dnum_seqs20000seq_length500.npz

As far as I understand, the sequences in this dataset consist of 500 nucleotides each and the overall guanine-cytosine fraction of the sequences is approximately 0.4. However, I am confused about the number of motifs in the sequences. The maximum and minimum numbers of motifs are set to 3 and 0, respectively.

What exactly does this mean?

Can 3 instances of each motif exist in a positive sequence? In other words, does the maximum number of motifs apply to each motif separately or to the sum over all three motifs? In the first case, up to 3 instances of each motif can exist in a positive sequence. In the second case, only 1 instance of each motif can be accommodated in a positive sequence.
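One of the parameters in the file name, the GC fraction, is easy to check independently once sequences have been recovered as strings. The sketch below is a minimal, hypothetical helper; it assumes plain nucleotide strings and does not depend on how the .npz file stores its arrays, which is not described here.

```python
def gc_fraction(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of seq."""
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no A/C/G/T bases")
    return sum(base in "GC" for base in counted) / len(counted)


example = "GGCCAATTAT"          # toy sequence, not taken from the dataset
fraction = gc_fraction(example)  # 4 of 10 bases are G or C
```

Averaging this value over many simulated sequences would give an estimate to compare against the 0.4 in the file name.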

Citation for DragoNN

Hi,

Is there an official citation for DragoNN that I can use (e.g. bioRxiv)?

Thank you!

Missing Dependency for Example Script simple_motif_detection.py?

Hi,

  1. I think there is a discrepancy between this git repo and the zip/tar files available from http://kundajelab.github.io/. Most noticeably, the dragonn-0.1.0 zip file contains the example script run.py, which works fine for me, while the git repo contains the example script simple_motif_detection.py. Some other files also seem to be different.
  2. When I try to run "python simple_motif_detection.py", I get the following error message (running on OS X 10.11.6):
Traceback (most recent call last):
  File "simple_motif_detection.py", line 5, in <module>
    from dragonn.hyperparameter_search import RandomSearch
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/dragonn/hyperparameter_search.py", line 6, in <module>
    from moe.easy_interface.experiment import Experiment
ImportError: No module named moe.easy_interface.experiment

Is this a missing dependency? I tried installing the relevant module from http://yelp.github.io/MOE/install.html, but the error persisted.

  3. I'm putting this under the same issue as I assume it may be a related error, but when I run "dragonn train --pos-sequences examples/example_pos_sequences.fa --neg-sequences examples/example_neg_sequences.fa --prefix training_example" as indicated under "15 seconds to your first DragoNN model", I'm receiving the following error message:
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/2.7/bin/dragonn", line 9, in <module>
    load_entry_point('dragonn==0.1.0', 'console_scripts', 'dragonn')()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pkg_resources/__init__.py", line 542, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2569, in load_entry_point
    raise ImportError("Entry point %r not found" % ((group, name),))
ImportError: Entry point ('console_scripts', 'dragonn') not found

Despite these issues, what I have gotten to work so far has been pretty awesome! This truly is deep learning for dummies. Thanks!
-Rajiv
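The ImportError for moe in item 2 is the kind of failure that an optional-import guard turns into a clearer message. Since the README lists MOE as an optional feature, a pattern like the following is one possible approach; it is a generic sketch, `optional_import` and `require` are hypothetical helpers, and nothing here reflects how dragonn actually imports MOE.

```python
import importlib


def optional_import(module_name):
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


def require(module, feature):
    """Raise a descriptive error when an optional dependency is missing."""
    if module is None:
        raise RuntimeError(f"{feature} needs an optional dependency that is not installed")
    return module


json_mod = optional_import("json")                           # standard library, always present
missing = optional_import("no_such_module_for_illustration")  # deliberately absent
```

Code that needs the optional feature would call `require(...)` at the point of use, so importing the rest of the package does not fail.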

conda install not working for python 3.6 / tutorial not working for python 2.7(dragonn0.1.1)

I am writing my bachelor thesis on bioinformatics, looking into deep learning and neural networks. To gain a better grasp of what I'm writing about, I wanted to do the tutorials. But it seems I have found myself in dependency hell.

When I first tried to run the tutorial from a hosted runtime in Colab, the pip install of the dragonn package wouldn't work. I suspect this is because the hosted Colab runtime is running too new a Python version.

No worries, I thought, I'll create a local runtime to connect to; I've done that before. Of course this meant I needed to install the package locally, and thus install Anaconda. After faffing about a bit with the Anaconda version for Windows, I concluded that the conda packages needed were not available or compatible for Windows. Alright, I'll just set up Windows Subsystem for Linux (WSL) and do things from there. It had been some time since I last worked on the Linux command line, but after some more faffing about I managed to install dragonn 0.1.1 for Python 2.7. Good, I thought, now I just run a Jupyter session, connect to that runtime from Colab and we're cooking. It turns out the first function I try to import isn't in the simulations.py script of my dragonn install. I now suspect the tutorial was written for a later dragonn version supported by a Python 3.* version.

However, every time I tried installing dragonn in a conda environment running, for example, Python 3.6, I ran into dependency issues I am unable to resolve. More specifically, I get the messages listed below:

Could not solve for environment specs
The following packages are incompatible
├─ dragonn is installable with the potential options
│  ├─ dragonn [0.1.0|0.1.1] would require
│  │  └─ keras 0.3.2  with the potential options
│  │     ├─ keras 0.3.2 would require
│  │     │  └─ python 2.7* , which can be installed;
│  │     └─ keras 0.3.2 would require
│  │        └─ python 3.4* , which does not exist (perhaps a missing channel);
│  └─ dragonn 0.1.2 would require
│     └─ pydot-ng, which does not exist (perhaps a missing channel);
└─ pin-1 is not installable because there are no viable options
   ├─ pin-1 1 would require
   │  └─ python 3.6.* , which conflicts with any installable versions previously reported;
   └─ pin-1 1 would require
      └─ python 3.6.* , which conflicts with any installable versions previously reported.

Pins seem to be involved in the conflict. Currently pinned specs:
 - python 3.6.* (labeled as 'pin-1')

I suspect this is because conda cannot get hold of pydot-ng. I tried installing pydot-ng using pip, which worked; however, it does not change the issue conda is running into.

At this point I am out of ideas. Can anyone help me in getting this working so I can do the tutorial?

Problems installing bleeding edge of DragoNN

Hello,

I tried to install the bleeding edge of DragoNN following standard procedures (I already did conda install -c kundajelab dragonn):

python setup.py install

I get the following error message, any ideas? I am running OSX 10.12 Sierra (upgraded recently). Curiously, I see that it makes a bdist.macosx-10.9-x86_64 directory, but I don't know if that is relevant:

running install
running bdist_egg
running egg_info
creating dragonn.egg-info
writing requirements to dragonn.egg-info/requires.txt
writing dragonn.egg-info/PKG-INFO
writing top-level names to dragonn.egg-info/top_level.txt
writing dependency_links to dragonn.egg-info/dependency_links.txt
writing entry points to dragonn.egg-info/entry_points.txt
writing manifest file 'dragonn.egg-info/SOURCES.txt'
reading manifest file 'dragonn.egg-info/SOURCES.txt'
writing manifest file 'dragonn.egg-info/SOURCES.txt'
installing library code to build/bdist.macosx-10.9-x86_64/egg
running install_lib
running build_py
creating build
creating build/lib
creating build/lib/dragonn
copying dragonn/__init__.py -> build/lib/dragonn
copying dragonn/__main__.py -> build/lib/dragonn
copying dragonn/hyperparameter_search.py -> build/lib/dragonn
copying dragonn/metrics.py -> build/lib/dragonn
copying dragonn/models.py -> build/lib/dragonn
copying dragonn/plot.py -> build/lib/dragonn
copying dragonn/tutorial_utils.py -> build/lib/dragonn
copying dragonn/utils.py -> build/lib/dragonn
copying dragonn/visualize_util.py -> build/lib/dragonn
creating build/bdist.macosx-10.9-x86_64
creating build/bdist.macosx-10.9-x86_64/egg
creating build/bdist.macosx-10.9-x86_64/egg/dragonn
copying build/lib/dragonn/__init__.py -> build/bdist.macosx-10.9-x86_64/egg/dragonn
copying build/lib/dragonn/__main__.py -> build/bdist.macosx-10.9-x86_64/egg/dragonn
copying build/lib/dragonn/hyperparameter_search.py -> build/bdist.macosx-10.9-x86_64/egg/dragonn
copying build/lib/dragonn/metrics.py -> build/bdist.macosx-10.9-x86_64/egg/dragonn
copying build/lib/dragonn/models.py -> build/bdist.macosx-10.9-x86_64/egg/dragonn
copying build/lib/dragonn/plot.py -> build/bdist.macosx-10.9-x86_64/egg/dragonn
copying build/lib/dragonn/tutorial_utils.py -> build/bdist.macosx-10.9-x86_64/egg/dragonn
copying build/lib/dragonn/utils.py -> build/bdist.macosx-10.9-x86_64/egg/dragonn
copying build/lib/dragonn/visualize_util.py -> build/bdist.macosx-10.9-x86_64/egg/dragonn
byte-compiling build/bdist.macosx-10.9-x86_64/egg/dragonn/__init__.py to __init__.pyc
byte-compiling build/bdist.macosx-10.9-x86_64/egg/dragonn/__main__.py to __main__.pyc
byte-compiling build/bdist.macosx-10.9-x86_64/egg/dragonn/hyperparameter_search.py to hyperparameter_search.pyc
byte-compiling build/bdist.macosx-10.9-x86_64/egg/dragonn/metrics.py to metrics.pyc
byte-compiling build/bdist.macosx-10.9-x86_64/egg/dragonn/models.py to models.pyc
byte-compiling build/bdist.macosx-10.9-x86_64/egg/dragonn/plot.py to plot.pyc
byte-compiling build/bdist.macosx-10.9-x86_64/egg/dragonn/tutorial_utils.py to tutorial_utils.pyc
byte-compiling build/bdist.macosx-10.9-x86_64/egg/dragonn/utils.py to utils.pyc
byte-compiling build/bdist.macosx-10.9-x86_64/egg/dragonn/visualize_util.py to visualize_util.pyc
creating build/bdist.macosx-10.9-x86_64/egg/EGG-INFO
copying dragonn.egg-info/PKG-INFO -> build/bdist.macosx-10.9-x86_64/egg/EGG-INFO
copying dragonn.egg-info/SOURCES.txt -> build/bdist.macosx-10.9-x86_64/egg/EGG-INFO
copying dragonn.egg-info/dependency_links.txt -> build/bdist.macosx-10.9-x86_64/egg/EGG-INFO
copying dragonn.egg-info/entry_points.txt -> build/bdist.macosx-10.9-x86_64/egg/EGG-INFO
copying dragonn.egg-info/requires.txt -> build/bdist.macosx-10.9-x86_64/egg/EGG-INFO
copying dragonn.egg-info/top_level.txt -> build/bdist.macosx-10.9-x86_64/egg/EGG-INFO
zip_safe flag not set; analyzing archive contents...
creating dist
creating 'dist/dragonn-0.1.3-py2.7.egg' and adding 'build/bdist.macosx-10.9-x86_64/egg' to it
removing 'build/bdist.macosx-10.9-x86_64/egg' (and everything under it)
Processing dragonn-0.1.3-py2.7.egg
Copying dragonn-0.1.3-py2.7.egg to /Users/yeung/anaconda2/lib/python2.7/site-packages
Adding dragonn 0.1.3 to easy-install.pth file
Installing dragonn script to /Users/yeung/anaconda2/bin

Installed /Users/yeung/anaconda2/lib/python2.7/site-packages/dragonn-0.1.3-py2.7.egg
Processing dependencies for dragonn==0.1.3
Searching for pydot_ng==1.0.0
Reading https://pypi.python.org/simple/pydot_ng/
Downloading https://pypi.python.org/packages/de/64/86b0502c3644190c0b9fed0e378ee18f31b1f0262bdead1eb9ac1d404529/pydot_ng-1.0.0.tar.gz#md5=1b0b7028609fead4199a3f99d8527b3b
Best match: pydot-ng 1.0.0
Processing pydot_ng-1.0.0.tar.gz
Writing /var/folders/_9/st14xsvd2kjfz65tyrcttylh003411/T/easy_install-rThOJC/pydot_ng-1.0.0/setup.cfg
Running pydot_ng-1.0.0/setup.py -q bdist_egg --dist-dir /var/folders/_9/st14xsvd2kjfz65tyrcttylh003411/T/easy_install-rThOJC/pydot_ng-1.0.0/egg-dist-tmp-32HcqW
zip_safe flag not set; analyzing archive contents...
Copying pydot_ng-1.0.0-py2.7.egg to /Users/yeung/anaconda2/lib/python2.7/site-packages
Adding pydot-ng 1.0.0 to easy-install.pth file

Installed /Users/yeung/anaconda2/lib/python2.7/site-packages/pydot_ng-1.0.0-py2.7.egg
Searching for matplotlib<=1.5.3
Reading https://pypi.python.org/simple/matplotlib/
Downloading https://pypi.python.org/packages/75/4e/2374eed18ac34421ccd7b4907080abd3009e112ca2c11b100c18961312e0/matplotlib-1.5.3.tar.gz#md5=ba993b06113040fee6628d74b80af0fd
Best match: matplotlib 1.5.3
Processing matplotlib-1.5.3.tar.gz
Writing /var/folders/_9/st14xsvd2kjfz65tyrcttylh003411/T/easy_install-Xp3ytW/matplotlib-1.5.3/setup.cfg
Running matplotlib-1.5.3/setup.py -q bdist_egg --dist-dir /var/folders/_9/st14xsvd2kjfz65tyrcttylh003411/T/easy_install-Xp3ytW/matplotlib-1.5.3/egg-dist-tmp-mwNxBE
============================================================================
Edit setup.cfg to change the build options

BUILDING MATPLOTLIB
            matplotlib: yes [1.5.3]
                python: yes [2.7.13 |Anaconda, Inc.| (default, Sep 21 2017,
                        17:38:20)  [GCC 4.2.1 Compatible Clang 4.0.1
                        (tags/RELEASE_401/final)]]
              platform: yes [darwin]

REQUIRED DEPENDENCIES AND EXTENSIONS
                 numpy: yes [version 1.13.1]
              dateutil: yes [using dateutil version 2.6.1]
                  pytz: yes [using pytz version 2017.2]
                cycler: yes [using cycler version 0.10.0]
               tornado: yes [using tornado version 4.5.2]
             pyparsing: yes [using pyparsing version 2.2.0]
                libagg: yes [pkg-config information for 'libagg' could not
                        be found. Using local copy.]
              freetype: yes [version 2.8.0]
                   png: yes [version 1.6.32]
                 qhull: yes [pkg-config information for 'qhull' could not be
                        found. Using local copy.]

OPTIONAL SUBPACKAGES
           sample_data: yes [installing]
              toolkits: yes [installing]
                 tests: yes [using nose version 1.3.7 / mock is required to
                        run the matplotlib test suite. Please install it
                        with pip or your preferred tool to run the test
                        suite]
        toolkits_tests: yes [using nose version 1.3.7 / mock is required to
                        run the matplotlib test suite. Please install it
                        with pip or your preferred tool to run the test
                        suite]

OPTIONAL BACKEND EXTENSIONS
                macosx: yes [installing, darwin]
                qt5agg: yes [installing, Qt: 5.6.2, PyQt: 5.6.2]
                qt4agg: no  [PySide not found; PyQt4 not found]
               gtk3agg: no  [Requires pygobject to be installed.]
             gtk3cairo: no  [Requires cairocffi or pycairo to be installed.]
                gtkagg: no  [Requires pygtk]
                 tkagg: yes [installing; run-time loading from Python Tcl /
                        Tk]
                 wxagg: no  [requires wxPython]
                   gtk: no  [Requires pygtk]
                   agg: yes [installing]
                 cairo: no  [cairocffi or pycairo not found]
             windowing: no  [Microsoft Windows only]

OPTIONAL LATEX DEPENDENCIES
                dvipng: no
           ghostscript: yes [version 9.10]
                 latex: no
               pdftops: no

OPTIONAL PACKAGE DATA
                  dlls: no  [skipping due to configuration]

UPDATING build/lib.macosx-10.9-x86_64-2.7/matplotlib/_version.py
set build/lib.macosx-10.9-x86_64-2.7/matplotlib/_version.py to '1.5.3'
clang: error: unknown argument: '-fstack-protector-strong'
error: Setup script exited with error: command '/usr/bin/clang' failed with exit status 1

Deep network for retroelement insertion motifs

Retroelement insertion has been a major contributor to genomic evolution across mammalian species, and identifying sequence patterns targeted for retroelement insertion is a tractable problem. I've visualized these non-linear patterns by dot-plot and made some pictures (e.g. http://www.cell.com/pictureshow/bestof2013), but machine learning is what will solve the problem and open major new roads in genomics, and it requires a team effort.

For example, the 300 n.t. AluY is unique to the human genome where it is inserted 8 million times in intronic A/T rich areas.

If anyone with expertise in convolutional design feels the same way about it, I would be delighted to curate data in silico and assist in deep network development.

Avi Friedlich MD

image

workshop_tutorial get_SequenceDNN() ValueError

I have been walking through the workshop on my own machine. I have changed nothing and am simply running each cell successively.

process:
conda install -c kundajelab dragonn
cd dragonn
python setup.py install
cd examples
ipython notebook ...
{run each cell}
imports, gpu registration etc all work fine

code cell 9: one_filter_dragonn = get_SequenceDNN(one_filter_dragonn_parameters)
Fails: ValueError

If I have time this weekend, I'll try to debug this, but I thought someone else might want to look too

Trace:

ValueError Traceback (most recent call last)
in ()
----> 1 one_filter_dragonn = get_SequenceDNN(one_filter_dragonn_parameters)

/home/t-benorg/anaconda/lib/python2.7/site-packages/dragonn/tutorial_utils.pyc in get_SequenceDNN(SequenceDNN_parameters)
80
81 def get_SequenceDNN(SequenceDNN_parameters):
---> 82 return SequenceDNN(**SequenceDNN_parameters)
83
84

/home/t-benorg/anaconda/lib/python2.7/site-packages/dragonn/models.pyc in __init__(self, seq_length, use_deep_CNN, use_RNN, num_tasks, num_filters, conv_width, num_filters_2, conv_width_2, num_filters_3, conv_width_3, pool_width, L1, dropout, GRU_size, TDD_size, verbose)
129 nb_filter=num_filters, nb_row=4,
130 nb_col=conv_width, activation='linear',
--> 131 init='he_normal', input_shape=self.input_shape))
132 self.model.add(Activation('relu'))
133 self.model.add(Dropout(dropout))

/home/t-benorg/anaconda/lib/python2.7/site-packages/Keras-0.3.2-py2.7.egg/keras/layers/convolutional.pyc in __init__(self, nb_filter, nb_row, nb_col, init, activation, weights, border_mode, subsample, dim_ordering, W_regularizer, b_regularizer, activity_regularizer, W_constraint, b_constraint, **kwargs)
253 self.initial_weights = weights
254 self.input = K.placeholder(ndim=4)
--> 255 super(Convolution2D, self).__init__(**kwargs)
256
257 def build(self):

/home/t-benorg/anaconda/lib/python2.7/site-packages/Keras-0.3.2-py2.7.egg/keras/layers/core.pyc in __init__(self, **kwargs)
49 self.set_input_shape(tuple(kwargs['batch_input_shape']))
50 elif 'input_shape' in kwargs:
---> 51 self.set_input_shape((None,) + tuple(kwargs['input_shape']))
52 self.trainable = True
53 if 'trainable' in kwargs:

/home/t-benorg/anaconda/lib/python2.7/site-packages/Keras-0.3.2-py2.7.egg/keras/layers/core.pyc in set_input_shape(self, input_shape)
155 self._input_shape = input_shape
156 self.input = K.placeholder(shape=self._input_shape)
--> 157 self.build()
158
159 @property

/home/t-benorg/anaconda/lib/python2.7/site-packages/Keras-0.3.2-py2.7.egg/keras/layers/convolutional.pyc in build(self)
264 else:
265 raise Exception('Invalid dim_ordering: ' + self.dim_ordering)
--> 266 self.W = self.init(self.W_shape)
267 self.b = K.zeros((self.nb_filter,))
268 self.trainable_weights = [self.W, self.b]

/home/t-benorg/anaconda/lib/python2.7/site-packages/Keras-0.3.2-py2.7.egg/keras/initializations.pyc in he_normal(shape, name)
46 ''' Reference: He et al., http://arxiv.org/abs/1502.01852
47 '''
---> 48 fan_in, fan_out = get_fans(shape)
49 s = np.sqrt(2. / fan_in)
50 return normal(shape, s, name=name)

/home/t-benorg/anaconda/lib/python2.7/site-packages/Keras-0.3.2-py2.7.egg/keras/initializations.pyc in get_fans(shape)
5
6 def get_fans(shape):
----> 7 fan_in = shape[0] if len(shape) == 2 else np.prod(shape[1:])
8 fan_out = shape[1] if len(shape) == 2 else shape[0]
9 return fan_in, fan_out

/home/t-benorg/anaconda/lib/python2.7/site-packages/numpy/core/fromnumeric.pyc in prod(a, axis, dtype, out, keepdims)
2490 except AttributeError:
2491 return _methods._prod(a, axis=axis, dtype=dtype,
-> 2492 out=out, keepdims=keepdims)
2493 return prod(axis=axis, dtype=dtype, out=out)
2494 else:

/home/t-benorg/anaconda/lib/python2.7/site-packages/numpy/core/_methods.pyc in _prod(a, axis, dtype, out, keepdims)
33
34 def _prod(a, axis=None, dtype=None, out=None, keepdims=False):
---> 35 return umr_prod(a, axis, dtype, out, keepdims)
36
37 def _any(a, axis=None, dtype=None, out=None, keepdims=False):

ValueError: setting an array element with a sequence.

Simulation guidance

I've been using DragoNN to create homework problems for my bioinformatics class, and it has been really excellent overall. There were two aspects of simulating data that confused me at first and could perhaps be modified to assist newcomers:

  1. I wasn't sure what valid motif names I could provide to the simulation functions or which would lead to interesting training datasets. I ended up enumerating simdna.simulations.loaded_motifs and generating plots for subsets of them with dragonn.plot.plot_motif. Do you have any alternative suggestions?
  2. Unlike the other simulation functions, simulate_multi_motif_embedding only returns positive instances. I suggest expanding the num_seqs parameter into num_pos and num_neg to be consistent with the other simulations. Per the tutorial figure I used a simple_motif_embedding for the negative instances with 'motif_name'=None, but I wasn't sure if that is what you'd recommend. I could make a pull request to implement this, but I'm not sure how exactly you'd want to change the function because simulate_heterodimer_grammar and simulate_differential_accessibility currently use the positives-only version of simple_motif_embedding.
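The workaround described in item 2, pairing a positives-only simulation with separately generated negatives, can be sketched as follows. This is a minimal illustration in plain NumPy, not part of simdna or DragoNN: `combine_pos_neg` is a hypothetical helper, the short strings stand in for simulator output, and the one-column boolean label layout is an assumption.

```python
import numpy as np

def combine_pos_neg(pos_sequences, neg_sequences, seed=0):
    """Merge positive-only and negative-only sequence lists into shuffled
    (sequences, labels) arrays for binary classification.
    Hypothetical helper, not a simdna/DragoNN function."""
    sequences = np.array(list(pos_sequences) + list(neg_sequences))
    labels = np.concatenate([
        np.ones(len(pos_sequences), dtype=bool),   # positives -> True
        np.zeros(len(neg_sequences), dtype=bool),  # negatives -> False
    ])
    # Shuffle sequences and labels with the same permutation.
    order = np.random.RandomState(seed).permutation(len(sequences))
    return sequences[order], labels[order][:, None]  # labels shape: (n, 1)

# Stand-ins for simulator output (assumed, for illustration only).
pos = ["ACGTGATAAG", "TTGATAAGCC"]
neg = ["ACGTACGTAC", "CCCCGGGGTT", "TTTTAAAACC"]
X, y = combine_pos_neg(pos, neg)
```

A `num_pos`/`num_neg` interface like the one suggested above would do this pairing inside the simulation function itself.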

Thanks for the great packages! I'm cc'ing @AvantiShri because these comments relate to simdna as much as DragoNN.

problem with interpret

Hi,
When I try to run the example interpretation I get an error (see below).
Any ideas? Is something wrong with my installation? train and predict seem to be fine.

getting deeplift scores...
Traceback (most recent call last):
File "/root/miniconda2/bin/dragonn", line 9, in <module>
load_entry_point('dragonn==0.1.3', 'console_scripts', 'dragonn')()
File "build/bdist.linux-x86_64/egg/dragonn/__main__.py", line 203, in main
File "build/bdist.linux-x86_64/egg/dragonn/__main__.py", line 164, in main_interpret
File "build/bdist.linux-x86_64/egg/dragonn/models.py", line 186, in deeplift
File "build/bdist.linux-x86_64/egg/deeplift/models.py", line 113, in get_target_contribs_func
File "build/bdist.linux-x86_64/egg/deeplift/models.py", line 234, in _get_func
File "build/bdist.linux-x86_64/egg/deeplift/models.py", line 49, in _get_func
File "build/bdist.linux-x86_64/egg/deeplift/blobs/core.py", line 177, in update_mxts
File "build/bdist.linux-x86_64/egg/deeplift/blobs/core.py", line 177, in update_mxts
File "build/bdist.linux-x86_64/egg/deeplift/blobs/core.py", line 177, in update_mxts
File "build/bdist.linux-x86_64/egg/deeplift/blobs/core.py", line 177, in update_mxts
File "build/bdist.linux-x86_64/egg/deeplift/blobs/core.py", line 177, in update_mxts
File "build/bdist.linux-x86_64/egg/deeplift/blobs/core.py", line 177, in update_mxts
File "build/bdist.linux-x86_64/egg/deeplift/blobs/core.py", line 177, in update_mxts
File "build/bdist.linux-x86_64/egg/deeplift/blobs/core.py", line 177, in update_mxts
File "build/bdist.linux-x86_64/egg/deeplift/blobs/core.py", line 177, in update_mxts
File "build/bdist.linux-x86_64/egg/deeplift/blobs/core.py", line 178, in update_mxts
File "build/bdist.linux-x86_64/egg/deeplift/blobs/core.py", line 384, in _update_mxts_for_inputs
File "build/bdist.linux-x86_64/egg/deeplift/blobs/convolution.py", line 668, in _get_mxts_increments_for_inputs
File "build/bdist.linux-x86_64/egg/deeplift/blobs/convolution.py", line 631, in _get_input_grad_given_outgrad
File "build/bdist.linux-x86_64/egg/deeplift/backend/theano_backend.py", line 236, in pool2d_grad
AttributeError: 'Pool' object has no attribute 'grad'

Installation problem with conda

Hi there, thanks a lot for this package. Unfortunately, I am not able to install it. When I try to run conda install dragonn -c kundajelab I get the following error message on my Windows PC:

UnsatisfiableError: The following specifications were found to be incompatible with each other:

Output in format: Requested package -> Available versions

The following specifications were found to be incompatible with your system:

  - feature:/win-64::__cuda==11.0=0
  - feature:|@/win-64::__cuda==11.0=0

Your installed version is: 11.0

I also tried to run the command on Linux and got a similar error message, but without any hint as to what may be causing the problem:

Found conflicts! Looking for incompatible packages.
This can take several minutes.  Press CTRL-C to abort.
failed                                                                                                                                              
UnsatisfiableError: The following specifications were found to be incompatible with each other:

Output in format: Requested package -> Available versions

I already tried different CUDA versions to make sure that TensorFlow can work accordingly but the error still remains. I would really appreciate any advice on how I can get around this problem.

Workshop Tutorial example not working

Hello!

Recently I installed DragoNN with Anaconda (as explained on the installation page).
In general it seems to work fine, without any errors during execution.

However, when I worked through the workshop_tutorial.ipynb file from the examples folder, I observed results different from those shown in the notebook.
The notebook presents three different models for class prediction on simulated data.
The first two models (one layer, one filter; one layer, multiple filters) are described as not very good, while the third (multi-layer, multi-filter CNN) should give very good results according to the notebook.
However, when I trained the third model, the result was no different from the previous models. In particular, the training accuracy was increasing and the training loss was decreasing, but the validation metrics were barely changing and were randomly jumping around some fixed values.
After this I retrained the model several times, and the result was not any better.

I haven't made any changes to the code.

Attaching a screenshot of the training results:
[screenshot: dragonnscreen]

Any ideas what can be wrong?
Thanks in advance!

SequenceDNN intuition

I've been working on porting the dragonn tutorial over to deepchem (deepchem/deepchem#979) and have run into a couple basic issues about shapes and convolutions.

At the first layer, the SequenceDNN performs a keras.layers.Conv2D(nb_filter=1, nb_row=4, nb_col=15) (assuming conv_width of 15 and nb_filter=1). The training data generated in the tutorial forms an array of shape (n_samples, 1, 4, 1000) (assuming seq_length=1000). I think what's happening is that the genome is being viewed as an image of shape (4, 1000) with 1 channel, and the convolution of shape (4,15) is moved over this image.

Is this summary right? If so, is the image_data_format option in Keras set to channels_first somewhere to account for the data channel being before the width/height in the array shape?
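The shape reasoning in this question can be checked with a small NumPy sketch. It assumes the channels-first layout quoted above, (n_samples, 1, 4, seq_length), and a single (4, 15) filter. `one_hot_encode` and `scan_filter` are hypothetical helpers written for illustration, not DragoNN or Keras functions.

```python
import numpy as np

def one_hot_encode(sequences, alphabet="ACGT"):
    """Encode equal-length DNA strings as an array of shape
    (n_samples, 1, 4, seq_length): one channel, 4 rows, seq_length columns."""
    index = {base: i for i, base in enumerate(alphabet)}
    n, length = len(sequences), len(sequences[0])
    encoded = np.zeros((n, 1, len(alphabet), length), dtype=np.float32)
    for s, seq in enumerate(sequences):
        for pos, base in enumerate(seq):
            if base in index:  # unknown bases (e.g. N) stay all-zero
                encoded[s, 0, index[base], pos] = 1.0
    return encoded

def scan_filter(encoded, weights):
    """Slide a (4, conv_width) filter along the sequence axis only
    ('valid' cross-correlation); returns (n_samples, L - conv_width + 1)."""
    conv_width = weights.shape[1]
    n_positions = encoded.shape[3] - conv_width + 1
    out = np.empty((encoded.shape[0], n_positions), dtype=np.float32)
    for p in range(n_positions):
        window = encoded[:, 0, :, p:p + conv_width]  # (n_samples, 4, conv_width)
        out[:, p] = (window * weights).sum(axis=(1, 2))
    return out

rng = np.random.RandomState(0)
seqs = ["".join(rng.choice(list("ACGT"), size=1000)) for _ in range(3)]
X = one_hot_encode(seqs)                  # shape (3, 1, 4, 1000)
W = rng.randn(4, 15).astype(np.float32)   # one filter spanning all 4 rows
scores = scan_filter(X, W)                # shape (3, 986)
```

Because the filter height equals the number of rows, the filter has no room to move vertically, so the 2D convolution behaves like a 1D scan along the sequence with 1000 - 15 + 1 = 986 output positions.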

Python setup.py installation issue for bleeding edge dragonn

Hi,

I git cloned the dragonn repo in a clean virtual environment and ran "pip install --editable dragonn/", and received the following dependency error:

(dragonn) [kameronr@sh-112-07 ~]$ pip install --editable dragonn/
Obtaining file:///home/users/kameronr/dragonn
Requirement already satisfied: numpy>=1.9 in ./anaconda2_correct/envs/dragonn/lib/python2.7/site-packages (from dragonn==0.1.3)
Requirement already satisfied: keras==0.3.3 in ./anaconda2_correct/envs/dragonn/lib/python2.7/site-packages (from dragonn==0.1.3)
Collecting deeplift==0.5.1-theano (from dragonn==0.1.3)
Could not find a version that satisfies the requirement deeplift==0.5.1-theano (from dragonn==0.1.3) (from versions: )
No matching distribution found for deeplift==0.5.1-theano (from dragonn==0.1.3)

Running "python setup.py install" gives a bunch of issues too; the command ends with:

In file included from /lscratch/kameronr/easy_install-hE1SBn/h5py-2.7.0/h5py/defs.c:515:0:
/lscratch/kameronr/easy_install-hE1SBn/h5py-2.7.0/h5py/api_compat.h:27:18: fatal error: hdf5.h: No such file or directory
#include "hdf5.h"
^
compilation terminated.

Would be great to hear about a fix for this. Thanks!

Missing dependency?

When I worked through the tutorial on my machine, I received an error because the shapely package was not available. Should this be added as a required package in setup.py?

Everything else was great!

Error when Training Model

Hi, I'm following the structure of the training example:

dragonn train --pos-sequences examples/example_pos_sequences.fa --neg-sequences examples/example_neg_sequences.fa --prefix training_example

but replacing the example fasta files with my own fasta files. I get this error:

[screenshot: Screen Shot 2019-08-06 at 2 45 10 PM]

I'm not sure why, and I'm wondering whether the sequences are expected to be labeled in a specific way?

best

Michelle

Roadmap for 0.1.3 release

DragoNN aims to democratize deep learning for genomics by providing resources to both teach and learn about deep learning for genomics. In a Twitter poll last month, I asked deep learning practitioners what they need to get started in genomics: the most common answer, 29 out of 79 votes, was "tutorials". In an ongoing Twitter poll, I ask computational biology faculty what they need to teach deep learning for genomics in their classes: the most common answer so far, 24 out of 59 votes, is "starter teaching material". The demand is clear and, in collaboration with Nvidia, we took a big step last week to address this demand by debuting an online Nvidia deep learning for genomics class using DragoNN.

For the 0.1.3 release I would like to provide the minimal starter teaching resources here to enable faculty to teach this topic in their classes. Below is an initial set of todos based on feedback so far:

  • Easier installation. A PyPI release may help in this regard.
  • Automated build/image for cloud usage. This is available now on the GTC branch and needs to be merged.
  • DragoNN cloud instances for GCP/Azure.
  • Tutorials with reasonable runtime on a laptop without a GPU. This is available now on the GTC branch and needs to be merged (same one as in the Nvidia online course).
  • Tutorials that show deep learning succeeding where simpler models such as PWMs fail.
  • Tutorials with more figures that could be followed without the need for accompanying slides.

Additional suggestions are always welcome so please feel free to comment and discuss!

We would also love contributions of specific features and/or tutorials. It would be great to have a collection of exercises and tutorials here others could reuse.

dragonn running issue

Thanks a lot for making this great package! I installed the package using "conda install dragonn -c kundajelab" and it did not report any errors, but when I ran the example

dragonn train --pos-sequences examples/example_pos_sequences.fa --neg-sequences examples/example_neg_sequences.fa --prefix training_example

I got the following error:

loading sequence data...
initializing model...
Using Theano backend.
ERROR (theano.gof.opt): Optimization failure due to: constant_folding
ERROR (theano.gof.opt): node: DimShuffle{}(TensorConstant{35})
ERROR (theano.gof.opt): TRACEBACK:
ERROR (theano.gof.opt): Traceback (most recent call last):
File "/home/hu/anaconda2/lib/python2.7/site-packages/theano/gof/opt.py", line 1772, in process_node
replacements = lopt.transform(node)
File "/home/hu/anaconda2/lib/python2.7/site-packages/theano/tensor/opt.py", line 5863, in constant_folding
no_recycling=[])
File "/home/hu/anaconda2/lib/python2.7/site-packages/theano/gof/op.py", line 978, in make_thunk
no_recycling)
File "/home/hu/anaconda2/lib/python2.7/site-packages/theano/gof/op.py", line 881, in make_c_thunk
output_storage=node_output_storage)
File "/home/hu/anaconda2/lib/python2.7/site-packages/theano/gof/cc.py", line 1200, in make_thunk
keep_lock=keep_lock)
...

It seems the error comes from the keras package, but I am not sure. Any suggestions on how to fix this problem?
The Python version I use is Python 2.7.13 :: Anaconda custom (64-bit)

Thanks a lot!

Trouble running training example

I installed dragonn through:

conda install dragonn -c kundajelab

and then ran the example:

dragonn train --pos-sequences examples/example_pos_sequences.fa --neg-sequences examples/example_neg_sequences.fa --prefix training_example

but got this error:
[screenshot: Screen Shot 2019-08-06 at 12 11 14 PM]

I tried following a previous post about the same issue and tried:

conda remove pydot-ng
pip install pydot-ng==1.0.0

but got this issue:
[screenshot: Screen Shot 2019-08-06 at 12 14 23 PM]

Thanks in advance!

Error in load exist model

Hello,
I ran into an error when loading an existing Keras model with DragoNN. The error reported that the model had not been compiled. I guess the relevant loading code in models.py is the following:

if keras_model is not None and seq_length is None:
    self.model = keras_model
    self.num_tasks = keras_model.layers[-1].output_shape[-1]
elif seq_length is not None and keras_model is None:

If I add

self.model.compile(optimizer='adam', loss='binary_crossentropy')

before "elif", I can load the existing model without the error.

As a new user of Keras and DragoNN, I am not sure whether I should add "compile", and which parameters I should use when compiling the existing model.
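To show where the proposed call would sit, here is a minimal sketch of the constructor branch quoted above with the compile step added. It uses a stand-in model object so that it runs without Keras. `StubModel` and `SequenceDNNSketch` are hypothetical names, and the final `else` branch is an assumption, not DragoNN's code.

```python
class StubModel(object):
    """Hypothetical stand-in for a loaded Keras model (not the Keras API)."""
    def __init__(self):
        self.compiled = False

    def compile(self, optimizer, loss):
        # Record that compile was called and with which settings.
        self.compiled = True
        self.optimizer, self.loss = optimizer, loss


class SequenceDNNSketch(object):
    """Sketch of the branch discussed above, with the compile call added."""
    def __init__(self, seq_length=None, keras_model=None):
        if keras_model is not None and seq_length is None:
            self.model = keras_model
            # Proposed addition: compile the pre-built model before use.
            self.model.compile(optimizer='adam', loss='binary_crossentropy')
        elif seq_length is not None and keras_model is None:
            raise NotImplementedError("building a new model is elided in this sketch")
        else:
            # Assumed guard for illustration only.
            raise ValueError("pass exactly one of seq_length or keras_model")


stub = StubModel()
wrapper = SequenceDNNSketch(keras_model=stub)
```

As a general rule, the optimizer and loss passed to compile affect further training and evaluation rather than the forward pass, so they would usually be chosen to match how the model was originally trained.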

Thanks!

conflict error

When I was installing DragoNN on my Mac, the error below popped up:

UnsatisfiableError: The following specifications were found to be in conflict:

  • dragonn
  • wrapt

issue when running dragonn train...

Dear all,
When I run

dragonn train --pos-sequences examples/example_pos_sequences.fa --neg-sequences examples/example_neg_sequences.fa --prefix training_example

I got the following errors:

loading sequence data...
initializing model...
Using TensorFlow backend.
2017-11-10 11:37:29.950537: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-11-10 11:37:29.950571: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-11-10 11:37:29.950577: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-11-10 11:37:29.950581: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-11-10 11:37:29.950585: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
Traceback (most recent call last):
  File "/anaconda2/bin/dragonn", line 11, in <module>
    load_entry_point('dragonn==0.1.3', 'console_scripts', 'dragonn')()
  File "build/bdist.macosx-10.6-x86_64/egg/dragonn/__main__.py", line 203, in main
  File "build/bdist.macosx-10.6-x86_64/egg/dragonn/__main__.py", line 94, in main_train
  File "build/bdist.macosx-10.6-x86_64/egg/dragonn/models.py", line 112, in __init__
  File "/anaconda2/lib/python2.7/site-packages/Keras-0.3.3-py2.7.egg/keras/models.py", line 522, in compile
    train_loss = weighted_loss(self.y, self.y_train, self.weights, mask)
  File "/anaconda2/lib/python2.7/site-packages/Keras-0.3.3-py2.7.egg/keras/models.py", line 82, in weighted
    score_array = fn(y_true, y_pred)
  File "/anaconda2/lib/python2.7/site-packages/Keras-0.3.3-py2.7.egg/keras/objectives.py", line 40, in binary_crossentropy
    return K.mean(K.binary_crossentropy(y_pred, y_true), axis=-1)
  File "/anaconda2/lib/python2.7/site-packages/Keras-0.3.3-py2.7.egg/keras/backend/tensorflow_backend.py", line 606, in binary_crossentropy
    return tf.nn.sigmoid_cross_entropy_with_logits(output, target)
  File "/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/nn_impl.py", line 147, in sigmoid_cross_entropy_with_logits
    _sentinel, labels, logits)
  File "/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/nn_ops.py", line 1562, in _ensure_xent_args
    "named arguments (labels=..., logits=..., ...)" % name)
ValueError: Only call `sigmoid_cross_entropy_with_logits` with named arguments (labels=..., logits=..., ...)

Has anyone else had the same problem? Please help me. Thank you.
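The traceback shows Keras 0.3.3 calling `tf.nn.sigmoid_cross_entropy_with_logits(output, target)` with positional arguments, while the installed TensorFlow rejects anything but named `labels=` and `logits=`. The NumPy sketch below imitates that kind of guard to show why the positional call fails and the keyword call succeeds. The `_sentinel` parameter and the function body are assumptions for illustration, not TensorFlow's source.

```python
import numpy as np

def sigmoid_cross_entropy_with_logits(_sentinel=None, labels=None, logits=None):
    """NumPy sketch of a loss that refuses positional arguments."""
    if _sentinel is not None:
        # Any positional argument lands in _sentinel and is rejected.
        raise ValueError("Only call `sigmoid_cross_entropy_with_logits` "
                         "with named arguments (labels=..., logits=..., ...)")
    if labels is None or logits is None:
        raise ValueError("Both labels and logits must be provided.")
    x = np.asarray(logits, dtype=float)
    z = np.asarray(labels, dtype=float)
    # Numerically stable form of -z*log(sigmoid(x)) - (1-z)*log(1-sigmoid(x))
    return np.maximum(x, 0) - x * z + np.log1p(np.exp(-np.abs(x)))

logits = np.array([2.0, -1.0])
labels = np.array([1.0, 0.0])

try:
    sigmoid_cross_entropy_with_logits(logits, labels)  # old positional style
    positional_ok = True
except ValueError:
    positional_ok = False  # same failure mode as in the traceback

loss = sigmoid_cross_entropy_with_logits(labels=labels, logits=logits)
```

In other words, the error points to a version mismatch between the pinned Keras release and a newer TensorFlow rather than to a problem with the input data.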

Anaconda installation ships with incorrect keras installation

Hi,

I installed dragonn with "conda install dragonn -c kundajelab", but I got the following error message upon trying to train the example model:

Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/2.7/bin/dragonn", line 6, in <module>
from pkg_resources import load_entry_point
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2953, in <module>
@_call_aside
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2939, in _call_aside
f(*args, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2966, in _initialize_master_working_set
working_set = WorkingSet._build_master()
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pkg_resources/__init__.py", line 637, in _build_master
return cls._build_from_requirements(requires)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pkg_resources/__init__.py", line 650, in _build_from_requirements
dists = ws.resolve(reqs, Environment())
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pkg_resources/__init__.py", line 829, in resolve
raise DistributionNotFound(req, requirers)
pkg_resources.DistributionNotFound: The 'keras==0.3.3' distribution was not found and is required by dragonn

Looks like keras v0.3.2 ships with the install but keras v0.3.3 is necessary for training.

Thanks,
Rajiv

workshop_tutorial (.pngs won't load on vpn)

This is trivial, but perhaps important for distribution to a wide audience:

When running the workshop notebook on a VPN (or at least mine), the images cannot be loaded:

Ex: http://mitra.stanford.edu/kundaje/jisraeli/dragonn/play_button.png displays as a broken image while on the VPN, but I can open that page in a browser when not on the VPN.

Solution? The .png files you are using are pretty small. Perhaps just include them in $DRAGONN_HOME/Examples instead of linking to them?

Installation Problem

Hello, first of all thank you for the package; it looks very interesting. However, I get different types of library conflict errors.

-First, I tried to clone your repository and install it through python setup.py install on my Windows computer, but the dependencies are taking days to download and I got the following error message:

Getting requirements to build wheel ... error
ERROR: Command errored out with exit status 1:
command: 'C:\Users\bella\.conda\envs\dragonn\python.exe' 'C:\Users\bella\.conda\envs\dragonn\lib\site-packages\pip\_vendor\pep517\in_process\_in_process.py' get_requires_for_build_wheel 'C:\Users\bella\AppData\Local\Temp\tmp1eh_0is4'
cwd: C:\Users\bella\AppData\Local\Temp\pip-install-m92jpbsc\pyproj_823eff2ba1974d2ebc43fcc65d092a2a
Complete output (1 lines):
Proj executable not found. Please set PROJ_DIR variable.
 ----------------------------------------
WARNING: Discarding https://files.pythonhosted.org/packages/73/ef/53a7e9e98595baf4d7212aa731fcec256b432a3db60a55b58a027a4d9d47/pyproj-2.2.0.tar.gz#sha256=0a4f793cc93539c2292638c498e24422a2ec4b25cb47545addea07724b2a56e5 (from https://pypi.org/simple/pyproj/). Command errored out with exit status 1: 'C:\Users\bella\.conda\envs\dragonn\python.exe' 'C:\Users\bella\.conda\envs\dragonn\lib\site-packages\pip\_vendor\pep517\in_process\_in_process.py' get_requires_for_build_wheel 'C:\Users\bella\AppData\Local\Temp\tmp1eh_0is4' Check the logs for full command output.

INFO: pip is looking at multiple versions of pyparsing to determine which version is compatible with other requirements. This could take a while.
Collecting pyparsing>=2.0.1
Using cached pyparsing-3.0.9-py3-none-any.whl (98 kB)
INFO: This is taking longer than usual. You might need to provide the dependency resolver with stricter constraints to reduce runtime. If you want to abort this run, you can press Ctrl + C to do so. To improve how pip performs, tell us what happened here: https://pip.pypa.io/surveys/backtracking

-When I try to install through conda install dragonn -c kundajelab I get the following error message on my Windows PC:

UnsatisfiableError: The following specifications were found to be incompatible with each other:
Output in format: Requested package -> Available versions The following specifications were found to be incompatible

-I also tried on a Manjaro Linux computer without success; I still get library/dependency-related errors.

I was wondering whether anyone has run the package recently or whether there is a newer version. I would really appreciate any advice on how to solve the problem, if possible.
