dorina's Introduction

doRiNA

Database of posttranscriptional regulatory elements and function

Installation

First, install bedtools. We recommend the latest release:

wget https://github.com/arq5x/bedtools2/archive/master.zip -O bedtools.zip
unzip bedtools.zip
cd bedtools2-master
make

Then, create a virtual environment and install doRiNA from source:

python3 -m venv dorina-dev
source dorina-dev/bin/activate
git clone https://github.com/dieterich-lab/dorina.git
cd dorina
pip install -r requirements.txt -r test_requirements.txt
pip install .

For doRiNA development, please use Python version > 3.4.

Usage

Dorina requires the data files to be set up locally. Let /path/to/datasets/ be the setup path. It should contain:

  • /path/to/datasets/regulators
  • /path/to/datasets/genomes

Both sub-directories have the same structure:

  • /path/to/datasets/genomes/{organism}/{assembly}/
  • /path/to/datasets/regulators/{organism}/{assembly}/
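The layout above can be checked before running any query. The following is a minimal sketch, not part of dorina itself: the helper names `list_assemblies` and `missing_counterparts` are hypothetical, and the idea that every regulator assembly should have a matching genome directory is an assumption based on the two trees sharing the same structure.

```python
from pathlib import Path


def list_assemblies(datasets_root, kind):
    """Return sorted (organism, assembly) pairs found under <root>/<kind>/.

    `kind` is either "genomes" or "regulators", the two sub-directories
    described above.
    """
    base = Path(datasets_root) / kind
    if not base.is_dir():
        raise FileNotFoundError("missing directory: {}".format(base))
    pairs = []
    for organism in sorted(p for p in base.iterdir() if p.is_dir()):
        for assembly in sorted(p for p in organism.iterdir() if p.is_dir()):
            pairs.append((organism.name, assembly.name))
    return pairs


def missing_counterparts(datasets_root):
    """Return assemblies that have regulators but no genome directory.

    Assumption: a regulator assembly is only usable when the same
    {organism}/{assembly} path also exists under genomes/.
    """
    genomes = set(list_assemblies(datasets_root, "genomes"))
    regulators = set(list_assemblies(datasets_root, "regulators"))
    return sorted(regulators - genomes)
```

Calling `missing_counterparts("/path/to/datasets/")` returns an empty list when both trees line up.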

Regulators can be obtained from [here](http://dorina.mdc-berlin.de/regulators). Each BED file should be accompanied by a metadata file with the same name but the .json extension:

{ "id": "d_melanogaster", "label": "Fly", "scientific": "Drosophila melanogaster", "weight": 3}
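The pairing rule (one .json file per .bed file, same base name) can be verified with a short script. This is a sketch under assumptions: `load_regulator_metadata` is a hypothetical helper, not a dorina function, and the metadata files are assumed to be UTF-8 encoded.

```python
import json
from pathlib import Path


def load_regulator_metadata(assembly_dir):
    """Map each .bed file name to its parsed .json metadata.

    Raises FileNotFoundError if a .bed file has no companion .json file.
    """
    metadata = {}
    for bed in sorted(Path(assembly_dir).glob("*.bed")):
        companion = bed.with_suffix(".json")
        if not companion.is_file():
            raise FileNotFoundError("no metadata for {}".format(bed.name))
        # Assumption: metadata is stored as UTF-8 text.
        with companion.open(encoding="utf-8") as fh:
            metadata[bed.name] = json.load(fh)
    return metadata
```

Running it on one `{organism}/{assembly}/` directory fails on the first BED file that lacks metadata, which makes an incomplete download easy to spot.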

The genome directory contains genome annotation separated into several regions for further filtering.

If the directory structure is correct, the following command should retrieve the miR-1247 regulators for the human genome assembly hg19:

dorina run 'hg19' --seta 'hsa-miR-1247|CLASH' -p /path/to/datasets/ > miR-1247.bed
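The same call can be scripted from Python. The sketch below uses only the arguments that appear in the command above; the function names are hypothetical. Passing the arguments as a list avoids shell quoting of the `|` character in the regulator name.

```python
import subprocess


def build_run_command(assembly, set_a, datasets_path):
    """Return the argv list for the `dorina run` call shown above."""
    return ["dorina", "run", assembly, "--seta", set_a, "-p", datasets_path]


def run_to_file(assembly, set_a, datasets_path, out_path):
    """Run the query and write its standard output to `out_path`."""
    with open(out_path, "w") as out:
        # check=True raises CalledProcessError if dorina exits with an error.
        subprocess.run(build_run_command(assembly, set_a, datasets_path),
                       stdout=out, check=True)
```

For example, `run_to_file("hg19", "hsa-miR-1247|CLASH", "/path/to/datasets/", "miR-1247.bed")` mirrors the shell redirection.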

To list the available data sources, use:

dorina genomes -p /path/to/datasets/ | less
dorina regulators -p /path/to/datasets/ | less

Building an assembly

Please see here

Supported datasets

Regulatory

  • RNA Binding Proteins obtained by cross-linking and immunoprecipitation derived experiments (CLIP)
  • miRNA–RNA interactions obtained by CLASH experiments
  • Predicted RNA interactions

Expression

Dorina can filter the regulatory datasets by expression.

Variation

Dorina can retrieve single-nucleotide variants that co-occur with regulatory elements.

License

dorina is licensed under the GNU General Public License (GPL) version 3. See the LICENSE file for details.

dorina's People

Contributors

kblin, rekado, tbrittoborges


dorina's Issues

PyPy support

PyPy is an alternative Python implementation with just-in-time compilation that runs pure Python code without changes. Cython and C extensions may require work.

I anticipate that pybedtools is not PyPy compatible, but I will look into this issue further. Also, PyPy does not support pandas, and its NumPy support is rather limited.

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 510: ordinal not in range(128)

Upon starting apache2 on carina-dev, and attempting to access the website, access failed with an error (in /var/log/apache2/error.log) as follows:

[Fri Jan 19 13:48:04.425583 2018] [wsgi:error] [pid 25443:tid 140426388211456] [remote 129.206.148.116:512] mod_wsgi (pid=25443): Target WSGI script '/home/carina/webdorina/webdorina/webdorina.wsgi' cannot be loaded as Python module.
[Fri Jan 19 13:48:04.425711 2018] [wsgi:error] [pid 25443:tid 140426388211456] [remote 129.206.148.116:512] mod_wsgi (pid=25443): Exception occurred processing WSGI script '/home/carina/webdorina/webdorina/webdorina.wsgi'.
[Fri Jan 19 13:48:04.425785 2018] [wsgi:error] [pid 25443:tid 140426388211456] [remote 129.206.148.116:512] Traceback (most recent call last):
[Fri Jan 19 13:48:04.426417 2018] [wsgi:error] [pid 25443:tid 140426388211456] [remote 129.206.148.116:512]   File "/home/carina/webdorina/webdorina/webdorina.wsgi", line 16, in <module>
[Fri Jan 19 13:48:04.426475 2018] [wsgi:error] [pid 25443:tid 140426388211456] [remote 129.206.148.116:512]     from webdorina.app import app as application
[Fri Jan 19 13:48:04.426937 2018] [wsgi:error] [pid 25443:tid 140426388211456] [remote 129.206.148.116:512]   File "/home/carina/webdorina/webdorina/app.py", line 41, in <module>
[Fri Jan 19 13:48:04.426997 2018] [wsgi:error] [pid 25443:tid 140426388211456] [remote 129.206.148.116:512]     Regulator.init(app.config['DATA_PATH'])
[Fri Jan 19 13:48:04.427659 2018] [wsgi:error] [pid 25443:tid 140426388211456] [remote 129.206.148.116:512]   File "/usr/local/lib/python3.4/dist-packages/dorina/regulator.py", line 65, in init
[Fri Jan 19 13:48:04.427722 2018] [wsgi:error] [pid 25443:tid 140426388211456] [remote 129.206.148.116:512]     parse_func)
[Fri Jan 19 13:48:04.427957 2018] [wsgi:error] [pid 25443:tid 140426388211456] [remote 129.206.148.116:512]   File "/usr/local/lib/python3.4/dist-packages/dorina/utils.py", line 33, in walk_assembly_tree
[Fri Jan 19 13:48:04.428087 2018] [wsgi:error] [pid 25443:tid 140426388211456] [remote 129.206.148.116:512]     species_dict[assembly] = parse_func(assembly_path)
[Fri Jan 19 13:48:04.428151 2018] [wsgi:error] [pid 25443:tid 140426388211456] [remote 129.206.148.116:512]   File "/usr/local/lib/python3.4/dist-packages/dorina/regulator.py", line 55, in parse_func
[Fri Jan 19 13:48:04.428304 2018] [wsgi:error] [pid 25443:tid 140426388211456] [remote 129.206.148.116:512]     experiments = parse_experiment(experiment_path)
[Fri Jan 19 13:48:04.428372 2018] [wsgi:error] [pid 25443:tid 140426388211456] [remote 129.206.148.116:512]   File "/usr/local/lib/python3.4/dist-packages/dorina/regulator.py", line 32, in parse_experiment
[Fri Jan 19 13:48:04.428718 2018] [wsgi:error] [pid 25443:tid 140426388211456] [remote 129.206.148.116:512]     return json.load(fh)
[Fri Jan 19 13:48:04.429431 2018] [wsgi:error] [pid 25443:tid 140426388211456] [remote 129.206.148.116:512]   File "/usr/lib/python3.4/json/__init__.py", line 265, in load
[Fri Jan 19 13:48:04.429514 2018] [wsgi:error] [pid 25443:tid 140426388211456] [remote 129.206.148.116:512]     return loads(fp.read(),
[Fri Jan 19 13:48:04.429729 2018] [wsgi:error] [pid 25443:tid 140426388211456] [remote 129.206.148.116:512]   File "/usr/lib/python3.4/encodings/ascii.py", line 26, in decode
[Fri Jan 19 13:48:04.429855 2018] [wsgi:error] [pid 25443:tid 140426388211456] [remote 129.206.148.116:512]     return codecs.ascii_decode(input, self.errors)[0]
[Fri Jan 19 13:48:04.429918 2018] [wsgi:error] [pid 25443:tid 140426388211456] [remote 129.206.148.116:512] UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 510: ordinal not in range(128)
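Byte 0xc3 is the lead byte of a two-byte UTF-8 sequence (for example, "ü" is 0xc3 0xbc), and the traceback shows `json.load` decoding the file with the ASCII codec. That combination suggests a UTF-8 metadata file opened in text mode while the process's default encoding is ASCII. One possible fix, assuming the metadata files are UTF-8, is to pass the encoding explicitly. The functions below are an illustrative sketch and not the actual code in dorina/regulator.py; `parse_experiment_ascii` exists only to reproduce the failure.

```python
import json


def parse_experiment(path):
    """Load a metadata file, decoding it as UTF-8 regardless of the locale."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def parse_experiment_ascii(path):
    """Mimic the failing setup: decode the same file with the ASCII codec."""
    with open(path, encoding="ascii") as fh:
        return json.load(fh)
```

With a metadata file that contains a non-ASCII character, `parse_experiment` succeeds while `parse_experiment_ascii` raises `UnicodeDecodeError`. Setting a UTF-8 locale for the Apache process is an alternative that leaves the code unchanged.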

Support for new data types

Add support for the following types of data:

  • Polymorphism
  • Gene Expression
  • ENCODE eCLIP and Bind-n-seq data
  • Ribo-Seq
  • Proteomics

Python 3.6 support

Python 3.6 is the latest stable version of the Python programming language. It is more secure and generally faster than Python 2.7.
This issue aims to refactor the dorina and webdorina code for Python 3.6 without dropping Python 2.7 support.

Centralised logging

Both Dorina and webdorina lack a centralised logging system and the code is debugged with print statements.
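A minimal sketch of what this could look like with the standard library `logging` module follows. The logger name "dorina", the format string, and the `configure_logging` helper are assumptions for illustration, not existing project code.

```python
import logging


def configure_logging(level=logging.INFO, logfile=None):
    """Configure the shared 'dorina' logger once and return it."""
    logger = logging.getLogger("dorina")
    logger.setLevel(level)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.FileHandler(logfile) if logfile else logging.StreamHandler()
        handler.setFormatter(logging.Formatter(
            "%(asctime)s %(name)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger


# Modules request a child logger instead of calling print();
# records propagate to the handler configured on "dorina".
log = logging.getLogger("dorina.regulator")
```

The application entry point would call `configure_logging()` once, and each module would then use `log.debug(...)` or `log.warning(...)` where it currently prints.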

Data update

Carina's next milestone shifts the database's focus to regulatory elements and RNA interactions in the cardiac system and in cardiovascular disease.

The shift will include a systematic literature review for new data sets. Because of the large number of data sets and scientific articles in cardiology, semi-automated curation is recommended. The tool should first sift through papers and databases for potential targets, which should then be curated by a human.

Cross-species comparisons

Cross-species comparison is a relevant feature to the study of cardiovascular diseases. Carina should support:

  • Gene trees
  • Visualization of RNA-RNA alignments
