RBP Maps

RBP splice and feature maps

This has been tested with the following versions (requirements):

Module      Version
pandas      0.20.1
pybedtools  0.7.8
bedtools    2.26.0
pysam       0.8.4
samtools    1.3.1
pyBigWig    0.3.5
matplotlib  2.0.2
seaborn     0.8
jupyter     4.2.0 (if you want to import the modules in a notebook)
cwltool     1.0.20170828135420 (if you want to use it as a CWL tool)
tqdm        4.19.5
numpy       1.12.1
scipy       0.19.1

Installation:

Create the environment:

git clone https://github.com/yeolab/rbp-maps
cd rbp-maps
conda env create -f conda_env.txt -n rbp-maps
source activate rbp-maps

Then, from inside the rbp-maps directory, install:

python setup.py build
python setup.py install

Docker:

docker pull brianyee/rbp-maps

Usage:

Plotting density (*.bw files from the eCLIP bioinformatics pipeline)

plot_map --ip ip.bam \
 --ip_pos_bw ip.pos.bw \
 --ip_neg_bw ip.neg.bw \
 --input input.bam \
 --input_pos_bw input.pos.bw \
 --input_neg_bw input.neg.bw \
 --annotations rmats_annotation1.JunctionCountOnly.txt rmats_annotation2.JunctionCountOnly.txt rmats_annotation3.JunctionCountOnly.txt \
 --annotation_type rmats rmats rmats \
 --output rbfox2.svg \
 --event se \
 --normalization_level 1 \
 --testnums 0 1 \
 --bgnum 2 \
 --sigtest permutation

  • --ip: BAM file containing the reads of your CLIP (make sure the .pos.bw and .neg.bw files are in the same directory)
  • --ip_pos_bw / --ip_neg_bw: positive- and negative-strand bigWig files for the CLIP
  • --input: BAM file containing the reads of the size-matched input (make sure the .pos.bw and .neg.bw files are in the same directory)
  • --input_pos_bw / --input_neg_bw: positive- and negative-strand bigWig files for the input
  • --annotations: annotation files
  • --annotation_type: the type of each of the above annotation files ('rmats' and 'miso' are supported)
  • --output: either an 'svg' or a 'png' file works
  • --event: one of 'se' (skipped exons), 'a3ss' (alternative 3' splice sites), or 'a5ss' (alternative 5' splice sites)
  • --normalization_level: numeric "code" that selects the kind of normalization to output (see below)

Plotting peaks (*.compressed.bed files from the eCLIP bioinformatics pipeline, converted to bigBed)

plot_map --peak peak.bb \
 --annotations rmats_annotation1.JunctionCountOnly.txt rmats_annotation2.JunctionCountOnly.txt rmats_annotation3.JunctionCountOnly.txt \
 --annotation_type rmats rmats rmats \
 --output rbfox2.svg \
 --event se \
 --normalization_level 0 \
 --testnums 0 1 \
 --bgnum 2 \
 --sigtest fisher

  • --peak: peaks file in bigBed format
  • --annotations, --annotation_type, --output, --event, --normalization_level: same as for plotting density

Using a background & calculating significance

In the examples above, we set a few optional parameters that determine significance against an optional background dataset.

  • --normalization_level 0: Just plot the IP density. If you are using normalized peaks, use this option to skip any further normalization (just report the peak overlaps).
  • --normalization_level 1 (default): Plot the IP density minus its input density.
  • --normalization_level 2: Plot the entropy-normalized IP over its input density.
  • --normalization_level 3: Just plot the input density.
  • --bgnum 2: 0-based index of the background file among the --annotations. In this example, 2 designates the third file (rmats_annotation3.JunctionCountOnly.txt) as the background model.
  • --testnums 0 1: 0-based indices of the test-condition files among the --annotations (i.e., rmats_annotation1.JunctionCountOnly.txt and rmats_annotation2.JunctionCountOnly.txt).
  • --sigtest permutation: The default is 'permutation'. The background set (typically the 'native SE' set, though you can use other sets) is randomly sampled, the confidence interval from those permutations is drawn as confidence bounds around the native SE curve, and significance is calculated from the permutation values. If this is set to "ks", "fisher", "zscore", or "mannwhitneyu", significance between the curves is computed with the specified test, and the confidence bounds are instead the standard error of the alt-included or alt-excluded events. Currently, only "fisher" is implemented for peak-based RBP maps.
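To give a feel for what normalization levels 1 and 2 compute at each position, here is a minimal sketch. It assumes the IP and input densities are already scaled to reads per million, and it assumes the entropy formula commonly used for eCLIP, p_ip * log2(p_ip / p_input). The function names and the pseudocount are hypothetical; this is not the rbp-maps implementation.

```python
import numpy as np


def subtract_input(ip, inp):
    """Level 1 (sketch): IP density minus input density, position by position."""
    return np.asarray(ip, dtype=float) - np.asarray(inp, dtype=float)


def entropy_normalize(ip, inp, pseudocount=1.0):
    """Level 2 (sketch): entropy-normalized IP over input.

    Assumes per-position values p_ip * log2(p_ip / p_input) on RPM-scaled
    densities; the pseudocount (an assumption) avoids log(0) and division by 0.
    """
    ip = np.asarray(ip, dtype=float) + pseudocount
    inp = np.asarray(inp, dtype=float) + pseudocount
    return ip * np.log2(ip / inp)


ip = np.array([3.0, 7.0, 1.0])
inp = np.array([3.0, 1.0, 1.0])
print(subtract_input(ip, inp))     # differences: 0, 6, 0
print(entropy_normalize(ip, inp))  # middle position: 8 * log2(8 / 2) = 16
```

Positions where IP and input agree score zero under both schemes; the entropy version additionally weights an enrichment by how much IP signal is present.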
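The 'permutation' setting can be pictured with the sketch below. It repeatedly draws as many background events as there are test events and averages each draw into a curve. It then takes percentiles across the draws as confidence bounds and scores each position of the test curve by how often a permuted curve is at least as high. Several choices are assumptions rather than rbp-maps' documented behavior: the matrix layout (events × positions), the 95% interval, and the one-sided empirical p-value with a +1 correction. The function name is hypothetical.

```python
import numpy as np


def permutation_bounds(test, background, n_perm=1000, ci=0.95, seed=0):
    """Sketch: permutation confidence bounds and empirical p-values per position.

    test, background: 2D arrays shaped (events, positions); the background
    must contain at least as many events as the test set.
    Returns (lower, upper, pvalues), each with one value per position.
    """
    test = np.asarray(test, dtype=float)
    background = np.asarray(background, dtype=float)
    rng = np.random.RandomState(seed)
    n_test = test.shape[0]

    # Mean curve of many random background subsets, each the size of the test set.
    perm_means = np.empty((n_perm, background.shape[1]))
    for i in range(n_perm):
        rows = rng.choice(background.shape[0], size=n_test, replace=False)
        perm_means[i] = background[rows].mean(axis=0)

    # Central `ci` interval of the permuted means serves as the confidence bounds.
    tail = 100.0 * (1.0 - ci) / 2.0
    lower = np.percentile(perm_means, tail, axis=0)
    upper = np.percentile(perm_means, 100.0 - tail, axis=0)

    # One-sided empirical p-value: fraction of permuted means >= observed mean.
    observed = test.mean(axis=0)
    exceed = (perm_means >= observed).sum(axis=0)
    pvalues = (exceed + 1.0) / (n_perm + 1.0)
    return lower, upper, pvalues


# Toy usage: 50 test events with extra signal at position 3.
rng = np.random.RandomState(1)
background = rng.normal(0.0, 1.0, size=(500, 10))
test = rng.normal(0.0, 1.0, size=(50, 10))
test[:, 3] += 2.0
lower, upper, pvalues = permutation_bounds(test, background)
print(pvalues[3])  # expected to be tiny (about 1/1001)
```

Drawing subsets of the same size as the test set matters. The spread of a mean shrinks with the number of events, so bounds built from a differently sized sample would be too wide or too narrow.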

Links to files

You can refer to the 'examples/' directory for usage. These examples refer to BAM and bigWig files that can be downloaded from encodeproject.org.

We also provide the script used to filter raw rMATS (hg19) outputs (based on inclusion junction count, as described in the paper). Here is an example command line for filtering SE events from a file "SE.MATS.JunctionCountOnly.txt":

subset_jxc -i SE.MATS.JunctionCountOnly.txt \
-o SE.MATS.JunctionCountOnly.nr.txt \
-e se
  • Direct link to these rMATS (hg19) files. The "significant.nr" files are filtered for significance (PValue and IncLevelDifference <= 0.05, FDR <= 0.1) and have overlapping events removed. "Positive" and "negative" files refer to files split by IncLevelDifference.

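For readers who want to reproduce a similar filter in pandas, here is a minimal sketch. It assumes the standard rMATS JunctionCountOnly column names (chr, strand, exonStart_0base, exonEnd, PValue, FDR, IncLevelDifference). The wording of the thresholds above is ambiguous, so the |IncLevelDifference| cutoff is exposed as a parameter whose default is an assumption. Dropping duplicate exon coordinates is only a crude stand-in for overlapping-event removal. The function name is hypothetical, and this is not what subset_jxc does internally.

```python
import io

import pandas as pd


def filter_rmats_se(path_or_buffer, pvalue_max=0.05, fdr_max=0.1, min_abs_dpsi=0.05):
    """Sketch: significance filter, then split by the sign of IncLevelDifference.

    Thresholds are assumptions; duplicate exon coordinates are dropped as a
    crude stand-in for removing overlapping events.
    """
    df = pd.read_csv(path_or_buffer, sep="\t")
    sig = df[(df["PValue"] <= pvalue_max)
             & (df["FDR"] <= fdr_max)
             & (df["IncLevelDifference"].abs() >= min_abs_dpsi)]
    sig = sig.drop_duplicates(subset=["chr", "strand", "exonStart_0base", "exonEnd"])
    positive = sig[sig["IncLevelDifference"] > 0]
    negative = sig[sig["IncLevelDifference"] < 0]
    return positive, negative


toy = io.StringIO(
    "chr\tstrand\texonStart_0base\texonEnd\tPValue\tFDR\tIncLevelDifference\n"
    "chr1\t+\t100\t200\t0.001\t0.01\t0.30\n"
    "chr1\t+\t100\t200\t0.001\t0.01\t0.30\n"  # duplicate coordinates: dropped
    "chr2\t-\t500\t650\t0.010\t0.05\t-0.20\n"
    "chr3\t+\t900\t990\t0.500\t0.90\t0.40\n"  # not significant: dropped
)
pos, neg = filter_rmats_se(toy)
print(len(pos), len(neg))  # 1 1
```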
Other Options

--exon_offset: (untested) controls how many bases into an exon you would like to plot (default 50 bases)

--intron_offset: (untested) controls how many bases into an intron you would like to plot (default 300 bases)
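To make the two offsets concrete, here is a sketch of the windows one might extract around the 3' and 5' splice sites of a single skipped exon, using the defaults above. It assumes 0-based, half-open genomic coordinates. The helper is hypothetical, and rbp-maps' own region layout (for example, whether it also plots the flanking exons) may differ.

```python
def splice_site_windows(exon_start, exon_end, strand, exon_offset=50, intron_offset=300):
    """Return (three_prime_ss_window, five_prime_ss_window) as (start, end) pairs.

    Coordinates are 0-based, half-open, genomic. The 3' splice site window
    covers `intron_offset` bases of upstream intron plus `exon_offset` bases
    into the exon; the 5' splice site window covers the last `exon_offset`
    bases of the exon plus `intron_offset` bases of downstream intron
    (upstream/downstream in transcript orientation).
    """
    left = (exon_start - intron_offset, exon_start + exon_offset)
    right = (exon_end - exon_offset, exon_end + intron_offset)
    if strand == "+":
        return left, right
    # On the minus strand the 3' splice site sits at the exon's right (genomic) edge.
    return right, left


print(splice_site_windows(1000, 1120, "+"))  # ((700, 1050), (1070, 1420))
```

For minus-strand events, the values in each window would still need to be reversed before plotting so that all events share the same 5' to 3' orientation.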

--confidence: For each position, keep only this fraction of events to reduce noise caused by outliers (default 0.95)
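One way to read --confidence is as a per-position trimmed mean: at each position, discard the most extreme (1 - confidence) fraction of event values and average the rest. The sketch below assumes the discarded fraction is split evenly between both tails and rounded to whole events. The README does not say which values rbp-maps actually removes, so treat this as an illustration only. The function name is hypothetical.

```python
import numpy as np


def trimmed_position_means(matrix, confidence=0.95):
    """Mean of each column after dropping (1 - confidence) / 2 of values per tail.

    matrix: 2D array shaped (events, positions); NaNs are ignored.
    """
    matrix = np.asarray(matrix, dtype=float)
    means = np.empty(matrix.shape[1])
    for j in range(matrix.shape[1]):
        col = matrix[:, j]
        col = np.sort(col[~np.isnan(col)])
        k = int(round(len(col) * (1.0 - confidence) / 2.0))  # events dropped per tail
        kept = col[k:len(col) - k]
        means[j] = kept.mean() if kept.size else np.nan
    return means


values = np.arange(1, 20, dtype=float)        # 1..19
column = np.append(values, 1000.0)[:, None]   # 20 events x 1 position, one outlier
print(trimmed_position_means(column, confidence=0.9))  # mean of 2..19, i.e. 10.5
```

With 20 events and confidence 0.9, one event is dropped from each tail, so the 1000-valued outlier no longer drags the mean upward.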

Example Outputs

Skipped Exon

[image: skipped exon example map]

Alternative 3' Splice Sites

[image: alternative 3' splice site example map]

Alternative 5' Splice Sites

[image: alternative 5' splice site example map]

Retained Intron

[image: retained intron example map]

Intermediate files produced

The program tries to create as many intermediate files as possible, so that you can do further downstream analysis or plot your own maps.
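The intermediate matrices are comma-separated, so pandas is a quick way to inspect them. The sketch below assumes one row per event, with the event identifier in the first column and one column per plotted position. The actual file names and layout written by rbp-maps may differ, so an inline stand-in is used instead of a real file.

```python
import io

import pandas as pd

# Stand-in for an intermediate matrix file; replace io.StringIO(csv_text) with
# the path of a real CSV written by plot_map (name and layout are assumptions).
csv_text = (
    "event,0,1,2\n"
    "eventA,0.0,1.5,2.0\n"
    "eventB,0.5,0.5,0.0\n"
)
matrix = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(matrix.head())         # first few events
print(matrix.mean(axis=0))   # average signal at each position
print(matrix.loc["eventA"])  # a single event's profile
```

Because these files can get big, pd.read_csv(path, index_col=0, nrows=100) reads only the first 100 events when you just want to look at a few.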

Other Notes

  • The script automatically creates intermediate raw and normalized matrix files for every condition you provide. The files can get big, but they are comma-separated and can be loaded into pandas if you want to look at a few events.

  • For ENCODE, we set a cutoff of at least 100 events (the rMATS annotation file should have at least 100 lines); with fewer events, the signal looks messy.

  • Interactive nodes are preferred: for annotations with a very large number of events, TSCC will run out of memory. A few hundred thousand events seem to be fine, but a run with 700k events did not go well.
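A quick pre-flight check for the 100-event guideline above might look like the following sketch. It assumes the rMATS annotation is tab-separated with a single header line, so only data rows are counted as events. The helper name and warning text are hypothetical.

```python
import io
import warnings

import pandas as pd

MIN_EVENTS = 100  # ENCODE guideline quoted above


def check_event_count(path_or_buffer, min_events=MIN_EVENTS):
    """Return the number of events (data rows) and warn if below `min_events`."""
    n_events = len(pd.read_csv(path_or_buffer, sep="\t"))
    if n_events < min_events:
        warnings.warn(
            "only %d events (< %d); the map may look messy" % (n_events, min_events)
        )
    return n_events


toy = io.StringIO("chr\tstrand\n" + "chr1\t+\n" * 5)
print(check_event_count(toy))  # 5, plus a warning
```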

Publication

Contributors

byee4, j9dm

rbp-maps's Issues

error with --event a3ss

command:

/home/bay001/projects/codebase/rbp-maps/maps/plot_density.py --ip /projects/ps-yeolab3/clip_not_encode/ecwheele/20170614_SRSF2_eirini_all_clip/pipeline_output_v1/EW34_SRSF2_n212_diff_IP.merged.r2.bam --input /projects/ps-yeolab3/clip_not_encode/ecwheele/20170614_SRSF2_eirini_all_clip/pipeline_output_v1/EW_diff_input_master_R1.adapterTrim.round2.rmRep.rmDup.sorted.r2.bam --annotations /projects/ps-yeolab3/ecwheele/20170627_srsf2_all_rnaseq_eirini/rmats/wt_vs_mut_diff_included_SE.txt /projects/ps-yeolab3/ecwheele/20170627_srsf2_all_rnaseq_eirini/rmats/wt_vs_mut_diff_skipped_SE.txt /projects/ps-yeolab3/ecwheele/20170627_srsf2_all_rnaseq_eirini/rmats/wt_vs_mut_diff_unchanged_SE.txt --annotation_type rmats rmats rmats --output /projects/ps-yeolab3/ecwheele/20170627_srsf2_all_rnaseq_eirini/rmats/rbpmaps_encode/se_entropy/EW34_SRSF2_n212_diff_IP.svg --event a3ss --normalization_level 1 --chrom_sizes /projects/ps-yeolab/genomes/hg19/hg19.chrom.sizes

Error:

  File "/home/bay001/projects/codebase/rbp-maps/maps/plot_density.py", line 394, in <module>
    main()
  File "/home/bay001/projects/codebase/rbp-maps/maps/plot_density.py", line 390, in main
    is_scaled, confidence, annotation_dict, files_to_test, background_file
  File "/home/bay001/projects/codebase/rbp-maps/maps/plot_density.py", line 132, in run_make_density
    map_obj.create_matrices()
  File "/home/bay001/processing_scripts/codebase/rbp-maps/maps/density/Map.py", line 721, in create_matrices
    annotation_type=filetype
  File "/home/bay001/processing_scripts/codebase/rbp-maps/maps/density/matrix.py", line 446, in alt_3p_splice_site
    three_upstream = pd.DataFrame(three_upstream).T
  File "/home/ecwheele/anaconda2/envs/rbp-maps/lib/python2.7/site-packages/pandas/core/frame.py", line 266, in __init__
    mgr = self._init_dict(data, index, columns, dtype=dtype)
  File "/home/ecwheele/anaconda2/envs/rbp-maps/lib/python2.7/site-packages/pandas/core/frame.py", line 402, in _init_dict
    return _arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
  File "/home/ecwheele/anaconda2/envs/rbp-maps/lib/python2.7/site-packages/pandas/core/frame.py", line 5398, in _arrays_to_mgr
    index = extract_index(arrays)
  File "/home/ecwheele/anaconda2/envs/rbp-maps/lib/python2.7/site-packages/pandas/core/frame.py", line 5446, in extract_index
    raise ValueError('arrays must all be same length')
ValueError: arrays must all be same length

Asking for help with a 'TypeError' when running plot_map

Dear developers, I tried to use rbp-maps to plot splicing maps of specific RBPs, but I got a 'TypeError' and don't know how to resolve it.
Specifically, I installed rbp-maps with conda, following the instructions on GitHub. When I run the following command:

plot_map --inputbam rbp-control-1.bam \ 
--ip_pos_bw rbp-control-plus-1.bigWig \ 
--ip_neg_bw rbp-control-minus-1.bigWig \ 
--inputbam rbp-knockdown-1.bam \ 
--input_pos_bw rbp-knockdown-plus-1.bigWig \ 
--input_neg_bw rbp-knockdown-minus-1.bigWig \ 
--annotations rbp-rmats-output/se.included.upon.knockdown.txt rbp-rmats-output/se.excluded.upon.knockdown.txt \ 
--annotation_type rmats rmats \ 
--output rbp.svg

I got the following error,

Traceback (most recent call last):
  File "/.conda/envs/rbp-maps/bin/plot_map", line 11, in <module>
    load_entry_point('rbp-maps==0.1.4', 'console_scripts', 'plot_map')()
  File "build/bdist.linux-x86_64/egg/maps/plot_map.py", line 574, in main
  File "build/bdist.linux-x86_64/egg/maps/plot_map.py", line 292, in check_for_index
  File "/.conda/envs/rbp-maps/lib/python2.7/genericpath.py", line 26, in exists
    os.stat(path)
TypeError: coercing to Unicode: need string or buffer, NoneType found

It seems that I missed some necessary file paths. I guessed the error might be caused by missing native_cassette_exons_all, natively_included_cassette_exons, natively_excluded_cassette_exons, native_cassette_exons_all.
But I got the same error when I added these files to my command:

plot_map --inputbam rbp-control-1.bam \ 
--ip_pos_bw rbp-control-plus-1.bigWig \ 
--ip_neg_bw rbp-control-minus-1.bigWig \ 
--inputbam rbp-knockdown-1.bam \ 
--input_pos_bw rbp-knockdown-plus-1.bigWig \ 
--input_neg_bw rbp-knockdown-minus-1.bigWig \ 
--annotations rbp-rmats-output/se.included.upon.knockdown.txt rbp-rmats-output/se.excluded.upon.knockdown.txt \ 
k562.constitutive.exons.txt \ 
k562.natively.included.cassette.exons.txt \ 
k562.natively.excluded.cassette.exons.txt \ 
k562.natively.cassette.exons.all.txt \ 
--annotation_type rmats rmats tab tab tab tab\ 
--output rbp.svg

So this error is not caused by missing files, and I have no idea how to resolve it. In addition, can the data used to plot the splicing map be exported, so that I can plot it with ggplot2? Thanks for any help you can provide.

heatmap is empty with se entropy normalization

Command:

/home/bay001/projects/codebase/rbp-maps/maps/plot_density.py --ip /projects/ps-yeolab3/clip_not_encode/ecwheele/20170614_SRSF2_eirini_all_clip/pipeline_output_v1/EW34_SRSF2_n212_diff_IP.merged.r2.bam --input /projects/ps-yeolab3/clip_not_encode/ecwheele/20170614_SRSF2_eirini_all_clip/pipeline_output_v1/EW_diff_input_master_R1.adapterTrim.round2.rmRep.rmDup.sorted.r2.bam --annotations /projects/ps-yeolab3/ecwheele/20170627_srsf2_all_rnaseq_eirini/rmats/wt_vs_mut_diff_included_SE.txt /projects/ps-yeolab3/ecwheele/20170627_srsf2_all_rnaseq_eirini/rmats/wt_vs_mut_diff_skipped_SE.txt /projects/ps-yeolab3/ecwheele/20170627_srsf2_all_rnaseq_eirini/rmats/wt_vs_mut_diff_unchanged_SE.txt --annotation_type rmats rmats rmats --output /projects/ps-yeolab3/ecwheele/20170627_srsf2_all_rnaseq_eirini/rmats/rbpmaps_encode/se_entropy/EW34_SRSF2_n212_diff_IP.svg --event se --normalization_level 2 --chrom_sizes /projects/ps-yeolab/genomes/hg19/hg19.chrom.sizes

output looks like this:

[screenshot, 2017-08-14, 12:27:46 pm]

plot_map fails if bam files are not indexed

  File "/home/ecwheele/anaconda2/bin/plot_map", line 11, in <module>
    load_entry_point('rbp-maps==0.1.4', 'console_scripts', 'plot_map')()
  File "build/bdist.linux-x86_64/egg/maps/plot_map.py", line 573, in main
  File "build/bdist.linux-x86_64/egg/maps/plot_map.py", line 301, in check_for_index
NameError: global name 'call' is not defined

density won't import in jupyter notebook

After running the installs (python setup.py build and python setup.py install), the density module won't import in a Jupyter notebook. It also does NOT import in the IPython interpreter, but it does import in the plain python interpreter. Running "which plot_density" returns ~/anaconda2/envs/rbp-maps/bin/plot_density.

Adding unchanging exon events to the SE maps

Is there a way to add a third list of exons to the SE maps? Ones that are not changed upon knockdown?

I am using this code:

from density import Map, ReadDensity

annotations = {
    included_in_wt:included_in_wt_annotation_type, 
    included_in_mut:included_in_mut_annotation_type
}

ip_rd = ReadDensity.ReadDensity(pos=pos_ip_bw,neg=neg_ip_bw,bam=ip_bam)
in_rd = ReadDensity.ReadDensity(pos=pos_in_bw,neg=neg_in_bw,bam=inp_bam)

rbpmap = Map.SkippedExon(
    ip=ip_rd, 
    inp=in_rd, 
    output_filename=output_file,
    norm_function=norm_func,
    annotation=annotations,
    intron_offset=300,
    exon_offset=50,
)

rbpmap.create_matrices()
print('.'),
rbpmap.normalize_matrix()
print('.'),
rbpmap.set_means_and_sems()
print('.'),
rbpmap.write_intermediate_raw_matrices_to_csv()
print('.'),
rbpmap.plot()
print('DONE'),

"AttributeError: Peak instance has no attribute 'peaks' " when plot peak overlaps instead of read density

Hi, I'm trying to make metagene plot described in this example: https://github.com/YeoLab/rbp-maps/blob/c529fb694fadc130537d84c295a66723edc14ab3/examples/metagene/run_metagene_551_01.sh.

Here is my script:

CDS=/data/DDX6_eClip/11_metaGene/gencode.v37.basic.annotation_201.eCLIP_upProtein.CDS.bed
UTR5=/data/DDX6_eClip/11_metaGene/gencode.v37.basic.annotation_201.eCLIP.fiveUTRs.bed
UTR3=/data/DDX6_eClip/11_metaGene/gencode.v37.basic.annotation_201.eCLIP.threeUTRs.bed

peak=/data/DDX6_eClip/combineRep1Rep2.compressed.bed.sorted.bb

plot_map \
--peak ${peak} \
--event metagene \
--normalization_level 0 \
--annotations ${CDS} ${UTR3} ${UTR5} \
--annotation_type cds 3utr 5utr \
--output CLIP_gene.metagene.svg

The peak file is a sorted bigBed. This is the error message:

[bwHdrRead] There was an error while reading in the header!
[pyBwOpen] bw is NULL!
0%| | 0/41 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/work/bio-wangmr/.conda/envs/rbp-map/bin/plot_map", line 11, in <module>
    load_entry_point('rbp-maps==0.1.4', 'console_scripts', 'plot_map')()
  File "build/bdist.linux-x86_64/egg/maps/plot_map.py", line 548, in main
  File "build/bdist.linux-x86_64/egg/maps/plot_map.py", line 104, in run_make_peak
  File "build/bdist.linux-x86_64/egg/density/Map.py", line 1076, in create_matrices
  File "build/bdist.linux-x86_64/egg/density/matrix.py", line 184, in meta
  File "build/bdist.linux-x86_64/egg/density/intervals.py", line 774, in generic_site
  File "build/bdist.linux-x86_64/egg/density/Peak.py", line 100, in values
AttributeError: Peak instance has no attribute 'peaks'

Thanks!

CondaValueError: invalid package specification: name: rbp-maps

I ran the following commands, as suggested for installation:
git clone https://github.com/yeolab/rbp-maps
cd rbp-maps; conda env create -f conda_env.txt -n rbp-maps
But received the following error message:
CondaValueError: invalid package specification: name: rbp-maps

I believe the line in conda_env.txt that causes the error is the one attempting a pip installation of rbp-maps==0.0.6.

UnboundLocalError: local variable 'interval' referenced before assignment

I was able to install rbp-maps with Docker using the command docker pull brianyee/rbp-maps:a5c9bd9 (but not with docker pull brianyee/rbp-maps). However, I was not able to successfully run plot_map using either --ipbam for plotting density or --peak for plotting peaks.

For HPC, I need to use singularity, and this was one of my calls:
singularity run rbpmaps.sif plot_map --peak <path_to_bigbed> --annotations <path_to_SE.MATS.JC.nonoverlap.txt> <path_to_HepG2_native_cassette_exons_all> --annotation_type rmats rmats --output <path_to_png> --event se --normalization_level 0 --testnums 0 --bgnum 1 --sigtest fisher

I received the following message:
Traceback (most recent call last):
  File "/opt/conda/bin/plot_map", line 11, in <module>
    load_entry_point('rbp-maps==0.1.4', 'console_scripts', 'plot_map')()
  File "/opt/conda/lib/python2.7/site-packages/rbp_maps-0.1.4-py2.7.egg/maps/plot_map.py", line 548, in main
  File "/opt/conda/lib/python2.7/site-packages/rbp_maps-0.1.4-py2.7.egg/maps/plot_map.py", line 104, in run_make_peak
  File "/opt/conda/lib/python2.7/site-packages/rbp_maps-0.1.4-py2.7.egg/density/Map.py", line 701, in create_matrices
  File "/opt/conda/lib/python2.7/site-packages/rbp_maps-0.1.4-py2.7.egg/density/matrix.py", line 562, in skipped_exon
UnboundLocalError: local variable 'interval' referenced before assignment

Coordinate files

Dear authors, thank you for this valuable resource!

Are the coordinates in files
examples/data/positive.a3ss.txt
examples/data/positive.a5ss.txt
examples/data/positive.mxe.txt
examples/data/positive.ri.txt
examples/data/positive.se.txt

mapped to the hg38 genome? It would make sense to mention this somewhere for future reference.

Kind regards,
Martin
