deq's Introduction

DEQ

DEteQtion of changes in m6A. An R package to conveniently run DESeq2, edgeR, and QNB for the detection of differential methylation in MeRIP/m6A-seq data.

Installation

Install in R using devtools::install_github("al-mcintyre/DEQ")

You may need to load S4Vectors separately using library(S4Vectors)

For now, the dependencies need to be installed separately:

BiocManager::install("DESeq2")
BiocManager::install("edgeR")
BiocManager::install("GenomicFeatures")
BiocManager::install("Rsamtools")

QNB is no longer available from the main CRAN repository and requires exomePeak. It can be installed from the archived source:

BiocManager::install("exomePeak")
install.packages("https://cran.r-project.org/src/contrib/Archive/QNB/QNB_1.1.11.tar.gz", repos = NULL, type="source")

This package has been tested on macOS (v10.13.6) and Linux (Red Hat Enterprise Linux 6.3), using R v3.6.0 and v3.6.1. Typical install time is under 1 minute.
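
Before running DEQ, it can help to confirm that the packages listed above are actually installed. The following is a minimal sketch, not part of DEQ itself; the variable names are illustrative, and the package list is simply copied from the installation notes.

```r
# Packages named in the installation notes above
deps <- c("DESeq2", "edgeR", "GenomicFeatures", "Rsamtools",
          "exomePeak", "QNB", "S4Vectors")

# requireNamespace() returns FALSE (rather than raising an error)
# when a package cannot be loaded
installed <- vapply(deps, requireNamespace, logical(1), quietly = TRUE)

missing.deps <- names(installed)[!installed]
if (length(missing.deps) > 0) {
  message("Missing packages: ", paste(missing.deps, collapse = ", "))
} else {
  message("All listed dependencies are installed.")
}
```

Any package reported as missing can then be installed with the corresponding command above.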

Running DEQ

Run using

deq(input.bams, ip.bams, treated.input.bams, treated.ip.bams, peak.files,
  gtf, paired.end = FALSE, outfi = "deq_results.txt", tool = "deq",
  compare.gene = TRUE, readlen = 100, fraglen = 100, nthreads = 1)

(see ?deq for details)

Output:

  1. Counts file: number of reads per peak for each input bam file (labelled as control or treatment replicates)
  2. Results file: chromosomal locations and annotations for peaks, along with significance predictions for any tools run (with and without p-value adjustment). It also reports log2 fold changes calculated using DESeq2 for IP reads within peaks (peak.l2fc) and for input reads of the associated genes (gene.l2fc), the DESeq2 significance of gene expression changes between conditions, and the difference between peak.l2fc and gene.l2fc (diff.l2fc).
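
To illustrate how the fold-change columns relate, here is a small sketch on made-up numbers. The values are toy data, not DEQ output, and the direction of the subtraction (peak minus gene) is an assumption based on the column description above.

```r
# Toy values for illustration only; not output from DEQ
res <- data.frame(
  peak      = c("peakA", "peakB", "peakC"),
  peak.l2fc = c(1.5, -0.4, 0.9),   # change in IP reads within the peak
  gene.l2fc = c(0.5, -0.5, 1.0)    # change in input reads for the gene
)

# Assumed direction: peak change minus gene expression change
res$diff.l2fc <- res$peak.l2fc - res$gene.l2fc

print(res)
```

Under that assumption, a diff.l2fc near zero suggests that the change in IP signal is mostly explained by a change in expression of the gene.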

Demo data is provided. After downloading, run

input.bams <- c('untreated_total_input_1.chr21.star.sorted.bam','untreated_total_input_2.chr21.star.sorted.bam')
ip.bams <- c('untreated_total_IP_1.chr21.star.sorted.bam','untreated_total_IP_2.chr21.star.sorted.bam')
treated.input.bams <- c('heatshock_total_input_1.chr21.star.sorted.bam','heatshock_total_input_2.chr21.star.sorted.bam')
treated.ip.bams <- c('heatshock_total_IP_1.chr21.star.sorted.bam','heatshock_total_IP_2.chr21.star.sorted.bam')
peak.files <- 'peaks.bed'
gtf <- 'path/to/hg38.gtf'  # replace with the location of your hg38 gtf file
deq(input.bams, ip.bams, treated.input.bams, treated.ip.bams, peak.files,
  gtf, paired.end = FALSE, outfi = "deq_results.txt", tool = "deq",
  compare.gene = TRUE, readlen = 50, fraglen = 100, nthreads = 1)

Run this from the folder containing the demo files (or adjust the file paths). You must provide a gtf file for hg38. Run time should be ~5 minutes for this demo.
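
Since the demo depends on being in the right folder, a quick check that every input file exists can save a failed run. This is a minimal sketch; find_missing_files is a hypothetical helper written for this example, not a DEQ function.

```r
# Hypothetical helper: return the subset of paths that do not exist
find_missing_files <- function(paths) {
  paths[!file.exists(paths)]
}

# File names taken from the demo above
demo.files <- c(
  'untreated_total_input_1.chr21.star.sorted.bam',
  'untreated_total_input_2.chr21.star.sorted.bam',
  'untreated_total_IP_1.chr21.star.sorted.bam',
  'untreated_total_IP_2.chr21.star.sorted.bam',
  'heatshock_total_input_1.chr21.star.sorted.bam',
  'heatshock_total_input_2.chr21.star.sorted.bam',
  'heatshock_total_IP_1.chr21.star.sorted.bam',
  'heatshock_total_IP_2.chr21.star.sorted.bam',
  'peaks.bed'
)

missing.files <- find_missing_files(demo.files)
if (length(missing.files) > 0) {
  message("Not found in ", getwd(), ": ",
          paste(missing.files, collapse = ", "))
}
```

The path to the gtf file can be added to demo.files once it is known.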

Further scripts to generate figures for our bioRxiv paper are available here.

deq's Issues

Error: logical subscript contains NAs (looks like could not annotate peak locations)

The program finished with an error:

Error: logical subscript contains NAs
In addition: Warning messages: (omitted)

After some additional investigation, this might be related to the genome/gtf file. The dataset is from mouse and I am using the gtf file from Ensembl (mm10.ensGene.gtf). The initial variables of the peaks object look like this:

$peaks
GRanges object with 44562 ranges and 11 metadata columns:
seqnames ranges strand | name exon
<Rle> <IRanges> <Rle> | <character> <character>
peak1 1 3215477-3215830 + | peak1 <NA>
peak2 1 3221883-3222122 + | peak2 <NA>
peak3 1 3242487-3242763 + | peak3 <NA>
peak4 1 3243106-3243396 + | peak4 <NA>
peak5 1 3250698-3250935 + | peak5 <NA>
... ... ... ... . ... ...
peak44558 Y 90742797-90743482 + | peak44558 <NA>
peak44559 Y 90743894-90744093 + | peak44559 <NA>
peak44560 Y 90793317-90793756 + | peak44560 <NA>
peak44561 Y 90811178-90812825 + | peak44561 <NA>
peak44562 Y 90816202-90816427 + | peak44562 <NA>
intron utr5 utr5* utr3 utr3*
<character> <character> <character> <character> <character>
peak1 <NA> <NA> <NA> <NA> <NA>
peak2 <NA> <NA> <NA> <NA> <NA>
peak3 <NA> <NA> <NA> <NA> <NA>
peak4 <NA> <NA> <NA> <NA> <NA>
peak5 <NA> <NA> <NA> <NA> <NA>
... ... ... ... ... ...
peak44558 <NA> <NA> <NA> <NA> <NA>
peak44559 <NA> <NA> <NA> <NA> <NA>
peak44560 <NA> <NA> <NA> <NA> <NA>
peak44561 <NA> <NA> <NA> <NA> <NA>
peak44562 <NA> <NA> <NA> <NA> <NA>
intergenic annot main.gene main.gene.id
<logical> <character> <logical> <logical>
peak1 TRUE intergenic <NA> <NA>
peak2 TRUE intergenic <NA> <NA>
peak3 TRUE intergenic <NA> <NA>
peak4 TRUE intergenic <NA> <NA>
peak5 TRUE intergenic <NA> <NA>
... ... ... ... ...
peak44558 TRUE intergenic <NA> <NA>
peak44559 TRUE intergenic <NA> <NA>
peak44560 TRUE intergenic <NA> <NA>
peak44561 TRUE intergenic <NA> <NA>
peak44562 TRUE intergenic <NA> <NA>


seqinfo: 22 sequences from an unspecified genome; no seqlengths

$overlaps
[1] feature name gene_name
<0 rows> (or 0-length row.names)

So it looks like it could not match the location of a peak with a particular gene.
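
One possible cause worth ruling out (an assumption, not something confirmed in this thread) is a mismatch in chromosome naming: the peaks above use names such as "1" and "Y", and no overlaps would be found if the gtf file uses "chr1" and "chrY". A minimal base R sketch of that check, with hypothetical gtf names:

```r
# Naming style seen in the peaks object above
peak.seqnames <- c("1", "2", "X", "Y")
# Hypothetical gtf naming, used here only to illustrate a mismatch
gtf.seqnames <- c("chr1", "chr2", "chrX", "chrY")

# No names in common means no peak can overlap any annotated feature
shared <- intersect(peak.seqnames, gtf.seqnames)
if (length(shared) == 0) {
  message("Peaks and gtf share no chromosome names.")
}

# Stripping the "chr" prefix brings the two naming styles in line
gtf.stripped <- sub("^chr", "", gtf.seqnames)
shared.after <- intersect(peak.seqnames, gtf.stripped)
print(shared.after)
```

With real data, the names to compare would come from the first column of the peak file and of the gtf file.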

For completeness, the top of each of the following variables looks like this (I am not sure why pval and adj.pval are NA for some peaks/genes):

peak.counts:

P10_ctr.bam P18_ctr.bam P48_ctr.bam P10_ctr.merip.bam P18_ctr.merip.bam P48_ctr.merip.bam P10_noMettl14.bam P18_noMettl14.bam P48_noMettl14.bam P10_noMettl14.merip.bam P18_noMettl14.merip.bam P48_noMettl14.merip.bam
peak1 29 14 16 59 42 35 19 18 17 11 17 0
peak2 4 8 10 9 5 12 3 9 1 2 3 0
peak3 0 2 1 7 25 9 1 0 0 4 1 48
peak4 0 1 1 7 13 3 0 1 0 15 5 1
peak5 4 2 2 7 8 2 4 2 3 11 1 0
peak6 6 19 12 6 6 1 4 6 6 2 5 173

gene.counts:

P10_ctr.bam P18_ctr.bam P48_ctr.bam P10_noMettl14.bam P18_noMettl14.bam P48_noMettl14.bam
ENSMUSG00000102693 0 0 0 0 0 0
ENSMUSG00000064842 0 0 0 0 0 0
ENSMUSG00000051951 513 375 377 430 261 285
ENSMUSG00000102851 9 11 10 7 5 3
ENSMUSG00000103377 6 14 5 5 12 4
ENSMUSG00000104017 4 2 7 3 5 4
ENSMUSG00000103025 4 11 10 10 6 7
ENSMUSG00000089699 4 3 5 1 2 6
ENSMUSG00000103201 1 22 25 6 9 4

peak.de:

peak.l2fc peak.p peak.padj
peak1 -1.67341657435476 0.148387931542962 0.996843961582788
peak2 -1.83535374311587 0.217938419923026 NA
peak3 1.06905802400045 0.436821611779313 0.996843961582788
peak4 0.434711971036041 0.735856828666445 NA
peak5 0.0410108544099603 0.98025271421217 NA

gene.de:

gene.l2fc gene.p gene.padj
ENSMUSG00000102693 NA NA NA
ENSMUSG00000064842 NA NA NA
ENSMUSG00000051951 -0.395917486181937 0.118312520014552 0.615789730185453
ENSMUSG00000102851 -1.0261528515366 0.162465513218275 0.698211601649789
ENSMUSG00000103377 -0.27116537774478 0.76605952647633 0.995022010614611
ENSMUSG00000104017 -0.098912248225897 0.918869708541386 NA

Thanks.

Eduardo

PS. I hope the formatting looks OK. It was not easy.

problem in running DEQ with GTF

Dear developer,
I tried to run DEQ as described but I got this error:

Error in `rownames<-`(`*tmp*`, value = c("DDX11L1", "WASH7P", "MIR6859-1",  : 
  missing values not allowed in rownames
Calls: deq ... elementMetadata -> .local -> rownames<- -> rownames<-
In addition: Warning message:
In .get_cds_IDX(mcols0$type, mcols0$phase) :
  The "phase" metadata column contains non-NA values for features of type
  stop_codon. This information was ignored.
Execution halted

Can you help me solve this, please?

problem about replicates

Hey! Can I use your method to detect differential RNA methylation sites without replicates? I ask because I know that differential expression analysis with DESeq2 does need replicates.
