vivianstats / maaper

Model-based analysis of APA using 3' end-linked reads

Home Page: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-021-02429-5

R 100.00%

rna-seq alternative-polyadenylation bioinformatics-tool

maaper's Issues

Comparing celltypes from multiple samples.

Hi Vivian,
I'm trying to compare two cell types, but the reads come from multiple samples. Do you suggest extracting the cells from each sample into a single BAM file or into multiple BAM files?
To clarify, I have ~1000 cells from 4 samples. I can extract cell types A and B from these 4 BAM files and end up with 4 cell type A BAM files and 4 cell type B BAM files, then pass these to bam_c1 and bam_c2.
Alternatively, I can merge them and end up with one BAM file per cell type. Which one is preferred for the MAAPER analysis, or does it not matter?
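
The per-sample layout described above can be sketched as two character vectors, one per cell type. A code comment in another issue on this page notes that bam_c1 and bam_c2 "could be a vector if there are multiple samples". The sample IDs, directory, and file-name pattern below are hypothetical placeholders, not anything MAAPER requires.

```r
# Hypothetical sample IDs and directory layout; adjust to your own files.
samples <- c("S1", "S2", "S3", "S4")
bam_dir <- "bams"

# One BAM per sample for each cell type (4 + 4 files in total).
bam_c1 <- file.path(bam_dir, sprintf("%s_cellA.bam", samples))
bam_c2 <- file.path(bam_dir, sprintf("%s_cellB.bam", samples))

# Report any expected file that is missing before starting a long run.
all_bams <- c(bam_c1, bam_c2)
absent <- all_bams[!file.exists(all_bams)]
if (length(absent) > 0) {
  message("Missing BAM files: ", paste(absent, collapse = ", "))
}
```

The two vectors can then be passed as bam_c1 and bam_c2. Merging everything into one pseudo-bulk BAM per cell type would instead give two length-one vectors.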

Effects of parameters

Thank you for providing such a nice tool. I have noticed that the tool requires several parameters to be set. Could you please provide detailed instructions for these parameters?
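
Until detailed documentation is available, one way to at least see which parameters exist is to list the arguments of maaper() from an installed copy of the package. A traceback elsewhere on this page shows internal names such as dist_thre, num_thre, num_pas_thre, and frac_pas_thre, but their meaning is not explained here. The snippet below is a minimal sketch that only prints argument names; it returns an empty vector if MAAPER is not installed.

```r
# List the arguments of maaper() if the package is installed; otherwise return
# an empty vector so the snippet still runs.
maaper_args <- if (requireNamespace("MAAPER", quietly = TRUE)) {
  names(formals(MAAPER::maaper))
} else {
  character(0)
}
print(maaper_args)
```

The package help page (?maaper, after library(MAAPER)) is the natural next place to look for what each argument does.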

How do you determine fragment length

How do you determine fragment length? I have checked my samples: the BAM rows do not exceed 100 characters. I get this error even though I stretch the fragment length to 100000.

Prepare reference genome ...
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
Sequence levels in GTF:
 [1] "chr1"  "chr2"  "chr3"  "chr4"  "chr5"  "chr6"  "chr7"  "chr8"  "chr9"  "chr10" "chr11" "chr12" "chr13" "chr14" "chr15" "chr16" "chr17"
[18] "chr18" "chr19" "chr20" "chr21" "chr22" "chrX"  "chrY"  "chrM" 
Prepare PAS annotation ...
19355 genes!
Start training for condition c1 - sample 1 ...
3032 genes used for training ...
1 fragments longer than 1000 ...
Error in density.default(dist_all, from = 0, to = dist_thre, n = dist_thre +  : 
  argument 'x' must be numeric
In addition: Warning messages:
1: In dir.create(output_dir, recursive = T) : '_run_one' already exists
2: In .get_cds_IDX(mcols0$type, mcols0$phase) :
  The "phase" metadata column contains non-NA values for features of type stop_codon. This information was ignored.
3: In mclapply(1:length(pas_by_gene_single), function(i) { :
  all scheduled cores encountered errors in user code
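
One plausible reading of this log, not confirmed by the maintainer on this page, is that the density error is a symptom rather than the cause. The mclapply warning says every worker failed, and failed workers return error objects (character strings) instead of numbers. The base R sketch below imitates that situation with made-up inputs and shows that density() then fails with this kind of message; it does not use MAAPER itself.

```r
# Stand-in for the per-gene parallel step: every worker fails, so each result
# is a "try-error" object (a character string) instead of numeric distances.
results <- lapply(1:3, function(i) try(stop("worker failed"), silent = TRUE))

dist_all <- unlist(results)
numeric_input <- is.numeric(dist_all)

# density() rejects non-numeric input; capture the message instead of stopping.
msg <- tryCatch({
  density(dist_all, from = 0, to = 600, n = 601)
  "no error"
}, error = function(e) conditionMessage(e))

print(numeric_input)
print(msg)
```

If this reading is right, changing the fragment length threshold would not help, and the useful information is the error raised inside the workers.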

1 fragments longer than 600 ... Error in density.default(dist_all, from = 0, to = dist_thre, n = dist_thre + : argument 'x' must be numeric

Hi,

I am getting an error when I try to run the package with our data. I have seen the same error in another issue but couldn't find a resolution. I have checked the read length.

Thank you so much,

Onur.

Error:

Prepare reference genome ...
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
Sequence levels in GTF:
 [1] "chr1"  "chr2"  "chr3"  "chr4"  "chr5"  "chr6"  "chr7"  "chr8"  "chr9" 
[10] "chr10" "chr11" "chr12" "chr13" "chr14" "chr15" "chr16" "chr17" "chr18"
[19] "chr19" "chr20" "chr21" "chr22" "chrX"  "chrY"  "chrM" 
Prepare PAS annotation ...
19130 genes!
Start training for condition c1 - sample 1 ...
2985 genes used for training ...
1 fragments longer than 600 ...
Error in density.default(dist_all, from = 0, to = dist_thre, n = dist_thre +  : 
  argument 'x' must be numeric
In addition: Warning messages:
1: In dir.create(output_dir, recursive = T) : '.' already exists
2: In .get_cds_IDX(mcols0$type, mcols0$phase) :
  The "phase" metadata column contains non-NA values for features of type
  stop_codon. This information was ignored.
3: In mclapply(1:length(pas_by_gene_single), function(i) { :
  all scheduled cores encountered errors in user code

Code:


library(MAAPER)

pas_annotation = readRDS("./human.PAS.hg38.rds")
gtf = "gencode.v38.annotation.gtf" # downloaded from GENCODE
# BAM file of condition 1 (could be a vector if there are multiple samples)
bam_c1 = "E_C1F.fastq.gzAligned.sortedByCoord.out.bam"
# BAM file of condition 2 (could be a vector if there are multiple samples)
bam_c2 = "E_C1M.fastq.gzAligned.sortedByCoord.out.bam"

maaper(gtf, # full path of the GTF file
       pas_annotation, # PAS annotation
       output_dir = "./", # output directory
       bam_c1, bam_c2,
       read_len = 85,
       ncores = 12 # number of cores used for parallel computation
)
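
Because the script above uses relative paths, a small pre-flight check can rule out missing inputs before a long run. The helper check_inputs() below is hypothetical and not part of MAAPER; the demo uses temporary empty files as stand-ins for real inputs.

```r
# Hypothetical helper (not part of MAAPER): stop early if any input is missing.
check_inputs <- function(gtf, bams) {
  paths <- c(gtf, bams)
  absent <- paths[!file.exists(paths)]
  if (length(absent) > 0) {
    stop("Missing input files: ", paste(absent, collapse = ", "))
  }
  invisible(TRUE)
}

# Demo with temporary files standing in for real inputs.
demo_gtf <- tempfile(fileext = ".gtf")
demo_bam <- tempfile(fileext = ".bam")
file.create(demo_gtf, demo_bam)

ok <- check_inputs(demo_gtf, demo_bam)
bad <- tryCatch(check_inputs(demo_gtf, "no_such_file.bam"),
                error = function(e) conditionMessage(e))
print(ok)
print(bad)

# With real inputs, a single-core rerun may expose the underlying worker error
# (assuming ncores is what controls the parallel workers):
# maaper(gtf, pas_annotation, output_dir = "./", bam_c1, bam_c2,
#        read_len = 85, ncores = 1)
```

The commented call reuses only the arguments shown in the script above; whether ncores = 1 avoids the parallel code path is an assumption.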

running maaper for more than one sample

Hi Vivian,
Thanks for the great tool. When I run MAAPER with one sample per condition it works fine, but when I input multiple BAM files as a vector for each condition I get the error below. Would you please help me with this?
In the last issue, you mentioned that it is preferable to input multiple BAM files (one per sample) per condition to MAAPER rather than one pseudo-bulk file.

Error in density.default(dist_all, from = 0, to = dist_thre, n = dist_thre + : argument 'x' must be numeric
Traceback:
1. maaper(reference.gtf, apa_atlas, bam_c1 = ast_fb_ALSFTLD_bams,
. bam_c2 = ast_fb_control_bams, output_dir = ast_map_out, read_len = 76,
. ncores = 16)
2. wrap(pas_by_gene_single, pas_by_gene = pas_annotation, exons_gr,
. bam_c1, bam_c2, density_train_path, dist_thre, num_thre,
. read_len, num_pas_thre, frac_pas_thre, ncores, save_path,
. run, subset, region, verbose, output_dir, paired = paired)
3. lapply(conds, function(con) {
. if (con == "c1") {
. bam_paths = bam_c1
. }
. if (con == "c2") {
. bam_paths = bam_c2
. }
. ss = length(bam_paths)
. density_con = lapply(1:ss, function(k) {
. message(paste("Start training for condition", con, "-",
. "sample", k, "..."))
. bam_path = bam_paths[k]
. pdist = get_pdist_singlePAS(bam_path, exons_gr, pas_by_gene_single,
. num_thre, dist_thre, ncores)
. gc()
. return(pdist)
. })
. })
4. FUN(X[[i]], ...)
5. lapply(1:ss, function(k) {
. message(paste("Start training for condition", con, "-", "sample",
. k, "..."))
. bam_path = bam_paths[k]
. pdist = get_pdist_singlePAS(bam_path, exons_gr, pas_by_gene_single,
. num_thre, dist_thre, ncores)
. gc()
. return(pdist)
. })
6. FUN(X[[i]], ...)
7. get_pdist_singlePAS(bam_path, exons_gr, pas_by_gene_single, num_thre,
. dist_thre, ncores)
8. density(dist_all, from = 0, to = dist_thre, n = dist_thre + 1)
9. density.default(dist_all, from = 0, to = dist_thre, n = dist_thre +
. 1)
10. stop("argument 'x' must be numeric")
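
The traceback shows that training runs once per condition and once per sample within each condition, so a single problematic BAM would be enough to stop the whole run. A minimal sketch for isolating such a file is to try each BAM on its own and record the outcome. The helper try_each_bam() and the stub stub_run() are hypothetical; in practice the stub would be replaced by a function that calls maaper() with one BAM from the vector.

```r
# Hypothetical helper: apply `run_fn` to each BAM path on its own and record
# whether it succeeded, so a single problematic sample can be identified.
try_each_bam <- function(bam_paths, run_fn) {
  vapply(bam_paths, function(p) {
    tryCatch({
      run_fn(p)
      "ok"
    }, error = function(e) conditionMessage(e))
  }, character(1))
}

# Stub standing in for a single-sample run; it fails on one file name.
stub_run <- function(p) {
  if (grepl("bad", p)) stop("argument 'x' must be numeric")
  TRUE
}

status <- try_each_bam(c("s1.bam", "bad.bam", "s3.bam"), stub_run)
print(status)
```

The result is a named character vector, one entry per BAM, holding either "ok" or the error message for that file.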

Individual Condition Tracks

Hello!

Would it be possible for MAAPER to output a bedgraph file for all the genes with multiple PAS in each of the conditions separately? I would be interested in visualising the coverage track for each of my samples. Thank you!
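
Whether MAAPER can export such tracks is not stated here. As a possible workaround, a per-base coverage vector for a region can be converted to bedGraph rows (chromosome, 0-based start, end, value) in base R. The function coverage_to_bedgraph() and the depth values below are hypothetical illustrations; real depths would have to be computed from each BAM separately.

```r
# Hypothetical helper: collapse a per-base depth vector into bedGraph rows.
# `start0` is the 0-based coordinate of the first element of `depth`.
coverage_to_bedgraph <- function(chrom, start0, depth) {
  runs <- rle(depth)
  ends <- start0 + cumsum(runs$lengths)
  starts <- ends - runs$lengths
  bg <- data.frame(chrom = chrom, start = starts, end = ends,
                   value = runs$values)
  bg[bg$value > 0, , drop = FALSE]  # drop zero-coverage runs
}

# Made-up depths for 9 consecutive bases starting at chr19:1000 (0-based).
depth <- c(0, 0, 3, 3, 5, 5, 5, 0, 2)
bg <- coverage_to_bedgraph("chr19", 1000, depth)
print(bg)

# Write a tab-separated bedGraph body (no header line).
out_file <- tempfile(fileext = ".bedgraph")
write.table(bg, out_file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
```

Running this once per sample or per condition would give one track file each, which a genome browser can display side by side.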

How to study other species?

Hi!
How do I produce a PAS annotation file? My research species is a fish. If I have a PAS annotation file for fish, can I use this software?
Thank you!

RLDi and REDi different effect direction

Hi Vivian,

I am working on a monocytes dataset in a clinical framework and I have a technical question:

When RLDi and REDi point in opposite directions, how would you interpret it?

This seems to happen when a gene has a single PAS in the 3'-most exon and multiple intronic/internal PASs.

Would you consider this situation ambiguous?

Thank you very much in advance!

Error in density.default(dist_all, from = 0, to = dist_thre, n = dist_thre + : argument 'x' must be numeric

Hi Vivian,
I got an error while running the package on the provided example data; it is the same problem as Onur's.

Thank you,
Huoo

Code:

library(MAAPER) 
pas_anno <- readRDS('./mouse.PAS.mm9.rds')
gtf <- "./gencode.mm9.chr19.gtf"
bam_c1 <- "./NT_chr19_example.bam"
bam_c2 <- "./AS_4h_chr19_example.bam"
maaper(gtf, pas_anno, output_dir = "./",
       bam_c1, bam_c2,
       read_len = 76, ncores = 12)

Error:

Start training for condition c1 - sample 1 ...
86 genes used for training ...
1 fragments longer than 600 ...
Error in density.default(dist_all, from = 0, to = dist_thre, n = dist_thre +  : 
  argument 'x' must be numeric
Calls: maaper ... FUN -> get_pdist_singlePAS -> density -> density.default
In addition: Warning messages:
1: In dir.create(output_dir, recursive = T) : '.' already exists
2: In .get_cds_IDX(mcols0$type, mcols0$phase) :
  The "phase" metadata column contains non-NA values for features of type
  stop_codon. This information was ignored.
3: In mclapply(1:length(pas_by_gene_single), function(i) { :
  all scheduled cores encountered errors in user code
Execution halted
