vivianstats / maaper

Model-based analysis of APA using 3' end-linked reads

Home Page: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-021-02429-5

R 100.00%

rna-seq alternative-polyadenylation bioinformatics-tool

maaper's Issues

running maaper for more than one sample

Hi Vivian,
Thanks for the great tool. When I run MAAPER with one sample per condition, it works fine, but when I input multiple BAM files as a vector for each condition, I get the error below. Could you please help me with this?
In the last issue, you mentioned that it is preferable to input multiple BAM files (one per sample) per condition to MAAPER rather than one pseudo-bulk file.

Error in density.default(dist_all, from = 0, to = dist_thre, n = dist_thre + : argument 'x' must be numeric
Traceback:
1. maaper(reference.gtf, apa_atlas, bam_c1 = ast_fb_ALSFTLD_bams,
. bam_c2 = ast_fb_control_bams, output_dir = ast_map_out, read_len = 76,
. ncores = 16)
2. wrap(pas_by_gene_single, pas_by_gene = pas_annotation, exons_gr,
. bam_c1, bam_c2, density_train_path, dist_thre, num_thre,
. read_len, num_pas_thre, frac_pas_thre, ncores, save_path,
. run, subset, region, verbose, output_dir, paired = paired)
3. lapply(conds, function(con) {
. if (con == "c1") {
. bam_paths = bam_c1
. }
. if (con == "c2") {
. bam_paths = bam_c2
. }
. ss = length(bam_paths)
. density_con = lapply(1:ss, function(k) {
. message(paste("Start training for condition", con, "-",
. "sample", k, "..."))
. bam_path = bam_paths[k]
. pdist = get_pdist_singlePAS(bam_path, exons_gr, pas_by_gene_single,
. num_thre, dist_thre, ncores)
. gc()
. return(pdist)
. })
. })
4. FUN(X[[i]], ...)
5. lapply(1:ss, function(k) {
. message(paste("Start training for condition", con, "-", "sample",
. k, "..."))
. bam_path = bam_paths[k]
. pdist = get_pdist_singlePAS(bam_path, exons_gr, pas_by_gene_single,
. num_thre, dist_thre, ncores)
. gc()
. return(pdist)
. })
6. FUN(X[[i]], ...)
7. get_pdist_singlePAS(bam_path, exons_gr, pas_by_gene_single, num_thre,
. dist_thre, ncores)
8. density(dist_all, from = 0, to = dist_thre, n = dist_thre + 1)
9. density.default(dist_all, from = 0, to = dist_thre, n = dist_thre +
. 1)
10. stop("argument 'x' must be numeric")

RLDi and REDi different effect direction

Hi Vivian,

I am working on a monocyte dataset in a clinical framework and I have a technical question:

When RLDi and REDi have opposite directions, how would you interpret the result?

This seems to happen when a gene has a single 3'-most exon site together with multiple intronic/internal PASs.

Would you consider this situation ambiguous?

Thank you very much in advance!

How to study other species?

Hi!
How can I produce a PAS annotation file? The species I study is a fish. If I have a PAS annotation file for my fish species, could I use this software?
Thank you!

How do you determine fragment length

How do you determine fragment length? I have checked my samples, and the rows in my BAM files do not exceed 100 characters. I still get this error even though I increased the fragment length to 100000:

Prepare reference genome ...
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
Sequence levels in GTF:
 [1] "chr1"  "chr2"  "chr3"  "chr4"  "chr5"  "chr6"  "chr7"  "chr8"  "chr9"  "chr10" "chr11" "chr12" "chr13" "chr14" "chr15" "chr16" "chr17"
[18] "chr18" "chr19" "chr20" "chr21" "chr22" "chrX"  "chrY"  "chrM" 
Prepare PAS annotation ...
19355 genes!
Start training for condition c1 - sample 1 ...
3032 genes used for training ...
1 fragments longer than 1000 ...
Error in density.default(dist_all, from = 0, to = dist_thre, n = dist_thre +  : 
  argument 'x' must be numeric
In addition: Warning messages:
1: In dir.create(output_dir, recursive = T) : '_run_one' already exists
2: In .get_cds_IDX(mcols0$type, mcols0$phase) :
  The "phase" metadata column contains non-NA values for features of type stop_codon. This information was ignored.
3: In mclapply(1:length(pas_by_gene_single), function(i) { :
  all scheduled cores encountered errors in user code
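The last warning is probably the more informative one. When workers in `parallel::mclapply` fail, the affected elements of its result are `try-error` objects (character strings) rather than numbers, so a vector assembled from them is no longer numeric, and `density()` rejects it with exactly the message above. The sketch below reproduces that mechanism without MAAPER. It assumes, without confirmation from the package source, that `get_pdist_singlePAS` builds `dist_all` by unlisting its per-gene `mclapply` results; `per_gene_dist` is a hypothetical stand-in for the per-gene worker, and `try()` mimics what `mclapply` does with a failing worker.

```r
# per_gene_dist() is a hypothetical stand-in for MAAPER's per-gene worker;
# it always fails, mirroring the warning "all scheduled cores encountered
# errors in user code" seen in the logs above.
per_gene_dist <- function(i) {
  stop("simulated failure in gene ", i)
}

# mclapply() returns a failing worker's error as a "try-error" object;
# try() reproduces that serially, so this sketch also runs on Windows.
results <- lapply(1:3, function(i) try(per_gene_dist(i), silent = TRUE))
print(class(results[[1]]))

# Assumption: the per-gene distances are combined with unlist(). Unlisting
# error objects yields a character vector of messages, not numbers.
dist_all <- unlist(results)
print(is.numeric(dist_all))

# density() rejects the character vector with the message from the logs.
msg <- tryCatch(
  density(dist_all, from = 0, to = 600, n = 601),
  error = function(e) conditionMessage(e)
)
print(msg)

# To surface the real cause, print the first underlying error instead.
failed <- vapply(results, inherits, logical(1), what = "try-error")
if (any(failed)) {
  message(conditionMessage(attr(results[[which(failed)[1]]], "condition")))
}
```

If this assumption holds, the density error is only a symptom, and the underlying worker error (for example one about reading the BAM file or matching chromosome names) is what needs fixing.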

Effects of parameters

Thank you for providing such a nice tool. I have noticed that several parameters need to be set. Could you please provide detailed documentation on what each of these parameters does?

Individual Condition Tracks

Hello!

Would it be possible for MAAPER to output a bedgraph file for all the genes with multiple PAS in each of the conditions separately? I would be interested in visualising the coverage track for each of my samples. Thank you!

1 fragments longer than 600 ... Error in density.default(dist_all, from = 0, to = dist_thre, n = dist_thre + : argument 'x' must be numeric

Hi,

I am getting an error when trying to run the package on our data. I have seen the same error in another issue but could not find a resolution there. I have checked the read length.

Thank you so much,

Onur.

Error:

Prepare reference genome ...
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
Sequence levels in GTF:
 [1] "chr1"  "chr2"  "chr3"  "chr4"  "chr5"  "chr6"  "chr7"  "chr8"  "chr9" 
[10] "chr10" "chr11" "chr12" "chr13" "chr14" "chr15" "chr16" "chr17" "chr18"
[19] "chr19" "chr20" "chr21" "chr22" "chrX"  "chrY"  "chrM" 
Prepare PAS annotation ...
19130 genes!
Start training for condition c1 - sample 1 ...
2985 genes used for training ...
1 fragments longer than 600 ...
Error in density.default(dist_all, from = 0, to = dist_thre, n = dist_thre +  : 
  argument 'x' must be numeric
In addition: Warning messages:
1: In dir.create(output_dir, recursive = T) : '.' already exists
2: In .get_cds_IDX(mcols0$type, mcols0$phase) :
  The "phase" metadata column contains non-NA values for features of type
  stop_codon. This information was ignored.
3: In mclapply(1:length(pas_by_gene_single), function(i) { :
  all scheduled cores encountered errors in user code

Code:


library(MAAPER)

pas_annotation = readRDS("./human.PAS.hg38.rds")
gtf = "gencode.v38.annotation.gtf" # downloaded from GENCODE
# BAM file of condition 1 (could be a vector if there are multiple samples)
bam_c1 = "E_C1F.fastq.gzAligned.sortedByCoord.out.bam"
# BAM file of condition 2 (could be a vector if there are multiple samples)
bam_c2 = "E_C1M.fastq.gzAligned.sortedByCoord.out.bam"

maaper(gtf,               # full path of the GTF file
       pas_annotation,    # PAS annotation
       output_dir = "./", # output directory
       bam_c1, bam_c2,
       read_len = 85,
       ncores = 12        # number of cores used for parallel computation
)

Error in density.default(dist_all, from = 0, to = dist_thre, n = dist_thre + : argument 'x' must be numeric

Hi Vivian,
I got an error when running the package on the provided example data; it is the same problem Onur reported.

Thank you,
Huoo

Code:

library(MAAPER) 
pas_anno <- readRDS('./mouse.PAS.mm9.rds')
gtf <- "./gencode.mm9.chr19.gtf"
bam_c1 <- "./NT_chr19_example.bam"
bam_c2 <- "./AS_4h_chr19_example.bam"
maaper(gtf, pas_anno, output_dir = "./",
       bam_c1, bam_c2,
       read_len = 76, ncores = 12)

Error:

Start training for condition c1 - sample 1 ...
86 genes used for training ...
1 fragments longer than 600 ...
Error in density.default(dist_all, from = 0, to = dist_thre, n = dist_thre +  : 
  argument 'x' must be numeric
Calls: maaper ... FUN -> get_pdist_singlePAS -> density -> density.default
In addition: Warning messages:
1: In dir.create(output_dir, recursive = T) : '.' already exists
2: In .get_cds_IDX(mcols0$type, mcols0$phase) :
  The "phase" metadata column contains non-NA values for features of type
  stop_codon. This information was ignored.
3: In mclapply(1:length(pas_by_gene_single), function(i) { :
  all scheduled cores encountered errors in user code
Execution halted

Comparing cell types from multiple samples.

Hi Vivian,
I'm trying to compare two cell types, but the reads come from multiple samples. Would you suggest extracting the cells from each sample into a single BAM file or into multiple BAM files?
To clarify, I have ~1000 cells from 4 samples. I can extract cell types A and B from these 4 BAM files and end up with 4 cell-type-A BAM files and 4 cell-type-B BAM files, which I can then pass to bam_c1 and bam_c2.
Alternatively, I can merge these BAM files and end up with one BAM file per cell type. Which approach is preferred for the MAAPER analysis, or does it not matter?
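The first option amounts to passing one BAM path per sample as a character vector. The traceback in the first issue shows MAAPER looping over the elements of `bam_c1` and `bam_c2` (`bam_paths[k]`) and running a separate training step for each sample, and that issue also reports that per-sample BAMs were previously recommended over a pseudo-bulk file. A minimal sketch of assembling such vectors follows; the directory `bams` and the file names (`sample1_cellA.bam`, etc.) are hypothetical.

```r
# Hypothetical layout: one BAM per sample and cell type, e.g. "bams/sample1_cellA.bam".
samples <- paste0("sample", 1:4)
bam_c1 <- file.path("bams", paste0(samples, "_cellA.bam"))  # condition 1: cell type A
bam_c2 <- file.path("bams", paste0(samples, "_cellB.bam"))  # condition 2: cell type B

# Report any paths that do not exist before starting a long run.
all_bams <- c(bam_c1, bam_c2)
missing_bams <- all_bams[!file.exists(all_bams)]
if (length(missing_bams) > 0) {
  message("Missing BAM files:\n", paste(missing_bams, collapse = "\n"))
}

# With real files in place, the vectors are passed as in the examples above
# (commented out here because it needs MAAPER and the annotation files):
# maaper(gtf, pas_annotation, bam_c1 = bam_c1, bam_c2 = bam_c2,
#        output_dir = "./", read_len = 76, ncores = 12)
```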
