vivianstats / maaper
Model-based analysis of APA using 3' end-linked reads
Home Page: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-021-02429-5
Hi Vivian,
I'm trying to compare two cell types, but the reads come from multiple samples. Do you suggest extracting the cells of each type into a single BAM file or into multiple BAM files?
To clarify, I have ~1000 cells from 4 samples. I can extract cell types A and B from these 4 BAM files and end up with 4 cell type A BAM files and 4 cell type B BAM files, then pass these to bam_c1 and bam_c2.
Alternatively, I can merge these BAM files and end up with one BAM file per cell type. Which is preferred for the MAAPER analysis, or does it not matter?
How do you determine the fragment length? I have checked my samples, and the BAM rows do not exceed 100 characters. I get this error even though I stretched the fragment length to 100000:
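For the multi-sample layout described above, the code comments later in this thread note that bam_c1 and bam_c2 can each be a vector of BAM paths. A minimal sketch of assembling such vectors follows. The directory name, sample IDs, and file-naming pattern are hypothetical, and the maaper() call is left commented out because it needs the real GTF, PAS annotation, and BAM files.

```r
# Hypothetical sample IDs and file-naming scheme (not taken from this thread).
samples <- paste0("sample", 1:4)

# One BAM per sample for each cell type, kept as separate files.
bam_c1 <- file.path("bams", paste0(samples, "_celltypeA.bam"))
bam_c2 <- file.path("bams", paste0(samples, "_celltypeB.bam"))

# Report missing files before starting a long run.
all_bams <- c(bam_c1, bam_c2)
missing_bams <- all_bams[!file.exists(all_bams)]
if (length(missing_bams) > 0) {
  message("Missing BAM files: ", paste(missing_bams, collapse = ", "))
}

# The vectors would then be passed in the same way as the single-file calls
# shown elsewhere in this thread (argument values here are placeholders):
# maaper(gtf, pas_annotation, output_dir = "./",
#        bam_c1, bam_c2, read_len = 76, ncores = 4)
```

Keeping one file per sample preserves the sample structure within each condition, whereas merging collapses it into a single pseudo-bulk input.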
Prepare reference genome ...
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
Sequence levels in GTF:
[1] "chr1" "chr2" "chr3" "chr4" "chr5" "chr6" "chr7" "chr8" "chr9" "chr10" "chr11" "chr12" "chr13" "chr14" "chr15" "chr16" "chr17"
[18] "chr18" "chr19" "chr20" "chr21" "chr22" "chrX" "chrY" "chrM"
Prepare PAS annotation ...
19355 genes!
Start training for condition c1 - sample 1 ...
3032 genes used for training ...
1 fragments longer than 1000 ...
Error in density.default(dist_all, from = 0, to = dist_thre, n = dist_thre + :
argument 'x' must be numeric
In addition: Warning messages:
1: In dir.create(output_dir, recursive = T) : '_run_one' already exists
2: In .get_cds_IDX(mcols0$type, mcols0$phase) :
The "phase" metadata column contains non-NA values for features of type stop_codon. This information was ignored.
3: In mclapply(1:length(pas_by_gene_single), function(i) { :
all scheduled cores encountered errors in user code
Hi,
I am getting an error while trying to run the package with our data. I have seen the same error in another issue but couldn't find a resolution. I have checked the read length.
Thank you so much,
Onur.
Error:
Prepare reference genome ...
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
Sequence levels in GTF:
[1] "chr1" "chr2" "chr3" "chr4" "chr5" "chr6" "chr7" "chr8" "chr9"
[10] "chr10" "chr11" "chr12" "chr13" "chr14" "chr15" "chr16" "chr17" "chr18"
[19] "chr19" "chr20" "chr21" "chr22" "chrX" "chrY" "chrM"
Prepare PAS annotation ...
19130 genes!
Start training for condition c1 - sample 1 ...
2985 genes used for training ...
1 fragments longer than 600 ...
Error in density.default(dist_all, from = 0, to = dist_thre, n = dist_thre + :
argument 'x' must be numeric
In addition: Warning messages:
1: In dir.create(output_dir, recursive = T) : '.' already exists
2: In .get_cds_IDX(mcols0$type, mcols0$phase) :
The "phase" metadata column contains non-NA values for features of type
stop_codon. This information was ignored.
3: In mclapply(1:length(pas_by_gene_single), function(i) { :
all scheduled cores encountered errors in user code
Code:
library(MAAPER)

pas_annotation = readRDS("./human.PAS.hg38.rds")
gtf = "gencode.v38.annotation.gtf" # downloaded from GENCODE
# bam file of condition 1 (could be a vector if there are multiple samples)
bam_c1 = "E_C1F.fastq.gzAligned.sortedByCoord.out.bam"
# bam file of condition 2 (could be a vector if there are multiple samples)
bam_c2 = "E_C1M.fastq.gzAligned.sortedByCoord.out.bam"

maaper(gtf, # full path of the GTF file
       pas_annotation, # PAS annotation
       output_dir = "./", # output directory
       bam_c1, bam_c2,
       read_len = 85,
       ncores = 12 # number of cores used for parallel computation
)
Hi Vivian,
Thanks for the great tool. When I run MAAPER with one sample per condition it works fine, but when I input multiple BAM files as a vector for each condition I get the error below. Would you please help me with this?
In the last issue, you mentioned that it is preferable to give MAAPER multiple BAM files (one per sample) per condition rather than one pseudo-bulk file.
Error in density.default(dist_all, from = 0, to = dist_thre, n = dist_thre + : argument 'x' must be numeric
Traceback:
1. maaper(reference.gtf, apa_atlas, bam_c1 = ast_fb_ALSFTLD_bams,
. bam_c2 = ast_fb_control_bams, output_dir = ast_map_out, read_len = 76,
. ncores = 16)
2. wrap(pas_by_gene_single, pas_by_gene = pas_annotation, exons_gr,
. bam_c1, bam_c2, density_train_path, dist_thre, num_thre,
. read_len, num_pas_thre, frac_pas_thre, ncores, save_path,
. run, subset, region, verbose, output_dir, paired = paired)
3. lapply(conds, function(con) {
. if (con == "c1") {
. bam_paths = bam_c1
. }
. if (con == "c2") {
. bam_paths = bam_c2
. }
. ss = length(bam_paths)
. density_con = lapply(1:ss, function(k) {
. message(paste("Start training for condition", con, "-",
. "sample", k, "..."))
. bam_path = bam_paths[k]
. pdist = get_pdist_singlePAS(bam_path, exons_gr, pas_by_gene_single,
. num_thre, dist_thre, ncores)
. gc()
. return(pdist)
. })
. })
4. FUN(X[[i]], ...)
5. lapply(1:ss, function(k) {
. message(paste("Start training for condition", con, "-", "sample",
. k, "..."))
. bam_path = bam_paths[k]
. pdist = get_pdist_singlePAS(bam_path, exons_gr, pas_by_gene_single,
. num_thre, dist_thre, ncores)
. gc()
. return(pdist)
. })
6. FUN(X[[i]], ...)
7. get_pdist_singlePAS(bam_path, exons_gr, pas_by_gene_single, num_thre,
. dist_thre, ncores)
8. density(dist_all, from = 0, to = dist_thre, n = dist_thre + 1)
9. density.default(dist_all, from = 0, to = dist_thre, n = dist_thre +
. 1)
10. stop("argument 'x' must be numeric")
Hello!
Would it be possible for MAAPER to output a bedgraph file for all the genes with multiple PAS in each of the conditions separately? I would be interested in visualising the coverage track for each of my samples. Thank you!
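Whether MAAPER can write such tracks is not answered in this thread. As a stopgap, per-base coverage computed by any other tool can be turned into bedGraph lines (chrom, 0-based start, end, value) with base R. The sketch below assumes a plain numeric coverage vector for one region; `coverage_to_bedgraph`, the toy values, and the output file name are hypothetical, and extracting coverage from a BAM file is not shown.

```r
# Convert a per-base coverage vector into bedGraph intervals.
# `offset` is the 0-based genomic coordinate of the first element of `cov`.
coverage_to_bedgraph <- function(chrom, cov, offset = 0L) {
  runs <- rle(cov)                          # collapse runs of equal coverage
  ends <- as.integer(offset) + cumsum(runs$lengths)
  starts <- ends - runs$lengths
  bg <- data.frame(chrom = rep(chrom, length(starts)),
                   start = starts,
                   end = ends,
                   value = runs$values,
                   stringsAsFactors = FALSE)
  bg[bg$value > 0, , drop = FALSE]          # omit zero-coverage intervals
}

# Toy coverage for a 9-base region (made-up numbers).
toy_cov <- c(0, 0, 3, 3, 3, 5, 5, 0, 1)
bg <- coverage_to_bedgraph("chr19", toy_cov)

# Write a tab-separated file without header or row names.
out_file <- file.path(tempdir(), "toy_sample.bedgraph")
write.table(bg, file = out_file, quote = FALSE, sep = "\t",
            row.names = FALSE, col.names = FALSE)
```

Running this once per sample gives one track per BAM file, which a genome browser can load side by side.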
Hi!
How can I produce a PAS annotation file? My research species is a fish. If I have a PAS annotation file for fish, could I use this software?
Thank you!
Hi Vivian,
I am working on a monocytes dataset in a clinical framework and I have a technical question:
When the RLDi and REDi point in opposite directions, how would you interpret the result?
This seems to happen when a gene has a single PAS in its 3'-most exon while multiple intronic/internal PASs are present for the same gene.
Would you consider this situation ambiguous?
Thank you very much in advance!
Hi Vivian,
I got an error while running the package on the provided example data, the same problem as Onur's.
Thank you,
Huoo
Code:
library(MAAPER)
pas_anno <- readRDS('./mouse.PAS.mm9.rds')
gtf <- "./gencode.mm9.chr19.gtf"
bam_c1 <- "./NT_chr19_example.bam"
bam_c2 <- "./AS_4h_chr19_example.bam"
maaper(gtf, pas_anno, output_dir = "./",
bam_c1, bam_c2,
read_len = 76, ncores = 12)
Error:
Start training for condition c1 - sample 1 ...
86 genes used for training ...
1 fragments longer than 600 ...
Error in density.default(dist_all, from = 0, to = dist_thre, n = dist_thre + :
argument 'x' must be numeric
Calls: maaper ... FUN -> get_pdist_singlePAS -> density -> density.default
In addition: Warning messages:
1: In dir.create(output_dir, recursive = T) : '.' already exists
2: In .get_cds_IDX(mcols0$type, mcols0$phase) :
The "phase" metadata column contains non-NA values for features of type
stop_codon. This information was ignored.
3: In mclapply(1:length(pas_by_gene_single), function(i) { :
all scheduled cores encountered errors in user code
Execution halted
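One possible reading of this error, offered as an assumption rather than a confirmed diagnosis: the warning "all scheduled cores encountered errors in user code" means every mclapply() job failed, and mclapply() then returns "try-error" objects instead of numbers. If those results end up in dist_all, density() rejects them. The base-R sketch below imitates that situation with try() inside lapply(); `worker` and its simulated failure are hypothetical stand-ins, not MAAPER code.

```r
# Stand-in for the per-gene worker; it always fails, mimicking the
# "all scheduled cores encountered errors in user code" warning.
worker <- function(i) stop("simulated failure inside the worker")

# mclapply() returns "try-error" objects for failed jobs.
# try() inside lapply() imitates that behaviour portably.
res <- lapply(1:3, function(i) try(worker(i), silent = TRUE))

# Flattening the failed results gives a character vector, not numbers.
dist_all <- unlist(res)
print(is.numeric(dist_all))

# density() refuses non-numeric input; capture the message it raises.
msg <- tryCatch(
  density(dist_all, from = 0, to = 600, n = 601),
  error = function(e) conditionMessage(e)
)
print(msg)

# The original failure is still attached to each "try-error" object.
first_error <- conditionMessage(attr(res[[1]], "condition"))
print(first_error)
```

Under this reading, the density() message is a symptom, and the real error is hidden inside the parallel workers. If `ncores` is forwarded to mclapply() (an assumption about MAAPER's internals), rerunning with `ncores = 1` may make R report the underlying error directly.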