
maaper's Issues

running maaper for more than one sample

Hi Vivian,
Thanks for the great tool. When I run MAAPER with one sample per condition, it works fine, but when I pass multiple BAM files as a vector for each condition, I get the error below. Would you please help me with this?
In the last issue, you mentioned that it is preferable to give MAAPER multiple BAM files (one per sample) per condition rather than one pseudo-bulk file.

Error in density.default(dist_all, from = 0, to = dist_thre, n = dist_thre + : argument 'x' must be numeric
Traceback:
1. maaper(reference.gtf, apa_atlas, bam_c1 = ast_fb_ALSFTLD_bams,
. bam_c2 = ast_fb_control_bams, output_dir = ast_map_out, read_len = 76,
. ncores = 16)
2. wrap(pas_by_gene_single, pas_by_gene = pas_annotation, exons_gr,
. bam_c1, bam_c2, density_train_path, dist_thre, num_thre,
. read_len, num_pas_thre, frac_pas_thre, ncores, save_path,
. run, subset, region, verbose, output_dir, paired = paired)
3. lapply(conds, function(con) {
. if (con == "c1") {
. bam_paths = bam_c1
. }
. if (con == "c2") {
. bam_paths = bam_c2
. }
. ss = length(bam_paths)
. density_con = lapply(1:ss, function(k) {
. message(paste("Start training for condition", con, "-",
. "sample", k, "..."))
. bam_path = bam_paths[k]
. pdist = get_pdist_singlePAS(bam_path, exons_gr, pas_by_gene_single,
. num_thre, dist_thre, ncores)
. gc()
. return(pdist)
. })
. })
4. FUN(X[[i]], ...)
5. lapply(1:ss, function(k) {
. message(paste("Start training for condition", con, "-", "sample",
. k, "..."))
. bam_path = bam_paths[k]
. pdist = get_pdist_singlePAS(bam_path, exons_gr, pas_by_gene_single,
. num_thre, dist_thre, ncores)
. gc()
. return(pdist)
. })
6. FUN(X[[i]], ...)
7. get_pdist_singlePAS(bam_path, exons_gr, pas_by_gene_single, num_thre,
. dist_thre, ncores)
8. density(dist_all, from = 0, to = dist_thre, n = dist_thre + 1)
9. density.default(dist_all, from = 0, to = dist_thre, n = dist_thre +
. 1)
10. stop("argument 'x' must be numeric")
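
Before passing a vector of BAM files to `bam_c1` or `bam_c2`, it can help to confirm that every path in the vector is a readable, non-empty file. The following is only a minimal sketch in base R: `check_bam_paths` is a hypothetical helper, not part of MAAPER, and the assumption that an index file sits next to each BAM as `<file>.bam.bai` may not match your setup. It does not diagnose the error above; it only rules out path problems.

```r
# Hypothetical helper (not part of MAAPER): report basic facts about each BAM path.
check_bam_paths <- function(bam_paths) {
  stopifnot(is.character(bam_paths), length(bam_paths) >= 1)
  sizes <- file.size(bam_paths)  # NA for files that do not exist
  data.frame(
    path = bam_paths,
    exists = file.exists(bam_paths),
    # Assumption: the index is stored next to the BAM as <file>.bam.bai.
    has_index = file.exists(paste0(bam_paths, ".bai")),
    nonempty = !is.na(sizes) & sizes > 0,
    stringsAsFactors = FALSE
  )
}

# Demo with a temporary placeholder file standing in for a real BAM.
tmp <- tempfile(fileext = ".bam")
writeLines("placeholder", tmp)
report <- check_bam_paths(c(tmp, file.path(tempdir(), "missing_sample.bam")))
print(report)
```

Any row with `exists` or `nonempty` equal to `FALSE` points to a path that should be fixed before calling `maaper`.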

RLDi and REDi different effect direction

Hi Vivian,

I am working on a monocytes dataset in a clinical framework and I have a technical question:

When RLDi and REDi point in opposite directions, how would you interpret the result?

This seems to happen when a gene has a single PAS in the 3'-most exon but multiple intronic/internal PASs.

Would you consider this situation ambiguous?

Thank you very much in advance!

How to study other species?

Hi!
How can I produce a PAS annotation file? My research species is a fish. If I have a PAS annotation file for that species, can I use this software?
Thank you!

How do you determine fragment length

How do you determine fragment length? I have checked my samples, and the BAM rows do not exceed 100 characters. I get this error even though I stretch my fragment length to 100000.
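
The length of a BAM record printed as text is not the same as the read length, so a direct measurement may be useful when choosing `read_len`. The sketch below assumes that `read_len` refers to the length of the sequenced read, and that records have been exported as SAM text (for example with `samtools view`). The two records are made-up placeholders, and `read_lengths` is a hypothetical helper.

```r
# Hypothetical SAM records with the 11 mandatory tab-separated fields.
sam_lines <- c(
  paste("read1", "0", "chr19", "3000", "60", "8M", "*", "0", "0",
        "ACGTACGT", "IIIIIIII", sep = "\t"),
  paste("read2", "16", "chr19", "3100", "60", "6M", "*", "0", "0",
        "ACGTAC", "IIIIII", sep = "\t")
)

# SEQ is the 10th mandatory SAM field; its length is the read length.
# Records that store "*" in SEQ would need to be filtered out first.
read_lengths <- function(lines) {
  fields <- strsplit(lines, "\t", fixed = TRUE)
  vapply(fields, function(f) nchar(f[10]), integer(1))
}

lens <- read_lengths(sam_lines)
print(table(lens))
```

On real data, the most frequent value in the table is a reasonable candidate for the read length.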

Prepare reference genome ...
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
Sequence levels in GTF:
 [1] "chr1"  "chr2"  "chr3"  "chr4"  "chr5"  "chr6"  "chr7"  "chr8"  "chr9"  "chr10" "chr11" "chr12" "chr13" "chr14" "chr15" "chr16" "chr17"
[18] "chr18" "chr19" "chr20" "chr21" "chr22" "chrX"  "chrY"  "chrM" 
Prepare PAS annotation ...
19355 genes!
Start training for condition c1 - sample 1 ...
3032 genes used for training ...
1 fragments longer than 1000 ...
Error in density.default(dist_all, from = 0, to = dist_thre, n = dist_thre +  : 
  argument 'x' must be numeric
In addition: Warning messages:
1: In dir.create(output_dir, recursive = T) : '_run_one' already exists
2: In .get_cds_IDX(mcols0$type, mcols0$phase) :
  The "phase" metadata column contains non-NA values for features of type stop_codon. This information was ignored.
3: In mclapply(1:length(pas_by_gene_single), function(i) { :
  all scheduled cores encountered errors in user code
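
Warning 3 above suggests that every parallel worker failed before `density` was called. One possible explanation, sketched below in base R, is that the failed workers returned error objects instead of numbers. This is an assumption about how `dist_all` is assembled, not a confirmed description of MAAPER's internals. `parallel::mclapply` returns `try-error` objects for failed jobs; the sketch simulates them with `try` so that it runs on any platform.

```r
# Simulate what parallel::mclapply returns when every worker fails:
# a list of "try-error" objects, which are character vectors.
worker_results <- lapply(1:3, function(i) try(stop("worker failed"), silent = TRUE))

# Combining the results yields character data, not distances.
dist_all <- unlist(worker_results)
print(class(dist_all))

dist_thre <- 1000
msg <- tryCatch(
  {
    density(dist_all, from = 0, to = dist_thre, n = dist_thre + 1)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

If this is what happens, the `density` message only hides an earlier failure. Rerunning with `ncores = 1` may surface the underlying error, because `mclapply` falls back to `lapply` with a single core; whether `maaper` passes `ncores` straight through is an assumption.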

Effects of parameters

Thank you for providing such a nice tool. I noticed that the tool requires several parameters to be set. Could you please provide detailed instructions for these parameters?

Individual Condition Tracks

Hello!

Would it be possible for MAAPER to output a bedgraph file for all the genes with multiple PAS in each of the conditions separately? I would be interested in visualising the coverage track for each of my samples. Thank you!
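
Until such an option exists, a per-sample track could be written by hand once per-base coverage is available from another tool. The sketch below is not MAAPER functionality: `coverage_to_bedgraph` is a hypothetical helper, and the coverage vector is made-up data. It collapses runs of equal coverage into bedGraph intervals, which use 0-based, half-open coordinates.

```r
# Hypothetical helper: collapse per-base coverage into bedGraph intervals.
# `start0` is the 0-based coordinate of the first element of `cov`.
coverage_to_bedgraph <- function(chrom, start0, cov) {
  runs <- rle(cov)
  ends <- start0 + cumsum(runs$lengths)  # exclusive end of each run
  starts <- ends - runs$lengths          # 0-based start of each run
  out <- data.frame(chrom = chrom, start = starts, end = ends,
                    value = runs$values, stringsAsFactors = FALSE)
  out[out$value != 0, , drop = FALSE]    # skip zero-coverage intervals
}

# Made-up coverage for eight consecutive positions starting at chr19:100 (0-based).
bg <- coverage_to_bedgraph("chr19", 100, c(0, 0, 3, 3, 3, 5, 0, 2))

bg_file <- tempfile(fileext = ".bedgraph")
write.table(bg, file = bg_file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
print(readLines(bg_file))
```

Running this once per BAM file would give one track per sample, which can then be loaded into a genome browser.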

1 fragments longer than 600 ... Error in density.default(dist_all, from = 0, to = dist_thre, n = dist_thre + : argument 'x' must be numeric

Hi,

I am getting an error while trying to run the package with our data. I have seen the same error in another issue but could not find a resolution. I have checked the read length.

Thank you so much,

Onur.

Error:

Prepare reference genome ...
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
Sequence levels in GTF:
 [1] "chr1"  "chr2"  "chr3"  "chr4"  "chr5"  "chr6"  "chr7"  "chr8"  "chr9" 
[10] "chr10" "chr11" "chr12" "chr13" "chr14" "chr15" "chr16" "chr17" "chr18"
[19] "chr19" "chr20" "chr21" "chr22" "chrX"  "chrY"  "chrM" 
Prepare PAS annotation ...
19130 genes!
Start training for condition c1 - sample 1 ...
2985 genes used for training ...
1 fragments longer than 600 ...
Error in density.default(dist_all, from = 0, to = dist_thre, n = dist_thre +  : 
  argument 'x' must be numeric
In addition: Warning messages:
1: In dir.create(output_dir, recursive = T) : '.' already exists
2: In .get_cds_IDX(mcols0$type, mcols0$phase) :
  The "phase" metadata column contains non-NA values for features of type
  stop_codon. This information was ignored.
3: In mclapply(1:length(pas_by_gene_single), function(i) { :
  all scheduled cores encountered errors in user code
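
Since the log prints the sequence levels found in the GTF, one thing worth ruling out is a naming mismatch between the GTF and the BAM header (for example `chr1` versus `1`). The sketch below is a generic check in base R and does not claim to identify the cause of this error. `compare_seqlevels` is a hypothetical helper, and the BAM names are made-up values standing in for whatever the BAM header reports.

```r
# Sequence levels as printed in the log above.
gtf_levels <- c(paste0("chr", 1:22), "chrX", "chrY", "chrM")

# Hypothetical BAM header names, e.g. from an alignment without the "chr" prefix.
bam_levels <- c(as.character(1:22), "X", "Y", "MT")

# Hypothetical helper: split the two name sets into shared and unmatched parts.
compare_seqlevels <- function(gtf_levels, bam_levels) {
  list(
    shared = intersect(gtf_levels, bam_levels),
    gtf_only = setdiff(gtf_levels, bam_levels),
    bam_only = setdiff(bam_levels, gtf_levels)
  )
}

cmp <- compare_seqlevels(gtf_levels, bam_levels)
if (length(cmp$shared) == 0) {
  message("No shared sequence names between GTF and BAM.")
}
```

An empty `shared` set would mean that no read can be assigned to any annotated gene.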

Code:


library(MAAPER)

pas_annotation = readRDS("./human.PAS.hg38.rds")
gtf = "gencode.v38.annotation.gtf" # downloaded from GENCODE
# bam file of condition 1 (could be a vector if there are multiple samples)
bam_c1 = "E_C1F.fastq.gzAligned.sortedByCoord.out.bam"
# bam file of condition 2 (could be a vector if there are multiple samples)
bam_c2 = "E_C1M.fastq.gzAligned.sortedByCoord.out.bam"

maaper(gtf, # full path of the GTF file
       pas_annotation, # PAS annotation
       output_dir = "./", # output directory
       bam_c1, bam_c2,
       read_len = 85,
       ncores = 12 # number of cores used for parallel computation
)


Error in density.default(dist_all, from = 0, to = dist_thre, n = dist_thre + : argument 'x' must be numeric

Hi Vivian,
I got an error while running the package on the provided example data, the same problem as Onur's.

Thank you,
Huoo

Code:

library(MAAPER) 
pas_anno <- readRDS('./mouse.PAS.mm9.rds')
gtf <- "./gencode.mm9.chr19.gtf"
bam_c1 <- "./NT_chr19_example.bam"
bam_c2 <- "./AS_4h_chr19_example.bam"
maaper(gtf, pas_anno, output_dir = "./",
       bam_c1, bam_c2,
       read_len = 76, ncores = 12)

Error:

Start training for condition c1 - sample 1 ...
86 genes used for training ...
1 fragments longer than 600 ...
Error in density.default(dist_all, from = 0, to = dist_thre, n = dist_thre +  : 
  argument 'x' must be numeric
Calls: maaper ... FUN -> get_pdist_singlePAS -> density -> density.default
In addition: Warning messages:
1: In dir.create(output_dir, recursive = T) : '.' already exists
2: In .get_cds_IDX(mcols0$type, mcols0$phase) :
  The "phase" metadata column contains non-NA values for features of type
  stop_codon. This information was ignored.
3: In mclapply(1:length(pas_by_gene_single), function(i) { :
  all scheduled cores encountered errors in user code
Execution halted

Comparing celltypes from multiple samples.

Hi Vivian,
I'm trying to compare two cell types, but the reads come from multiple samples. Do you suggest extracting the cells of each type into a single BAM file or into multiple BAM files?
To clarify, I have ~1000 cells from 4 samples. I can extract cell types A and B from these 4 BAM files and end up with 4 cell type A BAM files and 4 cell type B BAM files. I can then pass these BAM files to bam_c1 and bam_c2.
Alternatively, I can merge these BAM files and end up with one BAM file per cell type. Which option is preferred for the MAAPER analysis, or does it not matter?
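
The per-sample option could be set up as sketched below. An earlier issue on this page reports that passing one BAM file per sample is preferred over a single pseudo-bulk file, and the `maaper` calls shown above accept a vector for each condition. The directory and file names here are hypothetical placeholders, and the `maaper` call is left commented out because it needs real input files.

```r
# Hypothetical sample names and directory; adjust to your own layout.
samples <- paste0("sample", 1:4)
bam_dir <- "bams"

# One BAM per sample for each cell type.
bam_c1 <- file.path(bam_dir, paste0(samples, "_cellA.bam"))
bam_c2 <- file.path(bam_dir, paste0(samples, "_cellB.bam"))
print(bam_c1)
print(bam_c2)

# The vectors would then be passed as the two conditions, for example:
# maaper(gtf, pas_annotation, output_dir = "./",
#        bam_c1, bam_c2, read_len = 76, ncores = 12)
```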
