aggregationDE from pachterlab

aggregationde's Introduction

This repository contains the scripts and software for reproducing the results and figures of the paper "Gene-level differential analysis at transcript-level resolution" by Lynn Yi, Harold Pimentel, Nicolas L Bray and Lior Pachter. The code can also be used to apply the aggregation methods described in the paper to new datasets. The software in the repository was written by Lynn Yi.

R/Snakefile is an example pipeline for downloading FASTQ files, performing pseudoalignment and quantification, and bootstrapping on transcript compatibility counts (TCCs). The remaining steps, running sleuth and aggregating p-values, are performed in the R scripts tcc_pipeline.R and transcript_pipeline.R.

R/aggregation.R contains the logic for performing aggregation, including mapping TCCs to genes. R/tcc2bootstrap.R contains the logic for performing bootstraps on TCCs and writing h5 files that sleuth can take as input.
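To illustrate what aggregating p-values to the gene level means, here is a minimal Python sketch. It is not the code in R/aggregation.R: it uses Fisher's method, a classic unweighted way to combine independent p-values, as a stand-in, and the function names and the transcript-to-gene dictionary are hypothetical. Refer to the paper and to R/aggregation.R for the weighted aggregation actually used.

```python
import math
from collections import defaultdict


def fisher_combine(pvalues):
    """Combine independent p-values with Fisher's method (unweighted).

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null hypothesis.
    """
    k = len(pvalues)
    if k == 0:
        raise ValueError("need at least one p-value")
    # half_stat is X / 2; clamp p-values to avoid log(0).
    half_stat = -sum(math.log(max(p, 1e-300)) for p in pvalues)
    # For even degrees of freedom 2k the chi-square survival function is
    # exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!, so no external library is needed.
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return min(1.0, math.exp(-half_stat) * total)


def aggregate_by_gene(pvalue_by_transcript, gene_by_transcript):
    """Group transcript-level p-values by gene and combine each group."""
    grouped = defaultdict(list)
    for transcript, p in pvalue_by_transcript.items():
        grouped[gene_by_transcript[transcript]].append(p)
    return {gene: fisher_combine(ps) for gene, ps in grouped.items()}


# Hypothetical example: two transcripts of geneA, one of geneB.
gene_pvalues = aggregate_by_gene(
    {"tx1": 0.05, "tx2": 0.05, "tx3": 0.2},
    {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"},
)
```

A gene with a single transcript keeps that transcript's p-value unchanged, while a gene with several moderately small transcript p-values receives a smaller combined p-value.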

The SRPXXXXX folders contain code for reproducing the analyses of the two datasets in the paper. They include Snakefiles for downloading and quantifying reads, aggregation pipelines, and GO analysis. plot_transcripts.R contains code for reproducing Figures 1 and 2 of the paper. topGO.R contains code for reproducing Figure 5 and performing the GO analysis.

The folder simulation_pipeline contains code for reproducing the analyses of the simulations described in the paper. The simulations must first be created with pachterlab/sleuthpaperanalysis. simulation_pipeline.R then runs the various differential expression and aggregation methods, and averaging_fdrs.R and roc_curve.R average the FDRs and sensitivities. Finally, mamabear.R invokes mamabear to plot the results.
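As a rough illustration of averaging FDR and sensitivity over simulation runs, the Python sketch below computes both quantities per run and takes their means. It is an assumption about the general idea, not a port of averaging_fdrs.R or roc_curve.R; the function names and the input layout (one pair of gene sets per run) are hypothetical.

```python
def fdr_and_sensitivity(called, truly_de):
    """Return (FDR, sensitivity) for one simulation run.

    called:   genes reported as differentially expressed
    truly_de: genes that are differentially expressed in the simulation
    """
    called, truly_de = set(called), set(truly_de)
    true_pos = len(called & truly_de)
    false_pos = len(called) - true_pos
    # Convention used here: an empty call set has FDR 0.
    fdr = false_pos / len(called) if called else 0.0
    sensitivity = true_pos / len(truly_de) if truly_de else 0.0
    return fdr, sensitivity


def average_over_runs(runs):
    """Average FDR and sensitivity over a list of (called, truly_de) pairs."""
    if not runs:
        raise ValueError("need at least one run")
    pairs = [fdr_and_sensitivity(called, truth) for called, truth in runs]
    n = len(pairs)
    mean_fdr = sum(fdr for fdr, _ in pairs) / n
    mean_sensitivity = sum(sens for _, sens in pairs) / n
    return mean_fdr, mean_sensitivity


# Hypothetical example with two simulation runs.
mean_fdr, mean_sensitivity = average_over_runs([
    ({"g1", "g2"}, {"g1", "g3"}),
    ({"g1"}, {"g1", "g3"}),
])
```

Repeating this at several significance thresholds would give the points of an averaged sensitivity-versus-FDR curve.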

aggregationde's Issues

Regarding input files for simulation pipeline.

I was trying to run your pipeline, but the README says that sleuthpaperanalysis must be used first to create the simulations. Could you help us produce the simulations with sleuthpaperanalysis? We are having issues running that pipeline.

The R code and (by proxy) the Snakefiles reference a variety of input directories and files which do not exist (such as finn_samples.txt), and of which we have no samples that we could imitate. Analyzing the code to determine the required format for these files appears to be quite an undertaking. Since you've managed to run these steps and have the experience we're lacking, could you share some insight on the exact nature of these files?

R script tcc2bootstrap2.R does not exist

The script referenced at https://github.com/pachterlab/aggregationDE/blob/master/R/Snakefile#L77 does not appear to exist in this repository. Can you point me to a location where I can download it from?

some question about paper

I have read your paper, but there is one point that I don't quite understand. The gene-level p-value is estimated from the p-values of the transcript-level differential analysis, with the basemean from those results used as the weight. If one gene ID corresponds to multiple transcript IDs, how should the basemean and p-value be chosen? In addition, is there any difference between a GO enrichment analysis that directly converts transcript IDs to gene IDs and one based on the estimated gene-level p-values?
