alyssafrazee / ballgown

Bioconductor package "ballgown", devel version. Isoform-level differential expression analysis in R.

Home Page: http://biorxiv.org/content/biorxiv/early/2014/09/05/003665.full.pdf

ballgown's Issues

polyester: add ability to set seed

because reproducibility!
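Below is a minimal base-R sketch of the idea. It uses a hypothetical helper, simulate_counts(), and is not polyester's actual API. When an optional seed argument is supplied, it is passed to set.seed() before any random draws, so repeated calls with the same seed give identical output.

```r
# Hypothetical helper (not polyester's API): simulates negative binomial
# read counts, seeding R's random number generator first when asked to.
simulate_counts <- function(n_transcripts, mean_count = 100, size = 5, seed = NULL) {
  if (!is.null(seed)) {
    set.seed(seed)  # fix the RNG state so the draws below are repeatable
  }
  rnbinom(n_transcripts, size = size, mu = mean_count)
}

a <- simulate_counts(10, seed = 42)
b <- simulate_counts(10, seed = 42)
identical(a, b)  # TRUE
```

Note that set.seed() changes the global RNG state. A more careful version might save and restore .Random.seed around the call.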

fix gexpr method

lapply statement (marked as "slow") is not actually necessary! yay!

looking for ballgown zip package to install on Windows

Hi,

Has anybody already built the ballgown package for installation on Windows?

Thank you

plotTranscripts has a problem because structure(gown)$trans is a list

This may already be fixed, but the problem is in line 54:

ma = IRanges::as.data.frame(structure(gown)$trans)

add "estimated fold change" as part of stattest output

more "global" visualization functions?

MA plots, etc?

pData validity

in particular, make sure the first column of pData uniquely identifies replicates, or remove that requirement from (e.g.) subset.
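One way the first option could look is the following hypothetical validity check, written in plain base R. It is not ballgown's code, and the name check_pdata_ids() is made up. The check rejects a pData table whose first column has missing or duplicated IDs.

```r
# Hypothetical validity check (not ballgown's code): the first column of
# pData must uniquely identify each replicate/sample.
check_pdata_ids <- function(pd) {
  ids <- pd[[1]]
  if (anyNA(ids)) stop("first column of pData contains missing IDs")
  dups <- unique(ids[duplicated(ids)])
  if (length(dups) > 0) {
    stop("first column of pData must uniquely identify replicates; duplicated: ",
         paste(dups, collapse = ", "))
  }
  invisible(TRUE)
}

pd_ok  <- data.frame(id = c("s1", "s2", "s3"), group = c(1, 0, 1))
pd_bad <- data.frame(id = c("s1", "s1", "s3"), group = c(1, 0, 1))
check_pdata_ids(pd_ok)       # passes silently
try(check_pdata_ids(pd_bad)) # error mentioning the duplicated ID s1
```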

constructor warnings

this is related to another issue (double checking data structure in constructor), but is a bit more subtle:

the constructor currently handles the case where the [exon or intron]-to-transcript table contains [exons or introns] that don't appear in the [e or i]_data.ctab table. said [exons or introns] are dropped from the structure component of the ballgown object but will still appear in the indexes components, for transparency. however, this behavior currently happens silently and possibly indicates a bug in the preprocessor.

so the issue is twofold: (1) print a warning when this behavior is invoked (easy fix) and (2) figure out why this occurs in the first place, which involves digging into the cufflinks preprocessor (more difficult).
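For part (1), here is a hypothetical sketch of the warning. The function name and message are illustrative only and are not ballgown's code. Features listed in the linking table but absent from the data table are still dropped, but a warning now reports how many were removed.

```r
# Hypothetical sketch: keep only linked features that have data, but warn
# instead of dropping the others silently.
keep_features_with_data <- function(link_ids, data_ids, what = "exons") {
  missing <- setdiff(link_ids, data_ids)
  if (length(missing) > 0) {
    warning(length(missing), " ", what, " in the linking table are missing from the ",
            "data table and were dropped from the structure component")
  }
  intersect(link_ids, data_ids)
}

# returns c("e1", "e3") and emits a warning about the dropped feature
keep_features_with_data(c("e1", "e2", "e3"), c("e1", "e3"))
```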

Can't install ballgown on CentOS 6

Dear ballgown developers,

I have installed devtools and GenomicRanges, but I ran into the following errors when installing ballgown. Could you please help me? Thanks!

install_github('ballgown', 'alyssafrazee')
Installing github repo ballgown/master from alyssafrazee
Downloading ballgown.zip from https://github.com/alyssafrazee/ballgown/archive/master.zip
Installing package from /tmp/Rtmp9ueDZ4/ballgown.zip
arguments 'minimized' and 'invisible' are for Windows only
Installing ballgown
'/usr/lib64/R/bin/R' --vanilla CMD INSTALL
'/tmp/Rtmp9ueDZ4/devtoolsac944b3c67e3/ballgown-master'
--library='/usr/lib64/R/library' --install-tests

installing source package 'ballgown' ...
** R
** tests
** preparing package for lazy loading
Creating a new generic function for 'structure' in package 'ballgown'
Creating a new generic function for 'data' in package 'ballgown'
** help
*** installing help indices
converting help for package 'ballgown'
finding HTML links ... done
annotate_assembly html
Error: /tmp/Rtmp9ueDZ4/devtoolsac944b3c67e3/ballgown-master/man/annotate_assembly.Rd:28: Bad \link text
removing '/usr/lib64/R/library/ballgown'
Error: Command failed (1)

consider documenting group/time interaction models

ballgown function should only join on id, maybe use match instead of join from plyr
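Here is a base-R sketch of matching on id alone. It assumes two per-feature data frames that share an "id" column, and align_by_id() is a hypothetical name. match() returns the row of y for each id in x, so rows are paired only by id and x keeps its original order.

```r
# Hypothetical sketch: attach columns from y to x by matching on an ID column
# with base R match(), so rows are paired only by id and x keeps its row order.
align_by_id <- function(x, y, id = "id") {
  idx <- match(x[[id]], y[[id]])  # row of y for each id in x (NA if absent)
  extra <- y[idx, setdiff(names(y), id), drop = FALSE]
  rownames(extra) <- NULL         # drop row names inherited from y
  cbind(x, extra)
}

x <- data.frame(id = c("t3", "t1", "t2"), fpkm = c(5, 1, 2), stringsAsFactors = FALSE)
y <- data.frame(id = c("t1", "t2", "t3"), gene_id = c("g1", "g1", "g2"),
                stringsAsFactors = FALSE)
align_by_id(x, y)$gene_id  # "g2" "g1" "g1"
```

Unlike a join on all shared columns, this never pairs rows on anything other than the chosen id column.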

RSEM reader

Parser for sailfish

Add a parser for sailfish output:
http://www.cs.cmu.edu/~ckingsf/software/sailfish/

does not work if the data_directory is 'ballgown'

If the data directory is named ballgown, e.g. data_directory = system.file('ballgown', package='ballgown'), it is not recognized as the data directory.

plotTranscripts should have "user friendly defaults"

I'm thinking type="fpkm", sample="first sample", colorby="transcripts"

subset + exprfilter: examples in documentation

more explicit example of how to create pData in vignette

subset method

Subset method needs work. Currently:

problems parsing cond argument with multiple conditions (e.g., chr==22 & start>500)
can only subset by rows (e.g. can only restrict the set of genomic features to look at). would be nice to be able to subset by sample, outcome, or phenotype.
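Here is one hypothetical way to handle a compound condition string, sketched in base R rather than taken from ballgown's subset code. The whole string is parsed as a single R expression and evaluated with the table's columns in scope, so & and | combine conditions naturally.

```r
# Hypothetical sketch: evaluate a condition string such as
# "chr == 22 & start > 500" against the columns of a feature table.
subset_rows <- function(df, cond) {
  keep <- eval(parse(text = cond), envir = df, enclos = parent.frame())
  df[!is.na(keep) & keep, , drop = FALSE]  # treat NA results as "drop"
}

feat <- data.frame(chr = c(22, 22, 1), start = c(100, 600, 700))
subset_rows(feat, "chr == 22 & start > 500")  # keeps only the second row
```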

stringtie reader

note the prefix/"stringtie" in front of the file names

plotTranscripts should allow users to plot multiple samples in multi-panel plot

vignette: better example for adjusting for confounders

show adjustment for batch, just a 2-group comparison.

sampleNames issue

top level directory for the ctab files might not necessarily identify samples

biocLite("Biostrings") issue

There is a typo in the spelling of Biostrings. The "s" should not be capitalized on this page https://github.com/alyssafrazee/ballgown/blob/master/polyester/README.md

-Thanks

add [] method for ballgown objects

in addition to subset.

double-checking data structure in ballgown constructor

in datasets where not all the samples have the same set of transcripts in their t_data.ctab (etc) files, ballgown reads/parses the data without error/warning. I only found the issue when I got NAs in my transcript expression matrix. This should be fixed! :)
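A hypothetical check for this, sketched in base R, compares the transcript ID sets across samples before building the expression matrix. It assumes each sample's t_data.ctab has already been read into a data frame with a t_id column. The function name is made up.

```r
# Hypothetical check: every per-sample table (e.g. read from t_data.ctab)
# should report the same set of transcript IDs in column t_id.
check_same_transcripts <- function(tables, id = "t_id") {
  ref <- sort(unique(tables[[1]][[id]]))
  same <- vapply(tables,
                 function(tab) identical(sort(unique(tab[[id]])), ref),
                 logical(1))
  if (!all(same)) {
    warning("transcript sets differ for sample(s): ",
            paste(names(tables)[!same], collapse = ", "),
            "; the expression matrix would contain NAs")
  }
  all(same)
}

tables <- list(s1 = data.frame(t_id = 1:3), s2 = data.frame(t_id = c(1L, 2L)))
check_same_transcripts(tables)  # FALSE, with a warning naming s2
```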

return *expr(bg, 'fpkm') and 'cov' as matrices (not data frames)
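Here is a sketch of the conversion in base R. It assumes an expression data frame with one ID column and one numeric column per sample, where the column names and the helper are hypothetical. The sample columns become a numeric matrix, and the feature IDs move into the row names.

```r
# Hypothetical sketch: turn a per-transcript expression data frame
# (one numeric column per sample) into a numeric matrix keyed by transcript ID.
expr_to_matrix <- function(df, id_col = "t_id") {
  m <- as.matrix(df[, setdiff(names(df), id_col), drop = FALSE])
  rownames(m) <- df[[id_col]]
  m
}

df <- data.frame(t_id = c(1, 2), FPKM.s1 = c(0.5, 3), FPKM.s2 = c(1.2, 0))
m <- expr_to_matrix(df)
```

Returning a matrix lets matrix-oriented functions such as rowMeans() or apply() work directly on the result.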

exon-level tests

differentiate them from gene-level tests (so you don't just get a bunch of significant exons if the whole gene is DE).

source code for tablemaker

plotTranscripts should accept type ="cov" or "FPKM" and not try to extract from sample name

allow "." character in folder names / sampleNames

this will involve not splitting on "." in eexpr, subset, etc, or at least not taking just the 2nd slot.
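The following sketch shows the difference. It assumes, for illustration only, that column names look like "<measurement>.<sample name>". Taking the 2nd slot after splitting on "." truncates dotted sample names, while stripping only the known measurement prefix keeps them whole. The helper name is hypothetical.

```r
# Hypothetical helper: recover sample names from expression-matrix column names,
# assumed here to look like "<measurement>.<sample name>" (e.g. "FPKM.sample.1").
sample_from_colname <- function(cols, meas = "FPKM") {
  prefix <- paste0(meas, ".")
  ifelse(startsWith(cols, prefix), substring(cols, nchar(prefix) + 1), cols)
}

cols <- c("FPKM.sample.1", "FPKM.sample.2")
sapply(strsplit(cols, ".", fixed = TRUE), `[`, 2)  # "sample" "sample" (truncated)
sample_from_colname(cols)                           # "sample.1" "sample.2"
```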

Issue with stattest

I get the following error when I try running stattest:

Error in quantile.default(lognz, 0.75) :
missing values and NaN's not allowed if 'na.rm' is FALSE

How can I remove the NaNs from the bg before doing stattest?
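The error comes from quantile() receiving NA/NaN values. One possible approach, sketched in base R on a plain expression matrix, is to identify the features whose values are not finite in every sample and leave them out before testing. The helper is hypothetical. Applying the same feature filter back to the ballgown object (e.g. with subset) is not shown here.

```r
# Hypothetical sketch: keep only features whose expression values are finite
# in every sample (drops rows containing NA, NaN, or Inf).
finite_rows <- function(m) rowSums(!is.finite(m)) == 0

m <- rbind(t1 = c(1.5, 2.0), t2 = c(NaN, 3.1), t3 = c(0, 4.2))
keep <- finite_rows(m)
m[keep, , drop = FALSE]  # rows t1 and t3
```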

stattest could possibly be improved, speed-wise

rowFtests instead of f.pvalue?

finish writing tests

and then have someone who knows how to write tests give you feedback on your tests.

Documenting functions

A few thoughts on file https://github.com/alyssafrazee/ballgown/blob/polyesterdocs/polyester/R/simulate_experiment.R:

You could tell users what each parameter expects with something like @param x (logical) This parameter ..., where (logical) tells users that a logical is expected (ideally, this expectation is also checked within the function).
Watch out for lines longer than 100 characters. I think CRAN checks will flag them (e.g., maybe this line).
Curious why this package is being imported within the function (https://github.com/alyssafrazee/ballgown/blob/polyesterdocs/polyester/R/simulate_experiment.R#L44)? It doesn't seem to be in the DESCRIPTION file.
You asked how to tell when param descriptions are too long and should be moved to Details. They all seem appropriate to me. When I get to the point of making a list via \itemize or similar, I usually say "See Details..." instead.
If you get very long param descriptions, you can move some or all of the text into .R files in a man-roxygen directory at the root of the package and just use @template yourtemplate in the function docs.
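Here is a small illustration of the first suggestion, using a made-up function that is not from polyester. The (type) annotations in the @param tags state what is expected, and stopifnot() calls at the top of the body enforce those expectations.

```r
#' Hypothetical example function (not from polyester).
#'
#' @param paired (logical) Should paired-end reads be simulated?
#' @param readlen (integer) Read length in nucleotides.
#' @return A character string describing the settings.
describe_reads <- function(paired = TRUE, readlen = 100L) {
  # Check the expectations stated in the @param tags.
  stopifnot(is.logical(paired), length(paired) == 1, !is.na(paired))
  stopifnot(is.numeric(readlen), length(readlen) == 1, readlen > 0)
  paste0(if (paired) "paired-end" else "single-end", ", ", readlen, " bp")
}

describe_reads()  # "paired-end, 100 bp"
```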

add better description for multigroup analyses

strip quotes rather than remove first/last character
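The following base-R sketch contrasts the two approaches on a value that happens not to be quoted. Both helper names are hypothetical. The first approach unconditionally chops off the end characters, while the second removes a double quote only where one is actually present at the start or end.

```r
# Removing the first and last character unconditionally damages unquoted strings:
strip_ends <- function(x) substr(x, 2, nchar(x) - 1)
# Stripping only surrounding double quotes leaves unquoted values intact:
strip_quotes <- function(x) gsub('^"|"$', "", x)

vals <- c('"gene1"', 'gene2')
strip_ends(vals)    # "gene1" "ene"
strip_quotes(vals)  # "gene1" "gene2"
```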

vignette: document "getFC" in the vignette

add "samples" argument to reader in vignette

add log option to plotTranscripts

and potentially extra flexibility as well

document polyester package

default for texpr meas argument should be "FPKM" not "all"

feature request: gene-level count tables

make assmb2annot faster

not usable on any large # of transcripts in its current state

unused argument (sampleNames = sample_IDs)

Hi, I have a question: while making the ballgown object using the function "samples", I get the error "unused argument (sampleNames = sample_IDs)". Why?
In my run,
sample_IDs are "NT.0345dde4-90c6-4c5e-ae82-562eccb652da", "NT.03f64293-0f12-4aec-a1de-1dfd295ea95b", etc.
Thanks a lot
Charlin

polyester: upper limit on # of reads that can be simulated should be raised.

In the add_error function, the call to "unlist" means that a max of 2^31 nucleotides total can be simulated in the experiment, which limits the number of reads you can simulate (exact limit depends on read length). Reason: R can only store vectors with fewer than 2^31 entries.

I think it would be good if we could write the code differently so we don't run into the 2^31 limit quite so quickly.
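Here is the arithmetic behind the limit, in R. With a cap of 2^31 - 1 nucleotides in one vector, the maximum number of reads is floor((2^31 - 1) / read length). For example, that gives 21,474,836 reads at 100 bp. One possible rewrite is to process reads in chunks so that no single unlist() call builds a vector longer than the limit. The chunking helper below is a hypothetical illustration, not polyester code.

```r
# Largest number of reads whose concatenated nucleotides fit in a vector of
# .Machine$integer.max (2^31 - 1) elements, the limit described above.
max_reads <- function(read_length) floor(.Machine$integer.max / read_length)
max_reads(100)  # 21474836

# Hypothetical rewrite idea: process read indices in chunks so that no single
# unlist() call has to build a vector longer than max_len nucleotides.
chunk_indices <- function(n_reads, read_length, max_len = .Machine$integer.max) {
  per_chunk <- max(1, floor(max_len / read_length))
  split(seq_len(n_reads), ceiling(seq_len(n_reads) / per_chunk))
}
lengths(chunk_indices(10, read_length = 5, max_len = 20))  # chunk sizes 4, 4, 2
```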

reader for cuffnorm output

Handling replicates in ballgown

Hello,
How can I specify different replicates for each sample in Ballgown? I have a time-course study with wild-type and mutant. For each sample I have three replicates.
I see that in the manual you have mentioned pData(bg) = data.frame(id=sampleNames(bg), group=rep(c(1,0), each=10)), which would be for distinguishing different groups such as WT and MT. However, I could not find anything in the manual or on GitHub about handling replicates in ballgown. What should my pData look like for a time course with WT and mutant, each with three replicates? Thanks.
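The manual's example uses one row per sample, with the first column being sampleNames(bg). Under that layout, replicates are just additional rows that share the same genotype and time values. Below is a sketch of such a table. It assumes three time points (the question doesn't say how many) and uses made-up sample IDs. In a real analysis the id column would be sampleNames(bg), in the same order as the samples in the object.

```r
# Hypothetical layout: 2 genotypes x 3 time points x 3 replicates = 18 samples.
# Sample IDs here are made up; in practice use sampleNames(bg), in the same order.
pd <- expand.grid(replicate = 1:3, time = c(0, 2, 4), genotype = c("WT", "MT"),
                  stringsAsFactors = FALSE)
pd$id <- paste(pd$genotype, paste0("t", pd$time), paste0("rep", pd$replicate), sep = "_")
pd <- pd[, c("id", "genotype", "time", "replicate")]
head(pd, 3)
```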

Library issues installing tablemaker

Hi,

Quick note:
I managed to install ballgown for R by following the README. One additional step I had to take was installing "sva" first, which you may want to mention alongside devtools in the README.

biocLite("sva")

Moving on to installing ballgown:

# I went to the directory:
cd ./ballgown/tablemaker/tablemaker-2.1.1
# The configure script fails to detect a required library
./configure
[more messages]
checking for boostlib >= 1.47.0... yes
checking for bamlib... configure: error: We could not detect the bam libraries (version  or higher). If you have a staged bam library (still not installed) please specify $BAM_ROOT in your environment and do not give a PATH to --with-bam option.  If you are sure you have bam installed, then check your version number looking in <bam/version.hpp>. See http://randspringer.de/bam for more documentation.

The strange thing is that I got the same error message for the boostlib library the first time I ran ./configure, except that it said (version 1.47 or higher). For bamlib, you'll notice the version is blank.
I solved the problem for boostlib by installing libboost1.48-dev through synaptic.
However, when I did the same for libbam-dev through synaptic, it installed, but the error message stayed the same. Is it a version issue?

Many thanks already for your advice

Error in installation

Hi,

I have tried to install the devel as well as the release version and failed on both counts.

I have installed all the required dependencies and updated all other tools, but it seems to require a newer version of GenomicRanges that is not available in Bioconductor.

"GenomicRanges' 1.16.3 is being loaded, but >= 1.17.25"

My R sessionInfo()

sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252

attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base

other attached packages:
[1] Biobase_2.24.0 rtracklayer_1.24.2 GenomicRanges_1.16.3 GenomeInfoDb_1.0.2 IRanges_1.22.9 BiocGenerics_0.10.0
[7] BiocInstaller_1.14.2

loaded via a namespace (and not attached):
Error in x[["Version"]] : subscript out of bounds
In addition: Warning message:
In FUN(c("BatchJobs", "BBmisc", "BiocParallel", "Biostrings", "bitops", :
DESCRIPTION file of package 'devtools' is missing or broken

simpler syntax for recovering gene ID for a transcript

maybe some shorter syntax for things like texpr(bg, 'all')$gene_id[match(blahblah),] and maybe even names(structure(bg)$trans) ?
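One possible shorter syntax is a small lookup helper built on match(). The sketch below is hypothetical and not part of ballgown. It assumes a transcript table with t_id and gene_id columns, as in t_data.ctab.

```r
# Hypothetical helper: look up gene IDs for transcript IDs in a transcript
# table with columns t_id and gene_id (as in t_data.ctab).
gene_for_transcript <- function(tx_table, t_ids) {
  tx_table$gene_id[match(t_ids, tx_table$t_id)]  # NA for unknown transcripts
}

tx <- data.frame(t_id = 1:4, gene_id = c("g1", "g1", "g2", "g3"),
                 stringsAsFactors = FALSE)
gene_for_transcript(tx, c(3, 1))  # "g2" "g1"
```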
