mani2012 / pathostat



The purpose of this package is to perform statistical analysis on PathoScope-generated report files.

R 100.00%

pathostat's People


ecastron jasperyh wevanjohnson tseanlu tfaits wejlab ritah-nabunje jasonzhao0307 khemlalnirmalkar

pathostat's Issues

Load phyloseq object into shiny server (temporary)

Need some kind of object to work with until the PathoStat object is ready to go. We can just add a plain old phyloseq object for the time being.

Modularize code

Check out this article.
http://shiny.rstudio.com/articles/modules.html

To work together effectively, and to make PathoStat as flexible as possible, we need to write code that can easily be plugged into the PathoStat core. In the article above, the RStudio folks describe one possible way to do this, and I think it makes sense. I'm also open to other ideas.

I've started doing this for the Relative Abundance module and put an idea of what it might look like in this branch: https://github.com/mani2012/PathoStat/blob/newRAplots/R/relativeAbundance.R

My proposal is that each top-level module (relative abundance, diversity, differential expression, etc.) lives in its own file within the R/ directory and is loaded into the Shiny server. If you want additional tabs within a module, I suggest creating them as submodules of that top-level module. This way, all the code and UI elements for a tab or module live in one place and should be easy to change in the future.

Create color scheme that can be used across modules

PathoStat should have a unified color scheme, plus a way to easily select complementary or contrasting colors.

For example, in abundance barplots, members of the same phylum could share a base color, and OTUs within each phylum could be shades of that base color. This is easier to read than random or sequential colors for each OTU.

Example from schizophrenia paper:
fig1.pdf
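A minimal sketch of the phylum-based shading idea, written in Python purely for illustration (PathoStat itself is R; the function name `phylum_shaded_palette` and its parameters are hypothetical). Each phylum gets an evenly spaced base hue, and the OTUs within that phylum get progressively lighter shades of the same hue, computed with the standard-library `colorsys` module.

```python
import colorsys


def phylum_shaded_palette(otu_to_phylum, saturation=0.65, light_range=(0.35, 0.75)):
    """Map each OTU to a hex color: one base hue per phylum, one shade per OTU.

    otu_to_phylum: dict mapping OTU id -> phylum name (hypothetical input shape).
    """
    # Group OTUs by phylum; dicts keep insertion order, so colors are stable.
    groups = {}
    for otu, phylum in otu_to_phylum.items():
        groups.setdefault(phylum, []).append(otu)

    palette = {}
    n_phyla = len(groups)
    lo, hi = light_range
    for i, otus in enumerate(groups.values()):
        hue = i / n_phyla  # evenly spaced base hues around the color wheel
        n = len(otus)
        for j, otu in enumerate(otus):
            # Spread lightness across the range; a lone OTU gets the midpoint.
            lightness = (lo + hi) / 2 if n == 1 else lo + (hi - lo) * j / (n - 1)
            r, g, b = colorsys.hls_to_rgb(hue, lightness, saturation)
            palette[otu] = "#{:02x}{:02x}{:02x}".format(
                round(r * 255), round(g * 255), round(b * 255)
            )
    return palette
```

Evenly spaced hues keep phyla distinguishable when there are only a few of them; with many phyla, a curated qualitative palette may read better than evenly spaced hues.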

Accommodate phyloseq-class objects throughout

Currently, PathoStat takes a data matrix, batch information, and condition information as inputs, and is limited to those options. Instead, we're going to coerce any input (PathoScope reports, .biom files, etc.) into a phyloseq-class object and pass that object along. This allows the user to attach any number of covariates or phenotypic data to the data matrix. It also means a general overhaul of almost every function in the package.

Distinguish between discrete and continuous variables

In its current form, PathoStat accepts "batch" and "condition" as possible discrete variables and gives the user the option to color or group data (in various plots) by either of them. However, we're adding functionality: PathoStat will accept any number of covariates, such as patient age, weight, race, or disease status. We still want to let users color and group data by these covariates, but grouping doesn't make much sense for continuous variables: without binning, how do you group people by weight? You can, however, order data by continuous variables. We want to at least distinguish between the two types, and we may want to add functionality specific to continuous variables.
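One way to make the distinction, sketched in Python for illustration only. The rule used here (non-numeric covariates, or numeric ones with at most `max_discrete_levels` distinct values, count as discrete) is an assumed heuristic, not something the issue specifies, and both function names are hypothetical. The second helper shows the "order, don't group" treatment for continuous covariates.

```python
from numbers import Number


def classify_covariates(sample_data, max_discrete_levels=5):
    """Label each covariate 'discrete' or 'continuous' using a simple heuristic.

    sample_data: dict mapping covariate name -> list of per-sample values.
    """
    kinds = {}
    for name, values in sample_data.items():
        observed = [v for v in values if v is not None]  # ignore missing values
        # bool is a Number subclass in Python; treat True/False as categories.
        numeric = all(isinstance(v, Number) and not isinstance(v, bool) for v in observed)
        # Non-numeric columns, and numeric ones with only a handful of distinct
        # values (e.g. 0/1 codes), are treated as grouping variables.
        if not numeric or len(set(observed)) <= max_discrete_levels:
            kinds[name] = "discrete"
        else:
            kinds[name] = "continuous"
    return kinds


def order_samples_by(sample_ids, values):
    """Order samples by a continuous covariate (assumes no missing values)."""
    ranked = sorted(zip(values, sample_ids), key=lambda pair: pair[0])
    return [sample for _, sample in ranked]
```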

Implement core microbiome heatmap

From coremicrobiome.R

Shiny not working

Hi,

When I launch PathoStat interactively, the Shiny app does not plot anything (I tried with the example data too). Do you know what might be happening?

Thanks!

OTU filtering

For those not at BU, we had a conversation today about how different analyses have different filtering requirements for the data. For example, you should not filter low-abundance OTUs for alpha diversity calculations, but there are other situations where you might want to filter for analysis or visualization. So we concluded:

The raw PathoID reports should be read in and stored in their entirety.
We need a general-purpose function for filtering the data, for example to keep only the top 10 OTUs, keep all OTUs that account for >1% of the data, or remove OTUs that are present in only one sample.
There will be an intermediate layer that performs this filtering. Downstream functions should assume they are handed a properly filtered object.
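A minimal sketch of the general-purpose filter described above, in Python for illustration only (the function name `filter_otus`, its parameters, and the dict-of-lists table are hypothetical stand-ins for PathoStat's R objects). It covers the three examples from the discussion and returns a new table, so the raw data stays intact for analyses such as alpha diversity that should not be filtered.

```python
def filter_otus(counts, top_n=None, min_fraction=None, min_samples=None):
    """Return a filtered copy of an OTU count table; the raw table is untouched.

    counts: dict mapping OTU id -> list of read counts (one per sample).
    top_n: keep only the N most abundant OTUs (by total reads).
    min_fraction: keep OTUs whose total exceeds this fraction of all reads (e.g. 0.01).
    min_samples: keep OTUs present (count > 0) in at least this many samples.
    """
    totals = {otu: sum(row) for otu, row in counts.items()}
    grand_total = sum(totals.values())
    keep = set(counts)

    if min_fraction is not None and grand_total > 0:
        keep = {o for o in keep if totals[o] / grand_total > min_fraction}
    if min_samples is not None:
        keep = {o for o in keep if sum(c > 0 for c in counts[o]) >= min_samples}
    if top_n is not None:
        # Rank in original table order; sorted() is stable, so ties keep that order.
        ranked = sorted((o for o in counts if o in keep), key=lambda o: totals[o], reverse=True)
        keep = set(ranked[:top_n])

    # Preserve the original OTU order and copy rows so callers can't mutate the raw data.
    return {otu: list(row) for otu, row in counts.items() if otu in keep}
```

Filters are applied in a fixed order (fraction, then prevalence, then top N); whether that order is the right default is one of the details still to be sorted out.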

There are other details that need to be sorted out, such as how to track if users upload pre-filtered data, etc.

Coerce input data into phyloseq-class objects

We want to let users submit a range of data formats to PathoStat, but we want all of the processing, analyses, and outputs to use a standard format. If we can coerce input data types (PathoScope reports, .biom files, QIIME output, etc.) into phyloseq objects, everything will be easier to work with. This has the added advantage of compatibility with many outside packages and tools.

Improve formatting of taxonomy table

The Core OTU tab uses this, and other modules probably will as well.

Remove dependency on "microbiome"

The core microbiome functions depend on functions in https://github.com/microbiome/microbiome, which introduces a lot of additional dependencies.

BatchQC dependency

runPathoStat fails if BatchQC is not installed:

Error in loadNamespace(name) : there is no package called ‘BatchQC’

I also found that BatchQC cannot be installed using biocLite; it must be installed with devtools::install_github.

Implement core microbiome 2D plot

From coremicrobiome.R

Implement temporal analysis and visualization tab

Metagenomics as a field is starting to move toward longitudinal/temporal analyses and visualizations, yet there are not many tools or packages with this type of functionality. This would be a new feature for PathoStat in the form of a tab.

Implement analyses such as those described in PMID: 26005845
Implement visualizations such as alluvial plots

example alluvial plot

Sample data and code for alluvial plot:

alluvial_paper.R
https://www.dropbox.com/s/d8hpyj0bt8ddtmv/sample_data.zip?dl=0

Taxonomy table data structure

There are a few issues with the way the taxonomy table is currently handled in PathoStat. I hope this starts a discussion about how we want to handle them.

We should be able to get the ranks without hard-coding them.
tax.name is hard-coded in ui.R. I've avoided hard-coding in the core OTU module by getting the ranks from the PathoStat object with rank_names(). However, I don't think phyloseq enforces that the rank names are in order. I suggest we override rank_names() and somehow enforce the hierarchical order.
Issues with "no rank".
Classifications that do not fall into the usual taxonomic ranks are labeled "no rank" by NCBI. However, these classifications do not correspond to the same level! Sometimes it is a group ("Terrabacteria group"), sometimes it is "cellular organisms", and sometimes it is a strain or another classification. As currently handled, all of these are treated as a single taxonomic rank. Also, as currently implemented, when an OTU has multiple "no rank" classifications, all but one are overwritten. This should be changed if we consider the "no rank" information to be valuable.
Issues with "others"
I'm not sure whether this is a problem with the sample data or with the way it is loaded, but there should not be any "others" when loading the full data.
Missing data
Many OTUs have missing data at some ranks. Should we propagate data from higher taxonomic levels to fill in this missing data? For example, if OTUs are missing data at the class level but have information at the phylum level, should we fill in the "class" field with the phylum? (This is how I've dealt with this issue in previous analyses.)
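A sketch of two of the proposals above, in Python for illustration only: enforcing hierarchical order on rank names, and filling missing ranks from the nearest higher level. The canonical rank list, the case-insensitive matching, and both function names are assumptions, not PathoStat or phyloseq behavior. Unrecognized ranks such as "no rank" are rejected rather than silently treated as a level, matching the concern raised above.

```python
# Assumed canonical hierarchy, highest to lowest; adjust to the ranks actually in use.
CANONICAL_RANKS = ["superkingdom", "kingdom", "phylum", "class", "order", "family", "genus", "species"]


def ordered_ranks(rank_names):
    """Return the given rank names in hierarchical order (case-insensitive match)."""
    unknown = [r for r in rank_names if r.lower() not in CANONICAL_RANKS]
    if unknown:
        # e.g. NCBI "no rank" entries do not map to a single level.
        raise ValueError(f"unrecognized ranks: {unknown}")
    lookup = {r.lower(): r for r in rank_names}
    return [lookup[r] for r in CANONICAL_RANKS if r in lookup]


def fill_missing_ranks(taxonomy, ranks):
    """Fill each empty rank with the nearest higher-level classification.

    taxonomy: dict mapping OTU id -> dict of rank -> name (None or "" = missing).
    ranks: rank names ordered from highest to lowest.
    """
    filled = {}
    for otu, lineage in taxonomy.items():
        row, last = {}, None
        for rank in ranks:
            value = lineage.get(rank)
            if value:
                last = value
            # A missing top rank stays None because there is nothing above it.
            row[rank] = value if value else last
        filled[otu] = row
    return filled
```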

Check that cached taxonomy matches with loaded reports

The taxonomy cache is loaded without checking that it is consistent with the PathoID reports. We may need to go and fetch missing taxonomy IDs.

This should not be relevant once we start caching the PathoStat object.
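A minimal sketch of the consistency check, in Python for illustration only. The `fetch` argument is a hypothetical callable (for instance, a wrapper around an NCBI Taxonomy lookup) that returns a dict of ID to lineage; the function name and the dict-based cache are also assumptions. Only IDs missing from the cache are fetched, and an error is raised if any still cannot be resolved.

```python
def reconcile_taxonomy_cache(report_ids, cache, fetch):
    """Ensure every taxonomy ID in the reports has a cached lineage.

    report_ids: taxonomy IDs found in the loaded PathoID reports.
    cache: dict mapping taxonomy ID -> lineage; updated in place.
    fetch: hypothetical callable taking a list of IDs and returning a dict
           of ID -> lineage for the ones it could resolve.
    Returns the sorted list of IDs that had to be fetched.
    """
    missing = sorted(set(report_ids) - set(cache))
    if missing:
        fetched = fetch(missing)
        not_found = [i for i in missing if i not in fetched]
        if not_found:
            raise KeyError(f"no taxonomy found for IDs: {not_found}")
        cache.update(fetched)
    return missing
```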