pathostat's People

Contributors

anfederico, ecastron, hpages, jasonzhao0307, link-ny, mani2012, mlbendall, nturaga, sanrrone, tfaits, vobencha

pathostat's Issues

Modularize code

Check out this article.
http://shiny.rstudio.com/articles/modules.html

In order to work together effectively, and to make PathoStat as flexible as possible, we are going to need to write code that can be easily plugged into the PathoStat core. In the above article, the RStudio folks have written up one possible way to do it, and I think it makes sense. I'm also open to other ideas.

I've started to do this for the Relative Abundance module, and I put an idea of what this might look like in this branch: https://github.com/mani2012/PathoStat/blob/newRAplots/R/relativeAbundance.R

My proposal is that each "top-level" module (relative abundance, diversity, differential expression, etc.) will live in its own file within the R/ directory. The module will be loaded into the Shiny server. If you want to make additional tabs within that module, I would suggest that you create these as submodules of the top-level module. This way, all the code and UI elements for a tab/module are in one place and should be easy to change in the future.

Create color scheme that can be used across modules

PathoStat should have a unified color scheme, plus a way to easily select complementary/contrasting colors.

For example, with abundance barplots, members of the same phylum could share a base color, and OTUs within each phylum could be shades of that base color. This is easier to read than random/sequential colors for each OTU.

Example from schizophrenia paper:
fig1.pdf
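
The phylum/shade idea above could be sketched roughly as follows. This is a language-agnostic illustration in Python, not PathoStat code: the function name `phylum_palette`, its parameters, and the choice of the HLS color space are all assumptions made for the example.

```python
import colorsys


def phylum_palette(otus_by_phylum, min_light=0.35, max_light=0.75, saturation=0.65):
    """Map each OTU to a hex color: one hue per phylum, one lightness per OTU.

    `otus_by_phylum` is a hypothetical input: phylum name -> list of OTU names.
    """
    palette = {}
    phyla = sorted(otus_by_phylum)
    for i, phylum in enumerate(phyla):
        # Evenly spaced hues keep the phyla visually distinct.
        hue = i / len(phyla)
        otus = otus_by_phylum[phylum]
        for j, otu in enumerate(otus):
            # Spread lightness across the OTUs of this phylum (shades of one hue).
            frac = j / (len(otus) - 1) if len(otus) > 1 else 0.5
            light = min_light + frac * (max_light - min_light)
            r, g, b = colorsys.hls_to_rgb(hue, light, saturation)
            palette[otu] = "#{:02x}{:02x}{:02x}".format(
                round(r * 255), round(g * 255), round(b * 255)
            )
    return palette


example_palette = phylum_palette({
    "Firmicutes": ["otu_a", "otu_b", "otu_c"],
    "Proteobacteria": ["otu_d", "otu_e"],
})
```

Because the mapping depends only on the phylum and OTU names, every module that calls the same helper would get the same colors, which is the point of a shared scheme.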

Accommodate phyloseq-class objects throughout

Currently, PathoStat takes a data matrix, batch information, and condition information as inputs, and is limited to those options. Instead, we're going to coerce any input (PathoScope report, .biom files, etc.) into a phyloseq-class object, and pass that object along. This lets the user attach any number of covariates or phenotypic data along with the data matrix. This also means a general overhaul of almost every function in the package.

Distinguish between discrete and continuous variables

In its current form, PathoStat accepts "batch" and "condition" as possible discrete variables, and gives the user the option to color/group data (in various plots) by either of those. However, we're adding functionality: PathoStat will accept any number of covariates, such as patient age, weight, race, disease status, whatever. We still want to let users color/group data based on these things, but that doesn't make much sense for continuous variables. Without binning, how do you group people by weight? You can, however, order data by continuous variables. We want to at least distinguish between the two types, and we may want to add functionality for continuous variables.
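
One possible way to tell the two kinds of covariate apart is a simple heuristic, sketched here in Python as a language-agnostic illustration. The function name `classify_covariate` and the `max_levels` threshold are assumptions for the example, not part of PathoStat, and a real implementation would probably let the user override the guess.

```python
from numbers import Real


def classify_covariate(values, max_levels=10):
    """Guess whether one covariate column is 'discrete' or 'continuous'."""
    present = [v for v in values if v is not None]
    # bool counts as a number in Python, so treat it as categorical explicitly.
    numeric = all(isinstance(v, Real) and not isinstance(v, bool) for v in present)
    if not numeric:
        return "discrete"
    # Whole-number columns with few distinct values (e.g. batch 1/2/3) act as groups.
    few_levels = len(set(present)) <= max_levels
    whole_numbers = all(float(v).is_integer() for v in present)
    if few_levels and whole_numbers:
        return "discrete"
    return "continuous"
```

Under this heuristic, discrete covariates would be offered for coloring/grouping and continuous ones for ordering. Note the limitation: a small study with whole-number ages would be guessed as discrete, which is why an override is worth having.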

Shiny not working

Hi,

I launched PathoStat interactively, but the Shiny app does not plot anything (I tried with the example data too). Do you know what might be happening?

Thanks!

OTU filtering

For those not at BU, we had a conversation today about how different analyses have different filtering requirements for the data. For example, you should not filter low-abundance OTUs for alpha diversity calculations, but there are other situations where you might want to filter for analysis or visualization. So we concluded:

  1. The entire raw PathoID reports should be read in and stored
  2. We need a general purpose function for filtering the data. For example, get only the top 10 OTUs, or get all OTUs that account for >1% of the data, or remove OTUs that are only present in one sample.
  3. There will be an intermediate layer that performs this filtering. Functions should assume that they are being handed a properly filtered object.

There are other details that need to be sorted out, such as how to track if users upload pre-filtered data, etc.
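
The general-purpose filter from point 2 could look something like the sketch below, written in Python as a language-agnostic illustration. The function name `filter_otus`, its parameters, and the dictionary layout are assumptions for the example. In particular, ">1% of the data" is interpreted here as a share of all reads across all samples, which may not be the definition the group intended.

```python
def filter_otus(counts, top_n=None, min_fraction=None, min_samples=None):
    """Return a filtered copy of `counts` (OTU name -> list of per-sample read counts).

    The raw input is never modified, so the unfiltered data stays available.
    """
    totals = {otu: sum(row) for otu, row in counts.items()}
    grand_total = sum(totals.values())
    keep = set(counts)
    if min_samples is not None:
        # Drop OTUs present (count > 0) in fewer than `min_samples` samples.
        keep &= {otu for otu, row in counts.items()
                 if sum(1 for c in row if c > 0) >= min_samples}
    if min_fraction is not None and grand_total > 0:
        # Keep OTUs whose share of all reads exceeds `min_fraction`.
        keep &= {otu for otu in counts if totals[otu] / grand_total > min_fraction}
    if top_n is not None:
        # Rank by total reads; break ties by name so the result is deterministic.
        ranked = sorted(keep, key=lambda otu: (-totals[otu], otu))
        keep = set(ranked[:top_n])
    return {otu: list(row) for otu, row in counts.items() if otu in keep}


raw_counts = {
    "otu_a": [50, 40, 10],
    "otu_b": [0, 5, 0],
    "otu_c": [3, 2, 1],
    "otu_d": [0, 0, 1],
}
top_two = filter_otus(raw_counts, top_n=2)
above_one_percent = filter_otus(raw_counts, min_fraction=0.01)
in_multiple_samples = filter_otus(raw_counts, min_samples=2)
```

An analysis such as alpha diversity would call its functions on `raw_counts` directly, while a barplot might use `top_two`; the analysis functions themselves never need to know which filter was applied.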

Coerce input data into phyloseq-class objects

We want to let users submit a range of data formats to PathoStat, but we want all of the processing, analyses, and outputs to be of a standard format. If we can coerce input data types (PathoScope reports, .biom files, QIIME output, etc.) into phyloseq objects, it will make everything easier to work with. This carries the added advantage of compatibility with a bunch of outside packages/tools.
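
The "many input formats, one internal object" idea can be illustrated with a small sketch. It is written in Python purely as a language-agnostic illustration and does not use or imitate the phyloseq API: the `Experiment` class, both converter functions, and the two input layouts are hypothetical stand-ins.

```python
from dataclasses import dataclass, field


@dataclass
class Experiment:
    """Hypothetical stand-in for the standard object every analysis receives."""
    counts: dict                                      # OTU -> {sample -> count}
    sample_data: dict = field(default_factory=dict)   # sample -> {covariate -> value}
    tax_table: dict = field(default_factory=dict)     # OTU -> {rank -> name}


def from_matrix(otus, samples, matrix, covariates=None):
    """Matrix-style input: rows are OTUs, columns are samples."""
    counts = {otu: dict(zip(samples, row)) for otu, row in zip(otus, matrix)}
    sample_data = {s: dict((covariates or {}).get(s, {})) for s in samples}
    return Experiment(counts, sample_data)


def from_records(records):
    """Long-format input: an iterable of (sample, otu, count) tuples."""
    counts, sample_data = {}, {}
    for sample, otu, count in records:
        row = counts.setdefault(otu, {})
        row[sample] = row.get(sample, 0) + count
        sample_data.setdefault(sample, {})
    # Fill absent sample/OTU pairs with zero so every OTU covers every sample.
    for row in counts.values():
        for sample in sample_data:
            row.setdefault(sample, 0)
    return Experiment(counts, sample_data)


exp_a = from_matrix(["otu_a", "otu_b"], ["s1", "s2"], [[5, 0], [2, 3]],
                    covariates={"s1": {"batch": 1}})
exp_b = from_records([("s1", "otu_a", 5), ("s1", "otu_b", 2), ("s2", "otu_b", 3)])
```

Both converters end in the same structure, so downstream code only has to handle one shape of data regardless of how it was uploaded.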

BatchQC dependency

runPathoStat fails if BatchQC is not installed:

Error in loadNamespace(name) : there is no package called ‘BatchQC’

Also found that BatchQC cannot be installed using biocLite; it must be installed with devtools::install_github.

Implement temporal analysis and visualization tab

Metagenomics as a field is starting to move to longitudinal/temporal analysis and visualizations. Yet, there are not many tools or packages with this type of functionality. This would be a new feature for PathoStat in the form of a tab.

  1. Implement analyses such as those described in PMID: 26005845
  2. Implement a visualization such as "alluvial plots"

example alluvial plot

Sample data and code for alluvial plot:

alluvial_paper.R
https://www.dropbox.com/s/d8hpyj0bt8ddtmv/sample_data.zip?dl=0

Taxonomy table data structure

There are a few issues with the way the taxonomy table is currently handled in PathoStat. I hope this can start a discussion about how we want to handle these things.

  • Should be able to get the ranks without hard coding.
    tax.name is hard-coded in ui.R. I've solved this without hard-coding in the core OTU module by getting the ranks from the PathoStat object using rank_names(). However, I don't think that phyloseq enforces whether the rank names are ordered. I suggest we override rank_names() and somehow enforce the hierarchical order.
  • Issues with "no rank".
    Classifications that do not fall into the usual taxonomic ranks are called "no rank" by NCBI. However, these classifications do not correspond to the same level! Sometimes this is a group ("Terrabacteria group"), sometimes this is "cellular organisms", sometimes it is a strain or other classification. The way it is handled now, all of these are treated as a single taxonomic rank. Also, as currently implemented, OTUs with multiple "no rank" classifications are overwritten. This should be changed if we consider the "no rank" information to be valuable.
  • Issues with "others"
    Not sure if this is a problem with the sample data or the way it is loaded, but there should not be "others" when loading the full data.
  • Missing data
    Many OTUs have missing data. Should we propagate data from higher taxonomic levels to fill in this missing data? For example, if OTUs are missing data from the class level, but have information at the phylum level, should we fill in the "class" field with the phylum? (This is how I've dealt with this issue in previous analyses.)
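
The fill-down approach described under "Missing data" can be sketched as below, in Python as a language-agnostic illustration. The function name `fill_missing_ranks` and the explicit `RANKS` list are assumptions for the example; keeping the ranks in one ordered list is also one simple way to enforce the hierarchical order raised in the first bullet.

```python
# Assumed rank order for the example, from highest to lowest level.
RANKS = ["superkingdom", "phylum", "class", "order", "family", "genus", "species"]


def fill_missing_ranks(taxonomy, ranks=RANKS):
    """Return a copy of one OTU's taxonomy with gaps filled from the level above.

    `taxonomy` maps rank -> name. Ranks above the first known one stay None.
    """
    filled = {}
    last_known = None
    for rank in ranks:
        value = taxonomy.get(rank)
        if value:
            last_known = value
        # Either the OTU's own value, or the nearest known higher-level name.
        filled[rank] = value if value else last_known
    return filled


example_taxonomy = fill_missing_ranks({"phylum": "Firmicutes", "genus": "Bacillus"})
```

In this example the missing class, order, and family fields all receive "Firmicutes", and the missing species field receives "Bacillus". Whether filled values should be marked as inferred is left open here, as it is in the discussion above.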
