qiime2 / q2-longitudinal

QIIME 2 plugin for paired sample comparisons

License: BSD 3-Clause "New" or "Revised" License

Python 95.34% HTML 3.18% Makefile 0.09% TeX 0.74% CSS 0.66%

hacktoberfest

q2-longitudinal's Introduction

q2-longitudinal

This is a QIIME 2 plugin. For details on QIIME 2, see https://qiime2.org.

q2-longitudinal's People

gregcaporaso ebolyen jairideout elong0527 jakereps turanoo stephanieorch chriskeefe oddant1 andrewsanchez timyerg sterrettjd nbokulich eldeveloper lizgehret hagenjp colinvwood ahderojas

q2-longitudinal's Issues

spaghetti sample information mouse-over

spaghetti is great, but (in the words of @gregcaporaso ):

users are going to want to know which subjects the outlier lines (or any lines, for that matter) in these plots are... For example, you might be able to achieve this with mouse-overs that highlight a specific line and give more information about it including the subject id.

improve error messages on bad column names

$ qiime intervention paired-differences     --m-metadata-file ecam_map_maturity.txt     --m-metadata-file ecam_shannon.qza     --p-metric shannon     --p-group-column delivery     --p-state-column month     --p-state-1 12     --p-state-2 0     --p-individual-id-column not-a-column     --o-visualization ecam-delivery-alpha     --p-no-drop-duplicates --verbose
Traceback (most recent call last):
  File "/Users/gregcaporaso/miniconda3/envs/qiime2-2017.7/lib/python3.5/site-packages/pandas/core/indexes/base.py", line 2442, in get_loc
    return self._engine.get_loc(key)
  File "pandas/_libs/index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc (pandas/_libs/index.c:5280)
  File "pandas/_libs/index.pyx", line 154, in pandas._libs.index.IndexEngine.get_loc (pandas/_libs/index.c:5126)
  File "pandas/_libs/hashtable_class_helper.pxi", line 1210, in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas/_libs/hashtable.c:20523)
  File "pandas/_libs/hashtable_class_helper.pxi", line 1218, in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas/_libs/hashtable.c:20477)
KeyError: 'not-a-column'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/gregcaporaso/miniconda3/envs/qiime2-2017.7/lib/python3.5/site-packages/q2cli/commands.py", line 222, in __call__
    results = action(**arguments)
  File "<decorator-gen-251>", line 2, in paired_differences
  File "/Users/gregcaporaso/miniconda3/envs/qiime2-2017.7/lib/python3.5/site-packages/qiime2/sdk/action.py", line 201, in callable_wrapper
    output_types, provenance)
  File "/Users/gregcaporaso/miniconda3/envs/qiime2-2017.7/lib/python3.5/site-packages/qiime2/sdk/action.py", line 392, in _callable_executor_
    ret_val = callable(output_dir=temp_dir, **view_args)
  File "/Users/gregcaporaso/miniconda3/envs/qiime2-2017.7/lib/python3.5/site-packages/q2_intervention/_intervention.py", line 38, in paired_differences
    drop_duplicates=drop_duplicates)
  File "/Users/gregcaporaso/miniconda3/envs/qiime2-2017.7/lib/python3.5/site-packages/q2_intervention/_utilities.py", line 36, in _get_group_pairs
    for individual_id in set(group_md[individual_id_column]):
  File "/Users/gregcaporaso/miniconda3/envs/qiime2-2017.7/lib/python3.5/site-packages/pandas/core/frame.py", line 1964, in __getitem__
    return self._getitem_column(key)
  File "/Users/gregcaporaso/miniconda3/envs/qiime2-2017.7/lib/python3.5/site-packages/pandas/core/frame.py", line 1971, in _getitem_column
    return self._get_item_cache(key)
  File "/Users/gregcaporaso/miniconda3/envs/qiime2-2017.7/lib/python3.5/site-packages/pandas/core/generic.py", line 1645, in _get_item_cache
    values = self._data.get(item)
  File "/Users/gregcaporaso/miniconda3/envs/qiime2-2017.7/lib/python3.5/site-packages/pandas/core/internals.py", line 3590, in get
    loc = self.items.get_loc(item)
  File "/Users/gregcaporaso/miniconda3/envs/qiime2-2017.7/lib/python3.5/site-packages/pandas/core/indexes/base.py", line 2444, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc (pandas/_libs/index.c:5280)
  File "pandas/_libs/index.pyx", line 154, in pandas._libs.index.IndexEngine.get_loc (pandas/_libs/index.c:5126)
  File "pandas/_libs/hashtable_class_helper.pxi", line 1210, in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas/_libs/hashtable.c:20523)
  File "pandas/_libs/hashtable_class_helper.pxi", line 1218, in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas/_libs/hashtable.c:20477)
KeyError: 'not-a-column'

Plugin error from intervention:

  'not-a-column'

See above for debug info.

Should instead say something like: The individual column specified (not-a-column) is not a column name in the sample metadata. Available columns are: ...

Confirm that any time a category is passed as a type different than qiime2.MetadataCategory that you catch these issues.
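A minimal sketch of the kind of check suggested above, assuming the metadata column names are available as a plain list. The helper name `_validate_column` and its `role` argument are hypothetical and are not part of the plugin.

```python
def _validate_column(column, available_columns, role):
    """Raise a readable error if `column` is not among `available_columns`.

    Hypothetical helper for illustration only; `role` is a label such as
    "individual" that is used solely in the error message.
    """
    if column not in available_columns:
        raise ValueError(
            "The %s column specified (%s) is not a column name in the "
            "sample metadata. Available columns are: %s"
            % (role, column, ", ".join(sorted(available_columns))))


columns = ["delivery", "month", "studyid"]
_validate_column("month", columns, "state")  # valid name: no error raised

try:
    _validate_column("not-a-column", columns, "individual")
except ValueError as err:
    message = str(err)  # readable message instead of a bare KeyError
```

Running a check like this before any DataFrame indexing would let the plugin report the bad name and the valid choices, rather than surfacing the pandas KeyError.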

ENH: new action: NMIT

https://www.ncbi.nlm.nih.gov/pubmed/28872698

suggestions for visualizations

In the paired-differences boxplot, the y-axis label should be Difference in {metric} (state 2 - state 1), or you could get more fancy with it and actually use the state_column, state1 and state2 variables, in which case it could be: Difference in {metric} ({state_column} {state2} - {state_column} {state1}) (e.g., Difference in shannon (month 12 - month 0))

In the Paired difference tests table, can you include the test name and the test statistic name? Currently it just says stat, but you should be able to keep a dict mapping test name to test statistic name so that this label could be more informative. See here for an example. Also, please make the P column label say P-value, and change FDR P to FDR P-value.

Can the Multiple group tests table be transposed so it matches the others?

I'm confused about what the difference is between the Multiple group tests and Pairwise comparison tests tables when there are only two groups. It might help to have a brief description of what each test is (and including the test name in each would help with this). When there are only two groups, should the results of these tests be different? They are in the README example, so I'm just confirming that this is expected.

These should also be applied to the pairwise-distances visualization.
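The labeling suggestions above could be sketched roughly as follows. The dict name, its keys, and the function `difference_label` are illustrative assumptions; the statistic symbols are the conventional ones for each test.

```python
# Conventional test statistic symbols for the tests named in the plugin's
# help text. The dict name and the exact key strings are assumptions.
TEST_STATISTIC_NAMES = {
    "ANOVA": "F",
    "t-test": "t",
    "Kruskal-Wallis": "H",
    "Wilcoxon signed-rank": "W",
    "Mann-Whitney U": "U",
}


def difference_label(metric, state_column, state_1, state_2):
    """Build a y-axis label of the form suggested in this issue."""
    return "Difference in {0} ({1} {3} - {1} {2})".format(
        metric, state_column, state_1, state_2)


label = difference_label("shannon", "month", 0, 12)
stat_header = TEST_STATISTIC_NAMES["Kruskal-Wallis"]
```

With the example values from the issue, `label` reads "Difference in shannon (month 12 - month 0)".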

technicolor spaghetti

Improvement Description
spaghetti color is defined by the group value at the initial state.

ideally, spaghetti color should change dynamically. E.g., some metadata categories (like antibiotic use or other exposures) may change longitudinally for a subject. It would be nice to capture those.

Comments
That probably cannot be done easily... but it would be pretty cool if it could.

ENH: `pairwise-distances` test to compare within- and between-sample distances

References
This beta-dispersion test might be a good candidate.

`volatility`: plot subpanel with N per group

Proposed Behavior
show N per group per state as histogram or line plot sharing axis with main (volatility) plot.

and/or toggle sample size in x-axis label?

Comments
If that's difficult/ugly forget about it — but it might save folks from manually typing in this info for pub-ready figures.

Mirkat and PERMANOVA-S implementation

Improvement Description
Just wondering if there are plans to implement the two methods below. They can also be used for longitudinal microbiome analysis with a given distance matrix.

References

blank plots created if no samples found that correspond to a state value

This will be confusing for users if they accidentally specify a state value that doesn't correspond to something in their data...

You should probably throw an error if there are no paired samples being evaluated.

This also suggests that it's going to be important to tell the user how many samples were included in each test. Could you include n (number of paired samples per group) in all of the tables? See the pairwise table here for one example of where we do this.
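One way to cover both requests (fail when nothing is paired, and report n per group) is sketched below. It assumes the samples are available as (individual, group, state) tuples; the function name and the example records are hypothetical.

```python
from collections import defaultdict


def count_pairs_per_group(records, state_1, state_2):
    """Count individuals per group that were sampled at both states.

    `records` is an iterable of (individual_id, group, state) tuples.
    Raises ValueError when no individual has both states.
    """
    states_seen = defaultdict(set)
    group_of = {}
    for individual, group, state in records:
        states_seen[individual].add(state)
        group_of[individual] = group

    counts = defaultdict(int)
    for individual, states in states_seen.items():
        if state_1 in states and state_2 in states:
            counts[group_of[individual]] += 1

    if not counts:
        raise ValueError(
            "No paired samples found for states %r and %r. Check that both "
            "values occur in the state column." % (state_1, state_2))
    return dict(counts)


# Made-up records for illustration only.
records = [
    ("a", "group1", 0), ("a", "group1", 12),
    ("b", "group2", 0), ("b", "group2", 12),
    ("c", "group2", 0),  # no sample at state 12, so not counted
]
n_per_group = count_pairs_per_group(records, 0, 12)
```

The returned counts could then be shown as the n column in each results table.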

BUG: LME plots fail to generate when single variable is provided

When a single independent variable is used for LME, plots fail to generate because a single AxesSubplot object is generated — the current code expects multiple variables/subplots, and the ability to index these.

Key error is here:

File "/Users/nbokulich/miniconda3/envs/qiime2-2017.8/lib/python3.5/site-packages/q2_longitudinal/_utilities.py", line 351, in _regplot_subplots_from_dataframe
    ax=axes[num], lowess=lowess, ci=ci)
TypeError: 'AxesSubplot' object does not support indexing

This bug is noted in this forum post.
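A possible workaround, sketched under the assumption that the plotting code only needs to index a flat sequence of axes: normalize whatever the subplot call returned into a list. The helper name `as_axes_list` is hypothetical. Another option is `matplotlib.pyplot.subplots(..., squeeze=False)`, which always returns a 2-D array of axes.

```python
def as_axes_list(axes):
    """Return `axes` as a flat list, whether it is one object or a sequence.

    A single axes object is not iterable, so list() raises TypeError and
    the object is wrapped in a one-element list instead.
    """
    try:
        return list(axes)
    except TypeError:
        return [axes]


class FakeAxes(object):
    """Stand-in for a single, non-indexable axes object."""


single = FakeAxes()
several = [FakeAxes(), FakeAxes()]

axes_from_single = as_axes_list(single)    # one-element list
axes_from_several = as_axes_list(several)  # unchanged length
```

After this normalization, `axes[num]` works for one independent variable as well as for several.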

BUG: volatility: label sorting on plots occasionally breaks

Most plots I've seen are in the correct order, but not this subplot:

Other subplots in the same plot work, including the following that contains these same data (but different labels), so this may be a labeling issue, not a sorting issue.

`volatility`: show/hide groups/individuals interactively

Improvement Behavior
click to show/hide individuals/groups.

Current Behavior
currently, can click on the group legend to show single groups, but this only does one at a time.

Proposed Behavior
Would be very useful to, e.g., drop one or more groups to focus on specific groups for comparison.

Comment
same with individuals (spaghetti) but no such feature exists. Would be very helpful to hide all spaghetti but one, for example, to compare an individual's trajectory vs. the group mean.

linear-mixed-effects: move LME results table to top of visualization

These are the actual test results! The figures are ornamental.

Standard way to import data in Python

I am exploring how to import taxa and mapping data in Python in order to develop new functions.

Could you show me a quick example of how to import both the taxa and mapping files, and then write a function (following the _utility.py style) to calculate beta diversity at Week 0?

Below is my code to import the taxa information. I am not sure what the standard way is to import mapping information with the QIIME 2 Artifact API.

import pandas as pd
from qiime2 import Artifact

# Load the feature table artifact and view it as a pandas DataFrame
taxa = Artifact.load("../tutorial_data/ecam-table-taxa.qza")
taxa_df = taxa.view(pd.DataFrame)

rename visualizers for clarity

The pairwise test visualizers should be renamed to pairwise-differences and pairwise-distances for clarity. @nbokulich and I discussed this offline.

ENH: add function to check for missing samples/time points for each individual
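A rough sketch of what such a check might look like, assuming samples are available as (individual, state) pairs and that "missing" means a state observed for at least one individual but not for this one. The function name and example data are hypothetical.

```python
def find_missing_states(records):
    """Map each individual to the sorted list of states it is missing.

    `records` is an iterable of (individual_id, state) pairs. Individuals
    that were sampled at every observed state are omitted from the result.
    """
    observed = {}
    for individual, state in records:
        observed.setdefault(individual, set()).add(state)

    all_states = set()
    for states in observed.values():
        all_states |= states

    return {
        individual: sorted(all_states - states)
        for individual, states in observed.items()
        if all_states - states
    }


# Made-up records: individual "b" has no sample at state 1.
records = [("a", 0), ("a", 1), ("a", 2), ("b", 0), ("b", 2)]
missing = find_missing_states(records)
```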

add download links for tsv of raw difference/distance data

it might be really useful for users to be able to download the raw distances/differences. this could look like a sample metadata file where the rows are:

sample-id <tab> {metric} difference <tab> group

sample-id <tab> {metric} distance <tab> group
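The file layout described above could be produced with the standard `csv` module, as in this sketch. The function name and the example rows are made up for illustration.

```python
import csv
import io


def write_differences_tsv(rows, metric, handle):
    """Write (sample-id, difference, group) rows as tab-separated text."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["sample-id", "%s difference" % metric, "group"])
    writer.writerows(rows)


# Made-up values; a real visualizer would write to a file in its output dir.
buffer = io.StringIO()
write_differences_tsv(
    [("s1", 0.5, "groupA"), ("s2", -1.2, "groupB")], "shannon", buffer)
tsv_text = buffer.getvalue()
```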

ENH: new action: conditionally rare taxa

References
http://mbio.asm.org/content/5/4/e01371-14.full

volatility plots: sundry interactive features

zoom
toggle on/off 🍝
toggle on/off control limits
toggle on/off mean group trajectories
interactive color palette? Could be emperor style custom palette, or more likely just the selection of palettes currently exposed with the palette parameter
selection of different metrics and metadata grouping values (akin to alpha-rarefaction) could be useful, though this will require more structural changes to how these are handled at input

Actually, many of the parameters for this action could be useful as interactive features. E.g., interactively set x-tick intervals, yscale, xscale, but these are less important than those listed above.

`volatility`: add feature: zoom

References
https://vega.github.io/vega/examples/zoomable-scatter-plot/

new method: `explore_metadata`: plot longitudinal sample metadata categories/values

Example and idea provided by @elong0527:

X-axis = time (or other continuous metric)

y-axis = subject ID

points colored by group category (should accept categorical or continuous metadata, infer type, and color-code accordingly)

Could also add a parameter to change size or shape of points based on other optional metadata category inputs???

strange results if state 1 value equals state 2 value

I think all differences should be zero, but that's not what we're seeing:

update `paired-differences` example in readme to use a .qza as input

You mention that this is possible, but it'd be better to just use that in your example since that's the preferred way to do this (since it retains provenance where exporting the alpha diversity data wouldn't).

--m-metadata-file ecam_map_maturity.txt

You could also link to the metadata tutorial, which has a good description of this.

linear-mixed-effects: expose random slope/intercept parameters

ENH: new action: `compute-first-differences`

Computes first differences (differences in Y between sequential samples across time X)
X    Y    FD
1    1
2    3    2
3    7    4
4    11   4

Accepts metadata files, [optionally] distance matrix or feature table

Support value interpolation? Would be easy for series data but not for matrix!
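For a simple series, the computation in the table above can be sketched as follows, assuming the values are already sorted by X. The function name is hypothetical, and this does not cover the distance matrix or feature table cases.

```python
def first_differences(values):
    """Return differences between each value and the one before it.

    The first sample has no predecessor, so the result is one element
    shorter than the input.
    """
    return [later - earlier for earlier, later in zip(values, values[1:])]


y = [1, 3, 7, 11]            # Y values from the table, ordered by X
fd = first_differences(y)    # matches the FD column: 2, 4, 4
```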

`volatility`: make `group-column` optional

CLARIFICATION: without a group-column selected, mean lines should still be drawn, but calculated across all samples rather than aggregating by group. (edited 4/23/18)

new methods: longitudinal distance/difference from baseline

or other specified time point

ΔYt=Yt−Y0
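A minimal sketch of the formula above for one individual, assuming that individual's values are stored as a dict from state to value. The function and parameter names are hypothetical.

```python
def difference_from_baseline(series, baseline_state=0):
    """Return {state: Y_t - Y_baseline} for one individual's series.

    `series` maps each state (e.g. a time point) to the metric value.
    """
    if baseline_state not in series:
        raise ValueError(
            "Baseline state %r was not found in the series." % baseline_state)
    baseline_value = series[baseline_state]
    return {state: value - baseline_value for state, value in series.items()}


# Made-up series for illustration.
deltas = difference_from_baseline({0: 1, 1: 3, 2: 7, 3: 11})
```

Passing a different `baseline_state` covers the "other specified time point" case, and the baseline's own difference is zero by construction.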

Add Citations

Should use the new citation API in qiime2/qiime2#387

add raw data download for all plots

see @gregcaporaso 's comment in #36

One more thought: Would it be worth adding a download link for the data used to generate these plots? Since we're not including any statistics, it could be useful to allow the user to get that data to do statistics on their own. If we did that, I think you'd want a tsv file that looks something like:
delivery  month  studyid  shannon
vaginal  0  42  2.2
cesarian  0  43  3.0
(EDITED: to make the example file tab-separated text instead of comma-separated)

Strange plots resulting from different usage of q2-longitudinal

Hello,
I am using q2-longitudinal in a somewhat different way than it was originally conceived for, and some strange behavior has resulted. The issue arose when creating a volatility plot with this command:
qiime longitudinal volatility \
  --m-metadata-file ../EXMP_Sample_metadata_3_17_2018.tsv \
  --m-metadata-file EXMP-200-4562-single-core-metrics-results/shannon_vector.qza \
  --p-metric shannon \
  --p-group-column activity \
  --p-state-column sample_number \
  --p-individual-id-column redcap_survey_identifier \
  --p-spaghetti yes \
  --o-visualization EXMP-200-4562-single-shannon-volatility.qzv
I am introducing an intervention period to be compared to a baseline period both prior to and after the intervention. As you can see, this creates an unusual volatility plot:

I am not sure this strange behavior is a bug; it may instead arise from the different way I am trying to do things.
Thanks,
Arron Shiffer

Drop README tutorial(s)

As with the other plugins in QIIME 2, we provide official tutorials as part of https://github.com/qiime2/docs, and unofficial tutorials on the forum. We should clear out this README of the existing tutorial content and ensure that things get moved to the appropriate location (docs or forum).

LME visualization suggestions

It would be helpful to link to a key that would help with interpreting the Model summary and Model results sections. We're going to get a lot of questions about interpreting these (I'm not exactly sure how to interpret them myself). I would recommend including links in those sections of the visualization if possible, and if not, expanding on the interpretation in the tutorial.

The meaning of Tutorial Data

How could I find the explanation of data in the folder "tutorial_data"?

I have a few questions.

I assume ecam-table-taxa.qza contains species level taxa data
I assume ecam_map_maturity.txt contains mapping information.

Question:

What is ecam-table-maturity.qza?

HTML title displaying name of utility function instead of visualizer

This can be fixed by specifying a {% block title %} in the base template and passing in the correct title via the context (it looks like the title is generally already available, for other parts of the HTML document).

new method: longitudinally shared features

i.e., between an individual at time t and baseline (or another specified time)

LME singular matrix error should fail gracefully

Some inputs to LME will result in a singular matrix error:

File "/Users/nbokulich/miniconda3/envs/qiime2-2017.8/lib/python3.5/site-packages/numpy/linalg/linalg.py", line 90, in _raise_linalgerror_singular
    raise LinAlgError("Singular matrix")
numpy.linalg.linalg.LinAlgError: Singular matrix

This is not a bug — it is due to improper inputs to LME — but should fail gracefully so users can respond appropriately.

The issue seems to be that if the independent variables passed to LME are covariates, this results in a singular matrix error, either due to the correlation between covariates or to the lack of variance between group subcategories.

This issue was reported in this forum post.

`volatility`: download SVG

:spaghetti: plots: MPL -> D3

Blocking #79

replace print statements with warnings

see @gregcaporaso 's comment here
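As an illustration of the change, a print statement can be swapped for `warnings.warn` from the standard library. The function and the dropped-duplicates scenario below are hypothetical examples, not the plugin's actual messages.

```python
import warnings


def report_dropped_duplicates(n_dropped):
    """Warn (rather than print) when duplicate samples were dropped."""
    if n_dropped:
        warnings.warn(
            "%d duplicate samples were dropped." % n_dropped, UserWarning)


# Capture the warning here only to show what was emitted.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    report_dropped_duplicates(2)

first_message = str(caught[0].message)
```

Unlike print output, warnings can be filtered, captured in tests, or escalated to errors by the caller.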

outdated documentation in `paired-differences`

I think this must be left over from before we had optional artifact support (the feature table is now optional in this visualizer, this documentation is just out of date): A feature table artifact is required input, though whether "metric" is derived from the feature table or metadata is optional.

adjust alpha level in linear-mixed-effects scatterplots

Current Behavior
issues with glyph overdraw:

convert `metric` to a `MetadataCategory`?

(unless if MetadataCategories ever transpire).

collapse "pairwise-distances" and "paired-differences" into one action

use optional artifacts and "metric" to determine behavior

ENH: volatility: user-define or automatically scale x_tick intervals

Default is to show all ticks, which gets very ugly in long, frequently sampled experiments:

pairwise-differences: make `group-column` optional

Thus paired tests can be performed in a single group. E.g., does metric X change between states 1 and 2 in ALL samples (not stratified by group).

forum xref and xref

ENH: control charts and volatility analysis

Comments
this would be a new visualizer. thanks to @antgonza for the suggestion.

References
See compare_trajectories.py

ENH: volatility: add option to plot lines for each individual's trajectory in control charts

show all individuals in control charts, colored by group membership

use drop-down menu to select plot to display in visualization

Currently, linear-mixed-effects displays one plot for each independent factor input to the model. Instead of displaying all plots simultaneously, it could be useful to allow the user to choose which plot to display using a drop-down menu.

parameter suggestions

There are a lot of default metadata column names. I would make these required parameters (without defaults), because it's unlikely that your defaults will be useful for most people, and if they're required, users won't think that they need to rename the columns in their sample metadata.

Instead of using the term Metadata category, can you use Metadata column? We're switching our terminology since category doesn't necessarily make sense for continuous variables, so it'd be good to start making that change in documentation. For example: Metadata category on which to separate groups for comparison.

I recommend state-pre and state-post be renamed to state-1 and state-2, since pre/post aren't always relevant (e.g., you mention that "States" can also commonly be methodological).

I think non-parametric tests should always be the default:

  --p-parametric / --p-no-parametric
                                  [default: True]
                                  Perform parametric (ANOVA
                                  and t-tests) or non-parametric (Kruskal-
                                  Wallis, Wilcoxon, and Mann-Whitney U tests)
                                  statistical tests.

input variable plot titles in HTML template

This is currently hardcoded (below is LME plot):