Giter Club home page Giter Club logo

qiime2-to-bugbase's Introduction

Aim

This is a simple workflow that aims to help create a BugBase-compatible biom file from within Qiime2. The BugBase pre-print can be found here. For additional details see its documentation page. As BugBase utilizes PICRUSt for its functional category predictions, the output table here is therefore also PICRUSt-compatiable. Note that there is a PICRUSt2 which users should consider if they are not looking to use BugBase. I was/am not involved with BugBase or PICRUSt in any capacity, this is just a crude workaround that I found useful and thought others might benefit from.

Here I outline my approach for use with 16S data, not shotgun metagenomic data. I've tested this with Qiime2 version 2018.8.

Since BugBase requires as its input, an OTU table picked against the Greengenes database, I chose to cluster the DADA2-denoised reads from the Moving Pictures tutorial using vsearch.

You can use dereplicated sequences without denoising but I personally believe it’s better to utilise denoised reads even if OTU picking methods will ultimately be used.

First though we’ll need some files, which I’ll put into one folder. Download the feature table and representative sequences which are the output of DADA2 from the Moving Pictures tutorial.

rep-seqs-dada2.qza
table-dada2.qza

Next, we’ll need the Greengenes reference database, which we can download from the resource page. I downloaded the 13_5 version here.

We’re going to need to import the 97% OTU fasta file (from the rep-set folder) into Qiime2. You could use a different % here if you want, ex. 99%.

qiime tools import \
--type 'FeatureData[Sequence]' \
--input-path 97_otus.fasta \
--output-path gg_97_otus.qza

Now I use vsearch to cluster the sequences at 97% similarity threshold.

qiime vsearch cluster-features-closed-reference \
--i-sequences rep-seqs-dada2.qza \
--i-table table-dada2.qza \
--i-reference-sequences gg_97_otus.qza \
--p-perc-identity 0.97 \
--o-clustered-table table-cr-97.qza \
--o-clustered-sequences rep-seqs-cr-97.qza \
--o-unmatched-sequences unmatched-seqs \
--verbose

BugBase requires that the biom table be in version 1.0 JSON format, and have taxonomy annotations instead of OTU IDs, so we need to do some adjustments first.

To do this we’ll need the underlying biom table within the table-cr-97.qza artifact.

qiime tools export \
--input-path table-cr-97.qza \
--output-path $PWD

This saves the exported biom table and calls it feature-table.biom.

Right now, our biom table has OTU ID# and not taxonomic annotations. So we’ll go ahead and add those taxonomies in. For this we need the 97_otu_taxonomy.txt file (use the same % identity used at your clustering step) that can be found in the Greegenes files we downloaded earlier within the taxonomy sub-folder.

In order to add taxonomy to our biom file first we need to add a new header to the 97_otu_taxonomy.txt file. We need to add #OTUID and taxonomy to our first line.

echo -e "#OTUID\ttaxonomy" | cat - 97_otu_taxonomy.txt > 97_otu_taxonomy.txt

Now we're ready to add our taxonomy.

biom add-metadata -i feature-table.biom -o feature-table-tax.biom --observation-metadata-fp 97_otu_taxonomy.txt --sc-separated taxonomy

Finally, BugBase requires an older version of BIOM file (V1.0 JSON). I found a crude but simple solution to this which is to just convert our biom file to a .txt file then reconvert back using an older biom version.

First, convert this to a .txt file:

biom convert --table-type="OTU table" -i feature-table-tax.biom -o feature-table-tax.txt --to-tsv --header-key taxonomy

Then re-convert back to the older biom version we need.

biom convert -i feature-table-tax.txt -o feature-table-tax-biom1.biom --table-type="OTU table" --to-json --process-obs-metadata taxonomy

This table is now compatible with BugBase. I tested this by uploading it on the web-based version of BugBase. Optionally you can upload a metadata file though a couple of minor adjustments also need to be made here for BugBase compatibility. The metadata table for this dataset can be downloaded here. You will have to manually rename the first column to sample-id as per BugBase’s requirement, and also delete the second row which describes the column categories. Save this file as a .txt file and you’re set to go. This command will accomplish both these requirements by first deleting line 2 and then saving the file as .txt file.

sed '2d' metadata.tsv > metadata.txt

You can now run this through BugBase. The image below was obtained using the web version of BugBase with all default settings and groups set to BodySite. It shows the predicted phenotypic trait (Aerobic) of the different body sites.

qiime2-to-bugbase's People

Contributors

mestaki avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.