microtrait-hmm's Introduction

microtrait-hmm database

microtrait-hmm is a database of profile families and associated profile hidden Markov models (HMMs) that underlie the MicroTrait pipeline for extracting fitness traits from microbial genomes.

Each microtrait-hmm model captures the sequence diversity of a protein family accumulated across the genomes and metagenomes in IMG/M.

HMM training pipeline

HMM benchmarking pipeline

microtrait-hmm's People

Contributors

ukaraoz

microtrait-hmm's Issues

[Question] Discrepancy between traits on GitHub, supplementary tables, and hierarchy confusion?

I'm trying to understand how microtrait-hmm traits are organized and how to work my way up the hierarchy.

The trait HMMs are here:
https://github.com/ukaraoz/microtrait-hmm/blob/master/data

  • There are 1720 HMMs that were downloaded (2 were excluded; see Issue #4).
  • The table data/microtraithmm2dbxref.txt has 1594 traits. My understanding is that it contains only traits that could be validated with KEGG.
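A minimal Python sketch for checking that count. It assumes each row of data/microtraithmm2dbxref.txt is tab-separated, starts with the model name, and carries KEGG links as KO identifiers such as K15039, as in the metadata rows quoted in the hydG/hydg issue. The function name and the KO regex are illustrative and are not part of microtrait.

```python
import re

# KEGG Orthology identifiers look like "K" followed by five digits, e.g. K15039.
KO_PATTERN = re.compile(r"K\d{5}")


def load_dbxref(lines):
    """Map each model name to the KEGG KO identifiers found on its row(s).

    Assumes tab-separated rows whose first field is the model name.
    Names stay case-sensitive, so hydG and hydg count as two entries.
    """
    table = {}
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        name = fields[0].strip()
        if not name:
            continue  # skip blank lines
        # Keep only fields that are exactly a KO identifier.
        kos = [f.strip() for f in fields[1:] if KO_PATTERN.fullmatch(f.strip())]
        table.setdefault(name, []).extend(kos)
    return table
```

Passing an open handle for the file gives a dict whose length is the number of distinct names. Names mapped to an empty list have no KO on their row. If the file begins with a header line, skip that line first.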

Separately, there are the Supplementary Tables from your manuscript (attached):

  • Table S2 (ST2.microtrait_hmms) contains 2298 HMM traits. It is not a superset of the 1720 HMMs, because 113 HMMs on GitHub are not in Table S2:
{'adh1',  'adh17',  'adh4',  'adh6',  'aer',  'aldh9A1',  'cheA',  'cheB',  'cheBR',  'cheC',  'cheD',  'cheR',  'cheV',  'cheW',  'cheX',  'cheY',  'cheZ',  'cys',  'dppA2',  'echs1',  'ehhadh',  'ffh',  'fha1',  'ftsY',  'gspC',  'gspD',  'gspE',  'gspF',  'gspG',  'gspH',  'gspI',  'gspJ',  'gspK',  'gspL',  'gspM',  'gspS',  'hadha',  'hcp',  'hlyB',  'hlyD',  'hoxF',  'hoxU',  'hydA',  'hydB',  'hydg',  'impK',  'impL',  'lbp',  'lhpp',  'mcp',  'narI',  'nit6',  'nr',  'ppkA',  'secA',  'secB',  'secD',  'secDF',  'secE',  'secF',  'secG',  'secM',  'secY',  'shlA',  'shlB',  'stp1',  'suld',  'tap',  'tar',  'tatA',  'tatB',  'tatC',  'tatE',  'tolC',  'trg',  'tsr',  'tst',  'ttrB',  'vacA',  'vasDa',  'vasG',  'vgrG',  'virB1',  'virB10',  'virB11',  'virB2',  'virB3',  'virB4',  'virB5',  'virB6',  'virB7',  'virB8',  'virB9',  'virD4',  'yadA',  'yadB_C',  'yajC',  'yidC',  'yscC',  'yscF',  'yscJ',  'yscL',  'yscN',  'yscO',  'yscP',  'yscQ',  'yscR',  'yscS',  'yscT',  'yscU',  'yscV',  'yscW',  'yscX'}
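The set difference above can be reproduced with a short Python sketch. It assumes the GitHub model names are the file stems in data/out/hmm/ and that the HMM-name column of Table S2 has been exported from the spreadsheet to a plain list of names. hmm_names and compare are hypothetical helpers.

```python
from pathlib import Path


def hmm_names(hmm_dir):
    """Return model names, taken as the stems of the *.hmm files in hmm_dir."""
    return {p.stem for p in Path(hmm_dir).glob("*.hmm")}


def compare(github_names, table_names):
    """Summarize the overlap between two collections of HMM names."""
    github, table = set(github_names), set(table_names)
    return {
        "only_github": sorted(github - table),  # models missing from the table
        "only_table": sorted(table - github),   # table entries with no .hmm file
        "shared": len(github & table),
    }
```

For example, compare(hmm_names("data/out/hmm"), s2_names), where s2_names is the exported Table S2 column. On a case-insensitive checkout, one file from each pair such as hydG/hydg will be missing from disk, so the GitHub count can come out lower.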

What is the cause of the discrepancy between HMMs?

  • Table S5 (ST5.microtrait_rule-to-traits) contains the trait combinations that make up each rule. I parsed all of the traits referenced there and found 1414 HMMs.

Are some traits not associated with any rules?

Can you explain exactly what distinguishes binary, count, and count_by_substrate in the microtrait_rule-type field of Table S5? I read the Methods section of the paper, but this part wasn't clear to me. I assume binary means presence/absence, but what do count and count_by_substrate mean?

Why are the microtrait_trait-name1, microtrait_trait-name2, and microtrait_trait-name3 fields empty for some traits? I'm confused about how traits and rules are used here. Is the hierarchy defined over traits or over rules? For example, there are 975 rules, but 628 of them have these columns empty.

  • Table S7 (ST7.microtrait_traits) contains only 326 traits. Why does this differ from the number of HMM traits on GitHub and in the other tables?

I'm very interested in using this, but I need more information before I can apply it to our dataset.

Table 1.XLSX

HMMs whose names differ only in case have different trusted cutoffs/lengths (e.g., hydG & hydg)

https://github.com/ukaraoz/microtrait-hmm/blob/master/data/out/hmm/hydg.hmm

HMMER3/f [3.2 | June 2018]
NAME  hydg
LENG  318
ALPH  amino
RF    no
MM    no
CONS  yes
CS    no
MAP   yes
DATE  Tue May 28 08:13:00 2019
NSEQ  52
EFFN  0.701416
CKSUM 1456715597
TC    565 565;
STATS LOCAL MSV      -11.2468  0.70085
STATS LOCAL VITERBI  -11.8797  0.70085
STATS LOCAL FORWARD   -5.6529  0.70085

https://github.com/ukaraoz/microtrait-hmm/blob/master/data/out/hmm/hydG.hmm

HMMER3/f [3.2 | June 2018]
NAME  hydG
LENG  295
ALPH  amino
RF    no
MM    no
CONS  yes
CS    no
MAP   yes
DATE  Thu Jun 13 00:27:38 2019
NSEQ  67
EFFN  0.633850
CKSUM 3242727819
TC    438.3 438.3;
STATS LOCAL MSV      -10.8101  0.70145
STATS LOCAL VITERBI  -11.7817  0.70145
STATS LOCAL FORWARD   -5.4822  0.70145
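A minimal Python sketch for extracting these header fields so that any pair of models can be compared programmatically. It follows the HMMER3 header layout shown above, with one tag and its value per line and the model body starting at the HMM line. The helper names are hypothetical. Repeated tags such as STATS keep only their last value in this sketch.

```python
def parse_hmm_header(text):
    """Parse tag/value pairs from the header of a HMMER3 profile file."""
    header = {}
    for line in text.splitlines():
        if line.startswith("HMMER3"):
            continue  # format/version line, e.g. "HMMER3/f [3.2 | June 2018]"
        if line.startswith("HMM ") or line.startswith("//"):
            break  # the model body (or end of record) starts here
        parts = line.split(None, 1)
        if len(parts) == 2:
            header[parts[0]] = parts[1].strip()  # later repeats overwrite earlier ones
    return header


def trusted_cutoffs(header):
    """Return the TC line as (sequence, domain) floats, or None if absent."""
    if "TC" not in header:
        return None
    seq, dom = header["TC"].rstrip(";").split()[:2]
    return float(seq), float(dom)


def differing_fields(h1, h2, keys=("LENG", "NSEQ", "TC", "CKSUM")):
    """Return {tag: (value1, value2)} for the selected tags that differ."""
    return {k: (h1.get(k), h2.get(k)) for k in keys if h1.get(k) != h2.get(k)}
```

Applied to the two files above, differing_fields reports LENG 318 vs 295 and TC 565 vs 438.3, among other tags.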

In the metadata file, both appear with identical entries:

hydG	1.1.1.298	K15039	3-hydroxypropionate dehydrogenase (NADP+) [EC:1.1.1.298] 
hydg	1.1.1.298	K15039	3-hydroxypropionate dehydrogenase (NADP+) [EC:1.1.1.298]

Which one should be used? When I clone the repo locally, only the lowercase files end up on disk, because the paths collide on a case-insensitive file system.

This happened in the following cases:

  'data/in/faa/hydG.faa'
  'data/in/faa/hydg.faa'
  'data/in/faa/lhpP.faa'
  'data/in/faa/lhpp.faa'
  'data/out/hmm/hydG.hmm'
  'data/out/hmm/hydg.hmm'
  'data/out/hmm/lhpP.hmm'
  'data/out/hmm/lhpp.hmm'
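Paths that differ only in letter case, like those listed above, can be found with a small Python sketch. It assumes git is on PATH. The function names are illustrative.

```python
import subprocess
from collections import defaultdict


def tracked_paths(repo="."):
    """List the files tracked in a git repository (output of git ls-files)."""
    result = subprocess.run(
        ["git", "-C", repo, "ls-files"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()


def case_collisions(paths):
    """Group paths that are identical except for letter case."""
    groups = defaultdict(list)
    for path in paths:
        groups[path.lower()].append(path)
    return [sorted(group) for group in groups.values() if len(group) > 1]
```

You can run case_collisions(tracked_paths()) inside a clone, even one made on a case-insensitive file system. Git's index still records both paths in each pair, although only one file per pair can exist on disk.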

Total number of HMM profiles.

Hi,

I am trying to use the HMM profiles generated in this study to profile the traits of my own set of genomes. I found that the number of HMM profiles uploaded here does not match the list of HMM profiles in Supplementary Table 2 of your original publication. Also, what is the difference between the HMM profiles in data/out/hmm/ and data.kb_hmmer/hmm?

Many thanks in advance!
Qiqi
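One way to see how the two directories in the question above differ is to compare them by model name and file content. This Python sketch assumes that both directories hold one *.hmm file per model, named after the model. It reports names present in only one directory and shared names whose files are not byte-identical. The helper names are hypothetical.

```python
import hashlib
from pathlib import Path


def digest(path):
    """SHA-256 of a file's bytes, used to detect any content change."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def diff_hmm_dirs(dir_a, dir_b):
    """Compare two directories of *.hmm files by name and content."""
    a = {p.stem: p for p in Path(dir_a).glob("*.hmm")}
    b = {p.stem: p for p in Path(dir_b).glob("*.hmm")}
    shared = sorted(a.keys() & b.keys())
    return {
        "only_a": sorted(a.keys() - b.keys()),
        "only_b": sorted(b.keys() - a.keys()),
        "changed": [n for n in shared if digest(a[n]) != digest(b[n])],
    }
```

A byte-level comparison also flags models that differ only in header metadata such as DATE. If that matters, compare the parsed LENG and TC values instead.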

Error while downloading HMM data

I successfully installed microtrait in R, but when I followed these instructions:

# download hmm assets from microtrait-hmm repository into the microtrait install directory
piggyback::pb_download("hmm.tar.gz",
                       repo = "ukaraoz/test",
                       dest = file.path(.libPaths(), "microtrait", "extdata"))

it gives me this error:
Error in $<-.data.frame(*tmp*, "dest", value = c("/usr/local/lib/R/site-library/microtrait/extdata", :
replacement has 3 rows, data has 0
In addition: Warning messages:
1: In get_token() : Using default public GITHUB_TOKEN.
Please set your own token
2: In piggyback::pb_download("hmm.tar.gz", repo = "ukaraoz/test", dest = file.path(.libPaths(), :
file(s) hmm.tar.gz not found in repo ukaraoz/test

I found that there is no hmm.tar.gz file in either ukaraoz/test or ukaraoz/microtrait-hmm.

Please suggest a possible solution.
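You can check whether a file is attached to any GitHub release of a repository through the GitHub REST API endpoint GET /repos/{owner}/{repo}/releases. The Python sketch below uses only the standard library. It reads just the first 100 releases (no pagination), and unauthenticated requests are rate-limited. The function names are illustrative.

```python
import json
import urllib.request


def fetch_releases(owner, repo):
    """Fetch up to 100 releases of owner/repo from the GitHub REST API."""
    url = f"https://api.github.com/repos/{owner}/{repo}/releases?per_page=100"
    req = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def find_asset(releases, filename):
    """Return (tag, download URL) for every release that carries filename."""
    return [
        (rel.get("tag_name"), asset.get("browser_download_url"))
        for rel in releases
        for asset in rel.get("assets", [])
        if asset.get("name") == filename
    ]
```

An empty result from find_asset(fetch_releases("ukaraoz", "microtrait-hmm"), "hmm.tar.gz") means no release of that repository lists hmm.tar.gz among the first 100 releases.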
