bigg_models's People

Contributors

aebrahim, jslu9, npmcdn-to-unpkg-bot, npusarla, pillmill, zakandrewking

bigg_models's Issues

accuracy

  • Gene counts are not correct due to duplicate copies.
    • find a way to log these with logging.warning() in the code
  • for all "old IDs", use the new OldIDModelSynonym table to store old ids
    • columns: synonym_id and model_id
    • cases where we record old ids:
      1. when COBRA gene matches a database Gene.name or Gene synonym
      2. If Theseus changes the reaction or metabolite ID
  • write a test script that loads the published model and the dumped model into COBRApy and compares the counts of reactions, metabolites, and genes
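The comparison test in the last bullet could be sketched as follows. This is a minimal sketch, not the project's test script: `count_summary` and `compare_counts` are hypothetical helper names, and the only assumption is that both model objects expose `reactions`, `metabolites`, and `genes` collections, as COBRApy models do.

```python
def count_summary(model):
    """Return the number of reactions, metabolites, and genes in a model."""
    return {
        "reactions": len(model.reactions),
        "metabolites": len(model.metabolites),
        "genes": len(model.genes),
    }


def compare_counts(published, dumped):
    """Return {kind: (published_count, dumped_count)} for every mismatch."""
    a = count_summary(published)
    b = count_summary(dumped)
    return {kind: (a[kind], b[kind]) for kind in a if a[kind] != b[kind]}
```

An empty result means the counts agree; any entry names the kind of object whose count differs, which is where a warning about duplicated genes would be expected to show up.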

Finding references for genes

Searching for geneProduct label "b3997" yields several synonyms:

SELECT *
FROM synonym s, gene g
WHERE s.ome_id = g.id and ome_id IN (
SELECT ome_id
FROM synonym
WHERE synonym = 'b3997');

But there is no entry in link_out and no further hint where this id originates from. However, it can be found in KEGG GENES: http://identifiers.org/kegg.genes/eco:b3997.

Question: how could a reference for a gene best be identified? In this example, the prefix "eco:" in the id and the link_out to KEGG GENES are missing.

Gene IDs and names

I'm not sure exactly what's going on, but these should match:

[screenshot: 2015-02-09, 10:59:53 AM]

That's from this page: /models/RECON1/reactions/GAPD

Descriptive model names get lost

Some models have interesting descriptive reaction names (e.g. iJO1366), while others do not (e.g. iND750). If we load iND750 before iJO1366, then we lose all the reaction names. e.g. http://yersinia.ucsd.edu:8887/models/iJO1366/reactions/GAPD

When loading e.g. iJO1366, for each reaction, check if name == bigg_id in the existing Reaction row or if name == "". If the model that we're loading has a better name (not equal to bigg_id), then replace the existing name with the new one.
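The rule above can be sketched as a small pure function. `choose_reaction_name` is a hypothetical name used only for illustration; the real loading code may structure this differently.

```python
def choose_reaction_name(existing_name, new_name, bigg_id):
    """Pick the name to store for a reaction.

    A name counts as descriptive when it is non-empty and differs from
    the BiGG ID. The existing name is kept unless it is not descriptive
    and the incoming one is.
    """
    def is_descriptive(name):
        return bool(name) and name != bigg_id

    if not is_descriptive(existing_name) and is_descriptive(new_name):
        return new_name
    return existing_name
```

With this rule the load order no longer matters for the cases described: a placeholder name equal to the BiGG ID is replaced by a later descriptive name, and a descriptive name is never overwritten by a placeholder.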

Model building features

Let's brainstorm ideas for building, curating, and extending models with BiGG.

Issues:

  1. For curation, we need versioning.
  2. User levels: e.g., view, add, and full edit/remove

Short term additions before lab release

  • Add genome_id to model table
    • Using Jon's supplement for the E. coli model genome_ids
    • Using links from old bigg website to figure out genome_ids for those models
  • Add synonyms table using 'gene_synonyms' field of the genbank files
  • Likely will need to use synonyms table to help in mapping GPRs from the older models on the old bigg site
  • Make the cosmetic change of merging compartmentalized_component with model_component, so that there is only model_component that now contains a compartment_id
    • This will make the schema on draw.io identical to Zak's schema

Unintuitive gene search results

I am searching for the gene yqhD in E. coli, and I get this result:

[screenshot: 2015-03-20, 2:57:45 PM]

To a new user, this is almost incomprehensible. We should at least include the organism name here.

Model Dumping

On the back end, make sure models can be dumped.

On the front end, set up the correct SBML and JSON download links on model pages.

link to old bigg

When BiGG 2.0 goes up at bigg.ucsd.edu, we need a banner linking back to bigg.ucsd.edu/bigg1, which points to the old BiGG.

weird bug

After loading all the models:

In [23]: (session
    ...:  .query(Component.bigg_id, Compartment.bigg_id, Model.bigg_id, Reaction.bigg_id)
    ...:  .join(CompartmentalizedComponent, CompartmentalizedComponent.component_id == Component.id)
    ...:  .join(Compartment, Compartment.id == CompartmentalizedComponent.compartment_id)
    ...:  .join(ReactionMatrix, ReactionMatrix.compartmentalized_component_id == CompartmentalizedComponent.id)
    ...:  .join(Reaction)
    ...:  .join(ModelReaction)
    ...:  .join(Model)
    ...:  .filter(Model.bigg_id == 'iAF1260')
    ...:  .filter(Reaction.bigg_id.like('DADA'))
    ...:  .all())
Out[23]:
[(u'dad__2', u'c', u'iAF1260', u'DADA'),
 (u'din', u'c', u'iAF1260', u'DADA'),
 (u'h2o', u'c', u'iAF1260', u'DADA'),
 (u'h', u'c', u'iAF1260', u'DADA'),
 (u'nh4', u'c', u'iAF1260', u'DADA'),
 (u'dad_2', u'c', u'iAF1260', u'DADA')]

This breaks my current version of test_loaded_data.py

Trying with just iAF1260 now to isolate the problem.

Staging server

  1. Two copies of BiGG in virtualenv environments
  2. Two databases in Postgres: bigg and bigg_stage
  3. Two servers on different ports

BIGG2 ID mapping feature for other models

We need to make it easier to use models from other groups with non-human-readable IDs, like the latest yeast models:

http://yeast.sourceforge.net/

Those models have KEGG Reaction IDs, so if we can map our BIGG2 reaction IDs to KEGG reaction IDs, then we can add BIGG2 IDs to any of those models.

This makes BIGG2 IDs far more important for the GEM community.
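A minimal sketch of the mapping step, assuming a lookup table from KEGG reaction IDs to BiGG reaction IDs already exists. The function name `add_bigg_ids` and the dictionary shape are hypothetical; building the lookup table itself is the open problem described above.

```python
def add_bigg_ids(kegg_reaction_ids, kegg_to_bigg):
    """Map each KEGG reaction ID to a BiGG ID.

    Reactions without a known mapping get None, so unmapped entries
    stay visible instead of being silently dropped.
    """
    return {kegg_id: kegg_to_bigg.get(kegg_id) for kegg_id in kegg_reaction_ids}
```

Returning None for unmapped reactions makes it easy to report how much of an external model is covered by the mapping.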

Unclear entries in data_source

The following resources from the data_source table are difficult to map to MIRIAM resources:

name                  Problem
EnsemblGenomes        There are multiple databases: Bacteria, Fungi, Metazoa, Plants, Protists. I could look up which one to use by analysing the lineage of the organism based on its NCBI taxon id.
GI                    Unclear which database has this id.
IMGT/GENE-DB          This could refer to IMGT HLA or IMGT LIGM.
MIM                   This could refer to one or none of the databases ABS, MimoDB, OMIM, or Orphanet Rare Disease Ontology.
PSEUDO                I assume this refers to the Pseudomonas Genome Database, but it could also be Pathema (WARNING: deprecated!) or UniGene (WARNING: low up-time!).
UniProtKB/Swiss-Prot  This could be UniProt Isoform or UniProt Knowledgebase. There are also other possible databases, but I assume these are the main ones (see http://www.ebi.ac.uk/miriam/main/search?query=UniProtKB).
UniProtKB/TrEMBL      Same as for the UniProtKB/Swiss-Prot case.
old id                What is this? An older BiGG release?

COBRApy fork for SBML input

Have to change line 158 of sbml.py from pop to get. In the future, we need to report this as a bug to COBRApy, or, if it is not a bug, then find another workaround.
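For context, the difference between the two dictionary methods is shown below. This is only an illustration of `dict.pop` versus `dict.get`; the dictionary and key name are hypothetical, and the actual object handled on that line of sbml.py is not shown in this issue.

```python
# Hypothetical stand-in for the parsed data; the real structure may differ.
notes = {"GENE_ASSOCIATION": "b3997"}

# get() reads the value and leaves the dictionary unchanged.
value_from_get = notes.get("GENE_ASSOCIATION")
still_present = "GENE_ASSOCIATION" in notes

# pop() reads the value and removes the key, so later code no longer sees it.
value_from_pop = notes.pop("GENE_ASSOCIATION")
present_after_pop = "GENE_ASSOCIATION" in notes
```

If code further down the loading path needs the same entry, `pop` would hide it while `get` would not, which is one plausible reason the change is needed.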

SBML fixes

  • In COBRA, model.id should be the same as the xml filename (e.g. iND750), and this should become Model.bigg_id in the database.

SBML format for download

Hi,

I would like to comment on the SBML files that are provided for download:

  • GeneAssociations were only a suggestion in FBC version 1. Just recently they made it into the new standard FBC version 2. When validating the model iJO1366, for instance, the SBML validator complains that this format is invalid because it doesn't know the genes etc.
  • I found empty notes elements in species and no MIRIAM annotation. We could easily add references to identifiers that specify each model component, because the information is in BiGG.
  • We should also use a few SBO terms to annotate types of model components, such as "simple chemical", "physical compartment", "transport reaction", etc.
  • Subsystems could be specified by grouping elements together using the groups package in SBML.
  • The authors of models could be added in a model history, also the creation/modification dates, a link to the publication, and some general notes.

Thanks
Andreas

Open Design Questions and Bugs

BIGG2 Notes:

  1. Theseus doesn’t have a date for first created
  2. Grmit uses wids while theseus doesn’t
  3. Grmit loads one model at a time because of the renaming. I should automate renaming by parsing the strings
  4. No reaction full/official name
  5. Every list is displayed in a column of 3. We may want to vary the column depending on the type of results i.e. genes, metabolites, reactions, models
  6. I should change the reaction list on the metabolite page because the reaction list is really long sometimes.
  7. Gene handler, metabolite handler should show reactions that contain the gene/metabolite AND are from the specified model. (should be fixed)
  8. SBML, search, help, and advanced search don’t work
  9. Reaction string is different from theseus and grmit
    a. The number of occurrences should follow stoichiometry value (should be fixed)

Advanced search with no keyword = bug

Traceback (most recent call last):
  File "/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tornado/web.py", line 1346, in _when_complete
    callback()
  File "/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tornado/web.py", line 1367, in _execute_method
    self._when_complete(method(*self.path_args, **self.path_kwargs),
  File "server.py", line 209, in post
    reactionList.append(reaction.name)
UnboundLocalError: local variable 'reactionList' referenced before assignment
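A likely cause, stated as an assumption because the handler in server.py is not shown here, is that the list is only created inside a branch that is skipped when no keyword is given. A minimal sketch of the usual fix is to initialize the list before any branching; `collect_reaction_names` is a hypothetical stand-in for the relevant part of the handler.

```python
def collect_reaction_names(reactions, keyword):
    """Return names of reactions matching the keyword.

    The list is created up front, so it exists even when the keyword
    is empty and the loop below never runs.
    """
    reaction_list = []
    if keyword:
        for reaction in reactions:
            if keyword.lower() in reaction.name.lower():
                reaction_list.append(reaction.name)
    return reaction_list
```

An empty keyword then yields an empty result instead of a traceback, which also fits the later suggestion to require keywords for advanced search.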

speed up search with IN operator

Instead of looping through models:

for model_bigg_id in model_list:

Use the in operator to filter for any matching models:

.filter(Model.bigg_id.in_(model_list))
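To illustrate the idea in a self-contained way, the sketch below runs the equivalent SQL `IN` query with the standard-library `sqlite3` module instead of the project's SQLAlchemy session. The table layout and the helper name `models_matching` are assumptions made for the example.

```python
import sqlite3


def models_matching(conn, model_list):
    """Fetch all matching models with one IN query instead of one query per model."""
    if not model_list:
        return []
    # One "?" placeholder per requested model ID keeps the query parameterized.
    placeholders = ", ".join("?" for _ in model_list)
    query = (
        "SELECT bigg_id FROM model WHERE bigg_id IN (%s) ORDER BY bigg_id"
        % placeholders
    )
    return [row[0] for row in conn.execute(query, list(model_list))]


# Small in-memory table standing in for the Model table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE model (bigg_id TEXT)")
conn.executemany(
    "INSERT INTO model VALUES (?)",
    [("iJO1366",), ("iAF1260",), ("iND750",)],
)
```

The gain comes from issuing a single round trip to the database regardless of how many models are selected.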

determine which models were loaded

Still to do:

  • figure out which models are not loading
  • figure out which models do not have standard BiGG style IDs (e.g. look for models without pyr)

Make pretty names

iJO1366 has some ugly names. A few examples to fix:

M_lipidA_core_e_p   M_2_3_2'3'_Tetrakis_beta_hydroxymyristoyl__D_glucosaminyl_1_6_beta_D_glucosamine_1_4'_bisphosphate_C68H126N2O23P2   periplasm
M_2amsa_c   M_2_Aminomalonate_semialdehyde_C3H5NO3  cytosol
M_2mcit_c   M_2_Methylcitrate_C7H7O7    cytosol
M_ohpb_c    M_2_Oxo_3_hydroxy_4_phosphobutanoate_C4H4O8P    cytosol
M_23camp_e  M_2__3__Cyclic_AMP_C10H11N5O6P  extracellular
M_23camp_p  M_2__3__Cyclic_AMP_C10H11N5O6P  periplasm
M_3hoxpac_e M_3_Hydroxyphenylacetic_acid_C8H8O3 extracellular
M_3ntym_e   M_3_Nitrotyramine_C8H15N2   extracellular
M_35cgmp_c  M_3__5__Cyclic_GMP_C10H11N5O7P  cytosol
M_4ahmmp_c  M_4_Amino_5_hydroxymethyl_2_methylpyrimidine_C6H9N3O    cytosol
M_4abutn_c  M_4_Aminobutanal_C4H10NO    cytosol
M_4hthr_c   M_4_Hydroxy_L_threonine_C4H9NO4 cytosol
M_4hthr_e   M_4_Hydroxy_L_threonine_C4H9NO4 extracellular
M_4hbz_c    M_4_Hydroxybenzoate_C7H5O3  cytosol
M_4hoxpac_e M_4_Hydroxyphenylacetic_acid_C8H8O3 extracellular
M_5fthf_c   M_5_Formyltetrahydrofolate_C20H21N7O7   cytosol
M_6apa_e    M_6_Aminopenicillanic_acid_C8H12N2O3S   extracellular
M_acac_c    M_Acetoacetate_C4H5O3   cytosol
M_acac_e    M_Acetoacetate_C4H5O3   extracellular
M_acac_p    M_Acetoacetate_C4H5O3   periplasm

Probably the same for other models.
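One possible cleanup, sketched under assumptions: strip the `M_` prefix, drop a trailing chemical formula, and turn underscores into spaces. The function name `pretty_name` and the formula heuristic are hypothetical, and the result is lossy because the original hyphens, commas, and primes cannot be recovered from underscores alone.

```python
import re

# Heuristic: a trailing "_<formula>" made of element symbols with optional
# counts. The lookahead requires a digit right after the first run of letters,
# so a final word such as "_AMP" is not mistaken for a formula.
_FORMULA_SUFFIX = re.compile(r"_(?=[A-Za-z]*\d)(?:[A-Z][a-z]?\d*)+$")


def pretty_name(raw_name):
    """Return a more readable metabolite name (lossy, heuristic)."""
    name = raw_name[2:] if raw_name.startswith("M_") else raw_name
    name = _FORMULA_SUFFIX.sub("", name)
    # Replace underscores with spaces and collapse repeated whitespace.
    return " ".join(name.replace("_", " ").split())
```

For example, `pretty_name("M_Acetoacetate_C4H5O3")` gives `"Acetoacetate"`. Names whose punctuation matters would still need a curated source for the final spelling.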

Fix duplicate IDs with leading underscores

Thanks @draeger

It seems that for 148 bigg ids there is a version that starts with an
underscore and also a version that does not start with an underscore:

bigg=# select count(c1.bigg_id) from component c1 where c1.bigg_id in
(select concat('_', c2.bigg_id) from component c2);
 count
-------
   148
(1 row)

Examples:

bigg=# select bigg_id from component where bigg_id like '%3fe4s%';
 bigg_id
---------
 3fe4s
 _3fe4s
(2 rows)

Solution: remove a leading underscore from all IDs during the loading process (probably in ome/loading/model_loading/parse.py:id_for_new_id_style())
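A minimal sketch of that normalization, together with a helper that finds the affected pairs. Both function names are hypothetical, and this is not the actual body of `id_for_new_id_style()`; note that `lstrip` removes every leading underscore, not just one.

```python
def strip_leading_underscores(bigg_id):
    """Normalize an ID by removing any leading underscores."""
    return bigg_id.lstrip("_")


def find_underscore_duplicates(bigg_ids):
    """Return IDs that also exist in a variant with one leading underscore."""
    ids = set(bigg_ids)
    return sorted(
        i for i in ids if not i.startswith("_") and "_" + i in ids
    )
```

Running the second helper over all component IDs before and after the fix gives a quick check that the duplicated pairs are gone.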

Error loading

Please push the latest version to avoid this error:

Traceback (most recent call last):
  File "load_db.py", line 124, in <module>
    model_id,genome_id,model_creation_timestamp = line.rstrip('\n').split(',')
ValueError: too many values to unpack
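The traceback means the line contained more than three comma-separated fields. A defensive sketch that reports the offending line instead of failing inside the unpacking is shown below; `parse_model_line` is a hypothetical helper, not the code in load_db.py.

```python
def parse_model_line(line):
    """Split one line into (model_id, genome_id, timestamp).

    Raises ValueError with the field count and the raw line when the
    line does not have exactly three comma-separated fields.
    """
    fields = line.rstrip("\n").split(",")
    if len(fields) != 3:
        raise ValueError(
            "expected 3 comma-separated fields, got %d: %r" % (len(fields), line)
        )
    model_id, genome_id, model_creation_timestamp = fields
    return model_id, genome_id, model_creation_timestamp
```

The clearer message makes it obvious when the input file and the loading script are out of sync.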

Search bar completion

  • search by metabolite, gene, reaction, and model
  • search by organism, locus id
  • search for universal reaction and metabolite, not model reaction and model metabolite
  • "Search" button next to search bar on the home page

BIGG2 as a CDN of sorts for models?

Thought it might be cool for people using cobrapy to be able to just request a model object. That way it can always be validated and the latest/greatest.

Universal tables

  • Loading universal components
    1. If KEGG_ID is new, then add a new universal component
    2. If KEGG_ID is not new, then connect to the existing universal component
      • New column in model_compartmentalized_component for old_bigg_id
    3. If no KEGG_ID, then add a new universal component, and flag the row
  • Loading universal reactions
    • Compare new reactions by stoichiometry
    • check each metabolite and coefficient
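The stoichiometry comparison could be sketched as follows, assuming each reaction is represented as a dictionary from compartmentalized metabolite ID to coefficient. The function name and the representation are hypothetical; this sketch treats a reversed or rescaled reaction as different.

```python
def same_stoichiometry(reaction_a, reaction_b, tolerance=1e-9):
    """Return True when both reactions use the same metabolites
    with the same coefficients (within a small tolerance)."""
    if set(reaction_a) != set(reaction_b):
        return False
    return all(
        abs(reaction_a[met] - reaction_b[met]) <= tolerance
        for met in reaction_a
    )
```

A new reaction that matches an existing universal reaction under this check can be linked to it; otherwise a new universal reaction would be created.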

IDs causing trouble

Fix BiGG ID spec to fix these cases:

  1. M_sertrna_sec__c
  2. M_lipidA_core_e_p
  3. R_Ec_biomass_iJO1366_WT_53p95M

‘M_lipa_cold_e’. Invalid compartment code: ‘cold’
‘M_lipa_cold_p’. Invalid compartment code: ‘cold’
‘M_lipa_cold_c’. Invalid compartment code: ‘cold’
‘M_sertrna_sec__c’. Invalid compartment code: ‘sec-’
‘M_lipidA_core_e_p’. Invalid compartment code: ‘core’
‘M_lipidA_core_e_p’. Invalid tissue code: ‘p’
‘R_Ec_biomass_iJO1366_WT_53p95M’. Invalid compartment code: ‘biomass’
‘R_Ec_biomass_iJO1366_WT_53p95M’. Invalid compartment code: ‘iJO1366’
‘R_Ec_biomass_iJO1366_WT_53p95M’. Invalid compartment code: ‘WT’
‘R_Ec_biomass_iJO1366_WT_53p95M’. Invalid compartment code: ‘53p95M’
‘R_Ec_biomass_iJO1366_core_53p95M’. Invalid compartment code: ‘biomass’
‘R_Ec_biomass_iJO1366_core_53p95M’. Invalid compartment code: ‘iJO1366’
‘R_Ec_biomass_iJO1366_core_53p95M’. Invalid compartment code: ‘core’
‘R_Ec_biomass_iJO1366_core_53p95M’. Invalid compartment code: ‘53p95M’
‘R_EX_lipa_cold_e’. Invalid compartment code: ‘cold’
‘R_ALATA_D2’. Invalid compartment code: ‘D2’
‘R_ALATA_L’. Invalid compartment code: ‘L’
‘R_ALATA_L2’. Invalid compartment code: ‘L2’
‘R_ASPt2_2pp’. Invalid compartment code: ‘2pp’
‘R_ASPt2_3pp’. Invalid compartment code: ‘3pp’
‘R_CLt3_2pp’. Invalid compartment code: ‘2pp’
‘R_CYTBO3_4pp’. Invalid compartment code: ‘4pp’
‘R_F6Pt6_2pp’. Invalid compartment code: ‘2pp’
‘R_FUMt2_2pp’. Invalid compartment code: ‘2pp’
‘R_FUMt2_3pp’. Invalid compartment code: ‘3pp’
‘R_G6Pt6_2pp’. Invalid compartment code: ‘2pp’
‘R_GAM6Pt6_2pp’. Invalid compartment code: ‘2pp’
‘R_LDH_D’. Invalid compartment code: ‘D’
‘R_LDH_D2’. Invalid compartment code: ‘D2’
‘R_MALDt2_2pp’. Invalid compartment code: ‘2pp’
‘R_MALt2_2pp’. Invalid compartment code: ‘2pp’
‘R_MALt2_3pp’. Invalid compartment code: ‘3pp’
‘R_MAN6Pt6_2pp’. Invalid compartment code: ‘2pp’
‘R_MG2t3_2pp’. Invalid compartment code: ‘2pp’
‘R_NAt3_1p5pp’. Invalid compartment code: ‘1p5pp’
‘R_NAt3_2pp’. Invalid compartment code: ‘2pp’
‘R_OROTt2_2pp’. Invalid compartment code: ‘2pp’
‘R_PFK_2’. Invalid compartment code: ‘2’
‘R_PFK_3’. Invalid compartment code: ‘3’
‘R_PSP_L’. Invalid compartment code: ‘L’
‘R_PSP_Lpp’. Invalid compartment code: ‘Lpp’
‘R_RBK_L1’. Invalid compartment code: ‘L1’
‘R_SERD_D’. Invalid compartment code: ‘D’
‘R_SERD_L’. Invalid compartment code: ‘L’
‘R_SUCCt2_2pp’. Invalid compartment code: ‘2pp’
‘R_SUCCt2_3pp’. Invalid compartment code: ‘3pp’
‘R_TARTt2_3pp’. Invalid compartment code: ‘3pp’
‘R_THRD_L’. Invalid compartment code: ‘L’
‘R_EX_acon_C_e’. Invalid compartment code: ‘C’
‘R_EX_btd_RR_e’. Invalid compartment code: ‘RR’
‘R_EX_lipidA_core_e’. Invalid compartment code: ‘core’

Suggestions from the lab

@jslu9 🎱

  • stoichiometry on universal reaction page
  • universal compartment page for Miriam compliance
    • empty pages for each compartment for now
  • search by linkout id
    • in advanced search, dropdown menu for linkout type and input box for linkout value
  • who to email after opening
    • write all comment data to a new table, and set up weekly digest emails
  • AF: search results: when reactions and metabolites are listed, actual name in addition to abbreviation
    • after ~20 characters, cut it off with a "..."
  • AD: add charge column (integer, nullable) to the Metabolite table
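The name truncation suggested for search results could look like this. It is a minimal sketch; `truncate_name` and the default of 20 characters (taken from the "~20 characters" suggestion) are illustrative.

```python
def truncate_name(name, max_length=20):
    """Shorten long names for result lists, marking the cut with '...'."""
    if len(name) <= max_length:
        return name
    return name[:max_length] + "..."
```

Names at or below the limit are returned unchanged, so short names never gain a trailing ellipsis.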

@zakandrewking :

  • escher map link bug
  • give Miriam early access to BiGG release
  • ask nate about metabolites image on homepage
  • Escher maps are slow!
  • add license from BiGG 1.0 (legalese)
  • funding, SBRG, UCSD logos
  • the NCBI taxon id for a species?
  • the authors of a model (i.e., given and family name, organization, e-mail address)?

other:

  • dropdown menu for comment types (bug vs. reconstruction issue)
  • download published and BiGG database SBML files
  • advanced search RECON1 not working

Quantify the completeness of universal tables

For iJO1366 and iAF1260:

  1. How many reactions are shared in Universal reactions?
  2. How many metabolites are shared in Universal metabolites?

For a given reaction, GAPD:

  1. How many models share that entry in the Universal reactions?

For a given metabolite, D-Glucose:

  1. How many models share that entry in the Universal metabolites?
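These questions reduce to set operations once the universal IDs used by each model are available. The sketch below assumes a dictionary from model ID to the set of universal reaction (or metabolite) IDs it uses; the helper names and that data shape are hypothetical.

```python
def shared_ids(ids_a, ids_b):
    """Universal IDs used by both models."""
    return set(ids_a) & set(ids_b)


def models_sharing(entry_id, ids_by_model):
    """Sorted list of models that use the given universal entry."""
    return sorted(
        model for model, ids in ids_by_model.items() if entry_id in ids
    )
```

`len(shared_ids(...))` answers the first two questions for a pair of models, and `len(models_sharing(...))` answers the per-reaction and per-metabolite questions.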

Schema changes

We should make the following schema changes, so that the database schema matches COBRApy:

  • Add a description column to the Model table.
  • Rename modelversion to model_version.
  • Change Synonyms to Synonym
  • Metabolite.long_name to Metabolite.name
  • remove or rename Metabolite.flag
  • Reaction.biggid to Reaction.bigg_id, and make bigg_id the unique constraint, not name
  • remove Reaction.long_name
  • add a Component.bigg_id, and make bigg_id a unique constraint
  • add a Compartment.bigg_id
  • make a 'long' table in place of all the linkouts in Metabolite
  • change Gene.locus_id to Gene.bigg_id
  • add objective_coefficient, lower_bound, upper_bound to ModelReaction
  • change gpr to gene_reaction_rule and GPRMatrix to GeneReactionMatrix

Invalid external identifiers

I found several external ids that don't follow the identifier pattern of their external data source, as well as several 'NA' values.

Some examples:

external_id external_source
NA PUBCHEMID
NA CHEBI
NA KEGGID
3449 KEGGID
5209 KEGGID
124005??3611 PUBCHEMID
124005??3611 CHEBI
124005??3611 KEGGID
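A minimal sketch of a check that would flag such rows. The regular expressions are assumptions modeled on the usual shapes of KEGG Compound, ChEBI, and PubChem identifiers; the exact forms the database is meant to store may differ, and `is_valid_external_id` is a hypothetical helper.

```python
import re

# Assumed patterns, keyed by the external_source values shown above.
ID_PATTERNS = {
    "KEGGID": re.compile(r"^C\d+$"),
    "CHEBI": re.compile(r"^CHEBI:\d+$"),
    "PUBCHEMID": re.compile(r"^\d+$"),
}


def is_valid_external_id(external_id, external_source):
    """Return True when the id matches the assumed pattern for its source.

    Unknown sources are reported as invalid so they get reviewed.
    """
    pattern = ID_PATTERNS.get(external_source)
    return bool(pattern and pattern.match(external_id))
```

Under these assumed patterns, every example row listed above is reported as invalid, including the 'NA' placeholders.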

General UX robustness

Let's make sure these things are working all the time, for the production server:

  • Search
  • Advanced search, with any combination of options
    • Require keywords for advanced search
  • Clicking around between all the pages
