sbrg / bigg_models

The BiGG Models website server

Home Page: http://bigg.ucsd.edu

License: Other

Python 51.83% XSLT 3.49% CSS 6.36% JavaScript 14.75% HTML 23.15% Shell 0.13% PLpgSQL 0.30%

bigg_models's Issues

accuracy

gene counts are not correct due to copies.
- find a way to logging.warn() these in the code
for all "old IDs", use the new OldIDModelSynonym table to store old ids
- columns: synonym_id and model_id
- cases where we record old ids:
  1. when COBRA gene matches a database Gene.name or Gene synonym
  2. If Theseus changes the reaction or metabolite ID
write a test script that loads a published model, dumps the model into COBRApy, and compares counts for reactions, metabolites, and genes
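
The count comparison in the last item could be sketched roughly as follows. This is only an illustration: `compare_counts` and the sample numbers are hypothetical, and the counts are assumed to have been collected into plain dicts already. With COBRApy they would typically come from `len(model.reactions)`, `len(model.metabolites)`, and `len(model.genes)`.

```python
import logging


def compare_counts(published, dumped):
    """Return {kind: (published_count, dumped_count)} for every mismatch."""
    mismatches = {}
    for kind in ("reactions", "metabolites", "genes"):
        if published[kind] != dumped[kind]:
            mismatches[kind] = (published[kind], dumped[kind])
            # Log the discrepancy so it shows up during loading.
            logging.warning("%s count differs: published=%d, dumped=%d",
                            kind, published[kind], dumped[kind])
    return mismatches


# Hypothetical counts, for illustration only.
published = {"reactions": 10, "metabolites": 8, "genes": 5}
dumped = {"reactions": 10, "metabolites": 8, "genes": 6}
result = compare_counts(published, dumped)
```

An empty result means the three counts agree; any entry names the kind that needs a closer look.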

Finding references for genes

Searching for geneProduct label "b3997" yields several synonyms:

SELECT *
FROM synonym s, gene g
WHERE s.ome_id = g.id and ome_id IN (
SELECT ome_id
FROM synonym
WHERE synonym = 'b3997');

But there is no entry in link_out and no further hint where this id originates from. However, it can be found in KEGG GENES: http://identifiers.org/kegg.genes/eco:b3997.

Question: how could a reference for a gene best be identified? In this example, the prefix "eco:" in the id and the link_out to KEGG GENES are missing.
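
For illustration, if the KEGG organism code were stored alongside the gene, the reference URL from the example could be rebuilt as in the sketch below. `kegg_genes_url` is a hypothetical helper, and how the organism code ("eco" here) would be looked up for a given model is left open.

```python
def kegg_genes_url(organism_code, locus_tag):
    """Build an identifiers.org URL for a KEGG GENES entry.

    organism_code: KEGG organism prefix, e.g. "eco" (assumed to be known).
    locus_tag: gene label as stored in the synonym table, e.g. "b3997".
    """
    return "http://identifiers.org/kegg.genes/{}:{}".format(organism_code, locus_tag)


url = kegg_genes_url("eco", "b3997")
```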

Gene IDs and names

I'm not sure exactly what's going on, but these should match:

That's from this page: /models/RECON1/reactions/GAPD

Descriptive model names get lost

Some models have interesting descriptive reaction names (e.g. iJO1366), while others do not (e.g. iND750). If we load iND750 before iJO1366, then we lose all the reaction names. e.g. http://yersinia.ucsd.edu:8887/models/iJO1366/reactions/GAPD

When loading e.g. iJO1366, for each reaction, check if name == bigg_id in the existing Reaction row or if name == "". If the model that we're loading has a better name (not equal to bigg_id), then replace the existing name with the new one.
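
A minimal sketch of that rule, assuming names are compared as plain strings. `choose_reaction_name` is a hypothetical helper, not a function in the loading code.

```python
def choose_reaction_name(existing_name, new_name, bigg_id):
    """Return the name that should be stored on the Reaction row."""
    # The existing name is a placeholder if it is empty or just repeats the ID.
    existing_is_placeholder = existing_name in ("", None, bigg_id)
    # The incoming name is only worth keeping if it says more than the ID.
    new_is_descriptive = new_name not in ("", None, bigg_id)
    if existing_is_placeholder and new_is_descriptive:
        return new_name
    return existing_name
```

With this rule the load order stops mattering for names: loading iND750 first leaves a placeholder that iJO1366 can later replace.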

Get yersinia going

stage on yersinia:8888 and production on yersinia:8887

Model building features

Let's brainstorm ideas for building, curating, and extending models with BiGG.

Issues:

For curation, we need versioning.
User levels: e.g., view, add, and full edit/remove

biopath linkouts importing incorrectly; internal commas

 51696273 | 5                                                                 | BIOPATH         | 51671169 | metabolite
 51696274 | 10-Methylenetetrahydrofolate                                      | BIOPATH         | 51671169 | metabolite
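
The rows above look like the name "5,10-Methylenetetrahydrofolate" being split at its internal comma. The input format is not shown in this issue, so the following is only a sketch under the assumption that the source is comma-delimited and quotes fields that contain commas. It contrasts a naive split with Python's `csv` module.

```python
import csv
import io

# Hypothetical input line; the real file layout is an assumption.
line = 'BIOPATH,"5,10-Methylenetetrahydrofolate"\n'

# Naive splitting breaks the quoted name into two pieces.
naive = line.rstrip("\n").split(",")

# csv.reader respects the quotes and keeps the name intact.
parsed = next(csv.reader(io.StringIO(line)))
```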

Short term additions before lab release

Add genome_id to model table
- Using jons supplement for ecoli model genome_ids
- Using links from old bigg website to figure out genome_ids for those models
Add synonyms table using 'gene_synonyms' field of the genbank files
Likely will need to use synonyms table to help in mapping GPRs from the older models on the old bigg site
Make the cosmetic change of merging compartmentalized_component with model_component, so that there is only model_component that now contains a compartment_id
- This will make the schema on draw.io identical to zaks schema

Unintuitive gene search results

I am searching for the gene yqhD in E. coli, and I get this result:

To a new user, this is almost incomprehensible. We should at least include the organism name here.

Model Dumping

On the back end, make sure models can be dumped.

On the front end, set up the correct SBML and JSON download links on model pages.

link to old bigg

when BiGG 2.0 goes up at bigg.ucsd.edu, we need a banner linking back to bigg.ucsd.edu/bigg1 which points to the old BiGG.

weird bug

After loading all the models:

In [23]: (session
    ...:  .query(Component.bigg_id, Compartment.bigg_id, Model.bigg_id, Reaction.bigg_id)
    ...:  .join(CompartmentalizedComponent, CompartmentalizedComponent.component_id == Component.id)
    ...:  .join(Compartment, Compartment.id == CompartmentalizedComponent.compartment_id)
    ...:  .join(ReactionMatrix, ReactionMatrix.compartmentalized_component_id == CompartmentalizedComponent.id)
    ...:  .join(Reaction)
    ...:  .join(ModelReaction)
    ...:  .join(Model)
    ...:  .filter(Model.bigg_id == 'iAF1260')
    ...:  .filter(Reaction.bigg_id.like('DADA'))
    ...:  .all())
Out[23]:
[(u'dad__2', u'c', u'iAF1260', u'DADA'),
 (u'din', u'c', u'iAF1260', u'DADA'),
 (u'h2o', u'c', u'iAF1260', u'DADA'),
 (u'h', u'c', u'iAF1260', u'DADA'),
 (u'nh4', u'c', u'iAF1260', u'DADA'),
 (u'dad_2', u'c', u'iAF1260', u'DADA')]

This breaks my current version of test_loaded_data.py

Trying with just iAF1260 now to isolate the problem.

link to universal reaction

From universal metabolite, link to universal reaction, not to ModelReactions.

Staging server

Two copies of BiGG in virtualenv environments
Two databases in Postgres: bigg and bigg_stage
Two servers on different ports

Publication links

We should add links to publications for each model page.

iJO1366 model test script: growth rate doesn't match

in ome/loading/model_loading/test/test_sbml.py

Also check which bigg_ids are being changed during model loading, and log them.

BIGG2 ID mapping feature for other models

We need to make it easier to use models from other groups with non-human-readable IDs, like the latest yeast models:

http://yeast.sourceforge.net/

Those models have KEGG Reaction IDs, so if we can map our BIGG2 reaction IDs to KEGG reaction IDs, then we can add BIGG2 IDs to any of those models.

This makes BIGG2 IDs way more important for the GEM community.
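
A rough sketch of such a mapping step, assuming a KEGG-to-BIGG2 lookup table already exists. `add_bigg_ids` is a hypothetical helper and the IDs below are placeholders, not real KEGG or BiGG identifiers.

```python
def add_bigg_ids(kegg_ids, kegg_to_bigg):
    """Split KEGG reaction IDs into those with a BIGG2 match and those without.

    Returns (mapped, unmapped): a dict of KEGG ID -> BIGG2 ID, and a list of
    KEGG IDs that have no entry in the lookup table.
    """
    mapped, unmapped = {}, []
    for kegg_id in kegg_ids:
        if kegg_id in kegg_to_bigg:
            mapped[kegg_id] = kegg_to_bigg[kegg_id]
        else:
            unmapped.append(kegg_id)
    return mapped, unmapped


# Placeholder IDs, for illustration only.
kegg_to_bigg = {"KEGG_RXN_A": "BIGG_RXN_A"}
mapped, unmapped = add_bigg_ids(["KEGG_RXN_A", "KEGG_RXN_B"], kegg_to_bigg)
```

Keeping the unmapped IDs in a separate list makes it easy to see how much of an external model still lacks a BIGG2 ID.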

Unclear entries in data_source

The following resources from the data_source table are difficult to map to MIRIAM resources:

name	Problem
EnsemblGenomes	There are multiple databases: Bacteria, Fungi, Metazoa, Plants, Protists. I could look up which one to use by analysing the lineage of the organism based on its NCBI taxon id.
GI	Unclear what database has this id.
IMGT/GENE-DB	This could refer to IMGT HLA or IMGT LIGM
MIM	This could refer to one or none of the databases ABS, MimoDB, OMIM, or Orphanet Rare Disease Ontology
PSEUDO	I assume this refers to the Pseudomonas Genome Database, but it could also be Pathema (WARNING: deprecated!) or UniGene (WARNING: low up-time!)
UniProtKB/Swiss-Prot	This could be UniProt Isoform or UniProt Knowledgebase. There are also other possible databases, but I assume these are the main ones (see http://www.ebi.ac.uk/miriam/main/search?query=UniProtKB)
UniProtKB/TrEMBL	Same as for the UniProtKB/Swiss-Prot case
old id	What is this? Older BiGG release?

COBRApy fork for SBML input

Have to change line 158 of sbml.py from pop to get. In the future, we need to report this as a bug to COBRApy, or, if it is not a bug, then find another workaround.
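
The actual line 158 of sbml.py is not reproduced in this issue, so the snippet below only illustrates the general difference between the two dict methods. The key and value are hypothetical.

```python
# Hypothetical notes dict, standing in for whatever sbml.py reads from.
notes = {"GENE_ASSOCIATION": "b3997"}

# get() returns the value and leaves the dict unchanged.
value_from_get = notes.get("GENE_ASSOCIATION")
still_there = "GENE_ASSOCIATION" in notes

# pop() returns the same value but removes the key as a side effect.
value_from_pop = notes.pop("GENE_ASSOCIATION")
gone = "GENE_ASSOCIATION" not in notes
```

If later code still needs the entry, switching from `pop` to `get` keeps it available.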

SBML fixes

In COBRA, model.id should be the same as the xml filename (e.g. iND750), and this should become Model.bigg_id in the database.
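
One possible way to derive that ID from the file path is sketched below. `model_id_from_path` and the example path are hypothetical.

```python
import os


def model_id_from_path(path):
    """Return the file name without directory or extension, e.g. 'iND750'."""
    return os.path.splitext(os.path.basename(path))[0]


# Hypothetical location of the SBML file.
model_id = model_id_from_path("models/iND750.xml")
```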

SBML format for download

Hi,

I would like to comment on the SBML files that are provided for download:

GeneAssociations were only a suggestion in FBC version 1. Just recently they made it into the new standard FBC version 2. When validating the model iJO1366, for instance, the SBML validator complains that this format is invalid because it doesn't know the genes etc.
I found empty notes elements in species and no MIRIAM annotation. We could easily add references to identifiers that specify each model component, because the information is in BiGG.
We should also use a few SBO terms to annotate types of model components, such as "simple chemical", "physical compartment", "transport reaction", etc.
Subsystems could be specified by grouping elements together using the groups package in SBML.
The authors of models could be added in a model history, also the creation/modification dates, a link to the publication, and some general notes.

Thanks
Andreas

Group models into species for e.g. Jon's models

Talk more about how to implement it.

Open Design Questions and Bugs

BIGG2 Notes:

Theseus doesn’t have a date for first created
Grmit uses wids while theseus doesn’t
Grmit loads one model at a time because of the renaming. I should automate renaming by parsing the strings
No reaction full/official name
Every list is displayed in a column of 3. We may want to vary the column depending on the type of results i.e. genes, metabolites, reactions, models
I should change the reaction list on the metabolite page because the reaction list is really long sometimes.
Gene handler, metabolite handler should show reactions that contain the gene/metabolite AND are from the specified model. (should be fixed)
SBML, search, help, and advanced search don't work
Reaction string is different from theseus and grmit
a. The number of occurrences should follow stoichiometry value (should be fixed)

serve on bigg.ucsd.edu:8888

compartment names

http://yersinia.ucsd.edu:8888/api/v2/universal/compartments

Successfully load models for submission

iJO1366
iAF1260
RECON1

@jslu9 Can you put the other ones here?

Advanced search with no keyword = bug

Traceback (most recent call last):
  File "/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tornado/web.py", line 1346, in _when_complete
    callback()
  File "/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tornado/web.py", line 1367, in _execute_method
    self._when_complete(method(*self.path_args, **self.path_kwargs),
  File "server.py", line 209, in post
    reactionList.append(reaction.name)
UnboundLocalError: local variable 'reactionList' referenced before assignment
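
The code in server.py is not shown here, but an UnboundLocalError like this usually means the list is only created inside a branch that was skipped. A minimal sketch of the usual fix, with a hypothetical helper and the assumption that an empty keyword should match everything:

```python
def collect_reaction_names(reactions, keyword=""):
    """Return the names of reactions whose name contains keyword."""
    # Bind the list before any branching so it always exists,
    # even when no keyword is given or nothing matches.
    reaction_list = []
    for reaction in reactions:
        if keyword.lower() in reaction["name"].lower():
            reaction_list.append(reaction["name"])
    return reaction_list
```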

speed up search with IN operator

Instead of looping through models:

for model_bigg_id in model_list:

Use the in operator to filter for any matching models:

.filter(Model.bigg_id.in_(model_list))
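
For illustration, here is the same idea in plain SQL using the standard-library `sqlite3` module as a stand-in for the SQLAlchemy session. The one-column `model` table is a simplification. The point is that one query with an IN clause replaces one query per model.

```python
import sqlite3

# In-memory stand-in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE model (bigg_id TEXT)")
conn.executemany("INSERT INTO model VALUES (?)",
                 [("iJO1366",), ("iAF1260",), ("RECON1",)])

model_list = ["iJO1366", "RECON1"]

# One placeholder per requested model, so the values stay parameterized.
placeholders = ", ".join("?" for _ in model_list)
query = "SELECT bigg_id FROM model WHERE bigg_id IN ({}) ORDER BY bigg_id".format(placeholders)
found = [row[0] for row in conn.execute(query, model_list).fetchall()]
```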

Make examples more prominent on main page

As discussed today in code talk, we probably want to make the examples more prominent on the main page so people know what to search for.

determine which models were loaded

Still to do:

figure out which models are not loading
figure out which models do not have standard BiGG style IDs (e.g. look for models without pyr)
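
The "look for models without pyr" check could be sketched as follows, assuming metabolite IDs take the form `<base>_<compartment>` such as `pyr_c`. `has_bigg_style_ids` is a hypothetical helper, and a single marker metabolite is only a rough heuristic.

```python
def has_bigg_style_ids(metabolite_ids, marker="pyr"):
    """Return True if any metabolite ID, minus its compartment suffix, equals marker."""
    for met_id in metabolite_ids:
        # Drop the last underscore-separated part (assumed compartment code).
        base = met_id.rsplit("_", 1)[0]
        if base == marker:
            return True
    return False
```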

unit testing

Make pretty names

iJO1366 has some ugly names. A few examples to fix:

M_lipidA_core_e_p   M_2_3_2'3'_Tetrakis_beta_hydroxymyristoyl__D_glucosaminyl_1_6_beta_D_glucosamine_1_4'_bisphosphate_C68H126N2O23P2   periplasm
M_2amsa_c   M_2_Aminomalonate_semialdehyde_C3H5NO3  cytosol
M_2mcit_c   M_2_Methylcitrate_C7H7O7    cytosol
M_ohpb_c    M_2_Oxo_3_hydroxy_4_phosphobutanoate_C4H4O8P    cytosol
M_23camp_e  M_2__3__Cyclic_AMP_C10H11N5O6P  extracellular
M_23camp_p  M_2__3__Cyclic_AMP_C10H11N5O6P  periplasm
M_3hoxpac_e M_3_Hydroxyphenylacetic_acid_C8H8O3 extracellular
M_3ntym_e   M_3_Nitrotyramine_C8H15N2   extracellular
M_35cgmp_c  M_3__5__Cyclic_GMP_C10H11N5O7P  cytosol
M_4ahmmp_c  M_4_Amino_5_hydroxymethyl_2_methylpyrimidine_C6H9N3O    cytosol
M_4abutn_c  M_4_Aminobutanal_C4H10NO    cytosol
M_4hthr_c   M_4_Hydroxy_L_threonine_C4H9NO4 cytosol
M_4hthr_e   M_4_Hydroxy_L_threonine_C4H9NO4 extracellular
M_4hbz_c    M_4_Hydroxybenzoate_C7H5O3  cytosol
M_4hoxpac_e M_4_Hydroxyphenylacetic_acid_C8H8O3 extracellular
M_5fthf_c   M_5_Formyltetrahydrofolate_C20H21N7O7   cytosol
M_6apa_e    M_6_Aminopenicillanic_acid_C8H12N2O3S   extracellular
M_acac_c    M_Acetoacetate_C4H5O3   cytosol
M_acac_e    M_Acetoacetate_C4H5O3   extracellular
M_acac_p    M_Acetoacetate_C4H5O3   periplasm

Probably the same for other models.
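
As a starting point, a cleanup along these lines might help. It is only a sketch: `pretty_name` is hypothetical, the desired final format has not been decided in this issue, and the formula pattern would also strip any trailing token that merely looks like a formula.

```python
import re

# Trailing "_<formula>" such as "_C4H5O3"; requires at least one digit so that
# plain uppercase suffixes like "_AMP" are left alone.
FORMULA_SUFFIX = re.compile(r"_(?=[A-Za-z0-9]*\d)(?:[A-Z][a-z]?\d*)+$")


def pretty_name(raw):
    """Strip the 'M_' prefix and trailing formula, and turn underscores into spaces."""
    name = raw[2:] if raw.startswith("M_") else raw
    name = FORMULA_SUFFIX.sub("", name)
    # Replace underscores and collapse any runs of whitespace.
    return " ".join(name.replace("_", " ").split())
```

For example, `pretty_name("M_Acetoacetate_C4H5O3")` gives `"Acetoacetate"`. Punctuation that was lost in the original encoding (primes, commas, hyphens) cannot be recovered this way.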

Fix duplicate IDs with leading underscores

Thanks @draeger

It seems that for 148 bigg ids there is a version that starts with an
underscore and also a version that does not start with an underscore:

bigg=# select count(c1.bigg_id) from component c1 where c1.bigg_id in
(select concat('_', c2.bigg_id) from component c2);
 count
-------
   148
(1 row)

Examples:

bigg=# select bigg_id from component where bigg_id like '%3fe4s%';
 bigg_id
---------
 3fe4s
 _3fe4s
(2 rows)

Solution: remove a leading underscore from all IDs during the loading process (probably in ome/loading/model_loading/parse.py:id_for_new_id_style())
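
A minimal sketch of that normalization step. `strip_leading_underscore` is a hypothetical helper, and where exactly it would be called inside the loading code is left open.

```python
def strip_leading_underscore(bigg_id):
    """Remove a single leading underscore, leaving all other underscores alone."""
    return bigg_id[1:] if bigg_id.startswith("_") else bigg_id
```

With this applied during loading, `_3fe4s` and `3fe4s` would resolve to the same component ID.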

Error loading

Pls push latest, to avoid this error:

Traceback (most recent call last):
  File "load_db.py", line 124, in <module>
    model_id,genome_id,model_creation_timestamp = line.rstrip('\n').split(',')
ValueError: too many values to unpack
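
Independently of pushing the latest file, the unpacking could be made to fail with a clearer message. The sketch below assumes three comma-separated fields per line, as in the traceback. `parse_model_line` is a hypothetical helper.

```python
def parse_model_line(line, expected_fields=3):
    """Split one line into (model_id, genome_id, model_creation_timestamp)."""
    fields = line.rstrip("\n").split(",")
    if len(fields) != expected_fields:
        # Report the offending line instead of a bare unpacking error.
        raise ValueError("expected {} comma-separated fields, got {}: {!r}".format(
            expected_fields, len(fields), line))
    model_id, genome_id, model_creation_timestamp = fields
    return model_id, genome_id, model_creation_timestamp
```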

Search bar completion

search by metabolite gene reaction and model
search by organism, locus id
search for universal reaction and metabolite, not model reaction and model metabolite
"Search" button next to search bar on the home page

BIGG2 as a CDN of sorts for models?

Thought it might be cool for people using cobrapy to be able to just request a model object. That way it can always be validated and the latest/greatest.

Make every Handler asynchronous

In server.py

Universal tables

Loading universal components
1. If KEGG_ID is new, then add a new universal component
2. If KEGG_ID is not new, then connect to the existing universal component
  - New column in model_compartmentalized_component for old_bigg_id
3. If no KEGG_ID, then add a new universal component, and flag the row
Loading universal reactions
- Compare new reactions by stoichiometry
- check each metabolite and coefficient
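
The stoichiometry comparison could look roughly like this, assuming each reaction is represented as a dict of metabolite ID to coefficient. `same_stoichiometry` and the metabolite IDs in the usage lines are placeholders. Reversed reactions (all signs flipped) are not treated as equal in this sketch.

```python
def same_stoichiometry(reaction_a, reaction_b, tolerance=1e-9):
    """Return True if both reactions have the same metabolites and coefficients."""
    # Different metabolite sets can never match.
    if set(reaction_a) != set(reaction_b):
        return False
    # Check each metabolite's coefficient, allowing for float rounding.
    return all(abs(reaction_a[met] - reaction_b[met]) <= tolerance
               for met in reaction_a)


# Placeholder metabolite IDs, for illustration only.
r1 = {"a_c": -1.0, "b_c": 1.0}
r2 = {"b_c": 1.0, "a_c": -1.0}
r3 = {"a_c": -2.0, "b_c": 1.0}
```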

IDs causing trouble

Fix BiGG ID spec to fix these cases:

M_sertrna_sec__c
M_lipidA_core_e_p
R_Ec_biomass_iJO1366_WT_53p95M

‘M_lipa_cold_e’. Invalid compartment code: ‘cold’
‘M_lipa_cold_p’. Invalid compartment code: ‘cold’
‘M_lipa_cold_c’. Invalid compartment code: ‘cold’
‘M_sertrna_sec__c’. Invalid compartment code: ‘sec-‘
‘M_lipidA_core_e_p’. Invalid compartment code: ‘core’
‘M_lipidA_core_e_p’. Invalid tissue code: ‘p’
‘R_Ec_biomass_iJO1366_WT_53p95M’. Invalid compartment code: ‘biomass’
‘R_Ec_biomass_iJO1366_WT_53p95M’. Invalid compartment code: ‘iJO1366’
‘R_Ec_biomass_iJO1366_WT_53p95M’. Invalid compartment code: ‘WT’
‘R_Ec_biomass_iJO1366_WT_53p95M’. Invalid compartment code: ‘53p95M’
‘R_Ec_biomass_iJO1366_core_53p95M’. Invalid compartment code: ‘biomass’
‘R_Ec_biomass_iJO1366_core_53p95M’. Invalid compartment code: ‘iJO1366’
‘R_Ec_biomass_iJO1366_core_53p95M’. Invalid compartment code: ‘core’
‘R_Ec_biomass_iJO1366_core_53p95M’. Invalid compartment code: ‘53p95M’
‘R_EX_lipa_cold_e’. Invalid compartment code: ‘cold’
‘R_ALATA_D2’. Invalid compartment code: ‘D2’
‘R_ALATA_L’. Invalid compartment code: ‘L’
‘R_ALATA_L2’. Invalid compartment code: ‘L2’
‘R_ASPt2_2pp’. Invalid compartment code: ‘2pp’
‘R_ASPt2_3pp’. Invalid compartment code: ‘3pp’
‘R_CLt3_2pp’. Invalid compartment code: ‘2pp’
‘R_CYTBO3_4pp’. Invalid compartment code: ‘4pp’
‘R_F6Pt6_2pp’. Invalid compartment code: ‘2pp’
‘R_FUMt2_2pp’. Invalid compartment code: ‘2pp’
‘R_FUMt2_3pp’. Invalid compartment code: ‘3pp’
‘R_G6Pt6_2pp’. Invalid compartment code: ‘2pp’
‘R_GAM6Pt6_2pp’. Invalid compartment code: ‘2pp’
‘R_LDH_D’. Invalid compartment code: ‘D’
‘R_LDH_D2’. Invalid compartment code: ‘D2’
‘R_MALDt2_2pp’. Invalid compartment code: ‘2pp’
‘R_MALt2_2pp’. Invalid compartment code: ‘2pp’
‘R_MALt2_3pp’. Invalid compartment code: ‘3pp’
‘R_MAN6Pt6_2pp’. Invalid compartment code: ‘2pp’
‘R_MG2t3_2pp’. Invalid compartment code: ‘2pp’
‘R_NAt3_1p5pp’. Invalid compartment code: ‘1p5pp’
‘R_NAt3_2pp’. Invalid compartment code: ‘2pp’
‘R_OROTt2_2pp’. Invalid compartment code: ‘2pp’
‘R_PFK_2’. Invalid compartment code: ‘2’
‘R_PFK_3’. Invalid compartment code: ‘3’
‘R_PSP_L’. Invalid compartment code: ‘L’
‘R_PSP_Lpp’. Invalid compartment code: ‘Lpp’
‘R_RBK_L1’. Invalid compartment code: ‘L1’
‘R_SERD_D’. Invalid compartment code: ‘D’
‘R_SERD_L’. Invalid compartment code: ‘L’
‘R_SUCCt2_2pp’. Invalid compartment code: ‘2pp’
‘R_SUCCt2_3pp’. Invalid compartment code: ‘3pp’
‘R_TARTt2_3pp’. Invalid compartment code: ‘3pp’
‘R_THRD_L’. Invalid compartment code: ‘L’
‘R_EX_acon_C_e’. Invalid compartment code: ‘C’
‘R_EX_btd_RR_e’. Invalid compartment code: ‘RR’
‘R_EX_lipidA_core_e’. Invalid compartment code: ‘core’

Suggestions from the lab

@jslu9 🎱

stoichiometry on universal reaction page
universal compartment page for Miriam compliance
- empty pages for each compartment for now
search by linkout id
- in advanced search, dropdown menu for linkout type and input box for linkout value
who to email after opening
- write all comment data to a new table, and set up weekly digest emails
AF: search results: when reactions and metabolites are listed, actual name in addition to abbreviation
- after ~20 characters, cut it off with a "..."
AD: add charge column (integer, nullable) to the Metabolite table
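
The name cut-off suggested for search results above could be sketched like this. `truncate_name` is a hypothetical helper and the limit of 20 follows the "~20 characters" in the note.

```python
def truncate_name(name, limit=20):
    """Return name unchanged if short enough, else the first limit characters plus '...'."""
    if len(name) <= limit:
        return name
    return name[:limit] + "..."
```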

@zakandrewking :

escher map link bug
give Miriam early access to BiGG release
ask nate about metabolites image on homepage
Escher maps are slow!
add license from BiGG 1.0 (legalese)
funding, SBRG, UCSD logos
the NCBI taxon id for a species?
the authors of a model (i.e., given and family name, organization, e-mail address)?

other: