sbrg / bigg_models
The BiGG Models website server
Home Page: http://bigg.ucsd.edu
License: Other
Searching for geneProduct label "b3997" yields several synonyms:
SELECT *
FROM synonym s, gene g
WHERE s.ome_id = g.id and ome_id IN (
SELECT ome_id
FROM synonym
WHERE synonym = 'b3997');
But there is no entry in link_out and no further hint where this id originates from. However, it can be found in KEGG GENES: http://identifiers.org/kegg.genes/eco:b3997.
Question: how could a reference for a gene best be identified? In this example, the prefix "eco:" in the id and the link_out to KEGG GENES are missing.
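One possible direction, sketched under the assumption that a KEGG organism code (such as "eco") could be stored alongside each genome: the link could then be assembled from the organism code and the locus tag. The function name is hypothetical and is not part of the existing code base.

```python
def kegg_genes_url(organism_code, locus_tag):
    """Build an identifiers.org URL for a KEGG GENES entry.

    organism_code: KEGG organism prefix, e.g. "eco" (assumed to be known per genome)
    locus_tag: gene label as stored in the synonym table, e.g. "b3997"
    """
    return "http://identifiers.org/kegg.genes/{}:{}".format(organism_code, locus_tag)


print(kegg_genes_url("eco", "b3997"))
```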
Some models have interesting descriptive reaction names (e.g. iJO1366), while others do not (e.g. iND750). If we load iND750 before iJO1366, then we lose all the reaction names. e.g. http://yersinia.ucsd.edu:8887/models/iJO1366/reactions/GAPD
When loading e.g. iJO1366, for each reaction, check if name == bigg_id in the existing Reaction row or if name == "". If the model that we're loading has a better name (not equal to bigg_id), then replace the existing name with the new one.
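The rule above can be sketched as a small helper. This is only an illustration of the decision logic, not the loader's actual code; the function name and the example reaction name are assumptions.

```python
def pick_reaction_name(bigg_id, existing_name, new_name):
    """Return the name that should be stored for a reaction.

    The existing name is treated as a placeholder when it is empty or just
    repeats the BiGG ID; it is replaced only if the incoming model offers
    something more descriptive.
    """
    existing_is_placeholder = existing_name in (None, "", bigg_id)
    new_is_descriptive = new_name not in (None, "", bigg_id)
    if existing_is_placeholder and new_is_descriptive:
        return new_name
    return existing_name


# A placeholder name is upgraded (the descriptive name here is illustrative).
print(pick_reaction_name("GAPD", "GAPD", "Glyceraldehyde-3-phosphate dehydrogenase"))
# A descriptive name already in the database is kept.
print(pick_reaction_name("GAPD", "Glyceraldehyde-3-phosphate dehydrogenase", "GAPD"))
```

With this rule the load order of iND750 and iJO1366 should no longer matter for the stored name.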
stage on yersinia:8888 and production on yersinia:8887
Let's brainstorm ideas for building, curating, and extending models with BiGG.
Issues:
51696273 | 5 | BIOPATH | 51671169 | metabolite
51696274 | 10-Methylenetetrahydrofolate | BIOPATH | 51671169 | metabolite
On the back end, make sure models can be dumped.
On the front end, set up the correct SBML and JSON download links on model pages.
When BiGG 2.0 goes up at bigg.ucsd.edu, we need a banner linking back to bigg.ucsd.edu/bigg1, which points to the old BiGG.
After loading all the models:
In [23]: session.query(Component.bigg_id, Compartment.bigg_id, Model.bigg_id, Reaction.bigg_id).join(CompartmentalizedComponent, CompartmentalizedComponent.component_id==Component.id).join(Compartment, Compartment.id==CompartmentalizedComponent.compartment_id).join(ReactionMatrix, ReactionMatrix.compartmentalized_component_id==CompartmentalizedComponent.id).join(Reaction).join(ModelReaction).join(Model).filter(Model.bigg_id=='iAF1260').filter(Reaction.bigg_id.like('DADA')).all()
Out[23]:
[(u'dad__2', u'c', u'iAF1260', u'DADA'),
(u'din', u'c', u'iAF1260', u'DADA'),
(u'h2o', u'c', u'iAF1260', u'DADA'),
(u'h', u'c', u'iAF1260', u'DADA'),
(u'nh4', u'c', u'iAF1260', u'DADA'),
(u'dad_2', u'c', u'iAF1260', u'DADA')]
This breaks my current version of test_loaded_data.py
Trying with just iAF1260 now to isolate the problem.
From universal metabolite, link to universal reaction, not to ModelReactions.
We should add links to publications for each model page.
in ome/loading/model_loading/test/test_sbml.py
Also check which bigg_id's are being changed during model loading, and log them.
We need to make it easier to use models from other groups with non-human-readable IDs, like the latest yeast models:
Those models have KEGG Reaction IDs, so if we can map our BIGG2 reaction IDs to KEGG reaction IDs, then we can add BIGG2 IDs to any of those models.
This makes BIGG2 IDs way more important for the GEM community.
The following resources from the data_source table are difficult to map to MIRIAM resources:
name | Problem |
---|---|
EnsemblGenomes | There are multiple databases: Bacteria, Fungi, Metazoa, Plants, Protists. I could look up which one to use by analysing the lineage of the organism based on its NCBI taxon id. |
GI | Unclear what database has this id. |
IMGT/GENE-DB | This could refer to IMGT HLA or IMGT LIGM |
MIM | This could refer to one, or none, of the databases ABS, MimoDB, OMIM, or Orphanet Rare Disease Ontology |
PSEUDO | I assume this refers to the Pseudomonas Genome Database, but it could also be Pathema (WARNING: deprecated!) or UniGene (WARNING: low up-time!) |
UniProtKB/Swiss-Prot | This could be UniProt Isoform or UniProt Knowledgebase. There are also other possible databases, but I assume these are the main ones (see http://www.ebi.ac.uk/miriam/main/search?query=UniProtKB) |
UniProtKB/TrEMBL | Same as for the UniProtKB/Swiss-Prot case |
old id | What is this? Older BiGG release? |
Have to change line 158 of sbml.py from pop to get. In the future, we need to report this as a bug to COBRApy, or, if it is not a bug, then find another workaround.
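For reference, the difference between the two dictionary methods is shown below. The dictionary and its keys are made up for illustration; the object actually used at line 158 of sbml.py is not shown in this issue.

```python
# Illustrative dictionary only; keys and values are assumptions.
annotations = {"name": "GAPD", "subsystem": "Glycolysis"}

# get() reads the value and leaves the dictionary untouched.
name_via_get = annotations.get("name")
keys_after_get = sorted(annotations)

# pop() returns the same value but also deletes the key.
name_via_pop = annotations.pop("name")
keys_after_pop = sorted(annotations)

# Once the key is gone, get() returns None, whereas pop() without a
# default would raise KeyError.
missing = annotations.get("name")

print(keys_after_get, keys_after_pop, missing)
```

If later code reads the same key again, pop would explain why the value has disappeared, which is one plausible reason the switch to get is needed.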
Hi,
I would like to comment on the SBML files that are provided for download:
Thanks
Andreas
Talk more about how to implement it.
BIGG2 Notes:
@jslu9 Can you put the other ones here?
Traceback (most recent call last):
File "/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tornado/web.py", line 1346, in _when_complete
callback()
File "/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tornado/web.py", line 1367, in _execute_method
self._when_complete(method(*self.path_args, **self.path_kwargs),
File "server.py", line 209, in post
reactionList.append(reaction.name)
UnboundLocalError: local variable 'reactionList' referenced before assignment
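This error is the typical symptom of a list that is only created inside a conditional branch and then used outside it. The code around line 209 of server.py is not shown here, so the following is a hypothetical reconstruction of that pattern and its usual fix; all function and variable names are invented.

```python
def matching_names_buggy(names, query):
    if query:
        matches = []                 # only assigned when query is non-empty
    for name in names:
        if query in name:
            matches.append(name)     # UnboundLocalError when query is ""
    return matches


def matching_names_fixed(names, query):
    matches = []                     # always initialised before the loop
    for name in names:
        if query in name:
            matches.append(name)
    return matches


print(matching_names_fixed(["GAPD", "PFK"], "GA"))
```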
Instead of looping through models:
for model_bigg_id in model_list:
Use SQLAlchemy's in_ operator to filter for all matching models in a single query:
.filter(Model.bigg_id.in_(model_list))
As discussed today in code talk, we probably want to make the examples more prominent on the main page so people know what to search for.
Still to do:
iJO1366 has some ugly names. A few examples to fix:
M_lipidA_core_e_p M_2_3_2'3'_Tetrakis_beta_hydroxymyristoyl__D_glucosaminyl_1_6_beta_D_glucosamine_1_4'_bisphosphate_C68H126N2O23P2 periplasm
M_2amsa_c M_2_Aminomalonate_semialdehyde_C3H5NO3 cytosol
M_2mcit_c M_2_Methylcitrate_C7H7O7 cytosol
M_ohpb_c M_2_Oxo_3_hydroxy_4_phosphobutanoate_C4H4O8P cytosol
M_23camp_e M_2__3__Cyclic_AMP_C10H11N5O6P extracellular
M_23camp_p M_2__3__Cyclic_AMP_C10H11N5O6P periplasm
M_3hoxpac_e M_3_Hydroxyphenylacetic_acid_C8H8O3 extracellular
M_3ntym_e M_3_Nitrotyramine_C8H15N2 extracellular
M_35cgmp_c M_3__5__Cyclic_GMP_C10H11N5O7P cytosol
M_4ahmmp_c M_4_Amino_5_hydroxymethyl_2_methylpyrimidine_C6H9N3O cytosol
M_4abutn_c M_4_Aminobutanal_C4H10NO cytosol
M_4hthr_c M_4_Hydroxy_L_threonine_C4H9NO4 cytosol
M_4hthr_e M_4_Hydroxy_L_threonine_C4H9NO4 extracellular
M_4hbz_c M_4_Hydroxybenzoate_C7H5O3 cytosol
M_4hoxpac_e M_4_Hydroxyphenylacetic_acid_C8H8O3 extracellular
M_5fthf_c M_5_Formyltetrahydrofolate_C20H21N7O7 cytosol
M_6apa_e M_6_Aminopenicillanic_acid_C8H12N2O3S extracellular
M_acac_c M_Acetoacetate_C4H5O3 cytosol
M_acac_e M_Acetoacetate_C4H5O3 extracellular
M_acac_p M_Acetoacetate_C4H5O3 periplasm
Probably the same for other models.
Thanks @draeger
It seems that for 148 bigg ids there is a version that starts with an
underscore and also a version that does not start with an underscore:
bigg=# select count(c1.bigg_id) from component c1 where c1.bigg_id in
(select concat('_', c2.bigg_id) from component c2);
count
-------
148
(1 row)
Examples:
bigg=# select bigg_id from component where bigg_id like '%3fe4s%';
bigg_id
---------
3fe4s
_3fe4s
(2 rows)
Solution: remove a leading underscore from all IDs during the loading process (probably in ome/loading/model_loading/parse.py:id_for_new_id_style()).
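A minimal sketch of that normalisation step follows. It is a standalone helper written for illustration, not the body of id_for_new_id_style(), and it assumes that only a single leading underscore should be dropped.

```python
def strip_leading_underscore(bigg_id):
    """Remove one leading underscore, if present; leave other IDs unchanged."""
    return bigg_id[1:] if bigg_id.startswith("_") else bigg_id


for raw in ["_3fe4s", "3fe4s", "dad__2"]:
    print(raw, "->", strip_leading_underscore(raw))
```

After this step, `_3fe4s` and `3fe4s` collapse to the same ID, so the loader would also need to merge the two component rows rather than insert a duplicate.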
Pls push latest, to avoid this error:
Traceback (most recent call last):
File "load_db.py", line 124, in <module>
model_id,genome_id,model_creation_timestamp = line.rstrip('\n').split(',')
ValueError: too many values to unpack
Thought it might be cool for people using cobrapy to be able to just request a model object. That way it can always be validated and the latest/greatest.
In server.py
Fix BiGG ID spec to fix these cases:
‘M_lipa_cold_e’. Invalid compartment code: ‘cold’
‘M_lipa_cold_p’. Invalid compartment code: ‘cold’
‘M_lipa_cold_c’. Invalid compartment code: ‘cold’
‘M_sertrna_sec__c’. Invalid compartment code: ‘sec-’
‘M_lipidA_core_e_p’. Invalid compartment code: ‘core’
‘M_lipidA_core_e_p’. Invalid tissue code: ‘p’
‘R_Ec_biomass_iJO1366_WT_53p95M’. Invalid compartment code: ‘biomass’
‘R_Ec_biomass_iJO1366_WT_53p95M’. Invalid compartment code: ‘iJO1366’
‘R_Ec_biomass_iJO1366_WT_53p95M’. Invalid compartment code: ‘WT’
‘R_Ec_biomass_iJO1366_WT_53p95M’. Invalid compartment code: ‘53p95M’
‘R_Ec_biomass_iJO1366_core_53p95M’. Invalid compartment code: ‘biomass’
‘R_Ec_biomass_iJO1366_core_53p95M’. Invalid compartment code: ‘iJO1366’
‘R_Ec_biomass_iJO1366_core_53p95M’. Invalid compartment code: ‘core’
‘R_Ec_biomass_iJO1366_core_53p95M’. Invalid compartment code: ‘53p95M’
‘R_EX_lipa_cold_e’. Invalid compartment code: ‘cold’
‘R_ALATA_D2’. Invalid compartment code: ‘D2’
‘R_ALATA_L’. Invalid compartment code: ‘L’
‘R_ALATA_L2’. Invalid compartment code: ‘L2’
‘R_ASPt2_2pp’. Invalid compartment code: ‘2pp’
‘R_ASPt2_3pp’. Invalid compartment code: ‘3pp’
‘R_CLt3_2pp’. Invalid compartment code: ‘2pp’
‘R_CYTBO3_4pp’. Invalid compartment code: ‘4pp’
‘R_F6Pt6_2pp’. Invalid compartment code: ‘2pp’
‘R_FUMt2_2pp’. Invalid compartment code: ‘2pp’
‘R_FUMt2_3pp’. Invalid compartment code: ‘3pp’
‘R_G6Pt6_2pp’. Invalid compartment code: ‘2pp’
‘R_GAM6Pt6_2pp’. Invalid compartment code: ‘2pp’
‘R_LDH_D’. Invalid compartment code: ‘D’
‘R_LDH_D2’. Invalid compartment code: ‘D2’
‘R_MALDt2_2pp’. Invalid compartment code: ‘2pp’
‘R_MALt2_2pp’. Invalid compartment code: ‘2pp’
‘R_MALt2_3pp’. Invalid compartment code: ‘3pp’
‘R_MAN6Pt6_2pp’. Invalid compartment code: ‘2pp’
‘R_MG2t3_2pp’. Invalid compartment code: ‘2pp’
‘R_NAt3_1p5pp’. Invalid compartment code: ‘1p5pp’
‘R_NAt3_2pp’. Invalid compartment code: ‘2pp’
‘R_OROTt2_2pp’. Invalid compartment code: ‘2pp’
‘R_PFK_2’. Invalid compartment code: ‘2’
‘R_PFK_3’. Invalid compartment code: ‘3’
‘R_PSP_L’. Invalid compartment code: ‘L’
‘R_PSP_Lpp’. Invalid compartment code: ‘Lpp’
‘R_RBK_L1’. Invalid compartment code: ‘L1’
‘R_SERD_D’. Invalid compartment code: ‘D’
‘R_SERD_L’. Invalid compartment code: ‘L’
‘R_SUCCt2_2pp’. Invalid compartment code: ‘2pp’
‘R_SUCCt2_3pp’. Invalid compartment code: ‘3pp’
‘R_TARTt2_3pp’. Invalid compartment code: ‘3pp’
‘R_THRD_L’. Invalid compartment code: ‘L’
‘R_EX_acon_C_e’. Invalid compartment code: ‘C’
‘R_EX_btd_RR_e’. Invalid compartment code: ‘RR’
‘R_EX_lipidA_core_e’. Invalid compartment code: ‘core’
@jslu9 🎱
other:
For iJO1366 and iAF1260:
For a given reaction, GAPD:
For a given metabolite, D-Glucose:
As phil pointed out, this could be somewhat of an issue. I believe zak may have already partially or completely solved this.
We need to find a nice way to print the actual schema, so we can discuss it.
We should make the following schema changes, so that the database schema matches COBRApy:
In conjunction with zakandrewking/escher#62.
I found several external ids that don't follow the pattern of their particular external data source, and also several 'NA' values.
Some examples:
external_id | external_source |
---|---|
NA | PUBCHEMID |
NA | CHEBI |
NA | KEGGID |
3449 | KEGGID |
5209 | KEGGID |
124005??3611 | PUBCHEMID |
124005??3611 | CHEBI |
124005??3611 | KEGGID |
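Entries like these could be flagged automatically with a per-source pattern check. The sketch below is only an illustration: the regular expressions are simplified assumptions about what each source's identifiers look like (for example, a KEGG ID as one capital letter followed by five digits) and should be checked against the official identifier patterns before use.

```python
import re

# Simplified, assumed patterns; not authoritative.
ASSUMED_PATTERNS = {
    "PUBCHEMID": re.compile(r"^\d+$"),
    "CHEBI": re.compile(r"^(CHEBI:)?\d+$"),
    "KEGGID": re.compile(r"^[A-Z]\d{5}$"),
}


def is_suspicious(external_id, external_source):
    """Return True for 'NA' placeholders and ids that miss the assumed pattern."""
    if external_id.strip().upper() == "NA":
        return True
    pattern = ASSUMED_PATTERNS.get(external_source)
    if pattern is None:
        return False  # no rule known for this source, so nothing to flag
    return pattern.match(external_id) is None


rows = [
    ("NA", "PUBCHEMID"),
    ("3449", "KEGGID"),
    ("124005??3611", "CHEBI"),
    ("C00031", "KEGGID"),
]
for external_id, source in rows:
    print(external_id, source, is_suspicious(external_id, source))
```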
On this page: http://yersinia.ucsd.edu:8887/search?query=recon1
The metabolites link has a typo in it: http://yersinia.ucsd.edu:8887/modelss/RECON1/metabolites
Let's make sure these things are working all the time, for the production server:
What's the best way to do this? Universal chromosomes table?