neo's Issues

Some genes not available in autocomplete, symbol and id issues

Hi - I'm unable to find a C. elegans gene, cyk-7, that I'm trying to annotate in Noctua.

It is not available in the autocomplete in the form or graph editor.

Here is its entry in our gpi file:

WB WBGene00015591 cyk-7 CYtoKinesis defect CELE_C08C3.4 gene taxon:6239 UniProtKB:P34325

That can be found here:
ftp://ftp.wormbase.org/pub/wormbase/species/c_elegans/PRJNA13758/annotation/gene_product_info/c_elegans.PRJNA13758.current.gene_product_info.gpi.gz

Is the C. elegans gpi file being loaded into NEO?

We have discussed a similar issue with another gene in a separate ticket, but I don't think this ever got resolved:

#580
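
A quick way to confirm that the gene really is in the source file, before looking at the NEO side, is to scan the GPI directly. The sketch below assumes a tab-separated GPI 1.x layout (DB, object ID, symbol, name, synonyms, type, taxon, parent, xrefs); the function names and the tab positions in the sample row are illustrative assumptions and not part of the NEO tooling.

```python
import gzip
from typing import Dict, Iterable, Iterator, List

# Assumed GPI 1.x column order; check the header of the actual file.
GPI_COLUMNS = ["db", "id", "symbol", "name", "synonyms",
               "type", "taxon", "parent", "xrefs"]


def parse_gpi(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per data row, skipping blank and '!' header lines."""
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Pad short rows so every column name has a value.
        fields += [""] * (len(GPI_COLUMNS) - len(fields))
        yield dict(zip(GPI_COLUMNS, fields))


def find_symbol(lines: Iterable[str], symbol: str) -> List[Dict[str, str]]:
    """Return all rows whose symbol column matches exactly."""
    return [row for row in parse_gpi(lines) if row["symbol"] == symbol]


def open_gpi(path: str):
    """Open a plain or gzip-compressed GPI file as text."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "rt", encoding="utf-8")


if __name__ == "__main__":
    # Sample row rebuilt from the entry quoted above (tab positions assumed).
    sample = [
        "WB\tWBGene00015591\tcyk-7\tCYtoKinesis defect\tCELE_C08C3.4"
        "\tgene\ttaxon:6239\t\tUniProtKB:P34325\n",
    ]
    for row in find_symbol(sample, "cyk-7"):
        print(row["db"] + ":" + row["id"], row["xrefs"])
```

If the symbol is found in the downloaded file but not in the autocomplete, the problem is more likely in the load or build step than in the GPI itself.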

Need to check RGD identifiers in neo

In the Makefile, it looks like rgd is included in the list of sources for annotatable entity identifiers.

However, not all identifiers in the RGD GAF seem to be available for annotation in Noctua.

For example, RGD:1309181 has eight annotations in the current RGD GAF, but isn't in the autocomplete menu as an option. The eight annotations are seven IEAs and one ISO.

Do all entries for a group's GAF get included in NEO or is there some filtering step somewhere that excludes some?

Note that RGD does not submit gpad/gpi yet according to the rgd.yaml file.

@krchristie

NEO no longer building

Currently, the NEO ontology build fails with errors like:

11:34:02  [Fatal Error] :1:1: Content is not allowed in prolog.
11:34:02  2021-05-24 11:34:02,183 ERROR (CommandRunner:4815) could not parse:target/neo-wb.obo
 11:34:02  org.semanticweb.owlapi.io.UnparsableOntologyException: Problem parsing file:/var/lib/jenkins/workspace/ology_pipeline_issue-35-neo-test/neo/target/neo-wb.obo

For examination, I've temporarily grabbed that neo-wb.obo file and made it available here: http://skyhook.berkeleybop.org/neo-wb.obo

It seems like there may be a related WormBase issue (an expansion to the WB GPI that happened in the right timeframe), but I've been unable to find it again; in my notes I have "WormBase/website/issues/8222", but this doesn't seem to correspond to anything. @vanaukenk, would you maybe know the correct public reference for this?

Tagging @balhoff @vanaukenk

Remove NEO build dependency on frozen datasets.json

The NEO build depends on the remote asset datasets.json, pushed from the now defunct build.berkeleybop.org.

Problematic:

datasets.json: trigger
	wget http://s3.amazonaws.com/go-public/metadata/datasets.json -O $@ && touch $@

ZFIN genes created in September 2019 and later are not available in Noctua (but available in our GPI)

I want to create an annotation to linc.terminator (ZFIN:ZDB-LINCRNAG-190911-1), but this gene is not available in Noctua. I am using the new Noctua form, but the issue is the same in the graph editor.

  1. I am unable to create an annotation using the ID alone - I don't know if this is an expected behavior.
  2. It looks like the new genes created in September 2019 and later are not available in Noctua.
    (examples: ZFIN:ZDB-LINCRNAG-190911-1, ZFIN:ZDB-GENE-190924-2, ZFIN:ZDB-GENE-200114-3). These genes are available in our GPI files.
    This makes me think that there has been a problem with our GPI files since September 2019:
  • either GOC has not been retrieving our latest GPI since September and/or there has been a problem to add the information in Noctua
  • or there is a problem with our GPI files and they create an error in GO/Noctua (we have not been notified about such issue, but it is a possibility it is happening).
    Could you please look into this? Thank you

Do not remove slashes from labels

When transforming gpi to neo, we normalize the labels. This is because in the past unusual non-ASCII characters have slipped in and broken things downstream (these need to be reported upstream).

This is currently too strict, e.g. we strip /, resulting in:

id: PR:000037785
name: mEPRSPhos1 Mmus

There should be a slash in the label

For now @ukemi, just type the string without the slash (sorry)
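
A less strict normalization could keep the slash while still dropping the non-ASCII characters that caused trouble. The following is a minimal sketch in Python, not the logic of the actual gpi-to-neo conversion script; the function name `normalize_label` and the exact set of allowed punctuation are assumptions.

```python
import re
import unicodedata

# Characters kept in labels: ASCII letters, digits, space, and a small set of
# punctuation that includes "/". The exact whitelist is an assumption.
_DISALLOWED = re.compile(r"[^A-Za-z0-9 /_.,:;()'+\-]")


def normalize_label(label: str) -> str:
    """Fold to ASCII where possible, then drop anything outside the whitelist."""
    # NFKD splits accented characters into base letter + combining mark,
    # so the base letter survives the ASCII encoding step.
    folded = unicodedata.normalize("NFKD", label)
    ascii_only = folded.encode("ascii", "ignore").decode("ascii")
    cleaned = _DISALLOWED.sub("", ascii_only)
    # Collapse whitespace runs left behind by removed characters.
    return re.sub(r"\s+", " ", cleaned).strip()
```

With this whitelist, a hypothetical label such as `A/B Mmus` passes through unchanged, while a character with no ASCII equivalent is removed.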

SGD has incomplete GPI

SGD currently provides only UniProt identifiers in the GPI listed in the metadata, taken directly from protein2go upstream as a "stub". At the time, SGD was not using Noctua beyond basic experimentation, and it was decided that the stub was better than nothing. Now that SGD is making more use of Noctua, the identifier issue has come up and needs to be fixed as we proceed with more serious annotation.

As an example, in the Noctua Form, "STE3 Scer" pops up with the UniProt ID instead of the SGD ID, so the "search database" doesn't work.

The potential fix in this case: GO can derive a GPI from some other file.

Ensure there is a link from PR:000000001 to CHEBI:23367 ! molecular entity

Currently there is no way to autocomplete off of non-gene types in the enabled-by field in the annoton box in Noctua. This is because the field is (rightly) pinned to subclasses of CHEBI:23367 ! molecular entity

Currently neo does not import the connecting axioms from PRO:

 / BFO:0000040 ! material entity
  is_a BFO:0000030 ! object
   is_a CHEBI:23367 ! molecular entity
    is_a CHEBI:50047 ! organic amino compound
     is_a PR:000018263 ! amino acid chain
      is_a PR:000000001 ! protein *** 

The bridging axioms should be added to neo

until then, @ukemi should use the "Add individual" box (sorry)

NEO script fails if case is different for "name" (column 3 in gaf)

Didn't mean to open this without comment. Oops.

Anyway, I'm trying to use neo to create a combined OWL file to load into Noctua from 2 GAFs. Their first 3 columns contain the following:
GR_gene GR:0101186 Zep1
GR_gene GR:0101186 ZEP1

The only difference is the case. If I manually change one of them, make successfully completes.

The error I get on a failure:

Exception in thread "main" org.semanticweb.owlapi.model.OWLOntologyStorageException: org.obolibrary.oboformat.model.FrameStructureException: multiple name tags not allowed. in frame:Frame(http://purl.obolibrary.org/obo/GR_gene_GR%3A0101186 id( http://purl.obolibrary.org/obo/GR_gene_GR%3A0101186{}[])relationship( in_taxon NCBITaxon:4530{}[])name( ZEP1 Oryz{}[])synonym( ZEP1 BROAD{}[NCBITaxon:4530 ])synonym( Os04g0448900 Oryz EXACT{}[])synonym( micro RNA 806a Oryz EXACT{}[])synonym( Zeaxanthin epoxidase 1 Oryz EXACT{}[])synonym( Zep1 BROAD{}[NCBITaxon:4530 ])name( Zep1 Oryz{}[])is_a( CHEBI:23367{}[]))
        at org.coode.owlapi.oboformat.OBOFormatRenderer.render(OBOFormatRenderer.java:79)
        at org.coode.owlapi.oboformat.OBOFormatStorer.storeOntology(OBOFormatStorer.java:74)
        at org.semanticweb.owlapi.util.AbstractOWLOntologyStorer.storeOntology(AbstractOWLOntologyStorer.java:211)
        at uk.ac.manchester.cs.owl.owlapi.OWLOntologyManagerImpl.saveOntology(OWLOntologyManagerImpl.java:1040)
        at uk.ac.manchester.cs.owl.owlapi.OWLOntologyManagerImpl.saveOntology(OWLOntologyManagerImpl.java:1021)
        at owltools.io.ParserWrapper.saveOWL(ParserWrapper.java:265)
        at owltools.io.ParserWrapper.saveOWL(ParserWrapper.java:213)
        at owltools.cli.CommandRunner.runSingleIteration(CommandRunner.java:2922)
        at owltools.cli.CommandRunnerBase.run(CommandRunnerBase.java:76)
        at owltools.cli.CommandRunner.run(CommandRunner.java:237)
        at owltools.cli.CommandRunnerBase.run(CommandRunnerBase.java:68)
        at owltools.cli.CommandRunner.run(CommandRunner.java:237)
        at owltools.cli.CommandLineInterface.main(CommandLineInterface.java:12)
Caused by: org.obolibrary.oboformat.model.FrameStructureException: multiple name tags not allowed. in frame:Frame(http://purl.obolibrary.org/obo/GR_gene_GR%3A0101186 id( http://purl.obolibrary.org/obo/GR_gene_GR%3A0101186{}[])relationship( in_taxon NCBITaxon:4530{}[])name( ZEP1 Oryz{}[])synonym( ZEP1 BROAD{}[NCBITaxon:4530 ])synonym( Os04g0448900 Oryz EXACT{}[])synonym( micro RNA 806a Oryz EXACT{}[])synonym( Zeaxanthin epoxidase 1 Oryz EXACT{}[])synonym( Zep1 BROAD{}[NCBITaxon:4530 ])name( Zep1 Oryz{}[])is_a( CHEBI:23367{}[]))
        at org.obolibrary.oboformat.model.Frame.checkMaxOneCardinality(Frame.java:383)
        at org.obolibrary.oboformat.model.Frame.check(Frame.java:357)
        at org.obolibrary.oboformat.model.OBODoc.check(OBODoc.java:344)
        at org.obolibrary.oboformat.writer.OBOFormatWriter.write(OBOFormatWriter.java:205)
        at org.coode.owlapi.oboformat.OBOFormatRenderer.render(OBOFormatRenderer.java:76)
        ... 12 more
Makefile:28: recipe for target 'neo.obo' failed

Let me know if I need to supply any more info.
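
As a possible pre-check before running make, the input GAFs could be scanned for objects whose symbol (column 3) differs between rows, including differences in case only. This is a sketch under the assumption that the GAFs are tab-separated with `!` comment lines; `conflicting_symbols` is a hypothetical helper, not part of neo.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def conflicting_symbols(gaf_lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Map 'DB:ID' to its symbols, for objects that have more than one."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for line in gaf_lines:
        if not line.strip() or line.startswith("!"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 3:
            continue  # not a data row
        db, obj_id, symbol = cols[0], cols[1], cols[2]
        seen[f"{db}:{obj_id}"].add(symbol)
    # Keep only the objects that would yield more than one name tag.
    return {key: syms for key, syms in seen.items() if len(syms) > 1}


if __name__ == "__main__":
    # The two rows from this report, combined from both GAFs.
    rows = [
        "GR_gene\tGR:0101186\tZep1\n",
        "GR_gene\tGR:0101186\tZEP1\n",
    ]
    for key, symbols in conflicting_symbols(rows).items():
        print(key, sorted(symbols))
```

Passing the lines of both GAFs in one call (for example via `itertools.chain`) catches conflicts that only appear when the files are combined.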

Metadata on Gene Product

Currently, NEO integrates metadata from Gene Product either through GPI files (when provided) or through GAF.

During the data ingestion stage, NEO should ensure that each Gene Product has a uniprot xref link for rapid access to additional meta data. This is especially useful for displaying tooltips on mouseover that can instantly fetch data from the uniprot REST API or SPARQL endpoint.
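
As a rough illustration of the ingestion-time check, the xref column of a GPI row could be inspected for a UniProtKB entry. The sketch assumes xrefs are pipe-separated `PREFIX:ID` values, as in the GPI rows quoted elsewhere in these issues; `uniprot_accessions` is a hypothetical helper name.

```python
from typing import List


def uniprot_accessions(xref_field: str) -> List[str]:
    """Extract UniProtKB accessions from a pipe-separated xref field."""
    accessions = []
    for xref in xref_field.split("|"):
        prefix, _, local_id = xref.strip().partition(":")
        if prefix == "UniProtKB" and local_id:
            accessions.append(local_id)
    return accessions
```

An empty result for a gene product would mark it as lacking the xref, which the ingestion step could then log or try to fill from another source.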

Unique Recommended Name for a GP

@cmungall I mentioned during the hackathon that some GPs have several recommended names (rdfs:label), which should not be the case (at least given the same language), since we have synonyms (oboInOwl:hasExact/BroadSynonym) for that.

Example from RGD (NEO metadata generated during GAF conversion):
SELECT * WHERE { <http://identifiers.org/rgd/1304707> rdfs:label ?label }
-> has Lrfn1 Rnor and Lrfn1

Example from MGI (NEO metadata generated using GPI):
SELECT * WHERE { <http://identifiers.org/mgi/MGI:3588192> rdfs:label ?label }
-> has 3 rdfs:label (Rtl4 Mmus, Rtl4, zcchc16 Mmus)

In the case of this MGI, the GPI file indicates Rtl4 for the name, and other things are synonyms:
MGI MGI:3588192 Rtl4 retrotransposon Gag like 4 C230031A03Rik|Mar4|Zcchc16 gene taxon:10090 UniProtKB:Q3URY0

Fixing that will ensure that we retrieve a single (and correct) recommended name for each GP; at the moment this is not guaranteed.
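
One possible rule, sketched below, is to keep exactly one label built from the GPI symbol plus the species code and to demote every other string to a synonym. This assumes the "symbol + species code" pattern seen in the labels above (e.g. "Rtl4 Mmus") is the intended recommended name; the function name is hypothetical.

```python
from typing import Iterable, List, Tuple


def split_label_and_synonyms(
    symbol: str, species_code: str, labels: Iterable[str]
) -> Tuple[str, List[str]]:
    """Return the single preferred label and the remaining strings as synonyms."""
    preferred = f"{symbol} {species_code}"
    # Everything that is not the preferred label becomes a synonym candidate.
    synonyms = sorted({lab for lab in labels if lab != preferred})
    return preferred, synonyms
```

For the MGI example, calling it with `"Rtl4"`, `"Mmus"` and the three current labels leaves "Rtl4 Mmus" as the only label and moves "Rtl4" and "zcchc16 Mmus" to the synonym list.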

Identify test cases for NEO (and autocomplete) builds

Generally, we'd like to have some more objective measure of whether a build of NEO (and the autocomplete/total entity space) has what we think it has. This applies to both the ontology builds and the load into Solr.

As a starting point, we'd like to:

  • get a set of identifiers for things we expect to see in the build

Ideally, a script in the pipeline would check these identifiers against a build product (OWL? Solr index?) so that a failure can prevent publication.

While not strictly NEO, we can start there and get a lot of work done. We can start with identifiers listed in #51 #52 #53.

Tagging @vanaukenk @goodb for feedback.
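
A first version of such a check could run against the OBO output, as in the sketch below. It assumes the build product is an OBO file with `id:` lines (as in the `id: PR:000037785` stanza quoted in another issue); `missing_ids` is a hypothetical helper, and a check against Solr would need a different implementation.

```python
import re
from typing import Iterable, List, Set

_ID_LINE = re.compile(r"^id:\s*(\S+)")


def ids_in_obo(lines: Iterable[str]) -> Set[str]:
    """Collect every identifier declared with an 'id:' tag."""
    found = set()
    for line in lines:
        match = _ID_LINE.match(line)
        if match:
            found.add(match.group(1))
    return found


def missing_ids(expected: Iterable[str], lines: Iterable[str]) -> List[str]:
    """Return the expected identifiers that the build does not contain."""
    present = ids_in_obo(lines)
    return [ident for ident in expected if ident not in present]
```

In a pipeline, a small wrapper could read the expected identifiers from a file, call `missing_ids` on the opened build product, and exit non-zero when the returned list is not empty so that publication is blocked.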

Tweak obo-uris hack to handle hash injected by OBO parser

From @pgaudet

Patrick pointed out that while the chains are available, their labels don’t show up – only the identifier. Can this be fixed?

From @balhoff :

This will be successful if the NEO ontology contains a term with label "nsp4 Scov2" and its IRI looks like http://identifiers.org/uniprot/P0DTD1-PRO_0000449622 instead of http://purl.obolibrary.org/obo/UniProtKB#_P0DTD1-PRO_0000449622.

Also, when this is loaded in Minerva, this model should no longer have gene products without labels.
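
The rewrite described above can be sketched as a simple pattern substitution. This is only an illustration of the intended mapping for the IRI shown in this issue, not the actual obo-uris code; the function name is hypothetical.

```python
import re

# Hash-style IRI produced by the OBO parser for UniProtKB identifiers, e.g.
# http://purl.obolibrary.org/obo/UniProtKB#_P0DTD1-PRO_0000449622
_HASH_IRI = re.compile(
    r"^http://purl\.obolibrary\.org/obo/UniProtKB#_(?P<local>.+)$"
)


def fix_uniprot_iri(iri: str) -> str:
    """Rewrite the hash-style IRI to identifiers.org; leave others untouched."""
    match = _HASH_IRI.match(iri)
    if match is None:
        return iri
    return "http://identifiers.org/uniprot/" + match.group("local")
```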

Create an is-a closure version of NEO for Noctua autocomplete

For the ontology autocomplete fields in Noctua it would be good to just use is-a closure.
This would prevent potential confusion about what terms show up in the autocomplete menu and also possible annotation errors.

An example is what currently happens when typing in 'mechano' in the BP field of the form:

image

MF terms are also returned in this search due to the 'part of' relation between some MF and BP terms in the ontology.

See also:
geneontology/noctua-form#34
geneontology/noctua-form#19
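
To make the difference concrete, the sketch below computes ancestors over is_a edges only and uses them to filter autocomplete candidates by root. The edge list format and the `EX:` identifiers are made up for illustration; only the two root IDs (GO:0008150, GO:0003674) are real.

```python
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str, str]  # (child, relation, parent)


def isa_ancestors(term: str, edges: Iterable[Edge]) -> Set[str]:
    """Ancestors of `term` reachable through is_a edges only."""
    parents: Dict[str, List[str]] = {}
    for child, relation, parent in edges:
        if relation == "is_a":  # part_of and other relations are ignored
            parents.setdefault(child, []).append(parent)
    seen: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def filter_by_root(candidates: Iterable[str], root: str,
                   edges: Iterable[Edge]) -> List[str]:
    """Keep candidates that are the root or an is_a descendant of it."""
    edge_list = list(edges)
    return [c for c in candidates
            if c == root or root in isa_ancestors(c, edge_list)]


if __name__ == "__main__":
    edges = [
        ("EX:bp_term", "is_a", "GO:0008150"),     # hypothetical BP term
        ("EX:mf_term", "is_a", "GO:0003674"),     # hypothetical MF term
        ("EX:mf_term", "part_of", "EX:bp_term"),  # cross-aspect link
    ]
    print(filter_by_root(["EX:bp_term", "EX:mf_term"], "GO:0008150", edges))
```

Because the part_of edge is skipped, the hypothetical MF term does not appear under the BP root, which is the behavior requested for the BP field.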

Why are MGI assoc files a mix of gene and protein?

E.g. in the MGI GAF

MGI     MGI:1917015     1500004F05Rik           GO:0008150      MGI:MGI:2156816|GO_REF:0000015  ND              P       RIKEN cDNA 1500004F05 gene              gene    taxon:10090     20120430        MGI             
MGI     MGI:1923755     1500009C09Rik           GO:0003674      MGI:MGI:2156816|GO_REF:0000015  ND              F       RIKEN cDNA 1500009C09 gene              protein taxon:10090     20100209        MGI             VEGA:OTTMUSP00000045521

What makes the 2nd one a protein and the 1st a gene?

The page on JAX is kind of odd
http://www.informatics.jax.org/marker/MGI:1923755

"Feature Type protein coding gene"

Yet it's an ortholog of a lincRNA

I guess that's what conflict means

It looks like a conflict results in the type field in the GAF being 'protein' rather than 'gene'. But this is odd, as the conflict apparently arises from the fact that this is an ncRNA...?

Either way: don't trust the type field in the MGI GAF

NEO no longer builds in the pipeline

Sometime between Nov 21st and Nov 27th, a change occurred in NEO (or something it brings in) that breaks the build with:

12:15:44  Exception in thread "main" org.semanticweb.owlapi.model.OWLOntologyStorageException: org.obolibrary.oboformat.model.FrameStructureException: multiple name tags not allowed. in frame:Frame(UniProtKB:Q06787-11 id( UniProtKB:Q06787-11)name( FMR1 Hsap)property_value( https://w3id.org/biolink/vocab/category https://w3id.org/biolink/vocab/GeneProductIsoform)synonym( FMR1 RELATED)synonym( FMR1 BROAD)property_value( https://w3id.org/biolink/vocab/category https://w3id.org/biolink/vocab/Protein)property_value( https://w3id.org/biolink/vocab/category https://w3id.org/biolink/vocab/MacromolecularMachine)name( Fmr1 isoform 11 Rnor)synonym( Q06787-11 RELATED)relationship( in_taxon NCBITaxon:9606)relationship( in_taxon NCBITaxon:10116)relationship( has_gene_template UniProtKB:Q06787)is_a( RGD:2623)is_a( CHEBI:36080))
12:15:44  	at org.semanticweb.owlapi.oboformat.OBOFormatRenderer.render(OBOFormatRenderer.java:90)
12:15:44  	at org.semanticweb.owlapi.oboformat.OBOFormatStorer.storeOntology(OBOFormatStorer.java:42)
12:15:44  	at org.semanticweb.owlapi.util.AbstractOWLStorer.storeOntology(AbstractOWLStorer.java:155)
12:15:44  	at org.semanticweb.owlapi.util.AbstractOWLStorer.storeOntology(AbstractOWLStorer.java:119)
12:15:44  	at uk.ac.manchester.cs.owl.owlapi.OWLOntologyManagerImpl.saveOntology(OWLOntologyManagerImpl.java:1525)
12:15:44  	at uk.ac.manchester.cs.owl.owlapi.OWLOntologyManagerImpl.saveOntology(OWLOntologyManagerImpl.java:1502)
12:15:44  	at owltools.io.ParserWrapper.saveOWL(ParserWrapper.java:289)
12:15:44  	at owltools.io.ParserWrapper.saveOWL(ParserWrapper.java:209)
12:15:44  	at owltools.cli.CommandRunner.runSingleIteration(CommandRunner.java:3712)
12:15:44  	at owltools.cli.CommandRunnerBase.run(CommandRunnerBase.java:76)
12:15:44  	at owltools.cli.CommandRunnerBase.run(CommandRunnerBase.java:68)
12:15:44  	at owltools.cli.CommandLineInterface.main(CommandLineInterface.java:12)
12:15:44  Caused by: org.obolibrary.oboformat.model.FrameStructureException: multiple name tags not allowed. in frame:Frame(UniProtKB:Q06787-11 id( UniProtKB:Q06787-11)name( FMR1 Hsap)property_value( https://w3id.org/biolink/vocab/category https://w3id.org/biolink/vocab/GeneProductIsoform)synonym( FMR1 RELATED)synonym( FMR1 BROAD)property_value( https://w3id.org/biolink/vocab/category https://w3id.org/biolink/vocab/Protein)property_value( https://w3id.org/biolink/vocab/category https://w3id.org/biolink/vocab/MacromolecularMachine)name( Fmr1 isoform 11 Rnor)synonym( Q06787-11 RELATED)relationship( in_taxon NCBITaxon:9606)relationship( in_taxon NCBITaxon:10116)relationship( has_gene_template UniProtKB:Q06787)is_a( RGD:2623)is_a( CHEBI:36080))
12:15:44  	at org.obolibrary.oboformat.model.Frame.checkMaxOneCardinality(Frame.java:424)
12:15:44  	at org.obolibrary.oboformat.model.Frame.check(Frame.java:405)
12:15:44  	at org.obolibrary.oboformat.model.OBODoc.check(OBODoc.java:390)
12:15:44  	at org.obolibrary.oboformat.writer.OBOFormatWriter.write(OBOFormatWriter.java:183)
12:15:44  	at org.semanticweb.owlapi.oboformat.OBOFormatRenderer.render(OBOFormatRenderer.java:88)
12:15:44  	... 11 more
12:15:44  Makefile:27: recipe for target 'neo.obo' failed
12:15:44  make: *** [neo.obo] Error 1

https://build.geneontology.org/job/geneontology/job/pipeline/job/issue-35-neo-test/97/console

Tagging @balhoff
Notice to @vanaukenk
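
To find such cases before the final merge, the per-source OBO files could be scanned for identifiers that receive more than one distinct name. The sketch below is an assumption about how such a pre-flight check might look, not part of the NEO Makefile; it expects each stanza's `id:` tag to come before its `name:` tag.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def ids_with_multiple_names(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each id to its names, for ids with more than one distinct name."""
    names_by_id: Dict[str, Set[str]] = defaultdict(set)
    current_id = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("["):  # a new stanza such as [Term]
            current_id = None
        elif line.startswith("id:"):
            current_id = line[3:].strip()
        elif line.startswith("name:") and current_id is not None:
            names_by_id[current_id].add(line[5:].strip())
    return {ident: sorted(names)
            for ident, names in names_by_id.items() if len(names) > 1}
```

Feeding it the lines of all intermediate files in one pass (for example with `itertools.chain`) would flag an entry like UniProtKB:Q06787-11, which picks up one name from a human source and another from a rat source.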

Check type of mouse ncRNA genes - not recognized in Noctua as valid annotation objects

I'm not sure which is the best tracker for this issue, but am starting with NEO.

If a mouse ncRNA gene is used as an enabling entity or an input to a BP or MF, the nodes are being flagged by the ShEx validator because these ncRNA gene identifiers are not recognized as valid annotation objects.

In the MGI gpi file, these genes are typed as ncRNA genes using SO:0001263 according to the GPI2.0 spec.

@hdrabkin
@ukemi
@kltm
@balhoff

Note: if I check one of these gene ids, e.g. MGI:2676885, in noctua-amigo, the graph view seems to show the correct parentage.

Is NEO typing PRO identifiers correctly?

@hdrabkin showed me a model this morning where the ShEx is not validating PRO identifiers as chemical entities, i.e. 'has input' PR:nnnnnnnnnnn for an MF gives a ShEx validation error.

How are PRO identifiers currently typed in neo?

develop neo-lite for use in go-lego

As discussed in Berkeley October 2019, define a new neo build that contains the upper-level classes required to support inferences in Minerva.

@cmungall fill in details from board...

MGI miRNA identifiers unavailable in NEO?

Messaging with @hdrabkin

It appears that MGI miRNA identifiers are not available in Noctua.

I've checked on noctua-amigo and can't find them there either.

Here's an example:

MGI MGI:3711324 Mir291a microRNA 291a mmu-mir-291a|Mirn291a gene taxon:10090

Include life stage ontologies in neo

To be compliant with the ShEx specifications for 'happens during' and to allow for curation using various life stage ontologies, we want to make sure that we are importing external life stage ontologies, e.g. WBls, PO, etc., where needed.

For reference, see: geneontology/go-shapes#137

Add RNA central

RNAC provides various downloads. The GFF seems most complete. However, this doesn't seem to include MOD mappings. Where do these come from?

NEO no longer seems to build

NEO now fails on:

12:05:04  gzip -dc mirror/c_elegans.PRJNA13758.current.gene_product_info.gpi.gz | ./gpi2obo.pl -s Cele -n wb > target/neo-wb.obo.tmp && mv target/neo-wb.obo.tmp target/neo-wb.obo
12:05:04  make: *** No rule to make target 'target/neo-gramene_oryza.obo', needed by 'all_obo'.  Stop.
[Pipeline] }
12:05:04  ERROR: script returned exit code 2
