Comments (10)
Compaction now seems to be working on production.
from neo.
@vanaukenk Has volunteered to test once this gets to a server.
apk3 currently expands to "http://arabidopsis.org/servlets/TairObject?type=locus&name=AT3G03900" rather than to a CURIE like the other identifiers. So the switch succeeded, but is this the same problem?
@kltm which bit of software should be doing the shortening? Is Minerva sending it out uncompacted? It looks to be correctly in one of the built in context files: https://github.com/geneontology/minerva/blob/14f8f0cdaae608d561ac103ef2847743ac491c4e/minerva-core/src/main/resources/go_context.jsonld#L145
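To make the question concrete, here is a minimal sketch of how CURIE compaction against a JSON-LD-style prefix map typically works: pick the longest namespace the IRI starts with and replace it with `prefix:`. The two-entry map is a hypothetical excerpt. The `AGI_LocusCode` namespace is inferred from the expanded IRI quoted above and is not copied from go_context.jsonld. An IRI with no matching namespace comes back unchanged, which is what "uncompacted" output looks like.

```python
# Hypothetical prefix map (assumption): NOT copied from Minerva's go_context.jsonld.
CONTEXT = {
    "GO": "http://purl.obolibrary.org/obo/GO_",
    "AGI_LocusCode": "http://arabidopsis.org/servlets/TairObject?type=locus&name=",
}


def compact(iri: str, context: dict[str, str]) -> str:
    """Return PREFIX:LOCAL using the longest matching namespace, else the IRI unchanged."""
    matches = [(p, ns) for p, ns in context.items() if iri.startswith(ns)]
    if not matches:
        return iri  # no namespace matched: the IRI stays uncompacted
    prefix, namespace = max(matches, key=lambda pair: len(pair[1]))
    return f"{prefix}:{iri[len(namespace):]}"


def expand(curie: str, context: dict[str, str]) -> str:
    """Inverse of compact: PREFIX:LOCAL -> namespace + LOCAL (unknown prefixes pass through)."""
    prefix, sep, local = curie.partition(":")
    if sep and prefix in context:
        return context[prefix] + local
    return curie
```

If the prefix map in use is missing the entry (or never consulted), `compact` returns the full IRI. That matches the symptom described for apk3.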
@balhoff Good questions!
This is autocomplete, so the issue is on the NEO side of the fence. In the "neo.obo" file these identifiers are still uncompressed (unCURIEed?), which means the problem is somewhere in the build. Note that owltools is involved there. The order is roughly:
- get upstream tair: https://www.arabidopsis.org/download_files/GO_and_PO_Annotations/Gene_Ontology_Annotations/gene_association.tair.gz (seems correct)
- run through gaf2obo.pl to produce neo-tair.obo (seems correct)
- use owltools to make neo.obo from the sources like neo-tair.obo (incorrectly CURIEed)
I'm not sure if there is something "off" about the neo-tair.obo already or if this is an owltools or other type of problem. I assume there's a way to do this w/o owltools too?
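One quick way to tell whether neo-tair.obo is already "off" is to scan each OBO file for `id:` tags whose value is a full IRI instead of a CURIE. This is a hypothetical QC helper, not part of the NEO pipeline. It only looks at `id:` lines and assumes IRIs start with `http://` or `https://`.

```python
import re

# Matches an OBO "id:" tag whose value is a full http(s) IRI rather than a CURIE.
IRI_ID = re.compile(r"^id:\s*(https?://\S+)$")


def uncompacted_ids(obo_text: str) -> list[str]:
    """Return every id: value in an OBO document that is a full IRI."""
    found = []
    for line in obo_text.splitlines():
        match = IRI_ID.match(line.strip())
        if match:
            found.append(match.group(1))
    return found
```

Running it on both neo-tair.obo and neo.obo would show whether the full IRIs are present before owltools runs or only appear after it.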
The identifiers are always incorrect in neo.obo. owltools converts the OBO to OWL, and then the Perl script does text substitution to fix the IRIs. I see what you mean that they aren't compacted in the OBO file, but I'm not sure that's a problem. Once it gets to the OWL file the IRIs look correct (I just downloaded the OWL file and checked). I think a downstream tool that reads the OWL file may be missing the CURIE definition. Is that possible?
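The text-substitution step can be sketched as follows, assuming the OBO-to-OWL conversion produces the default OBO PURL form (`PREFIX:LOCAL` becomes `http://purl.obolibrary.org/obo/PREFIX_LOCAL`) and the fix rewrites those IRIs into a target namespace. This is not the actual NEO Perl script: the `fix_iris` name and the target namespace are assumptions. It also assumes a serialization such as OWL functional syntax, where IRIs sit in `<...>` and a raw `&` is allowed. RDF/XML would need `&amp;`.

```python
import re

OBO_PURL = "http://purl.obolibrary.org/obo/"


def fix_iris(owl_text: str, prefix: str, namespace: str) -> str:
    """Rewrite OBO-PURL IRIs for `prefix` into `namespace` (hypothetical mapping)."""
    # The local part runs until whitespace, a quote, or an angle bracket.
    pattern = re.compile(re.escape(OBO_PURL + prefix + "_") + r"([^\s\"<>]+)")
    # A function replacement avoids backslash-escape surprises in the namespace string.
    return pattern.sub(lambda m: namespace + m.group(1), owl_text)
```

Because the match is anchored on the exact `PREFIX_` string, this step works even when the prefix contains an underscore. Any ambiguity would only appear in tools that try to recover the prefix from the IRI.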
@balhoff That would unfortunately point back to owltools, which is in charge of loading the ontology for Solr here. I'm not sure why TAIR/AGI_LocusCode is the /one/ case here, especially since it's not wrong, just uncompressed. Weird.
A couple of greps through the owltools code haven't turned up anything yet.
I would note that the same things do work in "normal" Solr loads with owltools, so it would have to be particular to the ontology handling if it was there.
@kltm I did some more testing; I think the core problem is that the prefix contains an underscore, although I thought that used to be fine in OBO format. I now gather that the Solr load really does use the OBO file, not the OWL file. So we can either change the prefix to one without an underscore, or try to fix it in the OWL API (a long process).
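Here is why an underscore in the prefix is risky. The default OBO mapping from `PREFIX:LOCAL` to `http://purl.obolibrary.org/obo/PREFIX_LOCAL` joins prefix and local ID with `_`, so going back from the IRI requires guessing where the prefix ends. The reverse function below is a hypothetical naive implementation that splits at the first underscore. It is not confirmed to be what the OWL API does, but it shows how `AGI_LocusCode` gets mangled while `GO` round-trips cleanly.

```python
OBO_PURL = "http://purl.obolibrary.org/obo/"


def obo_id_to_iri(curie: str) -> str:
    """Default OBO-style mapping: PREFIX:LOCAL -> OBO PURL + PREFIX_LOCAL."""
    prefix, local = curie.split(":", 1)
    return f"{OBO_PURL}{prefix}_{local}"


def naive_iri_to_obo_id(iri: str) -> str:
    """Hypothetical naive reverse mapping: split the PURL fragment at the first underscore."""
    fragment = iri[len(OBO_PURL):]
    prefix, local = fragment.split("_", 1)
    return f"{prefix}:{local}"
```

Splitting at the last underscore instead would not be safe either, because local IDs may themselves contain underscores. Dropping the underscore from the prefix avoids the guess entirely.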
@balhoff To clarify, unless I'm missing something (always possible), owltools is using OWL files to load Solr: https://github.com/geneontology/pipeline/blob/issue-35-neo-test/Jenkinsfile#L84-L85 .
I agree that the underscore seems likely. It is in the spec though, right? https://www.w3.org/TR/1999/REC-xml-names-19990114/#NT-NCName
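For reference, the NCName production linked above does allow `_`, both as the first character and later on. The check below is an ASCII-only approximation: the full rule also admits non-ASCII letters, combining characters and extenders. Being a valid NCName only means the prefix is legal in XML namespaces. It says nothing about whether the prefix survives an OBO PURL round trip.

```python
import re

# ASCII-only approximation of the NCName production: a letter or "_" first,
# then letters, digits, ".", "-" or "_".
NCNAME_ASCII = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def is_ascii_ncname(name: str) -> bool:
    """True if `name` matches the ASCII subset of the NCName production."""
    return NCNAME_ASCII.fullmatch(name) is not None
```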
Thanks for clarifying; I have a potential fix in #113.
Related Issues (20)
- Add some minimal QC to the NEO pipeline
- Some Reactome Identifiers are not resolving
- NEO GOlr / Solr load does not use "modern" loader, so does not build all closures for "modern" AmiGO
- Xenbase entities do not seem to propagate to the NEO build from the Xenbase GPI
- Would like phage proteins to be available
- Request to load more phage proteins
- Many ChEBI chemicals not available for metabolic models
- Isoform display in label (originally on the visual tool)
- Clarify and implement rules on inclusion of RNA gene products / RNA central
- RNAcentral IDs incorrectly use OBO PURL namespace
- Some IDs we expect are not found in Noctua
- Transient Xenbase issue dropped GPI file
- Xenbase identifiers are not correct getting labels (in NEO?)
- Some e.coli entries not found
- In some cases bad identifiers are getting into the load
- Ecoli data is gone from NEO as the upstream source changed
- add additional species names to gene labels
- Add FB Developmental and Anatomy Ontology
- Unknown reduction in entities in NEO build