Comments (5)
Thank you for reporting!
The problem is that #Crew?oldid=2476206#Command_crew in <https://memory-alpha.fandom.com/wiki/USS_Voyager#Crew?oldid=2476206#Command_crew> is, strictly speaking, an invalid IRI part: a # followed by a second, unescaped # (and therefore the document is, in a precise sense, invalid RDF).
Some libraries, such as rdflib, just ignore it, but Rio (the Rust RDF library behind lightrdf) is strict and raises an exception.
As the resume-after-exception feature is WIP in Rio, I think a possible workaround for now is to fix invalid IRIs before parsing, like:
sed -r 's/([^#]*)#/\1%23/2g' latest-all.ttl
(Use -i to replace in place, and gsed on Mac.)
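One caveat with the sed expression above: it counts # per line, so on a line that holds several IRIs it would also rewrite the # of the second and later IRIs (for example the one in a prov#wasDerivedFrom predicate). A minimal Python sketch of the same idea, applied per IRI instead of per line, could look like the following. The function name `escape_extra_hashes` is hypothetical, and the sketch assumes that every `<...>` span on a line is an IRI, which may not hold if a string literal happens to contain angle brackets.

```python
import re

# Matches one IRI reference in angle brackets, e.g. <http://example.org/a#b>.
# Assumption: any <...> span without whitespace is an IRI, not literal text.
IRIREF = re.compile(r"<([^<>\s]*)>")


def escape_extra_hashes(line):
    """Keep the first '#' of each <...> IRI and percent-encode the rest."""

    def fix(match):
        iri = match.group(1)
        # partition() splits at the first '#'; anything after it is the fragment.
        head, sep, fragment = iri.partition("#")
        return "<" + head + sep + fragment.replace("#", "%23") + ">"

    return IRIREF.sub(fix, line)


if __name__ == "__main__":
    sample = (
        "<https://memory-alpha.fandom.com/wiki/USS_Voyager"
        "#Crew?oldid=2476206#Command_crew> "
        "<http://www.w3.org/ns/prov#wasDerivedFrom> <http://example.org/x> ."
    )
    print(escape_extra_hashes(sample))
```

Applied line by line to a copy of the dump, this leaves IRIs with zero or one # untouched.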
from lightrdf.
Thank you for the quick response, this is very helpful. I'm usually hesitant to manually patch source files, but I agree this might be the best fix for the moment (thank you for the sed command as well).
I'm still looking at the DBpedia .ttl files; lightrdf throws an error with that dataset as well, which I can't make sense of. At first I thought the problem was that their .ttl files weren't actually in Turtle format, but as I review the spec, maybe they are Turtle (just a bare, lazy dump with no prefixes?). I'm still looking, and trying to convert back and forth to other formats (e.g. with rapper).
from lightrdf.
I understand that huge RDF datasets tend to be more or less malformed.
In my opinion, if an N-Triples file is available, it is easier than Turtle for finding and "patching" problems and for tracking the changes.
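To illustrate why a one-statement-per-line format is easier to triage, here is a rough sketch that flags lines which do not look like a single triple. The pattern is deliberately simplified and is not a full N-Triples grammar, and `find_suspect_lines` is a hypothetical helper name, so treat its output as a list of candidates to inspect rather than as a validator.

```python
import re

# Simplified shape of one RDF term: <iri>, _:blank, or "literal" with an
# optional language tag or datatype. Not a complete N-Triples grammar.
TERM = (
    r'(?:<[^<>"{}|^`\\\s]*>'
    r'|_:\S+'
    r'|"(?:[^"\\]|\\.)*"(?:@[A-Za-z0-9-]+|\^\^<[^<>\s]*>)?)'
)
NT_LINE = re.compile(rf"^{TERM}\s+{TERM}\s+{TERM}\s*\.$")


def find_suspect_lines(lines):
    """Yield (line_number, text) for lines that do not look like one triple."""
    for number, line in enumerate(lines, start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and comments are allowed
        if not NT_LINE.match(stripped):
            yield number, stripped


if __name__ == "__main__":
    sample = [
        "<http://a/s> <http://a/p> <http://a/o> .",
        "<http://a/s> <http://a/p> <http://a/o",  # truncated statement
    ]
    for number, text in find_suspect_lines(sample):
        print(number, text)
```

Because each reported line number maps to exactly one statement, a fix is a one-line diff that is easy to review and track.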
from lightrdf.
Here is the (first) DBpedia issue, for reference (a .ttl file, but is it Turtle?):
../dbpedia/ttl/revisions_lang=en_uris.ttl
lightrdf.Error: error while parsing IRI 'http://dbpedia.org/resource/󠄀': Invalid IRI code point '󠄀' on line 19841225 at position 35
$ sed -n '19841223,19841227p;19841228q' revisions_lang=en_uris.ttl
<http://dbpedia.org/resource/𨳒> <http://www.w3.org/ns/prov#wasDerivedFrom> <http://en.wikipedia.org/wiki/𨳒?oldid=786024110&ns=0> .
<http://dbpedia.org/resource/𩧢> <http://www.w3.org/ns/prov#wasDerivedFrom> <http://en.wikipedia.org/wiki/𩧢?oldid=951071761&ns=0> .
<http://dbpedia.org/resource/󠄀> <http://www.w3.org/ns/prov#wasDerivedFrom> <http://en.wikipedia.org/wiki/󠄀?oldid=949255578&ns=0> .
<http://dbpedia.org/resource/󠄁> <http://www.w3.org/ns/prov#wasDerivedFrom> <http://en.wikipedia.org/wiki/󠄁?oldid=949255580&ns=0> .
<http://dbpedia.org/resource/󠄂> <http://www.w3.org/ns/prov#wasDerivedFrom> <http://en.wikipedia.org/wiki/󠄂?oldid=949255609&ns=0> .
I'm including a screen capture, because the terminal seems to give more information about the characters in these 5 lines.
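Since the characters are hard to see in plain text, a small sketch like the one below can list the non-ASCII code points in an IRI and check them against the ucschar ranges of RFC 3987, which define the non-ASCII characters allowed in an IRI outside the query part. The helper names are hypothetical. The offending characters in the lines above appear to be variation selectors around U+E0100; that block falls outside ucschar, which would be consistent with the "Invalid IRI code point" error, while the CJK ideographs on the two preceding lines fall inside it.

```python
import unicodedata

# ucschar ranges from RFC 3987 (non-ASCII code points allowed in an IRI
# outside the query component).
UCSCHAR_RANGES = [
    (0xA0, 0xD7FF), (0xF900, 0xFDCF), (0xFDF0, 0xFFEF),
    (0x10000, 0x1FFFD), (0x20000, 0x2FFFD), (0x30000, 0x3FFFD),
    (0x40000, 0x4FFFD), (0x50000, 0x5FFFD), (0x60000, 0x6FFFD),
    (0x70000, 0x7FFFD), (0x80000, 0x8FFFD), (0x90000, 0x9FFFD),
    (0xA0000, 0xAFFFD), (0xB0000, 0xBFFFD), (0xC0000, 0xCFFFD),
    (0xD0000, 0xDFFFD), (0xE1000, 0xEFFFD),
]


def is_ucschar(ch):
    """True if the character falls in one of the RFC 3987 ucschar ranges."""
    return any(lo <= ord(ch) <= hi for lo, hi in UCSCHAR_RANGES)


def describe_non_ascii(text):
    """Return (code point, Unicode name, in-ucschar flag) per non-ASCII char."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"), is_ucschar(ch))
        for ch in text
        if ord(ch) > 0x7F
    ]


if __name__ == "__main__":
    # Escapes are used so the invisible character survives copy and paste.
    for row in describe_non_ascii("http://dbpedia.org/resource/\U000E0100"):
        print(row)
```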
from lightrdf.
I have tried this sed solution while parsing Wikidata, but:
lightrdf.Error: error while parsing IRI 'http://archive.is/EKEWo#34.7%': Invalid IRI percent encoding '%' on line 49533684 at position 41
Another:
lightrdf.Error: error while parsing language tag 'zh-classical': A subtag may be eight characters in length at maximum on line 59030363 at position 69
:(
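Both errors can at least be located, and the first one patched, in the same pre-processing spirit as the sed workaround. The sketch below is only an illustration with hypothetical helper names: it percent-encodes a bare % that is not followed by two hex digits, and it lists language subtags longer than the eight characters mentioned in the error message. Whether rewriting the data this way is acceptable depends on the use case, since it changes the IRI text.

```python
import re

# A '%' that is not followed by two hex digits is not a valid percent-encoding.
STRAY_PERCENT = re.compile(r"%(?![0-9A-Fa-f]{2})")


def escape_stray_percent(iri):
    """Encode any bare '%' as '%25', leaving valid escapes such as %20 alone."""
    return STRAY_PERCENT.sub("%25", iri)


def overlong_subtags(tag, limit=8):
    """Return the subtags of a language tag that exceed `limit` characters."""
    return [part for part in tag.split("-") if len(part) > limit]


if __name__ == "__main__":
    print(escape_stray_percent("http://archive.is/EKEWo#34.7%"))
    print(overlong_subtags("zh-classical"))
```

For the language tag there is no mechanical fix in this sketch; it only reports the offending subtag ("classical" has nine characters) so the statement can be reviewed or dropped.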
from lightrdf.