rdflib / pyrdfa3

RDFa 1.1 distiller/parser library: can extract RDFa 1.1 (and RDFa 1.0, if properly set via a @version attribute) from (X)HTML, SVG, or XML in general. The module can be used to produce serialized versions of the extracted graph, or simply an RDFLib Graph.

Home Page: http://www.w3.org/2012/pyRdfa/

License: Other

HTML 77.66% CSS 0.89% JavaScript 0.29% Shell 0.01% Python 21.15%

pyrdfa3's Introduction

RDFLib

RDFLib is a pure Python package for working with RDF. RDFLib contains most things you need to work with RDF, including:

  • parsers and serializers for RDF/XML, N3, NTriples, N-Quads, Turtle, TriX, TriG and JSON-LD
  • a Graph interface which can be backed by any one of a number of Store implementations
  • store implementations for in-memory, persistent on disk (Berkeley DB) and remote SPARQL endpoints
  • a SPARQL 1.1 implementation - supporting SPARQL 1.1 Queries and Update statements
  • SPARQL function extension mechanisms

RDFLib family of packages

The RDFLib community maintains many RDF-related Python code repositories with different purposes. For example:

  • rdflib - the RDFLib core
  • sparqlwrapper - a simple Python wrapper around a SPARQL service to remotely execute your queries
  • pyLODE - An OWL ontology documentation tool using Python and templating, based on LODE.
  • pyrdfa3 - RDFa 1.1 distiller/parser library: can extract RDFa 1.1/1.0 from (X)HTML, SVG, or XML in general.
  • pymicrodata - A module to extract RDF from an HTML5 page annotated with microdata.
  • pySHACL - A pure Python module which allows for the validation of RDF graphs against SHACL graphs.
  • OWL-RL - A simple implementation of the OWL2 RL Profile which expands the graph with all possible triples that OWL RL defines.

Please see the list of all packages/repositories here:

Help with maintenance of all of the RDFLib family of packages is always welcome and appreciated.

Versions & Releases

See https://rdflib.dev for the release overview.

Documentation

See https://rdflib.readthedocs.io for our documentation, built from the code. Note that the documentation is available in several versions matching releases (latest, stable, and older ones such as 5.0.0 and 4.2.2).

Installation

The stable release of RDFLib may be installed with Python's package management tool pip:

$ pip install rdflib

Some features of RDFLib require optional dependencies which may be installed using pip extras:

$ pip install "rdflib[berkeleydb,networkx,html,lxml]"

Alternatively, download the package manually from the Python Package Index (PyPI) at https://pypi.python.org/pypi/rdflib

The current version of RDFLib is 7.0.0; see the CHANGELOG.md file for what's new in this release.

Installation of the current main branch (for developers)

With pip you can also install rdflib from the git repository with one of the following options:

$ pip install git+https://github.com/rdflib/rdflib@main

or

$ pip install -e git+https://github.com/rdflib/rdflib@main#egg=rdflib

or, from a local clone of the repository, install it with one of the following options:

$ poetry install  # installs into a poetry-managed venv

or

$ pip install -e .

Getting Started

RDFLib aims to be a pythonic RDF API. RDFLib's main data object is a Graph, which is a Python collection of RDF subject, predicate, object triples.

To create a graph, load it with RDF data from DBpedia, and print the results:

from rdflib import Graph
g = Graph()
g.parse('http://dbpedia.org/resource/Semantic_Web')

for s, p, o in g:
    print(s, p, o)

The components of the triples are URIs (resources) or Literals (values).

URIs are grouped together by namespace; common namespaces are included in RDFLib:

from rdflib.namespace import DC, DCTERMS, DOAP, FOAF, SKOS, OWL, RDF, RDFS, VOID, XMLNS, XSD

You can use them like this:

from rdflib import Graph, URIRef, Literal
from rdflib.namespace import RDFS, XSD

g = Graph()
semweb = URIRef('http://dbpedia.org/resource/Semantic_Web')
label = g.value(semweb, RDFS.label)  # None until g holds a matching triple

Here RDFS is the RDFS namespace, XSD is the XML Schema Datatypes namespace, and g.value returns the object of the given triple pattern (or an arbitrary one if several match).

Or like this, adding a triple to a graph g:

from rdflib.namespace import FOAF

g.add((
    URIRef("http://example.com/person/nick"),
    FOAF.givenName,
    Literal("Nick", datatype=XSD.string)
))

This creates the following triple (in N-Triples notation):

<http://example.com/person/nick> <http://xmlns.com/foaf/0.1/givenName> "Nick"^^<http://www.w3.org/2001/XMLSchema#string> .

where the property FOAF.givenName is the URI <http://xmlns.com/foaf/0.1/givenName> and XSD.string is the URI <http://www.w3.org/2001/XMLSchema#string>.

You can bind namespaces to prefixes to shorten the URIs for RDF/XML, Turtle, N3, TriG, TriX & JSON-LD serializations:

g.bind("foaf", FOAF)
g.bind("xsd", XSD)

This allows the N-Triples triple above to be serialized more compactly, for example as Turtle:

print(g.serialize(format="turtle"))

With these results:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

<http://example.com/person/nick> foaf:givenName "Nick"^^xsd:string .

New Namespaces can also be defined:

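from rdflib import Literal, Namespace
dbpedia = Namespace('http://dbpedia.org/ontology/')

# Keep only English literals (URIRef objects have no .language attribute)
abstracts = [x for x in g.objects(semweb, dbpedia['abstract'])
             if isinstance(x, Literal) and x.language == 'en']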

See also ./examples

Features

The library contains parsers and serializers for RDF/XML, N3, NTriples, N-Quads, Turtle, TriX, JSON-LD, RDFa and Microdata.

The library presents a Graph interface which can be backed by any one of a number of Store implementations.

This core RDFLib package includes store implementations for in-memory storage and for persistent storage on top of Berkeley DB.

A SPARQL 1.1 implementation is included - supporting SPARQL 1.1 Queries and Update statements.

RDFLib is open source and is maintained on GitHub. RDFLib releases, current and previous, are listed on PyPI.

Multiple other projects are contained within the RDFLib "family"; see https://github.com/RDFLib/.

Running tests

Running the tests on the host

Run the test suite with pytest.

poetry install
poetry run pytest

Running test coverage on the host with coverage report

Run the test suite and generate an HTML coverage report with pytest and pytest-cov.

poetry run pytest --cov

Viewing test coverage

Once tests have produced HTML output of the coverage report, view it by running:

poetry run pytest --cov --cov-report term --cov-report html
python -m http.server --directory=htmlcov

Contributing

RDFLib survives and grows via user contributions! Please read our contributing guide and developers guide to get started. Please consider lodging Pull Requests here:

To get a development environment consider using Gitpod or Google Cloud Shell.

You can also raise issues here:

Support & Contacts

For general "how do I..." queries, please use https://stackoverflow.com and tag your question with rdflib. Existing questions:

If you want to contact the rdflib maintainers, please do so via:

pyrdfa3's People

Contributors

deniak, eugeneai, gromgull, iherman, nicholascar, pypingou, redapple

pyrdfa3's Issues

html5lib bug resolved?

Regarding this warning:

Warning (2013-07-16): the latest version of the html5lib package has a bug. This bug manifests itself if 
the source HTML file contains non-ASCII Unicode characters. Until the bug is handled, users should 
use the older, 0.95 version. It can be downloaded at 
https://code.google.com/p/html5lib/downloads/detail?name=html5lib-0.95.tar.gz

I've tried with a recent version of html5lib and it seems to handle Unicode OK. Is there a test case I can use to confirm this?

Thank you.

missing documentation

I am looking for documentation on how to use this. I tried using it like other parsers:

import rdflib


rdfa_text = """
    <div vocab="http://schema.org/" typeof="Order">
      <div property="seller" typeof="Organization">
        <b property="name">ACME Supplies</b>
      </div>
      <div property="customer" typeof="Person">
        <b property="name">Jane Doe</b>
      </div>
      <div property="orderedItem" typeof="OrderItem">
        Item number: <span property="orderItemNumber">abc123</span>
        <span property="orderQuantity">1</span>
        <div property="orderedItem" typeof="Product">
          <span property="name">Widget</span>
        </div>
        <link property="orderItemStatus" href="http://schema.org/OrderDelivered" />Delivered
        <div property="orderDelivery" typeof="ParcelDelivery">
          <time property="expectedArrivalFrom">2015-03-10</time>
        </div>
      </div>
      <div property="orderedItem" typeof="OrderItem">
        Item number: <span property="orderItemNumber">def456</span>
        <span property="orderQuantity">4</span>
        <div property="orderedItem" typeof="Product">
          <span property="name">Widget accessories</span>
        </div>
        <link property="orderItemStatus" href="http://schema.org/OrderInTransit" />Shipped
        <div property="orderDelivery" typeof="ParcelDelivery">
          <time property="expectedArrivalFrom">2015-03-15</time>
          <time property="expectedArrivalUntil">2015-03-18</time>
        </div>
      </div>
    </div>
"""

g = rdflib.Graph()
g.parse(data=rdfa_text, format="rdfa")

But that won't work:

Traceback (most recent call last):
  File "rdfa.py", line 54, in <module>
    g.parse(data=rdfa_text, format="rdfa")
  File "/home/f.ludwig/projects/rdflib/rdflib/graph.py", line 1075, in parse
    parser.parse(source, self, **args)
  File "/home/f.ludwig/.local/share/virtualenvs/dbe-FuKuixvd/lib/python3.8/site-packages/pyRdfa/rdflibparsers.py", line 138, in parse
    self._process(graph, pgraph, baseURI, orig_source,
  File "/home/f.ludwig/.local/share/virtualenvs/dbe-FuKuixvd/lib/python3.8/site-packages/pyRdfa/rdflibparsers.py", line 180, in _process
    _check_error(processor_graph)
  File "/home/f.ludwig/.local/share/virtualenvs/dbe-FuKuixvd/lib/python3.8/site-packages/pyRdfa/rdflibparsers.py", line 60, in _check_error
    raise Exception("RDFa parsing Error! %s" % msg)
Exception: RDFa parsing Error! name 'file' is not defined

Message 'No handlers could be found for logger "rdflib.term"' clutters output

Running localRDFa.py on Python 2.7.5, I get invalid output like this:

No handlers could be found for logger "rdflib.term"
@prefix cc: <http://creativecommons.org/ns#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix dcterms: <http://purl.org/dc/terms/> .

As a non-Python programmer, I'm not sure how to avoid it. The hint at http://stackoverflow.com/questions/17393664/no-handlers-could-be-found-for-logger-rdflib-term gave me an idea, but I'm still not sure what's missing and how I should add it.
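
A minimal sketch of one common remedy, assuming you can edit the calling script (for example a local copy of localRDFa.py): give the "rdflib" logger a handler before any RDFa processing runs. Python 2.7 prints the "No handlers could be found" notice (to stderr) only when a logger and all of its ancestors lack handlers, so a NullHandler on the parent "rdflib" logger also covers "rdflib.term". The trade-off is that genuine rdflib warnings are silently dropped as well.

```python
import logging

# Python 2.7 prints "No handlers could be found for logger ..." (to stderr)
# when neither a logger nor any of its ancestors has a handler. A NullHandler
# on the parent "rdflib" logger covers child loggers such as "rdflib.term",
# because they propagate their records upward. Genuine warnings are dropped too.
logging.getLogger("rdflib").addHandler(logging.NullHandler())

# Alternative: show rdflib warnings, formatted, on stderr instead of hiding them.
# logging.basicConfig(level=logging.WARNING)
```

Place this near the top of the script, before the parser is invoked.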

Please consider formally issuing a new code release

It has been quite some time since the last formal code release, and interesting changes have happened since then which would be nice to promote more widely.

The newest release tagged here on GitHub is 3.5.2, issued in April 2019.
The newest release on PyPI is 2.3.7, issued in August 2010.
The code and embedded documentation were changed to list the version as 4.0.0 in January 2020.

Please consider issuing a formal release and publishing it to PyPI.

pyrdfa3 repository vs. rdflib's copy of pyrdfa3

Repeating the question I just asked on #rdflib - why is there both a top-level pyrdfa3 repository, and a separate copy of pyrdfa3 under the rdflib repository's plugins/parsers subdirectory?

If rdflib was using git submodules to pull in pyrdfa3, that would make some sense; it would enable us to focus on pyrdfa3 as a separate module and hopefully avoid bugs like #7 where rdflib's setup.py handles the 2to3 conversion, while pyrdfa3's own does not.

As an entirely separate copy, however, the risk is that fixes to one repo won't get into the other. And the repos are in fact slightly out of sync with one another (a couple of lines concerning type checking with isinstance()).

Error simple rdfa

I just want to get a working pyRdfa setup running, but with Python 3.7 and pyrdfa3 ==3.5 I get the following error.

from pyRdfa import pyRdfa
import rdflib
print(pyRdfa().rdf_from_source('pyrdfa.xml'))

Input file pyrdfa.xml

<html xmlns="http://www.w3.org/1999/xhtml"
      prefix="cal: http://www.w3.org/2002/12/cal/ical#">
  <head>
    <title>Jo's Friends and Family Blog</title>
    <link rel="foaf:primaryTopic" href="#bbq" />
    <meta property="dc:creator" content="Jo" />
  </head>
  <body>
    <p about="#bbq" typeof="cal:Vevent">
      I'm holding
      <span property="cal:summary">
        one last summer barbecue
      </span>,
      on
      <span property="cal:dtstart" content="2015-09-16T16:00:00-05:00" 
            datatype="xsd:dateTime">
        September 16th at 4pm
      </span>.
    </p>
  </body>
</html>

error:

> NameError                                 Traceback (most recent call last)
> c:\users\dijkstrr\appdata\local\programs\python\python37-32\lib\site-packages\pyRdfa\__init__.py in _get_input(self, name)
>     448                                                 self.options.set_host_language(self.media_type)
> --> 449                                         return file(name)
>     450                         else :
> 
> NameError: name 'file' is not defined

What is going wrong?
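
The traceback points at return file(name) in pyRdfa's _get_input. file() was a Python 2 builtin and does not exist in Python 3, which explains the NameError (and likely also the "name 'file' is not defined" message in the "missing documentation" issue). A sketch of a Python 3 replacement for that one call follows; open_local_source is a hypothetical helper name, not part of pyRdfa, and binary mode is an assumption that leaves decoding to the XML/HTML parser.

```python
import builtins


def open_local_source(name):
    """Hypothetical Python 3 stand-in for the Python 2 call ``file(name)``.

    Binary mode is an assumption: it leaves character decoding to the
    XML/HTML parser, which can honour the document's own encoding.
    """
    return open(name, "rb")


# The Python 2 builtin is gone, hence "NameError: name 'file' is not defined".
assert not hasattr(builtins, "file")
```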

Only ASCII support in rdf:HTML datatype

Thanks for this very useful tool! I’m trying to turn this RDFa into RDF/XML using scripts/localRDFa.py (note the Unicode ellipsis characters):

<!DOCTYPE html>
<html lang="en">
  <body prefix="schema: http://schema.org/">
    <div class="entry" resource="http://example.com/blog/1" typeof="schema:BlogPosting">
      <h2 property="schema:headline">Unicode is accepted here…</h2>
      <div property="schema:articleBody" datatype="rdf:HTML">… but not here!</div>
    </div>
  </body>
</html>

It fails with these error messages:

[digicol@timsdcxvm pyrdfa3-master]$ scripts/localRDFa.py -p /tmp/unicode.html 
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/pyRdfa/__init__.py", line 648, in graph_from_source
    return self.graph_from_DOM(dom, graph, pgraph)
  File "/usr/lib/python2.6/site-packages/pyRdfa/__init__.py", line 501, in graph_from_DOM
    parse_one_node(topElement, default_graph, None, state, [])
  File "/usr/lib/python2.6/site-packages/pyRdfa/parse.py", line 67, in parse_one_node
    _parse_1_1(node, graph, parent_object, incoming_state, parent_incomplete_triples)
  File "/usr/lib/python2.6/site-packages/pyRdfa/parse.py", line 289, in _parse_1_1
    _parse_1_1(n, graph, object_to_children, state, incomplete_triples)
  File "/usr/lib/python2.6/site-packages/pyRdfa/parse.py", line 289, in _parse_1_1
    _parse_1_1(n, graph, object_to_children, state, incomplete_triples)
  File "/usr/lib/python2.6/site-packages/pyRdfa/parse.py", line 289, in _parse_1_1
    _parse_1_1(n, graph, object_to_children, state, incomplete_triples)
  File "/usr/lib/python2.6/site-packages/pyRdfa/parse.py", line 275, in _parse_1_1
    ProcessProperty(node, graph, current_subject, state, typed_resource).generate_1_1()
  File "/usr/lib/python2.6/site-packages/pyRdfa/property.py", line 126, in generate_1_1
    object = Literal(self._get_HTML_literal(self.node), datatype=HTMLLiteral)                       
  File "/usr/lib/python2.6/site-packages/rdflib-4.0.1-py2.6.egg/rdflib/term.py", line 564, in __new__
    _value, _datatype = _castPythonToLiteral(value)
  File "/usr/lib/python2.6/site-packages/rdflib-4.0.1-py2.6.egg/rdflib/term.py", line 1386, in _castPythonToLiteral
    return castFunc(obj), dType
  File "/usr/lib/python2.6/site-packages/rdflib-4.0.1-py2.6.egg/rdflib/term.py", line 1319, in _writeXML
    if s.startswith(b(u'<?xml version="1.0" encoding="utf-8"?>')):
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 38: ordinal not in range(128)
Traceback (most recent call last):
  File "scripts/localRDFa.py", line 126, in <module>
    print processor.rdf_from_sources(value, outputFormat = format, rdfOutput = rdfOutput)
  File "/usr/lib/python2.6/site-packages/pyRdfa/__init__.py", line 685, in rdf_from_sources
    self.graph_from_source(name, graph, rdfOutput)
  File "/usr/lib/python2.6/site-packages/pyRdfa/__init__.py", line 657, in graph_from_source
    if not rdfOutput : raise b
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 38: ordinal not in range(128)

If I remove the Unicode ellipsis character from the schema:articleBody, the HTML parses fine. It doesn’t hurt in the schema:headline.

I don’t know Python (yet) so I’m reporting this here, hoping that someone has the time for a hopefully quick fix. Thanks for looking into this!
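
The last frames of the traceback show rdflib comparing the serialized fragment with a byte string under Python 2.6, which forces an implicit ASCII decode; per the traceback, only the rdf:HTML value goes through that _writeXML path, consistent with the headline being unaffected. Byte 0xe2 is the first byte of the UTF-8 encoding of "…" (U+2026). The sketch below reproduces that failure mode explicitly in Python 3 and shows that decoding with the right codec works. It illustrates the cause; it is not a patch to rdflib or pyRdfa.

```python
# "…" (U+2026 HORIZONTAL ELLIPSIS) is encoded in UTF-8 as E2 80 A6.
fragment = "\u2026 but not here!"
raw = fragment.encode("utf-8")

# Python 2 implicitly decoded byte strings as ASCII when mixing them with
# unicode text; doing the same explicitly reproduces the reported error.
try:
    raw.decode("ascii")
except UnicodeDecodeError as err:
    bad_byte = raw[err.start]  # 0xe2, the byte named in the traceback
else:
    bad_byte = None

# Decoding with the correct codec round-trips without loss.
assert raw.decode("utf-8") == fragment
```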

Whitespace-insertion missing for structured data that spans multiple block elements

Scenario: Whitespace-insertion missing for structured data that spans multiple block elements
  Given a simple (X)HTML5 document with structured data that spans multiple block elements without whitespace in between like this:
    """
      <!DOCTYPE html>
      <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en" vocab="http://schema.org/">
      <head><title>Structured Data Example</title></head>
      <body typeof="BlogPosting">
      <main property="articleBody">
      <p>Foo</p><p>Bar</p>
      </main>
      </body>
      </html>
    """
  When extracting the structured data of articleBody
  Then the structured data should be: "Foo Bar"

The actual structured data currently is "FooBar" (space missing).

Although I haven't yet traced it to the spec, I understand this might actually be a problem with the RDFa specification (and with microdata, which I expect to behave the same way), since RDFa itself cannot tell inline elements from block elements. So I do not know whether this should actually be fixed. However, when the parser knows that the input is HTML5+RDFa or XHTML+RDFa, it also knows which elements are block-level and which are inline. I think a safer behavior (although I don't know whether it would be spec-compliant) would be to always insert a space between elements, on the assumption that words are not normally split across elements, and to make an exception for the few elements that can occur inside a word, such as inline elements.

As a consequence of this glitch, minified documents don't work unless they keep a single space between block-level elements, so that words in different elements are not joined.
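
A rough sketch of the heuristic proposed above, written against xml.dom.minidom. BLOCK_ELEMENTS is an illustrative subset chosen for this example, and this is not how pyRdfa computes literals: RDFa defines the plain literal value as the concatenated text of the descendant text nodes, which is exactly what yields "FooBar". The final whitespace collapsing is also an assumption; it would alter literals whose internal whitespace matters.

```python
from xml.dom import minidom

# Illustrative subset of HTML block-level elements (an assumption, not a full list).
BLOCK_ELEMENTS = {"address", "article", "div", "footer", "h1", "h2", "h3",
                  "header", "li", "main", "p", "section", "td"}


def text_with_block_breaks(node):
    """Concatenate descendant text, padding block-level elements with spaces."""
    parts = []
    for child in node.childNodes:
        if child.nodeType == child.TEXT_NODE:
            parts.append(child.data)
        elif child.nodeType == child.ELEMENT_NODE:
            inner = text_with_block_breaks(child)
            name = child.tagName.split(":")[-1].lower()  # drop any XML prefix
            if name in BLOCK_ELEMENTS:
                parts.append(" " + inner + " ")
            else:
                parts.append(inner)  # inline elements may sit inside a word
    return "".join(parts)


def normalize(text):
    # Collapse the runs of whitespace introduced by the padding above.
    return " ".join(text.split())


doc = minidom.parseString("<main><p>Foo</p><p>Bar</p></main>")
print(normalize(text_with_block_breaks(doc.documentElement)))  # Foo Bar
```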

how hard would it be to extend pyRdfa3 to lxml.etree?

Hey there,

Just tried to feed graph_from_DOM an already-parsed lxml.etree document and tripped over the fact that it only speaks xml.dom.minidom. Since both APIs give access to roughly the same information (at least as far as RDFa is concerned), I'd be happy to try making it handle both, unless that turns out to be too much of a snarl or you'd rather it didn't for some reason.

Thoughts?
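
Until graph_from_DOM accepts other tree types, one workaround is to serialize the already-parsed tree and re-parse it with xml.dom.minidom. The sketch uses the standard library's ElementTree so it runs without extra packages; with lxml, lxml.etree.tostring(element) plays the same role. etree_to_minidom is a hypothetical helper, and the cost is one extra serialize/parse round trip. For namespaced documents, note that the standard ElementTree rewrites namespace prefixes (ns0, ns1, ...) on output, whereas lxml keeps the original ones.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom


def etree_to_minidom(element):
    """Serialize an ElementTree element and re-parse it as a minidom Document.

    With lxml, lxml.etree.tostring(element) would take the place of
    ET.tostring(element); the rest stays the same.
    """
    return minidom.parseString(ET.tostring(element))


root = ET.fromstring(
    '<div vocab="http://schema.org/" typeof="Organization">'
    '<b property="name">ACME Supplies</b></div>'
)
dom = etree_to_minidom(root)
# `dom` is now an xml.dom.minidom.Document, the kind of tree graph_from_DOM expects.
```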

Looking for a new maintainer

Dear all,

many years ago I developed this Python module. The library seems to be fairly solid; the only change I made a while ago was to adapt it to Python 3 as well. The library also drives an RDFa extraction service at W3C.

However… I have recently retired and, although I still maintain some activities at the W3C, I do so on greatly reduced hours. Maintaining this library, possibly developing it further as RDFLib evolves, etc., is not something I can commit to anymore. I am looking for a person (or people) willing to take over this responsibility.

Any takers?

Cc @RDFLib/rdflib

Distiller does not handle IRIs nor UTF-8 characters well when using direct input

I have an RDFa document using IRIs. The distiller web page messes up the encoding - at least when using direct input:

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" prefixes="skos: http://www.w3.org/2004/02/skos/core#" lang="cs">
<head>
<title>Kódy oborů OECD FORD - Frascati manualFields of Research and Development classification (FORD) - Frascati manual</title>
</head>
<body about="https://data.mvcr.gov.cz/zdroj/číselníky/ford" typeof="skos:ConceptScheme">
<h1>Kódy oborů OECD FORD - Frascati manualFields of Research and Development classification (FORD) - Frascati manual <a href="https://data.mvcr.gov.cz/zdroj/číselníky/ford">🔗</a></h1>
<p>
Toto je HTML zobrazení číselníku <a href="https://data.mvcr.gov.cz/zdroj/číselníky/ford"><span property="skos:prefLabel" lang="cs">Kódy oborů OECD FORD - Frascati manual</span><span property="skos:prefLabel" lang="en">Fields of Research and Development classification (FORD) - Frascati manual</span></a> identifikovaného <a href="https://data.mvcr.gov.cz/zdroj/číselníky/ford">https://data.mvcr.gov.cz/zdroj/číselníky/ford</a> a publikovaného dle <a href="https://ofn.gov.cz/číselníky/">Otevřené formální normy (OFN) pro číselníky.</a>
</p>
<table rev="skos:inScheme">
<tr>
<th>IRI</th>
<th>Kód</th>

<th>Název anglicky</th>

<th>Tranzitivně širší položka</th></tr>
<tr id="https://data.mvcr.gov.cz/zdroj/číselníky/ford/položky/10000" about="https://data.mvcr.gov.cz/zdroj/číselníky/ford/položky/10000" typeof="skos:Concept">
<td><a href="https://data.mvcr.gov.cz/zdroj/číselníky/ford/položky/10000">https://data.mvcr.gov.cz/zdroj/číselníky/ford/položky/10000</a></td>
<td property="skos:notation">10000</td>

<td property="skos:prefLabel" lang="en">1. Natural Sciences</td>
<td></td>

</tr>
<tr id="https://data.mvcr.gov.cz/zdroj/číselníky/ford/položky/10100" about="https://data.mvcr.gov.cz/zdroj/číselníky/ford/položky/10100" typeof="skos:Concept">
<td><a href="https://data.mvcr.gov.cz/zdroj/číselníky/ford/položky/10100">https://data.mvcr.gov.cz/zdroj/číselníky/ford/položky/10100</a></td>
<td property="skos:notation">10100</td>

<td property="skos:prefLabel" lang="en">1.1 Mathematics</td>
<td><a href="#https://data.mvcr.gov.cz/zdroj/číselníky/ford/položky/10000" rel="skos:broaderTransitive" resource="https://data.mvcr.gov.cz/zdroj/číselníky/ford/položky/10000">https://data.mvcr.gov.cz/zdroj/číselníky/ford/položky/10000</a></td>

</tr>
</table>
</body>
</html>
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<https://data.mvcr.gov.cz/zdroj/�íselníky/typy-pracovních-vztahů/položky/dpp> a skos:Concept ;
    skos:inScheme <https://data.mvcr.gov.cz/zdroj/�íselníky/typy-pracovních-vztahů> ;
    skos:prefLabel "Dohoda o provedení práce"@cs .

<https://data.mvcr.gov.cz/zdroj/�íselníky/typy-pracovních-vztahů/položky/dp�> a skos:Concept ;
    skos:inScheme <https://data.mvcr.gov.cz/zdroj/�íselníky/typy-pracovních-vztahů> ;
    skos:prefLabel "Dohoda o pracovní �innosti"@cs .

<https://data.mvcr.gov.cz/zdroj/�íselníky/typy-pracovních-vztahů/položky/plný-úvazek> a skos:Concept ;
    skos:inScheme <https://data.mvcr.gov.cz/zdroj/�íselníky/typy-pracovních-vztahů> ;
    skos:prefLabel "Pracovní poměr - plný úvazek"@cs .

<https://data.mvcr.gov.cz/zdroj/�íselníky/typy-pracovních-vztahů/položky/služební-poměr> a skos:Concept ;
    skos:inScheme <https://data.mvcr.gov.cz/zdroj/�íselníky/typy-pracovních-vztahů> ;
    skos:prefLabel "Služební poměr"@cs .

<https://data.mvcr.gov.cz/zdroj/�íselníky/typy-pracovních-vztahů/položky/zkrácený-úvazek> a skos:Concept ;
    skos:inScheme <https://data.mvcr.gov.cz/zdroj/�íselníky/typy-pracovních-vztahů> ;
    skos:prefLabel "Pracovní poměr - zkrácený úvazek"@cs .

<https://data.mvcr.gov.cz/zdroj/�íselníky/typy-pracovních-vztahů> a skos:ConceptScheme ;
    skos:prefLabel "Typy pracovních vztahů"@cs,
        "Employment relation types"@en .

prettyXMLserializer_3_2.py: inconsistent use of tabs and spaces

Hi,

While byte-compiling this package with compileall, the build fails like this:

RdfaExtras/serializers/XMLWriter.py'...
Compiling '/gnu/store/5p5wl4z8391ykvxf2v6nwnp81k25n58v-python-pyrdfa3-3.5.3/lib/python3.9/site-packages/pyRdfaExtras/serializers/__init__.py'...
Compiling '/gnu/store/5p5wl4z8391ykvxf2v6nwnp81k25n58v-python-pyrdfa3-3.5.3/lib/python3.9/site-packages/pyRdfaExtras/serializers/jsonserializer.py'...
Compiling '/gnu/store/5p5wl4z8391ykvxf2v6nwnp81k25n58v-python-pyrdfa3-3.5.3/lib/python3.9/site-packages/pyRdfaExtras/serializers/prettyXMLserializer.py'...
Compiling '/gnu/store/5p5wl4z8391ykvxf2v6nwnp81k25n58v-python-pyrdfa3-3.5.3/lib/python3.9/site-packages/pyRdfaExtras/serializers/prettyXMLserializer_3.py'...
Compiling '/gnu/store/5p5wl4z8391ykvxf2v6nwnp81k25n58v-python-pyrdfa3-3.5.3/lib/python3.9/site-packages/pyRdfaExtras/serializers/prettyXMLserializer_3_2.py'...
*** Sorry: TabError: inconsistent use of tabs and spaces in indentation (prettyXMLserializer_3_2.py, line 219)
Compiling '/gnu/store/5p5wl4z8391ykvxf2v6nwnp81k25n58v-python-pyrdfa3-3.5.3/lib/python3.9/site-packages/pyRdfaExtras/serializers/turtleserializer.py'...
error: in phase 'install': uncaught exception:
%exception #<&invoke-error program: "python" arguments: ("-m" "compileall" "--invalidation-mode=unchecked-hash" "/gnu/store/5p5wl4z8391ykvxf2v6nwnp81k25n58v-python-pyrdfa3-3.5.3") exit-status: 1 term-signal: #f stop-signal: #f> 
phase `install' failed after 0.3 seconds
command "python" "-m" "compileall" "--invalidation-mode=unchecked-hash" "/gnu/store/5p5wl4z8391ykvxf2v6nwnp81k25n58v-python-pyrdfa3-3.5.3" failed with status 1

This is with Python 3.9.9.
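
Python 3 rejects indentation whose meaning depends on the tab width, which Python 2 accepted by treating a tab as advancing to the next multiple of 8 columns. The sketch reproduces the error with compile() and shows one way to normalize a file, assuming it was written for that Python 2 rule. expand_leading_tabs is a hypothetical helper; it only touches leading indentation, so tabs inside string literals are left alone. The standard library's python -m tabnanny can also be used to locate the offending lines.

```python
def expand_leading_tabs(source, tabsize=8):
    """Replace tabs in each line's leading indentation with spaces.

    Assumes the code was written for Python 2's rule that a tab advances
    to the next multiple of 8 columns; tabs after the indentation
    (for example inside string literals) are left unchanged.
    """
    fixed = []
    for line in source.splitlines(keepends=True):
        body = line.lstrip(" \t")
        indent = line[: len(line) - len(body)]
        fixed.append(indent.expandtabs(tabsize) + body)
    return "".join(fixed)


# Eight spaces on one line, a single tab on the next: the same column under
# Python 2's rule, but Python 3 rejects the mix as ambiguous.
mixed = "if True:\n        x = 1\n\ty = 2\n"

try:
    compile(mixed, "example.py", "exec")
except TabError as err:
    print("TabError:", err.msg)

compile(expand_leading_tabs(mixed), "example.py", "exec")  # compiles cleanly
```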

pyrdfa3 Adaptation

How can I adapt the pyrdfa3 parser to get the dynamic XPath?
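
The question leaves "dynamic XPath" open. Assuming it means the XPath of the element a given piece of RDFa came from, a positional XPath for an xml.dom.minidom element can be computed as sketched below. xpath_of is a hypothetical helper; it ignores namespaces (an assumption that suits un-namespaced HTML), and wiring it into pyRdfa's parsing, for example to record where each triple originated, is not shown.

```python
from xml.dom import minidom


def xpath_of(element):
    """Build a positional XPath such as /html[1]/body[1]/p[2] for a minidom element.

    Uses tag names as they appear in the document and ignores namespaces.
    """
    steps = []
    node = element
    while node is not None and node.nodeType == node.ELEMENT_NODE:
        # 1-based position among preceding siblings with the same tag name.
        index = 1
        sibling = node.previousSibling
        while sibling is not None:
            if sibling.nodeType == sibling.ELEMENT_NODE and sibling.tagName == node.tagName:
                index += 1
            sibling = sibling.previousSibling
        steps.append("%s[%d]" % (node.tagName, index))
        node = node.parentNode  # stops at the Document node
    return "/" + "/".join(reversed(steps))


doc = minidom.parseString(
    "<html><body><p>a</p><p property='name'>b</p></body></html>"
)
target = doc.getElementsByTagName("p")[1]
print(xpath_of(target))  # /html[1]/body[1]/p[2]
```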

pyRdfa fails on case-sensitive systems

Problem:
"import pyRdfa" on a Linux system fails.

Python 2.7.5 (default, Jul 8 2013, 09:48:59)
[GCC 4.8.1 20130603 (Red Hat 4.8.1-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyRdfa
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/dan/schema_rdfa_py2/lib/python2.7/site-packages/pyRdfa/__init__.py", line 132, in <module>
    from rdflib.Graph import Graph
ImportError: No module named Graph

The filesystem shows that rdflib contains a module named "graph.py", but nothing starting with a capital G:
$ ls rdflib/g*py
rdflib/graph.py

$ ls rdflib/G*.py

This particular error can be resolved by changing the following line from:

from rdflib.Graph import Graph

to:

from rdflib.graph import Graph

... however, it is only one example import of many that are subject to this case-sensitivity problem.

Bug in JSON-LD serialization

There's a bug in the JSON-LD serialization. Type coercions are applied to the full IRIs instead of applying them to the compact IRIs used in the document. Example:

{
  "@context": {
    "owl": "http://www.w3.org/2002/07/owl#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "http://www.w3.org/2000/01/rdf-schema#isDefinedBy": {  <-- should be rdfs:isDefinedBy
      "@type": "@id"
    },
    "http://www.w3.org/2000/01/rdf-schema#seeAlso": {      <-- should be rdfs:seeAlso
      "@type": "@id"
    }
  },
  "@id": "http://www.w3.org/ns/json-ld#context",
  "@type": "rdf:Property",
  "rdfs:label": {
    "@value": "JSON-LD context",
    "@language": "en"
  },
  "rdfs:isDefinedBy": {
    "@id": "http://www.w3.org/ns/json-ld",
    "@type": "owl:Ontology",
    "rdfs:label": {
      "@value": "The JSON-LD Vocabulary",
      "@language": "en"
    },
    "rdfs:comment": {
      "@value": "This is a vocabulary document and is used to achieve certain features of the JSON-LD language.",
      "@language": "en"
    }
  },
  "rdfs:seeAlso": "http://www.w3.org/TR/json-ld-syntax/#referencing-contexts-from-json-documents",
  "rdfs:comment": {
    "@value": "This link relation is used to reference a JSON-LD context from a JSON document so that it can be interpreted as JSON-LD.",
    "@language": "en"
  }
}
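
A sketch of the fix the report describes: rewrite full-IRI term definitions in the context as compact IRIs built from the prefixes the context already declares, so that the coercions key on rdfs:isDefinedBy and rdfs:seeAlso as they appear in the document. With that, a JSON-LD processor would read the rdfs:seeAlso string as an IRI rather than a plain string. compact_iri and compact_context_keys are hypothetical helpers, not the pyRdfaExtras serializer's code, and collisions with existing terms are not handled.

```python
import json


def compact_iri(iri, prefixes):
    """Return prefix:suffix for the longest matching namespace, else the IRI unchanged."""
    best = None
    for prefix, namespace in prefixes.items():
        if iri.startswith(namespace) and len(iri) > len(namespace):
            if best is None or len(namespace) > len(prefixes[best]):
                best = prefix
    if best is None:
        return iri
    return "%s:%s" % (best, iri[len(prefixes[best]):])


def compact_context_keys(context):
    """Rewrite full-IRI term definitions in a JSON-LD @context as compact IRIs.

    String values whose keys do not start with "@" are treated as prefix
    declarations; dict values (term definitions such as {"@type": "@id"})
    get their keys compacted with those prefixes.
    """
    prefixes = {k: v for k, v in context.items()
                if isinstance(v, str) and not k.startswith("@")}
    compacted = {}
    for key, value in context.items():
        if isinstance(value, dict):
            compacted[compact_iri(key, prefixes)] = value
        else:
            compacted[key] = value
    return compacted


context = {
    "owl": "http://www.w3.org/2002/07/owl#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "http://www.w3.org/2000/01/rdf-schema#isDefinedBy": {"@type": "@id"},
    "http://www.w3.org/2000/01/rdf-schema#seeAlso": {"@type": "@id"},
}
print(json.dumps(compact_context_keys(context), indent=2))
```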

How to handle timeout exception?

Hi,

I am using pyrdfa3 to parse 100,000 URLs, but I am getting a timeout exception. I tried several methods but couldn't fix it.

Can you suggest some method?

Thanks
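
One approach for a large batch, sketched under the assumption that you can fetch the pages yourself before handing them to the parser: download each URL with an explicit timeout, retry a few times, and record failures instead of letting one slow host abort the whole run. fetch and fetch_all are hypothetical helpers; how the downloaded content is then passed to pyRdfa is left out, since the accepted input types depend on the version in use. Where you cannot pass a timeout to code that opens connections internally, socket.setdefaulttimeout() sets a default for newly created sockets.

```python
import socket
import urllib.error
import urllib.request


def fetch(url, timeout=10.0):
    """Download one page; raises on network errors and on timeouts."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_all(urls, fetcher=fetch, retries=2):
    """Fetch many URLs, retrying failures and recording them instead of stopping.

    Returns (pages, failures): pages maps URL -> bytes, failures maps URL -> last error.
    Note that URLError also covers HTTP errors such as 404, which are retried
    here as well; narrow the except clause if that is unwanted.
    """
    pages, failures = {}, {}
    for url in urls:
        for _attempt in range(retries + 1):
            try:
                pages[url] = fetcher(url)
            except (socket.timeout, TimeoutError, urllib.error.URLError) as err:
                failures[url] = err  # remember the error and try again
            else:
                failures.pop(url, None)  # an earlier attempt may have failed
                break
    return pages, failures
```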

Language markup (`lang="en"`) gets injected into strings with `rdf:HTML` datatype

Given this (rather silly) RDFa markup:

<section lang="de" resource="#">
<h2>Keywords</h2>
<ul>
  <li property="schema:keywords">Gift</li>
  <li property="schema:keywords" lang="en">Gift</li>
  <li property="schema:keywords" datatype="rdf:HTML">CO<sub>2</sub> Gift</li>
</ul>
</section>

results in:

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix schema: <http://schema.org/> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<#> schema:keywords "CO<sub lang=\"de\">2</sub> Gift"^^rdf:HTML,
        "Gift"@de,
        "Gift"@en .

In this case, lang="de" is injected into the outermost element(s) of the markup. In the scenario above, that produces nonsensical markup which the author didn't intend. 😃

I don't believe this is intended behavior (the Ruby Distiller lacks this bug), so thought I'd report it.

Cheers!
🎩
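
For comparison, a sketch of the output the report expects: serializing only the element's children with xml.dom.minidom keeps the markup exactly as written, so no lang attribute from an ancestor is copied into it. inner_markup is a hypothetical helper, not pyRdfa's code.

```python
from xml.dom import minidom


def inner_markup(element):
    """Serialize an element's children verbatim, without copying inherited attributes such as lang."""
    return "".join(child.toxml() for child in element.childNodes)


doc = minidom.parseString(
    '<section lang="de" resource="#"><ul>'
    '<li property="schema:keywords" datatype="rdf:HTML">CO<sub>2</sub> Gift</li>'
    '</ul></section>'
)
li = doc.getElementsByTagName("li")[0]
print(inner_markup(li))  # CO<sub>2</sub> Gift
```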

Error installing pyRdfaExtras due to tuple unpacking using Python 3.3

Although README.txt claims "The package has been adapted to Python 3", attempting to install pyrdfa3 under Python 3.3 fails with a syntax error in the pyRdfaExtras directory.

Steps to reproduce:

  1. Setting up the environment:
    virtualenv --python=/usr/bin/python3.3 ~/schema_rdfa
    git clone https://github.com/RDFLib/pyrdfa3.git
    cd pyrdfa3
  2. Installing the package:
    ~/schema_rdfa/bin/python setup.py install
  3. Error:
    ...
    byte-compiling /home/dan/schema_rdfa/lib/python3.3/site-packages/pyRdfaExtras/__init__.py to __init__.cpython-33.pyc
    File "/home/dan/schema_rdfa/lib/python3.3/site-packages/pyRdfaExtras/__init__.py", line 112
    def add(self, (s,p,o)) :
    ^
    SyntaxError: invalid syntax

It looks like Python 3 dropped support for using tuples directly in method signatures like this five years ago via PEP 3113 (http://www.python.org/dev/peps/pep-3113/) with warnings added as of Python 2.6. I expect the alternative would be something like "def add(self, triple)" and then checking to ensure that triple[0], [1], and [2] were all defined.

Looking over RDFlib itself, it seems that the API is rife with tuple unpacking behaviour :/
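
A sketch of the alternative suggested above, using a hypothetical ExampleStore class: rename the parameter and unpack inside the body. Because callers already pass a single tuple, call sites such as RDFLib's g.add((s, p, o)) keep working unchanged; PEP 3113 only removed unpacking in the function signature, not tuple arguments themselves.

```python
class ExampleStore:
    """Hypothetical store showing the Python 3 form of ``def add(self, (s, p, o))``."""

    def __init__(self):
        self.triples = set()

    def add(self, triple):
        # PEP 3113 removed tuple parameters, so unpack inside the body instead.
        # Unpacking also rejects tuples that do not have exactly three items.
        s, p, o = triple
        self.triples.add((s, p, o))


store = ExampleStore()
store.add(("ex:s", "ex:p", "ex:o"))  # call sites still pass a single tuple
```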
