Note: since I retired a few months ago, I no longer really maintain this package. I would be more than happy if an interested party were to take it over. In the meantime, I have "archived" the repository to signal clearly that it is not being maintained. I would be happy to unarchive it and transfer ownership if someone is interested.
@iherman
This is a common Python interface for extracting structured data from HTML files as RDF. The structured data can be expressed in microdata, RDFa, or Turtle embedded in HTML. While RDFa and Turtle are both RDF serialization syntaxes, microdata is not; it is simply a specification of attributes to be used with HTML5 to express structured data. A separate Semantic Web Interest Group Note defines a mapping from HTML5+Microdata to RDF.
The software in this repository is only a thin layer on top of:
- PyRdfa, a full RDFa parser and distiller, built on top of RDFLib
- pyMicrodata, a microdata to RDF distiller, also built on top of RDFLib
This package itself adds the extraction of Turtle embedded in HTML.
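To illustrate what extracting embedded Turtle can involve, here is a minimal sketch that uses only the Python standard library. It is not this package's actual implementation: the class `TurtleScriptExtractor` and the function `extract_turtle` are hypothetical names, and the sketch assumes the Turtle content sits in `<script type="text/turtle">` elements.

```python
from html.parser import HTMLParser


class TurtleScriptExtractor(HTMLParser):
    """Collect the text of every <script type="text/turtle"> element.

    Hypothetical helper for illustration; not part of this package's API.
    """

    def __init__(self):
        super().__init__()
        self.blocks = []      # finished Turtle snippets, in document order
        self._buffer = None   # None while outside a Turtle script element

    def handle_starttag(self, tag, attrs):
        # A valueless attribute is reported as None, hence the `or ""`.
        script_type = (dict(attrs).get("type") or "").strip().lower()
        if tag == "script" and script_type == "text/turtle":
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks; collect them all.
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._buffer is not None:
            self.blocks.append("".join(self._buffer).strip())
            self._buffer = None


def extract_turtle(html):
    """Return a list with the content of each embedded Turtle block."""
    parser = TurtleScriptExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks


SAMPLE_HTML = """
<html>
  <head>
    <script type="text/javascript">console.log("ignored");</script>
    <script type="text/turtle">
      @prefix schema: <http://schema.org/> .
      <http://example.org/alice> a schema:Person ;
          schema:name "Alice" .
    </script>
  </head>
  <body><p>Hello</p></body>
</html>
"""

if __name__ == "__main__":
    for block in extract_turtle(SAMPLE_HTML):
        print(block)
```

Each extracted string could then be handed to RDFLib, for example with `Graph().parse(data=block, format="turtle")`, to obtain the triples.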
The library is used by the SDE service at W3C.