rdf-nx-parser

A non-validating tokenizer and parser for the RDF N-Triples and N-Quads serializations (or any “N-x”).

Parses N-Triples and N-Quads statements from strings, and tokenizes any “N-x” string.

Why?

There are already faster parsers (see the last section), but having a parser for Node.js is useful for building smaller tools.

Usage

npm install --save rdf-nx-parser

The module exports a parser object:

var parser = require('rdf-nx-parser');

Parsing

Use parseTriple() to parse an N-Triples statement and parseQuad() to parse an N-Quads statement. Both return an object, or null if the input can't be parsed.

var quad = parser.parseQuad(
    '_:foo ' + 
    '<http://example.com/bar> ' + 
    '"\\u9B3C\\u8ECA"@jp ' + 
    '<http://example.com/baz> .'
);

console.log(JSON.stringify(quad, null, 4));
{
    "subject": {
        "type": "blankNode",
        "value": "foo"
    },
    "predicate": {
        "type": "iri",
        "value": "http://example.com/bar"
    },
    "object": {
        "type": "literal",
        "value": "鬼車",
        "language": "jp"
    },
    "graphLabel": {
        "type": "iri",
        "value": "http://example.com/baz"
    }
}

Literal objects can have an additional language or datatypeIri property.
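To illustrate the object shape, here is a sketch that turns such objects back into N-Triples / N-Quads syntax. The helpers termToNx() and quadToNx() are hypothetical and not part of rdf-nx-parser; the sketch assumes literal values are already unescaped (the default unescapeUnicode: true behaviour) and only re-escapes backslashes, quotes and line breaks.

```javascript
// Hypothetical helpers (not part of rdf-nx-parser) that serialize term
// objects shaped like the parser's output back into N-Triples syntax.
function escapeLiteral(value) {
  // Escape backslashes first, then quotes and line breaks.
  return value
    .replace(/\\/g, '\\\\')
    .replace(/"/g, '\\"')
    .replace(/\n/g, '\\n')
    .replace(/\r/g, '\\r');
}

function termToNx(term) {
  var out;
  switch (term.type) {
    case 'iri':
      return '<' + term.value + '>';
    case 'blankNode':
      return '_:' + term.value;
    case 'literal':
      out = '"' + escapeLiteral(term.value) + '"';
      // A literal carries either a language tag or a datatype IRI (or neither).
      if (term.language) return out + '@' + term.language;
      if (term.datatypeIri) return out + '^^<' + term.datatypeIri + '>';
      return out;
    default:
      throw new Error('Unknown term type: ' + term.type);
  }
}

// Joins subject, predicate, object and (if present) graph label.
function quadToNx(quad) {
  var terms = [quad.subject, quad.predicate, quad.object];
  if (quad.graphLabel) terms.push(quad.graphLabel);
  return terms.map(termToNx).join(' ') + ' .';
}
```

Applied to the parseQuad() result above, quadToNx() returns _:foo <http://example.com/bar> "鬼車"@jp <http://example.com/baz> .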

The parser does not verify that the data adheres to the N-Triples / N-Quads grammar. Instead, it will happily parse anything as best it can:

> parser.parseQuad('<foo> <:///baz>     "bar"  <$!#]&> .');

{ subject: { type: 'iri', value: 'foo' },
  predicate: { type: 'iri', value: ':///baz' },
  object: { type: 'literal', value: 'bar' },
  graphLabel: { type: 'iri', value: '$!#]&' } }

Both methods accept an optional options object as a second parameter. The defaults are shown here:

parser.parseTriple(input, {
    // Set to `true` to get unparsed strings as `value`
    // properties
    asString: false,

    // Include the unparsed token as a `valueRaw` property
    // when returning objects
    includeRaw: false,

    // Decode Unicode escapes, `\uxxxx` and `\Uxxxxxxxx`
    // (but not percent encoding or punycode)
    unescapeUnicode: true
});
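For reference, decoding the two escape forms covered by unescapeUnicode can be sketched as follows. This is not the library's code: the function name is hypothetical, it assumes an ES2015 runtime for String.fromCodePoint (so it would not run on Node.js 0.10), it leaves escaped backslashes alone so that a literal backslash followed by u0041 is not misread, and it keeps out-of-range 8-digit escapes unchanged.

```javascript
// Sketch only: decodes N-Triples \uXXXX and \UXXXXXXXX escapes in a string.
// Requires String.fromCodePoint (ES2015+), so not for Node.js 0.10.
function unescapeUnicode(str) {
  // Alternatives: an escaped backslash (left untouched), a 4-digit escape,
  // or an 8-digit escape.
  var pattern = /\\\\|\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})/g;
  return str.replace(pattern, function (match, hex4, hex8) {
    if (!hex4 && !hex8) {
      return match; // an escaped backslash stays as-is for a later step
    }
    var codePoint = parseInt(hex4 || hex8, 16);
    // Code points above U+10FFFF are invalid; keep the escape unchanged.
    if (codePoint > 0x10FFFF) {
      return match;
    }
    return String.fromCodePoint(codePoint);
  });
}
```

The parser performs this kind of decoding by default; passing unescapeUnicode: false keeps the escapes in the returned values.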

A whole file of N-Triples or N-Quads lines can easily be parsed line by line, e.g. with Node's readline module; see the example.
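A minimal sketch of that approach, written for current Node.js versions. The function parseNxFile and its callbacks are hypothetical; the parse function is passed in (e.g. a wrapper around parser.parseQuad) so the sketch does not hard-code the package. Skipping whole-line comments starting with # is an assumption about the input, and error handling is omitted for brevity.

```javascript
// Sketch: feeds each line of an N-Triples/N-Quads file to a parse function.
// parseNxFile is hypothetical; pass e.g. a wrapper around parser.parseQuad.
// Error handling (e.g. a missing file) is omitted for brevity.
var fs = require('fs');
var readline = require('readline');

function parseNxFile(path, parseStatement, onStatement, onDone) {
  var lineNumber = 0;
  var rl = readline.createInterface({ input: fs.createReadStream(path) });

  rl.on('line', function (line) {
    lineNumber++;
    var trimmed = line.trim();
    // Skip blank lines and whole-line comments (an assumption about the input).
    if (trimmed === '' || trimmed.charAt(0) === '#') return;
    var statement = parseStatement(trimmed);
    // The parser returns null for lines it cannot parse; skip those.
    if (statement !== null) onStatement(statement, lineNumber);
  });

  // Called once the whole file has been read.
  rl.on('close', function () {
    onDone(lineNumber);
  });
}
```

Wrapping the parser call, as in function (line) { return parser.parseQuad(line); }, avoids relying on how the method uses this when passed around.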

Tokenization

An arbitrary number of “N-x” tokens can be extracted from a string into an array of token objects with the tokenize() method:

> parser.tokenize(
    '<foo> _:bar . "123"^^<http://example.com/int> ' +
    '"\\u0068\\u0065\\u006C\\u006C\\u006F"@en-US . .'
);

[ { type: 'iri', value: 'foo' },
  { type: 'blankNode', value: 'bar' },
  { type: 'endOfStatement', value: '.' },
  { type: 'literal',
    value: '123',
    datatypeIri: 'http://example.com/int' },
  { type: 'literal',
    value: 'hello',
    language: 'en-US' },
  { type: 'endOfStatement', value: '.' },
  { type: 'endOfStatement', value: '.' } ]

Each token has at least a type and a value property. There are four token types, iri, literal, blankNode and endOfStatement, which can also be listed with the getTokenTypes() method.
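Because the token stream is flat, statements can be recovered by splitting it at endOfStatement tokens. The helper below is a hypothetical sketch (groupStatements is not part of the module); like the parser, it does no validation, drops empty statements produced by stray dots, and ignores trailing tokens not followed by a dot.

```javascript
// Hypothetical helper (not part of rdf-nx-parser): groups a flat token array,
// as returned by tokenize(), into statement objects.
function groupStatements(tokens) {
  var statements = [];
  var current = [];

  tokens.forEach(function (token) {
    if (token.type === 'endOfStatement') {
      // A '.' closes the current statement; skip empty ones (stray dots).
      if (current.length > 0) statements.push(current);
      current = [];
    } else {
      current.push(token);
    }
  });
  // Any tokens left in `current` were not terminated by '.' and are dropped.

  return statements.map(function (terms) {
    var statement = { subject: terms[0], predicate: terms[1], object: terms[2] };
    // A fourth term is treated as the graph label, as in N-Quads.
    if (terms.length > 3) statement.graphLabel = terms[3];
    return statement;
  });
}
```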

Implementation

The implementation uses regular expressions to split the input into tokens; these are fast on V8. This regex-based implementation is faster than a previous one built as a simple state machine that read the input in a single scan, presumably because V8 compiles regular expressions into more efficient machine code.
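The general technique can be sketched as one global regular expression with an alternative per token type, applied repeatedly with exec(). The patterns below are simplified assumptions, not the ones rdf-nx-parser actually uses, and escape sequences inside literals are left undecoded. Because exec() on a global regex skips any characters between matches, malformed input is silently passed over, which loosely mirrors the non-validating behaviour described above.

```javascript
// Sketch of regex-based tokenization for "N-x" input. The patterns are
// simplified assumptions, not the ones rdf-nx-parser actually uses, and
// escape sequences inside literals are left undecoded.
var TOKEN_PATTERN = new RegExp([
  '<([^>]*)>',                                // 1: IRI
  '_:([^\\s<>"]*[^\\s<>".])',                 // 2: blank node label (no trailing '.')
  '"((?:[^"\\\\]|\\\\.)*)"' +                 // 3: literal value
    '(?:@([A-Za-z0-9-]+)|\\^\\^<([^>]*)>)?',  // 4: language, 5: datatype IRI
  '(\\.)'                                     // 6: end of statement
].join('|'), 'g');

function tokenize(input) {
  var tokens = [];
  var match;
  TOKEN_PATTERN.lastIndex = 0; // the regex is global and reused across calls
  // exec() on a global regex finds the next match, skipping any characters
  // in between, so malformed input is not rejected.
  while ((match = TOKEN_PATTERN.exec(input)) !== null) {
    if (match[1] !== undefined) {
      tokens.push({ type: 'iri', value: match[1] });
    } else if (match[2] !== undefined) {
      tokens.push({ type: 'blankNode', value: match[2] });
    } else if (match[3] !== undefined) {
      var literal = { type: 'literal', value: match[3] };
      if (match[4] !== undefined) literal.language = match[4];
      if (match[5] !== undefined) literal.datatypeIri = match[5];
      tokens.push(literal);
    } else {
      tokens.push({ type: 'endOfStatement', value: '.' });
    }
  }
  return tokens;
}
```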

Node.js version support

Works with Node.js 0.10 and higher.

Tests

Run the tests with npm test (uses Mocha, Chai and Istanbul).

Similar projects

Contributors

j13z
