arineng / jcr



JSON Content Rules Draft


jcr's Introduction

Background

This is the repository of the JSON Content Rules (JCR) specification. It is currently in IETF Internet Draft form.

For background on JCR:

IETF Internet-Draft specification draft-newton-json-content-rules-08
JSON Content Rules web page
The JCRValidator Ruby implementation on GitHub
Codalogic's C++ JCR Parser on GitHub
The IETF Internet-Draft GitHub page.
[email protected]

Maintenance of the Specification

This section is only relevant to how this document is formally produced and submitted to the IETF. I'm writing this so I can remember what to do in the future... or if I get hit by a bus.

The following files are normative:

draft-newton-json-content-rules.xml is the source of truth for the specification. This is the file to edit for updates.
The figs directory contains the working samples that are put into the specification.
test_figs.sh uses assert.sh to test the files in figs against the JCR Validator.
jcr-abnf.txt is taken from the JCR Validator project verbatim. It is generated there using a Ruby script written by Pete Cordell (go Pete!).

Once edited and all the figures are working, create HTML and TXT versions of the document using xml2rfc.

It's best to use a local install of xml2rfc, but if you use the online version at xml.resource.org, the submitted XML must have all figures and entity references embedded. That can be done with xmllint, but it's somewhat kludgy.

The commands for creating the HTML and TXT documents are:

    xml2rfc draft-newton-json-content-rules.xml --html -o draft-newton-json-content-rules-09.html
    xml2rfc draft-newton-json-content-rules.xml --text -o draft-newton-json-content-rules-09.txt

Check in the XML, HTML, and TXT files so that your co-author and others can review them.

Then submit it to the IETF via the IETF submission page (use the .txt file as the .xml file will not work since local entity references cannot be uploaded). Once that is done, merge the working branch back to master (because some references in the document point to master).

jcr's People

Contributors


codalogic anewton1998

jcr's Issues

change 4627 example

The JCR for the RFC 4627 example needs more line breaks to mirror the layout of the RFC 4627 JSON. This would make the two easier to compare.

root_rule mapping to a group_rule

We allow root_rule to map to a group_rule. Does that mean any of the members of the group can act as root rules? For example, if we have the JCR:

( { "foo" : string } | { "bar" : string } )

does that mean the following is a valid JSON instance:

{ "foo" : "baz" }

and so is:

{ "bar" : "baz" }

If so, does the I-D need updating to cover this?

Define Regular Expression standard

The draft should specify which regular expression standard to use.

Allow regular expression flags

Should the syntax allow regular expression flags?

Member names and augmented rules

In 4.7. Object Rules, it says "Each member rule of an object rule is evaluated in the order in which they appear in the object rule." Initially I thought this would cause problems when combined with the (@Augments) annotation because pre-existing regular expression names could prevent new augmenting names being added at the end. However, I now think the rule is correct as it prevents extensions changing the meaning of members compared to how they would be interpreted without the extension, which seems like the correct behavior. For example, if a ruleset goes to the effort of marking an object closed to extension:

my_obj { foo, bar, 0 //:any }

then extensions shouldn't be able to add to this. If an augmenting rule ended up with an effective definition of:

my_obj { foo, bar, 0//:any, augmenting_rule }

then the processing rule described in 4.7 correctly keeps the object closed to extension without having to get into any clever logic to work out whether there are extension blocking rules (for which there may be a number of effective idioms, as indicated above) or not.

If that's reasonable, I suggest we add some text to 4.7 to highlight with an example that this is the approach we've adopted.
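A minimal sketch of why in-order evaluation keeps the object closed, under loudly stated assumptions: each instance member is claimed by the first member rule whose name pattern matches it, the rules foo and bar are assumed to define members named "foo" and "bar", and augmenting_rule is assumed (hypothetically) to add a member named "baz". Values, minimum counts and groups are not modelled.

```ruby
# Hypothetical model of "each member rule is evaluated in the order in which
# they appear": an instance member is claimed by the FIRST rule whose name
# pattern matches it. Values, minimum counts and groups are not modelled.
MemberRule = Struct.new(:label, :pattern, :max)

def members_ok?(instance, rules)
  counts = Hash.new(0)
  instance.each_key do |member_name|
    rule = rules.find { |r| r.pattern.match?(member_name) }
    next if rule.nil? # unclaimed members are ignored (objects are open)

    counts[rule.label] += 1
    return false if counts[rule.label] > rule.max
  end
  true
end

# Effective definition: my_obj { foo, bar, 0 //:any, augmenting_rule }
# (assumption: augmenting_rule adds a member named "baz")
AUGMENTED = [
  MemberRule.new("foo", /\Afoo\z/, 1),
  MemberRule.new("bar", /\Abar\z/, 1),
  MemberRule.new("0 //:any", /.*/, 0), # stands in for //: any name, allowed 0 times
  MemberRule.new("augmenting_rule", /\Abaz\z/, 1)
].freeze

members_ok?({ "foo" => 1, "bar" => 2 }, AUGMENTED) # => true
members_ok?({ "foo" => 1, "baz" => 3 }, AUGMENTED) # => false: 0 //:any claims "baz" first
```

Moving the augmenting rule ahead of `0 //:any` would let "baz" through, which is exactly the behavior the ordering rule prevents without any special detection of extension-blocking idioms.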

Inheritance of @(unordered) by group rules

We need to figure out what to do about groups in unordered arrays:

@(unordered) [ (foo,bar), baz ]

Does that mean (foo,bar) is evaluated unordered too?

Ambiguous empty choices

tl;dr: After some analysis I don't think this is a concern, but it might be worth documenting it in an appendix or similar as others may ask the same question.

When an object or array has zero or one member, there's no , or | character to say whether it is a sequence or a choice. This is primarily an issue when a type is expected to be extended in a later version.

For example, are these likely to be sequences or choices in future versions?

[]
{}
[ "MountDrive" ]
{ "MountDrive" : string }

If it's an array that is expected to be extended, there may not be an issue. The initial version of an array containing boot phases might be:

$boot_phases = [ string * ]

A later version could become:

$boot_phases = [ ("MountDrive" | string) * ]

For an object that is supposed to be a choice, we could initially do:

$shutdown_reason = { // : any ? }

when another option is defined, this would be changed to:

$shutdown_reason = { "MountDrive" : string | // : any ? }

So it looks like that initial concern is not an issue in reality. But since others may ask the same question it would be worth documenting.

Root rules cannot be member rules

It needs to be stated that root rules cannot be member rules.

integer and floats of different bit sizes

Following online discussion, we could consider adding types of int8, uint8, int16, uint16, int32, uint32, int64, uint64, int128, uint128, bigint, float, double and decimal. While some aspects of this can be expressed using ranges, it may be appropriate to include the above as shorthand, easy to remember types.

In the above naming, bigint would be arbitrary precision integer. decimal would be an arbitrary precision number that included decimal places.
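To show how the fixed-width names could act as shorthand for ranges, here is an illustrative table of the inclusive integer ranges each name would stand for. The helper and constant names are hypothetical; bigint, float, double and decimal are left out because they are not simple integer ranges.

```ruby
# Illustrative only: inclusive value ranges for the proposed fixed-width
# integer types, i.e. what a ruleset would otherwise spell as -128..127 etc.
def int_range(bits, signed:)
  if signed
    -(2**(bits - 1))..(2**(bits - 1) - 1)
  else
    0..(2**bits - 1)
  end
end

FIXED_WIDTH_INTS = [8, 16, 32, 64, 128].flat_map { |bits|
  [["int#{bits}", int_range(bits, signed: true)],
   ["uint#{bits}", int_range(bits, signed: false)]]
}.to_h

FIXED_WIDTH_INTS["int8"]              # => -128..127
FIXED_WIDTH_INTS["uint16"]            # => 0..65535
FIXED_WIDTH_INTS["uint8"].cover?(256) # => false
```

The int64 and uint128 bounds are long enough that the shorthand names would be far easier to remember and proofread than the equivalent ranges.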

Trivial point: ip4 vs. ipv4, ip6 vs. ipv6

I think people are more used to writing "ipv4" (or actually "IPv4") than "ip4". Incredibly minor, but why not.

Repetition step

While we're discussing repetition (hopefully soon, never to return to :-) ), with the new syntax we have the option to specify the modulo of valid repetition counts. For example, we could specify pairs of floats with:

"coords" : [ float @*%2 ]

where % indicates a repetition step, and the p_integer that follows it is the number that a valid repetition count in the instance must be divisible by.

I mention it because I recall an example in XSD 1.1 that used xs:assert to specify such a constraint. We could do this with JCRCC as:

"coords" @{assert count($) % 2==0} : [ float @* ]

It may be that JSON doesn't really need this, whereas XML does. JSON could do it as:

"coords" : [ [ float @2 ] @* ]

whereas XML can't do this for an attribute, and it would be verbose for elements:

<coords><coord>10.0 5.3</coord><coord>11.6 15.9</coord></coords>

This is not something I feel strongly about. It's a "while we're on the subject" thought. We can either dismiss it as "thought about, but discarded" or just leave it as an issue to think about in the future.
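To make the proposed % step concrete, here is a sketch of one possible validator check. The semantics (count within min..max, with max = nil meaning unbounded, and count divisible by the step) are an assumption based on the description above, and the method name is hypothetical.

```ruby
# Hypothetical semantics for the proposed "@min*max%step" repetition: the
# count must lie within min..max (max = nil means unbounded) and be divisible
# by step. Not part of the JCR draft.
def repetition_ok?(count, min: 0, max: nil, step: 1)
  return false if count < min
  return false if max && count > max

  (count % step).zero?
end

coords = [10.0, 5.3, 11.6, 15.9]
repetition_ok?(coords.size, step: 2) # "coords" : [ float @*%2 ]  => true
repetition_ok?(3, step: 2)           # => false

# The nested-array alternative, [ [ float @2 ] @* ], expects this shape instead:
coords.each_slice(2).to_a            # => [[10.0, 5.3], [11.6, 15.9]]
```

The step check is a single modulo operation, so the cost to implementations would be small; the open question is only whether JSON authors need it given the nested-array idiom.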

allow colon before target rule names

A colleague who has started using JCR for a server testing framework found this issue. Our new relaxed syntax rules allow a colon (':') character before objects and arrays, but not before target rule names. This caused a bit of confusion.

This looks like it should work, but doesn't.

    {
      "net" : {
        "handle"        : arin_string,
        "name"          : arin_string,
        "startAddress"  : arin_string,
        "version"       : arin_string,
        "netBlocks"     : {
          "netBlock"       : {
            "cidrLength"       : arin_string,
            "description"      : arin_string,
            "startAddress"     : arin_string,
            "type"             : arin_string
          }
        }
      }
    }

    arin_string : { "$" : string }

This is what works:

    {
      "net" : {
        "handle"         arin_string,
        "name"           arin_string,
        "startAddress"   arin_string,
        "version"        arin_string,
        "netBlocks"    : {
          "netBlock"      : {
            "cidrLength"        arin_string,
            "description"       arin_string,
            "startAddress"      arin_string,
            "type"              arin_string
          }
        }
      }
    }

    arin_string : { "$" : string }

base64 or base64url or both

We have a tag for base64, but I-JSON (rfc7493) recommends using base64url. Should we do that too? Instead of base64 or in addition to base64?
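The practical difference between the two is the alphabet (base64url uses - and _ in place of + and /) and, commonly, omitted padding. A minimal sketch of what separate base64 and base64url checks might look like, assuming strict RFC 4648 decoding via Ruby's core String#unpack1("m0"); the helper names are hypothetical, and accepting unpadded base64url is an assumption.

```ruby
# Hypothetical helpers, not part of JCR or the JCR Validator.
# base64 (RFC 4648 section 4): strict decode; unpack1("m0") raises on bad input.
def base64?(s)
  s.unpack1("m0")
  true
rescue ArgumentError
  false
end

# base64url (RFC 4648 section 5): "-" and "_" replace "+" and "/";
# padding is assumed to be optional here.
def base64url?(s)
  return false unless s.match?(/\A[A-Za-z0-9_-]*={0,2}\z/)

  std = s.tr("-_", "+/")
  std += "=" * ((4 - std.length % 4) % 4) unless std.end_with?("=")
  base64?(std)
end

base64?("aGk=")    # => true
base64?("aGk_")    # => false ("_" is not in the base64 alphabet)
base64url?("aGk_") # => true
base64url?("aGk")  # => true  (unpadded)
```

Because the alphabets differ in two characters, a value can be valid under one tag and invalid under the other, which is an argument for offering both tags rather than silently reinterpreting base64.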

Namespacing callback and other annotations

As we start formulating callback and code-check annotations, we should consider ruleset namespacing. There may be multiple checks in various rulesets implemented in different languages.

-07 Changes Checklist

Rework examples to have new $ and = and repetition syntax
Remove Appendix C Combining Multiple Rulesets (Experimental)
Update RFC 4627 example to RFC 7159 #15
Update with final ABNF from jcrvalidator

LL(1) parser friendliness

I have been looking into various types of parsing techniques. Turns out most of the parsers I've implemented are recursive descent with single token / character look-ahead (LL(1)).

The way we currently have annotations (and to a lesser extent colons) means that the JCR ABNF is not LL(1) compatible. It may be necessary to parse any annotations and back-track up to four times. Parslet handles this very well, but the C++ Boost::Spirit library (which is more like an XML SAX parser in that it fires off events as it parses input), used naively, would potentially quadruple the annotations identified on a rule. Not only is this less efficient, it also requires more work from the developer. I figure the easier the grammar is to implement, the more implementations there will be, and the more success JCR will have.

This troubled me, so I looked at whether the grammar could be de-conflicted. It can be, but at the cost of clarity in the ABNF and with the potential for introducing errors.

So I propose we leave the grammar as it is and accept that it is not LL(1) friendly. If anyone brings it up, we can say we know, but decided it's not that big a deal. Although the grammar allows annotations, they are typically rare (so the inefficiency seldom arises), and typical scenarios where JCR is used are not likely to be performance critical.

I agree it makes sense to capture this decision with some text in the draft.
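The duplication problem can be seen in a toy backtracking parser that fires events as soon as it recognises an annotation. The grammar below is a deliberately tiny assumption, NOT the JCR ABNF: two alternatives that both start with annotations. The class and token names are hypothetical.

```ruby
# Toy grammar (NOT the JCR ABNF), assumed for illustration:
#   rule   = member / value          ; alternatives tried in order
#   member = annotations QSTRING ":" TYPE
#   value  = annotations TYPE
# Both alternatives start with annotations, so one token of look-ahead cannot
# choose between them and the parser backtracks, re-reading the annotations.
class ToyParser
  attr_reader :events

  def initialize(tokens)
    @tokens = tokens
    @pos = 0
    @events = [] # SAX-style events, fired as soon as something is recognised
  end

  def parse_rule
    %i[member value].each do |alt|
      start = @pos
      return alt if send(:"try_#{alt}")

      @pos = start # rewind the input; events already fired are NOT undone
    end
    nil
  end

  private

  def annotations
    while @tokens[@pos]&.start_with?("@")
      @events << @tokens[@pos]
      @pos += 1
    end
  end

  def type?
    %w[string integer any].include?(@tokens[@pos])
  end

  def try_member
    annotations
    return false unless @tokens[@pos]&.start_with?('"') && @tokens[@pos + 1] == ":"

    @pos += 2
    return false unless type?

    @pos += 1
    true
  end

  def try_value
    annotations
    return false unless type?

    @pos += 1
    true
  end
end

parser = ToyParser.new(["@{not}", "string"])
parser.parse_rule # => :value
parser.events     # => ["@{not}", "@{not}"] (the annotation was reported twice)
```

With two alternatives the annotation is reported twice; with the four back-tracks mentioned above it would be reported up to four times. The usual workarounds are to parse annotations once before dispatching on the next token, or to buffer events until an alternative commits, both of which add the kind of implementer effort described above.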

Allow multiple nameless root rules

Recent email discussions raised the point that a while back we discussed allowing a single nameless root rule (arineng/jcrvalidator#20). What we didn't do is allow multiple nameless root rules. I'm not sure whether this was a conscious decision or just didn't occur to us at the time (the use case didn't bring allowing multiple root rules to the fore). Since we allow the single nameless rule as:

( {...} | {...} | {...} )

We could change the ABNF to:

rule(:jcr) { ( spcCmnt | directive ).repeat >> ( spcCmnt | directive | root_rule | rule ).repeat }
#! jcr = *( sp-cmt / directive / root_rule / rule )

to allow:

{...}
{...}
{...}

This also allows a single root rule at top or bottom depending on personal preference.

I've done a local test of the ABNF and all seems OK. If this seems the way to go I could do a formal pull request when the colon work has settled out more.

Combiner precedence

Moved and enhanced from jcrvalidator...

What is the relative precedence of the choice and sequence combiners? The current text says "Sequence and choice combinations maybe mixed, with evaluation occurring in the order the rules are specified", but I'm not sure what that means. For example, does [ this | that, other | more ] correspond to [ (this | that), (other | more) ], [ this | (that, other) | more ], or [ ((this | that), other) | more ](left to right precedence?)?

Relax NG says that you have to be explicit about the groupings, so [ this | that, other | more ] would be illegal. The GCC C/C++ compiler generates a warning if you do the equivalent of that, even though it is not ambiguous in the language, which suggests people struggle with the relative precedence. That makes me think the Relax NG approach has some merit and that one should be required to group operators explicitly, i.e. write [ (this | that), (other | more) ] or [ this | (that, other) | more ].

What do you think?
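If we adopt the Relax NG approach, the check a validator needs is simple: within any one bracket level, "," and "|" must not both appear. A minimal sketch, assuming balanced brackets and no string or regex literals that contain these characters; the method name is hypothetical.

```ruby
# Sketch of a Relax NG style rule: within one bracket level, "," and "|" may
# not be mixed without explicit parentheses. Assumes balanced brackets and no
# string or regex literals containing these characters.
def mixed_combiners?(src)
  stack = [nil] # combiner seen so far at each nesting level
  src.each_char do |c|
    case c
    when "(", "[", "{" then stack.push(nil)
    when ")", "]", "}" then stack.pop
    when ",", "|"
      return true if stack.last && stack.last != c

      stack[-1] = c
    end
  end
  false
end

mixed_combiners?("[ this | that, other | more ]")     # => true (rejected)
mixed_combiners?("[ (this | that), (other | more) ]") # => false
mixed_combiners?("[ this | (that, other) | more ]")   # => false
```

Because the rule only needs the combiner already seen at the current level, it fits naturally into a recursive-descent parser without any precedence table.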

Require min in min .. max repetition

When we have min..max repetitions, I think it will be less confusing if we require the min repetition. So instead of allowing:

my_array : [ integer *..100 ]

Require the user to write:

my_array : [ integer *0..100 ]

or:

my_array : [ integer *1..100 ]

It would save quite a few people, including myself(!), having to look up what *..100 meant!
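As a sketch of how small the grammar change would be, here is a pattern that accepts *0..100 and *1..100 but rejects *..100. Keeping max optional (as in *1..) is an assumption, not something settled in this issue, and the names are hypothetical.

```ruby
# Hypothetical check for the proposal: the min in "*min..max" must be present.
# Leaving max optional (e.g. "*1..") is an assumption.
REPETITION = /\A\*(?<min>\d+)\.\.(?<max>\d+)?\z/

def parse_repetition(text)
  m = REPETITION.match(text) or return nil
  [m[:min].to_i, m[:max]&.to_i]
end

parse_repetition("*0..100") # => [0, 100]
parse_repetition("*1..")    # => [1, nil]
parse_repetition("*..100")  # => nil (rejected: min is missing)
```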

mention the colon syntax for arrays and objects

The draft needs to mention that colons are valid when defining arrays and objects.

remove rule name in favor of a name annotation

I was thinking we should have an annotation for naming rules such as @(name my_integers).

This would allow naming rules that are embedded in other rules. [ 2 :0..2 ] could be [ 2 @(name my_integers) :0..2 ].

Taking this a step further, we could eliminate rule names as they exist today and simply always use the name annotation. So this:

[ 2 my_integers, 2 my_strings ]
my_integers :0..2
my_strings ( :"foo" | :"bar" )

would turn into this:

[ 2 my_integers, 2 my_strings ]
@(name my_integers) :0..2
@(name my_strings) ( :"foo" | :"bar" )

This makes the syntax a bit simpler (I think). As of now, a ruleset is composed of a series of rules (and directives) where each rule is composed of a rule name and a rule definition. Making the name an annotation means that a ruleset is a series of rules, and rules are rule definitions.

What do you think?

Add a @{spec} annotation to specify format of a string

New types will likely be added as a form of string. Some such formats will have structure that can't easily be represented using regular expressions and other techniques.

To address this, a @{spec} annotation can be introduced. It would contain a URI specifying where the format of the string is defined.

For example, a fictional type representing a DNA fingerprint could have the specification:

$fingerprint = @{spec http://bio.org/specs/dna#fingerprint} string

From a validator's point of view, the above type can be considered to be a string with the format specified by http://bio.org/specs/dna#fingerprint.

Whether validators give errors, warnings or act silently when they encounter a spec they don't know is an implementation detail.
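One way an implementation might realise that choice is a registry of checkers for the spec URIs it knows, plus a configurable policy for unknown ones. Everything below is a hypothetical sketch: the registry, the policy names and the toy DNA check are illustrative assumptions, and the URI is the fictional one above.

```ruby
# Hypothetical handling of @{spec <uri>}: known URIs map to checker procs;
# what happens for an unknown URI is a validator policy.
SPEC_CHECKERS = {
  "http://bio.org/specs/dna#fingerprint" => ->(s) { s.match?(/\A[ACGT]+\z/) } # toy rule
}.freeze

def check_spec(uri, value, unknown: :warn)
  checker = SPEC_CHECKERS[uri]
  return checker.call(value) if checker

  case unknown
  when :error
    raise ArgumentError, "unknown spec #{uri}"
  when :warn
    warn "unknown spec #{uri}; validating as a plain string"
    true
  else # :silent
    true
  end
end

check_spec("http://bio.org/specs/dna#fingerprint", "GATTACA")              # => true
check_spec("urn:example:unknown-spec", "anything", unknown: :silent)        # => true
```

Keeping the fallback as "treat it as a plain string" matches the view that the annotated type is still a string from the validator's point of view.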

More details on @{not}

If we have:

@{not} 2

which of the following are valid?

3
2.0
"A string"

If we have:

@{not} "foo" : string

which of the following are valid?

"bar" : 10
"foo" : 10

Does @{not} "foo" : string give a different result to @{not} "foo" : any?

is 42 == 42.0

This was asked of me yesterday.

Given this JSON:

[ 42, 42 ]

Does that match this JCR (using the old colon syntax):

[ : 42, : 42.0 ]

or does it only match this JCR:

[ : 42, : 42 ]
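This doesn't answer the question, but it shows why the draft needs to: a typical JSON parser keeps 42 and 42.0 as different types, so a validator that compares numerically and one that compares type-sensitively will disagree. Ruby's standard json library is used purely as an illustration.

```ruby
require "json"

# Illustration only: the parser keeps 42 and 42.0 as different types.
instance = JSON.parse("[42, 42]")
JSON.parse("[42, 42.0]").map(&:class) # => [Integer, Float]

instance[1] == 42.0    # => true  (numeric comparison: [ : 42, : 42.0 ] would match)
instance[1].eql?(42.0) # => false (type-sensitive comparison: it would not)
```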

Refactor colon only for member rules

In #18 we discussed, deep in the thread, moving the colon to member rules and out of "type definitions". This is also a specific request from @johnwcowan and a general request from Daniel Parker.

So there is good news and bad news.

The good news

The colon_rework branch in the JCR validator (https://github.com/arineng/jcrvalidator/tree/colon_rework) almost works. Assuming that an ABNF parser will be "greedy" like Parslet, we should have no problems. Honestly, I don't know whether the ABNF spec addresses the issue of two rules with the same starting syntax.

In #18 we discussed that rules for primitives with strings and regexes would still need a colon:

$s = :"foo"

Fortunately, this is not the case. The jcrvalidator branch works just fine with

$s = "foo"

The bad news

We've introduced a problem with number literals in arrays and groups by moving the colon around. The following breaks:

[ 1 ]
[ 1* 1 ]
( 1 )
( 1* 1 )

The parser cannot determine the difference between a repetition and the number literal.

I think we have the following paths forward.

1. Move the repetition to the end of the type definition: [ "a" 1*2 ]
2. Put the repetitions inside < and >: [ <1*2> "a" ]
3. A combination of the above: [ "a" <1*2> ]

My personal preference is option 3 as [1 1*3] just looks odd. A naive reader is more likely to understand [1 <1*3>].

Possible @{doc} annotation

On the JSON mailing list Anders Rundgren talks about documentation as a schema use-case (https://mailarchive.ietf.org/arch/msg/json/56jsEX-4kL2rzb9l2Ew5Zpez718).

We have ";" style comments, but they are not formally associated with any particular entity within a ruleset. While association can be done by convention, that makes extracting secondary documentation less robust. To counter this, we could have an @{doc} annotation. The gold-plated version (!) might have two forms. An @{doc} with a string would be in-situ documentation, e.g.

@{doc "The maximum speed allowed"} "max-speed" : float

An @{doc} with a label could point to an out-of-line directive containing the documentation. This has the benefit that lengthy documentation doesn't break up the visible structure of what is being documented. For example:

image = {
        @{doc width} "width" : 0..,
        @{doc height} "height" : 0..
}
#{doc width "The width of the image"}
#{doc height "The height of the image"}
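To show how the two forms would make secondary documentation extraction mechanical, here is a rough regex-based extractor. It is a sketch only: it assumes the proposed syntax exactly as written above (which is not part of JCR), ignores quoting edge cases, and extract_docs is a hypothetical name.

```ruby
# Hypothetical extractor for the proposed @{doc} forms.
RULESET = <<~'JCR'
  speed = { @{doc "The maximum speed allowed"} "max-speed" : float }
  image = {
          @{doc width} "width" : 0..,
          @{doc height} "height" : 0..
  }
  #{doc width "The width of the image"}
  #{doc height "The height of the image"}
JCR

def extract_docs(src)
  # Out-of-line directives: #{doc label "text"}
  directives = src.scan(/#\{doc\s+(\w+)\s+"([^"]*)"\}/).to_h
  # In-situ form: @{doc "text"} "member"
  inline = src.scan(/@\{doc\s+"([^"]*)"\}\s*"([^"]+)"/).map { |text, name| [name, text] }
  # Labelled form: @{doc label} "member", resolved against the directives
  labelled = src.scan(/@\{doc\s+(\w+)\}\s*"([^"]+)"/).map { |label, name| [name, directives[label]] }
  (inline + labelled).to_h
end

extract_docs(RULESET)
# => {"max-speed"=>"The maximum speed allowed", "width"=>"The width of the image",
#     "height"=>"The height of the image"}
```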

Instance member name association

Some thoughts on member name association as a result of working on AOR.

Observing that the same member name specification can exist in multiple branches of an object definition, there needs to be a way of associating an instance member name to a member name specification(s).

After some thought I'm hoping this might work...

Observe that there are 3 types of member name specification: q-string specs, (non-empty) regex specs, and the (empty regex) wildcard spec. (e.g. "foo", /^foo/ and //.)

The proposal is that an instance member name should be:

first, compared against all the q-string specs that are contained in the object specification. If the instance name matches one, the instance name and its value are associated with every occurrence of that name specification and its type.
second, compared against all non-empty regex specs. If only one match is found, then the instance name and its value are associated with every occurrence of that regex name specification and its type. If the instance name matches more than one syntactically different regex spec, then the instance is deemed invalid.
third, if the wildcard spec is present (//), the instance name is associated with each occurrence of the wildcard specification.
fourth, if the instance name does not match any of the above, it is ignored.

This way an instance name will be matched with only one member name specification. This will hopefully make the AOR processing more analysable and easier for users to understand.

Note that this means that q-string specs take precedence over regex specs, which in turn take precedence over wildcard specs. So:

$o1 = { /^p\d+$/ : integer *, "p0" : string }

would accept:

{ "p0" : "foo", "p1" : 2 }

and reject (due to type mismatch):

{ "p0" : 1, "p1" : 2 }
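A minimal sketch of the proposed association order, checked against the $o1 example. Assumptions: a member rule is modelled as a pair [name_spec, type], where name_spec is a String (q-string), a non-empty Regexp, or :wildcard standing in for //; a value is accepted if it satisfies at least one associated type; sequences, choices and repetition are not modelled. The method names are hypothetical.

```ruby
# Sketch of the proposed association order (see assumptions above).
def associate(name, rules)
  exact = rules.select { |spec, _| spec.is_a?(String) && spec == name }
  return exact unless exact.empty?                       # 1. q-string specs win

  regexes = rules.select { |spec, _| spec.is_a?(Regexp) && spec.match?(name) }
  if regexes.map { |spec, _| spec.source }.uniq.size > 1 # 2. at most one distinct regex
    raise ArgumentError, "#{name} matches more than one distinct regex spec"
  end
  return regexes unless regexes.empty?

  rules.select { |spec, _| spec == :wildcard }           # 3. wildcard; 4. [] => ignored
end

def valid_object?(obj, rules)
  obj.all? do |name, value|
    matched = associate(name, rules)
    matched.empty? || matched.any? { |_, type| value.is_a?(type) }
  end
rescue ArgumentError
  false
end

# $o1 = { /^p\d+$/ : integer *, "p0" : string }
# (Ruby's ^ and $ match at line boundaries, so \A and \z are used instead.)
O1 = [[/\Ap\d+\z/, Integer], ["p0", String]].freeze

valid_object?({ "p0" => "foo", "p1" => 2 }, O1) # => true
valid_object?({ "p0" => 1, "p1" => 2 }, O1)     # => false ("p0" is tied to string only)
```

Note that "p0" also matches the regex, but because q-string specs are consulted first it is only ever associated with the string rule, which is the precedence described above.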

annotation for calling custom code to evaluate rules

Regarding the topic of annotations to call custom code for evaluating rules, in the issue on the jcrvalidator I speculated with something like:

odd_int @(check-code my_code) : integer

# code my_code(ruby-2.0)
## def my_code(data)
##   data.odd?
## end

I'm now thinking this is a bad idea. Embedding code in a ruleset invites all the headaches with variable scope and library paths, etc... If people want to pass around code, they can do that via GitHub. Or if they feel they must put it in an RFC, they can provide the actual code alongside the ruleset.

I propose we simply allow an annotation to link a rule to an entry point in the custom code. The only thing we need to do is say that the custom code either produces a positive evaluation or a negative evaluation.

I think this is what we want:

odd_int @(call my_code) : integer
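On the validator side, this could be as small as a table from entry-point names to callables that return a positive or negative evaluation. A hypothetical sketch: the registry and method names are illustrative, and the only contract assumed is the true/false result proposed above.

```ruby
# Hypothetical: @(call my_code) names an entry point registered with the
# validator; the custom code only has to give a positive (true) or negative
# (false) evaluation.
CALL_ENTRY_POINTS = {
  "my_code" => ->(value) { value.is_a?(Integer) && value.odd? }
}.freeze

def evaluate_call(name, value)
  entry = CALL_ENTRY_POINTS.fetch(name) # KeyError if nothing is registered
  entry.call(value) ? true : false      # normalise to a strict boolean
end

evaluate_call("my_code", 7) # => true
evaluate_call("my_code", 8) # => false
```

Because the code lives outside the ruleset, scope and library-path questions stay with the host application rather than the JCR specification.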

Update RFC 4627 example to RFC 7159

Note that:

"Width": "100"

goes to:

"Width": 100

and there's the extra annotated member.