anweiss / cddl
Concise data definition language (RFC 8610) implementation and JSON and CBOR validator in Rust
Home Page: https://cddl.anweiss.tech
License: MIT License
A v2 of this library would likely warrant a re-write using a more formal parser library, like nom. It would be interesting to compare the performance of nom vs. the handwritten implementation that exists today.
Array validation doesn't seem to work as expected when the array contains a non-homogeneous "record" (a fixed-length list using different types at different indices).
If I create the following CDDL:
human = [
age: int,
name: tstr,
]
I expect that this will only validate an input list with two elements [integer, string] in order. However, the current code will successfully validate the following (incorrect) JSON inputs:
["Bob", 43]
or
[44, 45, "Carol", "Chuck"]
It appears that the code is iterating through the group checking that somewhere in the input list there is a value of the correct type; it fails to enforce that there are the right number of elements and that elements come in the expected order.
This problem exists in CBOR validation as well.
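The behaviour the report expects (right number of elements, each in its declared position) can be sketched as follows. This is only an illustration under stated assumptions: Value, Kind, kind_matches and validate_record are hypothetical names invented here, not types or functions from the cddl crate.

```rust
// Hypothetical, simplified models of an input value and a CDDL type.
// None of these names come from the cddl crate.
#[derive(Debug, PartialEq)]
enum Value {
    Int(i64),
    Text(String),
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Kind {
    Int,
    Tstr,
}

/// True when the value has the type named by `kind`.
fn kind_matches(kind: Kind, value: &Value) -> bool {
    matches!(
        (kind, value),
        (Kind::Int, Value::Int(_)) | (Kind::Tstr, Value::Text(_))
    )
}

/// Validate a fixed-length record: the lengths must agree and every
/// position must match the type declared at that same position.
fn validate_record(schema: &[Kind], input: &[Value]) -> Result<(), String> {
    if schema.len() != input.len() {
        return Err(format!(
            "expected {} elements, got {}",
            schema.len(),
            input.len()
        ));
    }
    for (index, (kind, value)) in schema.iter().zip(input).enumerate() {
        if !kind_matches(*kind, value) {
            return Err(format!(
                "element {} should be {:?}, got {:?}",
                index, kind, value
            ));
        }
    }
    Ok(())
}
```

With the schema [Kind::Int, Kind::Tstr] standing in for human, the input [43, "Bob"] is accepted, while both ["Bob", 43] and [44, 45, "Carol", "Chuck"] are rejected.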
Using the latest version at this point in time: 0.9.0-beta.1.
I am having a problem with validating CBOR [18] (hex: 9F12FF, regardless of definite/indefinite array) against CDDL tester = [18/12], which should be valid.
But I am getting:
failed: error validating type choice at cbor location : expected value 12, got Array([Integer(Integer(18))])
and validating CBOR [12] (hex: 9F0CFF) against the same CDDL also yields:
failed: error validating type choice at cbor location /0: expected value 18, got Integer(12)
error validating type choice at cbor location : expected value 12, got Array([Integer(Integer(12))])
but validating CBOR [12] and [18] against CDDL tester = [18//12] works.
According to the specification, type choices should have worked. Am I mistaken, or is this something that is currently not supported in this package?
Thank you for your great work!
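The error output above compares an alternative (12) against the whole array, which suggests the choice is being applied one level too high. A minimal sketch of the expected semantics for tester = [18/12], where the choice applies to the single array element, might look like this. The function names are made up for illustration and integers stand in for arbitrary values.

```rust
/// True when `value` equals any alternative of a type choice such as `18 / 12`.
fn matches_type_choice(alternatives: &[i64], value: i64) -> bool {
    alternatives.iter().any(|alt| *alt == value)
}

/// Validates an array schema with one entry that is a type choice,
/// e.g. `[18 / 12]`: exactly one element, matching one alternative.
fn validate_single_choice_array(alternatives: &[i64], array: &[i64]) -> bool {
    match array {
        // The choice is tested against the element, never the array itself.
        [only] => matches_type_choice(alternatives, *only),
        _ => false,
    }
}
```

Under this reading both [18] and [12] are valid, and arrays of any other length or content are not.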
Take the following code as an example:
foo = int
; no errors
bar = { int, int // int, tstr }
; error
; baz = { int, foo // int, tstr }
It seems to happen whenever a typename is used as the last field within a group choice.
It generates an Error return. However, even in the first example, bar, I think it is still parsing incorrectly, as it seems to treat it as more equivalent to what bar = {int, int int, tstr } might be, although that in and of itself seems weird.
As per the RFC 8610 spec:
Analogous to types, CDDL also allows choices between groups,
delimited by a "//" (double slash). Note that the "//" operator
binds much more weakly than the other CDDL operators, so each line
within "delivery" in the following example is its own alternative in
the group choice:
address = { delivery }
delivery = (
street: tstr, ? number: uint, city //
po-box: uint, city //
per-pickup: true )
city = (
name: tstr, zip-code: uint
)
bar should be parsed into 2 group choices with 2 members each, not 1 choice with 4 members.
Here is the debug-printed value of bar:
CDDL { rules: [Type(TypeRule {
name: Identifier(("bar", None)),
generic_param: None,
is_type_choice_alternate: false,
value: Type([
Type1 {
type2: Map(Group([
GroupChoice([
(ValueMemberKey(ValueMemberKeyEntry { occur: None, member_key: None, entry_type: Type([Type1 { type2: Typename((Identifier(("int", None)), None)), operator: None }]) }), true),
(ValueMemberKey(ValueMemberKeyEntry { occur: None, member_key: None, entry_type: Type([Type1 { type2: Typename((Identifier(("int", None)), None)), operator: None }]) }), false),
(ValueMemberKey(ValueMemberKeyEntry { occur: None, member_key: None, entry_type: Type([Type1 { type2: Typename((Identifier(("int", None)), None)), operator: None }]) }), true),
(ValueMemberKey(ValueMemberKeyEntry { occur: None, member_key: None, entry_type: Type([Type1 { type2: Typename((Identifier(("tstr", None)), None)), operator: None }]) }), false)
])
])),
operator: None
}
]),
range: (0, 7)
})] }
And here is how bar = { int, int, int, tstr } is parsed:
CDDL { rules: [Type(TypeRule {
name: Identifier(("bar", None)),
generic_param: None,
is_type_choice_alternate: false,
value: Type([
Type1 {
type2: Map(Group([
GroupChoice([
(ValueMemberKey(ValueMemberKeyEntry { occur: None, member_key: None, entry_type: Type([Type1 { type2: Typename((Identifier(("int", None)), None)), operator: None }]) }), true),
(ValueMemberKey(ValueMemberKeyEntry { occur: None, member_key: None, entry_type: Type([Type1 { type2: Typename((Identifier(("int", None)), None)), operator: None }]) }), true),
(ValueMemberKey(ValueMemberKeyEntry { occur: None, member_key: None, entry_type: Type([Type1 { type2: Typename((Identifier(("tstr", None)), None)), operator: None }]) }), false)
])
])),
operator: None
}]),
range: (0, 7)
})] }
Which is missing one of the fields, possibly since they are unnamed?
Which made me try bar = ( a: int, b: int // c: int, d: tstr ), but it still has the same problem as before, with them all being in one group choice.
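Because "//" binds more weakly than the comma, a group body is first split into choices and only then into entries. The toy function below illustrates that ordering only. It is an assumption-laden sketch (flat input, no nested groups, strings, or comments) and not how the crate's parser is written; split_group_choices is a hypothetical name.

```rust
/// Split a flat group body into choices (on "//"), then each choice into
/// entries (on ","). Toy code: nesting, strings and comments are ignored.
fn split_group_choices(body: &str) -> Vec<Vec<String>> {
    body.split("//")
        .map(|choice| {
            choice
                .split(',')
                .map(|entry| entry.trim().to_string())
                .filter(|entry| !entry.is_empty())
                .collect::<Vec<String>>()
        })
        .collect()
}
```

For the body of bar, "int, int // int, tstr", this yields two choices with two entries each, which is the shape the report expects in the AST.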
I have this CDDL snippet:
m = non-empty<{
? 0 => "zero"
}>
non-empty<M> = (M) .and ({ + any => any })
which produces this error:
> docker run -i --rm -v $PWD:/data -w /data ghcr.io/anweiss/cddl-cli:latest compile-cddl --cddl non-empty.cddl
error: parser errors
┌─ input:5:20
│
5 │ non-empty<M> = (M) .and ({ + any => any })
│ ^^^^ expected rule identifier followed by an assignment token '=', '/=' or '//='
Error: "Parser error"
It looks like .and is not fully supported?
Create a language server extension for VSCode built on the wasm package. A prototype is being developed in the lsp branch.
If an invalid control operator is specified, the parser is not properly reporting it and instead reports an invalid rule failure.
Numbers that are part of an identifier with a hyphen (-) are being mistakenly highlighted; e.g. in blake2b-256, the 256 is highlighted.
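RFC 8610 defines identifiers as id = EALPHA *(*("-" / ".") (EALPHA / DIGIT)), with EALPHA = ALPHA / "@" / "_" / "$", so digits after a hyphen belong to the identifier. A small sketch of a scanner following that rule is shown below. The function names are hypothetical and this is not the crate's lexer or the highlighter's grammar.

```rust
/// EALPHA from RFC 8610: ALPHA / "@" / "_" / "$".
fn is_ealpha(b: u8) -> bool {
    b.is_ascii_alphabetic() || b == b'@' || b == b'_' || b == b'$'
}

/// Returns the byte length of the identifier at the start of `input`,
/// or None if `input` does not start with an identifier.
fn identifier_len(input: &str) -> Option<usize> {
    let bytes = input.as_bytes();
    if !bytes.first().copied().map_or(false, is_ealpha) {
        return None;
    }
    let mut end = 1; // end of the identifier accepted so far
    let mut i = 1;
    while i < bytes.len() {
        let b = bytes[i];
        if is_ealpha(b) || b.is_ascii_digit() {
            i += 1;
            end = i; // an identifier may only end on EALPHA or DIGIT
        } else if b == b'-' || b == b'.' {
            i += 1; // tentatively consumed; kept only if EALPHA/DIGIT follows
        } else {
            break;
        }
    }
    Some(end)
}
```

Here identifier_len("blake2b-256 = bstr") covers the whole of blake2b-256, so a highlighter built on the same rule would not treat 256 as a separate number token.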
Lexer doesn't support multiline, unprefixed byte strings.
All errors are currently Box'ed with little actionable information. The library should provide better error handling mechanisms for lexing, parsing and validation.
I have this CBOR (diag notation):
{1: 65535, 2: h'1122334455', 3: 6, }
It validates successfully with this CDDL:
var_header = {
K_KEY_PROVIDER: uint,
K_KEY_ID: bstr,
? K_KEY_VERSION: uint,
? K_AUX_DATA: bstr,
? K_NONCE : bstr,
? K_AUTH_TAG : bstr,
? K_AAD : bstr,
*uint => any ; extensions
}
K_RESERVED = 0
K_KEY_PROVIDER = 1
K_KEY_ID = 2
K_KEY_VERSION = 3
K_AUX_DATA = 4
K_NONCE = 5
K_AUTH_TAG = 6
K_AAD = 7
; extend here
According to @cabo, my CDDL is incorrect because it translates to textual, rather than integer, map keys, e.g. "K_NONCE" and not 5. My file is accepted because of the "extensions" line. However, it should have failed validation anyway, because the first two fields (K_KEY_PROVIDER and K_KEY_ID) are mandatory in the schema but missing from the CBOR file.
Errors are only being handled one at a time and the line and column numbers are not properly tracked during parsing and error handling. Errors should be tolerated and the line and column numbers should correctly identify where the errors are being detected.
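One common way to get correct positions is to record byte offsets while lexing and convert them to line and column only when an error is reported. The helper below is a sketch of that conversion. line_col is a hypothetical name and the crate may track positions differently.

```rust
/// Converts a byte offset into a 1-based (line, column) pair.
/// Columns count characters, not bytes; offsets past the end of
/// `source` map to the position just after the last character.
fn line_col(source: &str, offset: usize) -> (usize, usize) {
    let mut line = 1;
    let mut col = 1;
    for (index, ch) in source.char_indices() {
        if index >= offset {
            break;
        }
        if ch == '\n' {
            line += 1;
            col = 1;
        } else {
            col += 1;
        }
    }
    (line, col)
}
```

Collecting several (offset, message) pairs during a parse and mapping each through a function like this would allow more than one error to be reported per run.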
tester = $$vals
$$vals //= 18
$$vals //= 12
gives the error "error parsing CDDL: incremental parsing error". Also,
tester = $$vals
$$vals //= ( 18 )
$$vals //= ( 12 )
gives the error "error parsing CDDL: incremental parsing error", whereas
tester = $$vals
$$vals //= ( 18 , )
$$vals //= ( 12 , )
works (although unable to validate due to #116 (comment)).
If I read the specification correctly, a type should be able to be coerced into a group (with one grpent) during semantic analysis and/or during validation (if needed). Or the other way around: everything is a group until it is not, i.e. try to coerce it to a type and check whether that is possible.
In the following block of CDDL:
top-level = top-group
;; This fails to parse with the newline after the "=>"
top-group //= (identifier-a =>
int)
;; It parses without the newline, or if the newline is before the
;; arrow.
top-group //= (identifier-b => int)
top-group //= (identifier-c
=> int)
identifier-a = 1
identifier-b = 2
identifier-c = 3
The first top-group declaration causes a parse error. It seems to require both using an identifier as the key and having the newline after the arrow. It is easy to work around by either moving the arrow to the new line or just joining the lines.
error: parser errors
┌─ input:4:31
│
4 │ top-group //= (identifier-a =>
│ ╭──────────────────────────────^
5 │ │ int)
│ ╰^ invalid group entry syntax
Error: "Parser error"
I'm getting conflicting results for 'valid' CDDL from the gem implementation of cddl and this one. The following example fails to validate using your Rust CDDL tool but validates using the gem tool. The following CDDL is used to validate a JSON-LD document where the value of @context is either a string or an array of items in which the first item is as stated below, followed by one or more URIs. My understanding was that ~ is used to unwrap the type and remove the necessary CBOR tag (32).
document = {
@context : "https://www.example.com/ns/v1" / [ "https://www.example.com/ns/v1", 1* ~uri ]
}
Thoughts? Thanks!
Add support for RFC 8742 CBOR Sequences.
Hi,
Thanks for publishing this crate!
I'm investigating CDDL validation for Python. Lacking an existing library in Python, wrapping a safe implementation in Rust seems like a much better approach than C/C++, and your library came up. The README has a caveat ("personal learning exercise" etc.), but:
cddl-cat (https://github.com/ericseppanen/cddl-cat) which seems a lot less featureful.
unsafe so presumably there's a limit to how much can go wrong (a panic, worst case? and I believe the Python wrapper will just turn that into a Python exception).
So perhaps that warning is no longer valid? In which case, perhaps it should be removed.
I can't build packages that depend on cddl from crates.io, because the annotate-snippets dependency no longer works.
Can you push a new release?
Member keys of table types are not properly parsed. For example: { ? [ test ]: tstr, }
Seems like cddl is unable to validate any CBOR binary that uses non-standard simple values, instead producing
Validation of "filename.cbor" failed
error parsing cbor: unassigned type at offset X
As far as I understand, this is due to serde_cbor intentionally producing a parser error when it encounters any simple value it doesn't understand.
Is there any workaround for that, or would the fix be to replace serde_cbor with another library?
One of my tests for the validator involves passing in an empty slice (originating in a Python byte string).
In 0.9.0beta0, this gave a validation error.
In beta1, it panics:
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: Io(Error { kind: UnexpectedEof, message: "failed to fill whole buffer" })', /home/itamarst/.cargo/registry/src/github.com-1ecc6299db9ec823/cddl-0.9.0-beta.1/src/validator/mod.rs:169:76
stack backtrace:
0: rust_begin_unwind
at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/panicking.rs:498:5
1: core::panicking::panic_fmt
at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/core/src/panicking.rs:107:14
2: core::result::unwrap_failed
at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/core/src/result.rs:1613:5
3: core::result::Result<T,E>::unwrap
at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/core/src/result.rs:1295:23
4: cddl::validator::validate_cbor_from_slice
at /home/itamarst/.cargo/registry/src/github.com-1ecc6299db9ec823/cddl-0.9.0-beta.1/src/validator/mod.rs:169:38
...
In general I would suggest never using unwrap() or expect() in library APIs unless it's utterly impossible to avoid, since panics are very problematic for library users.
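As an illustration of the suggestion, here is a sketch of reading from a possibly empty slice and returning an error instead of unwrapping. DecodeError and major_type are hypothetical names for this example, not part of the crate. The only CBOR fact relied on is that the major type is the top three bits of the initial byte.

```rust
use std::fmt;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum DecodeError {
    UnexpectedEof,
}

impl fmt::Display for DecodeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            DecodeError::UnexpectedEof => write!(f, "unexpected end of input"),
        }
    }
}

impl std::error::Error for DecodeError {}

/// Reads the CBOR major type (top three bits of the first byte).
/// An empty slice yields an error the caller can handle, not a panic.
fn major_type(input: &[u8]) -> Result<u8, DecodeError> {
    let first = input.first().ok_or(DecodeError::UnexpectedEof)?;
    Ok(*first >> 5)
}
```

A wrapper for another language can then turn the Err value into an ordinary exception, which is much easier to deal with than a panic crossing the FFI boundary.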
A common pattern is to use the same schema to validate multiple documents. In the current API, this requires a bunch of work involving the innards of the implementation:
A nicer API would be something like:
let cddl_schema = CDDLSchema::from_slice(my_schema_bytes);
for document in documents {
    cddl_schema.validate(document)?;
}
The CDDL of the core-href draft currently does not validate in the online service unless the .feature is removed. The same goes for the JC<> example.
Docs say that .feature is supported and on by default; it appears that it is not on in the web service, because it gives errors like:
expected rule identifier followed by an assignment token '=', '/=' or '//='
Could that be enabled for the web service?
(By the way, the README still calls the spec for .feature draft-ietf-cbor-cddl-control; it has been promoted to RFC 9165 since then.)
Extend parser to be more error tolerant based on incomplete CDDL. This is required for implementing any sort of language server functions for IDE support.
Hi,
I'm wrapping your library for Python, and initial setup failed to build:
error: expected item, found `"serde_json requires that either `std` (default) or `alloc` feature is enabled"`
--> /home/itamarst/.cargo/registry/src/github.com-1ecc6299db9ec823/serde_json-1.0.69/src/features_check/error.rs:1:1
|
1 | "serde_json requires that either `std` (default) or `alloc` feature is enabled"
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected item
I imagine this is a missing feature flag in the serde_json dependency in Cargo.toml. I can work around it locally by explicitly adding serde_json as a dependency, but I assume this is something that will impact other people.
Incorporate position into the lexer and parser to provide users with more tangible error messages.
Need to incorporate documentation comments in the source.
The validation logic is incredibly confusing to new contributors. The first attempt was really just to get things working, but the resulting code is less than ideal. This issue aims to track all activities related to refactoring this logic to make it much more readable.
Consider the following schema:
reputation-object = {
application: text
reputons: [* reputon]
}
The reputon type is never defined, and yet the validation code seems perfectly happy with the following document:
{"application": "blah", "reputons": [{"reputon": "xx"}]}
If this is invalid behavior, great, that can just be fixed.
But perhaps this is valid according to the RFC, and if so that's a very bad feature in a schema language, because a typo in a type name means validation suddenly doesn't happen. In which case perhaps a strict mode would be useful where all types must be explicitly specified? (But I really hope this isn't valid according to the RFC...).
The CDDL spec says:
When applied to an unsigned integer, the ".size" control restricts
the range of that integer by giving a maximum number of bytes that
should be needed in a computer representation of that unsigned
integer. In other words, "uint .size N" is equivalent to
"0...BYTES_N", where BYTES_N == 256**N.
audio_sample = uint .size 3 ; 24-bit, equivalent to 0...16777216
Using version 0.9.0-beta.1:
The contents of size.cddl:
start = Record
Record = {
id: Id
}
Id = uint .size 8
$ echo '{ "id": 5 }' | cddl validate --cddl size.cddl --stdin
[ERROR] Validation from stdin failed: error validating at JSON location /id: expected value .size 8, got 5
I get similar obscure errors when I apply the .size control to the bytes type.
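Going by the RFC text quoted above, uint .size N accepts values below 256**N. A minimal sketch of that check for 64-bit values follows. fits_uint_size is a hypothetical name, and the special case for N >= 8 exists only because 256**8 does not fit in a u64.

```rust
/// True when `value` fits in `n` bytes, i.e. value < 256**n.
fn fits_uint_size(value: u64, n: u32) -> bool {
    if n >= 8 {
        // 256**8 overflows u64, and every u64 fits in 8 bytes anyway.
        return true;
    }
    value < (1u64 << (8 * n))
}
```

Under this reading the value 5 satisfies uint .size 8, so the document in the report would be expected to validate.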
I am wrapping the library with Python, using PyO3. PyO3's macros for wrapping classes aren't happy with having a lifetime parameter, so having a CDDL attribute is difficult (I don't want to parse the schema from scratch each time I validate a document).
More broadly, having a lifetime identifier on the CDDL struct just makes using it annoying since it percolates to everything.
I am very much not an expert, but I suspect this could be fixed simply by having the parser own the lexer and the original schema string?
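One way to drop the lifetime parameter is for the parsed structure to own copies of the names it needs instead of borrowing from the schema text. The toy below shows only that ownership idea. OwnedRule and OwnedSchema are hypothetical, the line-based "parser" is nothing like real CDDL parsing, and the trade-off is extra allocation.

```rust
/// Hypothetical owned rule: strings are copied out of the source text,
/// so no lifetime parameter is needed.
#[derive(Debug, PartialEq)]
struct OwnedRule {
    name: String,
    definition: String,
}

/// Hypothetical owned schema that can be stored in a long-lived wrapper.
#[derive(Debug)]
struct OwnedSchema {
    rules: Vec<OwnedRule>,
}

impl OwnedSchema {
    /// Toy parser: one `name = definition` rule per line.
    fn parse(source: &str) -> OwnedSchema {
        let rules = source
            .lines()
            .filter_map(|line| line.split_once('='))
            .map(|(name, definition)| OwnedRule {
                name: name.trim().to_string(),
                definition: definition.trim().to_string(),
            })
            .collect();
        OwnedSchema { rules }
    }
}
```

Because OwnedSchema borrows nothing, the source string can be dropped right after parsing and the value can be kept as a field in a wrapper class.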
Compile to WebAssembly target for use in the browser.
Tags are lexed, but should be parsed to address tagged data items with containing data types ... ABNF: "#" "6" ["." uint] "(" S type S ")". At the moment, tags of this format aren't properly parsed.
sargun:oci2 sargun$ ./cddl-darwin-amd64 --version
cddl 0.8.5
sargun:oci2 sargun$ ./cddl-darwin-amd64 compile-cddl --cddl hello.cddl
hello.cddl is conformant
The contents:
; 1868785970 is "oci2" as an integer
oci2 = #6.1868785970({
; A given version of the spec may use a specific set
; of hash schemes, file layouts, etc.. Therefore in order
; to allow for multiple versions of the schema to exist
; simultaneously, a user can quickly read this as the
; basis of comparison.
version: uint,
files: {
*filename => file,
}
})
; Consider restricting this
filename = tstr
file = {
mode: mode,
(
? uid: unsigned,
? gid: unsigned //
? username: tstr,
? groupname: tstr
)dwadawda
; Access time
? atime: tdate,
; Modification time
? mtime: tdate,
content: content,
}
content = {
$$content,
}
$$content //= (
type: "regularfile",
regularfile: [
; A 0 lengthed file may omit the hash.
size: uint,
; Blake 3 256-bit hash
? b3-256: bstr .size 256,
; Because this is a vector, it would require
; revving the specification.
;
; TODO: Consider adding new hash types.
; TODO: Consider adding holes.
],
)
$$content //= (
type: "directory",
directory: []
)
$$content //= (
type: "link",
link: [
target: tstr,
]
)
$$content //= (
type: "symlink",
symlink: [
target: tstr,
]
)
$$content //= (
type: "character",
character: [
major: uint .le 18446744073709551615,
minor: uint .le 18446744073709551615,
]
)
$$content //= (
type: "block",
block: [
major: uint .le 18446744073709551615,
minor: uint .le 18446744073709551615,
]
)
$$content //= (
type: "fifo",
block: []
)
rwx = [
read: bool,
write: bool,
execute: bool,
]
mode = [
user: rwx,
group: rwx,
other: rwx,
setuid: bool,
setgid: bool,
sticky: bool,
]
The part that's "dwadawda" is invalid.
What is the reason behind the validator module not being exported for wasm targets? What would be required to enable that functionality, or is it not possible at all?
Thank you!
Implement the additional proposed control operators for CDDL per https://datatracker.ietf.org/doc/html/draft-ietf-cbor-cddl-control-05. Work is being tracked in #79. CC @cabo
Given that https://github.com/pyfisch/cbor has been archived and is no longer being maintained, and due to issues such as #90, this issue will ensure a proper CBOR replacement library is implemented.
When I try to define a port number field like so:
example = {
? port: (uint .lt 65536) .default 5683
}
or like so
example = {
? port: 0..65535 .default 5683
}
I get an error when I try to validate the following JSON object:
{
"port": 5682
}
The error I get looks something like this:
error validating at cddl location "" and JSON location : CDDL member key must be string data type. got 5683
Using the example for .default and another control operator from the specification actually also fails.
Much of the size of this crate can be attributed to the regex crate dependency. Only the needed features of regex should be enabled per the instructions here
Implement a JSON data faker from CDDL per one of the original project goals outlined in the README
There's some odd formatting behavior with comments after trailing commas in member key group entries.
The parsing of CDDL input foo = (int / float) changed in release 0.5.2. The choice operator no longer seems to work as intended.
Reading through the RFC, the only place the / operator is allowed is here:
type = type1 *(S "/" S type1)
So I would expect the resulting AST to contain a Type with two Type1 elements in type_choices. This is what I see in 0.5.1.
In 0.5.2 I am seeing something different: a Group with one GroupChoice containing two GroupEntry elements. I don't think that's right; it's what one would expect if I had specified foo = (int, float).
In fact, that's probably a better way to demonstrate the problem: the AST for (int, float) is the same as for (int / float).
Abbreviated AST from 0.5.1:
Type {
type_choices: [
Type1 {
type2: Typename {
ident: Identifier {
ident: "int",
socket: None,
},
generic_arg: None,
},
operator: None,
},
Type1 {
type2: Typename {
ident: Identifier {
ident: "float",
socket: None,
},
generic_arg: None,
},
operator: None,
},
],
},
Abbreviated AST from 0.5.2:
Group {
group_choices: [
GroupChoice {
group_entries: [
(
TypeGroupname {
ge: TypeGroupnameEntry {
occur: None,
name: Identifier {
ident: "int",
socket: None,
},
generic_arg: None,
},
},
false,
),
(
TypeGroupname {
ge: TypeGroupnameEntry {
occur: None,
name: Identifier {
ident: "float",
socket: None,
},
generic_arg: None,
},
},
false,
),
],
},
],
}
Hi,
Continuing investigation of cddl: support for the same schema with both JSON and CBOR is great, but there's the problem of bytes. The CDDL RFC says "don't support bytes in schema language", which OK, that's an approach. But another alternative is to say "if schema says bytes, expectation is that in JSON document this will be base64-encoded bytes in a string." And then you could validate JSON documents even with a schema that had bytes, by converting to bytes as part of validation.
I imagine this would have to be a two-step process:
.size.
Since this is not quite compatible with the RFC (arguably it is compatible, in that the RFC says "don't use bstr in schema", so this is a superset), one might want such a mode hidden behind an option. Some questions:
(Still investigating if this is an actual requirement for the project, or just a nice to have; if this takes more than 60 seconds to answer, "not sure" is a fine answer for both questions.)
Unknown suffixes (not cbor or json) are recognized and there's a correct error message.
Seen on macOS.
The CBOR validation function is incomplete. To the extent possible, CBOR validation should be properly implemented. Now that pyfisch/cbor#172 has been merged, it should be possible to better implement CBOR validation.
Thank you for this fantastic implementation of CDDL! ✨
I've stumbled upon the following issue and wonder if it might be a bug?
The following CDDL schema (wrongly?) allows maps as values in the fields map even though only tstr types are permitted:
message = {
fields: {
+ tstr => tstr
}
}
This CBOR here (with diagnostic JSON) gets accepted by this schema while I would expect an error:
A1666669656C6473A16474657374A164546578746F48656C6C6F2C204D65737361676521
{
"fields": {
"test": {
"Text": "Hello, Message!"
}
}
}
This crate relies on the regex crate for parsing regex in CDDL. However, it only supports PCRE-like regex and doesn't require escaping of certain special characters. The parsed regex strings from CDDL should therefore be updated for proper parsing by the regex crate.
Named capture groups should also be prepended with a P as follows: (?P<name>)
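A rough sketch of that rewrite using only the standard library is given below. add_p_to_named_groups is a hypothetical helper name. It leaves (?<= and (?<! untouched, and it assumes the pattern contains no escaped "\(" immediately followed by "?<", which a real implementation would need to handle.

```rust
/// Rewrites `(?<name>` to `(?P<name>`, leaving the lookbehind forms
/// `(?<=` and `(?<!` as they are. Escaped parentheses are not handled.
fn add_p_to_named_groups(pattern: &str) -> String {
    let mut out = String::with_capacity(pattern.len() + 8);
    let mut rest = pattern;
    while let Some(pos) = rest.find("(?<") {
        let after = &rest[pos + 3..];
        out.push_str(&rest[..pos]);
        if after.starts_with('=') || after.starts_with('!') {
            out.push_str("(?<"); // lookbehind syntax: copy unchanged
        } else {
            out.push_str("(?P<"); // named group: insert the P
        }
        rest = after;
    }
    out.push_str(rest);
    out
}
```

For example, a pattern written as (?<year>[0-9]{4}) comes out as (?P<year>[0-9]{4}), while patterns without named groups pass through unchanged.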
With repeated formatting, this:
; hello
world = "hello world"
becomes this:
; hello
world = "hello world"
and then this:
; hello
world = "hello world"
and so on.
Add support for the .bits control operator
To support #36 and provide a foundation for both JSON/CBOR validation and JSON (and JSON schema) generation, a visitor pattern should be implemented for walking the CDDL AST.
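To make the idea concrete, here is a minimal visitor over a toy AST. The Type enum, the Visitor trait and NameCollector are invented for this sketch and are far smaller than the crate's real AST. The pattern shown is the usual one: default trait methods delegate to a free walk function, and implementors override only the hooks they need.

```rust
/// Toy AST for this sketch only; not the crate's AST.
enum Type {
    Name(String),
    Choice(Vec<Type>),
    Array(Vec<Type>),
}

/// Visitor with default methods that recurse through the tree.
trait Visitor {
    fn visit_name(&mut self, _name: &str) {}

    fn visit_type(&mut self, ty: &Type) {
        walk_type(self, ty);
    }
}

/// Default traversal: dispatch on the node and visit its children in order.
fn walk_type<V: Visitor + ?Sized>(visitor: &mut V, ty: &Type) {
    match ty {
        Type::Name(name) => visitor.visit_name(name),
        Type::Choice(items) | Type::Array(items) => {
            for item in items {
                visitor.visit_type(item);
            }
        }
    }
}

/// Example visitor: collects every type name it encounters.
struct NameCollector {
    names: Vec<String>,
}

impl Visitor for NameCollector {
    fn visit_name(&mut self, name: &str) {
        self.names.push(name.to_string());
    }
}
```

A validator or a generator could be written as another implementor of the same trait, which is the reuse this issue is after.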