
JSON Parsing Test Suite

A comprehensive test suite for RFC 8259 compliant JSON parsers

This repository was created as an appendix to the article Parsing JSON is a Minefield 💣.

/parsers/

This directory contains several parsers and tiny wrappers that turn each parser into a JSON validator by returning a specific exit code.

  • 0 — the parser accepted the content
  • 1 — the parser rejected the content
  • >1 — the process crashed
  • timeout — the parser did not finish within 5 seconds
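As an illustration of this convention, a minimal wrapper (a sketch using Python's own json module, not one of the actual wrappers in /parsers/) might look like this:

```python
import json
import sys

def validate(data: bytes) -> int:
    """Return 0 if the parser accepts the content, 1 if it rejects it."""
    try:
        json.loads(data)
        return 0
    except Exception:
        return 1

if __name__ == "__main__" and len(sys.argv) > 1:
    # Read raw bytes so encoding errors reach the parser untouched
    with open(sys.argv[1], "rb") as f:
        sys.exit(validate(f.read()))
```

A crash in the underlying parser would surface as an exit status above 1, which the harness records separately.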

/test_parsing/

The names of these files tell whether their contents should be accepted or rejected.

  • y_ content must be accepted by parsers
  • n_ content must be rejected by parsers
  • i_ parsers are free to accept or reject content
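The prefix convention is mechanical; a hypothetical helper (not part of run_tests.py) could classify files like so:

```python
def expected_result(filename: str) -> str:
    """Map a test filename to the expected validator outcome by its prefix."""
    if filename.startswith("y_"):
        return "must accept"
    if filename.startswith("n_"):
        return "must reject"
    if filename.startswith("i_"):
        return "implementation-defined"
    raise ValueError(f"unrecognized prefix: {filename}")
```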

/test_transform/

These files contain weird structures and characters that parsers may understand differently, e.g.:

  • huge numbers
  • dictionaries with similar keys
  • NULL characters
  • escaped invalid strings

These files were used to produce results/transform.html.
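For example, parsers disagree on duplicate keys; as one illustration of the spread in behaviour, Python's json module silently keeps the last value:

```python
import json

doc = '{"a": 1, "a": 2}'

# Python's json keeps the last value for a repeated key, silently
assert json.loads(doc) == {"a": 2}

# object_pairs_hook exposes every key/value pair, so duplicates can be detected
pairs = json.loads(doc, object_pairs_hook=lambda p: p)
assert pairs == [("a", 1), ("a", 2)]
```

Other parsers keep the first value, or reject the document outright; none of these choices violates the RFC.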

/run_tests.py

Run all parsers with all files:

$ python3 run_tests.py

Run all parsers with a specific file:

$ python3 run_tests.py file.json

Run specific parsers with all files:

$ echo '["Python 2.7.10", "Python 3.5.2"]' > python_only.json
$ python3 run_tests.py --filter=python_only.json

The script writes logs in results/logs.txt.

The script then reads logs.txt and generates results/parsing.html.

/results/

JSON Parsing Tests

Issues

Add category of tests for JSON extensions?

The RFC permits extensions. For example, jq accepts some representations of infinity, and is a bit liberal about number representations. A number of n_* tests could be labeled as "failure might be due to the parser supporting extensions".

From RFC7159:

  1. Parsers

A JSON parser transforms a JSON text into another representation. A
JSON parser MUST accept all texts that conform to the JSON grammar.
A JSON parser MAY accept non-JSON forms or extensions.

Add parsing timings

It would be great if the test suite could repeat each test until statistical significance is reached and report the timings with standard deviations.

This would make it possible to use the test suite not only to assess a parser's conformance and extension support, but also to assess the performance of different parsers on corner cases.

Test n_string_iso_latin_1.json on Java Jackson actually fails

import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.IOException;

public class Main {
    public static void main(String[] args) throws IOException {
        new ObjectMapper().readTree("[\"é\"]".getBytes("ISO-8859-1"));
    }
}

It is marked as "succeed but should fail" in the table but this throws the following exception:

Exception in thread "main" com.fasterxml.jackson.core.JsonParseException: Invalid UTF-8 middle byte 0x22
 at [Source: [B@10b48321; line: 1, column: 5]
    at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1702)
    at com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:558)
    at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._reportInvalidOther(UTF8StreamJsonParser.java:3548)
    at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._reportInvalidOther(UTF8StreamJsonParser.java:3555)
    at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._decodeUtf8_3fast(UTF8StreamJsonParser.java:3361)
    at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._finishString2(UTF8StreamJsonParser.java:2517)
    at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._finishAndReturnString(UTF8StreamJsonParser.java:2465)
    at com.fasterxml.jackson.core.json.UTF8StreamJsonParser.getText(UTF8StreamJsonParser.java:315)
    at com.fasterxml.jackson.databind.deser.std.BaseNodeDeserializer.deserializeArray(JsonNodeDeserializer.java:283)
    at com.fasterxml.jackson.databind.deser.std.JsonNodeDeserializer.deserialize(JsonNodeDeserializer.java:71)
    at com.fasterxml.jackson.databind.deser.std.JsonNodeDeserializer.deserialize(JsonNodeDeserializer.java:15)
    at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:3798)
    at com.fasterxml.jackson.databind.ObjectMapper.readTree(ObjectMapper.java:2404)

I didn't test other encoding failures, but this seems wrong.
Thanks for this study!

`y_string_nonCharacterInUTF-8_U%2B1FFFF.json` does not contain U+1FFFF

It contains U+1BFFF.

Content

od -t u1 tests/input/y_string_nonCharacterInUTF-8_U+1FFFF.json
0000000  91  34 240 155 191 191  34  93
0000010

Decoding

240 \360 11'110'000  -- Lead 4      '000 \0
155 \233 10'011'011  -- Trailer: 011'011 \33
191 \277 10'111'111  -- Trailer: 111'111 \77
191 \277 10'111'111  -- Trailer: 111'111 \77

Assembly

000'011'011'111'111'111'111 0o0337777
 0'0001'1011'1111'1111'1111 0x1BFFF

Result

1bfff != 1ffff

Fix

F = 1'111
B = 1'011

\3 -> \7

\0377777

240 \360 11'110'000  -- Lead 4      '000 \0
159 \237 10'011'111  -- Trailer: 011'111 \37 ** from 155 (+4)
191 \277 10'111'111  -- Trailer: 111'111 \77
191 \277 10'111'111  -- Trailer: 111'111 \77
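The byte arithmetic above can be verified directly, e.g. with Python's UTF-8 codec (both code points are noncharacters but valid scalar values, so they decode without error):

```python
# Current file bytes F0 9B BF BF decode to U+1BFFF...
assert ord(b"\xf0\x9b\xbf\xbf".decode("utf-8")) == 0x1BFFF

# ...while the proposed fix (second byte 155 -> 159) yields U+1FFFF
assert ord(b"\xf0\x9f\xbf\xbf".decode("utf-8")) == 0x1FFFF
```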

Add Postgres JSONB parser support

I think this is worth having since it's (yet another) JSON parser written from scratch. It is mostly unsurprising but has a couple of quirks: a) because Postgres is very picky about UTF-8 encoding, it refuses almost all of the grey-area cases having to do with encoding issues, and b) Postgres cannot handle \u0000 due to internal code issues, so it fails those test cases.

I sent a pull request. It has a couple of limitations: there are no crashes, but if there were, the tests would produce weird results, since a server crash would cause all or many subsequent tests to fail. Also, it simply uses whatever running server it can connect to -- by default, anything on the local default port 5432. There's no attempt to check the server version, which is a bit unfortunate, but the server behaviour is unlikely to change any time soon, so that shouldn't cause problems in practice.

Executable bit set on non-executable files

A seemingly random set of files (including many JSON files and C source files) have the executable bit set.
I run Linux and use zsh, but in any environment with a similar chmod command and shell globbing, this is easy to partially fix:

chmod a-x test_parsing/* test_transform/* parsers/**/*.{c,h}{,pp}

Tests n_172 and n_175 are the same

The two files are byte-for-byte identical:

$ sha256sum parsers/test_ccan_json/json/_test/nst_files/n_17[25].json
c28f64ade65fab04e9ced46a826c6779188f7754c604fb3f4c3e8b80e3418148  parsers/test_ccan_json/json/_test/nst_files/n_172.json
c28f64ade65fab04e9ced46a826c6779188f7754c604fb3f4c3e8b80e3418148  parsers/test_ccan_json/json/_test/nst_files/n_175.json
$ 

UTF-16 no-BOM test cases should be changed to implementation-defined

Parsers should not be expected to detect UTF-16 in the absence of a BOM. A UTF-16 encoding of ASCII characters is valid UTF-8, and any parser that operates on a text stream (as opposed to a byte stream) is independent of encoding anyway. So they should be changed to implementation-defined.
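For context, RFC 4627 (section 3) did describe a NUL-pattern heuristic that allows BOM-less detection, since any JSON text begins with two ASCII characters; a sketch of that heuristic (illustrative only, not anything in this suite):

```python
def detect_encoding(head: bytes) -> str:
    """Guess a JSON byte stream's encoding from its first four octets,
    following the NUL-pattern heuristic of RFC 4627, section 3."""
    if head[:2] in (b"\xfe\xff", b"\xff\xfe"):
        return "utf-16"  # BOM present: unambiguous
    if len(head) >= 4:
        nul = [b == 0 for b in head[:4]]
        if nul == [True, True, True, False]:
            return "utf-32-be"
        if nul == [False, True, True, True]:
            return "utf-32-le"
        if nul == [True, False, True, False]:
            return "utf-16-be"
        if nul == [False, True, False, True]:
            return "utf-16-le"
    return "utf-8"
```

RFC 8259 later dropped this scheme and mandated UTF-8 for interchange, which supports reclassifying these cases as implementation-defined.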

Case insensitive object keys

I am currently the maintainer of cJSON, and I want to note that before I took over, cJSON treated all strings in object keys case-insensitively (not even Unicode-aware, just tolower from the C standard library).

It seems that your blog post and tests don't mention this.

This is another kind of incorrect behavior of a JSON library to look out for (another mine in the minefield, so to say).

`n_object_pi_in_key_and_trailing_comma.json` does not contain PI as indicated

Also, is the n_ indicator for the broken UTF-8, or for the trailing comma?

At offset 2 it contains a lone UTF-8 continuation byte, 185 (\271, 0b10'111'001):

hephaistos:(530) ~/Play/Marpa/WORK/src > ./tools/utf-viewer.tcl languages/json/tests/input/n/n_object_pi_in_key_and_trailing_comma.json 
[ 0] { 123 1 U+007b
[ 1] "  34 1 U+0022
[ 2] ¹ 185 0 ^
[ 3] "  34 1 U+0022
[ 4] :  58 1 U+003a
[ 5] "  34 1 U+0022
[ 6] 0  48 1 U+0030
[ 7] "  34 1 U+0022
[ 8] ,  44 1 U+002c
[ 9] } 125 1 U+007d
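Both defects are real; the lone continuation byte alone already breaks strict UTF-8 decoding, as a quick Python check shows:

```python
raw = b'{"\xb9":"0",}'  # the file's ten bytes; 0xB9 is a bare continuation byte

try:
    raw.decode("utf-8")
    decoded, reason = True, None
except UnicodeDecodeError as exc:
    decoded, reason = False, exc.reason

assert not decoded
assert reason == "invalid start byte"
```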

Jackson TestJSONParsing uses system default charset

This probably causes the failure in y_string_utf16.json.

See the use of the String constructor without an explicit charset, which falls back to the system default charset: https://github.com/nst/JSONTestSuite/blob/master/parsers/test_java_jackson_2_8_4/TestJSONParsing.java#L36

The proper implementation would be to wrap the byte[] in a ByteArrayInputStream and pass that to the parser directly, allowing it to figure out the encoding on its own from the first few bytes.

Explore behavior of C++ parsers

C++ is a widely used programming language in the server space (where validating JSON needs to happen fast). There are a lot of C++ JSON parsers:

  • nlohmann/json: header-only, easy to use, modern C++; not aiming to be the fastest (but still fast enough).
  • RapidJSON: header-only, aiming to be the fastest C++ parser (can use SSE2 and SSE4.2, see configuration options); can trade conformance for speed (see options).
  • folly/json: Facebook's json parser.
  • JSON Spirit: implemented using Boost.Spirit.
  • picojson: a lightweight JSON parser.
  • V8 JavaScript Engine: has a C++ JSON parser.

"\u0000" is in fact legal

From RFC7159:

Any character may be escaped. If the character is in the Basic
Multilingual Plane (U+0000 through U+FFFF), then it may be
represented as a six-character sequence: [...]

but parsers/test_ccan_json/json/_test/nst_files/n_223.json is an n_* test and contains "\u0000", which means it should be a y_* test, that is, expected to parse successfully.
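This matches what conforming parsers do; Python's json module, for one, accepts the escape while (rightly) rejecting a raw NUL:

```python
import json

# The six-character escape sequence is grammatical JSON
assert json.loads('["\\u0000"]') == ["\x00"]

# A raw, unescaped control character, by contrast, must be rejected
try:
    json.loads('["\x00"]')
    raised = False
except json.JSONDecodeError:
    raised = True
assert raised
```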

Inconsistency in Unicode noncharacter tests

These two tests seem inconsistent: both test the handling of a noncharacter, but one has an i_ prefix and the other has a y_ prefix.

$ cat y_string_escaped_noncharacter.json | xxd
0000000: 5b22 5c75 4646 4646 225d                 ["\uFFFF"]
$ cat i_string_unicode_U+FFFE_nonchar.json | xxd
0000000: 5b22 5c75 4646 4645 225d                 ["\uFFFE"]

I think they should both be implementation-defined FWIW.
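For what it's worth, a common lenient behaviour is to accept both; Python's json module, for instance, parses the two escapes identically:

```python
import json

# Both U+FFFF and U+FFFE are noncharacters; Python accepts either escape
assert json.loads('["\\uFFFF"]') == ["\uffff"]
assert json.loads('["\\uFFFE"]') == ["\ufffe"]
```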

Can you test against c-json

Hi, I wrote a simple JSON parser and would love for you to include the results in your project.
You can get it at https://github.com/amwales-888/c-json
On Linux, build by typing 'make', then copy the 'json' binary to the parsers location. The invocation format is 'json filepath'; it returns 0 on success and 1 on failure.

Separate n_* tests that are about multiple texts in one file

Tests like parsers/test_ccan_json/json/_test/nst_files/n_31.json (of which there are many) fail with jq only because jq permits multiple JSON texts in one input stream.

I'm not saying this is permissible JSON, just that it's an entire category of tests that jq will always fail (by design), so it would be best to separate them, kinda like i_* tests.

Inconsistent about invalid UTF-8

There are multiple tests that check how the parser handles encountering invalid UTF-8 sequences in the byte stream. Some of them are i_ tests, but some of them are n_ tests. I think the n_ tests are incorrect and should be turned into i_ tests, because it's perfectly reasonable for parsers to convert invalid UTF-8 sequences to U+FFFD instead of failing to parse.
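The two defensible behaviours can be seen side by side in Python: a strict decode of an invalid byte fails, while a lenient decode substitutes U+FFFD and then parses fine:

```python
import json

bad = b'["\xff"]'  # 0xFF can never appear in well-formed UTF-8

# Strict decoding rejects the byte outright...
try:
    bad.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False
assert not strict_ok

# ...while replacement decoding yields U+FFFD, which parses without error
lenient = bad.decode("utf-8", errors="replace")
assert json.loads(lenient) == ["\ufffd"]
```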

Huge exponent inconsistency

Why is i_number_neg_int_huge_exp [-1e+9999] considered implementation-defined while y_number_real_neg_overflow [-123123e100000] required to parse?

perl parse logic is faulty

The Perl parsers use faulty logic to try to detect parse errors, based on what the internal version of the JSON input looks like.

No license included in the distribution

I want to use the files to test an open source Perl module I'm writing, but there's no license associated with the repository that I can see.

Is this open source? And if so, how can I distribute it?

Can't checkout repository on Windows

Two files contain invalid characters:

Cloning into 'JSONTestSuite'...
remote: Counting objects: 1241, done.
remote: Compressing objects: 100% (442/442), done.
remote: Total 1241 (delta 199), reused 1210 (delta 168), pack-reused 0
Receiving objects: 100% (1241/1241), 34.76 MiB | 1.41 MiB/s, done.
Resolving deltas: 100% (199/199), done.
Checking connectivity... done.
error: unable to create file test_parsing/n_structure_<.>.json (Invalid argument)
error: unable to create file test_parsing/n_structure_<null>.json (Invalid argument)
Checking out files: 100% (910/910), done.
fatal: unable to checkout working tree
warning: Clone succeeded, but checkout failed.
You can inspect what was checked out with 'git status'
and retry the checkout with 'git checkout -f HEAD'

Range and precision for extreme number values

Range and Precision - What about numbers with a huge number of digits? According to RFC 7159, "A JSON parser MUST accept all texts that conform to the JSON grammar" (section 9). However, according to the same paragraph, "An implementation may set limits on the range and precision of numbers." So it is unclear to me whether parsers are allowed to raise errors when they meet extreme values such as 1e9999 or 0.0000000000000000000000000000001.

You said "it is unclear to me whether parsers are allowed to raise errors", but all the extreme-value tests are y_* tests. Since the specification is contradictory, I think these tests should be implementation-defined instead.

Some parsers, such as jansson and Json.NET, have chosen to raise errors:

if((value == HUGE_VAL || value == -HUGE_VAL) && errno == ERANGE) {
    /* Overflow */
    return -1;
}
reader = new JsonTextReader(new StringReader("1E+309"));
ExceptionAssert.Throws<JsonReaderException>(() => reader.Read(), "Input string '1E+309' is not a valid number. Path '', line 1, position 6.");

reader = new JsonTextReader(new StringReader("-1E+5000"));
ExceptionAssert.Throws<JsonReaderException>(() => reader.Read(), "Input string '-1E+5000' is not a valid number. Path '', line 1, position 8.");
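Other parsers instead accept such values without complaint; Python's json, for example, silently overflows to float infinity, and full precision can be recovered via parse_float (shown here only to illustrate the spread in behaviour):

```python
import json
from decimal import Decimal

# Python's parser overflows huge exponents to float infinity, no error raised
assert json.loads("[-1e+9999]") == [float("-inf")]
assert json.loads("[-123123e100000]") == [float("-inf")]

# Routing numbers through Decimal keeps the full range and precision
assert json.loads("[1e9999]", parse_float=Decimal) == [Decimal("1e9999")]
```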

"JavaScript" is too broad, needs more engines

Even though it is the birthplace of JSON, JavaScript has several implementations and engines that behave differently.

The parser labeled "JavaScript" uses whichever NodeJS version is currently installed, indiscriminately. I tested all of the currently used Node versions (4 to 10) and they all behave similarly, but this is at least worth documenting.

I think it would be a good idea to provide tests for the major browser engines too. With WebKit/V8 it's definitely possible. I did tests with PhantomJS, but it's a poor choice for multiple reasons. I couldn't figure out how to run Headless Chrome properly, but it should be doable. For other engines/browsers it's trickier and may even be impossible.

PhantomJS (2.1.1 with WebKit 538.1) results compared with NodeJS 9.6.1 are:

[image: table comparing PhantomJS 2.1.1 and NodeJS 9.6.1 results]

The problem with Phantom is that it's abandoned, incredibly slow (it couldn't even fit into the 5-second timeout in tests), and IIRC it uses a JS engine different from V8.

Perl parsers parsing files per line

The two current .pl parsers both use a while loop with <$fh>, which reads multiline files one line at a time and tries to parse each line individually rather than the whole file.

How to test JSON-for-VHDL?

Hello,

I wrote a JSON parser for the Hardware Description Language VHDL called JSON-for-VHDL. My repository contains 5 projects for 5 different vendor tools, to compile the VHDL sources and to parse 1-3 example files. One of the needed simulators is open source (GHDL) and could be used in your test suite.

How can we add this parser to your tests?

Edit:
As an alternative: Can I add your testcases (*.json files) via Git subtree into my repository and execute the tests locally?

Kind regards
Patrick

UTF-16 test cases don't use UTF-16

Of all the test_parsing cases with "UTF-16" in the name, only i_string_UTF-16LE_with_BOM.json is actually encoded as UTF-16, the rest all show up as ASCII. I assume they're all supposed to be UTF-16.
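This is easy to confirm: UTF-16-encoded ASCII interleaves NUL bytes, so a file whose bytes look like plain ASCII cannot be UTF-16. A quick check in Python:

```python
text = '["ABC"]'

# UTF-16-LE inserts a NUL after every ASCII character...
assert text.encode("utf-16-le") == b'[\x00"\x00A\x00B\x00C\x00"\x00]\x00'

# ...so a genuine UTF-16 test file can never "show up as ASCII"
assert b"\x00" not in text.encode("ascii")
```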

parsing_json.php improvements

(Apologies if this isn't the correct place, I couldn't find anything better.)

Just found http://seriot.ch/parsing_json.php. Great writeup, it's surprising how something so seemingly simple can have so many ways to screw up. I found a few possible improvements:

i_string_iso_latin_1.json | ["E9"]
n_string_invalid_utf-8.json | ["FF"]

As of #30, both are i_.

["\uD800\uD800"] makes some parsers go nuts. R jsonlite yields ["\U00010000"], while Ruby parser yields ["F0908080"]. I still don't get where this value comes from.

Overeager decoding of surrogate pairs. \uD800\uDC00 should yield \U00010000, I guess that one ignores the top 10 bits of the supposed surrogate-low? F0908080 is \U00010000 in UTF-8, again ignoring the top 10 bits.
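For reference, the standard UTF-16 surrogate-pair combination is sketched below; the buggy outputs are consistent with parsers that skip the range checks and simply mask in the low 10 bits of each escape:

```python
def combine_surrogates(hi: int, lo: int) -> int:
    """Combine a UTF-16 surrogate pair into a code point."""
    assert 0xD800 <= hi <= 0xDBFF, "not a high surrogate"
    assert 0xDC00 <= lo <= 0xDFFF, "not a low surrogate"
    return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00)

assert combine_surrogates(0xD800, 0xDC00) == 0x10000

# A parser that ignores the range checks turns \uD800\uD800 into
# 0x10000 + (0 << 10) + (0xD800 & 0x3FF) == 0x10000, matching the
# observed ["\U00010000"] / F0908080 outputs.
```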

How do I set the TimeoutExpired?

I have the latest version but when I run

$ python3 run_tests.py

I get

('--', '/home/peter/git/JSONTestSuite/parsers/test_Bash_JSON/JSON.sh')
Traceback (most recent call last):
  File "run_tests.py", line 646, in <module>
    run_tests(restrict_to_path)
  File "run_tests.py", line 292, in run_tests
    except subprocess.TimeoutExpired:
AttributeError: 'module' object has no attribute 'TimeoutExpired'
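subprocess.TimeoutExpired only exists on Python 3.3+, so this traceback usually means the script was actually launched with a Python 2 interpreter (e.g. python3 aliased or shadowed). A defensive guard, as a sketch (hypothetical, not the actual run_tests.py code):

```python
import subprocess

# On Python < 3.3, subprocess has no TimeoutExpired; install a placeholder
# so the except clause still compiles (timeouts are unsupported there anyway)
TimeoutExpired = getattr(subprocess, "TimeoutExpired", None)
if TimeoutExpired is None:
    class TimeoutExpired(Exception):
        pass

def run_with_timeout(cmd, timeout):
    """Return the exit status, or 'timeout' if the command overruns."""
    try:
        return subprocess.run(cmd, timeout=timeout).returncode
    except TimeoutExpired:
        return "timeout"
```

The simpler fix is just to make sure the script really runs under Python 3.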

Move test cases into separate repository?

I've written a JSON parser in Dlang and was looking for a test suite that would contain extreme number test cases. You seem to have put a lot of effort into yours and I would love to have it around as a git submodule for unit testing, but with the included binaries the proportions are not quite to my taste, so to say. I'll instead download the .zip and extract only the test case folder. The upside of a submodule would be the direct link to your repository for updates. I guess it is a question of public demand and spare time.

Remove hardcoded paths to /Users/nst

I would like to run this myself but I cannot:

$ grep -rI '/Users/nst'
run_tests.py:BASE_DIR = "/Users/nst/Projects/JSONTestSuite/"
parsers/test_json-rust/README:cargo build && ./target/debug/tj /Users/nst/Desktop/in.json 
parsers/test_json.py:    with open('/Users/nst/Desktop/p.txt', 'wb') as f:
parsers/test_jsmn/test_jsmn/test_jsmn.xcodeproj/xcuserdata/nst.xcuserdatad/xcschemes/test_jsmn.xcscheme:            argument = "/Users/nst/Projects/dropbox/JSON/test_cases/y_100.json"
parsers/test_jsmn/test_jsmn/test_jsmn.xcodeproj/xcuserdata/nst.xcuserdatad/xcschemes/test_jsmn.xcscheme:            argument = "/Users/nst/Projects/dropbox/JSON/test_cases/n_structure_&lt;.&gt;.json"
parsers/test_jsonChecker2/jsonChecker2/jsonChecker2.xcodeproj/project.xcworkspace/contents.xcworkspacedata:      location = "self:/Users/nst/Projects/dropbox/JSON/test_jsonChecker2/jsonChecker2/jsonChecker2.xcodeproj">
parsers/test_jsonChecker2/jsonChecker2/jsonChecker2.xcodeproj/xcuserdata/nst.xcuserdatad/xcschemes/jsonChecker.xcscheme:            argument = "/Users/nst/Projects/dropbox/JSON/test_cases/y_number_0e1.json"
parsers/test_json-jq.py:jq_paths = ["/usr/local/bin/jq", "/Users/nst/bin/jq"]
parsers/test_json-jq.py:dir_path = "/Users/nst/Projects/dropbox/JSON/test_cases/"
parsers/test_TouchJSON/test_TouchJSON.xcodeproj/xcuserdata/nst.xcuserdatad/xcschemes/test_TouchJSON.xcscheme:            argument = "/Users/nst/Projects/dropbox/JSON/test_cases_content/object_same_key_unclear_values.json"
parsers/test_STJSON/STJSONTests/STJSONTests.swift:            //try data.write(to: URL(fileURLWithPath: "/Users/nst/Desktop/data.json"))
parsers/test_STJSON/STJSONTests/STJSONTests.swift:        let dir = "/Users/nst/Projects/dropbox/JSON/test_cases/"
parsers/test_json-rustc_serialize/rj/README:cd /Users/nst/Projects/dropbox/JSON/test_json-rustc_serialize/rj; cargo build ;/Users/nst/Projects/dropbox/JSON/test_json-rustc_serialize/rj/target/debug/rj /Users/nst/Projects/dropbox/JSON/test_cases/n_177.json
parsers/test_ObjCNSJSONSerializer/test_ObjCNSJSONSerializer.xcodeproj/xcuserdata/nst.xcuserdatad/xcschemes/test_ObjCNSJSONSerializer.xcscheme:            argument = "/Users/nst/Projects/dropbox/JSON/test_cases/i_number_neg_int_huge_exp.json "

Please make these into paths that are relative to the JSONTestSuite directory.
