serde-rs / json-benchmark
nativejson-benchmark in Rust
License: Apache License 2.0
While trying to find a good JSON DOM library that supports borrowing, I came across this benchmark and almost used the json crate. I found it odd that the crates.io listing has no README, yet it's such a popular crate and performs quite well!
Then I saw on GitHub that it isn't maintained. That's not a problem in itself if development is simply finished, as a JSON library's certainly could be. But another issue points out a soundness problem that Miri also flags.
Given that it's unmaintained and potentially unsound, I'm not sure how valid a comparison it actually provides in today's JSON ecosystem. In fact, I worry that its mere presence on this list might be leading some developers to adopt the crate despite it not being, in my opinion, a safe crate to adopt given the open issues.
Should I update the results from my own machine if I make changes?
Results I'm getting now:
Running `target/release/json-benchmark`
DOM STRUCT
======= serde_json ======= parse|stringify === parse|stringify ===
data/canada.json 51.8ms 23.9ms 36.9ms 20.0ms
data/citm_catalog.json 30.2ms 3.3ms 20.6ms 1.7ms
data/twitter.json 10.3ms 1.4ms 8.4ms 1.4ms
======= json-rust ======== parse|stringify === parse|stringify ===
data/canada.json 42.0ms 42.3ms
data/citm_catalog.json 16.3ms 2.0ms
data/twitter.json 6.7ms 1.1ms
==== rustc_serialize ===== parse|stringify === parse|stringify ===
data/canada.json 39.7ms 68.8ms 48.0ms 65.0ms
data/citm_catalog.json 26.8ms 6.1ms 32.7ms 4.1ms
data/twitter.json 13.4ms 2.8ms 17.9ms 2.6ms
This is using a branch of json-rust that can write directly to `io::Write` types. The performance is roughly in line, although the difference is smaller (and serde_json writing to a struct is faster). The remaining gap is accounted for by the fact that `io::Write` doesn't provide a method for writing a single `u8`, so every single-token write has to be wrapped in a `&[u8]` slice, which adds an unnecessary level of indirection.
The project now lives at https://github.com/simd-lite/simd-json
I did open an issue about this on the original nativejson-benchmark; so far it's unanswered.
The conformance tests of the nativejson-benchmark aren't actually testing conformance with ECMA or the RFC, despite what the README says. Taking the roundtrip tests:

- `roundtrip10.json` will fail on any implementation that doesn't preserve the ordering of keys, which is not required by the standard (I pass it by virtue of the keys being in alphabetical order).
- `roundtrip13.json`, `roundtrip14.json`, `roundtrip18.json` and `roundtrip19.json` all require higher precision than IEEE 754 float64; this is not required by the standard.
- `roundtrip20.json` requires that a floating-point zero is distinct from an integer zero and is represented with a fraction; this is not required by the standard.
- `roundtrip21.json` again requires a distinct representation of floating zero, and requires that negative zero is preserved; neither is a requirement of the standard.
- `roundtrip24.json`, `roundtrip25.json`, `roundtrip26.json` and `roundtrip27.json` require that `e` notation is written with a lowercase `e` and that positive exponents are written without a plus. Any implementation that writes a capital `E` and/or adds a `+` to positive exponents would fail these.

Web browsers and Node.js would fail 7 of those tests while being perfectly within the standard.
I don't know if my original issue is going to be resolved, but should we add a note that this is a conformance test as ported from nativejson-benchmark, and is not actually testing conformance to the JSON standard as such?
The `rustc_serialize` library has a streaming / token-based parser, which can be used in advanced applications to implement very efficient JSON extraction while keeping the memory overhead of parsing low; this is great for large documents that may not easily fit into RAM. It would be awesome to see benchmarks for this style of API.
@sunnygleason and I have put quite some work into improving simd-json.rs performance; it would be great if the numbers could be updated with the current version of simd-json.rs.
Many thanks in advance!
Where possible.
Hi!
I am doing research on the effects of Profile-Guided Optimization (PGO) on different kinds of software; my current results are available here. I decided to PGO-optimize json-benchmark and try to estimate the PGO performance boost for this kind of workload.
My test environment is a MacBook M1 Pro, Ventura 13.4, with a charger connected (this matters, since it affects the results). The results were collected over multiple runs of `cargo run --release`, taking the best result each time (the same methodology described in the README file). PGO optimization was done with cargo-pgo. My results are the following.
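For reference, the cargo-pgo workflow is roughly the following (a sketch assuming cargo-pgo is installed via `cargo install cargo-pgo`; the exact binary path depends on your target triple):

```shell
cargo pgo build        # build an instrumented binary
# run the instrumented json-benchmark binary on the workload
# to collect .profraw profile data
cargo pgo optimize     # rebuild, optimizing with the collected profiles
```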
Release:
======= serde_json ======= parse|stringify ===== parse|stringify ====
data/canada.json 440 MB/s 710 MB/s 650 MB/s 510 MB/s
data/citm_catalog.json 830 MB/s 940 MB/s 1400 MB/s 1210 MB/s
data/twitter.json 510 MB/s 1360 MB/s 970 MB/s 1490 MB/s
==== rustc_serialize ===== parse|stringify ===== parse|stringify ====
data/canada.json 250 MB/s 140 MB/s 210 MB/s 100 MB/s
data/citm_catalog.json 350 MB/s 390 MB/s 270 MB/s 470 MB/s
data/twitter.json 200 MB/s 600 MB/s 150 MB/s 670 MB/s
======= simd-json ======== parse|stringify ===== parse|stringify ====
data/canada.json 710 MB/s 770 MB/s 890 MB/s
data/citm_catalog.json 1890 MB/s 1070 MB/s 2290 MB/s
data/twitter.json 1820 MB/s 1630 MB/s 1420 MB/s
PGO-optimized:
======= serde_json ======= parse|stringify ===== parse|stringify ====
data/canada.json 590 MB/s 720 MB/s 1150 MB/s 530 MB/s
data/citm_catalog.json 1100 MB/s 1090 MB/s 1620 MB/s 1210 MB/s
data/twitter.json 600 MB/s 1370 MB/s 990 MB/s 1370 MB/s
==== rustc_serialize ===== parse|stringify ===== parse|stringify ====
data/canada.json 280 MB/s 150 MB/s 230 MB/s 110 MB/s
data/citm_catalog.json 450 MB/s 470 MB/s 340 MB/s 530 MB/s
data/twitter.json 220 MB/s 740 MB/s 170 MB/s 780 MB/s
======= simd-json ======== parse|stringify ===== parse|stringify ====
data/canada.json 740 MB/s 780 MB/s 990 MB/s
data/citm_catalog.json 1900 MB/s 1190 MB/s 2520 MB/s
data/twitter.json 1900 MB/s 1700 MB/s 1620 MB/s
According to these tests, PGO yields noticeably better JSON parsing performance. I hope these results are useful to someone. It would probably be a good idea to add information about PGO to the libraries' documentation, somewhere in a "Performance tuning" section.
Pushed to `benches`; just need to `cargo update` and rebuild. My hardware:
======= serde_json ======= parse|stringify === parse|stringify ===
data/canada.json 24.7ms 23.2ms 12.5ms 19.1ms
data/citm_catalog.json 17.3ms 3.3ms 7.4ms 1.6ms
data/twitter.json 7.1ms 1.4ms 4.8ms 1.4ms
======= json-rust ======== parse|stringify === parse|stringify ===
data/canada.json 19.3ms 17.1ms
data/citm_catalog.json 10.7ms 1.4ms
data/twitter.json 4.1ms 1.0ms
==== rustc_serialize ===== parse|stringify === parse|stringify ===
data/canada.json 38.8ms 65.9ms 46.9ms 65.4ms
data/citm_catalog.json 26.2ms 6.0ms 34.4ms 4.4ms
data/twitter.json 13.3ms 2.7ms 18.5ms 2.5ms
So, I have the json.rs domain at my disposal. If this benchmark can be made to dump a JSON (duh) file with the results, which could then be committed to GitHub, I can put up a simple webpage at that address that grabs the JSON from the repo and draws graphs for it in JavaScript. That way we'd have something nice to present without much manual updating: just build -> run -> commit and push.
How does that sound?
Edit: Actually, given the formatting of the log, I might be able to visualize it by scraping the README.
Traverse the DOM and count the number of each JSON type, the total length of strings, and the total number of elements/members in arrays/objects.
Hi!
It seems the comparison isn't entirely fair, since it uses outdated C++ compilers against a quite recent Rust compiler. Please update to the latest GCC 10 / Clang 10.
Thank you.
Similar to memorystat.h.