
simdjsonsharp's Introduction

SimdJsonSharp: Parsing gigabytes of JSON per second

C# version of lemire/simdjson (by Daniel Lemire and Geoff Langdale, https://arxiv.org/abs/1902.08318), fully ported from C++ to C#. I tried to keep the same format and API. The library accelerates JSON parsing and minification using SIMD instructions (AVX2); the C# version uses the System.Runtime.Intrinsics API.
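To illustrate what "using SIMD instructions" means in practice, here is a minimal sketch (not code taken from SimdJsonSharp) that uses System.Runtime.Intrinsics to find every '{' and '}' in a 32-byte block with a handful of AVX2 instructions instead of a byte-by-byte loop. `BraceScanSketch` and `BraceMask` are hypothetical names; the real first stage of simdjson classifies many more characters.

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

public static class BraceScanSketch
{
    // Hypothetical helper: returns a 32-bit mask in which bit i is set when
    // json[offset + i] is '{' or '}'. simdjson's real stage 1 also tracks
    // quotes, backslashes, brackets, ':', ',' and whitespace.
    public static unsafe uint BraceMask(byte[] json, int offset)
    {
        if (!Avx2.IsSupported)
            throw new PlatformNotSupportedException("This sketch needs AVX2.");
        // A 256-bit load reads 32 bytes, so all of them must be inside the array.
        if (offset < 0 || offset > json.Length - 32)
            throw new ArgumentOutOfRangeException(nameof(offset));

        fixed (byte* p = json)
        {
            Vector256<byte> chunk = Avx.LoadVector256(p + offset);
            // CompareEqual sets matching bytes to 0xFF and all others to 0x00.
            Vector256<byte> open = Avx2.CompareEqual(chunk, Vector256.Create((byte)'{'));
            Vector256<byte> close = Avx2.CompareEqual(chunk, Vector256.Create((byte)'}'));
            // MoveMask packs the top bit of each of the 32 bytes into one int.
            return (uint)Avx2.MoveMask(Avx2.Or(open, close));
        }
    }
}
```

For `{"test": 1}` copied into a zero-filled 32-byte array, `BraceMask(buf, 0)` returns `0x401`: bit 0 for the opening brace and bit 10 for the closing one. Turning 32 comparisons into one mask is the kind of work that lets simdjson skip over uninteresting bytes quickly.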

UPD: It is now also available as a .NET Standard 2.0 library of P/Invoke bindings on top of the native lib, so there are two implementations:

  1. SimdJsonSharp.Managed 1.5.0: a fully managed netcoreapp3.0 library (100% port from C++ to C#)
  2. SimdJsonSharp.Bindings 1.7.0: a netstandard2.0 library with the native lib (bindings are generated via xoofx/CppAst)

Benchmarks

The following benchmarks compare SimdJsonSharp with the .NET Core 3.0 Utf8JsonReader, Json.NET and SpanJson. The test JSON files can be found here.

1. Parse doubles

Open canada.json and parse all coordinates as System.Double:

|          Method |     fileName |    fileSize |      Mean | Ratio |
|---------------- |------------- |------------ |----------:|------:|
|        SimdJson |  canada.json | 2,251.05 Kb |  4,733 us |  1.00 |
|  Utf8JsonReader |  canada.json | 2,251.05 Kb | 56,692 us | 11.98 |
|         JsonNet |  canada.json | 2,251.05 Kb | 70,078 us | 14.81 |
|    SpanJsonUtf8 |  canada.json | 2,251.05 Kb | 54,878 us | 11.60 |

2. Count all tokens

|            Method |           fileName |    fileSize |         Mean | Ratio |
|------------------ |------------------- |------------ |-------------:|------:|
|          SimdJson | apache_builds.json |   127.28 Kb |     99.28 us |  1.00 |
|    Utf8JsonReader | apache_builds.json |   127.28 Kb |    226.42 us |  2.28 |
|           JsonNet | apache_builds.json |   127.28 Kb |    461.30 us |  4.64 |
|      SpanJsonUtf8 | apache_builds.json |   127.28 Kb |    168.08 us |  1.69 |
|                   |                    |             |              |       |
|          SimdJson |        canada.json | 2,251.05 Kb |  4,494.44 us |  1.00 |
|    Utf8JsonReader |        canada.json | 2,251.05 Kb |  6,308.01 us |  1.40 |
|           JsonNet |        canada.json | 2,251.05 Kb | 67,718.12 us | 15.06 |
|      SpanJsonUtf8 |        canada.json | 2,251.05 Kb |  6,679.82 us |  1.49 |
|                   |                    |             |              |       |
|          SimdJson |  citm_catalog.json | 1,727.20 Kb |  1,572.78 us |  1.00 |
|    Utf8JsonReader |  citm_catalog.json | 1,727.20 Kb |  3,786.10 us |  2.41 |
|           JsonNet |  citm_catalog.json | 1,727.20 Kb |  5,903.38 us |  3.75 |
|      SpanJsonUtf8 |  citm_catalog.json | 1,727.20 Kb |  3,021.13 us |  1.92 |
|                   |                    |             |              |       |
|          SimdJson | github_events.json |    65.13 Kb |     46.01 us |  1.00 |
|    Utf8JsonReader | github_events.json |    65.13 Kb |    113.80 us |  2.47 |
|           JsonNet | github_events.json |    65.13 Kb |    214.01 us |  4.65 |
|      SpanJsonUtf8 | github_events.json |    65.13 Kb |     89.09 us |  1.94 |
|                   |                    |             |              |       |
|          SimdJson |     gsoc-2018.json | 3,327.83 Kb |  2,209.42 us |  1.00 |
|    Utf8JsonReader |     gsoc-2018.json | 3,327.83 Kb |  4,010.10 us |  1.82 |
|           JsonNet |     gsoc-2018.json | 3,327.83 Kb |  6,729.44 us |  3.05 |
|      SpanJsonUtf8 |     gsoc-2018.json | 3,327.83 Kb |  2,759.59 us |  1.25 |
|                   |                    |             |              |       |
|          SimdJson |   instruments.json |   220.35 Kb |    257.78 us |  1.00 |
|    Utf8JsonReader |   instruments.json |   220.35 Kb |    594.22 us |  2.31 |
|           JsonNet |   instruments.json |   220.35 Kb |    980.42 us |  3.80 |
|      SpanJsonUtf8 |   instruments.json |   220.35 Kb |    409.47 us |  1.59 |
|                   |                    |             |              |       |
|          SimdJson |      truenull.json |    12.00 Kb |  16,032.6 ns |  1.00 |
|    Utf8JsonReader |      truenull.json |    12.00 Kb |  58,365.2 ns |  3.64 |
|           JsonNet |      truenull.json |    12.00 Kb |  60,977.3 ns |  3.80 |
|      SpanJsonUtf8 |      truenull.json |    12.00 Kb |  24,069.2 ns |  1.50 |

3. JSON minification

|                Method |           fileName |    fileSize |         Mean | Ratio |
|---------------------- |------------------- |------------ |-------------:|------:|
|  SimdJsonNoValidation | apache_builds.json |   127.28 Kb |     186.8 us |  1.00 |
|              SimdJson | apache_builds.json |   127.28 Kb |     262.5 us |  1.41 |
|               JsonNet | apache_builds.json |   127.28 Kb |   1,802.6 us |  9.65 |
|                       |                    |             |              |       |
|  SimdJsonNoValidation |        canada.json | 2,251.05 Kb |   4,130.7 us |  1.00 |
|              SimdJson |        canada.json | 2,251.05 Kb |   7,940.7 us |  1.92 |
|               JsonNet |        canada.json | 2,251.05 Kb | 181,884.0 us | 44.06 |
|                       |                    |             |              |       |
|  SimdJsonNoValidation |  citm_catalog.json | 1,727.20 Kb |   2,346.9 us |  1.00 |
|              SimdJson |  citm_catalog.json | 1,727.20 Kb |   4,064.0 us |  1.75 |
|               JsonNet |  citm_catalog.json | 1,727.20 Kb |  34,831.0 us | 14.84 |

Usage

The C# API is not stable yet. It currently mirrors the original C-style API, so using it involves some unsafe code, including pointers.

Add the NuGet package SimdJsonSharp.Managed (for .NET Core 3.0) or SimdJsonSharp.Bindings, a .NET Standard 2.0 package (.NET Framework 4.6.1+, .NET Core 2.x, etc.):

```shell
dotnet add package SimdJsonSharp.Bindings
```
or
```shell
dotnet add package SimdJsonSharp.Managed
```

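The following sample parses a file and iterates over its numeric tokens:

```csharp
using System;
using System.IO;
using SimdJsonSharp;

// Must run in an unsafe context (compile with AllowUnsafeBlocks enabled).
byte[] bytes = File.ReadAllBytes(somefile); // somefile: path to a JSON file
fixed (byte* ptr = bytes) // pin bytes while we are working on them
using (ParsedJson doc = SimdJson.ParseJson(ptr, bytes.Length))
using (var iterator = doc.CreateIterator())
{
    // Walk the parsed document token by token.
    while (iterator.MoveForward())
    {
        if (iterator.GetTokenType() == JsonTokenType.Number)
            Console.WriteLine("integer: " + iterator.GetInteger());
    }
}
```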

UPD: In SimdJsonSharp.Bindings the type names have an 'N' suffix, e.g. ParsedJsonN.

As you can see, the API looks similar to Utf8JsonReader, which was introduced in .NET Core 3.0.

It is also possible to just validate JSON or minify it (remove whitespace, etc.):

```csharp
string someJson = ...;
string minifiedJson = SimdJson.MinifyJson(someJson);
```

Requirements

  • An AVX2-enabled CPU
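The README does not say what happens when the AVX2 requirement above is not met, so a defensive check before calling the managed library may be useful. This is a minimal sketch; `SimdJsonGuard` and `EnsureSupported` are hypothetical names, while `Avx2.IsSupported` is the standard .NET Core 3.0 property that reports whether both the CPU and the runtime support AVX2.

```csharp
using System;
using System.Runtime.Intrinsics.X86;

public static class SimdJsonGuard
{
    // True when both the CPU and the runtime expose AVX2 intrinsics.
    public static bool IsSupported => Avx2.IsSupported;

    // Hypothetical guard to call before using SimdJsonSharp.Managed, so an
    // unsupported machine fails early with a clear message.
    public static void EnsureSupported()
    {
        if (!IsSupported)
            throw new PlatformNotSupportedException(
                "SimdJsonSharp.Managed requires an AVX2-enabled CPU.");
    }
}
```

On machines where `IsSupported` is false, an application could fall back to another parser such as Utf8JsonReader; that fallback is a suggestion, not something SimdJsonSharp provides.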

simdjsonsharp's People

Contributors

egorbo, tkp1n, tornhoof


simdjsonsharp's Issues

Internal Bug

I use SimdJsonSharp.Managed.

```
Exception has occurred: CLR/System.InvalidOperationException
An unhandled exception of type 'System.InvalidOperationException' occurred in SimdJsonSharp.Managed.dll: 'Internal bug'
   at SimdJsonSharp.stage1_find_marks.find_structural_bits(Byte* buf, UInt64 len, ParsedJson pj)
   at SimdJsonSharp.SimdJson.JsonParse(Byte* jsonData, UInt64 length, ParsedJson pj, Boolean reallocIfNeeded)
   at SimdJsonSharp.SimdJson.ParseJson(Byte* jsonData, Int32 length, Boolean reallocIfNeeded)
   at JsonTest.Program.Main(String[] args) in c:\Users\hez20\Desktop\JsonTest\Program.cs:line 14
```

Test code:

```csharp
using System;
using System.Text;
using SimdJsonSharp;

namespace JsonTest
{
    class Program
    {
        static unsafe void Main(string[] args)
        {
            var p = Encoding.UTF8.GetBytes("{\"test\": 1}");
            fixed (byte* ptr = p)
            {
                var json = SimdJson.ParseJson(ptr, p.Length);
            }
            return;
        }
    }
}
```

Consider upgrading bindings to simdjson 0.4.0

Version 0.4 of simdjson is now available

Highlights

  • Test coverage has been greatly improved and we have resolved many static-analysis warnings on different systems.

New features:

  • We added a fast (8GB/s) minifier that works directly on JSON strings.
  • We added a fast (10GB/s) UTF-8 validator that works directly on strings (any strings, including non-JSON).
  • The array and object elements have a constant-time size() method.

Performance:

  • Performance improvements to the API (type(), get<>()).
  • The parse_many function (ndjson) has been entirely reworked. It now uses a single secondary thread instead of several new threads.
  • We have introduced a faster UTF-8 validation algorithm (lookup3) for all kernels (ARM, x64 SSE, x64 AVX).

System support:

  • C++11 support for older compilers and systems.
  • FreeBSD support (and tests).
  • We support the clang front-end compiler (clangcl) under Visual Studio.
  • It is now possible to target ARM platforms under Visual Studio.
  • The simdjson library will never abort or print to standard output/error.

Version 0.3 of simdjson is now available

Highlights

  • Multi-Document Parsing: Read a bundle of JSON documents (ndjson) 2-4x faster than doing it individually.
  • Simplified API: The API has been completely revamped for ease of use, including a new JSON navigation API and fluent support for error code and exception styles of error handling with a single API.
  • Exact Float Parsing: Now simdjson parses floats flawlessly without any performance loss (simdjson/simdjson#558).
  • Even Faster: The fastest parser got faster! With a shiny new UTF-8 validator and meticulously refactored SIMD core, simdjson 0.3 is 15% faster than before, running at 2.5 GB/s (where 0.2 ran at 2.2 GB/s).

Minor Highlights

  • Fallback implementation: simdjson now has a non-SIMD fallback implementation, and can run even on very old 64-bit machines.
  • Automatic allocation: as part of API simplification, the parser no longer has to be preallocated; it will adjust automatically when it encounters larger files.
  • Runtime selection API: We've exposed simdjson's runtime CPU detection and implementation selection as an API, so you can tell what implementation we detected and test with other implementations.
  • Error handling your way: Whether you use exceptions or check error codes, simdjson lets you handle errors in your style. APIs that can fail return simdjson_result, letting you check the error code before using the result. But if you are more comfortable with exceptions, skip the error code and cast straight to T, and exceptions will be thrown automatically if an error happens. Use the same API either way!
  • Error chaining: We also worked to keep non-exception error-handling short and sweet. Instead of having to check the error code after every single operation, now you can chain JSON navigation calls like looking up an object field or array element, or casting to a string, so that you only have to check the error code once at the very end.

Installing via nuget

Hi,
I wanted to try out this awesome piece of code, but I'm not 100% sure why the NuGet installation fails with the following message:

```
Detected package downgrade: Microsoft.NETCore.Platforms from 3.0.0-preview3.19115.9 to 3.0.0-preview.19073.11. Reference the package directly from the project to select a different version. 
 OriginClient -> SimdJsonSharp 1.0.3 -> Microsoft.NETCore.Platforms (>= 3.0.0-preview3.19115.9) 
 OriginClient -> Microsoft.NETCore.Platforms (>= 3.0.0-preview.19073.11)
Package restore failed. Rolling back package changes for 'OriginClient'.
```

I have .NET Core 3.0.0-preview2 installed on my machine; is the package not compatible with it yet?
Thanks!

Consider updating to simdjson 0.2.0

The library simdjson has a new major release (0.2.0). Major changes:

  • Support for 64-bit ARM processors, can run under iOS (iPhone).
  • Runtime dispatching on x64 processors (switches to SSE on older x64 processors, uses AVX2 when available). Supports processors as far back as the Intel Westmere.
  • More accurate number parsing.
  • Fixes most warnings under Visual Studio.
  • Several small bugs have been fixed.
  • Better performance in some cases.
  • Introduces a JSON Pointer interface https://tools.ietf.org/html/rfc6901
  • Better and more specific error messages (with optional textual descriptions).
  • valgrind clean.
  • Unified code style (LLVM).

nuget package

Shout when this is published to nuget so we can play :)

Update to .NET Standard 2.1

Now that .NET Standard 2.1 and .NET Core 3.0 are out, do you have any plans to further optimize the library by using ArrayPool and Span<T>?

Pre AVX2

Nice work. As you know, RyuJIT can test for ISA-level support at codegen time. Do you plan to offer a SIMD code path for CPUs without AVX2? They are still fairly common, I guess.

@tannergooding
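One way to provide the non-AVX2 code path asked about above is to branch on each instruction set's `IsSupported` property, which RyuJIT evaluates at codegen time so only the usable branch survives. The following is a minimal sketch, not SimdJsonSharp code: `TieredByteCounter` is a hypothetical name, and counting a single byte value stands in for the much richer character classification a real JSON parser performs.

```csharp
using System;
using System.Numerics;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

public static class TieredByteCounter
{
    // Counts occurrences of `target` using AVX2 (32 bytes per step) when
    // available, else SSE2 (16 bytes per step), else plain scalar code.
    public static unsafe int Count(byte[] data, byte target)
    {
        int count = 0, i = 0;
        fixed (byte* p = data)
        {
            if (Avx2.IsSupported)
            {
                Vector256<byte> t = Vector256.Create(target);
                for (; i <= data.Length - 32; i += 32)
                {
                    Vector256<byte> eq = Avx2.CompareEqual(Avx.LoadVector256(p + i), t);
                    count += BitOperations.PopCount((uint)Avx2.MoveMask(eq));
                }
            }
            else if (Sse2.IsSupported)
            {
                Vector128<byte> t = Vector128.Create(target);
                for (; i <= data.Length - 16; i += 16)
                {
                    Vector128<byte> eq = Sse2.CompareEqual(Sse2.LoadVector128(p + i), t);
                    count += BitOperations.PopCount((uint)Sse2.MoveMask(eq));
                }
            }
        }
        // Scalar tail for the last partial block (or the whole input on
        // CPUs with neither SSE2 nor AVX2).
        for (; i < data.Length; i++)
            if (data[i] == target) count++;
        return count;
    }
}
```

The vector loops stop before the end of the array instead of reading past it, so this variant needs no input padding; the trade-off is a short scalar loop at the end.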

Facing a strong name build error

I'm using SimdJsonSharp.Bindings in a project targeting .NET Framework 4.0 on Windows 11 and am facing the following build error:

CSC : error CS8002: Referenced assembly 'SimdJsonSharp.Bindings, Version=1.7.0.0, Culture=neutral, PublicKeyToken=null' does not have a strong name.


Based on my investigation, this usually happens when a strong-named assembly references one or more assemblies that do not have a strong name. A solution would be to sign the package's assembly and all the assemblies it references.

This will pose a security risk for any future users. Can you please look into this?
If you need any further details to reproduce the issue, please let me know. I'd be happy to contribute and help fix it as well.

AccessViolations caused by missing padding (SIMDJSON_PADDING with AVX2)

Using the SimdJsonSharp.Managed NuGet package in a .NET 5 console application in VS2019 on Windows 10 with AVX2 support, I can reproduce an access violation with some JSON files.

After a first investigation, it looks like the expected padding (SIMDJSON_PADDING) is not applied. My memory window shows that only a few additional bytes were present after the end of my JSON byte array (probably added by debug mode).

The value of 'ptr' pointed to the selected character shown in the memory window (see the 'Autos' window, bottom left).

The decompiled line shown seems to correspond to:

```csharp
var v = Avx.LoadVector256(src);
```

If it helps, I can share the source code and the JSON.

Workaround: adding 32 bytes of padding at the end of the input byte array solved the issue.
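The padding workaround from this issue can be sketched as a small helper that copies the JSON into a larger, zero-filled array. `PaddedJsonInput` and `Pad` are hypothetical names, and the 32-byte value is an assumption taken from the report (one AVX2 register), not a constant the library documents.

```csharp
using System;

public static class PaddedJsonInput
{
    // Assumption from the issue: 32 extra bytes keep the last 256-bit
    // load (Avx.LoadVector256) inside allocated memory.
    public const int Padding = 32;

    // Returns a copy of `json` followed by `Padding` zero bytes.
    public static byte[] Pad(byte[] json)
    {
        if (json == null) throw new ArgumentNullException(nameof(json));
        var padded = new byte[json.Length + Padding];
        Buffer.BlockCopy(json, 0, padded, 0, json.Length);
        return padded;
    }
}
```

To use it, pin the padded array and pass the original JSON length, e.g. `fixed (byte* ptr = padded)` followed by `SimdJson.ParseJson(ptr, json.Length)`, so that presumably only the real document is parsed while the trailing zeros just absorb the vector over-read.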

Native Bindings Are 10x Slower

I have tried to add SimdJsonSharp to my serializer performance test suite. See https://github.com/Alois-xx/SerializerTests

When you compile it and run it on .NET Core 3.0 Preview 7:

```
D:\Source\git\SerializerTests\bin\Debug\netcoreapp3.0>SerializerTests.exe -test combined -serializer Utf8JsonSerializer,SimdJsonSharpSerializer,SimdJsonSharpSerializerN
```

| Serializer | Objects | Time to serialize in s | Time to deserialize in s | Size in bytes | FileVersion | Framework | ProjectHome | DataFormat | FormatDetails | Supports Versioning |
|---|---:|---:|---:|---:|---|---|---|---|---|---|
| SimdJsonSharpSerializer | 1000000 | 0.1310 | 0.116 | 35777803 | 1.5.0.0 | .NET Core 3.0.0-preview7-27912-14 | https://github.com/EgorBo/SimdJsonSharp based on https://github.com/lemire/simdjson | Text | Json | No |
| SimdJsonSharpSerializerN | 1000000 | 0.1117 | 3.139 | 35777803 | 1.7.0.0 | .NET Core 3.0.0-preview7-27912-14 | https://github.com/EgorBo/SimdJsonSharp based on https://github.com/lemire/simdjson | Text | Json | No |
| Utf8JsonSerializer | 1000000 | 0.1135 | 0.330 | 35777803 | 1.3.7 | .NET Core 3.0.0-preview7-27912-14 | https://github.com/neuecc/Utf8Json | Text | Json | Yes |

I find that the native version needs about 3 s to deserialize versus about 0.1 s for the managed version.

When looking at the data in PerfView, I find that most of the time is spent in paserJson. I cannot tell why this is so costly, but it looks wasteful. Can you take a look at why this is so much slower? Am I using the library the wrong way?
