jsonrepair

Repair invalid JSON documents.

Try it out in a minimal demo: https://josdejong.github.io/jsonrepair/

Use it in a full-fledged application: https://jsoneditoronline.org

Read the background article "How to fix JSON and validate it with ease"

The following issues can be fixed:

  • Add missing quotes around keys
  • Add missing escape characters
  • Add missing commas
  • Add missing closing brackets
  • Repair truncated JSON
  • Replace single quotes with double quotes
  • Replace special quote characters like “...” with regular double quotes
  • Replace special white space characters with regular spaces
  • Replace Python constants None, True, and False with null, true, and false
  • Strip trailing commas
  • Strip comments like /* ... */ and // ...
  • Strip JSONP notation like callback({ ... })
  • Strip escape characters from an escaped string like {\"stringified\": \"content\"}
  • Strip MongoDB data types like NumberLong(2) and ISODate("2012-12-19T06:01:17.171Z")
  • Concatenate strings like "long text" + "more text on next line"
  • Turn newline delimited JSON into a valid JSON array, for example:
    { "id": 1, "name": "John" }
    { "id": 2, "name": "Sarah" }
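
To make the last item concrete, here is a minimal sketch of the intended result for newline delimited JSON. It assumes that every non-empty line is already a valid JSON value, and `ndjsonToArray` is a hypothetical helper name. It only illustrates the target output and is not how jsonrepair is implemented.

```javascript
// Hypothetical helper (not part of jsonrepair): wrap newline delimited JSON
// in a JSON array. Assumes every non-empty line is a valid JSON value.
function ndjsonToArray(text) {
  const values = text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line)) // throws a SyntaxError on an invalid line

  return JSON.stringify(values)
}

const ndjson = '{ "id": 1, "name": "John" }\n{ "id": 2, "name": "Sarah" }\n'
console.log(ndjsonToArray(ndjson))
// [{"id":1,"name":"John"},{"id":2,"name":"Sarah"}]
```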
    

The jsonrepair library has streaming support and can handle infinitely large documents.

Install

$ npm install jsonrepair

Note that the lib folder contains builds for ESM, UMD, and CommonJS.

Use

Use the jsonrepair function using an ES modules import:

import { jsonrepair } from 'jsonrepair'

try {
  // The following is invalid JSON: it consists of JSON contents copied from 
  // a JavaScript code base, where the keys are missing double quotes, 
  // and strings are using single quotes:
  const json = "{name: 'John'}"
  
  const repaired = jsonrepair(json)
  
  console.log(repaired) // '{"name": "John"}'
} catch (err) {
  console.error(err)
}

Use the streaming API in Node.js:

import { createReadStream, createWriteStream } from 'node:fs'
import { pipeline } from 'node:stream'
import { jsonrepairTransform } from 'jsonrepair/stream'

const inputStream = createReadStream('./data/broken.json')
const outputStream = createWriteStream('./data/repaired.json')

pipeline(inputStream, jsonrepairTransform(), outputStream, (err) => {
  if (err) {
    console.error(err)
  } else {
    console.log('done')
  }
})

// or using .pipe() instead of pipeline():
// inputStream
//   .pipe(jsonrepairTransform())
//   .pipe(outputStream)
//   .on('error', (err) => console.error(err))
//   .on('finish', () => console.log('done'))

Use in CommonJS (not recommended):

const { jsonrepair } = require('jsonrepair')
const json = "{name: 'John'}"
console.log(jsonrepair(json)) // '{"name": "John"}'

Use with UMD in the browser (not recommended):

<script src="/node_modules/jsonrepair/lib/umd/jsonrepair.js"></script>
<script>
  const { jsonrepair } = JSONRepair
  const json = "{name: 'John'}"
  console.log(jsonrepair(json)) // '{"name": "John"}'
</script>

API

Regular API

You can use jsonrepair as a function or as a streaming transform. Broken JSON is passed to the function, and the function either returns the repaired JSON or throws a JSONRepairError exception when it encounters an issue that it cannot solve.

// @throws JSONRepairError 
jsonrepair(json: string) : string
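
Because the function throws when a document cannot be repaired, callers may want to wrap it. The following is a minimal sketch of one way to do that. `tryRepair` is a hypothetical helper, and the repair function is passed in as an argument so that the snippet runs on its own; in real code you would pass the imported `jsonrepair` instead of the stand-in.

```javascript
// Hypothetical wrapper: turns a throwing repair function into a result object.
// `repair` is expected to behave like jsonrepair: (text: string) => string,
// throwing an Error when the input cannot be repaired.
function tryRepair(repair, text) {
  try {
    return { ok: true, json: repair(text) }
  } catch (err) {
    return { ok: false, error: err }
  }
}

// Stand-in for jsonrepair so that the example is self-contained:
// it only accepts input that is already valid JSON.
function strictRepair(text) {
  JSON.parse(text) // throws a SyntaxError on invalid JSON
  return text
}

console.log(tryRepair(strictRepair, '{"name": "John"}').ok) // true
console.log(tryRepair(strictRepair, "{name: 'John'}").ok) // false (the stand-in cannot repair)
```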

Streaming API

The streaming API is available in jsonrepair/stream and can be used in a Node.js stream. It consists of a transform function that can be used in a stream pipeline.

jsonrepairTransform(options?: { chunkSize?: number, bufferSize?: number }) : Transform

The option chunkSize determines the size of the chunks that the transform outputs, and is 65536 bytes by default. Changing chunkSize can influence the performance.

The option bufferSize determines how many bytes of the input and output stream are kept in memory and is also 65536 bytes by default. This buffer is used as a "moving window" on the input and output. This is necessary because jsonrepair must look ahead or look back to see what to fix, and it must sometimes walk back the generated output, for example to insert a missing comma. The bufferSize must be larger than the length of the largest string and whitespace in the JSON data, otherwise an error is thrown when processing the data. Making bufferSize very large increases memory usage and reduces performance.
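
The "moving window" idea can be illustrated with a toy model. The sketch below is not jsonrepair's actual code: `WindowedOutput` is a hypothetical class that keeps only the last `bufferSize` characters of the output editable, flushes everything older, and throws when asked to walk back further than the window allows.

```javascript
// Toy model of a "moving window" output buffer (NOT jsonrepair's actual code).
// Text older than `bufferSize` characters is flushed and can no longer be edited.
class WindowedOutput {
  constructor(bufferSize, onFlush) {
    this.bufferSize = bufferSize
    this.onFlush = onFlush // receives text that left the window
    this.buffer = ''
    this.offset = 0 // absolute index of buffer[0] in the whole output
  }

  push(text) {
    this.buffer += text
    const excess = this.buffer.length - this.bufferSize
    if (excess > 0) {
      this.onFlush(this.buffer.slice(0, excess))
      this.buffer = this.buffer.slice(excess)
      this.offset += excess
    }
  }

  // Insert text at an absolute output position, for example a missing comma.
  insertAt(index, text) {
    if (index < this.offset) {
      throw new RangeError(`Index ${index} is no longer in the buffer`)
    }
    const local = index - this.offset
    this.buffer = this.buffer.slice(0, local) + text + this.buffer.slice(local)
  }

  end() {
    this.onFlush(this.buffer)
    this.offset += this.buffer.length
    this.buffer = ''
  }
}

let flushed = ''
const out = new WindowedOutput(8, (text) => { flushed += text })
out.push('[1 2') // still inside the window
out.insertAt(2, ',') // walk back and insert the missing comma
out.push(',3,4,5,6]') // pushes the oldest characters out of the window
out.end()
console.log(flushed) // [1, 2,3,4,5,6]
```

In this model, a later call such as `out.insertAt(2, ',')` throws a RangeError, because position 2 has already been flushed. That is the kind of trade-off a larger buffer avoids at the cost of more memory.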

Command Line Interface (CLI)

When jsonrepair is installed globally using npm, it can be used on the command line. To install jsonrepair globally:

$ npm install -g jsonrepair

Usage:

$ jsonrepair [filename] {OPTIONS}

Options:

--version, -v       Show application version
--help,    -h       Show this message
--output,  -o       Output file
--overwrite         Overwrite the input file
--buffer            Buffer size in bytes, for example 64K (default) or 1M
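
As an illustration of the size notation accepted by --buffer, here is a small sketch that converts such strings into a number of bytes. `parseSize` is a hypothetical helper, not the CLI's own parser. It assumes that K means 1024 bytes (consistent with 64K matching the documented default of 65536 bytes) and that M means 1024 * 1024 bytes.

```javascript
// Hypothetical parser for size strings such as "64K" or "1M".
// Assumption: K = 1024 bytes and M = 1024 * 1024 bytes.
function parseSize(size) {
  const match = /^(\d+)([KM]?)$/i.exec(size.trim())
  if (!match) {
    throw new Error(`Invalid size "${size}"`)
  }
  const multipliers = { '': 1, K: 1024, M: 1024 * 1024 }
  return Number(match[1]) * multipliers[match[2].toUpperCase()]
}

console.log(parseSize('64K')) // 65536
console.log(parseSize('1M')) // 1048576
```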

Example usage:

$ jsonrepair broken.json                        # Repair a file, output to console
$ jsonrepair broken.json > repaired.json        # Repair a file, output to file
$ jsonrepair broken.json --output repaired.json # Repair a file, output to file
$ jsonrepair broken.json --overwrite            # Repair a file, replace the file itself
$ cat broken.json | jsonrepair                  # Repair data from an input stream
$ cat broken.json | jsonrepair > repaired.json  # Repair data from an input stream, output to file

Alternatives:

Similar libraries:

Develop

When implementing a fix or a new feature, it is important to know that there are currently two implementations:

  • src/regular: A non-streaming implementation. The code is small and works for files up to 512MB, ideal for usage in the browser.
  • src/streaming: A streaming implementation that can be used in Node.js. The code is larger and more complex, and the implementation uses a configurable bufferSize and chunkSize. When the parsed document contains a string or number that is longer than the configured bufferSize, the library will throw an "Index out of range" error since it cannot hold the full string in the buffer. When configured with an infinite buffer size, the streaming implementation works the same as the regular implementation. In that case this out of range error cannot occur, but it makes the performance worse and the application can run out of memory when repairing large documents.
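
Since the streaming implementation fails when a single string does not fit in the buffer, it can help to estimate the longest string or whitespace run in a representative sample before choosing a bufferSize. The sketch below is a rough heuristic under stated assumptions: `longestStringOrWhitespace` is a hypothetical helper, it only recognizes double-quoted strings with backslash escapes, it ignores numbers, and it counts characters rather than bytes.

```javascript
// Hypothetical helper: length of the longest double-quoted string or
// whitespace run in `text`. Assumes strings use double quotes and
// backslash escapes; real broken input may violate that.
function longestStringOrWhitespace(text) {
  let longest = 0
  let i = 0
  while (i < text.length) {
    const start = i
    if (text[i] === '"') {
      i++ // skip the opening quote
      while (i < text.length && text[i] !== '"') {
        i += text[i] === '\\' ? 2 : 1 // skip an escaped character as a pair
      }
      i++ // skip the closing quote (or run past the end when truncated)
    } else if (/\s/.test(text[i])) {
      while (i < text.length && /\s/.test(text[i])) i++
    } else {
      i++
      continue
    }
    longest = Math.max(longest, Math.min(i, text.length) - start)
  }
  return longest
}

const sample = '{"a": "hello",     "b": 2}'
console.log(longestStringOrWhitespace(sample)) // 7, the length of "hello" including quotes
```

A caller could then pick a buffer size with some headroom, for example `Math.max(65536, 2 * longestStringOrWhitespace(sample))`. Whether a sample is representative of the full document is up to the caller.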

Both implementations are tested against the same suite of unit tests in src/index.test.ts.

To build the library (ESM, CommonJS, and UMD output in the folder lib):

$ npm install 
$ npm run build

To run the unit tests:

$ npm test

To run the linter (eslint):

$ npm run lint

To automatically fix linter issues:

$ npm run format

To run the linter, build all, and run unit tests and integration tests:

$ npm run build-and-test

Release

To release a new version:

$ npm run release

This will:

  • lint
  • test
  • build
  • increment the version number
  • push the changes to git, add a git version tag
  • publish the npm package

To try the build and see the change list without actually publishing:

$ npm run release-dry-run

License

Released under the ISC license.


jsonrepair's Issues

Return a list of fixes to original JSON

Thanks for the nice library.
jsonrepair returns a single value. To keep the API, I suggest adding a new function, e.g. jsonrepairer, which returns the fixed JSON and an array of the fixes the library performed on the original JSON.
The list could later be inspected by anyone interested in possible data corruption.

Colon expected error on valid json (version 3.2.3)

jsonrepair.jsonrepair('{"test": "hello\n\nworld"}')

Uncaught JSONRepairError: Colon expected at position 23
    at throwColonExpected (/Users/okomarov/Documents/repos/myrepo/node_modules/.pnpm/jsonrepair@3.2.3/node_modules/jsonrepair/lib/cjs/jsonrepair.js:543:11)

For context, version 3.2.0 does not throw.

Strip out commas after map open

Example:

{, test: "abc" }

Commas should also be stripped out if they occur after array open.

Google's Gemini is doing this sometimes (with JSON mode enabled).

Repair wrapped lines copied from a terminal

Terminal output typically is wrapped at for example the 80th character. It may be possible to detect this line wrapping pattern, and try out if removing these returns result in valid JSON to get it repaired.
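
A minimal sketch of this idea, under stated assumptions: `unwrapTerminalLines` is a hypothetical helper and not an existing jsonrepair feature. It treats the input as wrapped only when every line except the last has exactly `width` characters, and it accepts the joined text only if it parses as valid JSON.

```javascript
// Hypothetical sketch (not an existing jsonrepair feature):
// if every line except the last has exactly `width` characters, assume the
// line breaks were inserted by the terminal and try parsing without them.
function unwrapTerminalLines(text, width = 80) {
  const lines = text.split(/\r?\n/)
  const wrapped =
    lines.length > 1 && lines.slice(0, -1).every((line) => line.length === width)
  if (!wrapped) {
    return text
  }
  const joined = lines.join('')
  try {
    JSON.parse(joined)
    return joined // removing the line breaks yields valid JSON
  } catch (err) {
    return text // still invalid after joining: leave the input untouched
  }
}

// A string value wrapped at the 10th character:
console.log(unwrapTerminalLines('{"a":"0123\n456789"}', 10))
// {"a":"0123456789"}
```

A real implementation would probably hand the joined text to the repair logic instead of requiring it to be strictly valid, since wrapped terminal output may contain other issues as well.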

Version >=3.6.0 fails to parse RegEx object values

Hey everyone,

we noticed that from 3.6.0 upwards, jsonrepair runs into an error when feeding it with a config file we are using.
In this config file, one of the object values is a RegEx Pattern, as this is expected by the framework (Vitest) which uses this config.
The raw, 'unrepaired' json looks something like this:

       {
            ...
            foo: ['default', 'some-other-reporter'],
            bar: './some/file/location.xml',
            baz: ['./another/file/path.ts'],
            css: { 
                      include: /standalone-styles.css/   // this fails starting from 3.6.0
                  },
            ...
        },

Adding quotes around /standalone-styles.css/ is not possible, as this would break the RegEx. So it has to be written as it is.

I tested this with several versions:

  • Unaffected: 3.1.0 - 3.5.0
  • Affected: 3.6.0, 3.6.1

The error we get is Unexpected character "/" at position 359

For now, we just refrain from using newer versions. Hope to see a fix for this, though.

Thanks in advance and best of regards,
Tobias

Wrap jsonrepair as VSCode-Extension?

It would be super cool if we could fix JSON files directly within VSCode. Have you considered creating an extension yet?
Would that be something that you would be interested in, or would it rather be a community contribution? :)

Improve repairing missing end quotes of strings containing delimiters like comma or single quote

For example, in playground (https://josdejong.github.io/jsonrepair/) :

{
  "message": "This test isn't successful

or

{
  "message": "This test is not successful, right?

both fail.
(strings containing a comma or a single quote, without an ending double quote)

My personal fix is the following:

  import { jsonrepair } from 'jsonrepair'

  // uglyJson holds the raw, possibly truncated JSON text
  let cleanJson = uglyJson.trim();
  const lastDoubleQuote = cleanJson.lastIndexOf('"');
  const lastSimpleQuote = cleanJson.lastIndexOf("'");
  const lastComma = cleanJson.lastIndexOf(',');
  // jsonrepair fails when the JSON ends with a text containing a simple quote
  // or a comma, without an ending double quote.
  // The + 1 avoids catching a property separator comma like `", "`
  if (lastSimpleQuote > lastDoubleQuote || lastComma > lastDoubleQuote + 1) {
    cleanJson = `${cleanJson}"`; // prepare the ugly JSON with an ending double quote
  }
  cleanJson = jsonrepair(cleanJson);

Invalid string length

RangeError: Invalid string length
at ReadStream.<anonymous> (file:///C:/Users/username/AppData/Roaming/npm/node_modules/jsonrepair/bin/cli.js:54:15)
at ReadStream.emit (node:events:513:28)
at addChunk (node:internal/streams/readable:315:12)
at readableAddChunk (node:internal/streams/readable:289:9)
at ReadStream.Readable.push (node:internal/streams/readable:228:10)
at node:internal/fs/streams:279:14
at FSReqCallback.wrapper [as oncomplete] (node:fs:675:5)

I was trying to open a JSON file that is 12.5GB large but it seems like it can't handle it or I might be missing something?

this is the command I ran in the terminal:
jsonrepair large.json > new_large.json

Strip escape characters from an escaped string using two double quotes as escape

In many programming languages, a backslash is used to escape quotes in a string, like:

"{\"stringified\": \"content\"}"

Some languages, though, use a pair of double quotes to escape, such as batch scripts and C# verbatim string literals. For example:

"{""stringified"": ""content""}"

The jsonrepair library already can repair escaped contents using backslash. It would be nice if it can also repair pairs of double quotes.
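
Until such a repair exists, a pre-processing step along these lines might serve as a workaround. This is a sketch under stated assumptions: `unescapeDoubledQuotes` is a hypothetical helper, not part of jsonrepair, and it assumes the whole input is one quoted string in which every doubled quote is an escaped quote.

```javascript
// Hypothetical pre-processing step (not part of jsonrepair): unwrap a string
// in which quotes are escaped by doubling them.
function unescapeDoubledQuotes(text) {
  const trimmed = text.trim()
  if (trimmed.length < 2 || !trimmed.startsWith('"') || !trimmed.endsWith('"')) {
    return text // not wrapped in quotes: nothing to do
  }
  return trimmed.slice(1, -1).replace(/""/g, '"')
}

console.log(unescapeDoubledQuotes('"{""stringified"": ""content""}"'))
// {"stringified": "content"}
```

Note that the notation is ambiguous for empty strings: an empty JSON string value shows up as four consecutive quotes, which this replacement handles, but a plain JSON document that merely starts and ends with a quote would be changed incorrectly.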

Error position in function expectDigit

Hi, there

The position value of JSONRepairError in function expectDigit should be i, instead of 2, right?

  function expectDigit(start: number) {
    if (!isDigit(text.charCodeAt(i))) {
      const numSoFar = text.slice(start, i)
      throw new JSONRepairError(`Invalid number '${numSoFar}', expecting a digit ${got()}`, 2)
    }
  }

Not working in Edge Runtime

Do you know why Next.JS / Vercel would throw this error? I couldn't find relevant code on a quick pass

./node_modules/jsonrepair/lib/esm/JSONRepairError.js

17:53:17.733 | Dynamic Code Evaluation (e. g. 'eval', 'new Function', 'WebAssembly.compile') not allowed in Edge Runtime

Missing types definition

Hello

I am attempting to install types in my TS app using npm install --save-dev @types/jsonrepair but I am facing the following issue:

npm ERR! code E404
npm ERR! 404 Not Found - GET https://registry.npmjs.org/@types%2fjsonrepair - Not found
npm ERR! 404
npm ERR! 404 '@types/jsonrepair@*' is not in this registry.

Am I doing something wrong?

Strange behavior with comment and parentheses

Hello,

Thanks for your library, it is very useful.
I work on a project where the JSON is very dirty, and I have a problem when a comment comes after a line containing a parenthesis.

Sample:

{
"total_count" : 1,
"pos" : 0,
"rows":[
{
  "id" : "111111",
  "data" : [
     "*",
     "1111",
     "<a )'></a>"
     // comment 1
   ]
}
]
	
}

If you remove the line // comment 1, the library validates the JSON. Removing the parenthesis in the line "<a )'></a>" (so that it becomes "<a '></a>") works too.

Do you have any idea for a workaround? Maybe it is a bug?

Thanks

Improve repairing of truncated strings

Hello!
First of all, thanks a lot for such a great library, really helps me out.

While I was playing with it recently I noticed a nasty bug that can't let me continue working on my project.
The bug is that the comma inside a string is interpreted as a comma in JSON, so that the next word becomes a property of an object.

For you to get a little bit more context, I'm providing an example:

{"subject":"Take-Home Assignment Enhancements","text":"Hello Sergey,I hop

This string above as an input generates the following:

{"subject":"Take-Home Assignment Enhancements","text":"Hello Sergey","I hop":null}

Thanks for your help in advance!

Doesn't strip unnecessary white space

I'm on my phone so apologies for any odd formatting but note the " keywords" key has a leading space that doesn't get fixed.

{
"type": "header-block",
"design": 0,
"values": {
options: {
background: "red",
alignment: 2,
}
}
" keywords": "nice, yeah"
}

Adding missing escape for double quote not working

I have the following JSON:
{ "text": "I want to buy 65" television" }

I would expect that the output will be:
{ "text": "I want to buy 65\\" television" }

However, in the playground I'm getting the following error:
Colon expected at position 45

Would love to see a CLI

Hi, I'm generating json files with Hygen JS, and I'd love to be able to use this as a CLI.

Basically being able to run scripts like > jsonrepair ./package.json

Is that something that could happen?

Missing end quote adds line break rather than adding missing quote

Suppose I accidentally remove a quotation mark from the end of a value (like at the end of Fiber on line 1):

{
  "customerType": "Prepaid/Postpaid/Fiber,
  "emailAddress": "[email protected]",
  "isknown": "1",
  "phoneNumber": "123456"
}

The error message is

Bad control character in string literal in JSON at line 2 column 0

The auto repair will add a line break and wrap around the next quotation mark rather than just add the missing quotation as would seem like the sensible fix. So we end up with the following:

{
  "customerType": "Prepaid/Postpaid/Fiber,\n  ","emailAddress":": ","[email protected]":",\n  ","isknown":": ","1":",\n  ","phoneNumber":": ","123456":"\n}"}

Is there a way to fix this issue? Alternatively, is there a way to turn off the auto repair function for certain or all errors, but still show the user that there is an error?

Make jsonrepair accept an options object allowing to finetune what to fix or not

In one project I would like to fix missing quotes, commas, etc... but leave comments as later I'm using a parser that accepts comments.

It would be nice to be able to call something like jsonrepair(json_with_comments, {trimComments: false}) and have it fix everything except stripping comments.

Similarly it could accept options to disable every fixing rules available and have it fix everything by default like it works right now.

"Colon expected at position" error

I saw that someone else reported an error, but it had to do with a left quote. Recently, I went to repair a validation schema and ran into the same, but it's with a regular quote inside of a single quoted string. It didn't matter if it was escaped (\ or ) or not - it always results in "colon expected at XX". Here's a small example:

{
  properties: {
    name: {
      type: "string",
      isRequired: true,
      pattern: '[^<>\\/\\?:;"\\[\\]\\{\\}\\|~`!@#$%\\^\\*=+]*'
    }
  }
}

If you remove the quote from the regex pattern, everything works as expected.

Error: Colon expected at position...

Hi there.

When running jsonrepair from CLI (using laravel homestead 14.2.2) I get "Error: Colon expected at position" for any JSON entered which requires repair. Error does not fire when JSON is valid.

I've tested multiple broken JSON in other tools using this library (https://josdejong.github.io/jsonrepair/) and it works every time.

Is there a known CLI issue? Is the JSON processed in some other way when not using CLI?

Cheers.

two or more commas in a string breaks parsing

This is working:

{"translatedText":"I've Testing a bit, testing more comma

But this is not working:

{"translatedText":"I've Testing a bit, testing more commas, Not working

Error is: "Colon expected at position 58"

Use case is streaming tool call from chatGPT

throwInvalidUnicodeCharacter loops forever if input ends with part of unicode char

This happened for me pretty frequently when parsing streaming result from chatGPT and it ends with one half emoji. Easy to test with:

const testString = '{"s \\ud'; // the backslash is doubled so that this is a valid JavaScript string literal

Suggested solution:

function throwInvalidUnicodeCharacter(start: number) {
  let end = start + 2;
  const maxUnicodeLength = 6;  // Maximum length of Unicode escape sequences

  while (end - start <= maxUnicodeLength && /\w/.test(text[end])) {
    end++;
  }

  const chars = text.slice(start, end);
  throw new JSONRepairError(`Invalid unicode character "${chars}"`, i);
}

Or something similar

Python Integration

I have been using JSONRepair via a Linux command line. I would like to use it with python, however, I have not been able to despite various attempts. Could you provide some guidance?

Thank you in advance

Nodejs?

import jsonrepair from 'jsonrepair'
^^^^^^

SyntaxError: Cannot use import statement outside a module

Code:

import jsonrepair from 'jsonrepair'

// The following is invalid JSON: it consists of JSON contents copied from
// a JavaScript code base, where the keys are missing double quotes,
// and strings are using single quotes:
const json = "{name: 'John'}"

const repaired = jsonrepair(json)
console.log(repaired) // '{"name": "John"}'
