jsonmerge's Introduction

Merge a series of JSON documents

This Python module allows you to merge a series of JSON documents into a single one.

This problem often occurs for example when different authors fill in different parts of a common document and you need to construct a document that includes contributions from all the authors. It also helps when dealing with consecutive versions of a document where different fields get updated over time.

Consider a trivial example with two documents:

>>> base = {
...         "foo": 1,
...         "bar": [ "one" ],
...      }

>>> head = {
...         "bar": [ "two" ],
...         "baz": "Hello, world!"
...     }

We call the document we are merging changes into base and the changed document head. To merge these two documents using jsonmerge:

>>> from pprint import pprint

>>> from jsonmerge import merge
>>> result = merge(base, head)

>>> pprint(result, width=40)
{'bar': ['two'],
 'baz': 'Hello, world!',
 'foo': 1}

As you can see, when encountering a JSON object, jsonmerge by default returns fields that appear in either the base or the head document. For other JSON types, it simply replaces the older value. The same principles apply recursively to nested JSON objects.

In a more realistic use case, however, you might want to apply different merge strategies to different parts of the document. You can tell jsonmerge how to do that using a syntax based on JSON Schema.

If you already have schemas for your documents, you can simply extend them with some additional keywords. Apart from the custom keywords described below, jsonmerge by default uses the schema syntax defined in Draft 4 of the JSON Schema specification.

You use the mergeStrategy schema keyword to specify the strategy. The two default strategies mentioned above are called objectMerge for objects and overwrite for all other types.

Let's say you want to specify that the merged bar field in the example document above should contain elements from all documents, not just the latest one. You can do this with a schema like this:

>>> schema = {
...             "properties": {
...                 "bar": {
...                     "mergeStrategy": "append"
...                 }
...             }
...         }

>>> from jsonmerge import Merger
>>> merger = Merger(schema)
>>> result = merger.merge(base, head)

>>> pprint(result, width=40)
{'bar': ['one', 'two'],
 'baz': 'Hello, world!',
 'foo': 1}

Another common example is when you need to keep a versioned list of values that appeared in the series of documents:

>>> schema = {
...             "properties": {
...                 "foo": {
...                     "type": "object",
...                     "mergeStrategy": "version",
...                     "mergeOptions": { "limit": 5 }
...                 }
...             },
...             "additionalProperties": False
...         }
>>> from jsonmerge import Merger
>>> merger = Merger(schema)

>>> rev1 = {
...     'foo': {
...         'greeting': 'Hello, World!'
...     }
... }

>>> rev2 = {
...     'foo': {
...         'greeting': 'Howdy, World!'
...     }
... }

>>> base = None
>>> base = merger.merge(base, rev1, merge_options={
...                     'version': {
...                         'metadata': {
...                             'revision': 1
...                         }
...                     }
...                 })
>>> base = merger.merge(base, rev2, merge_options={
...                     'version': {
...                         'metadata': {
...                             'revision': 2
...                         }
...                     }
...                 })
>>> pprint(base, width=55)
{'foo': [{'revision': 1,
          'value': {'greeting': 'Hello, World!'}},
         {'revision': 2,
          'value': {'greeting': 'Howdy, World!'}}]}

Note that we use the mergeOptions keyword in the schema to supply additional options to the merge strategy. In this case, we tell the version strategy to retain only the 5 most recent versions of this field.

We also used the merge_options argument to supply some options that are specific to each call of the merge method. Options specified this way are applied to all invocations of a specific strategy in a schema (in contrast to mergeOptions, which applies only to the strategy invocation in that specific location in the schema). Options specified in mergeOptions schema keyword override the options specified in the merge_options argument.

The metadata option for the version strategy can contain some document meta-data that is included for each version of the field. metadata can contain an arbitrary JSON object.

The example above also demonstrates how jsonmerge is typically used when merging more than two documents: you start with an empty base and then consecutively merge different heads into it.

A common source of problems is documents that do not match the schema used for merging. jsonmerge by itself does not validate input documents. It only uses the schema to obtain the information necessary to apply the appropriate merge strategies. Since the default strategies are used for parts of the document that are not covered by the schema, it is easy to get unexpected output without any obvious errors raised by jsonmerge.

In the following example, the property Foo (uppercase F) does not match foo (lowercase f) in the schema and hence the version strategy is not applied as it was with the previous two revisions:

>>> rev3 = {
...     'Foo': {
...         'greeting': 'Howdy, World!'
...     }
... }

>>> base = merger.merge(base, rev3, merge_options={
...                     'version': {
...                         'metadata': {
...                             'revision': 3
...                         }
...                     }
...                 })

>>> pprint(base, width=55)
{'Foo': {'greeting': 'Howdy, World!'},
 'foo': [{'revision': 1,
          'value': {'greeting': 'Hello, World!'}},
         {'revision': 2,
          'value': {'greeting': 'Howdy, World!'}}]}

Hence it is recommended to validate the input documents against the schema before passing them to jsonmerge. This practice is even more effective if the schema is filled in with more information than strictly necessary for jsonmerge (e.g. adding information about types, restricting valid object properties with additionalProperties, etc.):

>>> from jsonschema import validate
>>> validate(rev1, schema)
>>> validate(rev2, schema)
>>> validate(rev3, schema)
Traceback (most recent call last):
    ...
jsonschema.exceptions.ValidationError: Additional properties are not allowed ('Foo' was unexpected)

If you care about well-formedness of your documents, you might also want to obtain a schema for the documents that the merge method creates. jsonmerge provides a way to automatically generate it from a schema for the input document:

>>> result_schema = merger.get_schema()

>>> pprint(result_schema, width=80)
{'additionalProperties': False,
 'properties': {'foo': {'items': {'properties': {'value': {'type': 'object'}}},
                        'maxItems': 5,
                        'type': 'array'}}}

Note that because of the version strategy, the type of the foo field changed from object to array.

Merge strategies

These are the currently implemented merge strategies.

overwrite
Overwrite the value in base with the value in head. Works with any type.
discard

Keep the value in base, even if head contains a different value. Works with any type.

By default, if base does not contain any value (i.e. that part of the document is undefined), the value after merge is kept undefined. This can be changed with the keepIfUndef option. If this option is true, then the value from head will be retained in this case. This is useful if you are merging a series of documents and want to keep the value that first appears in the series, but want to discard further modifications.
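
For example, here is a minimal sketch of the keepIfUndef option; the owner field and its values are hypothetical:

from jsonmerge import Merger

schema = {
    "properties": {
        "owner": {
            "mergeStrategy": "discard",
            "mergeOptions": {"keepIfUndef": True}
        }
    }
}

merger = Merger(schema)

base = merger.merge(None, {"owner": "alice"})  # owner is undefined in base, so the head value is kept
base = merger.merge(base, {"owner": "bob"})    # a later change to owner is discarded

# base should now retain the first value seen, i.e. {'owner': 'alice'}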

append

Append arrays. Works only with arrays.

You can specify a sortByRef merge option to indicate the key that will be used to sort the items in the array. This option can be an arbitrary JSON pointer. When resolving the pointer the root is placed at the root of the array item. Sort order can be reversed by setting the sortReverse option.
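
For example, here is a minimal sketch of append with sorting; the tags field and its contents are hypothetical:

from pprint import pprint
from jsonmerge import Merger

schema = {
    "properties": {
        "tags": {
            "mergeStrategy": "append",
            "mergeOptions": {"sortByRef": "/name"}
        }
    }
}

merger = Merger(schema)
base = {"tags": [{"name": "zeta"}]}
head = {"tags": [{"name": "alpha"}]}

pprint(merger.merge(base, head), width=40)
# The appended array should come out sorted by name:
# {'tags': [{'name': 'alpha'}, {'name': 'zeta'}]}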

arrayMergeById

Merge arrays, identifying items to be merged by an ID field. Resulting arrays have items from both base and head arrays. Any items that have an identical ID are merged based on the strategy specified further down in the hierarchy.

By default, array items are expected to be objects and ID of the item is obtained from the id property of the object.

You can specify an arbitrary JSON pointer to point to the ID of the item using the idRef merge option. When resolving the pointer, document root is placed at the root of the array item (e.g. by default, idRef is '/id'). You can also set idRef to '/' to treat an array of integers or strings as a set of unique values.

Array items in head for which the ID cannot be identified (e.g. idRef pointer is invalid) are ignored.

You can specify an additional item ID to be ignored using the ignoreId merge option.

A compound ID can be specified by setting idRef to an array of pointers. In that case, if any pointer in the array is invalid for an object in head, the object is ignored. If using an array for idRef and if ignoreId option is also defined, ignoreId must be an array as well.

You can specify a sortByRef merge option to indicate the key that will be used to sort the items in the array. This option can be an arbitrary JSON pointer. The pointer is resolved in the same way as idRef. Sort order can be reversed by setting the sortReverse option.
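
For example, here is a minimal sketch of arrayMergeById using the default idRef of '/id'; the widgets field and its contents are hypothetical:

from pprint import pprint
from jsonmerge import Merger

schema = {
    "properties": {
        "widgets": {
            "mergeStrategy": "arrayMergeById"
        }
    }
}

merger = Merger(schema)
base = {"widgets": [{"id": 1, "color": "red"}, {"id": 2, "color": "blue"}]}
head = {"widgets": [{"id": 2, "color": "green"}, {"id": 3, "color": "black"}]}

pprint(merger.merge(base, head), width=60)
# Items with matching IDs are merged and new items are added, so the result
# should contain ids 1 (red), 2 (now green) and 3 (black).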

arrayMergeByIndex
Merge array items by their index in the array. Similarly to arrayMergeById strategy, the resulting arrays have items from both base and head arrays. Items that occur at identical positions in both arrays will be merged based on the strategy specified further down in the hierarchy.
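
For example, here is a minimal sketch of arrayMergeByIndex with hypothetical data:

from pprint import pprint
from jsonmerge import Merger

schema = {
    "properties": {
        "points": {
            "mergeStrategy": "arrayMergeByIndex"
        }
    }
}

merger = Merger(schema)
base = {"points": [{"x": 1}, {"x": 2}]}
head = {"points": [{"y": 3}]}

pprint(merger.merge(base, head), width=60)
# Items at the same index are merged, so the result should be
# {'points': [{'x': 1, 'y': 3}, {'x': 2}]}
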
objectMerge

Merge objects. Resulting objects have properties from both base and head. Any properties that are present both in base and head are merged based on the strategy specified further down in the hierarchy (e.g. in properties, patternProperties or additionalProperties schema keywords).

The objClass option allows one to request a different dictionary class to be used to hold the JSON object. The possible values are names that correspond to specific Python classes. Built-in names include OrderedDict, to use the collections.OrderedDict class, or dict, which uses Python's built-in dict. If not specified, dict is used by default.

Note that additional classes or a different default can be configured via the Merger() constructor (see below).
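
For example, here is a minimal sketch requesting OrderedDict containers for a hypothetical config object:

from jsonmerge import Merger

schema = {
    "properties": {
        "config": {
            "mergeStrategy": "objectMerge",
            "mergeOptions": {"objClass": "OrderedDict"}
        }
    }
}

merger = Merger(schema)
result = merger.merge({"config": {"a": 1}}, {"config": {"b": 2}})

print(type(result["config"]))  # expected: <class 'collections.OrderedDict'>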

version

Changes the type of the value to an array. New values are appended to the array in the form of an object with a value property. This way all values seen during the merge are preserved.

You can add additional properties to the appended object using the metadata option. Additionally, you can use the metadataSchema option to specify the schema for the object in the metadata option.

You can limit the length of the list using the limit option in the mergeOptions keyword.

By default, if a head document contains the same value as the base document, no new version will be appended. You can change this by setting the ignoreDups option to false.
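
For example, here is a minimal sketch of ignoreDups with a hypothetical status field:

from pprint import pprint
from jsonmerge import Merger

schema = {
    "properties": {
        "status": {
            "mergeStrategy": "version",
            "mergeOptions": {"ignoreDups": False}
        }
    }
}

merger = Merger(schema)
base = merger.merge(None, {"status": "ok"})
base = merger.merge(base, {"status": "ok"})

pprint(base, width=60)
# With ignoreDups disabled, both identical values should be recorded:
# {'status': [{'value': 'ok'}, {'value': 'ok'}]}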

If a merge strategy is not specified in the schema, objectMerge is used for objects and overwrite for all other values (but see also the section below regarding keywords that apply subschemas).

You can implement your own strategies by making subclasses of jsonmerge.strategies.Strategy and passing them to the Merger() constructor (see below).
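
For illustration, here is a minimal sketch of a custom strategy that appends only items not already present in base. The ArrayStrategy helper and the _merge()/get_schema() signatures are assumptions modeled on the bundled array strategies and may differ between versions, so consult the Strategy docstrings for the exact interface:

from jsonmerge import Merger
from jsonmerge.strategies import ArrayStrategy  # assumed helper base class

class DedupAppend(ArrayStrategy):
    """Append items from head that are not already present in base."""

    def _merge(self, walk, base, head, schema,
               sortByRef=None, sortReverse=None, **kwargs):
        new_array = list(base.val)
        for item in head.val:
            if item not in new_array:
                new_array.append(item)
        base.val = new_array
        self.sort_array(walk, base, sortByRef, sortReverse)
        return base

    def get_schema(self, walk, schema, **kwargs):
        return schema

schema = {"properties": {"foo": {"mergeStrategy": "dedupAppend"}}}
merger = Merger(schema, strategies={"dedupAppend": DedupAppend()})

print(merger.merge({"foo": [1, 2, 3]}, {"foo": [2, 4]}))
# expected: {'foo': [1, 2, 3, 4]}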

The Merger Class

The Merger class allows you to further customize the merging of JSON data by allowing you to:

  • set the schema containing the merge strategy configuration,
  • provide additional strategy implementations,
  • set a default class to use for holding JSON object data and
  • configure additional JSON object classes selectable via the objClass merge option.

The Merger constructor takes the following arguments (all optional, except schema); a brief usage sketch follows the list:

schema
The JSON Schema that contains the merge strategy directives provided as a JSON object. An empty dictionary should be provided if no strategy configuration is needed.
strategies
A dictionary mapping strategy names to instances of Strategy classes. These will be combined with the built-in strategies (overriding them with the instances having the same name).
objclass_def
The name of a supported dictionary-like class to hold JSON data by default in the merged result. The name must match a built-in name or one provided in the objclass_menu parameter.
objclass_menu
A dictionary providing additional classes to use as JSON object containers. The keys are names that can be used as values for the objectMerge strategy's objClass option or the objclass_def argument. Each value is a function or class that produces an instance of the JSON object container. It must support an optional dictionary-like object as a parameter which initializes its contents.
validatorclass
A jsonschema.Validator subclass. This can be used to specify which JSON Schema draft version will be used during merge. Some details such as reference resolution are different between versions. By default, the Draft 4 validator is used.
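
For example, here is a minimal sketch of constructing a customized Merger; the empty schema and the chosen parameter values are illustrative only:

from collections import OrderedDict
from jsonmerge import Merger

schema = {}  # no merge strategy configuration needed

merger = Merger(
    schema,
    objclass_def="OrderedDict",             # hold merged JSON objects in OrderedDict by default
    objclass_menu={"mydict": OrderedDict},  # an extra container name selectable via objClass
)

result = merger.merge({"b": 2}, {"a": 1})
print(type(result))  # expected: <class 'collections.OrderedDict'>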

Support for keywords that apply subschemas

Complex merging of documents with schemas that use keywords allOf, anyOf and oneOf can be problematic. Such documents do not have a well-defined type and might require merging of two values of different types, which will fail for some strategies. In such cases get_schema() might also return schemas that never validate.

The overwrite strategy is usually the safest choice for such schemas.

If you explicitly define a merge strategy at the same level as an allOf, anyOf or oneOf keyword, then jsonmerge will use the defined strategy and not further process any subschemas under those keywords. The strategy will, however, descend as usual (e.g. objectMerge will take into account subschemas under the properties keyword at the same level as allOf).

If a merge strategy is not explicitly defined and an allOf or anyOf keyword is present, jsonmerge will raise an error.

If a merge strategy is not explicitly defined and a oneOf keyword is present, jsonmerge will continue on the branch of oneOf that validates both base and head. If no branch validates, it will raise an error.
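
For example, here is a minimal sketch of pinning an explicit strategy next to a oneOf keyword (the value field is hypothetical), so that jsonmerge does not have to pick a branch itself:

from jsonmerge import Merger

schema = {
    "properties": {
        "value": {
            "mergeStrategy": "overwrite",
            "oneOf": [
                {"type": "string"},
                {"type": "number"}
            ]
        }
    }
}

merger = Merger(schema)
print(merger.merge({"value": "one"}, {"value": 2}))
# expected: {'value': 2} -- the explicit overwrite strategy is used and the
# oneOf subschemas are not processed further.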

You can define more complex behaviors by writing your own strategy that specifies what to do in such cases. See the docstring documentation of the Strategy class on how to do that.

Security considerations

A JSON schema document can contain $ref references to external schemas. jsonmerge resolves URIs in these references using the mechanisms provided by the jsonschema module. External references can cause HTTP or similar network requests to be performed.

If jsonmerge is used on untrusted input, this may lead to vulnerabilities similar to the XML External Entity (XXE) attack.

Requirements

jsonmerge supports Python 2 (2.7) and Python 3 (3.5 and newer).

You need the jsonschema module (https://pypi.python.org/pypi/jsonschema) installed.

Installation

To install the latest jsonmerge release from the Python package index:

pip install jsonmerge

Source

The latest development version is available on GitHub: https://github.com/avian2/jsonmerge

To install from source, run the following from the top of the source distribution:

pip install .

jsonmerge uses Tox for testing. To run the test suite, run:

tox

Troubleshooting

The most common problem with jsonmerge is getting unexpected results from a merge. Finding the exact reason why jsonmerge produced a particular result can be complicated, especially when head and base structures are very large. Most often the cause is a problem with either the schema or head and base that is passed to jsonmerge, not a bug in jsonmerge itself.

Here are some tips for debugging issues with jsonmerge:

  • Try to minimize the problem. Prune branches of head and base structures that are not relevant to your issue and re-run the merge. Often just getting a clearer view of the relevant parts exposes the problem.
  • jsonmerge uses the standard Python logging module to print out what it is doing during the merge. You need to increase verbosity to the DEBUG level to see the messages (a minimal snippet for enabling this is shown after this list).
  • A very common mistake is misunderstanding which part of the schema applies to which part of the head and base structures. Debug logs mentioned in the previous point can be very helpful with that, since they show how merge descends into hierarchies of all involved structures and when a default strategy is used.
  • With large head and base it's common that parts of them are not what you think they are. Validate your inputs against your schema using the jsonschema library before passing them onto jsonmerge. Make sure your schema is restrictive enough.
  • Pay special attention to parts of the schema that use oneOf, anyOf, allOf keywords. These can sometimes validate in unexpected ways.
  • Another problem point can be $ref pointers if they cause recursion. Using recursive schemas with jsonmerge is fine, but they can often produce unexpected results.
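
Here is a minimal sketch for enabling the debug output mentioned above; configuring the root logger is enough, though you could also target logging.getLogger('jsonmerge') specifically (the logger name is an assumption based on the module name):

import logging

# Show DEBUG messages from all loggers, including jsonmerge's.
logging.basicConfig(level=logging.DEBUG)

from jsonmerge import merge
merge({"foo": 1}, {"foo": 2})  # the descent and strategy choices are now logged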

Reporting bugs and contributing code

Thank you for contributing to jsonmerge! Free software wouldn't be possible without contributions from users like you. However, please consider that I maintain this project in my free time. Hence I ask you to follow this simple etiquette to minimize the amount of effort needed to include your contribution.

Please use GitHub issues to report bugs.

Before reporting the bug, please make sure that:

  • You've read this entire README file.
  • You've read the Troubleshooting section of the README file.
  • You've checked existing issues to see whether the bug has already been reported.

Make sure that your report includes:

  • A minimal, but complete, code example that reproduces the problem, including any JSON data required to run it. It should be something I can copy-paste into a .py file and run.
  • Relevant versions of jsonmerge and jsonschema: either the release number on PyPI or the git commit hash.
  • A copy of the traceback, in case you are reporting an unhandled exception.
  • An example of what you think the correct output should be, in case you are reporting a wrong result of a merge or schema generation.

Please use GitHub pull requests to contribute code. Make sure that your pull request:

  • Passes all existing tests and includes new tests that cover added code.
  • Updates README.rst to document added functionality.

License

Copyright 2023, Tomaz Solc <[email protected]>

The MIT License (MIT)

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

jsonmerge's People

Contributors

avian2, birdsarah, hmpcabral, nxkb, pmvieira, rayplante

jsonmerge's Issues

Error when I use bracket in name

------Error-----
Traceback (most recent call last):
File "mergejson.py", line 32, in
mergejson('CFP01_C2_TIME_10_otros.json', 'CFP01_C2_TIME_10.json', 'salida.json')
File "mergejson.py", line 23, in mergejson
result = merger.merge(data1, data2)
File "/usr/local/lib/python3.6/site-packages/jsonmerge/init.py", line 286, in merge
return walk.descend(schema, base, head, meta).val
File "/usr/local/lib/python3.6/site-packages/jsonmerge/init.py", line 78, in descend
rv = self.work(strategy, schema, *args, **opts)
File "/usr/local/lib/python3.6/site-packages/jsonmerge/init.py", line 123, in work
rv = strategy.merge(self, base, head, schema, meta, objclass_menu=self.merger.objclass_menu, **kwargs)
File "/usr/local/lib/python3.6/site-packages/jsonmerge/strategies.py", line 270, in merge
base[k] = walk.descend(subschema, base.get(k), v, meta)
File "/usr/local/lib/python3.6/site-packages/jsonmerge/init.py", line 78, in descend
rv = self.work(strategy, schema, *args, **opts)
File "/usr/local/lib/python3.6/site-packages/jsonmerge/init.py", line 123, in work
rv = strategy.merge(self, base, head, schema, meta, objclass_menu=self.merger.objclass_menu, **kwargs)
File "/usr/local/lib/python3.6/site-packages/jsonmerge/strategies.py", line 114, in merge
raise HeadInstanceError("Head for an 'append' merge strategy is not an array", head)
jsonmerge.exceptions.HeadInstanceError: Head for an 'append' merge strategy is not an array: #/[1]CFP01

------Data-----
{"[1]CFP01":{"BAT":[{"times":[4],"values":[0.0]},{"times":[3],"values":[0.0]},{"times":[3],"values":[0.0]},{"times":[4],"values":[0.0]},{"times":[4],"values":[0.0]},{"times":[4],"values":[0.0]},{"times":[3],"values":[0.0]},{"times":[4,7],"values":[2.0,0.0]},{"times":[5],"values":[0.0]},{"times":[7],"values":.......

------ Solution (manual)
Change in the file {"[1]CFP01":{"BAT ... to ... {"CFP01":{"BAT ...

Does it preserve the ordering of attributes?

I would like to process JSON objects with ordered structures representing a Table of Contents of a document.

Is there a way to preserve the ordering during merges?

Best regards!

occasionally converts bool to string during version merge

jsonmerge==1.7.0, python 3.7.7
jsonmerge is intermittently converting false to "false" during runs with no apparent change in run parameters when using the version strategy for merging arrays of strings. Then it goes back to correctly returning the bool and not the string for a while, then throws a bad one again.

Input like this in the same spot in two different files:
{ "name": "hdfs_verify_ec_with_topology_enabled", "value": false }

Output like:
"value": [ { "value": false }, { "value": "false" } ],

Schema like:
"value": { "type": "string", "mergeStrategy": "version", "mergeOptions": { "keepIfUndef": True } }

Adding sorting to arrays

Hi,

would you be open to accept a PR to enable users to sort arrays?

I'm thinking in two options: either extending the classes and modifying the merge function, or simply adding an optional parameter to the current classes. Which one would you prefer?
Happy to submit an initial approach for review if you think that's a good idea.

How to merge ordered jsons (OrderedDicts)?

If I merge two ordered dicts, the result is a plain dict. I know that order in JSON is not important, but we want the fields in our JSON file to be in a particular order (for readability).
Are OrderedDicts supported? Maybe I'm missing something. Is there any workaround for this?

Feature request : unique merge strategy for lists

I have a use case similar to this one :

base = {
    "foo" : [1,2,3,999]
}

head = {
    "foo" : [2,4,5,999]
}

And I'd like a result like this :

{
    "foo" : [1,2,3,4,5,999]
}

But instead with the default strategy I get

{
    "foo": [2, 4, 5, 999]
}

And with append I get

{
    'foo': [1, 2, 3, 999, 2, 4, 5, 999]
}

Which is the expected behavior in both cases, but doesn't fit my needs.

That kind of merge can be done in python like that :

> list1 = [1,2,3,999]
> list2 = [2,4,5,999]
> listMerge = list(set(list1 + list2))
> listMerge
[1, 2, 3, 4, 5, 999]

I guess the conversion to a set and then back to a list is expensive, but for my use case where I have under 10 elements in the merged list there is no visible performance cost.

If you think it's a good idea I will gladly implement it and do a pull request.

Immutable strategy

Build a strategy that prevents the changing of values. Would help to identify data quality issues e.g. if you were trying to merge two large objects together.

Merging on top of non-dictionaries fails

If the base JSON has some plain value, e.g. a string, for some key and the head JSON is trying to merge a dictionary for the same key, jsonmerge.merge() fails with the following error:

  File "C:\Program Files\Python37\lib\site-packages\jsonmerge\__init__.py", line 346, in merge
    return merger.merge(base, head)
  File "C:\Program Files\Python37\lib\site-packages\jsonmerge\__init__.py", line 301, in merge
    return walk.descend(schema, base, head, meta).val
  File "C:\Program Files\Python37\lib\site-packages\jsonmerge\__init__.py", line 78, in descend
    rv = self.work(strategy, schema, *args, **opts)
  File "C:\Program Files\Python37\lib\site-packages\jsonmerge\__init__.py", line 123, in work
    rv = strategy.merge(self, base, head, schema, meta, objclass_menu=self.merger.objclass_menu, **kwargs)
  File "C:\Program Files\Python37\lib\site-packages\jsonmerge\strategies.py", line 270, in merge
    base[k] = walk.descend(subschema, base.get(k), v, meta)
  File "C:\Program Files\Python37\lib\site-packages\jsonmerge\__init__.py", line 78, in descend
    rv = self.work(strategy, schema, *args, **opts)
  File "C:\Program Files\Python37\lib\site-packages\jsonmerge\__init__.py", line 123, in work
    rv = strategy.merge(self, base, head, schema, meta, objclass_menu=self.merger.objclass_menu, **kwargs)
  File "C:\Program Files\Python37\lib\site-packages\jsonmerge\strategies.py", line 243, in merge
    raise BaseInstanceError("Base for an 'object' merge strategy is not an object", base)
jsonmerge.exceptions.BaseInstanceError: Base for an 'object' merge strategy is not an object: #/std

This seems wrong. If I am overwriting a plain value that is not a dictionary, the merge should simply throw away the value and keep the new dictionary. This is different than using the "replace" strategy blindly - the merge should still do a recursive merge of dictionaries. It is just in cases where the base is not a dictionary, it should just replace.

This is especially tricky if the two JSONs are coming from some user data. If someone makes a mistake in the base JSON, the merge fails, even though it should not. The code that merges does not really have a way to predict where this could happen, thus it can not provide merge strategy overwrites statically.

In case this matters, this happens with jsonmerge 1.6.0 on Windows with python 3.7.2.

Make WalkSchema.descend() smarter regarding references.

Currently WalkSchema.descend() always dereferences a $ref and replaces it with a copy of the referenced object. This is not (always?) necessary.

This should remove the need for the ugly workaround in arrayMergeById.get_schema() (see #11)

Feature request: Merge multiple versions

The current strategies work well to merge two versions. In some use cases, there is a need to merge multiple versions at the same time to get the final version.

For example, merging a startup's funding amount from different media sources.

version = v_techcrunch
{
    "name": "Coinbase",
    "funding": "$200M"
}

version = v_crunchbase
{
    "name": "Coinbase",
    "funding": "$80M"
}

version = v_angellist
{
    "name": "Coinbase",
    "funding": "$120M"
}

final_version
{
    "name": "Coinbase",
    "funding": "$200M" <--- from v_techcrunch version
}

Here, it needs to be merged based on how recent and accurate the data from each media source is. For example, TechCrunch will have the latest data on a startup's funding compared to Crunchbase and AngelList.

Some ideas of additional merge strategy

  • priorityMerge: merge based on the priority list
  • timebasedMerge: merge based on how recent data is

I am taking inspiration from all the contributors of this repo who did great work, to make something similar that supports merging multiple versions.

Cheers

test_reference_in_meta fails with jsonschema > 4.15.0

When trying the tests on v1.8.0 with python 3.10 on darwin-intel, I get the following failure:

======================================================================
ERROR: test_reference_in_meta (tests.test_jsonmerge.TestGetSchema)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/tmp/jm/jsonmerge-1.8.0/.eggs/jsonschema-4.16.0-py3.10.egg/jsonschema/validators.py", line 898, in resolve_from_url
    document = self.store[url]
  File "/tmp/jm/jsonmerge-1.8.0/.eggs/jsonschema-4.16.0-py3.10.egg/jsonschema/_utils.py", line 28, in __getitem__
    return self.store[self.normalize(uri)]
KeyError: 'schema_2.json'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/tmp/jm/jsonmerge-1.8.0/.eggs/jsonschema-4.16.0-py3.10.egg/jsonschema/validators.py", line 901, in resolve_from_url
    document = self.resolve_remote(url)
  File "/tmp/jm/jsonmerge-1.8.0/.eggs/jsonschema-4.16.0-py3.10.egg/jsonschema/validators.py", line 1007, in resolve_remote
    with urlopen(uri) as url:
  File "/usr/lib64/python3.10/urllib/request.py", line 216, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib64/python3.10/urllib/request.py", line 503, in open
    req = Request(fullurl, data)
  File "/usr/lib64/python3.10/urllib/request.py", line 322, in __init__
    self.full_url = url
  File "/usr/lib64/python3.10/urllib/request.py", line 348, in full_url
    self._parse()
  File "/usr/lib64/python3.10/urllib/request.py", line 377, in _parse
    raise ValueError("unknown url type: %r" % self.full_url)
ValueError: unknown url type: 'schema_2.json'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/tmp/jm/jsonmerge-1.8.0/tests/test_jsonmerge.py", line 1996, in test_reference_in_meta
    mschema = merger.get_schema(merge_options={
  File "/tmp/jm/jsonmerge-1.8.0/jsonmerge/__init__.py", line 364, in get_schema
    return walk.descend(schema).val
  File "/tmp/jm/jsonmerge-1.8.0/jsonmerge/__init__.py", line 86, in descend
    rv = self.work(strategy, schema, *args, **opts)
  File "/tmp/jm/jsonmerge-1.8.0/jsonmerge/__init__.py", line 213, in work
    rv = strategy.get_schema(self, schema, **kwargs)
  File "/tmp/jm/jsonmerge-1.8.0/jsonmerge/strategies.py", line 123, in get_schema
    item = dict(walk.resolve_subschema_option_refs(metadataSchema))
  File "/tmp/jm/jsonmerge-1.8.0/jsonmerge/__init__.py", line 153, in resolve_subschema_option_refs
    subschema = w._resolve_refs(JSONValue(subschema), resolve_base=True).val
  File "/tmp/jm/jsonmerge-1.8.0/jsonmerge/__init__.py", line 169, in _resolve_refs
    with self.resolver.resolving(ref) as resolved:
  File "/usr/lib64/python3.10/contextlib.py", line 135, in __enter__
    return next(self.gen)
  File "/tmp/jm/jsonmerge-1.8.0/.eggs/jsonschema-4.16.0-py3.10.egg/jsonschema/validators.py", line 841, in resolving
    url, resolved = self.resolve(ref)
  File "/tmp/jm/jsonmerge-1.8.0/.eggs/jsonschema-4.16.0-py3.10.egg/jsonschema/validators.py", line 887, in resolve
    return url, self._remote_cache(url)
  File "/tmp/jm/jsonmerge-1.8.0/.eggs/jsonschema-4.16.0-py3.10.egg/jsonschema/validators.py", line 903, in resolve_from_url
    raise exceptions.RefResolutionError(exc)
jsonschema.exceptions.RefResolutionError: unknown url type: 'schema_2.json'
----------------------------------------------------------------------

`AssertionError` at `assert head.val is resolved`

I'm trying to merge the following two files:

using the default strategy but am getting

Traceback (most recent call last):
  File "/nix/store/zj81jq5f04qcpa9ynal5m25hffja7adj-python3-3.10.12-env/lib/python3.10/site-packages/jsonmerge/__init__.py", line 386, in merge
    return merger.merge(base, head)
  File "/nix/store/zj81jq5f04qcpa9ynal5m25hffja7adj-python3-3.10.12-env/lib/python3.10/site-packages/jsonmerge/__init__.py", line 341, in merge
    return walk.descend(schema, base, head).val
  File "/nix/store/zj81jq5f04qcpa9ynal5m25hffja7adj-python3-3.10.12-env/lib/python3.10/site-packages/jsonmerge/__init__.py", line 86, in descend
    rv = self.work(strategy, schema, *args, **opts)
  File "/nix/store/zj81jq5f04qcpa9ynal5m25hffja7adj-python3-3.10.12-env/lib/python3.10/site-packages/jsonmerge/__init__.py", line 133, in work
    rv = strategy.merge(self, base, head, schema, objclass_menu=self.merger.objclass_menu, **kwargs)
  File "/nix/store/zj81jq5f04qcpa9ynal5m25hffja7adj-python3-3.10.12-env/lib/python3.10/site-packages/jsonmerge/strategies.py", line 345, in merge
    base[k] = walk.descend(subschema, base.get(k), v)
  File "/nix/store/zj81jq5f04qcpa9ynal5m25hffja7adj-python3-3.10.12-env/lib/python3.10/site-packages/jsonmerge/__init__.py", line 86, in descend
    rv = self.work(strategy, schema, *args, **opts)
  File "/nix/store/zj81jq5f04qcpa9ynal5m25hffja7adj-python3-3.10.12-env/lib/python3.10/site-packages/jsonmerge/__init__.py", line 133, in work
    rv = strategy.merge(self, base, head, schema, objclass_menu=self.merger.objclass_menu, **kwargs)
  File "/nix/store/zj81jq5f04qcpa9ynal5m25hffja7adj-python3-3.10.12-env/lib/python3.10/site-packages/jsonmerge/strategies.py", line 345, in merge
    base[k] = walk.descend(subschema, base.get(k), v)
  File "/nix/store/zj81jq5f04qcpa9ynal5m25hffja7adj-python3-3.10.12-env/lib/python3.10/site-packages/jsonmerge/__init__.py", line 86, in descend
    rv = self.work(strategy, schema, *args, **opts)
  File "/nix/store/zj81jq5f04qcpa9ynal5m25hffja7adj-python3-3.10.12-env/lib/python3.10/site-packages/jsonmerge/__init__.py", line 131, in work
    assert head.val is resolved
AssertionError

Overwrite head json field with base if head field is empty("")

Hi,
Thanks for developing this amazing utility.
I am facing an issue: suppose my head JSON contains an empty field and base has a value; then it should use the base one.

Example:

base = {
"foo": "4",
"bar": [ "one" ],
}

head = {
"foo": "",
"bar": [ "two" ],
"baz": "Hello, world!"
}

Here it always takes foo = "", but I want that if head has an empty ("") field, the value should be taken from base, i.e. foo: "4".

arrayMergeById not working when "items" is an array

In example 6 here, when providing the full schema obtained here as shown below:

    {
        "$schema": "http://json-schema.org/draft-04/schema#",
        "type": "array",
        "mergeStrategy": "arrayMergeById",
        "items": [
            {
                "type": "object",
                "properties": {
                    "id": {
                        "type": "string"
                    },
                    "field": {
                        "type": "integer"
                    }
                }
            }
        ]
    }

the command base = merger.merge(base, a) breaks with the error:

jsonmerge.exceptions.SchemaError: 'arrayMergeById' merge strategy: This strategy is not supported when 'items' is an array: #/items

The expected result is for base to equal a after the merge.

Update dependency to admit for jsonschema 3.0.1

The tests run clean with both Python 3.7 and Python 2.7 using jsonschema-3.0.1 and these packages in a virtual env:

attrs==19.1.0
coverage==4.5.2
functools32==3.2.3.post2 (py27 only)
jsonmerge==1.5.2
jsonschema==3.0.1
pyrsistent==0.14.11
six==1.12.0

Please consider moving the requirements forward to accommodate for jsonschema 3.0.1.

Depth Affects Behavior of mergeStrategy="append"

Hello,

I found this package today, and it seems like it should do exactly what I want for a thorny input parsing problem. However, I found behavior which seems pathological at increased depth into the JSON. In particular:

merge_schema = {
    "properties" : {
        "g": {
            "properties":{
                "i": {
                    "mergeStrategy":"arrayMergeById",
                    "mergeOptions":{"idRef":'/uid'}
                },
                "h": {
                    "mergeStrategy":"append"
                }
            }
        },
        "n": {
            "properties":{
                "p": {
                    "properties": {
                        "r":{
                            "properties":{"mergeStrategy":"append"}
                        }
                    }
                }
            }
        }
    }
}

merger = Merger(merge_schema)

test_dict = \
{
    "g" : {
        "h" : [ "a", "b"],
        "i" : [
            {
                'uid' : "test1",
                "j" : 4,
            },
            {
                'uid' : "test2",
                "j" : 5,
            }
        ],
    },
    "n":{
        "o":4,
        "p":{
            "q" : 1,
            "r" : [
                "s",
                "t",
            ],
        }
    }
}
update_test_3 = {
    "g" : {
        "h": ["c"],
        "i" : [
            {'uid' : "test3",
            "j" : 11},
            {'uid' : "test1",
            "j" : 12}
        ]
    },
    "n" : {
        "p" : {
            "r":["u",],
        }
    }
}
expected_output = {
    "g": {
        "h": [
            "a",
            "b",
            "c"
        ],
        "i": [
            {
                "uid": "test1",
                "j": 12
            },
            {
                "uid": "test2",
                "j": 5
            },
            {
                "uid": "test3",
                "j": 11
            }
        ]
    },
    "n": {
        "o": 4,
        "p": {
            "q": 1,
            "r": [
                "s", "t", "u"
            ]
        }
    }
}
test_dict = merger.merge(test_dict, update_test_3)
print(test_dict == expected_output)
print(json.dumps(test_dict, indent=4))

yields

False
{
    "g": {
        "h": [
            "a",
            "b",
            "c"
        ],
        "i": [
            {
                "uid": "test1",
                "j": 12
            },
            {
                "uid": "test2",
                "j": 5
            },
            {
                "uid": "test3",
                "j": 11
            }
        ]
    },
    "n": {
        "o": 4,
        "p": {
            "q": 1,
            "r": [
                "u"
            ]
        }
    }
}

In particular, one may note that the append strategy seems to work correctly at one level of depth (g/h) but fails at another (n/p/r). Perhaps this is a subtlety of setting up the merge schema, though I am not sure what would be different between the first case and the second besides the depth.

Enhancement request? Merge objects in lists by default

I would like to use this library, but the default behavior of overwrite for lists isn't appropriate for me. I'd expect that if items in a list are objects, they would be merged as well. The merge strategy solution seems like overkill for this case. It seems that generally this is a more reasonable behavior than the current default. Any chance the default behavior will be changed?

"append" strategy can produce invalid documents

For instance, if the schema sets maxItems, append does not honor that. Same thing with the uniqueItems keyword.

The broader question is: Should merge strategies inspect the schema at all (apart from what is necessary to determine the merge strategy) and adapt their behavior to it or should that be left to the user? Is it an error if the user sets "append" strategy for an array that will overflow maxItems?

Order of merging

Hi @avian2, this is a question, as it's been a while since I used jsonmerge. What order does jsonmerge use to merge the documents? Obviously with versioning, some fields are more recent than others; how do we know which is the latest?

Does it support uniqueItems in arrays?

base = {
        "foo": 1,
        "bar": ["2"]
     }


head = {
        "bar": ["2"],
        "baz": "Hello, world!"
    }
schema = {  
            "properties": {
                "bar": {
                    "uniqueItems": True,
                    "mergeStrategy": "append",
                }
            }
        }

from jsonmerge import Merger
from pprint import *
merger = Merger(schema)
result = merger.merge(base, head)

pprint(result, width=40)
{'bar': ['2'],
 'baz': 'Hello, world!',
 'foo': 1}

Feature Request - Merge a single array element into all elements of an array.

Merge strategy to merge a single element into all elements of an array.

BASE

{
  "table": [{
    "schema": "my_schema"
  }]
}

HEAD

{
  "tables": [
    {
      "name": "my_table"
    },
    {
      "name":"my_other_table"
    }
  ]
}

OUTPUT

{
  "tables": [
    {
      "name": "my_table",
      "schema": "my_schema"
    },
    {
      "name":"my_other_table",
      "schema": "my_schema"
    }
  ]
}

Merge fails if head has non-ascii key

merge throws a "UnicodeEncodeError" if the head has a key that contains a non-ASCII character

For example, the following code gives the error "UnicodeEncodeError: 'ascii' codec can't encode character u'\u20b9' in position 0: ordinal not in range(128)":

from jsonmerge import merge
from pprint import pprint
base = {
    u'\u20AC': 'euro'
}
head = {
    u'\u20b9': 'indian rupee'
}
result = merge(base, head)
pprint(result)

jsonschema 2.5.0 breaks Merger.get_schema()

jsonschema 2.5.0 introduced a LRU cache in the reference resolver:

python-jsonschema/jsonschema#203

This breaks get_schema() in some subtle ways, since we're modifying the schema while we're walking through it and dereferencing things. RefResolver.resolving() now sometimes returns outdated parts of the structure, which causes problems.

A proper solution would be to build up a new schema in parallel while walking over the old one (which should be read-only), similar to how we do it for walking over instances. This is not a trivial change. Specifically, it's hard to properly retain cross-references in the new schema.

Related comments in the code (several parts regarding handling of references currently work more-or-less by luck):

https://github.com/avian2/jsonmerge/blob/master/jsonmerge/strategies.py#L186
https://github.com/avian2/jsonmerge/blob/master/tests/test_jsonmerge.py#L1248

How do I use `"mergeStrategy": "append"` by default?

I want the following assertion to pass:

schema = ???
base = None
base = jsonmerge.merge(base, {"a": ["1"], "b": 3}, schema)
base = jsonmerge.merge(base, {"a": ["2"], "b": 4}, schema)

assert base == {"a": ["1", "2"], b: 4}

I know that this is possible with

schema = {
  'properties': {
    'a': {'mergeStrategy': 'append'},
    'b': {'mergeStrategy': 'overwrite'}
  }
}

but in my case, I'm trying to write a general system where the array key isn't necessarily a top-level key named "a"; I want to merge any and all arrays using "append" instead of "overwrite".

Is this possible somehow? Ideally I would be able to set schema like so: (this obviously does not currently work)

schema = {
  "mergeStrategy": {
    "typeMatch": {
      "object": "objectMerge",
      "array": "append",
      "default": "overwrite"
    }
  }
} 

It seems like the best route would be to create a custom merge strategy, but I'm hoping there's an easier solution that I've overlooked

Defaults being applied to 1st element only when merging array of objects with arrayMergeById

I am running into an interesting issue with a complex JSON structure that includes a deep set of objects and arrays of objects. It appears that for every array only the 1st element has the default values defined in the schema applied to it -- every other element in the array does not. Is there something I need to do with the strategy or configuration?

Here is a link to the schema file that I am using. Is it possible i have missed something?

https://gist.github.com/ravensorb/5513ccc1b488832204498300fb868467

The specific one I am testing is "#/definitions/traefik/properties/routers"

Merge of all arrays with append strategy fails.

Hi,
Not sure if this is an issue or if there is a lack of understanding on my part. I am trying to merge two complicated JSON objects. The merge fails with the error:

No element of 'oneOf' validates both base and head: #

I am using a slightly modified schema mentioned in in issue #28 and https://www.tablix.org/~avian/blog/articles/talks/tomaz_solc_jsonmerge.pdf. That is, merge all arrays with append strategy.

The merge works great for some JSON documents. However, it is failing for some other documents. I've pasted a sample Python script below that causes the problem. Could someone let me know if there is an error in my code or if this is not possible with jsonmerge?

Thanks in advance for any feedback.

regards,

#!/usr/bin/env python3

import json
import sys

from jsonmerge import Merger

merge_schema = """
{
  "oneOf": [
    { "type": "number" },
    { "type": "string" },
    {
      "type": "array",
      "mergeStrategy": "arrayMergeById",
      "mergeOptions": {
        "idRef": "/"
      }
    },
    {
      "type": "object",
      "additionalProperties": {
        "$ref": "#"
      }
    }
  ]
}
"""

base = """
{
  "version": "1.0",
  "student": {
    "name": "Jane",
    "dob": "1-1-2020",
    "attribute1": {
      "size": 16777216,
      "name": "abc-xyz"
    },
    "class": {
      "type": "custom",
      "exams": [
        "final"
      ],
      "book": {
        "isbn": 1234,
        "name": "ABC Book",
        "author": "JohnDoe"
      }
    },
    "log": [
      "file"
    ]
  },
  "system": {
    "update": true,
    "update-path": "/tmp/file.json",
    "store1": {
      "store-url": "http://www.test.com/students/store1.json",
      "polling-interval": 5,
      "client-id": "abc-112233",
      "expire": {
        "batch-size": 1000,
        "scan-interval": 15
      }
    },
    "store2": {
      "store-url": "http://www.test.com/students/store2.json",
      "polling-interval": 5,
      "client-id": "abc-112233",
      "expire": {
        "batch-size": 1000,
        "scan-interval": 15
      }
    },
    "store3": {
      "store-url": "http://www.test.com/students/store3.json",
      "polling-interval": 5,
      "client-id": "abc-112233"
    },
    "report": {
      "polling-interval": 5,
      "batch-size": 5000,
      "report-type": "file",
      "exports": [
        {
          "name": "export1",
          "server": "server1:8443",
          "topic": "topic1",
          "resonse" : "required",
          "metadata": {
            "meta1": "Meta 1",
            "meta2": "Meta 2",
            "meta3": "Meta 3"
          }
        }
      ]
    }
  }
}
"""

new = """
{
  "student": {
    "class": {
      "type": "regular",
      "name": "no-name-class",
      "exams": [
        "mid-term1",
        "mid-term2"
      ],
      "book": {
        "isbn": 1234,
        "name": "ABC Book",
        "author": "JohnDoe"
      }
    }
  }
}
"""

def JsonMerge(base, new_obj):
  schema = json.loads(merge_schema)
  merger = Merger(schema)
  return merger.merge(base, new_obj, schema)

if __name__ == "__main__":
  bjson = None
  try:
    bjson = json.loads(base)
  except Exception as err:
    print('Base JSON load error. %s' % err)
    sys.exit(-1)

  njson = None
  try:
    njson = json.loads(new)
  except Exception as err:
    print('New JSON load error. %s' % err)
    sys.exit(-1)

  merged = None
  try:
    merged = JsonMerge(bjson, njson)
  except Exception as err:
    print('JSON merge error. %s' % err)
    sys.exit(-1)

  print(json.dumps(merged, indent=2, separators=(',', ': ')))

jsonmerge in GitHub Action and/or python virtual environments

I've been hitting a roadblock recently in trying to get a few of my projects set up with virtual environments (using pipenv) and GitHub Actions, and I think the issue is related to jsonmerge. That said, I cannot figure out why it is related.

Here is an example of a python file:

#!/usr/bin/python3
#########################################################################
#########################################################################
import logging
import argparse
import os, sys, pathlib
import jsonmerge

help('modules')

and here is a build action file

# This is a basic workflow to help you get started with Actions

name: CI

# Controls when the workflow will run
on:
  # Triggers the workflow on push or pull request events but only for the master branch
  push:
    branches: [ master ]
  pull_request:
    branches: [ master ]

  # Allows you to run this workflow manually from the Actions tab
  workflow_dispatch:

env:
  # Tells pipenv to create virtualenvs in /root rather than $HOME/.local/share.
  # We do this because GitHub modifies the HOME variable between `docker build` and
  # `docker run`
  WORKON_HOME: /home/runner/.local/share/virtualenvs

  LC_ALL: C.UTF-8
  LANG: C.UTF-8

# A workflow run is made up of one or more jobs that can run sequentially or in parallel
jobs:
  # This workflow contains a single job called "build"
  build:
    # The type of runner that the job will run on
    runs-on: ubuntu-latest

    # Steps represent a sequence of tasks that will be executed as part of the job
    steps:
      # Checks-out your repository under $GITHUB_WORKSPACE, so your job can access it
      - uses: actions/checkout@v2

      - name: Set up Python 3.x
        uses: actions/setup-python@v2
        with:
          # Semantic version range syntax or exact version of a Python version
          python-version: '3.x'
          # Optional - x64 or x86 architecture, defaults to x64
          architecture: 'x64'

      # Runs a set of commands using the runners shell
      - name: Updating python pip, wheel, and setup tools
        run: |
          python3 -m pip install --upgrade pip setuptools wheel

      - name: Install pipenv
        run: |
          python3 -m pip install --upgrade pipenv

      - name: Installing packages
        run: |
          #echo "******************** PIPENV Installing Packages (system)"
          #python3 -m pipenv lock
          #python3 -m pipenv install --system
          echo "******************** PIPENV Installing Packages"
          python3 -m pipenv lock
          python3 -m pipenv install

      - name: Running test script
        run: |
          pipenv run jsonmerge-test.py

and here is the error I get

docker exec cmd=[bash --noprofile --norc -e -o pipefail /home/testuser/source/workflow/5] user=
| Traceback (most recent call last):
|   File "jsonmerge-test.py", line 7, in <module>
|     import jsonmerge
| ModuleNotFoundError: No module named 'jsonmerge'
[CI/build]   ❌  Failure - Run test script
Error: exit with `FAILURE`: 1

The weird part is -- if I run a simple script with "help('modules')" -- I get back a list of all modules and jsonmerge 1.8.0 is in the list. Which has me wondering -- am I doing something wrong?

The other interesting thing to note -- even if I skip using pipenv and just do a pip install inside the action, it still fails to run with the exact same error.

Any chance you have any thoughts/suggestions?

command line tool

this would be useful as a command line tool as well, something like:

__main__.py

    #!/usr/bin/env python

    """
        Cli jsonmerge tool.
    """

    import argparse
    import json

    import jsonmerge

    ARGP = argparse.ArgumentParser(
        description=__doc__,
        formatter_class=argparse.RawTextHelpFormatter,
    )
    ARGP.add_argument('files', nargs='+', help='Json Input Files')
    ARGP.add_argument('--schema', '-s', help='Json Schema file with merge strategy configuration')
    ARGP.add_argument('--merge-strategy', '-X', help='Root node merge strategy')

    def main(argp=None):
        if argp is None:
            argp = ARGP.parse_args()

        schema = dict()
        if argp.schema:
            with open(argp.schema) as file_handle:
                schema = json.load(file_handle)

        result = {}
        for file in argp.files:
            with open(file) as file_handle:
                result = jsonmerge.merge(result, json.load(file_handle), schema)

        print(json.dumps(result, indent=4))


    if __name__ == '__main__':
        main()

With

  • an entry point defined in setup.py
  • the ability to apply a merge strategy to the root node
  • ability to load a schema file as well?

That would be great.

Better exception classes

Currently, merge and get_schema methods only throw TypeError if something is wrong. It would be better to have own exception classes (possibly derived from TypeError) for better diagnostics from the user standpoint.

Maybe have SchemaError, BaseInstanceError and HeadInstanceError?

Add Debian packaging

Would you consider adding Debian packaging and submitting it to Debian so this can be installed with apt install python3-jsonmerge .
This would allow the package to be used by other Debian packages.

Removal Method for Arrays

In my project I found it necessary to sometimes remove elements (though not objects) from json arrays. I was able to do this by adding a separate merge strategy which takes a negative image of an array, such that, e.g.

test  = {
"AnArray":["An element", "A second element"]
}

removal = {
"AnArray":["An element"]
}

new = merger.merge(test, removal) 

new  = {
"AnArray":["A second element"]
}

The code for this is quite similar to that for append:

from jsonmerge.strategies import ArrayStrategy  # base class for array strategies

class RemoveStrategy(ArrayStrategy):
    def _merge(
        self, walk, base, head, schema, sortByRef=None, sortReverse=None, **kwargs
    ):
        # Keep only the base elements that do not appear in head.
        new_array = []
        for array_element in base.val:
            if array_element not in head.val:
                new_array.append(array_element)

        base.val = new_array

        self.sort_array(walk, base, sortByRef, sortReverse)

        return base

    def get_schema(self, walk, schema, **kwargs):
        schema.val.pop("maxItems", None)
        schema.val.pop("uniqueItems", None)

        return schema

I am happy to just have this within my project, but I wished to offer that if there is interest in adding this feature to the main package then I can write up the requisite tests and doc entry.

oneOf/anyOf of required incorrectly treated as branch

The below JSON Schema construct is purely about enforcement and not structure.
This implements a variant schema enforcement (in the below it says that either A&B must be present or A&C must be present).
This should be ignored for descent and not generate an error.

{ "type" : "object",
"anyOf" :
[ { "required" :
[ "A", "B" ] },
{ "required" :
[ "A", "C" ] } ],

Alphabetic sorting for arrays

Hi,

would you be open to accept a PR to enable users to sort arrays?

I'm thinking in two options: either extending the classes and modifying the merge function, or simply adding an optional parameter to the current classes. Which one would you prefer?
Happy to submit an initial approach for review if you think that's a good idea.

Add the possibility to disable logging

I'm getting a couple hundred lines like:

__init__.py:46>DEBUG - descend:     schema #
__init__.py:71>DEBUG - descend:     invoke strategy overwrite

while doing one of multiple JSON merges.

Could you please remove the logger from __init__.py or make it possible to turn it off from outside code.

Please add more examples

I find this really useful when working with different datasets. Spark has a standard way of defining a dataset schema as JSON. Sometimes, to track schema changes, what we really need is to merge the old and new schemas to have an umbrella schema that supports everything. That ends up being a merge of two dictionaries, more particularly merging arrays of dicts by some key identifier.

Json name/value pairs with forward slash

I used version jsonmerge-1.1.0 with name/value pairs like {test: {"MDT_c/kw/day": 0}}, but in the current version a schema error is raised. It appears something has changed in how forward slashes are used or interpreted when there are nested objects.

jsonmerge fails assertion on nan float values

Python's default JSON encoder allows encoding extended values like nan, -inf, and inf. We have an external process that records float nan values to indicate a measurement error. This is different from indicating that a value was not recorded, so we unfortunately can't just substitute null or None in this case.

When running these values through jsonmerge, we hit an assertion failure as part of the resolution process. Python correctly evaluates float('nan') == float('nan') as false, so the assert fails: https://github.com/avian2/jsonmerge/blob/master/jsonmerge/__init__.py#L117

Per IEEE 754, Python never considers nan float values equal to each other.

I would like to request an extension to the merge process that optionally compares nan values and considers them equal (e.g. math.isnan(base.val) == math.isnan(resolved)). While not a normal/expected behavior, this is currently preventing us from using this (very useful!) library without improperly modifying our measurement data.

I suggest this be exposed as a flag that lets callers override the comparison logic to consider nan values equal. In our specific case, if we see two JSON documents with nan values for the same key, we do want to consider them equal (rather than hard-failing on the assert).

I would be happy to contribute a PR for this and would be interested in hearing other thoughts. A simple test case to reproduce is below. Thanks!

from jsonmerge import merge

base = { 
    "foo": 1,
    "bar": float('nan')
}

head = { 
    "foo": 1,
    "bar": float('nan')
}

result = merge(base, head)
Traceback (most recent call last):
  File "json_merge_test.py", line 13, in <module>
    result = merge(base, head)
  File "py3/lib/python3.6/site-packages/jsonmerge/__init__.py", line 346, in merge
    return merger.merge(base, head)
  File "py3/lib/python3.6/site-packages/jsonmerge/__init__.py", line 301, in merge
    return walk.descend(schema, base, head, meta).val
  File "py3/lib/python3.6/site-packages/jsonmerge/__init__.py", line 78, in descend
    rv = self.work(strategy, schema, *args, **opts)
  File "py3/lib/python3.6/site-packages/jsonmerge/__init__.py", line 123, in work
    rv = strategy.merge(self, base, head, schema, meta, objclass_menu=self.merger.objclass_menu, **kwargs)
  File "py3/lib/python3.6/site-packages/jsonmerge/strategies.py", line 270, in merge
    base[k] = walk.descend(subschema, base.get(k), v, meta)
  File "py3/lib/python3.6/site-packages/jsonmerge/__init__.py", line 78, in descend
    rv = self.work(strategy, schema, *args, **opts)
  File "py3/lib/python3.6/site-packages/jsonmerge/__init__.py", line 117, in work
    assert base.val == resolved

Problem with nested JSON, need help

Hi,

I have the following issue and need help. Hopefully my schema settings are just wrong.

example1.json

{
    "exampleList": [
       {
          "field": "",
          "field1": "",
          "field2": "",
          "field3": "",
          "field4": "",
          "field5": "",
          "field6": "",
          "field7": false,
          "field8": 10,
          "deeperExampleList": [
             {
                "field": "This is my Entry",
                "field1": "",
                "deeperDeeperExampleList": [
                   {
                     "field": "",
                     "field2": "",
                     "field3": true
                   },
                   {
                     "field": "",
                     "field2": "",
                     "field3": false
                   }
                ]
             }
          ],
          "deeperExampleList1": [
             {
                "field": "",
                "deeperDeeperExampleList1": [
                   {
                     "field": "",
                     "field1": "",
                     "field2": "",
                     "field3": ""
                   },
                   {
                     "field": "",
                     "field1": "",
                     "field2": "",
                     "field3": ""
                   },
                   {
                     "field": "",
                     "field1": "",
                     "field2": "",
                     "field3": ""
                  }
                ]
             }
          ]
       }
    ]
 }

example2.json

{
    "exampleList": [
       {
          "field": "",
          "field1": "",
          "field2": "",
          "field3": "",
          "field4": "",
          "field5": "",
          "field6": "",
          "field7": false,
          "field8": 10,
          "deeperExampleList": [
             {
               "field": "I want append this",
               "field1": "",
               "deeperDeeperExampleList": [
                  {
                    "field": "",
                    "field2": "",
                    "field3": true
                  }
               ]
            }
          ],
          "deeperExampleList1": [
             {
                "field": "",
                "deeperDeeperExampleList1": [
                   {
                     "field": "",
                     "field1": "",
                     "field2": "",
                     "field3": ""
                   },
                   {
                     "field": "",
                     "field1": "",
                     "field2": "",
                     "field3": ""
                   },
                   {
                     "field": "",
                     "field1": "",
                     "field2": "",
                     "field3": ""
                  }
                ]
             }
          ]
       }
    ]
 }

schema:

schema = {
    "properties": {
        "exampleList": {
            "items": {
                "properties": {
                    "deeperExampleList": {
                        "mergeStrategy": "append"
                    }
                }
            }
        }
    }
}

What I want:

{
    "exampleList": [
       {
          "field": "",
          "field1": "",
          "field2": "",
          "field3": "",
          "field4": "",
          "field5": "",
          "field6": "",
          "field7": false,
          "field8": 10,
          "deeperExampleList": [
             {
                "field": "This is my Entry",
                "field1": "",
                "deeperDeeperExampleList": [
                   {
                     "field": "",
                     "field2": "",
                     "field3": true
                   },
                   {
                     "field": "",
                     "field2": "",
                     "field3": false
                   }
                ]
             },
             {
               "field": "I want append this",
               "field1": "",
               "deeperDeeperExampleList": [
                  {
                    "field": "",
                    "field2": "",
                    "field3": true
                  }
               ]
            }
          ],
          "deeperExampleList1": [
             {
                "field": "",
                "deeperDeeperExampleList1": [
                   {
                     "field": "",
                     "field1": "",
                     "field2": "",
                     "field3": ""
                   },
                   {
                     "field": "",
                     "field1": "",
                     "field2": "",
                     "field3": ""
                   },
                   {
                     "field": "",
                     "field1": "",
                     "field2": "",
                     "field3": ""
                  }
                ]
             }
          ]
       }
    ]
 }

What am I doing wrong?

multiple idRef and ignoreId

Support for multiple "keys" in an array seems to be missing.
This is required for some use cases where more than one property is used as the "key" of the array.
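
To illustrate the request, a purely hypothetical schema in which idRef accepts a list of JSON pointers forming a compound key (not supported by jsonmerge today):

# Hypothetical, unsupported syntax shown only to illustrate the request:
# items would be matched when both "vendor" and "model" are equal.
schema = {
    "mergeStrategy": "arrayMergeById",
    "mergeOptions": {"idRef": ["/vendor", "/model"]},
}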

If an object uses additionalProperties: false and an input document contains a rogue key, an obscure exception is raised

I fed a schema that uses "additionalProperties": false everywhere to a Merger. I then sent it a bad document (a key wasn't defined in properties) and got an AttributeError. A partial trace is below. It took me about an hour to figure out that my document was invalid. It's great that I got an exception; not so good that the exception made no sense.

    rv = self.call_descender(descender, schema, *args)
  File "/Users/terris/venv/edc/lib/python3.6/site-packages/jsonmerge/__init__.py", line 102, in call_descender
    return descender.descend_instance(self, schema, base, head, meta)
  File "/Users/terris/venv/edc/lib/python3.6/site-packages/jsonmerge/descenders.py", line 22, in descend_instance
    ref = schema.val.get("$ref")
AttributeError: 'bool' object has no attribute 'get'
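
A minimal reproduction sketch, assuming the behaviour described above (current releases may have improved the error):

from jsonmerge import Merger

schema = {
    "type": "object",
    "properties": {"foo": {"type": "number"}},
    "additionalProperties": False,
}

merger = Merger(schema)
# "bar" is not declared in properties, so descending into it reaches the
# boolean subschema False, and schema.val.get("$ref") raises the
# AttributeError shown in the trace above.
merger.merge({"foo": 1}, {"foo": 2, "bar": 3})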

jsonmerge performance

Hi,
I have been using jsonmerge in my projects for a while now. It has been working great. I've recently started using it in a new project where I update/merge multiple JSON documents every few seconds. Since then, the CPU utilization of my app has gone up considerably. I've profiled my code and narrowed it down: the JsonMerge call seems to be causing most of the spike. I have tried different merging strategies, but I have not made any progress in reducing the CPU consumption. Any suggestions on how I could go about reducing the CPU consumption of jsonmerge? Any pointers are greatly appreciated. Thanks.

PS: my default merging strategy.

merge_schema = """
{
  "oneOf": [
    { "type": "string" },
    { "type": "number" },
    { "type": "boolean" },
    {
      "type": "array",
      "mergeStrategy": "arrayMergeById",
      "mergeOptions": {"idRef": "/"}
    },
    {
      "type": "object",
      "additionalProperties": { "$ref": "#" }
    }
  ]
}
"""

def JsonMerge(base, new_obj):
  schema = json.loads(merge_schema)
  merger = Merger(schema)
  return merger.merge(base, new_obj, schema)
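
One likely contributor, assuming Merger construction parses the schema and sets up a validator each time: the helper above rebuilds the Merger on every call, and it also passes schema as a third positional argument that merge() does not need. A sketch that hoists the setup out of the hot path:

import json
from jsonmerge import Merger

# Build the Merger once at module load time and reuse it for every call.
_merger = Merger(json.loads(merge_schema))

def JsonMerge(base, new_obj):
    return _merger.merge(base, new_obj)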

Coalesce null strategy

It would be helpful if there were a strategy that prevents a head with an empty string from replacing a base that has a non-empty string value.
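
In the meantime, a small custom strategy looks sufficient. A sketch, assuming the Strategy/Merger customization hooks used elsewhere on this page (the merge signature varies slightly between jsonmerge versions, hence *args; the strategy name coalesceEmpty is made up):

from jsonmerge import Merger
from jsonmerge.strategies import Strategy

class CoalesceEmptyStrategy(Strategy):
    def merge(self, walk, base, head, schema, *args, **kwargs):
        # Keep the base value when head is an empty string and base is set.
        if head.val == "" and not base.is_undef():
            return base
        return head

    def get_schema(self, walk, schema, **kwargs):
        return schema

schema = {
    "properties": {
        "description": {"mergeStrategy": "coalesceEmpty"}
    }
}
merger = Merger(schema, strategies={"coalesceEmpty": CoalesceEmptyStrategy()})

print(merger.merge({"description": "keep me"}, {"description": ""}))
# {'description': 'keep me'}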
