ghandic / jsf

Creates fake JSON files from a JSON schema

Home Page: https://ghandic.github.io/jsf

License: Other

Dockerfile 0.29% Python 97.59% Makefile 0.22% Starlark 1.90%

jsf json-schema commandline faker property-based-testing python fastapi

jsf's Issues

generate_and_validate method does not return any json object

Hi @ghandic

generate_and_validate() method does not return any json object. If that is how it is meant to be, maybe I can rename it to "validate()" and generate a pull request.

Use default values

Thanks for the tool!

Is there a way to have it use default values from the schema where present, rather than faking values?

Expand string type to have contentSchema, contentMediaType + contentEncoding

Allow auto data generation to produce data from fake files with certain encoding

https://json-schema.org/understanding-json-schema/reference/non_json_data.html#id2

Add ability to restrict custom providers safe/unsafe

Since we are using eval, anything in the provider string will be evaluated, which is insecure. This should be safe by default, with an option to allow the use of lambdas etc.
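One possible safe-by-default approach is to resolve a provider string as a dotted attribute path against an allow-listed root object, instead of passing it to eval. The sketch below is only an illustration: `resolve_provider` and `FakeProviders` are hypothetical names, not part of jsf.

```python
from typing import Any, Callable, Dict


class FakeProviders:
    """Hypothetical stand-in for a Faker instance, for illustration only."""

    def email(self) -> str:
        return "user@example.com"


def resolve_provider(expr: str, roots: Dict[str, Any]) -> Callable[[], Any]:
    """Resolve 'root.attr' style strings without eval (hypothetical helper).

    Only plain dotted identifiers are accepted, so lambdas, call
    expressions, and underscore-prefixed attributes are rejected.
    """
    parts = expr.split(".")
    if len(parts) < 2 or not all(p.isidentifier() and not p.startswith("_") for p in parts):
        raise ValueError(f"unsafe or malformed provider: {expr!r}")
    if parts[0] not in roots:
        raise ValueError(f"unknown provider root: {parts[0]!r}")
    target = roots[parts[0]]
    for attr in parts[1:]:
        target = getattr(target, attr)
    if not callable(target):
        raise ValueError(f"provider is not callable: {expr!r}")
    return target


provider = resolve_provider("faker.email", {"faker": FakeProviders()})
print(provider())
```

An opt-in "unsafe" mode could still fall back to eval for users who need lambdas.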

When minLength and maxLength are the same, it returns empty string

Self explanatory.

How to reproduce:

from jsf import JSF

faker = JSF(
    {
        "type": "object",
        "properties": {
            "name": {"type": "string", "minLength":3, "maxLength":3 },
            "email": {"type": "string", "$provider": "faker.email"},
        },
        "required": ["name", "email"],
    }
)

fake_json = faker.generate()
fake_json

The name property is generated as an empty string ('').

Attempt fix:
As a temporary workaround, I changed the random_fixed_length_sentence function in the jsf/schema_types/string_utils/content_type/text__plain.py file to the following. Note that even though the _min variable has a default value of 0, this version will never return an empty string (although an empty string wouldn't be the most valuable output anyway).

def random_fixed_length_sentence(_min: int = 0, _max: int = 50) -> str:

    _min = int(_min)
    _max = int(_max)

    if _min > _max:
        raise ValueError("minLength must not be greater than maxLength")

    # Needs better implementation to return empty string
    sentence = ""
    while len(sentence) <= _min:
        sentence = random.choice(LOREM).capitalize()
        while len(sentence) < _max and random.random() > 0.2:
            sentence += " " + random.choice(LOREM)
        # sentence += random.choice(['.', '!', '?'])
    return sentence[:_max].strip()
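A variant that always honours both bounds, including equal bounds and a length of zero, could pick the target length first and then trim to it. This is a minimal sketch rather than the library's implementation: `WORDS` is a hypothetical stand-in for the LOREM list, and `random_bounded_sentence` is an assumed name.

```python
import random

# Hypothetical stand-in for the LOREM word list in text__plain.py.
WORDS = ["lorem", "ipsum", "dolor", "sit", "amet", "consectetur"]


def random_bounded_sentence(min_len: int = 0, max_len: int = 50) -> str:
    """Return text whose length lies in [min_len, max_len]."""
    if min_len < 0 or min_len > max_len:
        raise ValueError("minLength must be between 0 and maxLength")
    # Choose the exact length up front so min_len == max_len is handled.
    target = random.randint(min_len, max_len)
    text = ""
    while len(text) < target:
        text += (" " if text else "") + random.choice(WORDS)
    text = text[:target]
    # Cutting can leave a trailing space; swap it for a letter instead of
    # stripping, so the length stays exactly at the target.
    if text.endswith(" "):
        text = text[:-1] + random.choice("abcdefghijklmnopqrstuvwxyz")
    return text


print(repr(random_bounded_sentence(3, 3)))
```

Because the result is cut to the chosen length rather than built from whole words, the word list does not need entries of length one or two.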

Feature Request: `not` schema

Support not schema: https://json-schema.org/understanding-json-schema/reference/combining#not

Question: Supporting Parquet datatypes

I have a use case where I need to generate data for parquet datatypes. I am currently using a custom version of JSF. Would you like to have this feature here?

JSON looks like the following:

"UInt32": {
      "type": "uint32"
    },
    "UInt64": {
      "type": "uint64"
    },
    "Float16": {
      "type": "float16"
    }

[number.py:jsf.src.schema_types.number:line 304 - generate()] - INFO: Generating random uint32
[number.py:jsf.src.schema_types.number:line 52 - generate()] - DEBUG: is_float: False
[number.py:jsf.src.schema_types.number:line 72 - generate()] - INFO: Generated number: 35227457
[number.py:jsf.src.schema_types.number:line 333 - generate()] - INFO: Generating random uint64
[number.py:jsf.src.schema_types.number:line 52 - generate()] - DEBUG: is_float: False
[number.py:jsf.src.schema_types.number:line 72 - generate()] - INFO: Generated number: 4669327448559716910
[number.py:jsf.src.schema_types.number:line 362 - generate()] - INFO: Generating random float16
[number.py:jsf.src.schema_types.number:line 57 - generate()] - DEBUG: is_float: True
[number.py:jsf.src.schema_types.number:line 72 - generate()] - INFO: Generated number: 1.920763087895552e+17

Document what versions of draft schema are supported

Currently working to draft 7, but should add support for multiple draft versions

Use test data from https://github.com/Julian/jsonschema/tree/main/json

Add support for schema composition

https://json-schema.org/understanding-json-schema/reference/combining.html#schema-composition

Handle self referencing recursive structures

https://json-schema.org/understanding-json-schema/structuring.html?highlight=ref#recursion

Handle advanced cases of non required dropout

In cases such as:

- maxProperties / minProperties
- Property dependencies
- Schema dependencies

Option to prefer default values when available and/or example values, before random values

I also would like to be able to apply defaults to the default generation.

Example from JSON Schema Faker, show the inputs and outputs:

You can see in the above example, that the generated sample is easy to read and understand. Whereas a randomly generated set of inputs would lose much valuable context.

The use case is that we define inputs for our application as JSON schema specifications, and we ask users to provide their input matching that specification. It is much preferable to have those generated values default to the actual default values: (1) because it is easier to understand from a user perspective if they see 'items per batch' defaulting to something like 1000 and not 1235134523451345, and (2) because in those cases, deleting the defaults has the expected effect of basically not overriding them.

Another case would be an example URL value or an example region name. Seeing the example or default value gives a much stronger hint of what kind of inputs are actually required. (If you see us-east-1 as the default, you are going to feel comfortable providing us-west-2, and you probably won't mistakenly type US West (Oregon).)

As (apparently?) implemented in JSON Schema Faker, I could imagine adding a use_defaults and use_examples into the generate() method.

Happy to contribute if this is something I could help with.
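The requested precedence (default first, then an example, then a random value) can be sketched as below. This is an illustration under assumptions: `pick_value` and its `fallback` argument are hypothetical, and only the `use_defaults` and `use_examples` flag names come from the request above.

```python
import random
from typing import Any, Callable, Dict


def pick_value(
    schema: Dict[str, Any],
    fallback: Callable[[Dict[str, Any]], Any],
    use_defaults: bool = True,
    use_examples: bool = True,
) -> Any:
    """Return the schema's default, else one of its examples, else a faked value."""
    if use_defaults and "default" in schema:
        return schema["default"]
    examples = schema.get("examples")
    if use_examples and isinstance(examples, list) and examples:
        return random.choice(examples)
    # No default or example available: defer to the random generator.
    return fallback(schema)


batch = {"type": "integer", "default": 1000}
region = {"type": "string", "examples": ["us-east-1", "us-west-2"]}


def fake_int(schema: Dict[str, Any]) -> int:
    # Hypothetical random generator standing in for the faker.
    return random.randint(0, 10**15)


print(pick_value(batch, fake_int))   # the default wins
print(pick_value(region, fake_int))  # one of the listed examples
```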

I have a simple json schema and I don't want to specify ContentEncoding for every field.

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "properties": {
    "accountUid": {
      "type": "string"
    }
  }
}

If I do the following:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

from jsf import JSF

faker = JSF.from_json("simple.json")
fake_json = faker.generate()

print(fake_json)

I get the following error:

root@d42f6d379f02:/app# python basic.py 
/usr/local/lib/python3.10/site-packages/pydantic/_internal/_config.py:261: UserWarning: Valid config keys have changed in V2:
* 'smart_union' has been removed
  warnings.warn(message, UserWarning)
Traceback (most recent call last):
  File "/app/basic.py", line 6, in <module>
    faker = JSF.from_json("simple.json")
  File "/usr/local/lib/python3.10/site-packages/jsf/parser.py", line 208, in from_json
    return JSF(json.load(f))
  File "/usr/local/lib/python3.10/site-packages/jsf/parser.py", line 54, in __init__
    self._parse(schema)
  File "/usr/local/lib/python3.10/site-packages/jsf/parser.py", line 183, in _parse
    self.root = self.__parse_definition(name="root", path="#", schema=schema)
  File "/usr/local/lib/python3.10/site-packages/jsf/parser.py", line 141, in __parse_definition
    return self.__parse_object(name, path, schema)
  File "/usr/local/lib/python3.10/site-packages/jsf/parser.py", line 66, in __parse_object
    props.append(self.__parse_definition(_name, path=f"{path}/{_name}", schema=definition))
  File "/usr/local/lib/python3.10/site-packages/jsf/parser.py", line 156, in __parse_definition
    return self.__parse_primitive(name, path, schema)
  File "/usr/local/lib/python3.10/site-packages/jsf/parser.py", line 59, in __parse_primitive
    return cls.from_dict({"name": name, "path": path, "is_nullable": is_nullable, **schema})
  File "/usr/local/lib/python3.10/site-packages/jsf/schema_types/string.py", line 156, in from_dict
    return String(**d)
  File "/usr/local/lib/python3.10/site-packages/pydantic/main.py", line 150, in __init__
    __pydantic_self__.__pydantic_validator__.validate_python(data, self_instance=__pydantic_self__)
pydantic_core._pydantic_core.ValidationError: 1 validation error for String
contentEncoding
  Field required [type=missing, input_value={'name': 'accountUid', 'p...False, 'type': 'string'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.0.3/v/missing

I can see here #58 where you added support for ContentEncoding and in #4 you even link to the spec of the json schema where it explains what the ContentEncodings are.

What I don't see is any way for the default value to be used, i.e. I don't want to specify the ContentEncoding on each and every field, as I'm going to have tens of schemas with hundreds of fields each that are not generated by me. Yet I need to figure out a way to generate fake data for these schemas so we can automate some testing of ETLs.

That said, I also noticed that when I put "ContentEncoding": "8bit" in my schema, it still failed. Apparently you're checking that the value is 8-bit, which according to the JSON Schema docs is wrong.

The acceptable values are 7bit, 8bit, binary, quoted-printable, base16, base32, and base64. If not specified, the encoding is the same as the containing JSON document.
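The behaviour asked for here (encoding optional, validated against the documented values when present) can be sketched with the standard library alone. `StringField` and `string_field_from_schema` are hypothetical names used for illustration; they are not jsf's actual String model.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional

# The values listed in the JSON Schema documentation quoted above.
ALLOWED_ENCODINGS = {
    "7bit", "8bit", "binary", "quoted-printable", "base16", "base32", "base64",
}


@dataclass
class StringField:
    """Hypothetical stand-in for a string schema model."""

    name: str
    # None means "same encoding as the containing JSON document".
    content_encoding: Optional[str] = None

    def __post_init__(self) -> None:
        if self.content_encoding is not None and self.content_encoding not in ALLOWED_ENCODINGS:
            raise ValueError(f"unsupported contentEncoding: {self.content_encoding!r}")


def string_field_from_schema(name: str, schema: Dict[str, Any]) -> StringField:
    # A missing keyword falls back to None instead of failing validation.
    return StringField(name=name, content_encoding=schema.get("contentEncoding"))


print(string_field_from_schema("accountUid", {"type": "string"}))
```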

Expand test cases to use official cases from JSON schema docs

Use test data from https://github.com/Julian/jsonschema/tree/main/json

recursion.json test not available

Hi @ghandic

I see a recursion.json in src/tests/data, but I do not see a test for it. Is it supported by jsf? Also, is $ref:"#" supported?

Thanks.

Testing as-installed package downstream

Thanks for this package!

It would be lovely for downstream packagers if the tests:

- made it through in the sdist on PyPI
  - and still wouldn't get installed
- used import jsf rather than import ..jsf so that they could test the as-installed package

I'd be happy to work up a PR that did these things, if that was desirable.

Motivation: I'm looking to package this for conda-forge:

conda-forge/staged-recipes#20888

The lack of tests isn't a hold-up, but tests do help us catch metadata creep, which is only semi-automated.

Thanks again!

Handle dynamic generation of keys in object based off "pattern properties"

https://json-schema.org/understanding-json-schema/reference/object.html#id9

Handle joint types eg { "type": ["number", "string"] }

Document basic usage of jsf

Question: Logging in JSF

Is there a reason that we do not use logging in JSF?

patternProperties not working

Hi,

I am having issues with JSF when patternProperties is defined. See below:

JSON Schema:

{
    "title": "XXXXX",
    "description": "XXXXX",
    "type": "object",

    "definitions": {
        "InstructionItem": {
            "type": "object",
            "properties": {
                "Command": {
                    "description": "XXXX",
                    "type": "string"
                },
                "ExecutionTimeout": {
                    "description": "XXXX",
                    "type": "integer"
                },
                "ExecutionType": {
                    "description": "XXXXX",
                    "type": "string"
                },
                "InvokeSequence": {
                    "description": "XXXXXX",
                    "type": "integer"
                },
                "MachineLabel": {
                    "description": "XXXX",
                    "type": "string"
                },
                "NodeReference": {
                    "description": "XXXXX",
                    "type": "string"
                }
            },
            "required": [
                "Command",
                "ExecutionTimeout",
                "ExecutionType",
                "InvokeSequence",
                "MachineLabel",
                "NodeReference"
            ]
        },
        "InstructionStep": {
            "type": "object",
            "properties": {
                "CertificateURL": {
                    "description": "XXXX",
                    "type": "string"
                },
                "Description": {
                    "description": "XXXX",
                    "type": "string"
                },
                "ManualStep": {
                    "description": "XXXX",
                    "type": "boolean"
                },
                "RunAsUser": {
                    "description": "XXXX",
                    "type": "string"
                },
                "StepCommand": {
                    "description": "XXXX",
                    "type": "string"
                },
                "StepFunction": {
                    "description": "XXXX",
                    "type": "string"
                },
                "UseFunction": {
                    "description": "XXXX",
                    "type": "boolean"
                },
                "StepRun": {
                    "description": "XXXX",
                    "type": "string"
                },
                "cwd": {
                    "description": "XXXX",
                    "type": "string"
                }
            },
            "required": [
                "CertificateURL",
                "Description",
                "ManualStep",
                "RunAsUser",
                "StepCommand",
                "StepFunction",
                "UseFunction",
                "StepRun",
                "cwd"
            ]
        }
    },


    "properties": {
        "AreNotificationsEnabled": {
            "description": "XXXX",
            "type": "boolean"
        },
        "Description": {
            "description": "XXXXX",
            "type": "string"
        },
        "Instructions": {
            "description": "XXXXX",
            "type": "array",
            "items": {
                "$ref": "#/definitions/InstructionItem"
            }
        },
        "IsActive": {
            "description": "XXXXX",
            "type": "boolean"
        },
        "IsCustomerFacing": {
            "description": "XXXXXX",
            "type": "boolean"
        },
        "IsAdminFacing": {
            "description": "XXXXX",
            "type": "boolean"
        },
        "IsSystem": {
            "description": "XXXXX",
            "type": "boolean"
        },
        "Name": {
            "description": "XXXXXX",
            "type": "string"
        },
        "Nodes":{
            "description": "XXXXX",
            "type": "object",
            "patternProperties": {
                "[A-Z_]+": {
                    "description": "XXXXX",
                    "type": "object",
                    "properties": {
                        "AdminTask": {
                            "description": "XXXXX",
                            "type": "object",
                            "properties": {
                                "AdminFun": {
                                    "description": "XXXXX",
                                    "type": "object",
                                    "patternProperties": {
                                        "[A-Z_]+": {
                                            "description": "XXXX",
                                            "type": "object",
                                            "properties": {
                                                "Instructions": {
                                                    "description": "XX",
                                                    "type": "object",
                                                    "patternProperties": {
                                                        "[A-Z_-]+": {
                                                            "$ref": "#/definitions/InstructionStep"
                                                        }
                                                    }
                                                }
                                            }
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    },
    "required": [
        "AreNotificationsEnabled",
        "Description",
        "Instructions",
        "IsActive",
        "IsCustomerFacing",
        "IsAdminFacing",
        "IsSystem",
        "Name",
        "Nodes"
    ]
}

Error message:

> jsf --schema .\af.schema --instance .\t.json
Traceback (most recent call last):
  File "%HOME%\jsonvalidator\env\lib\site-packages\jsf\schema_types\object.py", line 40, in generate
    return super().generate(context)
  File "%HOME%\jsonvalidator\env\lib\site-packages\jsf\schema_types\base.py", line 49, in generate
    raise ProviderNotSetException()
jsf.schema_types.base.ProviderNotSetException

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "%HOME%\jsonvalidator\env\lib\site-packages\jsf\schema_types\object.py", line 40, in generate
    return super().generate(context)
  File "%HOME%\jsonvalidator\env\lib\site-packages\jsf\schema_types\base.py", line 49, in generate
    raise ProviderNotSetException()
jsf.schema_types.base.ProviderNotSetException

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\python3\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\python3\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "%HOME%\jsonvalidator\env\Scripts\jsf.exe\__main__.py", line 7, in <module>
  File "%HOME%\jsonvalidator\env\lib\site-packages\typer\main.py", line 214, in __call__
    return get_command(self)(*args, **kwargs)
  File "%HOME%\jsonvalidator\env\lib\site-packages\click\core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "%HOME%\jsonvalidator\env\lib\site-packages\click\core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "%HOME%\jsonvalidator\env\lib\site-packages\click\core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "%HOME%\jsonvalidator\env\lib\site-packages\click\core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "%HOME%\jsonvalidator\env\lib\site-packages\typer\main.py", line 500, in wrapper
    return callback(**use_params)  # type: ignore
  File "%HOME%\jsonvalidator\env\lib\site-packages\jsf\cli.py", line 19, in main
    JSF.from_json(schema).to_json(instance)
  File "%HOME%\jsonvalidator\env\lib\site-packages\jsf\parser.py", line 143, in to_json
    json.dump(self.generate(), f, indent=2)
  File "%HOME%\jsonvalidator\env\lib\site-packages\jsf\parser.py", line 131, in generate
  File "%HOME%\jsonvalidator\env\lib\site-packages\jsf\schema_types\object.py", line 42, in generate
    return {o.name: o.generate(context) for o in self.properties if self.should_keep(o.name)}
  File "%HOME%\jsonvalidator\env\lib\site-packages\jsf\schema_types\object.py", line 42, in <dictcomp>
    return {o.name: o.generate(context) for o in self.properties if self.should_keep(o.name)}
  File "%HOME%\jsonvalidator\env\lib\site-packages\jsf\schema_types\object.py", line 42, in generate
    return {o.name: o.generate(context) for o in self.properties if self.should_keep(o.name)}
TypeError: 'NoneType' object is not iterable

I've tried replacing patternProperties with properties and it worked.

Thanks,

jsf requires perfect ordering of definitions when parsing

Right now __parse_definition() might rely on self.definitions for the parsing of references:

jsf/jsf/parser.py

Line 159 in 162d6cd

cls = deepcopy(self.definitions.get(f"#{frag}"))

If the definitions that $refs point to haven't been declared in exactly the right order, parsing might fail.

I'm currently converting pydantic models to JSON schemas and end up with a valid and compliant JSON schema.
However, the ordering of the refs is out of my control.
When loading the schema to JSF it fails with:

line 164, in __parse_definition
    cls.name = name
    ^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'name'

To reproduce the issue:

faker = JSF(
    {
        "$defs": {
            "Foo": {
                "properties": {"bar": {"$ref": "#/$defs/SomeEnum"}},
                "required": ["bar"],
                "title": "Foo",
                "type": "object",
            },
            "SomeEnum": {"enum": ["A", "B"], "title": "SomeEnum", "type": "string"},
        },
        "properties": {"foobar": {"anyOf": [{"$ref": "#/$defs/Foo"}]}},
        "required": ["foobar"],
        "title": "FooBarObject",
        "type": "object",
    }
)

However, if you switch Foo with SomeEnum it works as expected:

faker = JSF(
    {
        "$defs": {
            "SomeEnum": {"enum": ["A", "B"], "title": "SomeEnum", "type": "string"},
            "Foo": {
                "properties": {"bar": {"$ref": "#/$defs/SomeEnum"}},
                "required": ["bar"],
                "title": "Foo",
                "type": "object",
            }
        },
        "properties": {"foobar": {"anyOf": [{"$ref": "#/$defs/Foo"}]}},
        "required": ["foobar"],
        "title": "FooBarObject",
        "type": "object",
    }
)
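One way to remove the ordering dependency is to resolve each $ref lazily against the root schema with a pointer lookup, rather than relying on definitions that were already parsed. The following is a minimal sketch of that idea and not jsf's parser: `resolve_pointer` and `expand_refs` are hypothetical helpers, only local `#/...` references are handled, and recursive schemas are not guarded against.

```python
from typing import Any, Dict


def resolve_pointer(root: Dict[str, Any], ref: str) -> Any:
    """Look up a local reference such as '#/$defs/Foo' in the root schema."""
    if not ref.startswith("#"):
        raise ValueError(f"only local references are handled here: {ref!r}")
    pointer = ref[1:]
    tokens = pointer.split("/")[1:] if pointer else []
    node: Any = root
    for token in tokens:
        # Undo JSON Pointer escaping (~1 is '/', ~0 is '~').
        node = node[token.replace("~1", "/").replace("~0", "~")]
    return node


def expand_refs(node: Any, root: Dict[str, Any]) -> Any:
    """Recursively inline local $refs, in whatever order they were declared."""
    if isinstance(node, dict):
        if "$ref" in node:
            return expand_refs(resolve_pointer(root, node["$ref"]), root)
        return {key: expand_refs(value, root) for key, value in node.items()}
    if isinstance(node, list):
        return [expand_refs(item, root) for item in node]
    return node


schema = {
    "$defs": {
        "Foo": {"type": "object", "properties": {"bar": {"$ref": "#/$defs/SomeEnum"}}},
        "SomeEnum": {"enum": ["A", "B"], "type": "string"},
    },
    "properties": {"foobar": {"$ref": "#/$defs/Foo"}},
}
expanded = expand_refs(schema["properties"], schema)
print(expanded["foobar"]["properties"]["bar"]["enum"])
```

Here Foo is declared before SomeEnum, the order that fails above, and the lookup still succeeds because nothing is resolved until it is needed.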

Generation of number type fails if no integer exists between minimum and maximum

If multipleOf is not set in the schema, then generating a number always attempts to use a step of 1, and throws an exception when no such valid number exists.

>>> from jsf import JSF
>>> JSF({"type": "number", "minimum": 0.1, "maximum": 0.9}).generate()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../lib/python3.8/site-packages/jsf/parser.py", line 251, in generate
    return self.root.generate(context=self.context)
  File ".../lib/python3.8/site-packages/jsf/schema_types/number.py", line 37, in generate
    step * random.randint(math.ceil(float(_min) / step), math.floor(float(_max) / step))
  File "/usr/lib/python3.8/random.py", line 248, in randint
    return self.randrange(a, b+1)
  File "/usr/lib/python3.8/random.py", line 226, in randrange
    raise ValueError("empty range for randrange() (%d, %d, %d)" % (istart, istop, width))
ValueError: empty range for randrange() (1, 1, 0)

I suggest a check that max - min is greater than step, and if not try a smaller step.

It is even worse when using exclusive Maximums and Minimums, when it is unable to find any value in a range from 0.1-2.9

>>> JSF({
        "type": "number",
        "minimum": 0.1,
        "maximum": 2.9,
        "exclusiveMinimum": True,
        "exclusiveMaximum": True
    }).generate()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../lib/python3.8/site-packages/jsf/parser.py", line 251, in generate
    return self.root.generate(context=self.context)
  File ".../lib/python3.8/site-packages/jsf/schema_types/number.py", line 37, in generate
    step * random.randint(math.ceil(float(_min) / step), math.floor(float(_max) / step))
  File "/usr/lib/python3.8/random.py", line 248, in randint
    return self.randrange(a, b+1)
  File "/usr/lib/python3.8/random.py", line 226, in randrange
    raise ValueError("empty range for randrange() (%d, %d, %d)" % (istart, istop, width))
ValueError: empty range for randrange() (2, 2, 0)
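A possible fix is to sample from a continuous range whenever multipleOf is absent and to use integer steps only when it is set. The sketch below assumes a hypothetical `random_number` helper and the boolean exclusiveMinimum/exclusiveMaximum style used in the example above; it is not the code in number.py.

```python
import math
import random
from typing import Optional


def random_number(
    minimum: float,
    maximum: float,
    multiple_of: Optional[float] = None,
    exclusive_minimum: bool = False,
    exclusive_maximum: bool = False,
) -> float:
    """Pick a number within the bounds, stepping only when multiple_of is set."""
    if multiple_of is None:
        exclusive = exclusive_minimum or exclusive_maximum
        if minimum > maximum or (minimum == maximum and exclusive):
            raise ValueError("no number satisfies the given bounds")
        while True:
            value = random.uniform(minimum, maximum)
            # Re-draw in the rare case an excluded endpoint is hit.
            if exclusive_minimum and value == minimum:
                continue
            if exclusive_maximum and value == maximum:
                continue
            return value
    low = math.ceil(minimum / multiple_of)
    high = math.floor(maximum / multiple_of)
    if exclusive_minimum and low * multiple_of == minimum:
        low += 1
    if exclusive_maximum and high * multiple_of == maximum:
        high -= 1
    if low > high:
        raise ValueError("no multiple of multiple_of lies within the bounds")
    return multiple_of * random.randint(low, high)


print(random_number(0.1, 0.9))
print(random_number(0.1, 2.9, exclusive_minimum=True, exclusive_maximum=True))
```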

Add more edge cases for testing scenarios

Test the error handling, giving users easy-to-debug output rather than the default error logs

Handle "if, then and else keywords"

Feature Request: Custom faker instance

Description

Currently, jsf forces its own Faker instance with faker = Faker(). Would it be possible to pass a Faker instance as a parameter for better customization?

Regards

Enums always coerced to strings

The current enum implementation results in generated int and float data being coerced to str due to how Pydantic handles Union (see these docs). Pydantic will coerce the input to the first type it can match in the Union, which in the current implementation of JSFEnum is always a string for integers and floats.

class JSFEnum(BaseSchema):
    enum: Optional[List[Union[str, int, float, None]]] = []

Pydantic offers the following recommendation to solve this issue:

As such, it is recommended that, when defining Union annotations, the most specific type is included first and followed by less specific types.

However, it also issues a warning concerning Unions inside of List or Dict types:

typing.Union also ignores order when defined, so Union[int, float] == Union[float, int] which can lead to unexpected behaviour when combined with matching based on the Union type order inside other type definitions, such as List and Dict types (because python treats these definitions as singletons). For example, Dict[str, Union[int, float]] == Dict[str, Union[float, int]] with the order based on the first time it was defined. Please note that this can also be affected by third party libraries and their internal type definitions and the import orders.

Because of this I think the best solution is to use Pydantic's Smart Union which will check the entire Union for the best type match before attempting to coerce.

AttributeError when definitions are in particular order

Hi.
I have a simple schema:

schema.json

{
  "$schema": "http://json-schema.org/draft-06/schema#",
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "a_arr": {
      "type": "array",
      "items": {
        "$ref": "#/definitions/A"
      }
    }
  },
  "definitions": {
    "A": {
      "type": "object",
      "additionalProperties": false,
      "properties": {
        "bar": {
          "$ref": "#/definitions/B"
        }
      }
    },
    "B": {
      "type": "object",
      "additionalProperties": false,
      "properties": {
        "foo": {
          "type": "string"
        }
      }
    }
  }
}

When I am trying to generate data, I hit an error.
Code sample:

import json

from jsf import JSF

s = json.load(open("schema.json"))
f = JSF(s)
fake_json = f.generate()
print(fake_json)

I got this traceback:

Traceback

Traceback (most recent call last):
  File "/Users/vlad/Library/Application Support/JetBrains/PyCharm2023.2/scratches/scratch_105.py", line 6, in <module>
    mismo_faker = JSF(mismo)
  File "/Users/vlad/Projects/test/venv/lib/python3.9/site-packages/jsf/parser.py", line 53, in __init__
    self._parse(schema)
  File "/Users/vlad/Projects/test/venv/lib/python3.9/site-packages/jsf/parser.py", line 179, in _parse
    item = self.__parse_definition(name, path=f"#/{def_tag}", schema=definition)
  File "/Users/vlad/Projects/test/venv/lib/python3.9/site-packages/jsf/parser.py", line 140, in __parse_definition
    return self.__parse_object(name, path, schema)
  File "/Users/vlad/Projects/test/venv/lib/python3.9/site-packages/jsf/parser.py", line 65, in __parse_object
    props.append(self.__parse_definition(_name, path=f"{path}/{_name}", schema=definition))
  File "/Users/vlad/Projects/test/venv/lib/python3.9/site-packages/jsf/parser.py", line 164, in __parse_definition
    cls.name = name
AttributeError: 'NoneType' object has no attribute 'name'

Process finished with exit code 1

My pip freeze:

pip freeze

annotated-types==0.5.0
attrs==23.1.0
certifi==2023.7.22
charset-normalizer==3.2.0
Faker==19.3.1
idna==3.4
jsf==0.8.0
jsonschema==4.19.0
jsonschema-specifications==2023.7.1
pydantic==2.3.0
pydantic_core==2.6.3
python-dateutil==2.8.2
referencing==0.30.2
requests==2.31.0
rpds-py==0.10.0
rstr==3.2.1
six==1.16.0
smart-open==6.3.0
typing_extensions==4.7.1
urllib3==2.0.4

However, when I reorder the definitions, it is working as expected.

Working schema

{
  "$schema": "http://json-schema.org/draft-06/schema#",
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "a_arr": {
      "type": "array",
      "items": {
        "$ref": "#/definitions/A"
      }
    }
  },
  "definitions": {
    "B": {
      "type": "object",
      "additionalProperties": false,
      "properties": {
        "foo": {
          "type": "string"
        }
      }
    },
    "A": {
      "type": "object",
      "additionalProperties": false,
      "properties": {
        "bar": {
          "$ref": "#/definitions/B"
        }
      }
    }
  }
}

Note that the definition of B is before the definition of A. In my case, I cannot reorder definitions, as I have a big schema with many definitions coming from my customers. I would like to help fix this, but I would appreciate guidance on how to fix it.

Pydantic 2.0 deprecations

Pydantic 2 was released a few days ago... some of our tests are now failing with

.env/lib/python3.11/site-packages/pydantic/_internal/_config.py:206: in prepare_config
    warnings.warn(DEPRECATION_MESSAGE, DeprecationWarning)
E   pydantic.warnings.PydanticDeprecatedSince20: Support for class-based `config` is deprecated, use ConfigDict instead. Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.0.1/migration/
        config     = <class 'jsf.schema_types.enum.JSFEnum.Config'>

(pytest is set up to treat warnings as errors)

Until deprecated usages are addressed, would it be possible to specify the requirement on pydantic as >=1.10.4,<2 (instead of just >=1.10.4)?

Uplift docstrings so that they are friendly for users in an IDE

Cannot reference definitions within a nested, named-schema definition scope

Description

Implementation requires a single, global reference for all definitions. Making reference to a "Complex Structure" example provided by JSONSchema.org.

The schema leverages the idea that a definition using an $id signals a new scope for that definition and references are only made to other definitions within that scope. Specifically, the root's $defs block only contains an address definition while that address block itself has a definitions block:

{
  "$defs": {
    "address": {
      "$id": "/schemas/address",
      "definitions": {
        "state": {}
      }
    }
  }
}

In this example, the root object references the address definition by its '$id' value, "/schemas/address". The address block also contains its own definition (state), which is referenced from its properties as "#/definitions/state".

In this case, the # symbol relates to the scope within the address block.

Full Complex JSON Schema Example

{
  "$id": "https://example.com/schemas/customer",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "properties": {
    "first_name": { "type": "string" },
    "last_name": { "type": "string" },
    "shipping_address": { "$ref": "/schemas/address" },
    "billing_address": { "$ref": "/schemas/address" }
  },
  "required": ["first_name", "last_name", "shipping_address", "billing_address"],
  "$defs": {
    "address": {
      "$id": "/schemas/address",
      "$schema": "http://json-schema.org/draft-07/schema#",
      "type": "object",
      "properties": {
        "street_address": { "type": "string" },
        "city": { "type": "string" },
        "state": { "$ref": "#/definitions/state" }
      },
      "required": ["street_address", "city", "state"],
      "definitions": {
        "state": { "enum": ["CA", "NY", "... etc ..."] }
      }
    }
  }
}

NOTE: Have verified this schema works with the jsonschema Python library using a generated JSON object using this project.

Expected

Should be able to run a jsf.JSF(schema) with this JSON Schema:

import json
import jsf

schema = json.load(open("complex.schema.json" , "r"))
gen = jsf.JSF(schema)
new_json = gen.generate()
print(json.dumps(new_json, indent=2))

Actual

Running the above code generates AttributeError: 'NoneType' object has no attribute 'name' on parser.py#L181.

Making adjustments to the schema definition to flatten the dependency tree and removing references looking at internal dependencies by their $id, we can make it work.

Adjusted JSON Schema Example

{
  "$id": "https://example.com/schemas/customer",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "properties": {
    "first_name": { "type": "string" },
    "last_name": { "type": "string" },
    "shipping_address": { "$ref": "#/$defs/address" },
    "billing_address": { "$ref": "#/$defs/address" }
  },
  "required": ["first_name", "last_name", "shipping_address", "billing_address"],
  "$defs": {
    "address": {
      "$id": "/schemas/address",
      "$schema": "http://json-schema.org/draft-07/schema#",
      "type": "object",
      "properties": {
        "street_address": { "type": "string" },
        "city": { "type": "string" },
        "state": { "$ref": "#/$defs/state" }
      },
      "required": ["street_address", "city", "state"]
    },
    "state": { "enum": ["CA", "NY", "... etc ..."] }
  }
}

Use random type when field is nullable

In the following line:

jsf/src/jsf/parser.py

Line 78 in 3db7519

raise TypeError # pragma: no cover - not currently supporting other types TODO

A TypeError is raised when the item type represents more than one type (excluding null). I'm not sure why.

Can this method just return a random type from the list (including null)?

So:

import random
from typing import Any, Dict, Tuple

...

def __is_field_nullable(self, schema: Dict[str, Any]) -> Tuple[str, bool]:
    item_type = schema.get("type")
    if isinstance(item_type, list):
        if "null" in item_type:
            return random.choice(item_type), True
    return item_type, False
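
As a standalone variation on the proposal above, the sketch below keeps nullability separate from the chosen type, so the caller can decide how often to emit null. The function name `pick_type` and the optional `rng` parameter are assumptions for illustration and are not part of jsf.

```python
import random
from typing import Any, Dict, Optional, Tuple


def pick_type(
    schema: Dict[str, Any], rng: Optional[random.Random] = None
) -> Tuple[Optional[str], bool]:
    """Return (chosen type, is_nullable) for a schema's "type" keyword."""
    chooser = rng or random.Random()
    item_type = schema.get("type")
    if isinstance(item_type, list):
        nullable = "null" in item_type
        candidates = [t for t in item_type if t != "null"]
        if not candidates:
            # The list contained only "null".
            return "null", True
        # Pick one of the non-null types instead of raising TypeError.
        return chooser.choice(candidates), nullable
    return item_type, item_type == "null"
```

For example, `pick_type({"type": ["string", "integer", "null"]})` returns either "string" or "integer" together with `True`.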

Make "definitions" variable in JSF._parse method

The latest JSON Schema draft versions recommend using $defs instead of definitions with the note that the actual reference pointer should be extracted from the $ref fragment. I think it will require a change in the _parse method of https://github.com/ghandic/jsf/blob/main/src/jsf/parser.py
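
One way to avoid hardcoding "definitions" is to treat the $ref fragment as a JSON Pointer and walk it from the schema root. The following is a minimal sketch of that idea, not the current `_parse` code; `resolve_local_ref` is a hypothetical helper, and percent-decoding of the fragment is left out.

```python
from typing import Any, Dict


def resolve_local_ref(root: Dict[str, Any], ref: str) -> Any:
    """Follow a local $ref such as "#/$defs/state" from the schema root."""
    if not ref.startswith("#"):
        raise ValueError(f"only local references are handled here: {ref!r}")
    node: Any = root
    for token in ref[1:].split("/")[1:]:
        # Undo JSON Pointer escaping (~1 is "/", ~0 is "~").
        token = token.replace("~1", "/").replace("~0", "~")
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node


schema = {
    "$defs": {"state": {"enum": ["CA", "NY"]}},
    "definitions": {"city": {"type": "string"}},
}
state = resolve_local_ref(schema, "#/$defs/state")
city = resolve_local_ref(schema, "#/definitions/city")
```

Because the container name comes from the pointer itself, "$defs", "definitions", or any other location works without special cases.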

String generator does not contain a word of length 1 / 2

The LOREM word list in text__plain.py does not contain any word of length one or two.

Therefore, if a schema specifies [minLength, maxLength] = [1, 2] (or [1, 1] or [2, 2]) on a string property, an empty string is returned.

For example for a country code string ( -- Not necessarily the best example because country code should be an enum rather than a string but let's say for this exercise that the code value does not really matter 🤣 )

{
  "properties": {
    "code": {
      "maxLength": 2,
      "minLength": 2,
      "title": "Code",
      "type": "string"
    }
  }
}

Possible solution:
Change the LOREM word list to include a few short words.

Edit:
The following line is also incorrect if we want to allow a word that exactly fills the remaining length:
valid_words = list(filter(lambda s: len(s) < remaining, LOREM))
It should be replaced by:
valid_words = list(filter(lambda s: len(s) <= remaining, LOREM))
The rest should still work, thanks to the .strip() at the end, which removes the extra space.
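
The effect of both changes can be sketched in isolation. The code below is not the library's `random_fixed_length_sentence`; the word list `WORDS` and the function `fixed_length_text` are hypothetical. It admits a word that exactly fills the remaining space (the `<=` case) and, as an extra guard that the issue does not propose, skips words that would leave room for a separator but no following word.

```python
import random
from typing import List, Optional

# Hypothetical word list that, unlike the current LOREM, has 1- and 2-letter words.
WORDS = ["a", "et", "sed", "amet", "lorem", "ipsum"]


def fixed_length_text(
    length: int, words: List[str] = WORDS, rng: Optional[random.Random] = None
) -> str:
    """Join random words with single spaces into a string of the given length."""
    chooser = rng or random.Random()
    parts: List[str] = []
    remaining = length
    while remaining > 0:
        # A word is usable if it fills the remainder exactly, or if it leaves
        # room for a space plus at least one more character.
        valid = [w for w in words if len(w) == remaining or len(w) <= remaining - 2]
        if not valid:
            break  # Cannot be completed with this word list.
        word = chooser.choice(valid)
        parts.append(word)
        remaining -= len(word) + 1  # +1 accounts for the separating space
    return " ".join(parts)


code = fixed_length_text(2)
```

With a one-letter and a two-letter word available, every positive length can be produced exactly, so a [2, 2] constraint no longer yields an empty string.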

Make CLI optional?

typer brings along rather a lot of dependencies. Might it be possible to make that dependency optional for using this as a library? One way would be a [cli] extra, or a whole separate package for jsf-cli.

Handle self references

It seems to fail on every use of references, including self-references.

Take a look at the 2 schemas attached

sample_working.txt
sample_broken.txt

Add support for $id property

https://json-schema.org/understanding-json-schema/structuring.html?highlight=ref#id

Can jsf work with a schema that has no provider?

Schema I Used

{
    "title": "AlertSync",
    "description": "\u5ba1\u8ba1\u544a\u8b66model",
    "type": "object",
    "properties": {
        "audit_label": {
            "title": "Audit Label",
            "type": "string",
            "format": "ipv4"
        },
        "category": {
            "title": "Category",
            "minimum": 1,
            "maximum": 15,
            "type": "integer"
        },
        "level": {
            "title": "Level",
            "minimum": 0,
            "maximum": 3,
            "type": "integer"
        },
        "src_mac": {
            "title": "Src Mac",
            "default": "00:00:00:00:00:00",
            "pattern": "^([0-9A-F]{2})(\\:[0-9A-F]{2}){5}$",
            "type": "string"
        },
        "src_ip": {
            "title": "Src Ip",
            "type": "string",
            "format": "ipv4"
        },
        "src_port": {
            "title": "Src Port",
            "minimum": 1,
            "maximum": 65535,
            "type": "integer"
        },
        "dst_mac": {
            "title": "Dst Mac",
            "default": "FF:FF:FF:FF:FF:FF",
            "pattern": "^([0-9A-F]{2})(\\:[0-9A-F]{2}){5}$",
            "type": "string"
        },
        "dst_ip": {
            "title": "Dst Ip",
            "type": "string",
            "format": "ipv4"
        },
        "dst_port": {
            "title": "Dst Port",
            "minimum": 1,
            "maximum": 65535,
            "type": "integer"
        },
        "l4_protocol": {
            "$ref": "#/definitions/L4ProtocolEnum"
        },
        "protocol": {
            "$ref": "#/definitions/ProtocolEnum"
        },
        "illegal_ip": {
            "title": "Illegal Ip",
            "default": [],
            "type": "array",
            "items": {
                "type": "string",
                "format": "ipv4"
            }
        },
        "last_at": {
            "title": "Last At",
            "default": "2022-12-30T14:08:30.753677",
            "type": "string",
            "format": "date-time"
        },
        "count": {
            "title": "Count",
            "default": 1,
            "minimum": 1,
            "maximum": 100000,
            "type": "integer"
        },
        "other_info": {
            "title": "Other Info",
            "type": "object"
        },
        "payload": {
            "title": "Payload",
            "pattern": "^([0-9A-F]{2})+$",
            "type": "string"
        }
    },
    "required": [
        "audit_label",
        "category",
        "level",
        "l4_protocol",
        "protocol"
    ],
    "definitions": {
        "L4ProtocolEnum": {
            "title": "L4ProtocolEnum",
            "description": "An enumeration.",
            "enum": [
                "TCP",
                "UDP"
            ],
            "type": "string"
        },
        "ProtocolEnum": {
            "title": "ProtocolEnum",
            "description": "An enumeration.",
            "enum": [
                "S7COMM",
                "MODBUS"
            ],
            "type": "string"
        }
    }
}

error message

Traceback (most recent call last):
  File "/root/repos/sa-data-perf/venv/lib/python3.10/site-packages/jsf/schema_types/object.py", line 40, in generate
    return super().generate(context)
  File "/root/repos/sa-data-perf/venv/lib/python3.10/site-packages/jsf/schema_types/base.py", line 49, in generate
    raise ProviderNotSetException()
jsf.schema_types.base.ProviderNotSetException

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/repos/sa-data-perf/venv/lib/python3.10/site-packages/jsf/schema_types/object.py", line 40, in generate
    return super().generate(context)
  File "/root/repos/sa-data-perf/venv/lib/python3.10/site-packages/jsf/schema_types/base.py", line 49, in generate
    raise ProviderNotSetException()
jsf.schema_types.base.ProviderNotSetException

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/repos/sa-data-perf/debug.py", line 14, in <module>
    print(jsf.generate())
  File "/root/repos/sa-data-perf/venv/lib/python3.10/site-packages/jsf/parser.py", line 137, in generate
    return self.root.generate(context=self.context)
  File "/root/repos/sa-data-perf/venv/lib/python3.10/site-packages/jsf/schema_types/object.py", line 42, in generate
    return {o.name: o.generate(context) for o in self.properties if self.should_keep(o.name)}
  File "/root/repos/sa-data-perf/venv/lib/python3.10/site-packages/jsf/schema_types/object.py", line 42, in <dictcomp>
    return {o.name: o.generate(context) for o in self.properties if self.should_keep(o.name)}
  File "/root/repos/sa-data-perf/venv/lib/python3.10/site-packages/jsf/schema_types/object.py", line 42, in generate
    return {o.name: o.generate(context) for o in self.properties if self.should_keep(o.name)}
TypeError: 'NoneType' object is not iterable

My question

Can jsf work with a schema like the one given? It was generated by pydantic, and I'm not sure which part causes this error. I hope the logging can be made more specific, so that it tells me which property causes the error.

BUG: Unique items in array of dictionaries

The current implementation relies on Python sets to enforce uniqueness, but dicts are not hashable, so an array of objects cannot be deduplicated this way. A change is needed to fix this.

Example

"errors": {
            "type": "object",
            "properties": {
                "validationErrors": {
                    "type": "array",
                    "minItems": 0,
                    "maxItems": 2,
                    "uniqueItems": false,
                    "items": [
                        {
                            "type": "object",
                            "$state": {
                                "error": "lambda: random.choice([{'code':'3013','message':'Mandatory field is either Null or blank','field':'IDNumber'}, {'code':'2013','message':'Mandatory field is either Null or blank','field':'IDNumber'}])"
                            },
                            "properties": {
                                "code": {
                                    "type": "string",
                                    "description": "Error code from Digital gateway validation checks",
                                    "$provider": "lambda: state['validationErrors[0]']['error']['code']"
                                },
                                "message": {
                                    "type": "string",
                                    "description": "",
                                    "$provider": "lambda: state['validationErrors[0]']['error']['message']"
                                },
                                "field": {
                                    "type": "string",
                                    "description": "",
                                    "$provider": "lambda: state['validationErrors[0]']['error']['field']"
                                }
                            },
                            "required": ["code", "message", "field"]
                        }
                    ]
                }
            },
            "required": ["validationErrors"]
        }
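
For the unique-items bug described above, one possible approach is to deduplicate on a canonical JSON serialization instead of on the values themselves. This is a sketch under that assumption, not the fix adopted by the project; `unique_items` is a hypothetical helper. Note that it treats 1 and 1.0 as different values, because they serialize differently.

```python
import json
from typing import Any, List


def unique_items(items: List[Any]) -> List[Any]:
    """Drop duplicates from a list of JSON values, keeping first occurrences."""
    seen = set()
    result = []
    for item in items:
        # sort_keys makes dicts with the same content serialize identically.
        key = json.dumps(item, sort_keys=True)
        if key not in seen:
            seen.add(key)
            result.append(item)
    return result


errors = [
    {"code": "3013", "field": "IDNumber"},
    {"field": "IDNumber", "code": "3013"},
    {"code": "2013", "field": "IDNumber"},
]
deduplicated = unique_items(errors)
```

The serialized string is hashable, so the set-based check keeps working for dicts and nested lists.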

Add support for multiple schemas being combined

https://json-schema.org/understanding-json-schema/structuring.html?highlight=ref#bundling

Error when generating from schema that has "oneOf" property

If we are using a schema like this:

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "$id": "urn://Media.schema.json",
  "title": "Media",
  "version": "0.0.1",
  "description": "This event represents a Media",
  "type": "object",
  "properties": {
    "envID": {
      "type": "string",
      "minLength": 10
    },
    "envTimestamp": {
      "type": "integer",
      "exclusiveMinimum": 0
    },
    "javaType": {
      "type": "string"
    },
    "mediaKey": {
      "type": "string",
      "minLength": 5,
      "maxLength": 64
    },
    "mediaType": {
      "type": "string",
      "enum": [
        "COVER"
      ]
    }
  },
  "required": [
    "envID",
    "envTimestamp",
    "mediaType"
  ],
  "additionalProperties": false,
  "oneOf": [
    {
      "properties": {
        "mediaType": {
          "const": "COVER"
        }
      },
      "required": [
        "javaType"
      ]
    }
  ]
}

When "mediaType" is "COVER", the "javaType" property must always be included in the generated output. Currently this does not happen.
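
Until oneOf is supported, a possible workaround is to fold one branch into the parent schema before handing it to the generator. The sketch below assumes that approach; `apply_one_of` is a hypothetical helper, not a jsf function, and it does not check that exactly one branch matches, which a full oneOf implementation would need to do.

```python
import copy
import random
from typing import Any, Dict, Optional


def apply_one_of(
    schema: Dict[str, Any], rng: Optional[random.Random] = None
) -> Dict[str, Any]:
    """Return a copy of the schema with one randomly chosen oneOf branch merged in."""
    chooser = rng or random.Random()
    merged = copy.deepcopy(schema)
    branches = merged.pop("oneOf", None)
    if not branches:
        return merged
    branch = chooser.choice(branches)
    # Overlay the branch's property constraints on the parent's.
    props = merged.setdefault("properties", {})
    for name, sub in branch.get("properties", {}).items():
        props[name] = {**props.get(name, {}), **sub}
    # Union the required lists, preserving order.
    required = list(merged.get("required", []))
    for name in branch.get("required", []):
        if name not in required:
            required.append(name)
    if required:
        merged["required"] = required
    return merged


media = {
    "type": "object",
    "properties": {
        "javaType": {"type": "string"},
        "mediaType": {"type": "string", "enum": ["COVER"]},
    },
    "required": ["mediaType"],
    "oneOf": [
        {"properties": {"mediaType": {"const": "COVER"}}, "required": ["javaType"]}
    ],
}
flattened = apply_one_of(media)
```

After flattening, "javaType" appears in the top-level "required" list, so a generator that honours "required" would always emit it.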

Document advanced usage of jsf

This should include using shared local states, custom execution contexts, custom initial states, validation, etc.
