
jsf's People

Contributors

abraha2d, andrewshawcare, bollwyvl, cponfick, dependabot[bot], elecay, ghandic, jenniferplusplus, joachimhuet, jtyoung84, leobaldock, mschwab12, simon-schoonjans, talinx

jsf's Issues

Use default values

Thanks for the tool!

Is there a way to have it use default values from the schema where present, rather than faking values?

When minLength and maxLength are the same, it returns empty string

Self explanatory.

How to reproduce:

from jsf import JSF

faker = JSF(
    {
        "type": "object",
        "properties": {
            "name": {"type": "string", "minLength":3, "maxLength":3 },
            "email": {"type": "string", "$provider": "faker.email"},
        },
        "required": ["name", "email"],
    }
)

fake_json = faker.generate()
fake_json

The name property is returned as the empty string ''.

Attempt fix:
For now, I changed the random_fixed_length_sentence function in the jsf/schema_types/string_utils/content_type/text__plain.py file to the version below. Note that even though the _min variable has a default value of 0, this version will never return an empty string (although an empty string wouldn't be the most valuable output anyway).

def random_fixed_length_sentence(_min: int = 0, _max: int = 50) -> str:

    _min = int(_min)
    _max = int(_max)

    if _min > _max:
        raise ValueError("minLength must not be greater than maxLength")

    # Needs better implementation to return empty string
    sentence = ""
    while len(sentence) <= _min:
        sentence = random.choice(LOREM).capitalize()
        while len(sentence) < _max and random.random() > 0.2:
            sentence += " " + random.choice(LOREM)
        # sentence += random.choice(['.', '!', '?'])
    return sentence[:_max].strip()
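
One way to guarantee a length between minLength and maxLength, including the case where both are equal, is to pick the target length first and then build the text up to it. The following is only a sketch, not the library's implementation: WORDS is a hypothetical stand-in for the LOREM list, and random_bounded_sentence is an illustrative name.

```python
import random

# Hypothetical stand-in for the LOREM word list in text__plain.py.
WORDS = ["lorem", "ipsum", "dolor", "sit", "amet", "ad", "et", "a"]


def random_bounded_sentence(min_len: int = 0, max_len: int = 50) -> str:
    """Return text whose length is between min_len and max_len, inclusive."""
    min_len, max_len = int(min_len), int(max_len)
    if min_len > max_len:
        raise ValueError("minLength must not be greater than maxLength")

    # Choose the exact length up front so min_len == max_len is handled.
    target = random.randint(min_len, max_len)

    text = ""
    while len(text) < target:
        text += (" " if text else "") + random.choice(WORDS)
    text = text[:target]

    # Truncation can leave a trailing space. Swap it for a letter so the
    # length stays exactly `target` instead of shrinking after a strip().
    if text.endswith(" "):
        text = text[:-1] + "a"
    return text
```

With this approach an empty string is only produced when the chosen target length is 0.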

Question: Supporting Parquet datatypes

I have a use case where I need to generate data for parquet datatypes. I am currently using a custom version of JSF. Would you like to have this feature here?

JSON looks like the following:

    "UInt32": {
      "type": "uint32"
    },
    "UInt64": {
      "type": "uint64"
    },
    "Float16": {
      "type": "float16"
    }

[number.py:jsf.src.schema_types.number:line 304 - generate()] - INFO: Generating random uint32
[number.py:jsf.src.schema_types.number:line 52 - generate()] - DEBUG: is_float: False
[number.py:jsf.src.schema_types.number:line 72 - generate()] - INFO: Generated number: 35227457
[number.py:jsf.src.schema_types.number:line 333 - generate()] - INFO: Generating random uint64
[number.py:jsf.src.schema_types.number:line 52 - generate()] - DEBUG: is_float: False
[number.py:jsf.src.schema_types.number:line 72 - generate()] - INFO: Generated number: 4669327448559716910
[number.py:jsf.src.schema_types.number:line 362 - generate()] - INFO: Generating random float16
[number.py:jsf.src.schema_types.number:line 57 - generate()] - DEBUG: is_float: True
[number.py:jsf.src.schema_types.number:line 72 - generate()] - INFO: Generated number: 1.920763087895552e+17
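
For illustration, a generator for these three type names could look roughly like the sketch below. It is not the code of the custom JSF version mentioned above; generate_parquet_number and INT_RANGES are hypothetical names. The float16 value in the log output above is larger than 65504, the largest finite IEEE 754 half-precision value, so this sketch constrains the value and rounds it through the standard library's half-precision struct format.

```python
import random
import struct

# Inclusive ranges for the unsigned integer types (hypothetical table name).
INT_RANGES = {
    "uint32": (0, 2**32 - 1),
    "uint64": (0, 2**64 - 1),
}
FLOAT16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def generate_parquet_number(type_name: str):
    """Return a random value that fits the given fixed-width numeric type."""
    if type_name in INT_RANGES:
        low, high = INT_RANGES[type_name]
        return random.randint(low, high)
    if type_name == "float16":
        value = random.uniform(-FLOAT16_MAX, FLOAT16_MAX)
        # Round-trip through the half-precision format ("e") so the result
        # is exactly representable as a float16.
        return struct.unpack("<e", struct.pack("<e", value))[0]
    raise ValueError(f"unsupported type: {type_name}")
```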

Option to prefer default values when available and/or example values, before random values

I would also like to be able to apply schema defaults during generation.

Example from JSON Schema Faker, showing the inputs and outputs:

[Image: example inputs and outputs from JSON Schema Faker]

You can see in the above example, that the generated sample is easy to read and understand. Whereas a randomly generated set of inputs would lose much valuable context.

The use case is that we define inputs for our application as JSON schema specifications, and we ask users to provide their input matching that specification. It is much preferable to have those generated values default to the actual default values: (1) because it is easier to understand from a user perspective if they see 'items per batch' defaulting to something like 1000 and not 1235134523451345, and (2) because in those cases, deleting the defaults has the expected effect of simply not overriding them.

Another case would be an example URL value or an example region name. Seeing the example or default value gives a much stronger hint of what kind of inputs are actually required. (If you see us-east-1 as the default, you are going to feel comfortable providing us-west-2, and you probably won't mistakenly type US West (Oregon).)

As (apparently?) implemented in JSON Schema Faker, I could imagine adding a use_defaults and use_examples into the generate() method.

image

Happy to contribute if this is something I could help with.
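
Until such options exist, one possible workaround is to post-process the generated value. The sketch below is an assumption about how use_defaults and use_examples could behave, not an existing jsf API; overlay_defaults is a hypothetical helper that only handles nested objects (no arrays, no $ref).

```python
from typing import Any


def overlay_defaults(
    schema: dict,
    generated: Any,
    use_defaults: bool = True,
    use_examples: bool = False,
) -> Any:
    """Prefer schema defaults (then examples) over already generated values."""
    if use_defaults and "default" in schema:
        return schema["default"]
    examples = schema.get("examples")
    if use_examples and isinstance(examples, list) and examples:
        return examples[0]
    # Recurse into object properties; unknown keys are left untouched.
    if schema.get("type") == "object" and isinstance(generated, dict):
        props = schema.get("properties", {})
        return {
            key: overlay_defaults(props[key], value, use_defaults, use_examples)
            if key in props
            else value
            for key, value in generated.items()
        }
    return generated
```

The first argument is the schema and the second is whatever the generator produced, so the helper can be applied to the result of generate() without changing the library.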

I have a simple json schema and I don't want to specify ContentEncoding for every field.

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "properties": {
    "accountUid": {
      "type": "string"
    }
  }
}

If I do the following:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

from jsf import JSF

faker = JSF.from_json("simple.json")
fake_json = faker.generate()

print(fake_json)

I get the following error:

root@d42f6d379f02:/app# python basic.py 
/usr/local/lib/python3.10/site-packages/pydantic/_internal/_config.py:261: UserWarning: Valid config keys have changed in V2:
* 'smart_union' has been removed
  warnings.warn(message, UserWarning)
Traceback (most recent call last):
  File "/app/basic.py", line 6, in <module>
    faker = JSF.from_json("simple.json")
  File "/usr/local/lib/python3.10/site-packages/jsf/parser.py", line 208, in from_json
    return JSF(json.load(f))
  File "/usr/local/lib/python3.10/site-packages/jsf/parser.py", line 54, in __init__
    self._parse(schema)
  File "/usr/local/lib/python3.10/site-packages/jsf/parser.py", line 183, in _parse
    self.root = self.__parse_definition(name="root", path="#", schema=schema)
  File "/usr/local/lib/python3.10/site-packages/jsf/parser.py", line 141, in __parse_definition
    return self.__parse_object(name, path, schema)
  File "/usr/local/lib/python3.10/site-packages/jsf/parser.py", line 66, in __parse_object
    props.append(self.__parse_definition(_name, path=f"{path}/{_name}", schema=definition))
  File "/usr/local/lib/python3.10/site-packages/jsf/parser.py", line 156, in __parse_definition
    return self.__parse_primitive(name, path, schema)
  File "/usr/local/lib/python3.10/site-packages/jsf/parser.py", line 59, in __parse_primitive
    return cls.from_dict({"name": name, "path": path, "is_nullable": is_nullable, **schema})
  File "/usr/local/lib/python3.10/site-packages/jsf/schema_types/string.py", line 156, in from_dict
    return String(**d)
  File "/usr/local/lib/python3.10/site-packages/pydantic/main.py", line 150, in __init__
    __pydantic_self__.__pydantic_validator__.validate_python(data, self_instance=__pydantic_self__)
pydantic_core._pydantic_core.ValidationError: 1 validation error for String
contentEncoding
  Field required [type=missing, input_value={'name': 'accountUid', 'p...False, 'type': 'string'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.0.3/v/missing

I can see in #58 that you added support for ContentEncoding, and in #4 you even link to the JSON Schema spec that explains what the content encodings are.

What I don't see is any way for the default value to be used, i.e. I don't want to specify the ContentEncoding on every single field. I'm going to have tens of schemas with hundreds of fields each, none of them generated by me, yet I need a way to generate fake data for these schemas so we can automate some testing of ETLs.

That said, I also noticed that when I put "ContentEncoding": "8bit" in my schema, it still failed. Apparently you're checking that the value is 8-bit, which according to the JSON Schema docs is wrong.

The acceptable values are 7bit, 8bit, binary, quoted-printable, base16, base32, and base64. If not specified, the encoding is the same as the containing JSON document.
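
The rule quoted above (an optional keyword with a fixed set of accepted values) can be sketched as follows. This is not jsf's validation code; check_content_encoding and ALLOWED_ENCODINGS are hypothetical names used only to illustrate the expected behaviour.

```python
from typing import Optional

# Values listed in the JSON Schema documentation quoted above.
ALLOWED_ENCODINGS = frozenset(
    {"7bit", "8bit", "binary", "quoted-printable", "base16", "base32", "base64"}
)


def check_content_encoding(value: Optional[str] = None) -> Optional[str]:
    """Accept a missing encoding, otherwise require one of the listed values."""
    if value is None:
        # Not specified: the encoding is that of the containing document.
        return None
    if value not in ALLOWED_ENCODINGS:
        raise ValueError(f"unsupported contentEncoding: {value!r}")
    return value
```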

Testing as-installed package downstream

Thanks for this package!

It would be lovely for downstream packagers if the tests:

  • made it through in the sdist on PyPI
    • and still wouldn't get installed
  • used import jsf rather than import ..jsf so that they could test the as-installed package

I'd be happy to work up a PR that did these things, if that was desirable.

Motivation: I'm looking to package this for conda-forge:

The lack of tests isn't a hold-up, but tests do help us catch metadata creep, which is only semi-automated.

Thanks again!

patternProperties not working

Hi,

I am having issues with JSF when patternProperties is defined. See below:

JSON Schema:

{
    "title": "XXXXX",
    "description": "XXXXX",
    "type": "object",

    "definitions": {
        "InstructionItem": {
            "type": "object",
            "properties": {
                "Command": {
                    "description": "XXXX",
                    "type": "string"
                },
                "ExecutionTimeout": {
                    "description": "XXXX",
                    "type": "integer"
                },
                "ExecutionType": {
                    "description": "XXXXX",
                    "type": "string"
                },
                "InvokeSequence": {
                    "description": "XXXXXX",
                    "type": "integer"
                },
                "MachineLabel": {
                    "description": "XXXX",
                    "type": "string"
                },
                "NodeReference": {
                    "description": "XXXXX",
                    "type": "string"
                }
            },
            "required": [
                "Command",
                "ExecutionTimeout",
                "ExecutionType",
                "InvokeSequence",
                "MachineLabel",
                "NodeReference"
            ]
        },
        "InstructionStep": {
            "type": "object",
            "properties": {
                "CertificateURL": {
                    "description": "XXXX",
                    "type": "string"
                },
                "Description": {
                    "description": "XXXX",
                    "type": "string"
                },
                "ManualStep": {
                    "description": "XXXX",
                    "type": "boolean"
                },
                "RunAsUser": {
                    "description": "XXXX",
                    "type": "string"
                },
                "StepCommand": {
                    "description": "XXXX",
                    "type": "string"
                },
                "StepFunction": {
                    "description": "XXXX",
                    "type": "string"
                },
                "UseFunction": {
                    "description": "XXXX",
                    "type": "boolean"
                },
                "StepRun": {
                    "description": "XXXX",
                    "type": "string"
                },
                "cwd": {
                    "description": "XXXX",
                    "type": "string"
                }
            },
            "required": [
                "CertificateURL",
                "Description",
                "ManualStep",
                "RunAsUser",
                "StepCommand",
                "StepFunction",
                "UseFunction",
                "StepRun",
                "cwd"
            ]
        }
    },


    "properties": {
        "AreNotificationsEnabled": {
            "description": "XXXX",
            "type": "boolean"
        },
        "Description": {
            "description": "XXXXX",
            "type": "string"
        },
        "Instructions": {
            "description": "XXXXX",
            "type": "array",
            "items": {
                "$ref": "#/definitions/InstructionItem"
            }
        },
        "IsActive": {
            "description": "XXXXX",
            "type": "boolean"
        },
        "IsCustomerFacing": {
            "description": "XXXXXX",
            "type": "boolean"
        },
        "IsAdminFacing": {
            "description": "XXXXX",
            "type": "boolean"
        },
        "IsSystem": {
            "description": "XXXXX",
            "type": "boolean"
        },
        "Name": {
            "description": "XXXXXX",
            "type": "string"
        },
        "Nodes":{
            "description": "XXXXX",
            "type": "object",
            "patternProperties": {
                "[A-Z_]+": {
                    "description": "XXXXX",
                    "type": "object",
                    "properties": {
                        "AdminTask": {
                            "description": "XXXXX",
                            "type": "object",
                            "properties": {
                                "AdminFun": {
                                    "description": "XXXXX",
                                    "type": "object",
                                    "patternProperties": {
                                        "[A-Z_]+": {
                                            "description": "XXXX",
                                            "type": "object",
                                            "properties": {
                                                "Instructions": {
                                                    "description": "XX",
                                                    "type": "object",
                                                    "patternProperties": {
                                                        "[A-Z_-]+": {
                                                            "$ref": "#/definitions/InstructionStep"
                                                        }
                                                    }
                                                }
                                            }
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    },
    "required": [
        "AreNotificationsEnabled",
        "Description",
        "Instructions",
        "IsActive",
        "IsCustomerFacing",
        "IsAdminFacing",
        "IsSystem",
        "Name",
        "Nodes"
    ]
}

Error message:

> jsf --schema .\af.schema --instance .\t.json
Traceback (most recent call last):
  File "%HOME%\jsonvalidator\env\lib\site-packages\jsf\schema_types\object.py", line 40, in generate
    return super().generate(context)
  File "%HOME%\jsonvalidator\env\lib\site-packages\jsf\schema_types\base.py", line 49, in generate
    raise ProviderNotSetException()
jsf.schema_types.base.ProviderNotSetException

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "%HOME%\jsonvalidator\env\lib\site-packages\jsf\schema_types\object.py", line 40, in generate
    return super().generate(context)
  File "%HOME%\jsonvalidator\env\lib\site-packages\jsf\schema_types\base.py", line 49, in generate
    raise ProviderNotSetException()
jsf.schema_types.base.ProviderNotSetException

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\python3\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\python3\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "%HOME%\jsonvalidator\env\Scripts\jsf.exe\__main__.py", line 7, in <module>
  File "%HOME%\jsonvalidator\env\lib\site-packages\typer\main.py", line 214, in __call__
    return get_command(self)(*args, **kwargs)
  File "%HOME%\jsonvalidator\env\lib\site-packages\click\core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "%HOME%\jsonvalidator\env\lib\site-packages\click\core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "%HOME%\jsonvalidator\env\lib\site-packages\click\core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "%HOME%\jsonvalidator\env\lib\site-packages\click\core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "%HOME%\jsonvalidator\env\lib\site-packages\typer\main.py", line 500, in wrapper
    return callback(**use_params)  # type: ignore
  File "%HOME%\jsonvalidator\env\lib\site-packages\jsf\cli.py", line 19, in main
    JSF.from_json(schema).to_json(instance)
  File "%HOME%\jsonvalidator\env\lib\site-packages\jsf\parser.py", line 143, in to_json
    json.dump(self.generate(), f, indent=2)
  File "%HOME%\jsonvalidator\env\lib\site-packages\jsf\parser.py", line 131, in generate
  File "%HOME%\jsonvalidator\env\lib\site-packages\jsf\schema_types\object.py", line 42, in generate
    return {o.name: o.generate(context) for o in self.properties if self.should_keep(o.name)}
  File "%HOME%\jsonvalidator\env\lib\site-packages\jsf\schema_types\object.py", line 42, in <dictcomp>
    return {o.name: o.generate(context) for o in self.properties if self.should_keep(o.name)}
  File "%HOME%\jsonvalidator\env\lib\site-packages\jsf\schema_types\object.py", line 42, in generate
    return {o.name: o.generate(context) for o in self.properties if self.should_keep(o.name)}
TypeError: 'NoneType' object is not iterable

I've tried replacing patternProperties with properties and it worked.

Thanks,
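
Since replacing patternProperties with properties works, that rewrite can be automated as a stopgap. The sketch below is a workaround under stated assumptions, not a fix inside jsf: expand_pattern_properties is a hypothetical helper, and the caller supplies sample_key, a function that returns one key matching each pattern. It walks every dict in the schema, so a property that is itself named patternProperties would be misread.

```python
import re
from typing import Any, Callable


def expand_pattern_properties(node: Any, sample_key: Callable[[str], str]) -> Any:
    """Return a copy of the schema with patternProperties turned into properties."""
    if isinstance(node, list):
        return [expand_pattern_properties(item, sample_key) for item in node]
    if not isinstance(node, dict):
        return node

    out = {
        key: expand_pattern_properties(value, sample_key)
        for key, value in node.items()
        if key != "patternProperties"
    }
    patterns = node.get("patternProperties")
    if isinstance(patterns, dict):
        props = dict(out.get("properties", {}))
        for pattern, subschema in patterns.items():
            key = sample_key(pattern)
            # Guard against a sample key that does not satisfy the pattern.
            if re.search(pattern, key) is None:
                raise ValueError(f"{key!r} does not match {pattern!r}")
            props.setdefault(key, expand_pattern_properties(subschema, sample_key))
        out["properties"] = props
    return out
```

For the schema above, a constant such as lambda pattern: "NODE_A" satisfies all three patterns; the rewritten schema then contains only properties.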

jsf requires perfect ordering of definitions when parsing

Right now __parse_definition() might rely on self.definitions for the parsing of references:

cls = deepcopy(self.definitions.get(f"#{frag}"))

If $refs haven't been defined in perfect order, parsing might fail.

I'm currently converting pydantic models to JSON schemas and end up with a valid and compliant JSON schema.
However, the ordering of the refs is out of my control.
When loading the schema to JSF it fails with:

line 164, in __parse_definition
    cls.name = name
    ^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'name'

To reproduce the issue:

faker = JSF(
    {
        "$defs": {
            "Foo": {
                "properties": {"bar": {"$ref": "#/$defs/SomeEnum"}},
                "required": ["bar"],
                "title": "Foo",
                "type": "object",
            },
            "SomeEnum": {"enum": ["A", "B"], "title": "SomeEnum", "type": "string"},
        },
        "properties": {"foobar": {"anyOf": [{"$ref": "#/$defs/Foo"}]}},
        "required": ["foobar"],
        "title": "FooBarObject",
        "type": "object",
    }
)

However, if you switch Foo with SomeEnum it works as expected:

faker = JSF(
    {
        "$defs": {
            "SomeEnum": {"enum": ["A", "B"], "title": "SomeEnum", "type": "string"},
            "Foo": {
                "properties": {"bar": {"$ref": "#/$defs/SomeEnum"}},
                "required": ["bar"],
                "title": "Foo",
                "type": "object",
            }
        },
        "properties": {"foobar": {"anyOf": [{"$ref": "#/$defs/Foo"}]}},
        "required": ["foobar"],
        "title": "FooBarObject",
        "type": "object",
    }
)
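
Until the parser resolves references lazily, a caller can reorder the definitions so that every definition appears after the ones it refers to. The sketch below is a workaround outside the library, not a change to __parse_definition(); sort_definitions and iter_refs are hypothetical helper names, and circular references are simply left in encounter order.

```python
from typing import Any, Dict, Iterator


def iter_refs(node: Any) -> Iterator[str]:
    """Yield every "$ref" string found anywhere inside node."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "$ref" and isinstance(value, str):
                yield value
            else:
                yield from iter_refs(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_refs(item)


def sort_definitions(schema: Dict[str, Any], defs_key: str = "$defs") -> Dict[str, Any]:
    """Return a copy of schema whose definitions come after their dependencies."""
    defs = schema.get(defs_key, {})
    prefix = f"#/{defs_key}/"
    ordered: Dict[str, Any] = {}
    visiting = set()

    def visit(name: str) -> None:
        if name in ordered or name in visiting or name not in defs:
            return
        visiting.add(name)
        # Depth-first: place every referenced definition before this one.
        for ref in iter_refs(defs[name]):
            if ref.startswith(prefix):
                visit(ref[len(prefix):])
        visiting.discard(name)
        ordered[name] = defs[name]

    for name in defs:
        visit(name)
    return {**schema, defs_key: ordered}
```

Passing defs_key="definitions" covers schemas that use the older keyword.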

Generation of number type fails if no integer exists between minimum and maximum

If multipleOf is not set in the schema, then generating a number always attempts to use a step of 1, and throws an exception when no such valid number exists.

>>> from jsf import JSF
>>> JSF({"type": "number", "minimum": 0.1, "maximum": 0.9}).generate()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../lib/python3.8/site-packages/jsf/parser.py", line 251, in generate
    return self.root.generate(context=self.context)
  File ".../lib/python3.8/site-packages/jsf/schema_types/number.py", line 37, in generate
    step * random.randint(math.ceil(float(_min) / step), math.floor(float(_max) / step))
  File "/usr/lib/python3.8/random.py", line 248, in randint
    return self.randrange(a, b+1)
  File "/usr/lib/python3.8/random.py", line 226, in randrange
    raise ValueError("empty range for randrange() (%d, %d, %d)" % (istart, istop, width))
ValueError: empty range for randrange() (1, 1, 0)

I suggest a check that max - min is greater than step, and if not try a smaller step.

It is even worse with exclusiveMinimum and exclusiveMaximum: the generator is unable to find any value in the range from 0.1 to 2.9.

>>> JSF({
        "type": "number",
        "minimum": 0.1,
        "maximum": 2.9,
        "exclusiveMinimum": True,
        "exclusiveMaximum": True
    }).generate()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../lib/python3.8/site-packages/jsf/parser.py", line 251, in generate
    return self.root.generate(context=self.context)
  File ".../lib/python3.8/site-packages/jsf/schema_types/number.py", line 37, in generate
    step * random.randint(math.ceil(float(_min) / step), math.floor(float(_max) / step))
  File "/usr/lib/python3.8/random.py", line 248, in randint
    return self.randrange(a, b+1)
  File "/usr/lib/python3.8/random.py", line 226, in randrange
    raise ValueError("empty range for randrange() (%d, %d, %d)" % (istart, istop, width))
ValueError: empty range for randrange() (2, 2, 0)
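
One possible approach, sketched below, is to use an integer grid only when multipleOf is set and to sample a plain float otherwise. This is an illustration of the idea, not the code in number.py; random_number and its parameter names are hypothetical.

```python
import math
import random
from typing import Optional


def random_number(
    minimum: float,
    maximum: float,
    multiple_of: Optional[float] = None,
    exclusive_minimum: bool = False,
    exclusive_maximum: bool = False,
) -> float:
    """Return a random number in the requested range."""
    if minimum > maximum:
        raise ValueError("minimum must not be greater than maximum")

    if multiple_of is None:
        # No step requested: sample a float and retry on an excluded bound.
        for _ in range(1000):
            value = random.uniform(minimum, maximum)
            if exclusive_minimum and value == minimum:
                continue
            if exclusive_maximum and value == maximum:
                continue
            return value
        raise ValueError("no value found in the requested range")

    # Step requested: pick an integer multiplier inside the range.
    low = math.ceil(minimum / multiple_of)
    high = math.floor(maximum / multiple_of)
    if exclusive_minimum and low * multiple_of == minimum:
        low += 1
    if exclusive_maximum and high * multiple_of == maximum:
        high -= 1
    if low > high:
        raise ValueError("no multiple of the step lies in the requested range")
    return multiple_of * random.randint(low, high)
```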

Feature Request: Custom faker instance

Description

Currently, you force your own Faker instance with faker = Faker(). Would it be possible to pass a Faker instance as a parameter for better customization?

Regards
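
The request amounts to ordinary dependency injection. The sketch below shows the pattern without depending on jsf or Faker; Generator, EmailProvider and DefaultProvider are hypothetical names, and the stub provider stands in for whatever instance the library creates today.

```python
from typing import Optional, Protocol


class EmailProvider(Protocol):
    """Anything exposing an email() method, such as a Faker instance."""

    def email(self) -> str: ...


class DefaultProvider:
    """Stub used when the caller does not supply a provider."""

    def email(self) -> str:
        return "user@example.com"


class Generator:
    def __init__(self, provider: Optional[EmailProvider] = None) -> None:
        # Fall back to the built-in provider only when none is injected.
        self.provider = provider if provider is not None else DefaultProvider()

    def generate_email(self) -> str:
        return self.provider.email()
```

With this shape, a caller who has Faker installed could pass a seeded or localized Faker instance, since Faker objects also provide an email() method.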

Enums always coerced to strings

The current enum implementation results in generated int and float data being coerced to str due to how Pydantic handles Union (see these docs). Pydantic will coerce the input to the first type it can match in the Union, which in the current implementation of JSFEnum is always a string for integers and floats.

class JSFEnum(BaseSchema):
    enum: Optional[List[Union[str, int, float, None]]] = []

Pydantic offers the following recommendation to solve this issue:

As such, it is recommended that, when defining Union annotations, the most specific type is included first and followed by less specific types.

However, it also issues a warning concerning Unions inside of List or Dict types:

typing.Union also ignores order when defined, so Union[int, float] == Union[float, int] which can lead to unexpected behaviour when combined with matching based on the Union type order inside other type definitions, such as List and Dict types (because python treats these definitions as singletons). For example, Dict[str, Union[int, float]] == Dict[str, Union[float, int]] with the order based on the first time it was defined. Please note that this can also be affected by third party libraries and their internal type definitions and the import orders.

Because of this I think the best solution is to use Pydantic's Smart Union which will check the entire Union for the best type match before attempting to coerce.
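
The difference between the two strategies can be illustrated without Pydantic. The functions below are a simplified analogy of "first type that can coerce wins" versus "exact type match first"; they are not Pydantic's implementation, and both names are hypothetical.

```python
from typing import Any, Sequence


def coerce_left_to_right(value: Any, types: Sequence[type]) -> Any:
    """Coerce to the first type in the list that accepts the value."""
    for target in types:
        try:
            return target(value)
        except (TypeError, ValueError):
            continue
    raise TypeError(f"cannot coerce {value!r}")


def coerce_exact_first(value: Any, types: Sequence[type]) -> Any:
    """Keep the value unchanged when its exact type is listed."""
    for target in types:
        if type(value) is target:
            return value
    # Only fall back to coercion when no listed type matches exactly.
    return coerce_left_to_right(value, types)
```

With types ordered as (str, int, float), the first function turns the integer 1 into the string "1", which mirrors the behaviour described above, while the second returns 1 unchanged.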

AttributeError when definitions are in particular order

Hi.
I have a simple schema:

schema.json
{
  "$schema": "http://json-schema.org/draft-06/schema#",
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "a_arr": {
      "type": "array",
      "items": {
        "$ref": "#/definitions/A"
      }
    }
  },
  "definitions": {
    "A": {
      "type": "object",
      "additionalProperties": false,
      "properties": {
        "bar": {
          "$ref": "#/definitions/B"
        }
      }
    },
    "B": {
      "type": "object",
      "additionalProperties": false,
      "properties": {
        "foo": {
          "type": "string"
        }
      }
    }
  }
}

When I am trying to generate data, I hit an error.
Code sample:

import json

from jsf import JSF

s = json.load(open("schema.json"))
f = JSF(s)
fake_json = f.generate()
print(fake_json)

I got this traceback:

Traceback
Traceback (most recent call last):
  File "/Users/vlad/Library/Application Support/JetBrains/PyCharm2023.2/scratches/scratch_105.py", line 6, in <module>
    mismo_faker = JSF(mismo)
  File "/Users/vlad/Projects/test/venv/lib/python3.9/site-packages/jsf/parser.py", line 53, in __init__
    self._parse(schema)
  File "/Users/vlad/Projects/test/venv/lib/python3.9/site-packages/jsf/parser.py", line 179, in _parse
    item = self.__parse_definition(name, path=f"#/{def_tag}", schema=definition)
  File "/Users/vlad/Projects/test/venv/lib/python3.9/site-packages/jsf/parser.py", line 140, in __parse_definition
    return self.__parse_object(name, path, schema)
  File "/Users/vlad/Projects/test/venv/lib/python3.9/site-packages/jsf/parser.py", line 65, in __parse_object
    props.append(self.__parse_definition(_name, path=f"{path}/{_name}", schema=definition))
  File "/Users/vlad/Projects/test/venv/lib/python3.9/site-packages/jsf/parser.py", line 164, in __parse_definition
    cls.name = name
AttributeError: 'NoneType' object has no attribute 'name'

Process finished with exit code 1

My pip freeze:

pip freeze
annotated-types==0.5.0
attrs==23.1.0
certifi==2023.7.22
charset-normalizer==3.2.0
Faker==19.3.1
idna==3.4
jsf==0.8.0
jsonschema==4.19.0
jsonschema-specifications==2023.7.1
pydantic==2.3.0
pydantic_core==2.6.3
python-dateutil==2.8.2
referencing==0.30.2
requests==2.31.0
rpds-py==0.10.0
rstr==3.2.1
six==1.16.0
smart-open==6.3.0
typing_extensions==4.7.1
urllib3==2.0.4

However, when I reorder the definitions, it is working as expected.

Working schema
{
  "$schema": "http://json-schema.org/draft-06/schema#",
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "a_arr": {
      "type": "array",
      "items": {
        "$ref": "#/definitions/A"
      }
    }
  },
  "definitions": {
    "B": {
      "type": "object",
      "additionalProperties": false,
      "properties": {
        "foo": {
          "type": "string"
        }
      }
    },
    "A": {
      "type": "object",
      "additionalProperties": false,
      "properties": {
        "bar": {
          "$ref": "#/definitions/B"
        }
      }
    }
  }
}
Note that the definition of B is before the definition of A. In my case, I cannot reorder definitions, as I have a big schema with many definitions coming from my customers. I would like to help fix this, but I would appreciate guidance on how to fix it.

Pydantic 2.0 deprecations

Pydantic 2 was released a few days ago... some of our tests are now failing with

.env/lib/python3.11/site-packages/pydantic/_internal/_config.py:206: in prepare_config
    warnings.warn(DEPRECATION_MESSAGE, DeprecationWarning)
E   pydantic.warnings.PydanticDeprecatedSince20: Support for class-based `config` is deprecated, use ConfigDict instead. Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.0.1/migration/
        config     = <class 'jsf.schema_types.enum.JSFEnum.Config'>

(pytest is set up to treat warnings as errors)

Until deprecated usages are addressed, would it be possible to specify the requirement on pydantic as >=1.10.4,<2 (instead of just >=1.10.4)?

Cannot reference definitions within a nested, named-schema definition scope

Description

The implementation requires a single, global reference for all definitions. As an example, consider the "Complex Structure" example provided by JSONSchema.org.

The schema leverages the idea that a definition using an $id signals a new scope for that definition and references are only made to other definitions within that scope. Specifically, the root's $defs block only contains an address definition while that address block itself has a definitions block:

{
  "$defs": {
    "address": {
      "$id": "/schemas/address",
      "definitions": {
        "state": {}
      }
    }
  }
}

In this example, the root object references the address definition through its '$id' value, "/schemas/address". The address block also contains a definition of its own (state), which is referenced from its properties as "#/definitions/state".

In this case, the # symbol relates to the scope within the address block.

Full Complex JSON Schema Example

{
  "$id": "https://example.com/schemas/customer",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "properties": {
    "first_name": { "type": "string" },
    "last_name": { "type": "string" },
    "shipping_address": { "$ref": "/schemas/address" },
    "billing_address": { "$ref": "/schemas/address" }
  },
  "required": ["first_name", "last_name", "shipping_address", "billing_address"],
  "$defs": {
    "address": {
      "$id": "/schemas/address",
      "$schema": "http://json-schema.org/draft-07/schema#",
      "type": "object",
      "properties": {
        "street_address": { "type": "string" },
        "city": { "type": "string" },
        "state": { "$ref": "#/definitions/state" }
      },
      "required": ["street_address", "city", "state"],
      "definitions": {
        "state": { "enum": ["CA", "NY", "... etc ..."] }
      }
    }
  }
}

NOTE: Have verified this schema works with the jsonschema Python library using a generated JSON object using this project.
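
The scoping rule described above can be sketched with the standard library: index every subschema that carries an $id by its absolute URI, then resolve each $ref against the URI of the enclosing resource. This is an illustration of the idea only, not jsf's parser; index_resources and resolve_ref are hypothetical names.

```python
from typing import Any, Dict
from urllib.parse import urldefrag, urljoin


def index_resources(node: Any, base: str, index: Dict[str, Any]) -> Dict[str, Any]:
    """Map the absolute URI of every subschema with an "$id" to that subschema."""
    if isinstance(node, dict):
        if isinstance(node.get("$id"), str):
            # An "$id" starts a new scope; nested ids resolve against it.
            base = urljoin(base, node["$id"])
            index[base] = node
        for value in node.values():
            index_resources(value, base, index)
    elif isinstance(node, list):
        for item in node:
            index_resources(item, base, index)
    return index


def resolve_ref(ref: str, base: str, index: Dict[str, Any]) -> Any:
    """Resolve ref relative to base, the URI of the enclosing resource."""
    uri, fragment = urldefrag(urljoin(base, ref))
    target = index[uri]
    # Walk the JSON Pointer fragment, if any, inside that resource.
    for token in fragment.split("/")[1:]:
        token = token.replace("~1", "/").replace("~0", "~")
        target = target[int(token)] if isinstance(target, list) else target[token]
    return target
```

For the schema above, "/schemas/address" resolved against the root URI yields the address block, and "#/definitions/state" resolved against the address URI yields the state enum.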

Expected

Should be able to run a jsf.JSF(schema) with this JSON Schema:

import json
import jsf

with open("complex.schema.json", "r") as f:
    schema = json.load(f)
gen = jsf.JSF(schema)
new_json = gen.generate()
print(json.dumps(new_json, indent=2))

Actual

Running the above code generates AttributeError: 'NoneType' object has no attribute 'name' on parser.py#L181.

If we adjust the schema to flatten the dependency tree and replace the references that point at internal definitions by their $id, we can make it work.

Adjusted JSON Schema Example

{
  "$id": "https://example.com/schemas/customer",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "properties": {
    "first_name": { "type": "string" },
    "last_name": { "type": "string" },
    "shipping_address": { "$ref": "#/$defs/address" },
    "billing_address": { "$ref": "#/$defs/address" }
  },
  "required": ["first_name", "last_name", "shipping_address", "billing_address"],
  "$defs": {
    "address": {
      "$id": "/schemas/address",
      "$schema": "http://json-schema.org/draft-07/schema#",
      "type": "object",
      "properties": {
        "street_address": { "type": "string" },
        "city": { "type": "string" },
        "state": { "$ref": "#/$defs/state" }
      },
      "required": ["street_address", "city", "state"]
    },
    "state": { "enum": ["CA", "NY", "... etc ..."] }
  }
}

Use random type when field is nullable

In the following line:

raise TypeError # pragma: no cover - not currently supporting other types TODO

A TypeError is raised when the item type represents more than one type (excluding null). I'm not sure why.

Can this method just return a random type from the list (including null)?

So:

import random

...

def __is_field_nullable(self, schema: Dict[str, Any]) -> Tuple[str, bool]:
    item_type = schema.get("type")
    if isinstance(item_type, list):
        if "null" in item_type:
            return random.choice(item_type), True
    return item_type, False
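
A variant of this idea, written as a standalone function for illustration, is sketched below. Unlike the snippet above, it never returns "null" as the chosen type when other types are listed, and it also handles type lists without "null". The name pick_type is hypothetical and this is not the parser's current behaviour.

```python
import random
from typing import Any, Dict, Optional, Tuple


def pick_type(schema: Dict[str, Any]) -> Tuple[Optional[str], bool]:
    """Return (chosen type, is_nullable) for a schema's "type" keyword."""
    item_type = schema.get("type")
    if isinstance(item_type, list):
        nullable = "null" in item_type
        candidates = [t for t in item_type if t != "null"]
        if not candidates:
            # The list only contained "null".
            return "null", True
        # Pick one concrete type; nullability is reported separately.
        return random.choice(candidates), nullable
    return item_type, False
```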

String generator does not contain a word of length 1 / 2

The LOREM list in the text__plain.py file does not contain any word of length one or two.

Therefore, if a schema specifies [minLength, maxLength] = [1,2] (or [1,1], [2,2]) on a string property, it will return an empty string.

For example, for a country code string (not necessarily the best example, because a country code should be an enum rather than a string, but let's say for this exercise that the code value does not really matter 🤣)

{
  "properties": {
    "code": {
      "maxLength": 2,
      "minLength": 2,
      "title": "Code",
      "type": "string"
    }
  }
}

Possible solution:
Just change the Lorem string to include a few small words

Edit:
The following line is also not correct if we want to be able to have a word of exact size
valid_words = list(filter(lambda s: len(s) < remaining, LOREM))
Should be replaced by
valid_words = list(filter(lambda s: len(s) <= remaining, LOREM))
The rest should still work thanks to that .strip() at the end that will remove the extra space.
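
A small sketch of the difference between the two comparisons. WORDS is a made-up stand-in for LOREM (the real list is longer and, as noted above, has no short words), and words_that_fit is a hypothetical helper used only for illustration.

```python
from typing import List

# Hypothetical stand-in for LOREM, with two short words added.
WORDS = ["lorem", "ipsum", "ad", "a"]


def words_that_fit(remaining: int, strict: bool) -> List[str]:
    """Return the words that fit into `remaining` characters.

    strict=True mirrors the current `<` filter; strict=False mirrors
    the proposed `<=` filter.
    """
    if strict:
        return [w for w in WORDS if len(w) < remaining]
    return [w for w in WORDS if len(w) <= remaining]


# With 2 characters left, the strict filter can never pick a 2-letter word.
print(words_that_fit(2, strict=True))   # ['a']
print(words_that_fit(2, strict=False))  # ['ad', 'a']
```

With the strict filter, a word that exactly fills the remaining space is never a candidate, so an exact-length string cannot be built from a single word.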

Make CLI optional?

typer brings along rather a lot of dependencies. Might it be possible to make that dependency optional for using this as a library? One way would be a [cli] extra, or a whole separate package for jsf-cli.
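
One common way to do this on the code side is to import the CLI dependency lazily and fail with a helpful message when it is missing. The sketch below is only an illustration of that pattern: require_optional is a hypothetical helper, and the jsf[cli] extra named in the message is the one suggested above, not something confirmed to exist.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, extra: str) -> ModuleType:
    """Import an optional dependency, or explain how to install it.

    Hypothetical helper: `extra` is the name of the suggested
    (not confirmed) packaging extra, e.g. "cli".
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name!r} is needed for this feature; "
            f"try: pip install 'jsf[{extra}]'"
        ) from exc
```

The CLI entry point would call require_optional("typer", "cli") when it starts, so that importing the library itself never touches typer.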

Can jsf work with a schema that has no provider?

Schema I Used

{
    "title": "AlertSync",
    "description": "\u5ba1\u8ba1\u544a\u8b66model",
    "type": "object",
    "properties": {
        "audit_label": {
            "title": "Audit Label",
            "type": "string",
            "format": "ipv4"
        },
        "category": {
            "title": "Category",
            "minimum": 1,
            "maximum": 15,
            "type": "integer"
        },
        "level": {
            "title": "Level",
            "minimum": 0,
            "maximum": 3,
            "type": "integer"
        },
        "src_mac": {
            "title": "Src Mac",
            "default": "00:00:00:00:00:00",
            "pattern": "^([0-9A-F]{2})(\\:[0-9A-F]{2}){5}$",
            "type": "string"
        },
        "src_ip": {
            "title": "Src Ip",
            "type": "string",
            "format": "ipv4"
        },
        "src_port": {
            "title": "Src Port",
            "minimum": 1,
            "maximum": 65535,
            "type": "integer"
        },
        "dst_mac": {
            "title": "Dst Mac",
            "default": "FF:FF:FF:FF:FF:FF",
            "pattern": "^([0-9A-F]{2})(\\:[0-9A-F]{2}){5}$",
            "type": "string"
        },
        "dst_ip": {
            "title": "Dst Ip",
            "type": "string",
            "format": "ipv4"
        },
        "dst_port": {
            "title": "Dst Port",
            "minimum": 1,
            "maximum": 65535,
            "type": "integer"
        },
        "l4_protocol": {
            "$ref": "#/definitions/L4ProtocolEnum"
        },
        "protocol": {
            "$ref": "#/definitions/ProtocolEnum"
        },
        "illegal_ip": {
            "title": "Illegal Ip",
            "default": [],
            "type": "array",
            "items": {
                "type": "string",
                "format": "ipv4"
            }
        },
        "last_at": {
            "title": "Last At",
            "default": "2022-12-30T14:08:30.753677",
            "type": "string",
            "format": "date-time"
        },
        "count": {
            "title": "Count",
            "default": 1,
            "minimum": 1,
            "maximum": 100000,
            "type": "integer"
        },
        "other_info": {
            "title": "Other Info",
            "type": "object"
        },
        "payload": {
            "title": "Payload",
            "pattern": "^([0-9A-F]{2})+$",
            "type": "string"
        }
    },
    "required": [
        "audit_label",
        "category",
        "level",
        "l4_protocol",
        "protocol"
    ],
    "definitions": {
        "L4ProtocolEnum": {
            "title": "L4ProtocolEnum",
            "description": "An enumeration.",
            "enum": [
                "TCP",
                "UDP"
            ],
            "type": "string"
        },
        "ProtocolEnum": {
            "title": "ProtocolEnum",
            "description": "An enumeration.",
            "enum": [
                "S7COMM",
                "MODBUS"
            ],
            "type": "string"
        }
    }
}

error message

Traceback (most recent call last):
  File "/root/repos/sa-data-perf/venv/lib/python3.10/site-packages/jsf/schema_types/object.py", line 40, in generate
    return super().generate(context)
  File "/root/repos/sa-data-perf/venv/lib/python3.10/site-packages/jsf/schema_types/base.py", line 49, in generate
    raise ProviderNotSetException()
jsf.schema_types.base.ProviderNotSetException

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/repos/sa-data-perf/venv/lib/python3.10/site-packages/jsf/schema_types/object.py", line 40, in generate
    return super().generate(context)
  File "/root/repos/sa-data-perf/venv/lib/python3.10/site-packages/jsf/schema_types/base.py", line 49, in generate
    raise ProviderNotSetException()
jsf.schema_types.base.ProviderNotSetException

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/repos/sa-data-perf/debug.py", line 14, in <module>
    print(jsf.generate())
  File "/root/repos/sa-data-perf/venv/lib/python3.10/site-packages/jsf/parser.py", line 137, in generate
    return self.root.generate(context=self.context)
  File "/root/repos/sa-data-perf/venv/lib/python3.10/site-packages/jsf/schema_types/object.py", line 42, in generate
    return {o.name: o.generate(context) for o in self.properties if self.should_keep(o.name)}
  File "/root/repos/sa-data-perf/venv/lib/python3.10/site-packages/jsf/schema_types/object.py", line 42, in <dictcomp>
    return {o.name: o.generate(context) for o in self.properties if self.should_keep(o.name)}
  File "/root/repos/sa-data-perf/venv/lib/python3.10/site-packages/jsf/schema_types/object.py", line 42, in generate
    return {o.name: o.generate(context) for o in self.properties if self.should_keep(o.name)}
TypeError: 'NoneType' object is not iterable

My question

Can jsf work with a schema like the one above? The schema was generated by pydantic, and I'm not sure which part causes this error. It would help if the error message said more specifically which property caused it.
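
The last frame of the traceback iterates over self.properties and finds None, so one place to look (an unconfirmed guess, not a diagnosis) is any "type": "object" subschema that has no "properties" keyword, such as other_info above. The sketch below lists such subschemas; objects_without_properties is a hypothetical helper, and it only walks properties, definitions, $defs and single-schema items.

```python
from typing import Any, Iterator


def objects_without_properties(schema: Any, path: str = "#") -> Iterator[str]:
    """Yield JSON-pointer-like paths of object schemas lacking "properties"."""
    if not isinstance(schema, dict):
        return
    if schema.get("type") == "object" and "properties" not in schema:
        yield path
    # Recurse into the keywords that hold named subschemas.
    for key in ("properties", "definitions", "$defs"):
        children = schema.get(key)
        if isinstance(children, dict):
            for name, sub in children.items():
                yield from objects_without_properties(sub, f"{path}/{key}/{name}")
    # Recurse into a single-schema "items" keyword.
    items = schema.get("items")
    if isinstance(items, dict):
        yield from objects_without_properties(items, f"{path}/items")
```

Calling list(objects_without_properties(schema)) on the schema above would include "#/properties/other_info", which narrows down where to start looking.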

BUG: Unique items in array of dictionaries

The current implementation uses Python sets to enforce unique items, but dicts are not hashable, so this does not work for arrays of dictionaries. A change is needed to fix this.

Example

"errors": {
            "type": "object",
            "properties": {
                "validationErrors": {
                    "type": "array",
                    "minItems": 0,
                    "maxItems": 2,
                    "uniqueItems": false,
                    "items": [
                        {
                            "type": "object",
                            "$state": {
                                "error": "lambda: random.choice([{'code':'3013','message':'Mandatory field is either Null or blank','field':'IDNumber'}, {'code':'2013','message':'Mandatory field is either Null or blank','field':'IDNumber'}])"
                            },
                            "properties": {
                                "code": {
                                    "type": "string",
                                    "description": "Error code from Digital gateway validation checks",
                                    "$provider": "lambda: state['validationErrors[0]']['error']['code']"
                                },
                                "message": {
                                    "type": "string",
                                    "description": "",
                                    "$provider": "lambda: state['validationErrors[0]']['error']['message']"
                                },
                                "field": {
                                    "type": "string",
                                    "description": "",
                                    "$provider": "lambda: state['validationErrors[0]']['error']['field']"
                                }
                            },
                            "required": ["code", "message", "field"]
                        }
                    ]
                }
            },
            "required": ["validationErrors"]
        }
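
One possible direction, sketched under assumptions: de-duplicate by a canonical JSON string instead of putting the items themselves in a set. unique_items is a hypothetical helper, not jsf code, and it assumes every item is JSON-serializable.

```python
import json
from typing import Any, List


def unique_items(items: List[Any]) -> List[Any]:
    """Drop duplicates from a list that may contain dicts or lists.

    Each item is keyed by its JSON text with sorted keys, so two dicts
    with the same content in a different key order count as equal.
    Original order is preserved.
    """
    seen = set()
    result = []
    for item in items:
        key = json.dumps(item, sort_keys=True)
        if key not in seen:
            seen.add(key)
            result.append(item)
    return result
```

One caveat: this keys 1 and 1.0 differently, whereas JSON Schema treats them as the same number, so a real fix may need a stricter notion of equality.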

Error when generating from schema that has "oneOf" property

If we are using a schema like this:

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "$id": "urn://Media.schema.json",
  "title": "Media",
  "version": "0.0.1",
  "description": "This event represents a Media",
  "type": "object",
  "properties": {
    "envID": {
      "type": "string",
      "minLength": 10
    },
    "envTimestamp": {
      "type": "integer",
      "exclusiveMinimum": 0
    },
    "javaType": {
      "type": "string"
    },
    "mediaKey": {
      "type": "string",
      "minLength": 5,
      "maxLength": 64
    },
    "mediaType": {
      "type": "string",
      "enum": [
        "COVER"
      ]
    }
  },
  "required": [
    "envID",
    "envTimestamp",
    "mediaType"
  ],
  "additionalProperties": false,
  "oneOf": [
    {
      "properties": {
        "mediaType": {
          "const": "COVER"
        }
      },
      "required": [
        "javaType"
      ]
    }
  ]
}

When "mediaType" is "COVER", "javaType" must always be included. Currently this does not happen.
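
Until this is supported, generated output can be checked by hand. The sketch below hard-codes the single oneOf branch from the schema above. matches_cover_branch is a hypothetical helper, not a general oneOf implementation.

```python
from typing import Any, Dict


def matches_cover_branch(instance: Dict[str, Any]) -> bool:
    """Check an instance against the single oneOf branch shown above.

    The branch says: if "mediaType" is present it must equal "COVER",
    and "javaType" is required.
    """
    # "const" only constrains the property when it is present.
    media_ok = instance.get("mediaType", "COVER") == "COVER"
    return media_ok and "javaType" in instance


print(matches_cover_branch({"mediaType": "COVER", "javaType": "x"}))  # True
print(matches_cover_branch({"mediaType": "COVER"}))                   # False
```

Because oneOf has exactly one branch here, every valid instance has to match it, so a generated object without "javaType" is invalid.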
