pydantic / pydantic-core
Core validation logic for pydantic written in Rust
License: MIT License
stop using `maybe_as_string`.
`lax_dict` takes a `try_instance` argument and can build a dict from a Python object.
We need to decide when this should be used and when not; this also needs to be reflected in `LookupKey`.
Presumably this should be a config setting? But if it's just a config setting, we can't easily have a `from_orm` method (where this would apply).
Perhaps we're happy to have it set via config; then if `from_orm = True` in config, `Model.parse_obj(my_object)` would just work.
We should also use a consistent name, e.g. either `from_orm` or `try_instance` or something better, everywhere.
I would like to avoid runtime configuration options if possible.
@PrettyWood what do you think?
Also relates to #108.
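To make the discussion concrete, here is a minimal pure-Python sketch of what a `lax_dict`-style `try_instance` fallback could look like. The function body and the `OrmUser` class are hypothetical illustrations, not the actual Rust implementation:

```python
def lax_dict(obj, try_instance=False):
    """Build a dict from `obj`; optionally fall back to reading
    attributes off an arbitrary instance (the `try_instance` mode)."""
    if isinstance(obj, dict):
        return dict(obj)
    if try_instance:
        # hypothetical ORM-style fallback: collect public attributes
        return {k: v for k, v in vars(obj).items() if not k.startswith('_')}
    raise TypeError(f'cannot build a dict from {type(obj).__name__}')


class OrmUser:
    """Stand-in for an arbitrary ORM object."""
    def __init__(self):
        self.id = 1
        self.name = 'spam'
```

Under this sketch, a `from_orm = True` config flag would simply enable `try_instance` for the whole model, so `Model.parse_obj(my_orm_object)` would work without a separate method.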
I guess in Python versions of `to_py` we could somehow deep copy everything to match current pydantic behaviour.
Won't help performance but I think it's correct.
I really want pydantic-core to support wasm. This is mostly so that the examples in pydantic's docs can be edited and run in the browser, but also for wider use of pydantic.
As per PyO3/pyo3#2412 (comment), it looks like it should be possible.
But I'm not sure how to integrate that with maturin github actions. @messense any pointers? Or would you be willing to submit a PR?
Also, as well as getting wheels to build, what more do we need to do to get pydantic-core working with Pyodide?
and make it work properly with recursive references, ref #140
`set` / `frozenset` to `list` / `tuple`?
Although this is not "losing information", the result is not deterministic/repeatable.
E.g. if you have the field `Tuple[PositiveInt, NegativeInt, str]` then the input set `{1, -1, 'a'}` will work sometimes and fail sometimes - this is pretty confusing.
I think we should change this.
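A pure-Python sketch of why set input to a heterogeneous tuple is unreliable; the validator functions here are made up for illustration:

```python
def positive_int(v):
    if not isinstance(v, int) or isinstance(v, bool) or v <= 0:
        raise ValueError('positive int expected')
    return v

def negative_int(v):
    if not isinstance(v, int) or isinstance(v, bool) or v >= 0:
        raise ValueError('negative int expected')
    return v

def str_(v):
    if not isinstance(v, str):
        raise ValueError('str expected')
    return v

def validate_heterogeneous_tuple(value, validators):
    """Validate `value` positionally, one validator per item - a sketch
    of validating Tuple[PositiveInt, NegativeInt, str]."""
    items = list(value)  # for a set, iteration order is arbitrary
    if len(items) != len(validators):
        raise ValueError('wrong length')
    return tuple(v(item) for v, item in zip(validators, items))

validators = [positive_int, negative_int, str_]
# a list input is ordered, so it validates reliably;
# a set input like {1, -1, 'a'} may or may not validate, depending on
# hash-based iteration order - hence the proposal to forbid the coercion
```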
`list` / `tuple` to `set` / `frozenset`?
Should we allow coercing a list to a set? In this case we are "losing information" (e.g. order), however creating a set from a list is often desired - e.g. when parsing a format (yaml, toml etc.) that only has a list type.
I think we should not change this.
`dict_key` to `set` / `frozenset`?
Not that common, but we have it now and I think it kind of makes sense since `dict_key` "feels like" (sorry to be fluffy) a `set`.
I guess since `dict_key` are ordered, it should be fine to coerce them to `list` and `tuple` too.
I guess, as currently, we should allow coercing `dict_values` to all these types too?
I think we should change this.
In pydantic V1 we allow converting a generator to any of these types.
I think we should allow converting a generator to a list or tuple, but not set or frozenset.
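A sketch of the proposed rule (allow coercing generators to order-preserving targets, reject them for sets); the function name and shape are hypothetical:

```python
from collections.abc import Iterator

def coerce_sequence(value, target):
    """Coerce `value` to `target` (list/tuple/set/frozenset), allowing
    generators only for order-preserving targets, per the proposal."""
    if isinstance(value, Iterator):
        if target in (set, frozenset):
            raise TypeError('will not coerce a generator to a set/frozenset')
        return target(value)
    return target(value)
```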
@PrettyWood @tiangolo thoughts?
In your presentation you talk about achieving a 12x performance improvement for validating a list of dicts with a length of 100 elements; does this test consistently achieve the same numbers for bigger lists?
The other question I have is regarding the decision to choose Rust as the language for the core of pydantic: what were the criteria for choosing Rust over other languages like C, C++ or even dotnet?
Just got this error when hitting ctrl+c while running `SchemaValidator` - e.g. creating a validator.

```
^Cthread '<unnamed>' panicked at 'a Display implementation returned an error unexpectedly: Error', /rustc/90ca44752a79dd414d9a0ccf7a74533a99080988/library/alloc/src/string.rs:2478:14
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Traceback (most recent call last):
  File "/Users/samuel/code/pydantic-core/create_many.py", line 8, in <module>
    v = SchemaValidator(
pyo3_runtime.PanicException: a Display implementation returned an error unexpectedly: Error
```
as per #173 (comment)
We need to build an sdist on CI and include `self_schema.py`.
As per #21 we need to make sure values are copied.
As @tiangolo points out in pydantic/pydantic#4218 (comment), we need to be able to validate a model without relying on `isinstance`.
I think we should therefore change how existing models are validated to effectively revalidate `model.__dict__`. That should solve copying (of models at least) and avoid subclasses being validated as parent classes.
This might have some performance impact, but it'll be much smaller than a hack in python to work around it.
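A rough sketch of the idea in Python; the `validate_fields` stand-in and the `Model` class here are hypothetical:

```python
class Model:
    """Stand-in for a pydantic model instance."""
    def __init__(self, **data):
        self.__dict__.update(data)

def validate_fields(d):
    """Hypothetical field validation: require an int 'x', return a fresh dict."""
    if not isinstance(d, dict) or not isinstance(d.get('x'), int):
        raise ValueError('x must be an int')
    return dict(d)

def validate_model_input(input_value, model_cls, validate_dict):
    """Rather than trusting `isinstance(input_value, model_cls)` and using
    the instance as-is, revalidate its __dict__ - this both copies the
    values and stops a subclass sneaking through as the parent class."""
    if isinstance(input_value, model_cls):
        return validate_dict(dict(input_value.__dict__))
    return validate_dict(input_value)
```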
Documenting in person discussion.
It might make sense to have a `"type": "field"` schema or something along these lines to collect options that apply to the field and not the type of the field, like a `"not required"` option that would leave the field unpopulated if it is not included.
(`strict_str` etc. before) and a general casing using a python set #31
Currently I'm not clear where config is used, and what attributes are respected.
E.g. there are some properties that are used in `string.rs` that are not in the Python types.
It also has minimal tests; we need to test it properly - perhaps a separate test file to be clean.
on model / typed_dict.
As per PyO3/pyo3#2463, we could cache the `PyString` value for short (length <63, say) strings to achieve a significant time saving.
Although it would save time on the specific case of building dicts with repeated inputs, I wonder how much time it would really save in the real world?
Definitely this is an optimisation that should be looked at after pydantic v2 is released.
If we do do it, I guess it should be configurable.
`cache.rs` in orjson might be a useful starting point for this.
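As a rough illustration of the idea (not orjson's actual `cache.rs` logic), a short-string interning cache in Python might look like this; the class and its limits are hypothetical:

```python
class ShortStrCache:
    """Intern decoded strings for short keys (< 63 bytes), analogous to
    caching PyString values on the Rust side: repeated inputs reuse the
    same string object instead of decoding again."""

    def __init__(self, max_len=63):
        self.max_len = max_len
        self._cache = {}

    def get(self, raw: bytes) -> str:
        if len(raw) >= self.max_len:
            # too long to be worth caching: decode fresh each time
            return raw.decode()
        s = self._cache.get(raw)
        if s is None:
            s = self._cache[raw] = raw.decode()
        return s
```

This would mostly pay off when building dicts with repeated keys, which is why it probably only makes sense as an opt-in optimisation.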
In `_pydantic_core.pyi`.
Might also be worth an experiment to see if making `strict` a runtime switch negatively affects performance.
I realise I invited collaborators from pydantic to this repo without an explanation of what's going on.
The plan is obviously to make this repo public, but I want to get the basic design solid before flipping the switch (maybe that's unnecessary, I don't know).
I'd love feedback on this idea in general, and specifics if possible.
I'm not "announcing" this yet, but feel free to discuss it with others if that helps.
The idea is not particularly secret after this, but I'm hoping to build some suspense/FOMO before going public.
If this can be done easily, it should allow much faster cross-platform builds.
E.g. like:

```python
from typing import Optional

from pydantic import BaseModel
from devtools import debug


class Branch(BaseModel):
    name: str
    sub_branch: Optional['Branch'] = None


b = Branch(name='main', sub_branch=Branch(name='sub'))
debug(b)
```
Hi, author of pydantic-yaml here. I have no idea about anything Rust-related, unfortunately, but hopefully this feature request will make sense in Python land.
I'm going off this slide in this presentation by @samuelcolvin, specifically:
We could add support for other formats (e.g. yaml, toml); the only side effect would be bigger binaries.
Here's a relevant discussion about "3rd party" deserialization from v1: pydantic/pydantic#3025
It would be great if `pydantic-core` were built in a way where non-JSON formats could be added "on top" rather than necessarily being built into the core. I understand performance is a big question in this rewrite, so ideally these would be high-level interfaces that can be hacked in Python (or implemented in Rust/etc. for better performance).
From the examples available already, it's possible that such a feature could be quite simple on the `pydantic-core` side - the 3rd party would create their own function a la `validate_json`, possibly just calling `validate_python`. However, care would be needed on how format-specific details are sent between `pydantic` and the implementation. In V1 this is done with the `Config` class and special `json_encoder`/`decoder` attributes, which have been a pain to re-implement properly for YAML (without way too much hackery).
Ideally for V2, this would be something more easily addable and configurable. The alternative would be to just implement TOML, YAML etc. directly in the binary (and I wouldn't have to keep supporting my project, ha!)
Thanks again for Pydantic!
Somehow I forgot `timedelta`. The work in speedate is done; it just needs the type implementing.
I'm keen to "run onto the spike" and find any big potential performance improvements in pydantic-core while the API can be changed easily.
I'd therefore love anyone with experience of rust and/or pyo3 to have a look through the code and see if I'm doing anything dumb.
Particular concerns:
- The "`cast_as` vs. `extract`" issues described in PyO3/pyo3#2278 were a bit scary as I only found the solution by chance; are there any other similar issues with pyo3?
- Is `input`, or parts of `input` (in the case of a dict/list/tuple/set etc.), copied when it doesn't need to be?
- Could we use `PyObject` instead of `PyAny`, or vice versa, and improve performance?
- In `ListInput` and `DictInput` we do a totally unnecessary `map`; is this avoidable? Is it having a performance impact? Is there another way to give a general interface to the underlying datatypes that's more performant?
- I thought it was the `RwLock` that was causing the performance problems, but I managed to remove that (albeit in a slightly unsafe way) in #32 and it didn't make a difference. Is something else the problem? Could we remove `Arc` completely?

I'll add to this list if anything else comes to me.
More generally I wonder if there are performance improvements that I'm not even aware of? "What you don't know, you can't optimise"
Should be fairly easy to modify models (to be renamed) to support partial as well as default values.
For obvious reasons, passing in a recursive dict causes a segfault. It's certainly user error, but it might be nice to raise a Python `RecursionError` instead of segfaulting:
```python
from pydantic_core import SchemaValidator

schema = {"type": "union", "choices": []}
schema['choices'].append(schema)
SchemaValidator(schema)
```
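A sketch of how a pre-flight walk over the schema could raise `RecursionError` instead of letting the self-reference recurse forever; this is a hypothetical illustration, not the actual pydantic-core behaviour:

```python
def check_recursive(schema, _seen=None):
    """Walk a schema of nested dicts/lists; raise RecursionError if any
    container is reachable from itself (which would otherwise loop forever)."""
    if _seen is None:
        _seen = set()
    if id(schema) in _seen:
        raise RecursionError('self-referencing schema')
    if isinstance(schema, dict):
        _seen.add(id(schema))
        for v in schema.values():
            check_recursive(v, _seen)
        _seen.discard(id(schema))
    elif isinstance(schema, list):
        _seen.add(id(schema))
        for v in schema:
            check_recursive(v, _seen)
        _seen.discard(id(schema))
```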
This is a hangover from pydantic v1, should be removed.
Would be amazing if we could parse and validate JSON directly, without creating python objects, then validating them.
The basic idea would be to create traits to achieve all the conversions used here, then implement those traits for both serde types, and pyo3 types.
Then use those types instead of pyo3 types throughout validators.
If we did this, it would also open the door to using pydantic-core without Python - e.g. in an entirely theoretical "Tydantic" typescript package.
Dict including:
Passed as kwarg to function
Great project!
I'm opening this one as a TODO for the development of `pydantic-core`, since pydantic/pydantic#4273 was closed but @samuelcolvin mentioned it could be handled in v2, not in v1.
Thanks!
We need a way for errors raised in Python to properly populate `kind`, `message`, `context`, maybe even `loc`.
Solutions:
- `ValueError`s
- `getattr`

I guess we should do some profiling to see which is fastest.
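A sketch of the `getattr` option: pull the details off whatever `ValueError` user code raised, with fallbacks for plain errors. The attribute names here are illustrative, not a settled API:

```python
class MyError(ValueError):
    """Hypothetical user-defined error carrying structured details."""
    kind = 'greater_than'
    message_template = 'value must be greater than {gt}'
    context = {'gt': 5}

def extract_error_details(exc: ValueError):
    """Read kind/message/context off an arbitrary ValueError via getattr,
    falling back to generic values when the attributes are absent."""
    return {
        'kind': getattr(exc, 'kind', 'value_error'),
        'message': getattr(exc, 'message_template', str(exc)),
        'context': getattr(exc, 'context', None),
    }
```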
We could add another property to line errors with internal details on what went wrong.
E.g. info that we wouldn't want to show to end users but which might help a developer debugging the errors.
Example would be `DateTimeObjectInvalid`, see #77.
and associated validator, schema etc.
@PrettyWood what do you think? The current name is not good, do you have better idea?
@adriangb you also mentioned this is confusing - ideas?
As you may already know I love unions: smart, strict and tagged ones.
I would like to work on a PR to add this.
The proposed syntax would be:

```python
{
    'type': 'model',
    'fields': {
        'pet': {
            'schema': {
                'type': 'union',
                'tag': 'species',
                'choices': [
                    {
                        'type': 'model',
                        'fields': {
                            'species': {'schema': {'type': 'literal', 'expected': ['cat']}},
                            'lives': {'schema': {'type': 'int'}, 'default': 9},
                        },
                    },
                    {
                        'type': 'model',
                        'fields': {
                            'species': {'schema': {'type': 'literal', 'expected': ['dog']}},
                            'barks': {'schema': {'type': 'bool'}},
                        },
                    },
                ],
            }
        },
    },
}
```
I guess the `UnionValidator` would become something like:

```rust
pub struct UnionValidator {
    choices: Vec<CombinedValidator>,
    strict: bool,
    tag: Option<Tag>,
}

pub struct Tag {
    name: String,
    field_validator_mapping: HashMap<*const str, *const CombinedValidator>,
}
```
What do you think @samuelcolvin?
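For illustration, the tag-to-choice lookup that the `Tag` struct implies could be sketched in Python like this; the function names and the simplified "return the matching choice" behaviour are hypothetical:

```python
def build_tag_lookup(choices, tag_field):
    """Map each literal tag value to the choice schema that declares it,
    so validation can dispatch in O(1) instead of trying every choice."""
    lookup = {}
    for choice in choices:
        expected = choice['fields'][tag_field]['schema']['expected']
        for value in expected:
            lookup[value] = choice
    return lookup

def validate_tagged_union(input_value, lookup, tag_field):
    """Pick the choice by tag; a real validator would then validate
    `input_value` against the selected choice schema."""
    tag = input_value.get(tag_field)
    try:
        return lookup[tag]
    except KeyError:
        raise ValueError(f'invalid {tag_field!r} tag: {tag!r}')

cat = {'fields': {'species': {'schema': {'type': 'literal', 'expected': ['cat']}}}}
dog = {'fields': {'species': {'schema': {'type': 'literal', 'expected': ['dog']}}}}
lookup = build_tag_lookup([cat, dog], 'species')
```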
@tiangolo asked about pypy support. @messense do you have input?
Looks like pyo3 does support pypy, see here and PyO3/rust-numpy#219.
But I know pydantic-core uses some non-abi3 methods; maybe we use stuff that would cause problems with pypy. We should probably try it sooner rather than later.
Like the other sequence-like validators, but keep the type.
What do we do about `str`? I guess we have to allow it, but there are regularly scenarios where you want "any sequence other than a string".
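The "any sequence other than a string" check is typically spelled like this in Python (whether `bytes`/`bytearray` belong in the exclusion is the same judgment call):

```python
from collections.abc import Sequence

def is_non_str_sequence(value):
    """True for list/tuple/range etc., False for str and bytes-like
    sequences - the common "sequence but not a string" predicate."""
    return isinstance(value, Sequence) and not isinstance(value, (str, bytes, bytearray))
```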
- `TypedDict` `Field`, which allows the value to be omitted or the default to be used if an error occurs
- a macro to call at the end of build to check no other keys have been used
From in person discussion.
Initializing a `NamedTuple` fails because it's immutable at a low level (`object.__setattr__` and such tricks won't work).
Is this something we want to support?
Should we have a field option to specify how the field should be set (`__new__` or `setattr`)? May be related to #59.
More changes after #185.
Also, pydantic/pydantic#4254.
We should add `loc` to `PydanticValueError`, which gets appended to the error `loc`.
Also add `file_position` (`tuple[int, int]` of `(line, col)`); one day when we have a custom JSON parser we can populate this in pydantic-core, until then we just add it via `PydanticValueError`.
`file_position` will require some pretty output in error messages.
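A hypothetical sketch of what carrying `loc` and `file_position` on the error class could look like; the class and helper names are illustrative, not the actual API:

```python
class PydanticValueErrorSketch(ValueError):
    """Carry a `loc` to append to the error location, plus an optional
    (line, col) `file_position` for a future custom JSON parser."""

    def __init__(self, message, loc=(), file_position=None):
        super().__init__(message)
        self.loc = tuple(loc)
        self.file_position = file_position

def full_loc(outer_loc, exc):
    """Append the error's own loc to the location where it was caught."""
    return tuple(outer_loc) + exc.loc
```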
I think we should use "Input", not "Value"; we should also go through the tests.
The recursion guard is unable to catch the following, which results in a seg-fault:
```python
import pytest
from pydantic_core import SchemaValidator, ValidationError


@pytest.mark.skip(reason='This case causes a seg-fault since the recursion checker cannot detect the cycle')
def test_function_change_id():
    def f(input_value, **kwargs):
        return input_value + ' Changed'

    v = SchemaValidator(
        {
            'choices': [
                {
                    'type': 'function',
                    'mode': 'before',
                    'function': f,
                    'schema': {'schema_ref': 'root-schema', 'type': 'recursive-ref'},
                },
                'int',
            ],
            'ref': 'root-schema',
            'type': 'union',
        }
    )
    with pytest.raises(ValidationError) as exc_info:
        assert v.validate_python('input value') == 'input value Changed'
    print(str(exc_info.value))
```
This is because the input is changing on each step, so the `id` isn't found in the recursion guard lookup set.
I don't see how we can detect this without introducing a mini-stack, which would really harm performance.
I think we just put a note in the docs saying "if you do really dumb stuff, you can get the validator to recurse infinitely".
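A pure-Python sketch of why the `id()`-based guard misses this case; the function names are hypothetical:

```python
def guarded(value, validate, guard):
    """id()-based recursion guard sketch: detects re-entry with the
    *same* object, but not with a transformed copy of it."""
    if id(value) in guard:
        raise RecursionError('recursion detected')
    guard.add(id(value))
    try:
        return validate(value, guard)
    finally:
        guard.discard(id(value))

def same_value(v, guard):
    # re-enters with the identical object: the guard catches it
    return guarded(v, same_value, guard)

def changed_value(v, guard):
    # a 'before' function that builds a new string each step: new object,
    # new id, so the guard never fires - only the interpreter's own stack
    # limit stops it here (in Rust, the stack blows: a seg-fault)
    return guarded(v + ' Changed', changed_value, guard)
```

Detecting the changed-value cycle would need a full stack of in-flight values rather than a set of ids, which would hurt the common path.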
Hello,
Seeing that this is being built in Rust, do you reckon crates such as
- @sharksforarms's deku
- @jam1garner's binrw

... can serve as a back-end, doing the declarative parsing work?
I seem to have had a mental lapse and forgotten about strict working properly on JSON types.
JSON input should match Python, e.g.:
Hi!
I have been a long time user of pydantic (great library btw) and have been following the development of the rust library.
I was wondering if it would make sense/would be possible to separate the Python-related code (use of Py* etc.) from the Rust implementation.
That way it'd be possible to use this great library not only in python code, but also port it to e.g. js or use for validation of rust code.
I'd be happy to take a stab at it and create a PR - if wanted.
E.g. we shouldn't have `FloatGreaterThan` and `IntGreaterThan`; we should just have one `GreaterThan` type.
This is not going to be the release that pydantic V2 ships with, but we should get a proper release out to give a target for other work.
What needs fixing before we do that?
See pydantic/pydantic@c8ba8f1 for an explanation of usage.
I implemented some of the plumbing for this early on, then forgot the porcelain. Needs completing and testing.
I guess we should also make sure the other kwargs to validation functions make sense at the same time.
I'm not sure if this would be possible (I'm guessing it's not), but it would be nice to be able to recursively refer to a parent validator without having to know a priori whether it will be recursive or not.
Currently, if you are parsing something like:
```python
class Outer:
    inner: List[Outer]
```

You would have to know that `Outer` is recursive before you parse its fields.
Would it be possible to have an optional `"id"` property on every validator that acts as the reference for recursive schemas, instead of a special `"recursive-container"` validator? So:
```python
from typing import List

from pydantic_core import SchemaValidator


class Outer:
    inner: "List[Outer]"


v = SchemaValidator(
    {
        "type": "model-class",
        # id is optional, required for this to be usable as a recursive ref
        # the value is arbitrary, the id of the type seems like a safe choice
        "id": str(id(Outer)),
        "schema": {
            "type": "model",
            "fields": {
                "inner": {
                    "type": "list",
                    "items": {
                        "type": "recursive-ref",
                        "id": str(id(Outer)),
                    },
                },
            },
        },
    }
)
```
I read here that pydantic v2 will have native JSON support...
However, a lot of devs are using .yaml for configs (it is much more readable for humans).
I suspect I will be able to load it as a Python object, but then strict mode can't be used, and the data goes from native code (probably C) to Python and back to native code in Rust. That is at least kind of stupid.
I also suspect other people may want to add validation for other serialization formats (like bson, protobuf or whatever they need). Some of those people would like to do it in Rust (ideally (runtime) pluggable - not everyone will want everything - but I am not sure if that is easily achievable, though it is possible).
That could make pydantic-core serialization agnostic while keeping the same performance, and validation would not care how the data got to it (and I understand it has to have an at least somewhat JSON-resembling structure and types).
Applicability would be huge, because a good developer should always write some validation, and a lot of the time you are doing it all by hand in some validate method: checking ranges of numbers, writing regex matchers for strings, converting int/str to date objects... you get the idea. With pydantic it would be much easier to do it exhaustively.
Those are just ideas and I wanted to share them publicly. Maybe the effort to split serialization out from validation would be too huge, but that is for @samuelcolvin to decide.
The main thing is not to rush this... if this makes it in at any time (I am not saying it must be 2.0.0) I would be happy.
We need a way to support positional arguments; this will be helpful for:
I'd like to reuse as much of the logic from `TypedDictValidator` as possible; in some regards we can learn here from the logic of `validate_arguments` in pydantic.
My proposal would be this:
`TypedDictValidator`
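A sketch of how positional arguments could be mapped onto typed-dict fields, loosely following what `validate_arguments` does in pydantic v1; the function name and shape are hypothetical:

```python
def bind_arguments(args, kwargs, fields):
    """Map positional args onto field names (in declaration order), then
    merge keyword args, rejecting duplicates and overflow - after this
    step, the existing typed-dict field validation can take over."""
    if len(args) > len(fields):
        raise TypeError('too many positional arguments')
    bound = dict(zip(fields, args))
    for name, value in kwargs.items():
        if name in bound:
            raise TypeError(f'multiple values for argument {name!r}')
        bound[name] = value
    return bound
```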