Comments (2)
Hey @ryanhiebert -
Thanks so much for the thoughtful post. It's clear you have read through my documentation, and you've hit the nail on the head, as it were, regarding the fundamental conflict in this library to date.
A Brief History of Time
Typical began as part of my first foray into typed Python. At the time, I was a new convert not only to type hints, but to Python 3. I was struggling with the concept of strongly typed code, but I wanted the guarantees it provided. Too much of my functions and methods at the time was devoted to boilerplate validation and coercion of inputs. Thus came the `@typic.al` decorator, which was inspired by the attrs library's own "cute" aliases. Along the way I came up with the `@typic.klass` decorator, which took the same ideas from `@typic.al` and put them into a dataclass-style decorator.
Speaking frankly, these two features were definitely good learning experiences, but I regret them. You can see from the design of the library that my view on how to use type-hints changed from a means to save developers from lazily-typed upstream code to a means to describe protocols for serialization, deserialization, and runtime validation of types described by the Python type-system.
If you look at code written in the v1 era vs the v2 era, you can start to see this evolution.
This has to do with my own experience with SerDes libraries in statically-typed languages like Java, Go, etc. I also gained a critical understanding of how to write strongly-typed Python and realized that the "magic" of auto-coercion was largely unnecessary if I was just very careful and explicit about the types I was passing around. Today, my production code is still a heavy user of typical, but basically only at network boundaries. Within my applications, my type hints and mypy do the rest of the work and give me much greater peace of mind.
Moving Forward
I've been hard at work on v3 for the last 6 months. In v3, which you can take a look at here: https://github.com/seandstewart/typical/tree/v3-routine-factories, you can see I now view this library as a SerDes library first and foremost. I've given very little thought to those two areas of the typical API, but I'm a fan of your thinking and like the idea of essentially "quarantining" them behind a magic sub-package. They do have their use, especially in larger code-bases where a developer may have less control over how well-typed external callers may be, so I don't want to get rid of them entirely.
Some notable changes in v3:
- Limited code-gen, instead preferring explicitly-defined "routines" and closures. This will aid in debugging and make the library much less mysterious.
- Promoting the constraints engine to a "core" feature set.
- Turning off implicit coercion in the class decorator - users must opt-in to this behavior.
Things I've considered:
- Completely dropping the jsonschema generation.
- Completely dropping the class decorator.
Thus far I have done neither. There is even a core `schema` package for schema generation, which is built to be completely extensible and pluggable. I personally follow the schema-first approach in my own code, but I haven't moved forward with isolating or dropping schema generation because it's a very popular feature and I'm unsure of the impact if it were to just go missing from the library. As for the class decorator... I personally know of a few heavy users of it, but even there I've been discouraging its use for some time, pointing people instead to the Protocol or Functional API. Still, I'm wary of dropping it completely for fear of drastically increasing the pain of upgrading for the sake of what amounts to a personal style preference on my part.
Things You've Made Me Consider
I want to close this comment out by saying: your submission has opened my mind to a middle way. Typical can ship two isolated packages. The first can maintain the cutesy `typic` namespace and contain the useful-yet-problematic feature set (`@typic.al`, `@typic.klass`, maybe schema-gen?). The second could be your "serious business" package, beginning with `typical`.
WRT "strict" mode - yes... it's honestly quite nasty to wrestle with; I'm not even sure what typical looks like without it at this point! The constraints engine has its own limitations, which make it less than desirable for SerDes. Perhaps the solution is to simply do away with the juggling.
When you invoke `transmute(...)`, you are telling typical to take this input and make it the targeted output type. This is explicit, and there's no reason to toggle the underlying behavior. When you invoke `validate(...)`, you are telling typical to check whether the input can be considered a member of the target type.
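To make the contract concrete, here is a toy sketch of the two operations as described above. This is not typical's actual implementation; `transmute` and `validate` here are hypothetical stand-ins that only illustrate the semantics: transmute coerces input into the target type, while validate checks membership and never converts.

```python
from dataclasses import dataclass

def transmute(target, value):
    """Coerce `value` into an instance of `target` (conversion allowed)."""
    if isinstance(value, target):
        return value
    if isinstance(value, dict):
        return target(**value)  # structure a mapping into the target type
    return target(value)        # fall back to the constructor

def validate(target, value):
    """Check that `value` is already a member of `target`; never convert."""
    if not isinstance(value, target):
        raise TypeError(f"{value!r} is not a {target.__name__}")
    return value

@dataclass
class Point:
    x: int
    y: int

p = transmute(Point, {"x": 1, "y": 2})  # coerced: returns Point(x=1, y=2)
validate(Point, p)                       # passes: already a Point
```

The key design distinction is that `transmute` may return a different object than it was given, while `validate` returns its input unchanged or raises.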
A major caveat: currently, the constraints engine allows for validating mappings against user types (e.g., dataclasses). So `validate(MyType, input_dict)` could succeed if the values in the input dict align with `MyType`. I still think this is valid behavior, because the structure of the input meets the requirements of the defined type. The problem is that we lose a critical guarantee: the validated input is not actually an instance of `MyType`. One option could be to `transmute` valid complex data types, but that also breaks a guarantee: the output type is now different from the input type, which could be surprising. Additionally, it's computationally expensive to do both operations. So I've now thought myself into a recursive loop, and I break out without a decision on how to handle it.
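The lost guarantee can be shown with a small sketch. The `structurally_valid` helper below is hypothetical, standing in for the constraints engine's mapping check described above: the dict satisfies `MyType`'s structure, so validation "passes," yet the value is still a plain dict.

```python
from dataclasses import dataclass, fields

@dataclass
class MyType:
    x: int
    y: int

def structurally_valid(target, mapping):
    """Check a mapping's keys and value types against a dataclass's fields."""
    expected = {f.name: f.type for f in fields(target)}
    return (mapping.keys() == expected.keys()
            and all(isinstance(mapping[name], tp) for name, tp in expected.items()))

data = {"x": 1, "y": 2}
assert structurally_valid(MyType, data)  # validation succeeds...
assert not isinstance(data, MyType)      # ...but the input is still a dict,
                                         # not an instance of MyType
```

This is exactly the tension described: the check is sound structurally, but downstream code that received a "validated" value cannot rely on it actually being `MyType` unless a transmute step also runs.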
What do you think about all this?
Wow, thank you for taking the time to respond so thoroughly to my inquiry!
Thank you for telling me a bit more about the history, and about how your thinking has changed since then. One thing I'll point out is that less-than-ideal APIs made while learning are inevitable and shouldn't be regretted. Instead, it is better to think about what the legacy and future of the API is, and how we can make those movements most effectively.
I think that JSON Schema generation is a really neat feature. I also, at least currently, define JSON schemas directly, but I see great independent value in being able to validate that the schemas match or are compatible. This is a challenging problem in its own right, dealing with which kinds of interface changes are breaking and which are not. But I can also see it being tangential to the focus of a small package. This is really the question: what is the scope of the package? How much is too much to expect to live well together in the same package?
I agree that the serialization and deserialization aspect is the central aspect of Typical. I think this is necessary, because (a) it's a hard, large problem at the center of everything Typical does, and (b) as Python typing and other language features grow, I think that the explicit principles of Typical encourage you to actually discourage or even remove now-redundant interfaces, and that serialization and deserialization are the ones least likely to be soon added to the language.
I see serialization, validation, and coercion as different things, and I think you do as well. In the spirit of explicit being better than implicit, I think it's wise to separate them as much as possible. You have some neat interfaces for validation. It's neat how you're working them into types, and I wonder how much of that Python will do for itself in the long run. It sure seems like it's doing more and more.
The serialization piece is what I'm most focused on, followed by validation. Like you, I'm using this primarily at network boundaries. I think there is room for multiple approaches to all these problems, but it's the first-class support of Python primitives that really drives me. I want to be able to start with native Python features, sprinkle in some hints about how they should work in different contexts, and have the redundant parts of serialization reduced and simplified to cut down on human error.
I think that validation is best done in the destination type; in fact, I'd probably define validation this way. Deserialization and coercion deal with putting things into the right types, while validation enforces further constraints. Your validation is interesting in that it often uses subclasses to implement constraints. In that sense, it rather blurs the line between serialization and validation, and I think that's actually a good thing. I expect that over time more and more validation will be analyzable by type checkers. I suspect that your approach to validation is relatively less likely to stand the test of time than a strict serialization library, largely because new ways of writing these constraints are likely to be added to type checkers.
For deserialization, the rule that I want to enforce is that the type coming in matches the type that I expect the serializer to produce. Anything outside of that would fall under coercion, and I think can be left as a different concern.
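The rule above can be sketched as a strict round trip. The `User`, `serialize`, and `deserialize` names here are illustrative, not typical's API: deserialization accepts only the exact shape the serializer produces, and anything looser (e.g. the string `"1"` where an `int` was serialized) is treated as coercion, a separate concern.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class User:
    id: int
    name: str

def serialize(user: User) -> str:
    """Emit the canonical JSON shape for a User."""
    return json.dumps(asdict(user))

def deserialize(payload: str) -> User:
    """Strict: require exactly the types the serializer emits; never coerce."""
    raw = json.loads(payload)
    if not (isinstance(raw.get("id"), int) and isinstance(raw.get("name"), str)):
        raise TypeError("payload does not match the serialized shape of User")
    return User(**raw)

u = User(1, "ryan")
assert deserialize(serialize(u)) == u          # round trip holds
# deserialize('{"id": "1", "name": "ryan"}')   # would raise: that's coercion territory
```

The payoff of this discipline is a simple invariant: for any value the serializer can produce, `deserialize(serialize(x)) == x`, with no hidden conversion in between.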
A good many features of typical, due to its principles, are likely to be redundant and therefore counterproductive over time. It is wise to think about how even good features should come to be discouraged when language-preferred alternatives are available.
Ok, depending how in sync we are with those thoughts, here's what I might suggest:
- Decide what `typical` is and should be, and communicate that effectively. Perhaps it's best, as its legacy, to leave its original purpose intact, because the stability of what this package fundamentally means is most important. Or perhaps a change of focus, including breaking backward compatibility when needed, is a better fit for what `typical` is meant to be. Either way, document that in the form of a compatibility policy, so that (hopefully) people know what to expect.
- Start with serialization at the center. This could be a new package distribution if we don't want to change typical too much.
- Build features that are reasonably likely to be deprecated or discouraged in the future as some form of extension modules, perhaps as separate packages that can be installed as extras. A well-designed plugin system could even allow us to support more non-core Python libraries.
Let's see if I can distill this down to a shorter call to action. If you agree with me that nailing a serialization and deserialization protocol and extensibility approach is critical, is it better to (a) do that under the `typical` package name, or (b) create a new one? You've done a lot of work on this project already in your new branch, and I'm sure you've learned a ton; I'd be interested to see what a super-minimal serialization library would look like. I'll admit, that PR was too big for me to attempt any kind of decent review.
It's scary, but I think if it were me, I'd document that there's going to be a hard pivot in the focus of this package for the purpose of nailing down this core API. By using the `typical` package instead of the cute `typic` namespace, in a similar way to what `attrs` did, we can maintain backward compatibility while we do this work. We can go slower in that core namespace and nail down the generic serialization and deserialization API.
JSON Schema feels like it would ultimately fit really well as a separate extension package: hugely useful, but tangential to the core mission. Much of the validation feels like it would fit in a similar category as well, though I'm less sure of that. Validation and JSON Schema feel like they're likely to be more tightly coupled and less generic, just because the space of solutions quickly becomes very large, and being generic at that level is probably too much work.
OK, time for you to gather your thoughts before I keep going. I keep getting the feeling that I'm being too handwavy about how this can work, and that I'm missing important details like how the serialization format (e.g. JSON) is a critical piece of knowledge to how the serialization and deserialization work, and that even that might ultimately not be something we can unify into a single protocol effectively.