
Comments (2)

seandstewart commented on June 26, 2024

Hey @ryanhiebert -

Thanks so much for the thoughtful post. It's clear you have read through my documentation, and you've hit the nail on the head regarding the fundamental conflict in this library to date.

A Brief History of Time

Typical began as a part of my first foray into typed Python. At the time, I was a new convert to not only type hints, but Python 3. I was struggling with the concept of strongly typed code, but I wanted the guarantees it provided. Too many of my functions and methods at the time were devoted to boilerplate validation and coercion of inputs. Thus came the @typic.al decorator - which was inspired by the attrs library's own "cute" aliases. Along the way I came up with the @typic.klass decorator, which took the same ideas from @typic.al and put them into a dataclass-style decorator.
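The idea behind an argument-coercing decorator like @typic.al can be sketched in plain stdlib Python. This is a minimal illustration of the concept, not typical's actual implementation; the `coerce_inputs` name and the naive call-the-annotation coercion strategy are assumptions for the example.

```python
import functools
import inspect
import typing

def coerce_inputs(func):
    """Illustrative sketch of an argument-coercing decorator: call each
    annotated parameter's type on any incoming value that isn't already
    an instance of it. (Hypothetical; not typical's real machinery.)"""
    hints = typing.get_type_hints(func)
    sig = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            target = hints.get(name)
            # Naive coercion: only handles simple constructors like int/str.
            if target is not None and not isinstance(value, target):
                bound.arguments[name] = target(value)
        return func(*bound.args, **bound.kwargs)

    return wrapper

@coerce_inputs
def add(a: int, b: int) -> int:
    return a + b

print(add("1", "2"))  # string inputs coerced to ints -> prints 3
```

This is exactly the "boilerplate validation and coercion" being factored out of every function body and into a single decorator.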

Speaking frankly, these two features were definitely good learning experiences, but I regret them. You can see from the design of the library that my view on how to use type-hints changed from a means to save developers from lazily-typed upstream code to a means to describe protocols for serialization, deserialization, and runtime validation of types described by the Python type-system.

If you look at code written in the v1 era vs the v2 era, you can start to see this evolution.

This has to do with my own experience with SerDes libraries in statically typed languages like Java, Go, etc. I also gained a critical understanding of how to write strongly-typed Python and realized that the "magic" of auto-coercion was largely unnecessary if I was just very careful and explicit about the types I was passing around. Today, my production code is still a heavy user of typical, but basically only at network boundaries. Within my applications, my type-hints and mypy do the rest of the work and give me much greater peace of mind.

Moving Forward

I've been hard at work on v3 for the last 6 months. In v3, which you can take a look at here: https://github.com/seandstewart/typical/tree/v3-routine-factories, you can see I now view this library as a SerDes library first and foremost. I've given very little thought to those two areas of the typical API, but I'm a fan of your thinking and like the idea of essentially "quarantining" them behind a magic sub-package. They do have their use, especially in larger code-bases where a developer may have less control over how well-typed external callers may be, so I don't want to get rid of them entirely.

Some notable changes in v3:

  1. Limited code-gen, instead preferring explicitly-defined "routines" and closures. This will aid in debugging and make the library much less mysterious.
  2. Promoting the constraints engine to a "core" feature set.
  3. Turning off implicit coercion in the class decorator - users must opt in to this behavior.

Things I've considered:

  1. Completely dropping the jsonschema generation.
  2. Completely dropping the class decorator.

Thus far I have done neither. There is even a core schema package for schema generation, which is built to be completely extensible and pluggable. I personally follow the schema-first approach in my own code, but I haven't moved forward with isolating/dropping schema generation because I feel it's a very popular feature and I'm unsure of the impact if it were to just go missing from the library. As for the class decorator... I personally know of a few heavy users of it, but even there I've been discouraging its use for some time, pointing people instead to the Protocol or Functional API. Still, I'm wary of dropping it completely for fear of drastically increasing the pain of upgrading for the sake of what amounts to a personal style preference on my part.

Things You've Made Me Consider

I want to close this comment out by saying - your submission has opened my mind to a middle way. Typical can ship two isolated packages. The first can keep the cutesy typic namespace and contain the useful-yet-problematic feature set (@typic.al, @typic.klass, maybe schema-gen?). The second could be your "serious business" package, beginning with typical.

WRT "strict" mode - yes... it's honestly quite nasty to wrestle with; I'm not even sure what typical looks like without it at this point! The constraints engine has its own limitations which make it less than desirable for SerDes. Perhaps the solution is to simply do away with the juggling.

When you invoke transmute(...), you are telling typical to take this input and make it the targeted output type. This is explicit and there's no reason to toggle the underlying behavior. When you invoke validate(...), you are telling typical to check if the input can be considered a member of the target type.
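That contract can be sketched with hypothetical stand-ins. These one-liners are not typical's real implementations - only the `transmute`/`validate` names mirror the API under discussion, and the behavior here covers only simple callable types:

```python
# Hypothetical stand-ins for the semantics described above; illustrative only.

def transmute(target, value):
    """Take this input and make it the target type (simple types only)."""
    return value if isinstance(value, target) else target(value)

def validate(target, value):
    """Check that the input is already a member of the target type.
    No conversion is ever performed."""
    if not isinstance(value, target):
        raise TypeError(f"{value!r} is not a {target.__name__}")
    return value

print(transmute(int, "3"))  # -> 3: explicit request to convert
print(validate(int, 3))     # -> 3: passes through unchanged
# validate(int, "3") raises TypeError: validation never coerces.
```

With the two verbs split this way, there is nothing left for a global "strict" toggle to control.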

A major caveat: Currently, the constraints engine allows for validating mappings against user types (e.g., dataclasses). So validate(MyType, input_dict) could succeed if the values in input_dict align with MyType. I still think this is valid behavior, because the structure of the input meets the requirements of the defined type. The problem is that we lose a critical guarantee: the validated input is not actually an instance of MyType. One option could be to transmute valid complex data types, but that breaks a different guarantee: the output type is now different from the input type, which could be surprising. Additionally, it's computationally expensive to do both operations. So I've now thought myself into a recursive loop, and I break out without a decision on how to handle it.
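The caveat can be made concrete with a toy structural check. `validate_mapping` is a hypothetical helper written for this example, not part of typical; it shows how a dict can pass structural validation against a dataclass while the "result is an instance of MyType" guarantee is lost:

```python
from dataclasses import dataclass, fields

@dataclass
class MyType:
    x: int

def validate_mapping(cls, mapping):
    """Hypothetical structural check: every field of cls must be present
    in the mapping with a value of the annotated type."""
    for f in fields(cls):
        if not isinstance(mapping.get(f.name), f.type):
            raise TypeError(f"{f.name!r} is missing or mistyped")
    return mapping  # returned unchanged: still a dict, not a MyType

checked = validate_mapping(MyType, {"x": 1})
print(isinstance(checked, MyType))  # False: structurally valid, but the
                                    # output-is-MyType guarantee is gone
```

Transmuting instead of returning the dict would restore that guarantee, at the cost of the output type differing from the input type - which is the dilemma described above.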

What do you think about all this?


ryanhiebert commented on June 26, 2024

Wow, thank you for taking the time to respond so thoroughly to my inquiry!

Thank you for telling me a bit more about the history, and about how your thinking has changed since then. One thing I'll point out is that less-than-ideal APIs made while learning are inevitable and shouldn't be regretted. Instead, it is better to think about the legacy and future of the API, and how to make those transitions most effectively.

I think that JSON Schema generation is a really neat feature. I also, at least currently, define JSON schemas directly, but I see great independent value in being able to validate that the schemas match or are compatible. This is a challenging problem in its own right, dealing with which kinds of interface changes are breaking and which are not. But I can also see it being tangential to the focus of a small package. This is really the question: what is the scope of the package? How much is too much to expect to live well together in the same package?


I agree that serialization and deserialization are the central aspect of Typical. I think this is necessary, because (a) it's a hard, large problem at the center of everything Typical does, and (b) as Python's typing and other language features grow, Typical's explicit principles encourage you to discourage or even remove now-redundant interfaces, and serialization and deserialization are the ones least likely to be added to the language soon.

I see serialization, validation, and coercion as different things, and I think you do as well. In the spirit of explicit being better than implicit, I think it's wise to separate them as much as possible. You have some neat interfaces for validation. It's neat how you're working them into types, and I wonder how much of that Python will do for itself in the long run. It sure seems like it's doing more and more.

The serialization piece is what I'm most focused on, followed by validation. Like you, I'm using this primarily for network boundaries. I think there is room for multiple approaches to all these problems, but it's the first-class support of Python primitives that really drives me. I want to be able to start with native Python features, sprinkle in some hints about how they should work in different contexts, and have the redundant parts of serialization reduced and simplified to reduce human error.

I think that validation is best done in the destination type. In fact, I'd probably define validation this way: deserialization and coercion deal with putting things into the right types, while validation enforces further constraints. Your validation is interesting in that it often uses subclasses to implement constraints. In that sense, it rather blurs the line between serialization and validation. And I think that's actually a good thing. I expect that over time more and more validation will become analyzable by type checkers. I suspect that your approach to validation is relatively less likely to stand the test of time than a strict serialization library, largely because new ways of writing these constraints are likely to be added to type checkers.

For deserialization, the rule that I want to enforce is that the type coming in matches the type that I expect the serializer to produce. Anything outside of that would fall under coercion, and I think can be left as a different concern.
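That rule - the deserializer accepts exactly the shape the serializer produces, and nothing looser - can be sketched as a strict round-trip contract. The `Point` type, the helper names, and JSON as the wire format are all assumptions for this example:

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class Point:
    x: int
    y: int

def serialize(p: Point) -> str:
    """Emit the one canonical wire shape for Point."""
    return json.dumps(asdict(p))

def deserialize(raw: str) -> Point:
    """Accept only the shape serialize() produces; anything else is
    rejected rather than coerced - coercion is a separate concern."""
    data = json.loads(raw)
    if set(data) != {"x", "y"} or not all(isinstance(data[k], int) for k in data):
        raise ValueError("input does not match the serialized form of Point")
    return Point(**data)

p = Point(1, 2)
assert deserialize(serialize(p)) == p   # strict round trip holds
# deserialize('{"x": "1", "y": 2}') raises ValueError: no silent coercion.
```

Keeping the accepted input this narrow is what lets coercion live elsewhere as an explicit, opt-in layer.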

A good many features of typical, due to its principles, are likely to be redundant and therefore counterproductive over time. It is wise to think about how even good features should come to be discouraged when language-preferred alternatives are available.


Ok, depending how in sync we are with those thoughts, here's what I might suggest:

  1. Decide what typical is and should be, and communicate that effectively. Perhaps it's best, as its legacy, to leave its original purpose intact, because the stability of what this package fundamentally means is most important. Or perhaps a change of focus, including breaking backward compatibility when needed, is a better fit for what typical is meant to be. Either way, document that in the form of a compatibility policy, so that (hopefully) people know what to expect.
  2. Start with serialization at the center. This could be a new package distribution if we don't want to change typical too much.
  3. Build features that are reasonably likely to be deprecated or discouraged in the future as some form of extension modules. Perhaps as separate packages that are extras that can be installed. A well-designed plugin system can even allow us to support more non-core Python libraries.

Let's see if I can distill it down to a shorter call to action. If you agree with me that nailing a serialization and deserialization protocol and extensibility approach is critical, is it better to (a) do that under the typical package name, or (b) create a new one? You've done a lot of work on this project already in your new branch, and I'm sure you've learned a ton, and I'd be interested to see what a super-minimal serialization library should look like. I'll admit, that PR was too big for me to attempt any kind of decent review.

It's scary, but I think if it were me, I'd document that there's going to be a hard pivot in the focus of this package for the purpose of nailing down this core API. By using the typical package instead of the cute typic namespace, in a similar way to what attrs did, we can maintain backward compatibility while we do this work. We can go slower in that core namespace, and nail the generic serialization and deserialization API.

JSON Schema feels like it would, ultimately, fit really well as a separate extension package: hugely useful, but tangential to the core mission. Much of the validation feels like it fits in a similar category, though of that I'm less sure. Validation and JSON Schema feel like they're likely to be more tightly coupled and less generic, simply because the space of solutions becomes very large, and being generic at that level is probably too much work.


OK, time for you to gather your thoughts before I keep going. I keep getting the feeling that I'm being too handwavy about how this can work, and that I'm missing important details like how the serialization format (e.g. JSON) is a critical piece of knowledge to how the serialization and deserialization work, and that even that might ultimately not be something we can unify into a single protocol effectively.

