Comments (4)
Handling unspecified extensions to existing Lexicons is not very fleshed out, both in our implementations and in the specifications. It should all come together at the protocol level, but how to expose things well in other languages, especially strongly typed ones, isn't totally clear. And there are sharp edges around decoding/re-encoding records.
The same goes for what it looks like to work with records (or other data, in JSON or DAG-CBOR) for which a Lexicon is not available. There are restrictions on what that data can look like: records are always supposed to have $type, for example, and Links (CIDs) and binary data ($bytes) are encoded a particular way. But it isn't very clear whether folks are expected to parse and enforce those constraints rigorously, and if so in what situations.
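As a rough illustration of what enforcing those constraints on JSON data without a Lexicon could look like, here is a minimal sketch. The function names are hypothetical, the checks are deliberately shallow, and it assumes the JSON forms `{"$link": "<cid string>"}` and `{"$bytes": "<base64>"}` with optional padding; it does not validate that a link is a well-formed CID.

```python
import base64


def decode_bytes_field(node):
    """Decode a {"$bytes": "<base64>"} object, re-adding padding if it was omitted."""
    text = node["$bytes"]
    padded = text + "=" * (-len(text) % 4)
    return base64.b64decode(padded, validate=True)


def check_record_shape(record):
    """Return a list of problems found; an empty list means nothing obvious is wrong."""
    if not isinstance(record, dict):
        return ["record is not an object"]
    problems = []
    if not isinstance(record.get("$type"), str):
        problems.append("missing $type")
    # Walk every nested object and list, checking the special encodings.
    stack = [record]
    while stack:
        node = stack.pop()
        if isinstance(node, dict):
            if "$link" in node and not isinstance(node["$link"], str):
                problems.append("$link is not a string")
            if "$bytes" in node:
                try:
                    decode_bytes_field(node)
                except (TypeError, ValueError):
                    problems.append("$bytes is not valid base64")
            stack.extend(node.values())
        elif isinstance(node, list):
            stack.extend(node)
    return problems
```

Whether a check like this should run on every decode, or only at trust boundaries such as ingesting a repository from another host, is exactly the open question.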
We do want it to be possible to parse through data with an unknown schema and identify things like blobs. Moderation tooling can use this to do things like extract and label blobs even if the application lexicon isn't known. When we added self-labels recently, we also did that in a way that ensures the $type is always set, to make it possible to parse and extract labels even if the lexicon isn't known. This is all somewhat unexplored territory though.
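A minimal sketch of that kind of schema-less traversal, assuming the JSON blob form `{"$type": "blob", "ref": {"$link": ...}, "mimeType": ..., "size": ...}`. The function name `find_blobs`, the record type, and the CID string in the example are made up for illustration, and the older legacy blob form is not handled.

```python
def find_blobs(node, path=""):
    """Yield (path, cid, mime_type) for every blob-shaped object under node."""
    if isinstance(node, dict):
        ref = node.get("ref")
        if node.get("$type") == "blob" and isinstance(ref, dict) and "$link" in ref:
            yield path, ref["$link"], node.get("mimeType")
            return  # a blob is a leaf, no need to look inside it
        for key, value in node.items():
            yield from find_blobs(value, f"{path}/{key}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from find_blobs(value, f"{path}/{index}")


# A record whose lexicon is unknown to this code (hypothetical type and CID).
record = {
    "$type": "com.example.unknown.thing",
    "title": "hello",
    "media": [
        {
            "alt": "a picture",
            "image": {
                "$type": "blob",
                "ref": {"$link": "bafkrei-placeholder-cid"},
                "mimeType": "image/png",
                "size": 1234,
            },
        }
    ],
}

blobs = list(find_blobs(record))
```

Nothing here depends on knowing `com.example.unknown.thing`; the traversal relies only on the blob object carrying its own `$type`.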
For unions, yes, open unions are very flexible. Implementations are only really expected to handle the enumerated (known) types, and just "not fail" when they encounter an unknown type in that position. This is distinct from closed unions, where an unknown type would be an error. IIRC there might be ambiguity about whether literals are allowed in unions, because there is no $type field.
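The open/closed distinction could be sketched like this. All names here (`decode_union`, `UnknownUnionMember`, `ClosedUnionError`, the `com.example.*` types) are hypothetical and not taken from any SDK; the point is only that an open union keeps the unknown member around instead of raising.

```python
from dataclasses import dataclass


@dataclass
class UnknownUnionMember:
    """Placeholder for an open-union member whose $type is not recognised."""
    type: str
    raw: dict


class ClosedUnionError(ValueError):
    """Raised when a closed union contains a $type outside its enumerated set."""


def decode_union(value, decoders, closed=False):
    """Decode a union member using a {type id: decoder function} mapping."""
    type_id = value.get("$type") if isinstance(value, dict) else None
    if not isinstance(type_id, str):
        raise ValueError("union member has no $type")
    decoder = decoders.get(type_id)
    if decoder is not None:
        return decoder(value)
    if closed:
        raise ClosedUnionError(f"unexpected type in closed union: {type_id}")
    # Open union: do not fail, keep the raw data so it can be re-encoded later.
    return UnknownUnionMember(type=type_id, raw=value)


decoders = {"com.example.known": lambda v: ("known", v["n"])}
```

Note that a bare literal has no `$type` to dispatch on, which is where the ambiguity mentioned above shows up: this sketch simply rejects it.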
from atproto.
Had not seen python-libipld, using Rust, interesting! Hope it will be possible to do a pure-python implementation of some kind, but using a hardened/safe/fast existing library for (DAG-)CBOR and CAR files makes a lot of sense.
Thanks for these notes, will review and fold this in to our test vectors.
Hi @bnewbold! That's a brilliant idea! To be honest, I invented something similar for my unit tests.
I have a data collector: https://github.com/MarshalX/atproto/blob/main/tests/models/fetch_test_data.py
And the saved data: https://github.com/MarshalX/atproto/tree/main/tests/models/test_data
As you can see, the most problematic edge cases are about custom lexicons and parsing, for example parsing of literals and union types. It would be awesome to have a single database with all of the test data!
The test data for the CAR file will be useful for my python-libipld project.
Some parsing bugs that I ran into, which are covered by unit tests now:
- Parsing of tokens: #128
- Parsing of similar union types (types that look the same except for the class name and $type): #129
- Falling back to a dictionary for custom records, for example post records with a custom "via" field from third-party clients
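To make the last case concrete, here is a minimal sketch of the fallback idea. This is not the SDK's actual implementation: `PostRecord` is a simplified stand-in model with only two fields, and `parse_post` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass
class PostRecord:
    """Simplified stand-in for a strict post model (only two fields)."""
    text: str
    createdAt: str


def parse_post(raw):
    """Return a PostRecord if raw fits the strict model, else the raw dict."""
    payload = {key: value for key, value in raw.items() if key != "$type"}
    try:
        return PostRecord(**payload)
    except TypeError:
        # Unknown or missing fields: keep the data as a plain dictionary
        # instead of failing, so custom fields such as "via" are not lost.
        return raw


plain = {"$type": "app.bsky.feed.post", "text": "hi", "createdAt": "2023-01-01T00:00:00Z"}
custom = dict(plain, via="SomeClient")
```

With these inputs, `parse_post(plain)` gives a typed `PostRecord`, while `parse_post(custom)` hands back the original dictionary untouched.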
Also, I am confused about one thing, and I don't know whether it is reproducible: it looks like any union type could be any object with $type and fields. Code ref: https://github.com/bluesky-social/atproto/blob/b01e47b61730d05a780f7a42667b91ccaa192e8e/packages/lex-cli/src/codegen/lex-gen.ts#L325. It would be awesome to cover this case with the test data too.
@bnewbold thank you for your explanation!
> Had not seen python-libipld, using Rust, interesting! Hope it will be possible to do a pure-python implementation of some kind, but using a hardened/safe/fast existing library for (DAG-)CBOR and CAR files makes a lot of sense.
>
> Thanks for these notes, will review and fold this in to our test vectors.
A pure Python implementation exists, but it is abandoned and very slow. That's why I moved to Rust. Please check the recent performance boost update: https://github.com/MarshalX/atproto/releases/tag/v0.0.26