Comments (5)
Proposal 1:
Sure, props can be included, but your proposed solution has an issue in that maps are not ordered and so the output of the JSON may not be constant (comparable as a string), which would not be great.
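To illustrate the ordering concern, here is a minimal Go sketch using only the standard library. `marshalProps` is a hypothetical helper, not part of the avro package: it sorts the keys before writing, so the same property map always yields the same string. The property names in `main` are made up for the example.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"sort"
)

// marshalProps is a hypothetical helper (not an avro API). It writes a
// property map as a JSON object with keys in sorted order, so the output
// can be compared as a string.
func marshalProps(props map[string]interface{}) (string, error) {
	keys := make([]string, 0, len(props))
	for k := range props {
		keys = append(keys, k)
	}
	sort.Strings(keys) // ranging over a Go map has no guaranteed order

	var buf bytes.Buffer
	buf.WriteByte('{')
	for i, k := range keys {
		if i > 0 {
			buf.WriteByte(',')
		}
		kb, err := json.Marshal(k)
		if err != nil {
			return "", err
		}
		vb, err := json.Marshal(props[k])
		if err != nil {
			return "", err
		}
		buf.Write(kb)
		buf.WriteByte(':')
		buf.Write(vb)
	}
	buf.WriteByte('}')
	return buf.String(), nil
}

func main() {
	// Illustrative property names only.
	props := map[string]interface{}{"prop-b": 1, "prop-a": true}
	out, err := marshalProps(props)
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // {"prop-a":true,"prop-b":1}
}
```

Note that `encoding/json` already sorts keys when it marshals a Go map directly, so the concern mainly applies to an encoder that ranges over the map and writes the fields itself.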
Proposal 2:
The way ocf.NewEncoder() handles the schema is a little confusing:
It is parsed as it needs a schema object to encode and to know the passed string is actually a schema. It is re-encoded to be sure there is nothing random in the initial string that could be a security issue.
.String() isn't the same as .MarshalJSON(), and I'm not sure what its purpose is. It seems the convention in this package is to look like JSON, but not completely.
.String() returns the canonical schema which is needed for schema resolution. While not specifically stated, it is the minimal form of schema needed for reading, which is why it was chosen.
I have no issue with using the JSON form as long as it is constant, but to not make this a breaking change, I think it should be configurable.
from avro.
Everything you said makes sense. Thank you for the link about "Parsing Canonical Form", it clarifies a lot. I understand why the JSON and canonical forms are not treated the same way.
your proposed solution has an issue in that maps are not ordered and so the output of the JSON may not be constant (comparable as a string), which would not be great.
If we continue using Parsing Canonical Form in ocf.NewEncoder(), then does deterministic JSON key ordering matter in the JSON form? If the difference is in unit tests, then I can adjust those.
The way ocf.NewEncoder() handles the schema is a little confusing:
It is parsed as it needs a schema object to encode and to know the passed string is actually a schema. It is re-encoded to be sure there is nothing random in the initial string that could be a security issue.
Makes sense.
.String() returns the canonical schema which is needed for schema resolution. While not specifically stated, it is the minimal form of schema needed for reading, which is why it was chosen.
This is interesting because the Apache Iceberg reference implementation (Java) generates Avro files that include doc attributes, which are explicitly named as attributes to be stripped in Transforming into Parsing Canonical Form.
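As a rough illustration of why doc does not survive in Parsing Canonical Form, the sketch below applies only the [STRIP] rule from the Avro specification, which keeps type, name, fields, symbols, items, values, and size. `stripSchema` is a hypothetical helper, not the avro package's implementation, and it skips the other canonicalization steps (full names, attribute order, and so on). Because it relies on `encoding/json`, the keys come out alphabetically rather than in the order the spec prescribes.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// kept lists the attributes retained by the spec's [STRIP] rule.
var kept = map[string]bool{
	"type": true, "name": true, "fields": true, "symbols": true,
	"items": true, "values": true, "size": true,
}

// strip recursively drops every object attribute that is not in kept.
func strip(v interface{}) interface{} {
	switch t := v.(type) {
	case map[string]interface{}:
		out := make(map[string]interface{}, len(t))
		for k, val := range t {
			if kept[k] {
				out[k] = strip(val)
			}
		}
		return out
	case []interface{}:
		out := make([]interface{}, len(t))
		for i, val := range t {
			out[i] = strip(val)
		}
		return out
	default:
		return v
	}
}

// stripSchema is a hypothetical helper: it parses a JSON schema, applies
// strip, and re-encodes the result.
func stripSchema(schemaJSON string) (string, error) {
	var v interface{}
	if err := json.Unmarshal([]byte(schemaJSON), &v); err != nil {
		return "", err
	}
	b, err := json.Marshal(strip(v))
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	// "custom-prop" is a made-up property for the example.
	in := `{"type":"record","name":"r","doc":"a record","custom-prop":7,
	        "fields":[{"name":"a","type":"long","doc":"x","default":0}]}`
	out, err := stripSchema(in)
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // {"fields":[{"name":"a","type":"long"}],"name":"r","type":"record"}
}
```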
New Proposal 1
In JSON form, always emit all attributes, including properties, doc, aliases.
Do not worry about JSON field ordering in JSON form.
New Proposal 2
In Parsing Canonical Form, add properties to the output, and make the property output optional, defaulting to not including properties.
In JSON form, always emit all attributes, including properties, doc, aliases.
This makes sense. I would still prefer this to be ordered.
In Parsing Canonical Form, add properties to the output
This is a no go. From the spec, this form cannot contain fields other than those specified in [STRIP]. It also has implications on the schema fingerprint.
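The fingerprint point can be sketched as follows, assuming a SHA-256 fingerprint taken over the canonical text (SHA-256 is one of the algorithms the Avro specification suggests). `fingerprint` is a hypothetical helper and "custom-prop" is a made-up property; the two schema strings are written by hand for the example, not produced by the avro package.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// fingerprint is a hypothetical helper: it returns the hex-encoded SHA-256
// digest of a schema's canonical text.
func fingerprint(canonical string) string {
	sum := sha256.Sum256([]byte(canonical))
	return hex.EncodeToString(sum[:])
}

func main() {
	plain := `{"name":"r","type":"record","fields":[{"name":"a","type":"long"}]}`
	// Same schema with a made-up property added to the field.
	withProp := `{"name":"r","type":"record","fields":[{"name":"a","type":"long","custom-prop":1}]}`

	// The texts differ, so the fingerprints differ too.
	fmt.Println(fingerprint(plain) == fingerprint(withProp)) // false
}
```

Two parties that agree on the schema for reading purposes would then disagree on its fingerprint if one of them included properties in the canonical text.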
I'll work on the JSON form proposal, with deterministic ordering.
The merged PR is enough to satisfy my needs, as I've written code to write files without the ocf package. Thanks for your help, @nrwiersma