confluent: bump twmb/avro to v1.6.0, generalize avro reference hydration #4247
Commits Review LGTM
I wonder whether we want to bump deps twice in one patch.
I can squash this if preferable! I'm trying not to release so much; migrating iceberg-go also turned up a few small edge cases I needed to improve (which I only realized across a patch and then a minor release).
resolveAvroReferences previously only handled one specific schema
shape: a root that parses as a JSON array of reference names
(`["Ref1", "Ref2"]`), the Confluent Schema Registry convention for
"subject is a union of other subjects". Any other shape — a record
whose field type references another subject, an array whose items
reference another subject, a map whose values reference another
subject, etc. — produced the misleading error "parsing root schema
as enum" because it tried to json.Unmarshal the root as `[]string`.
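
To illustrate why only the array-of-names shape worked, here is a minimal sketch of decoding a root schema as `[]string`. `parseNameUnion` is a hypothetical helper written for this example, not the function from the patch, and its error text is made up.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// parseNameUnion is a hypothetical helper: it only accepts a root schema
// that is a JSON array of strings, such as ["Ref1", "Ref2"].
func parseNameUnion(schema string) ([]string, error) {
	var names []string
	if err := json.Unmarshal([]byte(schema), &names); err != nil {
		// A record, array-typed or map-typed root is a JSON object,
		// so decoding it into []string fails here.
		return nil, fmt.Errorf("root is not an array of names: %w", err)
	}
	return names, nil
}

func main() {
	names, err := parseNameUnion(`["Ref1", "Ref2"]`)
	fmt.Println(names, err)

	_, err = parseNameUnion(`{"type":"record","name":"R","fields":[{"name":"f","type":"Ref1"}]}`)
	fmt.Println(err)
}
```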
It also had two real bugs:
- On WalkReferences failure, the function returned `("", nil)`,
silently swallowing the walk error. Callers then parsed an empty
string and got an unhelpful downstream parse error.
- Transitive references were not inlined. If the root referenced
Foo and Foo referenced Bar, WalkReferences collected both into
refsMap (topological order), but the hand-rolled hydrator only
substituted names appearing at the root array level. Bar remained
as a string inside the inlined Foo, unresolved.
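
The first bug can be sketched in isolation. The names below (`walkRefs`, `resolveSwallow`, `resolveWrap`) are hypothetical stand-ins rather than the real functions, and `walkRefs` is hard-coded to fail so the two behaviours can be compared side by side.

```go
package main

import (
	"errors"
	"fmt"
)

// walkRefs is a hypothetical stand-in for the reference walk.
// It always fails in this sketch.
func walkRefs() error {
	return errors.New("fetching reference Foo: not found")
}

// resolveSwallow mirrors the old behaviour described above: the walk
// error is dropped and the caller receives an empty schema.
func resolveSwallow() (string, error) {
	if err := walkRefs(); err != nil {
		return "", nil
	}
	return `"string"`, nil
}

// resolveWrap propagates the walk error with added context.
func resolveWrap() (string, error) {
	if err := walkRefs(); err != nil {
		return "", fmt.Errorf("walking schema references: %w", err)
	}
	return `"string"`, nil
}

func main() {
	s, err := resolveSwallow()
	fmt.Printf("swallow: schema=%q err=%v\n", s, err)

	s, err = resolveWrap()
	fmt.Printf("wrap:    schema=%q err=%v\n", s, err)
}
```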
Replaces the hand-rolled hydrator with a recursive walker
(hydrateAvroRefs) that traverses type positions throughout the
schema tree — record field types, array items, map values, union
branches — and inlines any reference name it finds. Non-type-position
strings (name, namespace, doc, aliases, enum symbols) that happen to
match a reference name are left alone.
Each named type is inlined at most once per walk; subsequent
references to the same name are left as string name references so
Avro's one-definition-many-references semantics are preserved. This
correctly handles self-referential types (linked-list-style), mutual
recursion across subjects, and shared subgraphs.
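
The walk described above can be sketched roughly as follows. This is a simplified illustration, not the `hydrateAvroRefs` from the patch: the function name `hydrate`, the `refs` and `seen` maps, and the choice to mutate the decoded schema in place are all assumptions made for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// hydrate walks a decoded Avro schema and replaces reference names found in
// type positions with their definitions. refs maps a name to its decoded
// schema; seen records names already inlined during this walk. The input
// (and the definitions in refs) are modified in place.
func hydrate(node any, refs map[string]any, seen map[string]bool) any {
	switch n := node.(type) {
	case string:
		def, ok := refs[n]
		if !ok || seen[n] {
			return n // primitive, unknown name, or already inlined
		}
		seen[n] = true // mark first so self-references stay as names
		return hydrate(def, refs, seen)
	case []any: // union: every branch is a type position
		for i := range n {
			n[i] = hydrate(n[i], refs, seen)
		}
	case map[string]any:
		t, _ := n["type"].(string)
		switch t {
		case "record":
			fields, _ := n["fields"].([]any)
			for _, f := range fields {
				if fm, ok := f.(map[string]any); ok {
					fm["type"] = hydrate(fm["type"], refs, seen)
				}
			}
		case "array":
			n["items"] = hydrate(n["items"], refs, seen)
		case "map":
			n["values"] = hydrate(n["values"], refs, seen)
		}
		// name, namespace, doc, aliases and symbols are never visited.
	}
	return node
}

// mustParse decodes a JSON schema literal or panics.
func mustParse(s string) any {
	var v any
	if err := json.Unmarshal([]byte(s), &v); err != nil {
		panic(err)
	}
	return v
}

func main() {
	refs := map[string]any{
		"Foo": mustParse(`{"type":"record","name":"Foo","fields":[{"name":"bar","type":"Bar"}]}`),
		"Bar": mustParse(`{"type":"enum","name":"Bar","symbols":["A","B"]}`),
	}
	root := mustParse(`{"type":"record","name":"Root","doc":"Foo","fields":[
		{"name":"first","type":"Foo"},
		{"name":"again","type":["null","Foo"]}]}`)

	out, _ := json.Marshal(hydrate(root, refs, map[string]bool{}))
	fmt.Println(string(out))
}
```

In this sketch the first `Foo` is inlined together with its transitive `Bar`, the second `Foo` stays a name, and the `doc` string that happens to read `Foo` is untouched.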
11 new TestHydrateAvroRefs subtests cover each of these cases plus a
regression for the legacy Confluent union-of-names pattern.
Bumps twmb/avro from v1.3.4 to v1.6.0. v1.5.0 changed decimal
decoding to return *big.Rat instead of json.Number for *any targets;
preserveLogicalTypeOpts gets a decimal CustomType to convert back to
json.Number for the SetStructuredMut path. v1.6.0 adds
ocf.WithReaderSchemaFunc, aligns null-decode semantics with
encoding/json/v2, and tightens numeric-decode overflow handling —
none of these v1.6.0 changes affect connect's avro usage.
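
As a rough illustration of the decimal conversion, the sketch below turns Avro decimal bytes (a big-endian two's-complement unscaled integer plus a schema-defined scale) into a `json.Number` using only the standard library. `decimalToNumber` is a hypothetical helper; the patch itself is described as using `avro.RatFromBytes` inside a CustomType, which is not reproduced here.

```go
package main

import (
	"encoding/json"
	"fmt"
	"math/big"
)

// decimalToNumber is a hypothetical helper: it interprets b as a big-endian
// two's-complement unscaled integer and renders unscaled / 10^scale as a
// json.Number with exactly `scale` digits after the decimal point.
func decimalToNumber(b []byte, scale int) json.Number {
	unscaled := new(big.Int).SetBytes(b)
	if len(b) > 0 && b[0]&0x80 != 0 {
		// Sign bit set: subtract 2^(8*len(b)) to undo two's complement.
		unscaled.Sub(unscaled, new(big.Int).Lsh(big.NewInt(1), uint(8*len(b))))
	}
	denom := new(big.Int).Exp(big.NewInt(10), big.NewInt(int64(scale)), nil)
	r := new(big.Rat).SetFrac(unscaled, denom)
	return json.Number(r.FloatString(scale))
}

func main() {
	n := decimalToNumber([]byte{0x30, 0x39}, 2) // unscaled 12345, scale 2
	out, err := json.Marshal(map[string]any{"price": n})
	fmt.Println(string(out), err)
}
```

A `json.Number` is a string type that `encoding/json` writes as a bare JSON number, so the value keeps its decimal digits instead of going through a float.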
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
0b5f172 to 838f442
Summary
Three changes:
1. Bump `github.com/twmb/avro` to v1.6.0

v1.5.0 decodes decimal logical types to `*big.Rat` instead of `json.Number` for `*any` targets, matching hamba/avro and linkedin/goavro. It also exports `RatFromBytes` for CustomType callbacks and fixes `EncodeJSON` `json.Number` coercion bugs for int/long schemas.

2. Add decimal CustomType for preserve_logical_types path

v1.5.0's `*big.Rat` decimal type can't pass through `SetStructuredMut` (which uses `json.Marshal`, and `*big.Rat` doesn't implement `json.Marshaler`). Added a decimal `CustomType` in `preserveLogicalTypeOpts` that converts the raw `[]byte` to `json.Number` via `avro.RatFromBytes` at decode time.

3. Resolve avro references for arbitrary schema shapes

`resolveAvroReferences` previously only handled one specific schema shape: a root that parses as a JSON array of reference names (`["Ref1", "Ref2"]`), the Confluent Schema Registry convention for "subject is a union of other subjects". Any other shape (a record whose field type references another subject, an array whose items reference another subject, etc.) produced the misleading error "parsing root schema as enum".

It also had two bugs:

- On `WalkReferences` failure, the function returned `("", nil)`, silently dropping the error.
- Transitive references were not inlined: if the root referenced Foo and Foo referenced Bar, Bar remained unresolved inside the inlined Foo.

Replaced the hand-rolled hydrator with a recursive walker `hydrateAvroRefs` that traverses type positions throughout the schema tree and inlines any reference name it finds. Each named type is inlined at most once per walk; subsequent references are left as string name references.
Test plan
- `go test ./internal/impl/confluent/...` passes
- `go test ./internal/impl/avro/...` passes
- `TestAvroReferences` (legacy shape) still passes
- `TestHydrateAvroRefs` subtests covering each supported shape

🤖 Generated with Claude Code