Skip to content

confluent: bump twmb/avro to v1.6.0, generalize avro reference hydration#4247

Open
twmb wants to merge 1 commit intomainfrom
twmb/bump-avro-v1.4.1
Open

confluent: bump twmb/avro to v1.6.0, generalize avro reference hydration#4247
twmb wants to merge 1 commit intomainfrom
twmb/bump-avro-v1.4.1

Conversation

@twmb
Copy link
Copy Markdown
Contributor

@twmb twmb commented Apr 11, 2026

Summary

Three changes:

1. Bump github.com/twmb/avro to v1.6.0

v1.5.0 decodes decimal logical types to *big.Rat instead of
json.Number for *any targets, matching hamba/avro and linkedin/goavro.
It also exports RatFromBytes for CustomType callbacks and fixes
EncodeJSON json.Number coercion bugs for int/long schemas.

2. Add decimal CustomType for preserve_logical_types path

v1.5.0's *big.Rat decimal type can't pass through SetStructuredMut
(which uses json.Marshal, and *big.Rat doesn't implement
json.Marshaler). Added a decimal CustomType in
preserveLogicalTypeOpts that converts the raw []byte to
json.Number via avro.RatFromBytes at decode time.

3. Resolve avro references for arbitrary schema shapes

resolveAvroReferences previously only handled one specific schema
shape: a root that parses as a JSON array of reference names
(["Ref1", "Ref2"]), the Confluent Schema Registry convention for
"subject is a union of other subjects". Any other shape — a record
whose field type references another subject, an array whose items
reference another subject, etc. — produced the misleading error
"parsing root schema as enum".

It also had two bugs:

  • Swallowed WalkReferences error. On walk failure, the function
    returned ("", nil), silently dropping the error.
  • Transitive references not inlined. If the root referenced Foo
    and Foo referenced Bar, Bar remained unresolved inside the inlined Foo.

Replaced the hand-rolled hydrator with a recursive walker
hydrateAvroRefs that traverses type positions throughout the schema
tree and inlines any reference name it finds. Each named type is
inlined at most once per walk; subsequent references are left as string
name references.

Test plan

  • go test ./internal/impl/confluent/... passes
  • go test ./internal/impl/avro/... passes
  • Existing TestAvroReferences (legacy shape) still passes
  • 11 new TestHydrateAvroRefs subtests covering each supported shape

🤖 Generated with Claude Code

@claude

This comment was marked as outdated.

@claude

This comment was marked as outdated.

@twmb twmb changed the title chore: bump twmb/avro to v1.4.1 chore: bump twmb/avro to v1.4.1; resolve avro references for arbitrary schema shapes Apr 11, 2026
@twmb twmb changed the title chore: bump twmb/avro to v1.4.1; resolve avro references for arbitrary schema shapes confluent: resolve avro references for arbitrary schema shapes, bump twmb/avro to v1.4.1 Apr 11, 2026
@twmb twmb changed the title confluent: resolve avro references for arbitrary schema shapes, bump twmb/avro to v1.4.1 confluent: generalize avro reference hydration Apr 11, 2026
@twmb twmb changed the title confluent: generalize avro reference hydration confluent: bump twmb/avro to v1.5.0, generalize avro reference hydration Apr 13, 2026
@claude
Copy link
Copy Markdown

claude bot commented Apr 13, 2026

Commits
LGTM

Review
Three well-scoped commits: a dependency bump (twmb/avro v1.3.4 → v1.5.0), a rewrite of resolveAvroReferences with a recursive schema walker (hydrateAvroRefs) that correctly handles record field types, array items, map values, union branches, transitive/self-referential references, and shared subgraphs, plus a decimal CustomType to maintain SetStructuredMut compatibility after the library's *big.Rat change. The old code only handled the Confluent union-of-names root shape and silently swallowed WalkReferences errors — both bugs are fixed. 11 new subtests cover each schema shape and edge case.

LGTM

@mmatczuk
Copy link
Copy Markdown
Contributor

I wonder do we want to bump deps twice in one patch.

@twmb
Copy link
Copy Markdown
Contributor Author

twmb commented Apr 14, 2026

I wonder do we want to bump deps twice in one patch.

I can squash this if preferable! I'm trying not to release so much; migrating iceberg-go also popped up a few small edge cases I needed to improve (and I only realized across a patch and then a minor)

resolveAvroReferences previously only handled one specific schema
shape: a root that parses as a JSON array of reference names
(`["Ref1", "Ref2"]`), the Confluent Schema Registry convention for
"subject is a union of other subjects". Any other shape — a record
whose field type references another subject, an array whose items
reference another subject, a map whose values reference another
subject, etc. — produced the misleading error "parsing root schema
as enum" because it tried to json.Unmarshal the root as `[]string`.

It also had two real bugs:

- On WalkReferences failure, the function returned `("", nil)`,
  silently swallowing the walk error. Callers then parsed an empty
  string and got an unhelpful downstream parse error.

- Transitive references were not inlined. If the root referenced
  Foo and Foo referenced Bar, WalkReferences collected both into
  refsMap (topological order), but the hand-rolled hydrator only
  substituted names appearing at the root array level. Bar remained
  as a string inside the inlined Foo, unresolved.

Replaces the hand-rolled hydrator with a recursive walker
(hydrateAvroRefs) that traverses type positions throughout the
schema tree — record field types, array items, map values, union
branches — and inlines any reference name it finds. Non-type-position
strings (name, namespace, doc, aliases, enum symbols) that happen to
match a reference name are left alone.

Each named type is inlined at most once per walk; subsequent
references to the same name are left as string name references so
Avro's one-definition-many-references semantics are preserved. This
correctly handles self-referential types (linked-list-style), mutual
recursion across subjects, and shared subgraphs.

11 new TestHydrateAvroRefs subtests cover each of these cases plus a
regression for the legacy Confluent union-of-names pattern.

Bumps twmb/avro from v1.3.4 to v1.6.0. v1.5.0 changed decimal
decoding to return *big.Rat instead of json.Number for *any targets;
preserveLogicalTypeOpts gets a decimal CustomType to convert back to
json.Number for the SetStructuredMut path. v1.6.0 adds
ocf.WithReaderSchemaFunc, aligns null-decode semantics with
encoding/json/v2, and tightens numeric-decode overflow handling —
none of these v1.6.0 changes affect connect's avro usage.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@twmb twmb force-pushed the twmb/bump-avro-v1.4.1 branch from 0b5f172 to 838f442 Compare April 14, 2026 20:54
@claude
Copy link
Copy Markdown

claude bot commented Apr 14, 2026

Commits
LGTM

Review
Bumps twmb/avro to v1.6.0 and replaces the flat array-only reference hydrator with a recursive walker (hydrateAvroRefs) that correctly handles record field types, array items, map values, and union branches. Fixes three real bugs: swallowed WalkReferences error, refsMap population on mapSchema failure, and missing transitive reference inlining. Adds a decimal CustomType to handle the v1.5.0 breaking change (*big.Ratjson.Number). Comprehensive test coverage with 11 subtests including self-referential types, shared subgraphs, and non-substitution of non-type-position strings.

LGTM

@twmb twmb changed the title confluent: bump twmb/avro to v1.5.0, generalize avro reference hydration confluent: bump twmb/avro to v1.6.0, generalize avro reference hydration Apr 15, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants