@hmit hmit commented Apr 8, 2025

What is the purpose of the change

This pull request improves deserialization of data when using the default SpecificData instance.

Why?
Oftentimes, SpecificDatumReader and ReflectDatumReader are configured using the default SpecificData instance. This leads to a deserialization error when reading fields with a union type and an inner logical type (see test-case).

Deserialization works correctly when using the custom SpecificData instance stored in the MODEL$ field of the schema class that extends SpecificRecord. The custom SpecificData has the necessary conversions stored to handle fields with union and logical types. The issue has been reported multiple times, notably in [AVRO-3989](linked JIRA).

Verifying this change

This change added tests and can be verified as follows:

  • The added test passes when the code is present but fails when the code is removed

Documentation

  • Does this pull request introduce a new feature? (no)

github-actions bot added the Java (Pull Requests for Java binding) label on Apr 8, 2025

hmit commented May 23, 2025

CC: @Fokko I'd appreciate it if you could look at this PR and leave any feedback for me! Also, apologies for the 1:1 ping in advance.

@Test
public void testUnionWithLogicalType() throws IOException {
File file = new File(DIR.getPath(), "testSpecificDatumReaderDefaultCtorWithOptionalLogicalType");
Schema s1 = new Schema.Parser().parse("{\"type\":\"record\",\"name\":\"Bar\",\"namespace\":\"foo\","

Is there a specific reason to duplicate the schema in this string? Or is it sufficient to read the schema from fooBar.avsc?

Good point, I addressed it in this update.

opwvhk commented Jun 10, 2025

Hi @hmit,
What's your reason to include the change only here, and not (also) in GenericData and ReflectData? Arguably, both of these could benefit from the change as well.

hmit commented Jun 10, 2025

> Hi @hmit,
> What's your reason to include the change only here, and not (also) in GenericData and ReflectData? Arguably, both of these could benefit from the change as well.

Adding it to ReflectData is a good idea, I'll do that. However, adding it to GenericData is not necessary IMO since there are no logical type conversions involved.

Thanks @opwvhk for taking a look at this PR. I appreciate it!

opwvhk commented Jun 11, 2025

> However, adding it to GenericData is not necessary IMO since there are no logical type conversions involved.

There are if the schema contains logical types: the conversions are used to determine the actual value, regardless of whether there is a typed field for the value to go into.
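To make the distinction between canonical and logical values concrete, here is a minimal, stdlib-only sketch. It assumes the Avro `date` logical type, which annotates an `int` counting days since 1970-01-01. The class and method names (`DateValueSketch`, `toLogical`, `toCanonical`) are hypothetical and not part of Avro; this only approximates what a date conversion does and is not Avro's implementation.

```java
import java.time.LocalDate;

// Hypothetical illustration only; this is NOT Avro's DateConversion class.
final class DateValueSketch {

    // Canonical value (int, days since the Unix epoch) -> logical value (LocalDate).
    static LocalDate toLogical(int daysSinceEpoch) {
        return LocalDate.ofEpochDay(daysSinceEpoch);
    }

    // Logical value (LocalDate) -> canonical value (int, days since the Unix epoch).
    static int toCanonical(LocalDate date) {
        // toEpochDay() returns a long; fail loudly if it does not fit in an int.
        return Math.toIntExact(date.toEpochDay());
    }

    public static void main(String[] args) {
        int canonical = 20000;
        LocalDate logical = toLogical(canonical);
        System.out.println("canonical " + canonical + " -> logical " + logical);
        System.out.println("logical " + logical + " -> canonical " + toCanonical(logical));
    }
}
```

Under this reading, a model with a matching conversion registered would hand back the `LocalDate`, while a model without one would expose the raw `int`, whether or not a typed field exists to hold the value.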

opwvhk commented Jun 11, 2025

Another thing to consider (just remembered): starting with versions 1.11.4 and 1.12.0, conversions are loaded using a ServiceLoader. We could also implement this PR by adding a file META-INF/services/org.apache.avro.Conversion, containing:

org.apache.avro.Conversions.UUIDConversion
org.apache.avro.Conversions.DecimalConversion
org.apache.avro.Conversions.BigDecimalConversion
org.apache.avro.data.TimeConversions.DateConversion
org.apache.avro.data.TimeConversions.TimestampMillisConversion
org.apache.avro.data.TimeConversions.TimestampMicrosConversion
org.apache.avro.data.TimeConversions.TimestampNanosConversion
org.apache.avro.data.TimeConversions.LocalTimestampMillisConversion
org.apache.avro.data.TimeConversions.LocalTimestampMicrosConversion
org.apache.avro.data.TimeConversions.LocalTimestampNanosConversion

hmit commented Jun 11, 2025

> META-INF/services/org.apache.avro.Conversion

I tried this approach; however, I found that a ServiceLoader-based setting applies to all instances of ReflectData. I intend it to apply to the default INSTANCE only, not to any other instances created elsewhere.
For example, this test fails when moved to ServiceLoader: TestTimeConversions.dynamicSchemaWithDateTimeMicrosConversion

I also think the GenericData class is correctly set up today, with values preserved in their canonical type (and not their logical type). I did try the change, however, and noticed that unit tests like this one started failing: TestGenericLogicalTypes.writeLocalTimestampMicros
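The scoping concern above can be illustrated with a toy model. This is a sketch under stated assumptions, not Avro code: `ToyModel`, `RegistrationScopeDemo`, and the `globalRegistration` flag are hypothetical names, and logical types are reduced to plain strings. It contrasts registering conversions on one shared default instance with a mechanism that affects every instance.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy stand-in for a data model; NOT Avro's ReflectData or SpecificData.
final class ToyModel {
    // Names of the "built-in" conversions, reduced to strings for illustration.
    static final List<String> BUILT_INS = Arrays.asList("date", "timestamp-micros");

    private final Set<String> conversions = new LinkedHashSet<>();

    // globalRegistration = true mimics a discovery mechanism that every
    // instance applies to itself, so no instance can opt out.
    ToyModel(boolean globalRegistration) {
        if (globalRegistration) {
            conversions.addAll(BUILT_INS);
        }
    }

    ToyModel register(String logicalType) {
        conversions.add(logicalType);
        return this;
    }

    boolean converts(String logicalType) {
        return conversions.contains(logicalType);
    }
}

final class RegistrationScopeDemo {
    // Default-instance-only approach: just the shared instance gets the built-ins.
    static final ToyModel DEFAULT = new ToyModel(false);
    static {
        ToyModel.BUILT_INS.forEach(DEFAULT::register);
    }

    public static void main(String[] args) {
        System.out.println("default instance:      " + DEFAULT.converts("date"));
        System.out.println("fresh instance:        " + new ToyModel(false).converts("date"));
        System.out.println("globally registered:   " + new ToyModel(true).converts("date"));
    }
}
```

In this toy setup, code that builds its own model and expects raw canonical values keeps working under the default-instance-only approach, which is the behavior the failing test mentioned above appears to rely on.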

opwvhk commented Jun 12, 2025

> I also think the GenericData class is correctly set up today, with values preserved in their canonical type (and not their logical type). I did try the change, however, and noticed that unit tests like this one started failing: TestGenericLogicalTypes.writeLocalTimestampMicros

You have a point there, although I believe it's more a lack of being able to ignore a logical type (the class API does not provide for this).

Adjusting that would be too much for this PR though.

@opwvhk opwvhk merged commit f131a5c into apache:main Jun 12, 2025
9 checks passed
hmit commented Jun 12, 2025

Thanks for the merge @opwvhk!

opwvhk pushed a commit that referenced this pull request Jun 13, 2025
…ficData (#3354)

* AVRO-3989: [java] add conversion classes to default instance of SpecificData

* add test case, exclude decimal conversion

* review comment: use schema from the generated class for `fooBar.avsc`

* add logical type conversion classes to ReflectData as well

* refactor to single method

---------

Co-authored-by: Harshit Mittal <hmittal@netflix.com>
(cherry picked from commit f131a5c)
@hmitnflx

@opwvhk Is it possible to patch Avro 1.11 with this change as well, and perhaps release it as v1.11.5?

opwvhk pushed a commit to opwvhk/avro that referenced this pull request Sep 5, 2025
…ficData (apache#3354)