-
Notifications
You must be signed in to change notification settings - Fork 1.7k
AVRO-3520: [java] expose read schema in custom encoding #3445
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
AVRO-3520: [java] expose read schema in custom encoding #3445
Conversation
|
@opwvhk If you could review would be awesome thanks |
|
@opwvhk fixed prettier thanks |
76ebd9a to
f378f08
Compare
|
@KalleOlaviNiemitalo @opwvhk could you please have a look? Much appreciated |
|
@opwvhk, following up to see if you have any time to review this PR. Thanks |
| * opportunity to detect schema changes and behave accordingly. Useful for | ||
| * maintaining backwards compatibility. | ||
| * | ||
| * @param schema the schema detected during read. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#withSchema() is used also at https://github.com/apache/avro/pull/3445/files#diff-87216da701d0b2e3135497e3a9ffc4065a56d31142729d9867e0ca2e7dd2b2d1R1072 where the schema is write. The javadoc seems not correct by saying read
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated comment
|
|
||
| private FieldAccessor[] createAccessorsFor(Schema schema) { | ||
|
|
||
| var byNameSchema = buildByName(clazz, schema); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure how often #createAccessorsFor(Schema) is called but buildByName(clazz, schema) seems to be wasteful. It uses reflection to extract the field names -> accessor map and then drops it. Next time this method is used again it does it again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is only called by #getAccessorsFor(Schema schema which caches this
| if (enc != null) { | ||
| try { | ||
| return new CustomEncodingWrapper(enc.using().getDeclaredConstructor().newInstance()); | ||
| var customEncodingClass = enc.using().getDeclaredConstructor().newInstance(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
...Class suggests that this is a java.lang.Class, but actually it is an instance of CustomEncoding.class.
| * @param schema the schema detected during read. | ||
| * @return custom encoding to be used. | ||
| */ | ||
| public CustomEncoding<T> withSchema(Schema schema) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This API bothers me a bit.
Does it need to be thread-safe ? Because the overrides of this method could do something like this.schema = schema.
IMO this method should make sure somehow that it always returns a new instance of CustomEncoding that uses the provided schema.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, it is only called right after creation, so the libraries use is thread safe.
We could make it so that instead of a zero-argument constructor, it looks for a single-argument constructor and passes the schema in that way. It might be a little less discoverable as a feature, but safer.
They would need two constructors, as the zero-argument constructor will be needed to identify the schema
Alternativly perform an identity comparison after calling this method. Would mean default method would perform a reflection creation which I don't paticually like.
|
@martin-g thank you for the review. Have addressed or replied to the feedback. If you would like to see the multiple constructor approach happy to make that change. |
What is the purpose of the change
Allows custom encodings to support file changes.
Used #1692 as inspiration
Verifying this change
(Please pick one of the following options)
This change added tests and can be verified as follows:
Test custom encodings with several different versions, confirming that the encoder is linked to the schema
Documentation