-
Notifications
You must be signed in to change notification settings - Fork 1.7k
AVRO-3527: Generated equals() and hashCode() for SpecificRecords #1708
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
lang/java/tools/src/test/compiler/output-string/avro/examples/baseball/FieldTest.java
Show resolved
Hide resolved
opwvhk
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. I like the care that was given to CharSequence, and that it also works for Strings.
I would like to know though if these changes work equally well for CharSequence and String for string fields.
steven-aerts
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@opwvhk as you were curious, I locally updated the perf test to also benchmark strings as CharSequences.
Those results also look good, especially for .hashCode() we are on the higher end of what we saw up till now (factor 12).
Before:
Benchmark Mode Cnt Score Error Units
SpecficTest.equals thrpt 3 14115336.148 ± 15794798.446 ops/s
SpecficTest.hashCode thrpt 3 10566693.395 ± 11566539.470 ops/s
After:
Benchmark Mode Cnt Score Error Units
SpecficTest.equals thrpt 3 191368870.430 ± 123807130.335 ops/s
SpecficTest.hashCode thrpt 3 136284950.236 ± 113094363.351 ops/s
lang/java/tools/src/test/compiler/output-string/avro/examples/baseball/FieldTest.java
Show resolved
Hide resolved
|
Thanks for the PR, this looks really interesting! I set the target for 1.12.0 because it's a change in how classes are generated... but I'm curious whether that should really be the case. I really don't think there's any reason this could be a major change! What do you think? |
lang/java/compiler/src/main/java/org/apache/avro/compiler/specific/SpecificCompiler.java
Fixed
Show fixed
Hide fixed
|
@RyanSkraba there is one small corner case, which let me doubt. When using If we choose to introduce this change as a minor version. It would mean that for the Do you see that as a reason to introduce it in a major version only? |
Thanks! I think it's a good idea to bump it to the next major version, but we usually do one later in the year. I used to think that Avro required that the code-generator version must be in sync with the avro library version, because that's the way my company always did it, but I've learned since that it's not always the case. |
This is the case for ANTLR, and is a common practice because some generated code is so tightly coupled to the library, that using the same version is a requirement to ensure compatibility. This is not the case for Avro. |
|
@RyanSkraba is there still something I can do to help to get this change submitted? |
|
Hey, thanks so much for your patience! The RC1 for 1.11.1 is out for testing, so hopefully we should be getting around to PRs targetted for 1.12.0 in August. |
Update the compiler to generate the implementation of the `.equals()` and `.hashCode() function, instead of relying on the implementation of GenericData. This improves the performance of those functions significantly. The generated implementations are factor 10 to 20 faster for `.equals()` and a factor 5 to 10 for `.hashCode()`. Result of Perf test before the change: ``` Benchmark Mode Cnt Score Error Units SpecficTest.equals thrpt 3 12598610.194 +/- 11160265.279 ops/s SpecficTest.hashCode thrpt 3 24729446.862 +/- 29051332.794 ops/s ``` Results using generated functions: ``` Benchmark Mode Cnt Score Error Units SpecficTest.equals thrpt 3 211314296.950 +/- 104154793.126 ops/s SpecficTest.hashCode thrpt 3 180349506.632 +/- 143639246.771 ops/s ``` Signed-off-by: Steven Aerts <steven.aerts@gmail.com>
|
Rebased the patch on master, as it still causes a huge improvement gain when comparing |
Update the compiler to generate the implementation of the `.equals()` and `.hashCode() function, instead of relying on the implementation of GenericData. This improves the performance of those functions significantly. The generated implementations are factor 10 to 20 faster for `.equals()` and a factor 5 to 10 for `.hashCode()`. Result of Perf test before the change: ``` Benchmark Mode Cnt Score Error Units SpecficTest.equals thrpt 3 12598610.194 +/- 11160265.279 ops/s SpecficTest.hashCode thrpt 3 24729446.862 +/- 29051332.794 ops/s ``` Results using generated functions: ``` Benchmark Mode Cnt Score Error Units SpecficTest.equals thrpt 3 211314296.950 +/- 104154793.126 ops/s SpecficTest.hashCode thrpt 3 180349506.632 +/- 143639246.771 ops/s ``` Signed-off-by: Steven Aerts <steven.aerts@gmail.com> (cherry picked from commit c880f47)
Update the compiler to generate the implementation of the `.equals()` and `.hashCode() function, instead of relying on the implementation of GenericData. This improves the performance of those functions significantly. The generated implementations are factor 10 to 20 faster for `.equals()` and a factor 5 to 10 for `.hashCode()`. Result of Perf test before the change: ``` Benchmark Mode Cnt Score Error Units SpecficTest.equals thrpt 3 12598610.194 +/- 11160265.279 ops/s SpecficTest.hashCode thrpt 3 24729446.862 +/- 29051332.794 ops/s ``` Results using generated functions: ``` Benchmark Mode Cnt Score Error Units SpecficTest.equals thrpt 3 211314296.950 +/- 104154793.126 ops/s SpecficTest.hashCode thrpt 3 180349506.632 +/- 143639246.771 ops/s ``` Signed-off-by: Steven Aerts <steven.aerts@gmail.com>
|
This PR introduced a bug in the Avro tools 1.12.1 when a schema includes a field with name In the following example, the schema contains two fields:
The @Override
public int hashCode() {
int result = 1;
result = 31 * result + (eventHeader == null ? 0 : eventHeader.hashCode());
result = 31 * result + (result == null ? 0 : result.hashCode());
result = 31 * result + (anotherField == null ? 0 : ((java.lang.Enum) anotherField).ordinal());
return result;
}Now since the schema has a field Trying to compile a project using the generated java class throws: For the sake of validating my findings, I have updated my schema and renamed my field One workaround would be to use custom Velocity templates, to rename @steven-aerts @RyanSkraba It would be great if we could rename the If this Github thread is not the right place to report this bug, please let me know where to report it. Thanks! |
|
@benoitcanton can you make a bug at https://issues.apache.org/? I already have a patch ready which fixes your issue by generating: result = 31 * result + (this.result == null ? 0 : this.result.hashCode()); |
|
Hi @steven-aerts thanks for the quick answer. I created https://issues.apache.org/jira/browse/AVRO-4202 |
Update the compiler to generate the implementation of the
.equals()and `.hashCode() function, instead of relying on theimplementation of GenericData. This improves the performance of those functions significantly.
The generated implementations are factor 10 to 20 faster for
.equals()and a factor 5 to 10 for.hashCode().The implementation generates the same hashCode as the genericData, which is validated by existing tests
Result of Perf test before the change:
Results using generated functions:
Jira
Tests
equals()andhashCode()Commits
Documentation