HIVE-29205: Addendum: Iceberg: Upgrade iceberg version to 1.10.0#6235
deniskuzZ merged 2 commits into apache:master from
@deniskuzZ, can we upgrade? I was planning to upgrade it, but now that we have the addendum, I believe we can target it now.
09e5f14 to 0cd0756
(cherry picked from commit 39f4ab6)
0cd0756 to f4f440e
@deniskuzZ, I was checking #6238 and noticed that iceberg-1.9.1 is still used in the Iceberg tests, meaning the tests for that module run against the older Iceberg version. It's better to handle this in this PR itself rather than in #6238, to get early feedback on the UTs.
hive/itests/hive-iceberg/pom.xml Line 30 in 0c0bc15
Thanks! I missed that. Fixed.
// NOTE: we intentionally do not call commitTransaction(), so this property change is never published.
Transaction tx = table.newTransaction();
tx.updateProperties()
    .remove(TableProperties.DEFAULT_FILE_FORMAT)
In the older iceberg/iceberg-handler/src/main/java/org/apache/iceberg/data/PartitionStatsHandler.java, when the file format was ORC we used AVRO-based stats; now we will start using Parquet (DEFAULT_FILE_FORMAT_DEFAULT).
Question:
For an Iceberg table created with 1.9.1 that has some pre-existing data, the stats are AVRO-based. After upgrading to Iceberg 1.10.0, the new data will get a new incremental stats file based on AVRO + Parquet. Is my understanding correct?
Yes, users would have to drop the old stats. https://issues.apache.org/jira/browse/HIVE-28170 might have helped.
After rewrite_data_files / compaction, I think it should be OK.
@deniskuzZ why don't we stick to AVRO and keep the current behaviour? We could set DEFAULT_FILE_FORMAT to AVRO here, and it should behave the same as it does now. Or is the change for performance reasons?
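To make the two options in this thread concrete, here is a minimal sketch in plain Java, with no Iceberg dependency. The property key and default mirror TableProperties.DEFAULT_FILE_FORMAT ("write.format.default") and DEFAULT_FILE_FORMAT_DEFAULT ("parquet"). Everything else is an assumption for illustration: the class StatsFormatSketch, both method names, and the ORC-to-AVRO fallback, which is modelled only from the description in the comment above and is not copied from either PartitionStatsHandler.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration class; not part of Hive or Iceberg.
public class StatsFormatSketch {
    // Same key and default value as Iceberg's TableProperties constants.
    static final String DEFAULT_FILE_FORMAT = "write.format.default";
    static final String DEFAULT_FILE_FORMAT_DEFAULT = "parquet";

    // Assumed old (patched) behaviour, as described in the review:
    // an ORC table got AVRO-based partition stats.
    static String oldStatsFormat(Map<String, String> tableProps) {
        String format = tableProps
            .getOrDefault(DEFAULT_FILE_FORMAT, DEFAULT_FILE_FORMAT_DEFAULT)
            .toLowerCase(Locale.ROOT);
        return "orc".equals(format) ? "avro" : format;
    }

    // Models an uncommitted property change: work on a copy, so the
    // "published" table properties are never modified.
    // override == null -> remove the key, so the default applies.
    // override != null -> pin the format (e.g. "avro").
    static String statsFormatWithOverride(Map<String, String> tableProps, String override) {
        Map<String, String> view = new HashMap<>(tableProps);
        if (override == null) {
            view.remove(DEFAULT_FILE_FORMAT);
        } else {
            view.put(DEFAULT_FILE_FORMAT, override);
        }
        return view
            .getOrDefault(DEFAULT_FILE_FORMAT, DEFAULT_FILE_FORMAT_DEFAULT)
            .toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        props.put(DEFAULT_FILE_FORMAT, "orc");

        System.out.println("old:             " + oldStatsFormat(props));
        System.out.println("property removed: " + statsFormatWithOverride(props, null));
        System.out.println("pinned to avro:   " + statsFormatWithOverride(props, "avro"));
        System.out.println("table property:   " + props.get(DEFAULT_FILE_FORMAT));
    }
}
```

Under these assumptions, removing the property resolves to Parquet, while pinning it to AVRO would keep the stats format unchanged for existing ORC tables. In both cases the table's own write.format.default stays as it was, because the change is only made on a copy.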
LGTM +1




What changes were proposed in this pull request?
Use PartitionStatsHandler from iceberg and drop the patched class.
Why are the changes needed?
Get rid of code duplication
Does this PR introduce any user-facing change?
No
How was this patch tested?
Jenkins