HIVE-29507: Create a slim hive-iceberg-handler core JAR #6364
pudidic wants to merge 1 commit into apache:master
@pudidic, please update the PR description.
Spark uses If Spark add the Iceberg's cc @zabetak, @ayushtkn, @okumin, @zhangbutao WDYT? I’m not really a fan of slim JARs. Maybe Spark could instead use the shading plugin to exclude duplicated iceberg classes from
This doesn't look like a mess created on the Hive side. IMO it should ideally be sorted out on the Iceberg or Spark side.
In general, lots of problems come both from shading and from releasing multiple jars per module. This proposal to create an additional jar from the iceberg-handler module makes the situation even more complicated. I am not very familiar with the purpose of I see that there is a need for Spark to get only the classes inside the hive-iceberg-handler module and this seems to be a fair request if we want other projects to establish dependencies on the hive-iceberg-handler. However, I can't comment about what's the best path forward unless I first understand the recommended usage patterns for the current module.
@zabetak
@deniskuzZ Is the



What changes were proposed in this pull request?
Create a slim `hive-iceberg-handler` core JAR file to avoid `InvalidClassException`.

Before:
`iceberg-shading`: `maven-shade-plugin` shades Iceberg and other dependencies.
`iceberg-handler`: `maven-dependency-plugin` unpacks `iceberg-shading` and `iceberg-catalog`, then packs them together.

After:
`iceberg-shading`: `maven-shade-plugin` shades Iceberg and other dependencies.
`iceberg-handler`: `maven-shade-plugin` shades `iceberg-shading` and `iceberg-catalog` without relocation, which results in the same JAR file as `maven-dependency-plugin` produced. `maven-jar-plugin` then creates a new slim JAR without the shaded classes.

In the previous setup, `maven-dependency-plugin` in `iceberg-handler` overwrites the class directory, so the output of `maven-jar-plugin` is affected. The workaround would be `<configuration><includes></includes></configuration>`, but because many Java packages are shared across artifacts, almost 100 individual class names would have to be configured explicitly. A list of that size looks hard to maintain whenever a class in those packages changes.

Why are the changes needed?
`HiveIcebergStorageHandler` was used for interoperability use cases between Apache Hive 3.x and Apache Spark. The handler was packaged in `iceberg-hive-runtime.jar` until Apache Iceberg 1.7. The Hive runtime was removed in Apache Iceberg 1.8. Apache Hive also provides `hive-iceberg-handler.jar`, so Apache Spark can import `hive-iceberg-handler.jar` and `iceberg-spark-runtime.jar` together for interoperability use cases.

However, the classes in `iceberg-spark-runtime.jar` differ from those in `hive-iceberg-handler.jar`, whereas there were no such differences between `iceberg-spark-runtime.jar` and `iceberg-hive-runtime.jar`. These differences cause an `InvalidClassException`.

Does this PR introduce any user-facing change?
No.
How was this patch tested?
Built locally.
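
For readers unfamiliar with the failure mode named in the description, the following is a minimal sketch of how Java serialization raises `InvalidClassException` when the writer and the reader hold different copies of the same class. It assumes the problem is the usual `serialVersionUID` mismatch, which this page does not confirm. `SerialVersionMismatchDemo` and `Payload` are hypothetical names, not classes from Hive, Iceberg, or Spark, and the second copy of the class is simulated by patching the `serialVersionUID` recorded in the serialized stream rather than by loading two real JARs.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidClassException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.charset.StandardCharsets;

final class SerialVersionMismatchDemo {

  // Hypothetical stand-in for a class that is bundled in two different JARs.
  static final class Payload implements Serializable {
    private static final long serialVersionUID = 1L;
    final String value;

    Payload(String value) {
      this.value = value;
    }
  }

  static byte[] serialize(Object o) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(o);
    }
    return bytes.toByteArray();
  }

  static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
    try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
      return in.readObject();
    }
  }

  // Assumption for illustration: pretend the writer used another copy of Payload
  // by changing the serialVersionUID stored in the stream's class descriptor.
  static byte[] withForeignSerialVersionUid(byte[] data) {
    byte[] copy = data.clone();
    int nameLength = Payload.class.getName().getBytes(StandardCharsets.UTF_8).length;
    // Stream layout: magic(2) + version(2) + TC_OBJECT(1) + TC_CLASSDESC(1)
    //                + name length(2) + class name + serialVersionUID(8)
    int suidOffset = 2 + 2 + 1 + 1 + 2 + nameLength;
    copy[suidOffset + 7] ^= 0x01; // flips the lowest bit: 1L becomes 0L
    return copy;
  }

  // Returns true when the reader rejects the stream written by the "other" copy.
  static boolean mismatchRejected() throws IOException, ClassNotFoundException {
    byte[] written = serialize(new Payload("row-1"));
    try {
      deserialize(withForeignSerialVersionUid(written));
      return false;
    } catch (InvalidClassException e) {
      return true;
    }
  }

  public static void main(String[] args) throws Exception {
    Payload same = (Payload) deserialize(serialize(new Payload("row-1")));
    System.out.println("matching class copy: " + same.value);
    System.out.println("differing class copy rejected: " + mismatchRejected());
  }
}
```

The first round trip succeeds because both sides see the same class definition. The second fails at `readObject()` because the class descriptor in the stream no longer matches the local class. Under the stated assumption, this is the kind of situation that can arise when two JARs on one classpath carry diverging copies of a serializable class.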