
HIVE-29507: Create a slim hive-iceberg-handler core JAR #6364

Open
pudidic wants to merge 1 commit into apache:master from pudidic:HIVE-29507

@pudidic
Contributor

pudidic commented Mar 14, 2026

What changes were proposed in this pull request?

Create a slim hive-iceberg-handler core JAR file to avoid an InvalidClassException.

Before:

  • iceberg-shading
    • maven-shade-plugin shades Iceberg and other dependencies.
  • iceberg-handler
    • maven-dependency-plugin unpacks iceberg-shading and iceberg-catalog then packs them together.

After:

  • iceberg-shading
    • maven-shade-plugin shades Iceberg and other dependencies.
  • iceberg-handler
    • maven-shade-plugin shades iceberg-shading and iceberg-catalog without relocation, which produces the same JAR file that maven-dependency-plugin produced.
    • maven-jar-plugin creates a new slim JAR without shaded classes.

In iceberg-handler, maven-dependency-plugin overwrites the class directory, so maven-jar-plugin is affected as well: it would package the unpacked dependency classes too. One workaround is to use <configuration><includes></includes></configuration>, but because many Java packages are shared across artifacts, almost 100 individual class names would have to be listed explicitly. Such a list would be hard to maintain whenever a class in those packages changes.
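To illustrate the trade-off, here is a minimal Java sketch (a model of the selection logic, not the Maven configuration itself) comparing two ways of picking entries for a slim JAR: matching by package prefix versus matching by explicit class name. The entry names are illustrative. `HypotheticalHandlerOwnedClass` is a made-up stand-in for a handler-owned class living in a package that Iceberg also uses, since the PR does not list the actual shared packages; `SlimJarFilter`, `matchesPackage`, and `matchesClass` are hypothetical names as well.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class SlimJarFilter {
    // True if the entry sits under any of the given package directories, e.g. "org/apache/iceberg/".
    static boolean matchesPackage(String entry, Set<String> packageDirs) {
        return packageDirs.stream().anyMatch(entry::startsWith);
    }

    // True if the entry is one of the listed classes or a nested class of one (Outer$Inner.class).
    static boolean matchesClass(String entry, Set<String> classNames) {
        if (!entry.endsWith(".class")) {
            return false;
        }
        String binaryName = entry.substring(0, entry.length() - ".class".length()).replace('/', '.');
        int nested = binaryName.indexOf('$');
        String outer = nested >= 0 ? binaryName.substring(0, nested) : binaryName;
        return classNames.contains(outer);
    }

    public static void main(String[] args) {
        // Illustrative fat-JAR entries; the second one is hypothetical (see lead-in).
        List<String> fatJar = List.of(
            "org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.class",
            "org/apache/iceberg/HypotheticalHandlerOwnedClass.class",
            "org/apache/iceberg/BaseFile.class");

        List<String> byPackage = fatJar.stream()
            .filter(e -> matchesPackage(e, Set.of("org/apache/iceberg/")))
            .collect(Collectors.toList());
        List<String> byClass = fatJar.stream()
            .filter(e -> matchesClass(e, Set.of(
                "org.apache.iceberg.mr.hive.HiveIcebergStorageHandler",
                "org.apache.iceberg.HypotheticalHandlerOwnedClass")))
            .collect(Collectors.toList());

        // The package filter also keeps BaseFile; the class-name filter does not.
        System.out.println("package filter keeps: " + byPackage);
        System.out.println("class-name filter keeps: " + byClass);
    }
}
```

With a package prefix, `BaseFile` is kept alongside the handler's own classes. Only the explicit class list isolates them, and that list has to change whenever the module's classes do.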

Why are the changes needed?

HiveIcebergStorageHandler was used for interoperability between Apache Hive 3.x and Apache Spark. The handler was packaged in iceberg-hive-runtime.jar through Apache Iceberg 1.7; the Hive runtime was removed in Apache Iceberg 1.8. Apache Hive also provides hive-iceberg-handler.jar, so Apache Spark can load hive-iceberg-handler.jar together with iceberg-spark-runtime.jar for interoperability use cases.

However, some classes in iceberg-spark-runtime.jar and hive-iceberg-handler.jar differ, whereas there are no such differences between iceberg-spark-runtime.jar and iceberg-hive-runtime.jar. These differences cause an InvalidClassException, for example:

java.io.InvalidClassException: org.apache.iceberg.BaseFile; local class incompatible: stream classdesc serialVersionUID = 8569836863676564712, local class serialVersionUID = -8072381884098305524
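One way to see which classes can clash like this is to list the `.class` entries that appear under the same name in both JARs. The following is a minimal Java sketch using `java.util.jar.JarFile`; `JarOverlap` is a hypothetical helper, not part of Hive or Iceberg. Relocated (shaded) classes get new names, so only unrelocated duplicates such as `org/apache/iceberg/BaseFile.class` would show up.

```java
import java.io.IOException;
import java.util.Set;
import java.util.TreeSet;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

public class JarOverlap {
    // Collects the names of class files in a JAR, skipping module-info.class
    // and multi-release entries under META-INF/.
    static Set<String> classEntries(String path) throws IOException {
        Set<String> names = new TreeSet<>();
        try (JarFile jar = new JarFile(path)) {
            jar.stream()
                .map(JarEntry::getName)
                .filter(name -> name.endsWith(".class"))
                .filter(name -> !name.startsWith("META-INF/") && !name.equals("module-info.class"))
                .forEach(names::add);
        }
        return names;
    }

    // Returns the class entries present in both JARs, sorted by name.
    static Set<String> overlap(String first, String second) throws IOException {
        Set<String> common = new TreeSet<>(classEntries(first));
        common.retainAll(classEntries(second));
        return common;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: JarOverlap <first.jar> <second.jar>");
            System.exit(2);
        }
        for (String name : overlap(args[0], args[1])) {
            System.out.println(name);
        }
    }
}
```

Usage, with placeholder paths: `java JarOverlap.java hive-iceberg-handler.jar iceberg-spark-runtime.jar`. Every name it prints is a class that the classpath order decides between.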

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Built locally.

@deniskuzZ
Member

@pudidic, please update the PR description. iceberg-spark-runtime.jar never used iceberg-mr as a dependency.

@deniskuzZ
Member

deniskuzZ commented Mar 17, 2026

Spark uses the hive-exec-core JAR, which loads the HiveIcebergStorageHandler class. This class is part of hive-iceberg-handler.jar.

ERROR metadata.Hive: [main]: org.apache.hadoop.hive.ql.metadata.HiveException: Error in loading storage handler.org.apache.iceberg.mr.hive.HiveIcebergStorageHandler
    at org.apache.hadoop.hive.ql.metadata.HiveUtils.getStorageHandler(HiveUtils.java:389)
    at org.apache.hadoop.hive.ql.metadata.Hive.createStorageHandler(Hive.java:5881)
    at org.apache.hadoop.hive.ql.metadata.Hive$6.getHook(Hive.java:5861)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getHook(HiveMetaStoreClient.java:3583)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.executePostGetTableHook(HiveMetaStoreClient.java:2760)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTable(HiveMetaStoreClient.java:2748)
    at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.getTable(SessionHiveMetaStoreClient.java:281)
    at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.getTable(SessionHiveMetaStoreClient.java:267)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:568)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:216)
    at jdk.proxy2/jdk.proxy2.$Proxy67.getTable(Unknown Source)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:568)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:4483)
    at jdk.proxy2/jdk.proxy2.$Proxy67.getTable(Unknown Source)
    at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:1763)
    at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:1718)
    at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:1660)
    at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:1676)
    at org.apache.spark.sql.hive.client.Shim_v0_12.getTable(HiveShim.scala:652)
    at org.apache.spark.sql.hive.client.HiveClientImpl.getRawTableOption(HiveClientImpl.scala:439)
    at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$tableExists$1(HiveClientImpl.scala:454)
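The trace shows `HiveUtils.getStorageHandler` failing to load the handler by class name. As a rough sketch of that pattern (reflective loading by class name, assumed here rather than copied from Hive's source), the Java below loads a class with `Class.forName` and wraps the `ClassNotFoundException`. Without a JAR containing HiveIcebergStorageHandler on the classpath, the lookup fails in this way. `HandlerLoader`, `StorageHandler`, and `NoopHandler` are hypothetical stand-ins for Hive's types.

```java
public class HandlerLoader {
    // Hypothetical stand-in for Hive's storage handler interface.
    public interface StorageHandler {
    }

    // Hypothetical handler with a public no-arg constructor, used for the demo.
    public static class NoopHandler implements StorageHandler {
    }

    // Loads and instantiates a handler by fully qualified class name.
    static StorageHandler load(String className, ClassLoader loader) {
        try {
            Class<? extends StorageHandler> cls =
                Class.forName(className, true, loader).asSubclass(StorageHandler.class);
            return cls.getDeclaredConstructor().newInstance();
        } catch (ClassNotFoundException e) {
            // With the JDK's built-in class loaders the message is the class name, so this
            // reads "Error in loading storage handler.<class name>", like the log above.
            throw new IllegalStateException("Error in loading storage handler." + e.getMessage(), e);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate storage handler " + className, e);
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = HandlerLoader.class.getClassLoader();
        System.out.println(load(NoopHandler.class.getName(), loader).getClass().getSimpleName());
        try {
            // Fails unless a JAR containing this class is on the classpath.
            load("org.apache.iceberg.mr.hive.HiveIcebergStorageHandler", loader);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```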

If Spark adds hive-iceberg-handler.jar, it gets:

java.io.InvalidClassException: org.apache.iceberg.BaseFile; local class incompatible: stream classdesc serialVersionUID = 8569836863676564712, local class serialVersionUID = -8072381884098305524

Iceberg's BaseFile class doesn't set a fixed serialVersionUID and is packaged in both hive-iceberg-handler.jar and iceberg-spark-runtime.jar.
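A minimal Java sketch of why a missing `serialVersionUID` matters: `ObjectStreamClass.lookup(...).getSerialVersionUID()` returns the declared value when a class defines one, and otherwise a value computed from the class's structure (its name, modifiers, interfaces, and members). Two builds of a class with different members can therefore compute different values, which is the mismatch reported above. `SerialUidDemo`, `ImplicitUid`, and `ExplicitUid` are illustrative names only.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

public class SerialUidDemo {
    // No explicit serialVersionUID: the value is computed from the class's structure,
    // so adding or changing members in another build can change it.
    static class ImplicitUid implements Serializable {
        private int value;
    }

    // Explicit serialVersionUID: the declared value is used as-is.
    static class ExplicitUid implements Serializable {
        private static final long serialVersionUID = 1L;
        private int value;
    }

    static long uidOf(Class<?> cls) {
        return ObjectStreamClass.lookup(cls).getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println("implicit: " + uidOf(ImplicitUid.class));
        System.out.println("explicit: " + uidOf(ExplicitUid.class));
    }
}
```

Declaring `serialVersionUID` explicitly, as in `ExplicitUid`, pins the value regardless of how the class's members change between builds.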

cc @zabetak, @ayushtkn, @okumin, @zhangbutao WDYT?

I’m not really a fan of slim JARs. Maybe Spark could instead use the shading plugin to exclude the duplicated Iceberg classes from iceberg-spark-runtime.jar?

@ayushtkn
Member

Iceberg's BaseFile class doesn't set a fixed serialVersionUID and is packaged in both hive-iceberg-handler.jar and iceberg-spark-runtime.jar.

This doesn't look like a mess created by Hive. IMO, it should ideally be sorted out on the Iceberg or Spark side.

@zabetak
Member

zabetak commented Mar 19, 2026

In general, lots of problems come both from shading and from releasing multiple jars per module. This proposal to create an additional jar from the iceberg-handler module makes the situation even more complicated.

I am not very familiar with the purpose of the hive-iceberg-handler module, so if someone has more insight into its expected usage, I would be curious to learn more.

I see that Spark needs only the classes inside the hive-iceberg-handler module, and this seems to be a fair request if we want other projects to depend on hive-iceberg-handler. However, I can't comment on the best path forward until I understand the recommended usage patterns for the current module.

@deniskuzZ
Member

deniskuzZ commented Mar 19, 2026

@zabetak hive-iceberg-handler is the bridge that lets Hive read, write, and manage Apache Iceberg tables. It implements Hive's storage handler plugin API so that Iceberg tables can be used through standard HiveQL.

@zabetak
Member

zabetak commented Mar 19, 2026

@deniskuzZ Is the hive-iceberg-handler module meant to be used by other projects outside Hive? Why are we packaging all these additional classes/deps inside a single jar?

5 participants