Contributor

difin commented Jan 14, 2026

What changes were proposed in this pull request?

Adds a standalone HMS REST Catalog Server that can scale independently from HMS.
Currently, HMS REST Catalog Server is tied to HMS and can only be started together with HMS in a single instance.
This PR introduces a standalone REST Catalog server that:

  • Runs independently of HMS in a separate JVM/process
  • Connects to external HMS instances via Thrift
  • Supports horizontal scaling through Kubernetes load balancing
  • Provides a health check endpoint for Kubernetes readiness/liveness probes
  • Reuses the same REST Catalog servlet code used in embedded HMS mode

Architecture:
Client → Kubernetes Load Balancer/API Gateway → Standalone REST Catalog Server → HMS
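
For illustration only, a health check endpoint of the kind Kubernetes readiness/liveness probes call can be sketched with the JDK's built-in HTTP server. This is not the code from this PR: the class name `HealthCheckSketch`, the `/health` path, and the JSON body are assumptions made for the example, and the real server reuses the REST Catalog servlet code instead.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not the PR's implementation.
final class HealthCheckSketch {

    /** Starts an HTTP server that answers GET /health; port 0 picks any free port. */
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/health", exchange -> {
            if (!"GET".equals(exchange.getRequestMethod())) {
                // Reject anything other than GET with 405 and no body.
                exchange.sendResponseHeaders(405, -1);
                exchange.close();
                return;
            }
            // Assumed response body; a real probe usually only checks the status code.
            byte[] body = "{\"status\":\"ok\"}".getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().add("Content-Type", "application/json");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        // Port 8080 is an arbitrary choice for the example.
        HttpServer server = start(8080);
        System.out.println("Health endpoint listening on port " + server.getAddress().getPort());
    }
}
```

A probe would then issue a plain HTTP GET against `/health` on each instance and treat a 200 response as healthy.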

Why are the changes needed?

Allows independent scaling of the HMS REST Catalog Server from HMS, enabling:

  • Horizontal Scaling: Scale REST Catalog instances based on REST API load, independent of HMS load
  • Resource Optimization: Allocate resources separately for REST API serving vs. metadata operations
  • Deployment Flexibility: Deploy REST Catalog servers separately from HMS
  • Cloud-Native Deployment: Leverage Kubernetes for load balancing, health checks, and auto-scaling

Does this PR introduce any user-facing change?

Yes. Adds a new standalone server mode:

  • New Class: StandaloneRESTCatalogServer - can be run as a standalone application
  • Backward Compatible: Embedded HMS REST Catalog mode remains unchanged and continues to work as before
  • Port Conflict Detection: Enhanced error messages when port conflicts occur between embedded and standalone modes
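
As a rough sketch of what port conflict detection can look like (assuming a simple bind probe; the class `PortConflictSketch`, its methods, and the message wording are hypothetical and not taken from this PR):

```java
import java.io.IOException;
import java.net.ServerSocket;

// Hypothetical sketch, not the PR's implementation.
final class PortConflictSketch {

    /** Returns true if a listening socket can be bound to the port right now. */
    static boolean isPortFree(int port) {
        try (ServerSocket probe = new ServerSocket(port)) {
            return true;
        } catch (IOException e) {
            // Binding failed, most commonly because another listener holds the port.
            return false;
        }
    }

    /** Fails fast with a descriptive message instead of a bare bind error. */
    static void requireFreePort(int port, String mode) {
        if (!isPortFree(port)) {
            throw new IllegalStateException(
                "Port " + port + " is already in use; cannot start the REST Catalog server in "
                    + mode + " mode. Check whether another instance (embedded or standalone) "
                    + "is already listening on this port.");
        }
    }

    public static void main(String[] args) {
        // Port 8080 is an arbitrary choice for the example.
        requireFreePort(8080, "standalone");
        System.out.println("Port 8080 is free");
    }
}
```

A probe like this is inherently racy, since another process can take the port between the check and the real bind, so it only improves the error message and does not replace handling the bind failure itself.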

How was this patch tested?

New integration tests.

Contributor

okumin commented Jan 14, 2026

The intention sounds great! I have one question: do we really need active-passive with ZooKeeper? A RESTful API should always be able to sit behind a load balancer, which is typically easier to configure than ZK.

Contributor Author

difin commented Jan 15, 2026

@okumin Active-passive mode is not necessary for scaling, but active-active seems to be what is needed. I used ZooKeeper for consistency and code reuse, because it is already used in several places in Hive.

Member

deniskuzZ commented Jan 15, 2026

> @okumin Active-passive mode is not necessary for scaling, but active-active seems to be what is needed. I used ZooKeeper for consistency and code reuse, because it is already used in several places in Hive.

@difin FYI, there is ongoing work to decommission ZooKeeper.
cc @abstractdog

Btw, why do we need a coordinator here? I would envision the following flow:
Client → Load Balancer / API Gateway → HMS REST instance

@abstractdog
Contributor

> > @okumin Active-passive mode is not necessary for scaling, but active-active seems to be what is needed. I used ZooKeeper for consistency and code reuse, because it is already used in several places in Hive.
>
> @difin FYI, there is ongoing work to decommission ZooKeeper. cc @abstractdog
>
> Btw, why do we need a coordinator here? I would envision the following flow: Client → Load Balancer / API Gateway → HMS REST instance

While we're looking for native Kubernetes alternatives for things we currently do with ZK, ZK is still a valid choice: getting rid of it across the whole Hive codebase would be too much in one go, especially because it's battle-tested. So reusing ZkRegistryBase is fine for now (not to mention that Hive still runs in old-school clusters with ZK in many places, I guess).

Contributor

okumin commented Jan 15, 2026

> Client → Load Balancer / API Gateway → HMS REST instance

I also think this is more than enough in most cases.

Contributor Author

difin commented Jan 19, 2026

Thanks everyone! I am changing the implementation to support this flow, without Zookeeper:

Client → Load Balancer / API Gateway → HMS REST instance
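
To illustrate why this flow needs no coordinator, here is a minimal round-robin selector, assuming the REST instances are stateless so that any of them can serve any request. The class `RoundRobinSketch` and the instance addresses are hypothetical; in practice the Kubernetes Service or API gateway does this job.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of what a load balancer does; not part of this PR.
final class RoundRobinSketch {
    private final List<String> instances;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinSketch(List<String> instances) {
        if (instances.isEmpty()) {
            throw new IllegalArgumentException("at least one instance is required");
        }
        this.instances = List.copyOf(instances);
    }

    /** Returns the next instance in rotation; safe to call from multiple threads. */
    String pick() {
        // floorMod keeps the index non-negative even after the counter overflows.
        int index = Math.floorMod(next.getAndIncrement(), instances.size());
        return instances.get(index);
    }

    public static void main(String[] args) {
        // Example addresses are made up for the sketch.
        RoundRobinSketch lb = new RoundRobinSketch(List.of("rest-0:9001", "rest-1:9001"));
        for (int i = 0; i < 4; i++) {
            System.out.println(lb.pick());
        }
    }
}
```

Because the selector keeps no per-instance state beyond a counter, instances can be added or removed without any leader election or registry.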

difin force-pushed the hms_rest_catalog_server_scaling branch from b20f5e7 to 7afe72b on January 26, 2026 21:18