Skip to content

Implement Hash-Based Routing#519

Draft
hoffmaen wants to merge 28 commits intocloudfoundry:developfrom
sap-contributions:hash-based-routing-with-overload
Draft

Implement Hash-Based Routing#519
hoffmaen wants to merge 28 commits intocloudfoundry:developfrom
sap-contributions:hash-based-routing-with-overload

Conversation

@hoffmaen
Copy link
Contributor

@hoffmaen hoffmaen commented Nov 11, 2025

Summary

This Pull-Request implements #505.

  • Implement hash-based routing in Gorouter
    • Gorouter routes requests to endpoints based on the value of a configured request header
    • Gorouter distributes requests to different predefined application instances, when the current request count to an app instance exceeds the balance factor
    • Gorouter routes requests to the "next instance" in the lookup table on retries
  • Follow decisions documented in this comment

Backward Compatibility

Breaking Change? Yes/No

@b1tamara b1tamara force-pushed the hash-based-routing-with-overload branch from e9cbe0c to 7b456f8 Compare December 3, 2025 09:32
@hoffmaen hoffmaen force-pushed the hash-based-routing-with-overload branch from 9f21f6b to 513c575 Compare December 9, 2025 08:24
Copy link
Contributor

@peanball peanball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First set of comments. I have not looked at the implementation of maglev and the hash based instance determination.

I've focussed on the integration with the existing Gorouter code, route lookup, LB algo handling, etc.

if e != nil {
e.RLock()
defer e.RUnlock()
return e.endpoint
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e might not be valid anymore, as it is retrieved without lock. You then acquire a lock and should re-fetch the value to be sure that it's still there.

@hoffmaen hoffmaen force-pushed the hash-based-routing-with-overload branch from 5030bbc to 560f935 Compare January 16, 2026 14:29
Copy link
Contributor

@peanball peanball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some more comments, now on the hash_based.go

balanceFactor := h.pool.HashRoutingProperties.BalanceFactor

if isEndpointOverloaded {
h.logger.Debug("hash-based-routing-endpoint-overloaded", slog.String("host", h.pool.host), slog.String("endpoint-id", endpoint.PrivateInstanceId), slog.Int64("endpoint-connections", currentInFlightRequestCount))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as debug log is usually not on, for performance's sake it's better to guard it by checking if debug level is enabled at all in the logger.

something like

if !h.logger.Enabled(context.Background(), slog.LevelDebug) {
    h.logger.Debug(...)
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would leave that to the logger to decide. Where I see this as relevant is when the mere call to .Debug would cause expensive up-front allocation / computation. But here we are just passing around arguments. We could probably optimize this a little by not using slog.String and instead just passing the args individually. This would defer any additional allocations until after the enabled check, avoiding the potential overhead they may cause. It is a bit less efficient for the logger itself but given that this is debug logging we don't care anyway.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you provide an example? Do you mean something like this?:

h.logger.Debug("hash-based-routing-endpoint-overloaded", "host", h.pool.host, "endpoint-id", endpoint.PrivateInstanceId, "endpoint-connections", currentInFlightRequestCount)

This would be fine for me too. the call to Debug is then almost as free as with the wrapping condition.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we did a small excursion into the internals of go / slog:

  • One zero-allocation variant is to always use Logger.LogAttrs with at most 5 attributes as the slog package optimizes for that case.
  • The other option is to continue using the guard statement as the code path never gets executed in this case.
  • The "convenience" option is Logger.(Debug|Info|...) but it comes at the cost of one allocation per argument for the boxing of the passed argument into any. Though the cost is negligible in the grand scheme of things, unless you do expensive up-front operations like fmt.Sprintf.

For readability I prefer either the LogAttrs for performance critical code or the Debug|Info|... functions where we don't need to squeeze every last ns out of the code. If we were to decide we want to optimize as much as we can the right thing to do would be to refactor the entire code-base to move to LogAttrs. I don't think that's needed so I'd just stick with Debug without the guard for this use-case, reserving the guard for cases we have truly expensive upfront work.

@b1tamara b1tamara requested a review from peanball January 21, 2026 07:22
@peanball
Copy link
Contributor

LGTM. Once we have the performance tests, we should move to community review.

@hoffmaen
Copy link
Contributor Author

hoffmaen commented Jan 28, 2026

Good news - we evaluated the performance of gorouter with hash-based routing support.

Our test setup constists of scaled out HAProxy (as a load balancer in front of gorouter), scaled out Diego Cells and a very simple and performant application with 30 instances. This allows us to performance test gorouter via external requests, where gorouter is the bottleneck.

We compared the performance of the latest stable routing-release, and the same release including changes from this Pull-Request. Using oha (inofficial successor of hey), we put various load patterns on the application and measured throughput, CPU usage of gorouter, and latency. In summary,

  • both versions performed absolutely the same when using round-robin
  • we achieved the same performance using hash-based routing and round-robin, when requests are distributed across all instances (achieved by running oha with multiple instances, each using a different hash_header value)
  • various mixed load scenarios (load on a route with round-robin parallel to load on a route with hash) showed no impact on performance

We also explored that requests are routed to next instances as expected, when the max. connection limit of one app instance was reached. The hash_balance factor is effective and allows distribution of requests to next instances before overloading it.

Please feel free to reach out for details on the results!

peanball

This comment was marked as outdated.

@b1tamara b1tamara requested a review from peanball January 28, 2026 13:35
@peanball peanball dismissed their stale review January 28, 2026 14:00

Changes were addressed. As this is still subject to change, I cannot give "final" approval yet.

@hoffmaen hoffmaen marked this pull request as ready for review January 28, 2026 14:53
@hoffmaen hoffmaen requested a review from a team as a code owner January 28, 2026 14:53
@a18e a18e force-pushed the hash-based-routing-with-overload branch from 21cc8b7 to b414a38 Compare January 30, 2026 10:31
@b1tamara
Copy link
Contributor

b1tamara commented Feb 5, 2026

During performance tests, we figured out that the Gorouter memory consumption is higher than expected if we keep the permutation table (table which is used to fill the final maglev lookup table) in memory. We want to investigate this finding further and optimize it.
Moving PR back to Draft.

@b1tamara b1tamara marked this pull request as draft February 5, 2026 08:21
Copy link
Contributor Author

@hoffmaen hoffmaen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only documentation / log related comments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Development

Successfully merging this pull request may close these issues.

6 participants