Implement Hash-Based Routing #519
Conversation
src/code.cloudfoundry.org/gorouter/proxy/round_tripper/proxy_round_tripper.go (outdated review comment, since resolved)
peanball left a comment
First set of comments. I have not looked at the implementation of Maglev and the hash-based instance determination yet.
I've focused on the integration with the existing Gorouter code: route lookup, LB algorithm handling, etc.
```go
if e != nil {
	e.RLock()
	defer e.RUnlock()
	return e.endpoint
}
```
e might not be valid anymore, as it is retrieved without lock. You then acquire a lock and should re-fetch the value to be sure that it's still there.
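To illustrate the re-fetch-under-lock pattern the comment asks for, here is a minimal, self-contained sketch. The `pool`/`entry` types and the `lookup` helper are simplified stand-ins for illustration, not the actual Gorouter structures:

```go
package main

import (
	"fmt"
	"sync"
)

// endpoint, entry and pool are hypothetical, simplified mirrors of the
// structures discussed above; the real Gorouter types differ.
type endpoint struct{ addr string }

type entry struct {
	sync.RWMutex
	endpoint *endpoint
}

type pool struct {
	mu      sync.RWMutex
	entries map[string]*entry
}

// lookup fetches the entry while holding the pool lock, so a concurrent
// removal cannot hand back an entry that is no longer in the map, and only
// then takes the entry's own read lock to read its endpoint.
func (p *pool) lookup(id string) *endpoint {
	p.mu.RLock()
	e := p.entries[id] // fetched under the pool lock, not before it
	p.mu.RUnlock()
	if e == nil {
		return nil
	}
	e.RLock()
	defer e.RUnlock()
	return e.endpoint
}

func main() {
	p := &pool{entries: map[string]*entry{
		"a": {endpoint: &endpoint{addr: "10.0.0.1:8080"}},
	}}
	fmt.Println(p.lookup("a").addr)
}
```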
This commit provides the basic implementation for hash-based routing. It does not consider the balance factor yet.

Co-authored-by: Clemens Hoffmann <clemens.hoffmann@sap.com>
Co-authored-by: Tamara Boehm <tamara.boehm@sap.com>
Co-authored-by: Soha Alboghdady <soha.alboghdady@sap.com>
peanball left a comment
Some more comments, now on the hash_based.go
```go
balanceFactor := h.pool.HashRoutingProperties.BalanceFactor

...

if isEndpointOverloaded {
	h.logger.Debug("hash-based-routing-endpoint-overloaded", slog.String("host", h.pool.host), slog.String("endpoint-id", endpoint.PrivateInstanceId), slog.Int64("endpoint-connections", currentInFlightRequestCount))
}
```
As debug logging is usually not enabled, for performance's sake it's better to guard the call by checking whether the debug level is enabled in the logger at all.
something like

```go
if h.logger.Enabled(context.Background(), slog.LevelDebug) {
	h.logger.Debug(...)
}
```
I would leave that to the logger to decide. Where I see this as relevant is when the mere call to .Debug would cause expensive up-front allocation / computation. But here we are just passing around arguments. We could probably optimize this a little by not using slog.String and instead just passing the args individually. This would defer any additional allocations until after the enabled check, avoiding the potential overhead they may cause. It is a bit less efficient for the logger itself but given that this is debug logging we don't care anyway.
Can you provide an example? Do you mean something like this?

```go
h.logger.Debug("hash-based-routing-endpoint-overloaded", "host", h.pool.host, "endpoint-id", endpoint.PrivateInstanceId, "endpoint-connections", currentInFlightRequestCount)
```

This would be fine for me too. The call to Debug is then almost as free as with the wrapping condition.
So we did a small excursion into the internals of Go / slog:

- One zero-allocation variant is to always use `Logger.LogAttrs` with at most 5 attributes, as the slog package optimizes for that case.
- The other option is to continue using the guard statement, as the code path never gets executed in this case.
- The "convenience" option is `Logger.(Debug|Info|...)`, but it comes at the cost of one allocation per argument for boxing the passed argument into `any`. That cost is negligible in the grand scheme of things, unless you do expensive up-front operations like `fmt.Sprintf`.
For readability I prefer either `LogAttrs` for performance-critical code, or the `Debug|Info|...` functions where we don't need to squeeze every last ns out of the code. If we decided we wanted to optimize as much as possible, the right thing to do would be to refactor the entire code-base to move to `LogAttrs`. I don't think that's needed, so I'd stick with `Debug` without the guard for this use case, reserving the guard for cases with truly expensive upfront work.
LGTM. Once we have the performance tests, we should move to community review.
Good news - we evaluated the performance of gorouter with hash-based routing support. Our test setup consists of a scaled-out HAProxy (as a load balancer in front of gorouter), scaled-out Diego Cells, and a very simple and performant application with 30 instances. This allows us to performance-test gorouter via external requests, with gorouter as the bottleneck. We compared the performance of the latest stable routing-release against the same release including the changes from this Pull-Request.

We also verified that requests are routed to the next instances as expected when the max connection limit of one app instance is reached. The hash_balance factor is effective and allows distributing requests to other instances before one is overloaded. Please feel free to reach out for details on the results!
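As a hedged illustration of how such a balance factor can work, here is a minimal sketch assuming the common bounded-load rule (an endpoint counts as overloaded once its in-flight requests exceed `balanceFactor` percent of the average load per endpoint). The function name and exact formula are illustrative; Gorouter's actual check may differ:

```go
package main

import "fmt"

// isEndpointOverloaded is a hypothetical bounded-load check: with a
// balanceFactor of 125, an endpoint may carry up to 125% of the average
// in-flight load before new requests spill over to the next instance.
func isEndpointOverloaded(inFlight, totalInFlight, numEndpoints, balanceFactor int64) bool {
	if numEndpoints == 0 || balanceFactor <= 0 {
		return false
	}
	// average load per endpoint, scaled by the balance factor (in percent)
	limit := totalInFlight * balanceFactor / (100 * numEndpoints)
	return inFlight > limit
}

func main() {
	// 30 in-flight requests across 3 endpoints, factor 125% -> limit 12
	fmt.Println(isEndpointOverloaded(13, 30, 3, 125)) // true: 13 > 12
	fmt.Println(isEndpointOverloaded(10, 30, 3, 125)) // false: 10 <= 12
}
```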
Changes were addressed. As this is still subject to change, I cannot give "final" approval yet.
During performance tests, we figured out that the Gorouter memory consumption is higher than expected if we keep the permutation table (the table used to fill the final Maglev lookup table) in memory. We want to investigate this finding further and optimize it.
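For context, a minimal sketch of Maglev lookup-table population shows why the permutation table need not be materialized at all: each backend's permutation can be generated lazily from its per-backend offset and skip, keeping memory at O(table size) instead of O(backends × table size). All names and the FNV-based hashing here are illustrative, not Gorouter's actual code:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// hashWithSeed is a stand-in hash for deriving each backend's offset and skip.
func hashWithSeed(s string, seed byte) uint64 {
	h := fnv.New64a()
	h.Write([]byte{seed})
	h.Write([]byte(s))
	return h.Sum64()
}

// populate builds a Maglev lookup table of size m (m should be prime).
// The candidate slot (offset + next*skip) mod m is computed on the fly,
// so no numBackends x m permutation table is ever stored.
func populate(backends []string, m uint64) []int {
	n := uint64(len(backends))
	offsets := make([]uint64, n)
	skips := make([]uint64, n)
	next := make([]uint64, n) // position in each backend's implicit permutation
	for i, b := range backends {
		offsets[i] = hashWithSeed(b, 0) % m
		skips[i] = hashWithSeed(b, 1)%(m-1) + 1
	}
	table := make([]int, m)
	for j := range table {
		table[j] = -1 // slot not yet claimed
	}
	var filled uint64
	for {
		for i := uint64(0); i < n; i++ {
			// advance backend i through its permutation to the next free slot
			c := (offsets[i] + next[i]*skips[i]) % m
			for table[c] >= 0 {
				next[i]++
				c = (offsets[i] + next[i]*skips[i]) % m
			}
			table[c] = int(i)
			next[i]++
			filled++
			if filled == m {
				return table
			}
		}
	}
}

func main() {
	table := populate([]string{"a", "b", "c"}, 13)
	fmt.Println(len(table))
}
```

Because backends claim slots in round-robin order, each backend owns roughly an equal share of the table, which is what makes the full permutation table safe to drop after (or here, during) population.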
Co-authored-by: Alexander Nicke <alexander.nicke@sap.com>
hoffmaen left a comment
Only documentation / log related comments.
Hello, I am requesting approver status for the Networking group. My contributions so far:

Pull-Requests:
- cloudfoundry/gorouter#442
- cloudfoundry/gorouter#435
- cloudfoundry/routing-release#519
- cloudfoundry/routing-release#514
- cloudfoundry/routing-release#504
- cloudfoundry/routing-release#453
- cloudfoundry/routing-release#434
- cloudfoundry/cli#3378
- cloudfoundry/cloud_controller_ng#4080
- cloudfoundry/cloud_controller_ng#4199

Co-Authorship:
- cloudfoundry/routing-release#478

Pull-Request Reviews:
- cloudfoundry/gorouter#443
- cloudfoundry/routing-release#452
- cloudfoundry/cli#3372

GitHub Issues:
- cloudfoundry/routing-release#529
- cloudfoundry/routing-release#468
- cloudfoundry/routing-release#445
- cloudfoundry/routing-release#429
- cloudfoundry/cloud_controller_ng#4198
Summary
This Pull-Request implements #505.
Backward Compatibility
Breaking Change? Yes/No