Shutdown Websocket Connections Gracefully in the queue-proxy #16362
knative-prow[bot] merged 10 commits into knative:main
Conversation
dprotaso
commented
Jan 27, 2026
- have the websocket server test image send a close message when shutting down gracefully
- include an e2e test that asserts websocket connections are gracefully handled
- include a special handler to track hijacked connections and create a drain method (sketched below)
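For illustration, here is a minimal sketch of what such a hijack-tracking handler with a drain method could look like. The names (HijackTracker, NewHijackTracker) and the 100ms poll interval are assumptions for this sketch, not the exact code in the PR:

```go
package handler

import (
	"bufio"
	"context"
	"net"
	"net/http"
	"sync/atomic"
	"time"
)

// HijackTracker is an illustrative wrapper that counts hijacked
// connections so a shutdown path can wait for them to finish.
type HijackTracker struct {
	next     http.Handler
	inflight atomic.Int64
}

func NewHijackTracker(next http.Handler) *HijackTracker {
	return &HijackTracker{next: next}
}

func (h *HijackTracker) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	h.next.ServeHTTP(&trackingWriter{ResponseWriter: w, tracker: h}, r)
}

// Drain blocks until all hijacked connections have closed or the
// context is cancelled.
func (h *HijackTracker) Drain(ctx context.Context) error {
	ticker := time.NewTicker(100 * time.Millisecond)
	defer ticker.Stop()
	for {
		if h.inflight.Load() == 0 {
			return nil
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-ticker.C:
		}
	}
}

// trackingWriter intercepts Hijack calls. A real implementation would
// also pass through optional interfaces such as http.Flusher.
type trackingWriter struct {
	http.ResponseWriter
	tracker *HijackTracker
}

// Hijack increments the in-flight count and returns a net.Conn whose
// Close decrements it again.
func (t *trackingWriter) Hijack() (net.Conn, *bufio.ReadWriter, error) {
	hj, ok := t.ResponseWriter.(http.Hijacker)
	if !ok {
		return nil, nil, http.ErrNotSupported
	}
	conn, rw, err := hj.Hijack()
	if err != nil {
		return nil, nil, err
	}
	t.tracker.inflight.Add(1)
	return &trackedConn{Conn: conn, tracker: t.tracker}, rw, nil
}

type trackedConn struct {
	net.Conn
	tracker *HijackTracker
	closed  atomic.Bool
}

func (c *trackedConn) Close() error {
	if c.closed.CompareAndSwap(false, true) {
		c.tracker.inflight.Add(-1)
	}
	return c.Conn.Close()
}
```

The idea is that anything the wrapped handler hijacks (for example a websocket upgrade) is counted, so the queue-proxy's drain path can wait on Drain before exiting.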
|
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dprotaso

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Details: Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Codecov Report
❌ Patch coverage is

Additional details and impacted files
@@ Coverage Diff @@
## main #16362 +/- ##
==========================================
+ Coverage 80.18% 80.22% +0.03%
==========================================
Files 216 217 +1
Lines 13440 13507 +67
==========================================
+ Hits 10777 10836 +59
- Misses 2298 2306 +8
Partials 365 365

☔ View full report in Codecov by Sentry.
|
|
One observation I have is that the activator will drop hijacked connections similar to the queue proxy. Hijacking happens implicitly with the

The solution in the queue-proxy is straightforward because we know K8s will send a TERM signal to the user container, so we can expect apps to return a close frame when that happens. Those are the changes I made to our websocket server. This isn't necessary, but it's arguably good practice.

With the activator, though, there is no way to signal downstream to end the connection. We could maybe send a close frame on the connection, but that makes the activator protocol-aware, and I don't think we can write a message on an upstream connection like this (it could be interleaved with another).

Given this, I don't think the activator issue should prevent this change from merging - it's just that the solution there isn't straightforward.
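For reference, here is a rough sketch of the "send a close frame on TERM" pattern described above, using gorilla/websocket. The handler shape, the connection registry, and the chosen close code are assumptions for the sketch, not the actual test image code:

```go
package main

import (
	"context"
	"log"
	"net/http"
	"os/signal"
	"sync"
	"syscall"
	"time"

	"github.com/gorilla/websocket"
)

var (
	upgrader = websocket.Upgrader{}
	mu       sync.Mutex
	conns    = map[*websocket.Conn]struct{}{} // open connections to close gracefully
)

func echo(w http.ResponseWriter, r *http.Request) {
	c, err := upgrader.Upgrade(w, r, nil)
	if err != nil {
		return
	}
	mu.Lock()
	conns[c] = struct{}{}
	mu.Unlock()
	defer func() {
		mu.Lock()
		delete(conns, c)
		mu.Unlock()
		c.Close()
	}()
	for {
		mt, msg, err := c.ReadMessage()
		if err != nil {
			return // includes receiving a close frame from the peer
		}
		if err := c.WriteMessage(mt, msg); err != nil {
			return
		}
	}
}

func main() {
	ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGTERM)
	defer stop()

	srv := &http.Server{Addr: ":8080", Handler: http.HandlerFunc(echo)}
	go func() {
		if err := srv.ListenAndServe(); err != http.ErrServerClosed {
			log.Fatal(err)
		}
	}()

	<-ctx.Done() // Kubernetes sent SIGTERM

	// Send a close frame on every open connection so clients can finish
	// the closing handshake instead of seeing a dropped socket.
	deadline := time.Now().Add(5 * time.Second)
	mu.Lock()
	for c := range conns {
		c.WriteControl(websocket.CloseMessage,
			websocket.FormatCloseMessage(websocket.CloseGoingAway, "server shutting down"), deadline)
	}
	mu.Unlock()

	shutdownCtx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()
	srv.Shutdown(shutdownCtx)
}
```

Note that http.Server.Shutdown does not wait for hijacked connections, which is exactly why the queue-proxy side needs the drain handling added in this PR.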
|
/retest hmm
|
9f80c61 to cd8e486 (Compare)
b889e35 to b434a0a (Compare)
|
/retest |
411fe17 to 55df508 (Compare)
  gotestsum --format testname -- \
    -race -count=1 -parallel=1 -tags=e2e \
-   -timeout=30m \
+   -timeout=35m \
With the extra test we need to bump this, since we run things with -parallel=1 due to the GitHub Actions default runners being crappy.
I adjusted the test to not use time.Sleep to speed it up
Maybe not that relevant for this PR, but timing tests could also be sped up using synctest once we require Go 1.25; see https://go.dev/blog/testing-time
Not for e2e. It would run faster if we ran the tests in parallel, but with kind + GitHub Actions runners they are so under-provisioned that tests flake out.
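For context on the synctest suggestion, here is a minimal unit-test sketch assuming Go 1.25's testing/synctest API (synctest.Test and synctest.Wait); as noted, this helps timing-sensitive unit tests, not the e2e suite:

```go
package handler_test

import (
	"testing"
	"testing/synctest"
	"time"
)

// Inside the synctest bubble, time is virtual: the 31s sleep below
// completes immediately in real time once every goroutine in the
// bubble is blocked, so a test exercising a 30s timeout stays fast.
func TestTimeoutFiresWithoutRealWait(t *testing.T) {
	synctest.Test(t, func(t *testing.T) {
		fired := false
		timer := time.AfterFunc(30*time.Second, func() { fired = true })
		defer timer.Stop()

		time.Sleep(31 * time.Second) // advances the fake clock past the timer
		synctest.Wait()              // wait for the timer's goroutine to finish

		if !fired {
			t.Fatal("expected the 30s timer to have fired")
		}
	})
}
```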
|
/assign @linkvt @elijah-rou |
|
@dprotaso: GitHub didn't allow me to assign the following users: elijah-rou.

Note that only knative members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.

Details: In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
linkvt
left a comment
Looks good to me in general. It took me a while to figure out the concurrency and what is actually being tested, so I left a few minor comments.
pkg/http/handler/hijack_test.go
Outdated
	<-inHandler
	close(drainStarted)
	drainResult <- h.Drain(context.Background())
}()

<-drainStarted
close(handlerWait)
I'm not sure this always behaves as expected; couldn't a potential sequence be:
- B: inHandler is unblocked
- B: close drainStarted
- Test: drainStarted is unblocked
- Test: close handlerWait
- A: handlerWait is unblocked, returns and inflight requests equals 0
- B: nothing to Drain
Should this test verify that drain works correctly when a connection is closed after the tracker was asked to Drain?
Do you think I should add a sleep between drainStarted being unblocked and closing handlerWait?
I think I was assuming that close(drainStarted) wouldn't yield the goroutine, so the drain would start before the test goroutine resumes, but I don't believe there is a scheduling guarantee like that in Go.
Yes, I think we can remove drainStarted, move <-inHandler out of the second goroutine so it sits between the two goroutines, and add a sleep before close(handlerWait).
This way the handler starts and unblocks <-inHandler, we start the goroutine to drain, and the test func sleeps so that Drain runs its loop a few times before close(handlerWait) lets the handler finish.
Edit: We could even add a select to check whether <-drainResult has received a value before the sleep passes (sketched below).
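A rough sketch of that restructuring, reusing the channel names from the test; the HijackTracker wrapper from the earlier sketch, the timings, and the early select are illustrative assumptions, not the final test code:

```go
package handler

import (
	"context"
	"net/http"
	"net/http/httptest"
	"testing"
	"time"
)

func TestDrainWaitsForHijackedConn(t *testing.T) {
	inHandler := make(chan struct{})
	handlerWait := make(chan struct{})
	drainResult := make(chan error, 1)

	// The wrapped handler hijacks the connection and holds it open
	// until handlerWait is closed.
	h := NewHijackTracker(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		conn, _, err := w.(http.Hijacker).Hijack()
		if err != nil {
			t.Error(err)
			close(inHandler)
			return
		}
		close(inHandler) // the connection is now hijacked and tracked
		<-handlerWait    // hold it open until the test releases us
		conn.Close()     // closing releases it from the tracker
	}))

	server := httptest.NewServer(h)
	defer server.Close()

	go func() {
		// Drive a request through the handler; no response is ever
		// written, so the client errors out once the conn is closed.
		if resp, err := http.Get(server.URL); err == nil {
			resp.Body.Close()
		}
	}()

	<-inHandler // the connection is hijacked before we start draining

	go func() {
		drainResult <- h.Drain(context.Background())
	}()

	// Drain must keep waiting while the hijacked connection is open;
	// give it a few polling iterations to prove that.
	select {
	case err := <-drainResult:
		t.Fatalf("Drain returned before the connection was closed: %v", err)
	case <-time.After(200 * time.Millisecond):
	}

	close(handlerWait) // release the handler, which closes the connection

	if err := <-drainResult; err != nil {
		t.Fatalf("Drain failed: %v", err)
	}
}
```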
|
/lgtm |