TICKET-597: Fix worker orphan processes and add max test timeout cap #710
Draft
Description
Worker processes running student tests could become orphaned when the RQ worker crashed or restarted before cleanup ran, because `start_new_session=True` fully detached test processes into their own session. This made them invisible to process supervision and allowed them to run indefinitely, blocking the dedicated test user slot.

Implementation
- Removed `start_new_session=True` from the subprocess call so that test processes are reaped when the worker dies.
- Added a `max_test_timeout` server-side configuration (default 600s) that caps per-test timeouts, preventing instructor-configured or missing timeouts from letting tests hang forever during normal operation.
- Removing `start_new_session` means `os.killpg` can no longer be used in the same-user (dev) path without killing the worker itself, so it is replaced with `proc.kill()` + `proc.wait()` to target the child process directly.
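
The changes above can be illustrated with a minimal sketch. It is not the code in this PR: the names `effective_timeout`, `run_test`, and `MAX_TEST_TIMEOUT` are hypothetical, and the fallback to the cap for a missing or non-positive timeout is an assumption about how the cap might be applied.

```python
import subprocess
import sys

# Assumed to mirror the max_test_timeout default (600s) described above.
MAX_TEST_TIMEOUT = 600


def effective_timeout(requested, cap=MAX_TEST_TIMEOUT):
    """Clamp a per-test timeout to the server-side cap.

    A missing (None) or non-positive timeout falls back to the cap,
    so no test can run without an upper bound.
    """
    if requested is None or requested <= 0:
        return cap
    return min(requested, cap)


def run_test(cmd, requested_timeout=None, cap=MAX_TEST_TIMEOUT):
    """Run cmd as a direct child and kill it if it exceeds the capped timeout.

    Returns (returncode, timed_out).
    """
    timeout = effective_timeout(requested_timeout, cap)
    # No start_new_session=True here: the child stays in the worker's
    # session and process group instead of being detached.
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        proc.wait(timeout=timeout)
        return proc.returncode, False
    except subprocess.TimeoutExpired:
        # os.killpg on the child's group would also hit the worker,
        # since both now share a process group. Target the child only.
        proc.kill()
        proc.wait()  # reap the child so it does not linger as a zombie
        return proc.returncode, True


if __name__ == "__main__":
    slow = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(run_test(slow, requested_timeout=1))
```

Note that `proc.kill()` only signals the direct child. If a test spawns its own subprocesses, those are not signalled by this call, which is a trade-off of no longer using a separate process group.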