Fix worker poll hang: use client default timeout instead of disabling it #412

Open
manan164 wants to merge 1 commit into main from fix/poll-request-timeout-hang

manan164 commented Jun 12, 2026

Contributor

Issue

The REST clients passed timeout=None per request, which httpx interprets as "no timeout" (infinite), not "use the client default". On a half-open connection (request sent, no response, socket never closed — e.g. an idle keep-alive flow silently dropped by an LB/NAT) the poll then hangs forever and the worker silently stops polling until restarted. This affects both sync (rest.py) and async (async_rest.py) workers.

Fix

  • rest.py and async_rest.py: pass httpx.USE_CLIENT_DEFAULT instead of None so the client's configured timeout actually applies and a stuck read fails on a bounded timeout instead of hanging.
  • Added tests/unit/api_client/test_poll_timeout.py: points each client at a half-open server and asserts the request raises on a bounded timeout instead of hanging.

Notes / possible follow-ups

  • After this fix, recovery is bounded by the client default (httpx.Timeout(120.0)); a shorter or configurable read timeout for polls would let the worker recover in less than 120s.
  • The async runner could also reset a broken connection on poll failure (the sync runner already does) — parity hardening, not required here.

The REST clients passed timeout=None per request, which httpx interprets
as "no timeout" (infinite) rather than "use the client default". On a
half-open connection the poll then hangs forever and the worker silently
stops polling until restarted (affects both sync and async workers).

Pass httpx.USE_CLIENT_DEFAULT instead so the client's configured timeout
applies. Adds a regression test that points each client at a half-open
server and asserts the request fails on a bounded timeout instead of hanging.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

codecov Bot commented Jun 12, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

| Files with missing lines | Coverage Δ |
| --- | --- |
| src/conductor/client/http/async_rest.py | 40.00% <100.00%> (+16.75%) ⬆️ |
| src/conductor/client/http/rest.py | 84.07% <100.00%> (ø) |

@manan164 manan164 marked this pull request as ready for review June 12, 2026 17:43