Add Retry Foundation - RequestType, RetryPolicy, and RetryTimeoutManager#1220
Add Retry Foundation - RequestType, RetryPolicy, and RetryTimeoutManager#1220nishkarsh-db wants to merge 5 commits intodatabricks:mainfrom
Conversation
…ryUtils - Add RequestType enum with retry policy classification (IDEMPOTENT/NON_IDEMPOTENT) - Add RetryPolicy enum for categorizing request types - Add RetryTimeoutManager for managing timeout budgets per status code - Add RetryUtils with extractRetryException() method - Add RetryTimeoutManagerTest with 6 comprehensive tests This PR establishes the foundation for unified retry handling across HTTP and Thrift operations. Components work together to make intelligent retry decisions based on request type, status codes, and timeout budgets. Co-authored-by: Cursor <cursoragent@cursor.com>
- Add apiRetriableCodesTimeoutMillis field with independent timeout budget - Add boolean parameter to evaluateRetryTimeoutForResponse() to distinguish API codes - When isApiRetriableCode=true, only API codes timeout is deducted - Add 2 new tests: testApiRetriableCodesTimeout and testApiRetriableCodesIndependentTimeout - Total 8 tests pass (6 original + 2 new) This allows custom API retriable codes (e.g. 404, 400) to have their own separate timeout budget (300s default) independent from standard retry codes. Strategies will pass true when status code is in custom ApiRetriableHttpCodes list. Co-authored-by: Cursor <cursoragent@cursor.com>
…mbers - Extract timeout values as class constants (TEMP_UNAVAILABLE_TIMEOUT_SECONDS, etc.) - Calculate test delays as expressions (timeout/4, timeout/3) instead of hardcoded values - Use (constant + 1) pattern for exhaustion tests instead of magic numbers - Improves maintainability and makes test intent clearer - All 8 tests pass Co-authored-by: Cursor <cursoragent@cursor.com>
- Update REQUEST_TIMEOUT_SECONDS from 10 to 120 - Update REQUEST_EXCEPTION_TIMEOUT_SECONDS from 10 to 120 - Update tests to reflect new 120-second timeout values - All 8 tests pass Co-authored-by: Cursor <cursoragent@cursor.com>
| * @param isApiRetriableCode true if this is a custom API retriable code, false otherwise | ||
| * @return true if the request should be retried, false otherwise | ||
| */ | ||
| public boolean evaluateRetryTimeoutForResponse( |
There was a problem hiding this comment.
can we add debug logs so that it is abundantly clear why a retry was stopped
| */ | ||
| public static DatabricksRetryHandlerException extractRetryException(Throwable e) { | ||
| Throwable cause = e.getCause(); | ||
| while (cause != null) { |
There was a problem hiding this comment.
do we want to skip top level e?
There was a problem hiding this comment.
Yes as it would always be of the type TTransportException, so we do want to skip it, I'll add a comment for that.
| /** | ||
| * Evaluates retry decision based on an exception and updates timeout accordingly. | ||
| * | ||
| * @param exception the exception that occurred during request execution |
There was a problem hiding this comment.
this is not passed
There was a problem hiding this comment.
Yes, passing it is a part of next PR.
| // Check if any timeout has been exceeded | ||
| return tempUnavailableTimeoutMillis > 0 | ||
| && rateLimitTimeoutMillis > 0 | ||
| && otherErrorCodesTimeoutMillis > 0; |
There was a problem hiding this comment.
can we just move these checks to the individual case in the switch block
| */ | ||
| public class RetryUtils { | ||
| public static final long REQUEST_TIMEOUT_SECONDS = 120; | ||
| public static final long REQUEST_EXCEPTION_TIMEOUT_SECONDS = 120; |
There was a problem hiding this comment.
let's add DEFAULT_ at the start
32e2032 to
98a6f6d
Compare
- Add debug logs to RetryTimeoutManager explaining why retry was stopped - Move timeout exhaustion checks into switch statement cases - Add comment explaining why extractRetryException skips top-level exception - Rename timeout constants to DEFAULT_REQUEST_TIMEOUT_SECONDS and DEFAULT_REQUEST_EXCEPTION_TIMEOUT_SECONDS - Fix Javadoc to remove unused @param exception Changes address review feedback from @vikrantpuppala: - Debug logs make retry stop reason abundantly clear - Timeout checks moved to individual switch cases for clarity - Comment added to extractRetryException about TTransportException wrapper - Constants renamed with DEFAULT_ prefix for clarity Co-authored-by: Cursor <cursoragent@cursor.com>
|
This PR has been marked as Stale because it has been open for 30 days with no activity. If you would like the PR to remain open, please remove the stale label or comment on the PR. |
|
This PR was closed because it has been inactive for 7 days since being marked as stale. |
Description
Introduces foundational components for unified retry handling mechanism:
TemporarilyUnavailableRetryTimeoutRateLimitRetryTimeoutApiRetryTimeoutextractRetryException()method and timeout constantsThis PR is the first in a stack of 4 PRs that will migrate retry handling from Apache HTTP client's native handlers to application-layer control.
Testing
All tests pass (8 tests):
[INFO] Tests run: 8, Failures: 0, Errors: 0, Skipped: 0RetryTimeoutManagerTestcovers:Notes for Reviewer