
feat(ollama): add max_retries to chat generator#2899

Open
Keyur-S-Patel wants to merge 1 commit into deepset-ai:main from Keyur-S-Patel:feat/ollama-chat-max-retries

Conversation


@Keyur-S-Patel Keyur-S-Patel commented Mar 1, 2026

PR : feat/ollama-chat-max-retries

Related Issues

  • partially addresses #9309

Proposed Changes:

  • Added max_retries support to OllamaChatGenerator.
  • Added tenacity-based retries for both sync and async chat calls.
  • Added exponential backoff via wait_exponential().
  • Kept retry handling local to run() / run_async() using @retry-decorated callables instead of separate helper methods.
  • Added tenacity to the Ollama integration package dependencies.
  • Updated unit tests for retry behavior, including the success-after-retry path and retry exhaustion path.
  • Adjusted the retry success test fixture to include valid Ollama token count metadata.

How did you test it?

  • hatch run fmt-check
  • hatch run test:types
  • hatch run test:unit

Result:

  • formatting checks passed
  • type checks passed
  • Ollama unit test suite passed on this branch (34 passed)

Notes for the reviewer

  • tenacity.stop_after_attempt(...) uses total attempts, so the implementation uses max_retries + 1 to preserve the expected semantics of “initial call + N retries”.
  • Retry policy currently retries generic Exception from Ollama chat calls, matching the previous broad retry behavior while switching to a standard retry library.
  • Full package test:all still depends on a running local Ollama instance for integration tests; only unit tests were used for branch validation.

Checklist

@Keyur-S-Patel Keyur-S-Patel requested a review from a team as a code owner March 1, 2026 02:48
@Keyur-S-Patel Keyur-S-Patel requested review from bogdankostic and removed request for a team March 1, 2026 02:48
@github-actions github-actions bot added integration:ollama type:documentation Improvements or additions to documentation labels Mar 1, 2026
@Keyur-S-Patel Keyur-S-Patel force-pushed the feat/ollama-chat-max-retries branch from ae3c423 to 83278a1 Compare March 1, 2026 02:58
@Keyur-S-Patel
Copy link
Contributor Author

@anakin87 Can you help me close this PR? It's similar to the one you reviewed here

@Keyur-S-Patel Keyur-S-Patel changed the title from add max_retries to chat generator to feat(ollama): add max_retries to chat generator Mar 1, 2026

@Keyur-S-Patel Keyur-S-Patel left a comment


Similar to this one #2875


@bogdankostic bogdankostic left a comment


Thanks for your PR @Keyur-S-Patel! I left two comments regarding the retry logic. Also, it would be great if we could add tests for run_async.

@retry(
reraise=True,
stop=stop_after_attempt(self.max_retries + 1),
retry=retry_if_exception_type(Exception),
wait=wait_exponential(),
)
Copy link
Contributor

Retrying on all kinds of exceptions is too broad.


@Keyur-S-Patel Keyur-S-Patel Mar 5, 2026


Retry on 429 and 5xx?

Lemme know if you want to retry on something else as well.

Comment on lines +527 to +545
@retry(
reraise=True,
stop=stop_after_attempt(self.max_retries + 1),
retry=retry_if_exception_type(Exception),
wait=wait_exponential(),
)
def chat_with_retry() -> ChatResponse | Iterator[ChatResponse]:
return self._client.chat(
model=self.model,
messages=ollama_messages,
tools=ollama_tools,
stream=is_stream, # type: ignore[call-overload] # Ollama expects Literal[True] or Literal[False], not bool
keep_alive=self.keep_alive,
options=generation_kwargs,
format=self.response_format,
think=self.think,
)

response = chat_with_retry()
Copy link
Contributor


Can we have this not nested inside of run?

Contributor Author


The @retry decorator requires a function, so we can either:

  1. use a wrapper function, or
  2. build a new method and put the retry decorator there.

Contributor Author


Will separate out the function in the next commit.

