
[Bug] LLM call fails with ReadTimeout after 3 retry attempts #43

@Clawiee


Tags: bug, api, performance


Reporter: 董江涵

Description

When the digital employee (Agent) processes complex or long-context requests, the LLM API call fails with the following error:

[LLM call error] ReadTimeout: Connection failed after 3 attempts

The error occurs intermittently, particularly when:

  • The conversation context is long (multi-turn with rich content)
  • The agent needs to perform complex reasoning or generate lengthy responses
  • Multiple tool calls are involved in a single turn

The system retries 3 times and then gives up, returning the raw error message to the user without any graceful fallback or user-friendly explanation.

Steps to Reproduce

  1. Start a conversation with a digital employee (Agent) on Clawith platform
  2. Build up a long conversation context (e.g., request a comprehensive research report)
  3. Send a follow-up message that requires the agent to process the full context
  4. Observe that the agent returns [LLM call error] ReadTimeout: Connection failed after 3 attempts

Expected Behavior

  1. The LLM call should succeed; if it does time out, the system should:
    • Use a longer timeout for complex requests
    • Implement exponential backoff retry strategy (e.g., 2s → 4s → 8s intervals)
    • Provide a user-friendly error message instead of exposing the raw error
    • Optionally offer to retry or simplify the request
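The retry behavior described above could be sketched roughly as follows. This is a minimal illustration, not the platform's actual code: `call_llm_with_backoff` and its parameters are hypothetical names, and the real implementation would catch the HTTP client's specific timeout exception rather than the built-in `TimeoutError` used here.

```python
import time

def call_llm_with_backoff(call, max_attempts=4, base_delay=2.0):
    """Retry a flaky LLM call with exponential backoff.

    Sleeps base_delay * 2**(attempt - 1) between attempts,
    i.e. 2s, then 4s, then 8s with the defaults.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except TimeoutError:
            if attempt == max_attempts:
                # Surface a friendly message instead of the raw error string
                raise RuntimeError(
                    "The AI is taking longer than expected. "
                    "Please try again or simplify your request."
                )
            time.sleep(base_delay * 2 ** (attempt - 1))
```

With the defaults this yields the 2s → 4s → 8s schedule suggested below; a production version would likely also add jitter to avoid thundering-herd retries during peak load.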

Actual Behavior

  • The LLM call times out and fails after 3 consecutive attempts
  • The raw error [LLM call error] ReadTimeout: Connection failed after 3 attempts is displayed directly to the user
  • No graceful degradation or recovery mechanism is triggered
  • The user has to manually resend the message and hope it works

Suggested Improvements

  1. Increase timeout — Raise HTTP timeout from default (likely 30s) to 60–120s for complex requests
  2. Exponential backoff — Implement retry with increasing intervals (e.g., 2s → 4s → 8s) instead of immediate retries
  3. Increase retry count — Consider 5 retries instead of 3
  4. Streaming support — Use streaming mode for LLM responses to avoid long-wait timeouts
  5. Context-aware timeout — Dynamically adjust timeout based on prompt length / token count
  6. User-friendly error — Show a helpful message like "The AI is taking longer than expected. Please try again or simplify your request."
  7. Auto-retry with context trimming — On timeout, automatically retry with a trimmed context
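Suggestion 5 (context-aware timeout) could look something like the sketch below. The function name, the chars-per-token heuristic, and the scaling constants are all assumptions for illustration; real code should estimate tokens with the provider's tokenizer rather than `len(prompt) / 4`.

```python
def context_aware_timeout(prompt: str,
                          base_timeout: float = 30.0,
                          max_timeout: float = 120.0,
                          secs_per_1k_tokens: float = 5.0) -> float:
    """Scale the HTTP read timeout with prompt size.

    Tokens are crudely estimated as len(prompt) / 4 (a common
    rule of thumb for English text), then the timeout grows
    linearly with estimated tokens, capped at max_timeout.
    """
    est_tokens = len(prompt) / 4
    timeout = base_timeout + (est_tokens / 1000) * secs_per_1k_tokens
    return min(timeout, max_timeout)
```

A short prompt keeps the default 30s, while a very long multi-turn context saturates at the 120s ceiling, matching the 60–120s range proposed in suggestion 1.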

Additional Context

  • This error was observed multiple times during a single conversation session on 2026-03-12
  • The conversation involved research tasks requiring extensive web searches and long-form generation
  • The error appears to be more frequent during peak hours, suggesting possible server-side load issues
