Responses from LLM's running from LocalAI are not detected

### Before submitting your bug report

- [x] I've tried using the "Ask AI" feature on the [Continue docs site](https://docs.continue.dev/) to see if the docs have an answer
- [x] I'm not able to find a related conversation on [GitHub discussions](https://github.com/continuedev/continue/discussions) that reports the same bug
- [x] I'm not able to find an [open issue](https://github.com/continuedev/continue/issues?q=is%3Aopen+is%3Aissue) that reports the same bug
- [x] I've seen the [troubleshooting guide](https://docs.continue.dev/troubleshooting) on the Continue Docs

### Relevant environment info

```Markdown
- OS: Linux (Kubunu 24.04) and Windows 10
- Continue version: 1.3.30
- IDE version: VSCode 1.108.2
- Model: Qwen3-Coder-30B-A3B-Instruct-RTPurbo-i1-GGUF, GLM-4.7-Flash-Derestricted-GGUF and gpt-oss-20b
- LocalAI version: v3.0.0 and v3.10.0
- config:
  
name: My Config
version: 1.0.0
schema: v1

models:
  - name: qwen3-coder
    provider: openai
    model: Qwen3-Coder-30B-A3B-Instruct-RTPurbo-i1-GGUF
    apiKey: <Omitted>
    apiBase: https://ai.lobadk.com/v1
```

### Description

Responses from any LLM model running on LocalAI does not appear to be picked up by Continue.

Inspecting the requests via Wireshark, I get this:
```json
{
  "messages": [
    {
      "role": "system",
      "content": "<important_rules>\n  You are in chat mode.\n\n  If the user asks to make changes to files offer that they can use the Apply Button on the code block, or switch to Agent Mode to make the suggested updates automatically.\n  If needed concisely explain to the user they can switch to agent mode using the Mode Selector dropdown and provide no other details.\n\n  Always include the language and file name in the info string when you write code blocks.\n  If you are editing \"src/main.py\" for example, your code block should start with '```python src/main.py'\n\n  When addressing code modification requests, present a concise code snippet that\n  emphasizes only the necessary changes and uses abbreviated placeholders for\n  unmodified sections. For example:\n\n  ```language /path/to/file\n  // ... existing code ...\n\n  {{ modified code here }}\n\n  // ... existing code ...\n\n  {{ another modification }}\n\n  // ... rest of code ...\n  ```\n\n  In existing files, you should always restate the function or class that the snippet belongs to:\n\n  ```language /path/to/file\n  // ... existing code ...\n\n  function exampleFunction() {\n    // ... existing code ...\n\n    {{ modified code here }}\n\n    // ... rest of function ...\n  }\n\n  // ... rest of code ...\n  ```\n\n  Since users have access to their complete file, they prefer reading only the\n  relevant modifications. It's perfectly acceptable to omit unmodified portions\n  at the beginning, middle, or end of files using these \"lazy\" comments. Only\n  provide the complete file when explicitly requested. Include a concise explanation\n  of changes unless the user specifically asks for code only.\n\n</important_rules>"
    },
    {
      "role": "user",
      "content": "Hello"
    }
  ],
  "model": "Qwen3-Coder-30B-A3B-Instruct-RTPurbo-i1-GGUF",
  "max_tokens": 4096,
  "stream": true,
  "stream_options": {
    "include_usage": true
  }
}

```
This appears to be correct, and putting the inspected request into something like Postman also correctly returns the streamed response, and, if streaming is disabled, the entire response when it is completed.

Shortly after the initial request from Continue, Wireshark does pick up the streamed response, and inspecting them all does reveal chunks of the response, but they never seem to reach Continue itself.

Continue does appear to wait until the entire response has been streamed, so it must understand some parts of the incoming data.

I would've tested without streaming in Continue, but it does not appear to have any configuration options for disabling streaming.

### To reproduce

1. Setup LocalAI. Can be local or remote.
2. Setup any LLM that can fit on the LocalAI machine.
3. Configure a model which points to the LocalAI model in Continue.
4. Send any form of message.
5. Watch and wait for Continue to receive the response.
6. See empty result.

### Log output

```Shell
Type: Chat
Result: Success
Prompt Tokens: 421
Generated Tokens: 0
ThinkingTokens: 0
Total Time: 7.62s
 Prompt
system
<important_rules>
  You are in chat mode.

  If the user asks to make changes to files offer that they can use the Apply Button on the code block, or switch to Agent Mode to make the suggested updates automatically.
  If needed concisely explain to the user they can switch to agent mode using the Mode Selector dropdown and provide no other details.

  Always include the language and file name in the info string when you write code blocks.
  If you are editing "src/main.py" for example, your code block should start with ' src/main.py'

  When addressing code modification requests, present a concise code snippet that
  emphasizes only the necessary changes and uses abbreviated placeholders for
  unmodified sections. For example:

   /path/to/file
  // ... existing code ...

  {{ modified code here }}

  // ... existing code ...

  {{ another modification }}

  // ... rest of code ...
  

  In existing files, you should always restate the function or class that the snippet belongs to:

   /path/to/file
  // ... existing code ...

  function exampleFunction() {
    // ... existing code ...

    {{ modified code here }}

    // ... rest of function ...
  }

  // ... rest of code ...
  

  Since users have access to their complete file, they prefer reading only the
  relevant modifications. It's perfectly acceptable to omit unmodified portions
  at the beginning, middle, or end of files using these "lazy" comments. Only
  provide the complete file when explicitly requested. Include a concise explanation
  of changes unless the user specifically asks for code only.

</important_rules>
user
hello
 Options
{
  "model": "Qwen3-Coder-30B-A3B-Instruct-RTPurbo-i1-GGUF",
  "maxTokens": 4096,
  "reasoning": false,
  "requestBody": {
    "messages": [
      {
        "role": "system",
        "content": "<important_rules>\n  You are in chat mode.\n\n  If the user asks to make changes to files offer that they can use the Apply Button on the code block, or switch to Agent Mode to make the suggested updates automatically.\n  If needed concisely explain to the user they can switch to agent mode using the Mode Selector dropdown and provide no other details.\n\n  Always include the language and file name in the info string when you write code blocks.\n  If you are editing \"src/main.py\" for example, your code block should start with ' src/main.py'\n\n  When addressing code modification requests, present a concise code snippet that\n  emphasizes only the necessary changes and uses abbreviated placeholders for\n  unmodified sections. For example:\n\n   /path/to/file\n  // ... existing code ...\n\n  {{ modified code here }}\n\n  // ... existing code ...\n\n  {{ another modification }}\n\n  // ... rest of code ...\n  \n\n  In existing files, you should always restate the function or class that the snippet belongs to:\n\n   /path/to/file\n  // ... existing code ...\n\n  function exampleFunction() {\n    // ... existing code ...\n\n    {{ modified code here }}\n\n    // ... rest of function ...\n  }\n\n  // ... rest of code ...\n  \n\n  Since users have access to their complete file, they prefer reading only the\n  relevant modifications. It's perfectly acceptable to omit unmodified portions\n  at the beginning, middle, or end of files using these \"lazy\" comments. Only\n  provide the complete file when explicitly requested. Include a concise explanation\n  of changes unless the user specifically asks for code only.\n\n</important_rules>"
      },
      {
        "role": "user",
        "content": "hello"
      }
    ],
    "model": "Qwen3-Coder-30B-A3B-Instruct-RTPurbo-i1-GGUF",
    "max_tokens": 4096,
    "stream": true
  }
}
```

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Responses from LLM's running from LocalAI are not detected #10113

Before submitting your bug report

Relevant environment info

Description

To reproduce

Log output

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Responses from LLM's running from LocalAI are not detected #10113

Description

Before submitting your bug report

Relevant environment info

Description

To reproduce

Log output

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions