Before submitting your bug report
Relevant environment info
- OS: Linux (Kubunu 24.04) and Windows 10
- Continue version: 1.3.30
- IDE version: VSCode 1.108.2
- Model: Qwen3-Coder-30B-A3B-Instruct-RTPurbo-i1-GGUF, GLM-4.7-Flash-Derestricted-GGUF and gpt-oss-20b
- LocalAI version: v3.0.0 and v3.10.0
- config:
name: My Config
version: 1.0.0
schema: v1
models:
- name: qwen3-coder
provider: openai
model: Qwen3-Coder-30B-A3B-Instruct-RTPurbo-i1-GGUF
apiKey: <Omitted>
apiBase: https://ai.lobadk.com/v1
Description
Responses from any LLM model running on LocalAI does not appear to be picked up by Continue.
Inspecting the requests via Wireshark, I get this:
{
"messages": [
{
"role": "system",
"content": "<important_rules>\n You are in chat mode.\n\n If the user asks to make changes to files offer that they can use the Apply Button on the code block, or switch to Agent Mode to make the suggested updates automatically.\n If needed concisely explain to the user they can switch to agent mode using the Mode Selector dropdown and provide no other details.\n\n Always include the language and file name in the info string when you write code blocks.\n If you are editing \"src/main.py\" for example, your code block should start with '```python src/main.py'\n\n When addressing code modification requests, present a concise code snippet that\n emphasizes only the necessary changes and uses abbreviated placeholders for\n unmodified sections. For example:\n\n ```language /path/to/file\n // ... existing code ...\n\n {{ modified code here }}\n\n // ... existing code ...\n\n {{ another modification }}\n\n // ... rest of code ...\n ```\n\n In existing files, you should always restate the function or class that the snippet belongs to:\n\n ```language /path/to/file\n // ... existing code ...\n\n function exampleFunction() {\n // ... existing code ...\n\n {{ modified code here }}\n\n // ... rest of function ...\n }\n\n // ... rest of code ...\n ```\n\n Since users have access to their complete file, they prefer reading only the\n relevant modifications. It's perfectly acceptable to omit unmodified portions\n at the beginning, middle, or end of files using these \"lazy\" comments. Only\n provide the complete file when explicitly requested. Include a concise explanation\n of changes unless the user specifically asks for code only.\n\n</important_rules>"
},
{
"role": "user",
"content": "Hello"
}
],
"model": "Qwen3-Coder-30B-A3B-Instruct-RTPurbo-i1-GGUF",
"max_tokens": 4096,
"stream": true,
"stream_options": {
"include_usage": true
}
}
This appears to be correct, and putting the inspected request into something like Postman also correctly returns the streamed response, and, if streaming is disabled, the entire response when it is completed.
Shortly after the initial request from Continue, Wireshark does pick up the streamed response, and inspecting them all does reveal chunks of the response, but they never seem to reach Continue itself.
Continue does appear to wait until the entire response has been streamed, so it must understand some parts of the incoming data.
I would've tested without streaming in Continue, but it does not appear to have any configuration options for disabling streaming.
To reproduce
- Setup LocalAI. Can be local or remote.
- Setup any LLM that can fit on the LocalAI machine.
- Configure a model which points to the LocalAI model in Continue.
- Send any form of message.
- Watch and wait for Continue to receive the response.
- See empty result.
Log output
Type: Chat
Result: Success
Prompt Tokens: 421
Generated Tokens: 0
ThinkingTokens: 0
Total Time: 7.62s
Prompt
system
<important_rules>
You are in chat mode.
If the user asks to make changes to files offer that they can use the Apply Button on the code block, or switch to Agent Mode to make the suggested updates automatically.
If needed concisely explain to the user they can switch to agent mode using the Mode Selector dropdown and provide no other details.
Always include the language and file name in the info string when you write code blocks.
If you are editing "src/main.py" for example, your code block should start with ' src/main.py'
When addressing code modification requests, present a concise code snippet that
emphasizes only the necessary changes and uses abbreviated placeholders for
unmodified sections. For example:
/path/to/file
// ... existing code ...
{{ modified code here }}
// ... existing code ...
{{ another modification }}
// ... rest of code ...
In existing files, you should always restate the function or class that the snippet belongs to:
/path/to/file
// ... existing code ...
function exampleFunction() {
// ... existing code ...
{{ modified code here }}
// ... rest of function ...
}
// ... rest of code ...
Since users have access to their complete file, they prefer reading only the
relevant modifications. It's perfectly acceptable to omit unmodified portions
at the beginning, middle, or end of files using these "lazy" comments. Only
provide the complete file when explicitly requested. Include a concise explanation
of changes unless the user specifically asks for code only.
</important_rules>
user
hello
Options
{
"model": "Qwen3-Coder-30B-A3B-Instruct-RTPurbo-i1-GGUF",
"maxTokens": 4096,
"reasoning": false,
"requestBody": {
"messages": [
{
"role": "system",
"content": "<important_rules>\n You are in chat mode.\n\n If the user asks to make changes to files offer that they can use the Apply Button on the code block, or switch to Agent Mode to make the suggested updates automatically.\n If needed concisely explain to the user they can switch to agent mode using the Mode Selector dropdown and provide no other details.\n\n Always include the language and file name in the info string when you write code blocks.\n If you are editing \"src/main.py\" for example, your code block should start with ' src/main.py'\n\n When addressing code modification requests, present a concise code snippet that\n emphasizes only the necessary changes and uses abbreviated placeholders for\n unmodified sections. For example:\n\n /path/to/file\n // ... existing code ...\n\n {{ modified code here }}\n\n // ... existing code ...\n\n {{ another modification }}\n\n // ... rest of code ...\n \n\n In existing files, you should always restate the function or class that the snippet belongs to:\n\n /path/to/file\n // ... existing code ...\n\n function exampleFunction() {\n // ... existing code ...\n\n {{ modified code here }}\n\n // ... rest of function ...\n }\n\n // ... rest of code ...\n \n\n Since users have access to their complete file, they prefer reading only the\n relevant modifications. It's perfectly acceptable to omit unmodified portions\n at the beginning, middle, or end of files using these \"lazy\" comments. Only\n provide the complete file when explicitly requested. Include a concise explanation\n of changes unless the user specifically asks for code only.\n\n</important_rules>"
},
{
"role": "user",
"content": "hello"
}
],
"model": "Qwen3-Coder-30B-A3B-Instruct-RTPurbo-i1-GGUF",
"max_tokens": 4096,
"stream": true
}
}
Before submitting your bug report
Relevant environment info
Description
Responses from any LLM model running on LocalAI does not appear to be picked up by Continue.
Inspecting the requests via Wireshark, I get this:
{ "messages": [ { "role": "system", "content": "<important_rules>\n You are in chat mode.\n\n If the user asks to make changes to files offer that they can use the Apply Button on the code block, or switch to Agent Mode to make the suggested updates automatically.\n If needed concisely explain to the user they can switch to agent mode using the Mode Selector dropdown and provide no other details.\n\n Always include the language and file name in the info string when you write code blocks.\n If you are editing \"src/main.py\" for example, your code block should start with '```python src/main.py'\n\n When addressing code modification requests, present a concise code snippet that\n emphasizes only the necessary changes and uses abbreviated placeholders for\n unmodified sections. For example:\n\n ```language /path/to/file\n // ... existing code ...\n\n {{ modified code here }}\n\n // ... existing code ...\n\n {{ another modification }}\n\n // ... rest of code ...\n ```\n\n In existing files, you should always restate the function or class that the snippet belongs to:\n\n ```language /path/to/file\n // ... existing code ...\n\n function exampleFunction() {\n // ... existing code ...\n\n {{ modified code here }}\n\n // ... rest of function ...\n }\n\n // ... rest of code ...\n ```\n\n Since users have access to their complete file, they prefer reading only the\n relevant modifications. It's perfectly acceptable to omit unmodified portions\n at the beginning, middle, or end of files using these \"lazy\" comments. Only\n provide the complete file when explicitly requested. Include a concise explanation\n of changes unless the user specifically asks for code only.\n\n</important_rules>" }, { "role": "user", "content": "Hello" } ], "model": "Qwen3-Coder-30B-A3B-Instruct-RTPurbo-i1-GGUF", "max_tokens": 4096, "stream": true, "stream_options": { "include_usage": true } }This appears to be correct, and putting the inspected request into something like Postman also correctly returns the streamed response, and, if streaming is disabled, the entire response when it is completed.
Shortly after the initial request from Continue, Wireshark does pick up the streamed response, and inspecting them all does reveal chunks of the response, but they never seem to reach Continue itself.
Continue does appear to wait until the entire response has been streamed, so it must understand some parts of the incoming data.
I would've tested without streaming in Continue, but it does not appear to have any configuration options for disabling streaming.
To reproduce
Log output