loop detection in chat#3013
Conversation
There was a problem hiding this comment.
Pull request overview
This pull request implements loop detection for chat tool calling to prevent infinite loops, particularly for the Gemini model family. The feature detects two types of loops: repeated tool calls with the same arguments, and repeated text in model responses.
Changes:
- Added loop detection functions with configurable thresholds for tool call loops and text loops
- Integrated loop detection into the main tool calling loop and inline chat, only activating for Gemini family models
- Added telemetry tracking and new metadata fields to distinguish between loop-detected exits and iteration-limit exits
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| src/extension/prompt/common/toolCallRound.ts | Adds detectToolCallLoop and detectTextLoop functions with heuristic-based loop detection logic |
| src/extension/intents/node/toolCallingLoop.ts | Integrates loop detection into the main tool calling loop, adds telemetry events and handlers for detected loops |
| src/extension/inlineChat/node/inlineChatIntent.ts | Integrates tool call loop detection into inline chat with Gemini-specific handling |
| src/extension/prompt/common/conversation.ts | Extends IResultMetadata interface with new fields for tracking loop detection exit reasons |
| src/extension/prompt/node/chatParticipantTelemetry.ts | Updates telemetry signature to support new 'toolLoop' response type |
| src/extension/prompt/node/defaultIntentRequestHandler.ts | Updates telemetry handling to use the new toolCallExitReason metadata field |
Comments suppressed due to low confidence (1)
src/extension/intents/node/toolCallingLoop.ts:204
- This use of variable 'lastResult' always evaluates to true.
if (textLoopDetection && lastResult) {
|
|
||
| function splitSentences(text: string): string[] { | ||
| return text | ||
| .split(/[\.\!\?\n\r]+/g) |
There was a problem hiding this comment.
The regex pattern contains unescaped special characters in a character class. In regex character classes, the exclamation mark and question mark don't need escaping, but the period does. The current pattern /[\.\!\?\n\r]+/g should be /[.!?\n\r]+/g - the backslashes before ! and ? are unnecessary (though harmless), but the period is correctly escaped. However, for clarity and consistency with regex best practices, consider using /[.!?\n\r]+/g to remove unnecessary escapes.
| .split(/[\.\!\?\n\r]+/g) | |
| .split(/[.!?\n\r]+/g) |
| export function detectToolCallLoop(toolCallRounds: readonly IToolCallRound[]): ToolCallLoopDetectionResult | undefined { | ||
| const allCalls: IToolCall[] = []; | ||
| for (const round of toolCallRounds) { | ||
| if (!round.toolCalls.length) { | ||
| continue; | ||
| } | ||
| for (const call of round.toolCalls) { | ||
| allCalls.push(call); | ||
| } | ||
| } | ||
|
|
||
| // Require a minimum number of calls overall before we even consider this a loop. | ||
| const minTotalCalls = 12; | ||
| if (allCalls.length < minTotalCalls) { | ||
| return undefined; | ||
| } | ||
|
|
||
| // Look at a sliding window of the most recent calls to see if | ||
| // the model is bouncing between the same one or two tool invocations. | ||
| const windowSize = 20; | ||
| const recent = allCalls.slice(-Math.min(windowSize, allCalls.length)); | ||
| if (recent.length < minTotalCalls) { | ||
| return undefined; | ||
| } | ||
|
|
||
| const toolCountsWindow: Record<string, number> = Object.create(null); | ||
| for (const call of recent) { | ||
| const key = `${call.name}:${call.arguments}`; | ||
| toolCountsWindow[key] = (toolCountsWindow[key] || 0) + 1; | ||
| } | ||
|
|
||
| const keys = Object.keys(toolCountsWindow); | ||
| const uniqueToolKeyCount = keys.length; | ||
| if (uniqueToolKeyCount === 0) { | ||
| return undefined; | ||
| } | ||
|
|
||
| // We only consider it a loop if the recent window is dominated by | ||
| // one or two repeating tool+argument combinations. | ||
| const maxKeyCount = keys.reduce((max, key) => Math.max(max, toolCountsWindow[key]), 0); | ||
| const maxDistinctKeys = 2; | ||
| const minRepeatsForLoop = 6; | ||
| if (uniqueToolKeyCount <= maxDistinctKeys && maxKeyCount >= minRepeatsForLoop) { | ||
| return { | ||
| toolCountsWindow, | ||
| windowSize: recent.length, | ||
| uniqueToolKeyCount, | ||
| maxKeyCount, | ||
| totalToolCallRounds: toolCallRounds.length, | ||
| totalToolCalls: allCalls.length, | ||
| }; | ||
| } | ||
|
|
||
| return undefined; | ||
| } | ||
|
|
||
| export interface ITextLoopDetectionResult { | ||
| readonly repeatCount: number; | ||
| readonly totalSentences: number; | ||
| readonly totalRounds: number; | ||
| readonly responseLength: number; | ||
| } | ||
|
|
||
| function splitSentences(text: string): string[] { | ||
| return text | ||
| .split(/[\.\!\?\n\r]+/g) | ||
| .map(s => s.trim()) | ||
| .filter(s => s.length > 0); | ||
| } | ||
|
|
||
| function normalizeSentence(sentence: string): string { | ||
| return sentence | ||
| .toLowerCase() | ||
| .replace(/[^a-z0-9]+/g, ' ') | ||
| .trim(); | ||
| } | ||
|
|
||
| export function detectTextLoop(toolCallRounds: readonly IToolCallRound[]): ITextLoopDetectionResult | undefined { | ||
| const lastRound = toolCallRounds.at(-1); | ||
| if (!lastRound) { | ||
| return undefined; | ||
| } | ||
|
|
||
| const response = lastRound.response; | ||
| const minResponseLength = 200; | ||
| if (!response || response.length < minResponseLength) { | ||
| return undefined; | ||
| } | ||
|
|
||
| const sentences = splitSentences(response); | ||
| if (sentences.length < 3) { | ||
| return undefined; | ||
| } | ||
|
|
||
| const sentenceCounts: Record<string, number> = Object.create(null); | ||
| let maxCount = 0; | ||
| for (const sentence of sentences) { | ||
| const normalized = normalizeSentence(sentence); | ||
| if (normalized.length < 30) { | ||
| continue; | ||
| } | ||
| const count = (sentenceCounts[normalized] || 0) + 1; | ||
| sentenceCounts[normalized] = count; | ||
| if (count > maxCount) { | ||
| maxCount = count; | ||
| } | ||
| } | ||
|
|
||
| const minRepeatsForTextLoop = 3; | ||
| if (maxCount >= minRepeatsForTextLoop) { | ||
| return { | ||
| repeatCount: maxCount, | ||
| totalSentences: sentences.length, | ||
| totalRounds: toolCallRounds.length, | ||
| responseLength: response.length, | ||
| }; | ||
| } | ||
|
|
||
| return undefined; | ||
| } |
There was a problem hiding this comment.
The new loop detection functions detectToolCallLoop and detectTextLoop contain complex heuristics with multiple thresholds and conditions, but lack test coverage. Consider adding unit tests to verify the loop detection logic works correctly across various scenarios, such as:
- Detecting loops with different numbers of repeated tool calls
- Not triggering false positives with legitimate tool call sequences
- Correctly identifying repeated sentences in text
- Handling edge cases like empty inputs, short sequences, etc.
|
|
||
| this.toolCallRounds.push(result.round); | ||
| const loopDetection = isGeminiFamily ? detectToolCallLoop(this.toolCallRounds) : undefined; | ||
| if (loopDetection && lastResult) { |
There was a problem hiding this comment.
This use of variable 'lastResult' always evaluates to true.
This issue also appears in the following locations of the same file:
- line 204
See below for a potential fix:
if (loopDetection) {
lastResult = this.hitToolCallLoop(outputStream, lastResult, loopDetection);
break;
}
const textLoopDetection = isGeminiFamily ? detectTextLoop(this.toolCallRounds) : undefined;
if (textLoopDetection) {
lastResult = this.hitTextLoop(outputStream, lastResult, textLoopDetection);
break;
}
if (!result.round.toolCalls.length || result.response.type !== ChatFetchResponseType.Success) {
break;
}
} catch (e) {
if (isCancellationError(e) && lastResult) {
There was a problem hiding this comment.
Pull request overview
Adds loop-detection heuristics to the tool-calling flow (with Gemini-specific gating) and reports loop outcomes via chat/tool-calling telemetry so the extension can better detect and diagnose runaway tool iterations.
Changes:
- Add tool-call loop detection and repeated-text loop detection helpers.
- Abort tool-calling early on detected loops (Gemini family) and annotate result metadata / emit telemetry.
- Extend tool-calling telemetry to include a new terminal response type (
toolLoop) and plumb it through inline chat + default handler.
Show a summary per file
| File | Description |
|---|---|
| src/extension/prompt/node/defaultIntentRequestHandler.ts | Maps tool-call exit metadata to a telemetry response type (toolLoop vs maxToolCalls). |
| src/extension/prompt/node/chatParticipantTelemetry.ts | Extends sendToolCallingTelemetry() responseType union to include toolLoop. |
| src/extension/prompt/common/toolCallRound.ts | Introduces detectToolCallLoop() and detectTextLoop() heuristics. |
| src/extension/prompt/common/conversation.ts | Adds metadata fields for tool-call exit reason and text-loop detection. |
| src/extension/intents/node/toolCallingLoop.ts | Runs loop detection (Gemini) during tool iterations; emits telemetry/logs and sets result metadata on abort. |
| src/extension/inlineChat/node/inlineChatIntent.ts | Adds loop detection to inline chat tool strategy and reports toolLoop outcome. |
Copilot's findings
Comments suppressed due to low confidence (2)
src/extension/intents/node/toolCallingLoop.ts:396
- These error logs include
detection.toolCountsWindow, which is keyed bytool name + arguments. Tool arguments can contain file paths or user content, so logging the raw window risks leaking sensitive data into logs and can be very large. Consider logging only tool names/counts, or a redacted/hashed/truncated form.
this._logService.error(`Tool calling loop detected for conversation ${this.options.conversation.sessionId} (model: ${this.options.request.model?.id ?? 'unknown'})`);
this._logService.error(`Tool calling loop window: ${JSON.stringify(detection.toolCountsWindow)}`);
src/extension/intents/node/toolCallingLoop.ts:416
toolCountsWindowis serialized from keys that include tool arguments and sent in telemetry. Tool arguments can include user/workspace data, so this is likely not "SystemMetaData" and may violate telemetry data classification. Consider removing arguments from the payload (e.g. aggregate by tool name) or sending a hashed/redacted representation with a strict size cap.
this._telemetryService.sendMSFTTelemetryEvent('toolCalling.loopDetected', {
model: this.options.request.model?.id,
conversationId: this.options.conversation.sessionId,
toolCountsWindow: JSON.stringify(detection.toolCountsWindow),
}, {
windowSize: detection.windowSize,
- Files reviewed: 6/6 changed files
- Comments generated: 4
| const toolCountsWindow: Record<string, number> = Object.create(null); | ||
| for (const call of recent) { | ||
| const key = `${call.name}:${call.arguments}`; | ||
| toolCountsWindow[key] = (toolCountsWindow[key] || 0) + 1; | ||
| } |
There was a problem hiding this comment.
detectToolCallLoop builds its window key from call.arguments, which can be very large and may include user/workspace content. This risks high memory/CPU overhead (long string keys) and can feed sensitive data into downstream logging/telemetry that consumes toolCountsWindow. Consider using only call.name, or a redacted/hashed/truncated representation of arguments.
| const endpoint = await this._endpointProvider.getChatEndpoint(this.options.request); | ||
| const isGeminiFamily = endpoint.family.toLowerCase().includes('gemini'); |
There was a problem hiding this comment.
This adds an extra getChatEndpoint() call at the start of run() just to compute isGeminiFamily, but runOne() later resolves the endpoint again (e.g. for token counting). Consider caching the resolved endpoint on the loop instance (or passing it into runOne) to avoid repeated async resolution and model metadata fetch/logging.
This issue also appears in the following locations of the same file:
- line 394
- line 411
| const metadata = result.chatResult.metadata as IResultMetadata | undefined; | ||
| if (metadata?.maxToolCallsExceeded) { | ||
| const responseType = metadata.toolCallExitReason === 'loopDetected' ? 'toolLoop' : 'maxToolCalls'; | ||
| loop.telemetry.sendToolCallingTelemetry(result.toolCallRounds, result.availableTools, responseType); |
There was a problem hiding this comment.
When hitTextLoop() sets metadata.textLoopDetected, DefaultIntentRequestHandler never calls sendToolCallingTelemetry() because it only checks for maxToolCallsExceeded (or a terminal non-tool response). That means tool-calling telemetry/details are skipped for text-loop aborts. Consider also sending tool-calling telemetry when textLoopDetected is true (and pick an appropriate responseType).
See below for a potential fix:
const sentTerminalToolCallingTelemetry = !result.round.toolCalls.length || result.response.type !== ChatFetchResponseType.Success;
if (sentTerminalToolCallingTelemetry) {
loop.telemetry.sendToolCallingTelemetry(result.toolCallRounds, result.availableTools, this.token.isCancellationRequested ? 'cancelled' : result.response.type);
}
result.chatResult ??= {};
const metadata = result.chatResult.metadata as IResultMetadata | undefined;
if (!sentTerminalToolCallingTelemetry && (metadata?.maxToolCallsExceeded || metadata?.textLoopDetected)) {
const responseType = metadata.textLoopDetected
? 'textLoop'
: metadata.toolCallExitReason === 'loopDetected' ? 'toolLoop' : 'maxToolCalls';
| this._logService.error(`Inline chat tool calling loop detected (model: ${endpoint.model})`); | ||
| this._logService.error(`Inline chat tool calling loop window: ${JSON.stringify(loopDetection.toolCountsWindow)}`); | ||
| didDetectLoop = true; | ||
| break; | ||
| } |
There was a problem hiding this comment.
Inline chat logs include loopDetection.toolCountsWindow (keyed by tool name + arguments). Tool arguments can contain user/workspace content and the JSON can be large; consider redacting/hashing/truncating, or logging only tool names/counts.
No description provided.