feat: add graceful timeout and early session links for Slack/Discord webhooks #755
kilo-code-bot[bot] wants to merge 1 commit into main
…webhooks

Add a 750s internal timeout (below Vercel's 800s maxDuration) to both the Slack and Discord webhook handlers so users get a friendly message instead of a silent failure when Cloud Agent sessions run long. When the timeout fires, post a message explaining the session is still running and swap the processing reaction to complete.

Also send the Cloud Agent session link as soon as the session is created (not after it finishes). For Slack this is an ephemeral message with a View Session button; for Discord it's a regular reply with the session URL.
    const timeoutMessage = truncateForDiscord(
      'The Cloud Agent session is taking longer than expected. ' +
        'Our endpoint is shutting down, but the session is still running. ' +
        'Check the session link above to follow along.'
[WARNING]: Race condition — session link may not be posted yet when timeout message references it
The onCloudAgentSessionCreated callback fires postSessionLinkMessage as a fire-and-forget promise. That function performs 3 async operations (getInstallationByGuildId, getDbSessionIdFromCloudAgentId, postDiscordMessage) before the link actually appears in the channel.
If the timeout fires shortly after the session is created, this message says "Check the session link above to follow along" but the link message may not have been posted yet. The user would see the timeout message first, then the link message second — or the link message might fail entirely, leaving the user with no way to follow along.
Consider either:
- Awaiting the session link promise before posting the timeout message (if sessionLinkSent is true)
- Changing the wording to not assume the link is "above"
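A minimal sketch of the first option: keep a handle on the link-posting promise instead of discarding it, then await it before sending the timeout message. The names createSessionLinkTracker, postTimeoutNotice, and the Poster type are hypothetical stand-ins introduced for illustration; the real postSessionLinkMessage and postDiscordMessage signatures are not shown in this diff.

```typescript
// Hypothetical stand-in for a "post text to the channel" helper.
type Poster = (text: string) => Promise<void>;

function createSessionLinkTracker(postLink: Poster) {
  let pending: Promise<boolean> | null = null;

  return {
    // Call from onCloudAgentSessionCreated: still non-blocking, but the promise is kept.
    onSessionCreated(url: string): void {
      pending = postLink(url).then(
        () => true,
        () => false // a failed post must not reject the timeout path
      );
    },
    // Resolves true only if the link message was actually posted.
    async linkPosted(): Promise<boolean> {
      return pending === null ? false : await pending;
    },
  };
}

async function postTimeoutNotice(
  tracker: { linkPosted(): Promise<boolean> },
  postMessage: Poster
): Promise<void> {
  // Wait for any in-flight link post so the messages arrive in order.
  const hasLink = await tracker.linkPosted();
  const base =
    'The Cloud Agent session is taking longer than expected. ' +
    'Our endpoint is shutting down, but the session is still running.';
  // Only refer to the link "above" when it is known to be there.
  await postMessage(hasLink ? `${base} Check the session link above to follow along.` : base);
}
```

This combines both mitigations: ordering is preserved when the link post succeeds, and the wording falls back when it does not. Awaiting the link on the timeout path spends part of the remaining budget before the platform limit, so the real handler may want to bound that wait as well.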
    const timeoutMessage = markdownToSlackMrkdwn(
      'The Cloud Agent session is taking longer than expected. ' +
        'Our endpoint is shutting down, but the session is still running. ' +
        'Check the session link above to follow along.'
[WARNING]: Same race condition as Discord — the ephemeral session link may not have been delivered yet when this message references it
The onCloudAgentSessionCreated callback fires postSessionLinkEphemeral as fire-and-forget. If the timeout fires shortly after session creation, the user sees "Check the session link above" before the ephemeral link is actually posted.
Same mitigation options as the Discord side: await the pending link promise before posting the timeout message, or adjust the wording.
    ]);

    console.log(`[SlackBot:Webhook] processSlackMessage (${event.type}) timed out gracefully`);
    return;
[WARNING]: logSlackBotRequest is skipped on the timeout path
The early return here means timed-out requests are never logged via logSlackBotRequest (which runs at line 354). This creates a blind spot in admin debugging — the longest-running requests (the ones most likely to need investigation) won't appear in the logs.
Consider adding a logSlackBotRequest call with status: 'timeout' before returning.
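One way to make sure no exit path skips the log is to record the outcome in a variable and log from a finally block. The sketch below is illustrative only: handleWithLogging, RequestStatus, and the LogFn entry shape are assumptions, since the real logSlackBotRequest signature is not shown in this PR.

```typescript
// Hypothetical shapes for illustration; not the real logSlackBotRequest API.
type RequestStatus = 'success' | 'error' | 'timeout';
type LogFn = (entry: { status: RequestStatus; durationMs: number }) => Promise<void>;

async function handleWithLogging(
  run: () => Promise<'done' | 'timeout'>,
  logRequest: LogFn
): Promise<void> {
  const startedAt = Date.now();
  let status: RequestStatus = 'error'; // assume failure until the run reports otherwise
  try {
    const outcome = await run();
    status = outcome === 'timeout' ? 'timeout' : 'success';
  } finally {
    // Runs on success, timeout, and thrown errors alike.
    await logRequest({ status, durationMs: Date.now() - startedAt });
  }
}
```

With this shape an early return on the timeout path cannot bypass logging, and the recorded duration shows how close the request came to the platform limit.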
    /**
     * Build the session URL for a cloud agent session based on the owner type
     */
    function buildSessionUrl(dbSessionId: string, owner: Owner): string {
[SUGGESTION]: buildSessionUrl and getDbSessionIdFromCloudAgentId are now duplicated identically between the Discord and Slack webhook routes
Both functions have the exact same implementation. Consider extracting them into a shared module (e.g. src/lib/session-url.ts or similar) to keep them in sync.
    });

    const timeoutPromise = new Promise<null>(resolve => {
      setTimeout(() => resolve(null), ENDPOINT_TIMEOUT_MS);
[SUGGESTION]: The setTimeout timer is never cleared on the normal (non-timeout) path
When botPromise resolves before the timeout, the 750-second timer continues running in the background. On a serverless platform this is unlikely to cause real issues (the function will be killed), but it's good hygiene to clear it:
    let timeoutId: ReturnType<typeof setTimeout>;
    const timeoutPromise = new Promise<null>(resolve => {
      timeoutId = setTimeout(() => resolve(null), ENDPOINT_TIMEOUT_MS);
    });
    const result = await Promise.race([botPromise, timeoutPromise]);
    clearTimeout(timeoutId!);

Same applies to the Slack handler at src/app/slack/webhook/route.ts:265.
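Since both handlers need the same race-and-clear logic, it could also live in one small helper. This is a sketch only: raceWithTimeout and the TIMED_OUT sentinel are hypothetical names, not part of the PR. It uses a symbol rather than null so a timeout cannot be confused with a handler that legitimately resolves to null.

```typescript
// Hypothetical helper for illustration; not part of the PR.
const TIMED_OUT = Symbol('timed-out');

async function raceWithTimeout<T>(
  work: Promise<T>,
  timeoutMs: number
): Promise<T | typeof TIMED_OUT> {
  let timeoutId: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<typeof TIMED_OUT>(resolve => {
    timeoutId = setTimeout(() => resolve(TIMED_OUT), timeoutMs);
  });
  try {
    return await Promise.race([work, timeout]);
  } finally {
    // Clears the timer whether work resolved, rejected, or the timeout won.
    clearTimeout(timeoutId);
  }
}
```

A caller would then branch on the sentinel, for example `if (result === TIMED_OUT) { /* post the timeout message */ }`, passing ENDPOINT_TIMEOUT_MS as timeoutMs.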
Code Review Summary
Status: 5 Issues Found | Recommendation: Address before merge
Files Reviewed (4 files)
Summary
maxDuration) so users receive a friendly notification instead of a silent failure when Cloud Agent sessions run long

Changes

src/app/slack/webhook/route.ts
- ENDPOINT_TIMEOUT_MS constant (750s)
- processKiloBotMessage against the timeout; on timeout, post a friendly message and swap reactions
- onCloudAgentSessionCreated callback as soon as session is spawned (moved from post-completion)

src/app/discord/webhook/route.ts
- ENDPOINT_TIMEOUT_MS constant (750s)
- buildSessionUrl, getDbSessionIdFromCloudAgentId, and postSessionLinkMessage helpers (mirrors Slack's existing pattern)
- processDiscordBotMessage against the timeout; on timeout, post a friendly message and swap reactions
- onCloudAgentSessionCreated callback as soon as session is spawned

src/lib/slack-bot.ts
- processKiloBotMessage to accept an optional onCloudAgentSessionCreated callback
- cloudAgentSessionId is set during tool execution

src/lib/discord-bot.ts
- processDiscordBotMessage to accept an optional onCloudAgentSessionCreated callback
- cloudAgentSessionId is set during tool execution

Built for Remon Oldenbeuving by Kilo for Slack