
Add security agent auto-analysis queue system#625

Open
jeanduplessis wants to merge 5 commits into main from jdp/security-agent-auto-analysis



@jeanduplessis jeanduplessis commented Feb 26, 2026

Summary

  • Adds a queue-based auto-analysis pipeline that automatically triages and analyzes security findings based on configurable severity thresholds
  • Introduces a Cloudflare Worker (cloudflare-security-auto-analysis) that dispatches due owners on a per-minute cron, claims queued findings with pessimistic locking (FOR UPDATE SKIP LOCKED), runs LLM triage to filter noise, then launches full analysis sessions via cloud-agent-next
  • New DB tables security_analysis_queue and security_analysis_owner_state (migration 0038)
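A minimal sketch of what a `FOR UPDATE SKIP LOCKED` claim could look like. The column names (`owner_id`, `status`, `created_at`, `claimed_at`, `finding_id`), the `'queued'` to `'pending'` transition, and the `buildClaimQuery` helper are assumptions for illustration; the actual query lives in `cloudflare-security-auto-analysis/src/db/queries.ts` and may differ.

```typescript
// Hypothetical claim query: column names and status values are assumptions,
// not taken from the PR's schema.
type ClaimQuery = { text: string; values: [string, number] };

function buildClaimQuery(ownerId: string, limit: number): ClaimQuery {
  if (!Number.isInteger(limit) || limit <= 0) {
    throw new RangeError("limit must be a positive integer");
  }
  const text = `
    WITH claimable AS (
      SELECT id
      FROM security_analysis_queue
      WHERE owner_id = $1 AND status = 'queued'
      ORDER BY created_at
      LIMIT $2
      FOR UPDATE SKIP LOCKED  -- rows locked by another consumer are skipped, not waited on
    )
    UPDATE security_analysis_queue AS q
    SET status = 'pending', claimed_at = now(), updated_at = now()
    FROM claimable
    WHERE q.id = claimable.id
    RETURNING q.id, q.finding_id`;
  return { text, values: [ownerId, limit] };
}
```

Selecting and updating in one statement means the row locks and the status change happen together, so two consumers running the same query concurrently should end up with disjoint sets of rows.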

Architecture

The worker connects to Postgres via Hyperdrive using @kilocode/db + Drizzle ORM (matching the cloudflare-security-sync reference pattern). It uses CF service bindings to cloud-agent-next (for launching analysis sessions) and git-token-service (for repo access). A dead letter queue captures permanently failed messages.
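The overall shape of such a worker can be sketched as a cron handler that fans out to a queue and a queue handler that acks or retries each message. The binding name `OWNER_QUEUE`, the `Deps` helpers, and the `createWorker` factory are hypothetical, and the small `*Like` interfaces stand in for the Workers runtime types so the sketch stays self-contained.

```typescript
// Local stand-ins for the Workers runtime types; a real worker would rely on
// the generated worker-configuration.d.ts instead.
interface QueueLike<T> { send(message: T): Promise<void>; }
interface MessageLike<T> { body: T; ack(): void; retry(): void; }
interface BatchLike<T> { messages: MessageLike<T>[]; }

type OwnerMessage = { ownerId: string };

// Binding name is hypothetical.
interface Env { OWNER_QUEUE: QueueLike<OwnerMessage>; }

// Hypothetical helpers: the first would query Postgres via Hyperdrive, the
// second would claim, triage, and launch analysis for one owner.
type Deps = {
  findDueOwners(env: Env): Promise<string[]>;
  processOwner(env: Env, ownerId: string): Promise<void>;
};

function createWorker(deps: Deps) {
  return {
    // Cron trigger: send one message per owner that has queued work.
    async scheduled(_event: unknown, env: Env): Promise<void> {
      const owners = await deps.findDueOwners(env);
      for (const ownerId of owners) await env.OWNER_QUEUE.send({ ownerId });
    },
    // Queue consumer: ack on success, retry on failure. Once the configured
    // retry limit is exhausted, the platform moves the message to the
    // dead letter queue.
    async queue(batch: BatchLike<OwnerMessage>, env: Env): Promise<void> {
      for (const msg of batch.messages) {
        try {
          await deps.processOwner(env, msg.body.ownerId);
          msg.ack();
        } catch {
          msg.retry();
        }
      }
    },
  };
}
```

Injecting `findDueOwners` and `processOwner` keeps the handlers testable without a database or real service bindings.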

Pipeline flow:

  1. Enqueue — findings are queued during Dependabot sync (sync-service.ts) when auto_analysis_enabled is true and the finding meets the auto_analysis_min_severity threshold
  2. Dispatch — a per-minute cron discovers owners with queued work and sends them to a CF Queue
  3. Consume — the queue consumer claims rows per-owner with atomic lease acquisition, resolves an actor (org owner → member fallback), and processes each finding
  4. Triage — an LLM-based Tier 1 triage determines whether sandbox analysis is needed, filtering noise before launching expensive sessions
  5. Launch — eligible findings are analyzed via cloud-agent-next, with InsufficientCreditsError handling that blocks the owner and requeues remaining work
  6. Callback — session results update findings via the existing callback endpoint, with session-mismatch detection to prevent stale callbacks from overwriting newer results
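Steps 3 to 5 can be sketched as a per-owner loop over claimed findings. Only the name `InsufficientCreditsError` comes from the description above; the `Steps` interface, the outcome labels, and the choice to requeue the finding whose launch failed along with the rest are assumptions.

```typescript
type Finding = { id: string; title: string };
type Outcome = "launched" | "skipped" | "requeued";

// Stand-in for the real error class, which is defined elsewhere in the codebase.
class InsufficientCreditsError extends Error {}

// Hypothetical seams for the external calls.
type Steps = {
  triage(f: Finding): Promise<{ needsSandbox: boolean }>; // Tier 1 LLM triage
  launch(f: Finding): Promise<void>;                      // cloud-agent-next session
  blockOwner(ownerId: string): Promise<void>;
};

async function processClaimed(
  ownerId: string,
  findings: Finding[],
  steps: Steps,
): Promise<Map<string, Outcome>> {
  const outcomes = new Map<string, Outcome>();
  for (const [i, finding] of findings.entries()) {
    const { needsSandbox } = await steps.triage(finding);
    if (!needsSandbox) {
      outcomes.set(finding.id, "skipped"); // filtered as noise, no session launched
      continue;
    }
    try {
      await steps.launch(finding);
      outcomes.set(finding.id, "launched");
    } catch (err) {
      if (!(err instanceof InsufficientCreditsError)) throw err;
      // Out of credits: block the owner and hand back everything not yet
      // launched, including the finding that just failed.
      await steps.blockOwner(ownerId);
      for (const rest of findings.slice(i)) outcomes.set(rest.id, "requeued");
      break;
    }
  }
  return outcomes;
}
```

Any error other than `InsufficientCreditsError` is rethrown so that the queue's own retry handling can deal with it.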

Config

  • auto_analysis_enabled (default: false) and auto_analysis_min_severity (default: 'high') on the security agent config
  • auto_analysis_enabled_at timestamp on security_analysis_owner_state prevents retroactive analysis of old findings
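How these three settings might combine at enqueue time, as a hedged sketch. The severity scale (`low` to `critical`), the `shouldEnqueue` helper, and the use of the finding's creation time for the retroactivity check are assumptions; only the config keys and their defaults come from the list above.

```typescript
// Assumed severity scale; the PR may use different levels.
const SEVERITY_RANK = { low: 0, medium: 1, high: 2, critical: 3 } as const;
type Severity = keyof typeof SEVERITY_RANK;

type AutoAnalysisConfig = {
  auto_analysis_enabled: boolean;
  auto_analysis_min_severity: Severity;
};

// Defaults as stated in the PR description.
const DEFAULT_CONFIG: AutoAnalysisConfig = {
  auto_analysis_enabled: false,
  auto_analysis_min_severity: "high",
};

// Hypothetical gate applied when a finding is synced.
function shouldEnqueue(
  config: AutoAnalysisConfig,
  finding: { severity: Severity; createdAt: Date },
  autoAnalysisEnabledAt: Date | null,
): boolean {
  if (!config.auto_analysis_enabled || autoAnalysisEnabledAt === null) return false;
  // Findings older than the opt-in timestamp are never analyzed retroactively.
  if (finding.createdAt.getTime() < autoAnalysisEnabledAt.getTime()) return false;
  return SEVERITY_RANK[finding.severity] >= SEVERITY_RANK[config.auto_analysis_min_severity];
}
```

With the defaults, nothing is enqueued; once an owner opts in, only findings at or above `high` that appeared after the opt-in timestamp would pass this gate.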

Other changes

  • GDPR soft-delete coverage for new tables (user.ts / user.test.ts)
  • CI pipeline and production deploy workflow for the new worker
  • Runbook documentation (docs/security-auto-analysis-runbook.md)
  • Unrelated: removed unused FlatCompat import from root eslint.config.mjs

Post-deployment

No manual steps required. Migration 0038 runs automatically via the run-migrations job. Secrets, Hyperdrive, and service bindings are all shared resources that already exist. The feature is gated behind auto_analysis_enabled (defaults to false), so the cron will be a no-op until an owner opts in.

Verify after first deploy:

  • Confirm that both dead letter queues exist (security-auto-analysis-owner-dlq and security-auto-analysis-owner-dlq-dev). Wrangler should auto-create them, but if not:
    wrangler queues create security-auto-analysis-owner-dlq
    wrangler queues create security-auto-analysis-owner-dlq-dev
    


kilo-code-bot bot commented Feb 26, 2026

Code Review Summary

Status: No New Issues Found | Recommendation: Address existing review comments, then merge

Overview

This is a large, well-structured PR that introduces a new Cloudflare Worker (cloudflare-security-auto-analysis) for automated security finding triage and analysis, along with supporting changes in the main app (queue sync during Dependabot sync, callback handling, GDPR deletion, config plumbing).

The previous review rounds (72 inline comments) identified and addressed a comprehensive set of issues including:

  • CRITICAL: .strict() on Zod schemas rejecting stored configs, YAML structural error in deploy workflow, config upsert regression
  • WARNING: Missing updated_at in DB helpers, non-transactional queue sync, stuck queue rows on various failure paths, TOCTOU races, timing-safe comparison issues, error classification gaps
  • SUGGESTION: Eager secret fetching, blocked-owner filtering in dispatcher, documentation corrections

All flagged issues appear to have been addressed in subsequent commits. No new issues were identified in this review pass.

Files Reviewed (48 files)
  • .github/workflows/ci.yml - CI job for new worker
  • .github/workflows/deploy-production.yml - Production deploy wiring
  • .github/workflows/deploy-security-auto-analysis.yml - New deploy workflow
  • DEVELOPMENT.md - Minor doc update
  • cloudflare-ai-attribution/src/ai-attribution.worker.ts - Removed stale ts-expect-error
  • cloudflare-security-auto-analysis/.gitignore - New
  • cloudflare-security-auto-analysis/README.md - New worker documentation
  • cloudflare-security-auto-analysis/eslint.config.mjs - New
  • cloudflare-security-auto-analysis/package.json - New
  • cloudflare-security-auto-analysis/src/consumer.ts - Queue consumer logic
  • cloudflare-security-auto-analysis/src/db/queries.ts - Worker DB queries
  • cloudflare-security-auto-analysis/src/dispatcher.ts - Cron dispatcher
  • cloudflare-security-auto-analysis/src/index.ts - Worker entrypoint
  • cloudflare-security-auto-analysis/src/launch.ts - Analysis launch logic
  • cloudflare-security-auto-analysis/src/logger.ts - Logger setup
  • cloudflare-security-auto-analysis/src/token.ts - JWT token generation
  • cloudflare-security-auto-analysis/src/triage.ts - LLM triage logic
  • cloudflare-security-auto-analysis/src/types.test.ts - Type tests
  • cloudflare-security-auto-analysis/src/types.ts - Shared types/constants
  • cloudflare-security-auto-analysis/tsconfig.json - New
  • cloudflare-security-auto-analysis/vitest.config.ts - New
  • cloudflare-security-auto-analysis/worker-configuration.d.ts - Type declarations
  • cloudflare-security-auto-analysis/wrangler.jsonc - Worker config
  • cloudflare-webhook-agent-ingest/src/index.ts - Removed stale ts-expect-error
  • eslint.config.mjs - ESLint config cleanup
  • jest.config.ts - Test config update
  • packages/db/src/migrations/0039_naive_yellow_claw.sql - Migration
  • packages/db/src/migrations/meta/0039_snapshot.json - Migration snapshot
  • packages/db/src/migrations/meta/_journal.json - Migration journal
  • packages/db/src/schema.ts - New tables + indexes
  • pnpm-lock.yaml - Lock file
  • pnpm-workspace.yaml - Workspace config
  • src/app/api/internal/security-analysis-callback/[findingId]/route.test.ts - Callback tests
  • src/app/api/internal/security-analysis-callback/[findingId]/route.ts - Callback handler
  • src/lib/security-agent/core/constants.ts - Constants + config parser
  • src/lib/security-agent/core/schemas.ts - Zod schemas
  • src/lib/security-agent/core/types.ts - Core types
  • src/lib/security-agent/db/security-analysis.test.ts - Analysis DB tests
  • src/lib/security-agent/db/security-analysis.ts - Queue sync + analysis DB
  • src/lib/security-agent/db/security-config.ts - Config upsert
  • src/lib/security-agent/db/security-findings.ts - Upsert returns metadata
  • src/lib/security-agent/services/analysis-service.test.ts - Analysis service tests
  • src/lib/security-agent/services/analysis-service.ts - Analysis service
  • src/lib/security-agent/services/sync-service.test.ts - New sync queue tests
  • src/lib/security-agent/services/sync-service.ts - Queue sync integration
  • src/lib/user.test.ts - GDPR deletion tests
  • src/lib/user.ts - GDPR soft-delete
  • src/routers/organizations/organization-security-agent-router.ts - Org router
  • src/routers/security-agent-router.ts - User router

@jeanduplessis jeanduplessis force-pushed the jdp/security-agent-auto-analysis branch 2 times, most recently from 71a84d0 to 7e56799 Compare February 26, 2026 20:12
@jeanduplessis jeanduplessis force-pushed the jdp/security-agent-auto-analysis branch from 7e56799 to e929176 Compare March 2, 2026 10:24
@jeanduplessis jeanduplessis force-pushed the jdp/security-agent-auto-analysis branch from 3289a11 to fdf5974 Compare March 2, 2026 11:21
@jeanduplessis jeanduplessis force-pushed the jdp/security-agent-auto-analysis branch 2 times, most recently from 8d45b7d to c2670ca Compare March 2, 2026 12:03
@jeanduplessis jeanduplessis force-pushed the jdp/security-agent-auto-analysis branch from c2670ca to 8e07fb9 Compare March 2, 2026 14:32
@jeanduplessis jeanduplessis force-pushed the jdp/security-agent-auto-analysis branch from 71ee7d0 to bbc9f85 Compare March 2, 2026 16:47
@jeanduplessis jeanduplessis force-pushed the jdp/security-agent-auto-analysis branch from bbc9f85 to e366fc3 Compare March 3, 2026 08:16
@jeanduplessis jeanduplessis force-pushed the jdp/security-agent-auto-analysis branch from 545453e to 7097c84 Compare March 3, 2026 10:41
@jeanduplessis jeanduplessis force-pushed the jdp/security-agent-auto-analysis branch 2 times, most recently from 41a79e2 to 47f9d7f Compare March 3, 2026 11:06
Cloudflare Worker that automatically triages and analyzes security
findings via a queue-based pipeline. Dispatches due owners on a cron
schedule, claims queued findings per-owner with pessimistic locking,
runs LLM triage to filter noise, then launches full analysis sessions
via cloud-agent-next.

Uses @kilocode/db with Drizzle ORM for all database access through
Hyperdrive, matching the cloudflare-security-sync reference pattern.
Includes DB migration for security_analysis_queue and
security_analysis_owner_state tables, plus indexes on
security_findings for in-flight analysis tracking.
@jeanduplessis jeanduplessis force-pushed the jdp/security-agent-auto-analysis branch from 47f9d7f to 616f9dd Compare March 3, 2026 14:28
- Wrap finalizeAnalysis in try/catch so queue transitions to 'failed' on throw
- Widen transitionAutoAnalysisQueueFromCallback to match 'pending' OR 'running'
- Document fallback CTE previousStatus limitation during concurrent insert races
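The callback-side behavior described in this commit, together with the session-mismatch check from the PR description, can be sketched as a pure state transition. This is an illustration only: the real `transitionAutoAnalysisQueueFromCallback` operates on the database, and the `'queued'` and `'completed'` statuses, the row shape, and the `applyCallback` helper are assumptions.

```typescript
// 'pending', 'running', and 'failed' appear in the commit message;
// 'queued' and 'completed' are assumed.
type QueueStatus = "queued" | "pending" | "running" | "completed" | "failed";
type QueueRow = { status: QueueStatus; sessionId: string | null };
type Callback = { sessionId: string; ok: boolean };

// Returns the updated row, or null when the callback should be ignored.
function applyCallback(row: QueueRow, cb: Callback, finalize: () => void): QueueRow | null {
  // A callback from an older session must not overwrite newer results.
  if (row.sessionId !== cb.sessionId) return null;
  // Accept callbacks for rows that are 'pending' OR 'running'.
  if (row.status !== "pending" && row.status !== "running") return null;
  if (!cb.ok) return { ...row, status: "failed" };
  try {
    finalize();
    return { ...row, status: "completed" };
  } catch {
    // If finalization throws, the row still moves to 'failed' instead of
    // staying stuck in a non-terminal state.
    return { ...row, status: "failed" };
  }
}
```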