improvement(memory): replace unbounded server caches with lru-cache to fix heap growth (#4652)
* fix(memory): prune toolSchemaCache and semaphores to prevent heap growth
toolSchemaCache (lib/copilot/chat/payload.ts): module-level Map keyed by
userId:workspaceId never deleted expired entries, only checked TTL on read.
With 100K+ unique user/workspace pairs each holding 50-200KB of tool schemas,
this was the primary driver of the 24MB -> 25GB heap growth observed in
CloudWatch. Add a setInterval sweep every 30s (matching the TTL) with .unref()
so it does not prevent graceful shutdown.
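A minimal sketch of the sweep described above. It assumes each entry stores an absolute `expiresAt` timestamp. `TtlEntry`, `sweepExpired`, and `TTL_MS` are hypothetical names, not the code in payload.ts.

```typescript
// Hypothetical entry shape; the real ToolSchemaCacheEntry may differ.
interface TtlEntry<V> {
  value: V
  expiresAt: number // epoch ms after which the entry is stale
}

const TTL_MS = 30_000

// Module-level cache keyed by `${userId}:${workspaceId}`.
const cache = new Map<string, TtlEntry<unknown>>()

// Delete every entry whose TTL has elapsed and return how many were removed.
function sweepExpired(map: Map<string, TtlEntry<unknown>>, now = Date.now()): number {
  let removed = 0
  for (const [key, entry] of map) {
    if (entry.expiresAt <= now) {
      map.delete(key) // deleting the current key during Map iteration is safe
      removed++
    }
  }
  return removed
}

// Sweep on the same cadence as the TTL. unref() keeps this timer from
// holding the Node.js event loop open during graceful shutdown.
const sweepTimer = setInterval(() => sweepExpired(cache), TTL_MS)
sweepTimer.unref()
```

Without the sweep, an entry that is never read again is never deleted, which is the accumulation pattern this commit describes.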
semaphores (lib/core/async-jobs/backends/database.ts): acquireSlot created
Semaphore entries that releaseSlot never deleted. With per-execution UUID keys
(e.g. scheduleJobId), each scheduled workflow run would add a permanent entry.
Store the concurrency limit on the Semaphore struct and delete the entry from
the Map when all slots are free and no waiters remain.
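A minimal sketch of a keyed semaphore that removes its own Map entry once idle. It assumes the slot is handed directly to the next waiter on release. `Semaphore`, `getOrCreate`, `acquireSlot`, and `releaseSlot` here are illustrative and are not the code in database.ts.

```typescript
// Hypothetical Semaphore record. The limit is stored on the struct so
// releaseSlot can tell when every slot is free again.
interface Semaphore {
  limit: number
  active: number
  waiters: Array<() => void>
}

const semaphores = new Map<string, Semaphore>()

function getOrCreate(key: string, limit: number): Semaphore {
  let sem = semaphores.get(key)
  if (!sem) {
    sem = { limit, active: 0, waiters: [] }
    semaphores.set(key, sem)
  }
  return sem
}

async function acquireSlot(key: string, limit: number): Promise<void> {
  const sem = getOrCreate(key, limit)
  if (sem.active < sem.limit) {
    sem.active++
    return
  }
  // Queue until releaseSlot hands this caller a slot directly.
  await new Promise<void>((resolve) => sem.waiters.push(resolve))
}

function releaseSlot(key: string): void {
  const sem = semaphores.get(key)
  if (!sem) return
  const next = sem.waiters.shift()
  if (next) {
    // Transfer the slot to the next waiter; the active count is unchanged.
    next()
    return
  }
  sem.active--
  // All slots free and nobody waiting: drop the entry so per-execution
  // keys (e.g. a scheduleJobId) do not accumulate forever.
  if (sem.active === 0) semaphores.delete(key)
}
```

In this shape, the waiter queue is already empty whenever the counter is decremented. A separate `waiters.length === 0` check before the delete would therefore be redundant.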
validatorCache (lib/copilot/tools/server/generated-schema.ts): confirmed
bounded (93 tools x 2 schema kinds = 186 entries max, ~2-9MB). No fix needed.
isolated-vm nativeContexts: confirmed as deferred GC, reclaimed by worker
rotation at MAX_EXECUTIONS_PER_WORKER=200. externalMB spikes trace to
concurrent isolate heaps at peak load (128MB limit x active isolates), not a
reference leak. No fix needed.
* fix(memory): prune effectiveEnvCache and instrument cache sizes in telemetry
effectiveEnvCache (lib/environment/utils.ts): same unbounded accumulation
pattern as toolSchemaCache — module-level Map keyed by userId:workspaceId
with a 15s TTL that is only checked on read, never proactively evicted.
Adds a periodic sweep matching the TTL interval with .unref().
cache-registry (lib/monitoring/cache-registry.ts): lightweight registry
so modules can expose their cache sizes to telemetry without coupling.
toolSchemaCache and effectiveEnvCache both register on module load.
memory-telemetry: emits cacheSizes in every Memory snapshot log so
CloudWatch can confirm the caches stay bounded post-deploy.
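A minimal sketch of a size-only cache registry plus a snapshot that includes it. It assumes the registry only needs a `.size` property, which Map and lru-cache's LRUCache both expose. `registerCache`, `getCacheSizes`, `memorySnapshot`, and the `heapUsedMB` field are hypothetical. The snapshot fields do not describe the actual log format.

```typescript
// Anything with a numeric .size works: Map, Set, or an LRUCache instance.
interface SizedCache {
  readonly size: number
}

// Hypothetical registry (name -> cache). Modules register at load time so
// telemetry can read sizes without importing those modules.
const registry = new Map<string, SizedCache>()

function registerCache(name: string, cache: SizedCache): void {
  registry.set(name, cache)
}

function getCacheSizes(): Record<string, number> {
  const sizes: Record<string, number> = {}
  for (const [name, cache] of registry) sizes[name] = cache.size
  return sizes
}

// Sketch of a memory snapshot record that carries cache sizes.
function memorySnapshot() {
  const mem = process.memoryUsage()
  const toMB = (bytes: number) => Math.round(bytes / 1024 / 1024)
  return {
    heapUsedMB: toMB(mem.heapUsed),
    externalMB: toMB(mem.external),
    cacheSizes: getCacheSizes(),
  }
}

// Usage from a cache-owning module (a Map stands in for the real cache):
const toolSchemaCache = new Map<string, unknown>()
registerCache('toolSchemaCache', toolSchemaCache)
```

The registry holds live references, so each snapshot reads the current size rather than a value captured at registration time.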
* improvement(memory): replace manual TTL Maps with lru-cache for toolSchemaCache and effectiveEnvCache
Replaces the homegrown Map + setInterval sweep pattern with LRUCache from
the lru-cache npm package, which is the standard Node.js solution for
bounded in-process caching with TTL.
Changes per cache:
- Removes manual ToolSchemaCacheEntry / EffectiveEnvCacheEntry types
- Removes setInterval sweep timers (and the .unref() boilerplate)
- Removes the two-phase promise->value entry update inside the IIFE
- Stores Promise<T> directly — in-flight and resolved states share one type
- max: 200 (toolSchemaCache) / max: 500 (effectiveEnvCache) as hard ceilings
- TTL behaviour and concurrent-request deduplication are preserved exactly
- cache-registry .size reporting works unchanged via lru-cache's .size prop
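A minimal sketch of storing `Promise<T>` directly so that in-flight and resolved lookups share one entry. It is written against a small interface that both a plain Map and lru-cache's `LRUCache` satisfy, so it runs without the package installed. `PromiseCache`, `getOrLoad`, and `ToolSchemas` are hypothetical names. Evicting rejected promises is an assumption of this sketch, not something the commit states.

```typescript
// Minimal subset of the cache API this helper needs. LRUCache's
// get/set/delete fit these shapes, and so does a plain Map.
interface PromiseCache<V> {
  get(key: string): Promise<V> | undefined
  set(key: string, value: Promise<V>): unknown
  delete(key: string): unknown
}

// Return the cached promise for `key`, or start load() and cache its promise
// immediately. Concurrent callers share one in-flight request, and later
// callers reuse the resolved value until the entry expires or is evicted.
function getOrLoad<V>(cache: PromiseCache<V>, key: string, load: () => Promise<V>): Promise<V> {
  const hit = cache.get(key)
  if (hit) return hit
  const pending = load()
  cache.set(key, pending)
  // Assumption: failures should not be cached, so evict a rejected promise
  // (only if it is still the current entry for this key).
  pending.catch(() => {
    if (cache.get(key) === pending) cache.delete(key)
  })
  return pending
}

// With lru-cache (v10+), the bounded cache might be constructed as:
//   import { LRUCache } from 'lru-cache'
//   const toolSchemaCache = new LRUCache<string, Promise<ToolSchemas>>({ max: 200, ttl: 30_000 })
```

Because the promise is cached before it settles, there is no separate "update the entry with the resolved value" step. That step corresponds to the two-phase IIFE update listed above as removed.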
* fix(memory): remove redundant waiters guard in releaseSlot