feat(telemetry): local JSONL sink with per-session files#931
feat(telemetry): local JSONL sink with per-session files#931
Conversation
LocalJsonlSink is the first concrete TelemetrySink. Each VS Code
session writes its own files at
globalStorage/telemetry/telemetry-YYYY-MM-DD-{sessionId8}.jsonl,
with .N.jsonl size segments. One writer per file means concurrent
windows do not race on appends or rotation, so no lockfile is needed.
Tunables come from coder.telemetry.localJsonl (object-typed, not
registered in package.json yet, same convention as coder.telemetry.level).
Defaults: 15s flush, 100-event batch trigger, 500-event hard cap, 5MB
per file, 30 day age, 100MB total. The sink subscribes via
watchConfigurationChanges so changes apply without a window reload.
cleanupFiles moves from sshProcess.ts to src/util/fileCleanup.ts and
is shared between both. PathResolver gains getTelemetryPath().
Also drops a redundant value re-check in TelemetryService's level
watcher: readLevel already sanitizes its return.
Closes #901
b2cb18e to
f40313f
Compare
…nd tests Move readTelemetryLevel and readLocalJsonlConfig (with the setting key constants and defaults) into src/settings/telemetry.ts, matching the existing notifications and cli settings pattern. Both functions take a WorkspaceConfiguration argument so they are easy to test. LocalJsonlSink keeps its static start() factory but the constructor is now side-effect-free: it just assigns fields. start() is the single place that reads the config, subscribes the watcher, schedules the flush timer, and kicks off cleanup. The class implements vscode.Disposable explicitly. Tighten comments on the sink and on cleanupFiles. Inline #fileName into #doFlush since it had only one caller. Tests: per-test setup() harness instead of beforeEach-installed helpers. Module-level state is gone, all setup happens inside each test through setup(), and afterEach disposes any active sinks. Drop the "warns once" log-side-effect assertion from the buffer overflow test; the file content already proves the behavior. fileCleanup tests now verify the contract by what is left in the volume after the call rather than by spying on fs.stat or pick. The debug-summary log assertion is dropped, and the promise-identity check in the coalescer test is dropped (call count plus final file content already prove the contract).
9e7a6dc to
20d3006
Compare
|
/coder-agents-review |
There was a problem hiding this comment.
Clean extraction of cleanupFiles into a shared utility, good isolation via per-session file naming, and the coalescing flush design is solid. The chained setTimeout over setInterval prevents flush pileup, and the test that verifies coalescing uses real microtask timing rather than faking call counts.
4 P2, 10 P3, 2 Nit, 2 Note.
The P2 cluster centers on observability during failure. The #overflowWarned flag enters a permanent silent state when flushes fail (DEREM-4), the flush-failure log omits event counts (DEREM-8), and the #overflowWarned reset behavior is untested after the assertion was dropped (DEREM-5). Together these mean a disk-full incident produces one overflow warning, then silence, while events accumulate and drop with no magnitude signal. Separately, the outer catch {} in cleanupFiles swallows all errors (DEREM-6).
One product concern needs a human decision: readTelemetryLevel defaults to "local", and this PR adds the first sink that acts on it (DEREM-7). Before this PR, "default local" was a no-op (zero sinks). Now events persist to disk for every user. coder.telemetry.level is not in package.json, so users cannot discover or change it without reading source. Either register the setting or change the default to "off".
Hisoka on the overflow silence: "the total number of silently dropped events is unbounded."
🤖 This review was automatically generated with Coder Agents.
- Register coder.telemetry.level and coder.telemetry.localJsonl in package.json so they show up in the settings UI with descriptions, enums, per-field minimums, and defaults. - Use Buffer.byteLength for UTF-8 size accounting so rotation triggers on actual bytes, not UTF-16 code units. - Seed #current.size from disk on the first flush after activation so an Extension Host restart does not let segment 0 grow past maxFileBytes. - Reset #overflowWarned in the flush failure path and report the discarded event count, so persistent failures stay visible in logs. - Warn when bufferLimit < flushBatchSize at start and on config change; cross-field invariant is otherwise silently broken. - Scope cleanupFiles' outer try/catch to readdir only, log non-ENOENT errors at debug, and basename-clamp pick names so unlinks cannot escape the cleanup directory. - Refactor LocalJsonlSink for readability: extract pickByAgeAndSize, reorder methods newspaper-style, drop the unused #ready/#dirEnsured flags now that mkdir runs on every flush (idempotent and cheap). - Add tests for the settings module, cleanup edge cases, the warn reset invariant, post-dispose writes, and multi-file deletion.
|
/coder-agents-review |
There was a problem hiding this comment.
All four R1 P2 findings are verified fixed. All ten R1 P3 findings are verified fixed. Clean response across the board; the author addressed every finding with appropriate fixes and good test coverage.
DEREM-16 (contested Nit, as TelemetryLevel cast) is closed. Six panelists independently traced the readLevel → readTelemetryLevel chain and confirmed the cast is safe by construction. The changes map can only hold valid TelemetryLevel values. DEREM-17 and DEREM-18 (Notes) are accepted as intentional design.
7 P3, 1 Nit, 1 Note new this round. The main themes: (1) the DEREM-10 fix's date-mismatch branch doesn't check wouldExceed, so the "segment 0 already full" edge case remains (DEREM-19); (2) the statBytes helper swallows all errors including EMFILE, inconsistent with the cleanupFiles pattern established in this PR (DEREM-24); (3) the fire-and-forget cleanup can race with the first flush and delete freshly written events under small maxTotalBytes (DEREM-25); (4) two R1 fixes lack test coverage (DEREM-22 for the rotation guard, DEREM-23 for warnIfBufferTooSmall).
Mafuuu on the lifecycle: "Init → ready → flushing → ready → disposed. No terminal state that blocks retry."
🤖 This review was automatically generated with Coder Agents.
- Apply rotation check after seeding from disk so an EH restart that resumes a full segment 0 immediately rotates instead of overrunning. - Skip files this session is writing to in cleanup, so a low maxTotalBytes cannot delete events the sink just wrote. - Enforce per-field minimums in readLocalJsonlConfig matching the package.json schema, so direct settings.json edits below the floor fall back to defaults instead of producing pathological values. - Tighten minimums where the schema floor was meaninglessly low (bufferLimit 1 -> 10, maxFileBytes 1024 -> 4096, maxTotalBytes 1024 -> 4096). - Re-schedule the flush timer when flushIntervalMs changes so a config update takes effect immediately rather than waiting up to one full interval. - Drop the dead .catch in #scheduleNextFlush; flush() never rejects. - Distinguish ENOENT in statBytes; log other errors at debug. - Make LocalJsonlConfig fields readonly; un-export MINIMUMS. - Add tests for: oversized single payload (rotation guard), the bufferLimit-too-small warning, and the third-caller coalesce invariant (p2 === p3).
- Rename setting `coder.telemetry.localJsonl` → `coder.telemetry.local` (aligns with the level value; drops format from the user-facing key). - Rename internals: LocalSinkConfig / LOCAL_SINK_SETTING / readLocalSinkConfig. - fileCleanup: rename options for clarity — fileType → label, match → filter, pick → select. Update sshProcess.ts caller. - pathResolver#getTelemetryPath: trim oversized comment. - localJsonlSink: fold reschedule into scheduleNextFlush; use spread-style optional fields in serializeEvent for consistency with service.ts. - package.json: tighten setting descriptions, drop confusing "at least local" wording. - Tests: harden coalescing test with a deterministic appendFile barrier; switch warning matcher to stringContaining; rename symbols.
Summary
Adds
LocalJsonlSink, the first concreteTelemetrySinkon top of the Phase A foundation. Closes #901.Each VS Code session writes its own files at:
One writer per file means concurrent VS Code windows do not race on appends or rotation, so no lockfile is needed. Cleanup matches every session's files for shared retention.
The sink is best-effort:
writeis sync and never throws, all I/O happens inflush/dispose, errors are logged via the provided logger and never propagate to callers.Configuration
Tunables come from
coder.telemetry.localJsonl(object-typed, not registered inpackage.jsonyet, matching thecoder.telemetry.levelconvention). Defaults:flushIntervalMsflushBatchSizebufferLimitmaxFileBytesmaxAgeDaysmaxTotalBytesThe sink subscribes via
watchConfigurationChanges, so changes apply without a window reload. Invalid or missing fields fall back to defaults.Why per-session naming over a shared file
fs.appendFileis only atomic perwrite()syscall up to PIPE_BUF (~4KB). A 100-event batch can split across syscalls, letting two windows interleave bytes mid-line.proper-lockfileexists in the repo (used byBinaryLock) but its workload is one-shot and waits-or-skips; a hot-path lock with stale-timeout tradeoffs is a worse fit than just having one writer per file.