Skip to content

feat(telemetry): local JSONL sink with per-session files#931

Open
EhabY wants to merge 5 commits intomainfrom
telemetry/local-jsonl-sink
Open

feat(telemetry): local JSONL sink with per-session files#931
EhabY wants to merge 5 commits intomainfrom
telemetry/local-jsonl-sink

Conversation

@EhabY
Copy link
Copy Markdown
Collaborator

@EhabY EhabY commented May 4, 2026

Summary

Adds LocalJsonlSink, the first concrete TelemetrySink on top of the Phase A foundation. Closes #901.

Each VS Code session writes its own files at:

globalStorage/telemetry/telemetry-YYYY-MM-DD-{sessionId8}.jsonl
                       telemetry-YYYY-MM-DD-{sessionId8}.1.jsonl   (size rotation)
                       telemetry-YYYY-MM-DD-{sessionId8}.2.jsonl

One writer per file means concurrent VS Code windows do not race on appends or rotation, so no lockfile is needed. Cleanup matches every session's files for shared retention.

The sink is best-effort: write is sync and never throws, all I/O happens in flush/dispose, errors are logged via the provided logger and never propagate to callers.

Configuration

Tunables come from coder.telemetry.localJsonl (object-typed, not registered in package.json yet, matching the coder.telemetry.level convention). Defaults:

Field Default
flushIntervalMs 15000
flushBatchSize 100
bufferLimit 500
maxFileBytes 5242880
maxAgeDays 30
maxTotalBytes 104857600

The sink subscribes via watchConfigurationChanges, so changes apply without a window reload. Invalid or missing fields fall back to defaults.

Why per-session naming over a shared file

fs.appendFile is only atomic per write() syscall up to PIPE_BUF (~4KB). A 100-event batch can split across syscalls, letting two windows interleave bytes mid-line. proper-lockfile exists in the repo (used by BinaryLock) but its workload is one-shot and waits-or-skips; a hot-path lock with stale-timeout tradeoffs is a worse fit than just having one writer per file.

LocalJsonlSink is the first concrete TelemetrySink. Each VS Code
session writes its own files at
globalStorage/telemetry/telemetry-YYYY-MM-DD-{sessionId8}.jsonl,
with .N.jsonl size segments. One writer per file means concurrent
windows do not race on appends or rotation, so no lockfile is needed.

Tunables come from coder.telemetry.localJsonl (object-typed, not
registered in package.json yet, same convention as coder.telemetry.level).
Defaults: 15s flush, 100-event batch trigger, 500-event hard cap, 5MB
per file, 30 day age, 100MB total. The sink subscribes via
watchConfigurationChanges so changes apply without a window reload.

cleanupFiles moves from sshProcess.ts to src/util/fileCleanup.ts and
is shared between both. PathResolver gains getTelemetryPath().

Also drops a redundant value re-check in TelemetryService's level
watcher: readLevel already sanitizes its return.

Closes #901
@EhabY EhabY force-pushed the telemetry/local-jsonl-sink branch 2 times, most recently from b2cb18e to f40313f Compare May 4, 2026 15:01
…nd tests

Move readTelemetryLevel and readLocalJsonlConfig (with the setting
key constants and defaults) into src/settings/telemetry.ts, matching
the existing notifications and cli settings pattern. Both functions
take a WorkspaceConfiguration argument so they are easy to test.

LocalJsonlSink keeps its static start() factory but the constructor
is now side-effect-free: it just assigns fields. start() is the
single place that reads the config, subscribes the watcher, schedules
the flush timer, and kicks off cleanup. The class implements
vscode.Disposable explicitly.

Tighten comments on the sink and on cleanupFiles. Inline #fileName
into #doFlush since it had only one caller.

Tests: per-test setup() harness instead of beforeEach-installed
helpers. Module-level state is gone, all setup happens inside each
test through setup(), and afterEach disposes any active sinks.

Drop the "warns once" log-side-effect assertion from the buffer
overflow test; the file content already proves the behavior.
fileCleanup tests now verify the contract by what is left in the
volume after the call rather than by spying on fs.stat or pick.
The debug-summary log assertion is dropped, and the promise-identity
check in the coalescer test is dropped (call count plus final file
content already prove the contract).
@EhabY EhabY force-pushed the telemetry/local-jsonl-sink branch from 9e7a6dc to 20d3006 Compare May 4, 2026 15:12
@EhabY EhabY self-assigned this May 4, 2026
@EhabY
Copy link
Copy Markdown
Collaborator Author

EhabY commented May 4, 2026

/coder-agents-review

Copy link
Copy Markdown

@coder-agents-review coder-agents-review Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean extraction of cleanupFiles into a shared utility, good isolation via per-session file naming, and the coalescing flush design is solid. The chained setTimeout over setInterval prevents flush pileup, and the test that verifies coalescing uses real microtask timing rather than faking call counts.

4 P2, 10 P3, 2 Nit, 2 Note.

The P2 cluster centers on observability during failure. The #overflowWarned flag enters a permanent silent state when flushes fail (DEREM-4), the flush-failure log omits event counts (DEREM-8), and the #overflowWarned reset behavior is untested after the assertion was dropped (DEREM-5). Together these mean a disk-full incident produces one overflow warning, then silence, while events accumulate and drop with no magnitude signal. Separately, the outer catch {} in cleanupFiles swallows all errors (DEREM-6).

One product concern needs a human decision: readTelemetryLevel defaults to "local", and this PR adds the first sink that acts on it (DEREM-7). Before this PR, "default local" was a no-op (zero sinks). Now events persist to disk for every user. coder.telemetry.level is not in package.json, so users cannot discover or change it without reading source. Either register the setting or change the default to "off".

Hisoka on the overflow silence: "the total number of silently dropped events is unbounded."

🤖 This review was automatically generated with Coder Agents.

Comment thread src/settings/telemetry.ts
Comment thread test/unit/telemetry/sinks/localJsonlSink.test.ts Outdated
Comment thread src/telemetry/sinks/localJsonlSink.ts
Comment thread src/telemetry/sinks/localJsonlSink.ts Outdated
Comment thread test/unit/telemetry/sinks/localJsonlSink.test.ts
Comment thread src/telemetry/sinks/localJsonlSink.ts Outdated
Comment thread src/settings/telemetry.ts Outdated
Comment thread src/telemetry/service.ts
Comment thread src/telemetry/sinks/localJsonlSink.ts
Comment thread src/telemetry/sinks/localJsonlSink.ts
- Register coder.telemetry.level and coder.telemetry.localJsonl in
  package.json so they show up in the settings UI with descriptions,
  enums, per-field minimums, and defaults.
- Use Buffer.byteLength for UTF-8 size accounting so rotation triggers
  on actual bytes, not UTF-16 code units.
- Seed #current.size from disk on the first flush after activation so
  an Extension Host restart does not let segment 0 grow past
  maxFileBytes.
- Reset #overflowWarned in the flush failure path and report the
  discarded event count, so persistent failures stay visible in logs.
- Warn when bufferLimit < flushBatchSize at start and on config
  change; cross-field invariant is otherwise silently broken.
- Scope cleanupFiles' outer try/catch to readdir only, log non-ENOENT
  errors at debug, and basename-clamp pick names so unlinks cannot
  escape the cleanup directory.
- Refactor LocalJsonlSink for readability: extract pickByAgeAndSize,
  reorder methods newspaper-style, drop the unused #ready/#dirEnsured
  flags now that mkdir runs on every flush (idempotent and cheap).
- Add tests for the settings module, cleanup edge cases, the warn
  reset invariant, post-dispose writes, and multi-file deletion.
@EhabY
Copy link
Copy Markdown
Collaborator Author

EhabY commented May 4, 2026

/coder-agents-review

Copy link
Copy Markdown

@coder-agents-review coder-agents-review Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All four R1 P2 findings are verified fixed. All ten R1 P3 findings are verified fixed. Clean response across the board; the author addressed every finding with appropriate fixes and good test coverage.

DEREM-16 (contested Nit, as TelemetryLevel cast) is closed. Six panelists independently traced the readLevelreadTelemetryLevel chain and confirmed the cast is safe by construction. The changes map can only hold valid TelemetryLevel values. DEREM-17 and DEREM-18 (Notes) are accepted as intentional design.

7 P3, 1 Nit, 1 Note new this round. The main themes: (1) the DEREM-10 fix's date-mismatch branch doesn't check wouldExceed, so the "segment 0 already full" edge case remains (DEREM-19); (2) the statBytes helper swallows all errors including EMFILE, inconsistent with the cleanupFiles pattern established in this PR (DEREM-24); (3) the fire-and-forget cleanup can race with the first flush and delete freshly written events under small maxTotalBytes (DEREM-25); (4) two R1 fixes lack test coverage (DEREM-22 for the rotation guard, DEREM-23 for warnIfBufferTooSmall).

Mafuuu on the lifecycle: "Init → ready → flushing → ready → disposed. No terminal state that blocks retry."

🤖 This review was automatically generated with Coder Agents.

Comment thread src/telemetry/sinks/localJsonlSink.ts
Comment thread test/unit/telemetry/sinks/localJsonlSink.test.ts Outdated
Comment thread src/settings/telemetry.ts Outdated
Comment thread test/unit/telemetry/sinks/localJsonlSink.test.ts Outdated
Comment thread src/telemetry/sinks/localJsonlSink.ts
Comment thread src/telemetry/sinks/localJsonlSink.ts
Comment thread src/telemetry/sinks/localJsonlSink.ts
Comment thread src/telemetry/sinks/localJsonlSink.ts Outdated
Comment thread src/telemetry/sinks/localJsonlSink.ts
- Apply rotation check after seeding from disk so an EH restart that
  resumes a full segment 0 immediately rotates instead of overrunning.
- Skip files this session is writing to in cleanup, so a low
  maxTotalBytes cannot delete events the sink just wrote.
- Enforce per-field minimums in readLocalJsonlConfig matching the
  package.json schema, so direct settings.json edits below the floor
  fall back to defaults instead of producing pathological values.
- Tighten minimums where the schema floor was meaninglessly low
  (bufferLimit 1 -> 10, maxFileBytes 1024 -> 4096, maxTotalBytes
  1024 -> 4096).
- Re-schedule the flush timer when flushIntervalMs changes so a config
  update takes effect immediately rather than waiting up to one full
  interval.
- Drop the dead .catch in #scheduleNextFlush; flush() never rejects.
- Distinguish ENOENT in statBytes; log other errors at debug.
- Make LocalJsonlConfig fields readonly; un-export MINIMUMS.
- Add tests for: oversized single payload (rotation guard), the
  bufferLimit-too-small warning, and the third-caller coalesce
  invariant (p2 === p3).
@EhabY EhabY requested a review from jakehwll May 5, 2026 09:45
- Rename setting `coder.telemetry.localJsonl` → `coder.telemetry.local`
  (aligns with the level value; drops format from the user-facing key).
- Rename internals: LocalSinkConfig / LOCAL_SINK_SETTING /
  readLocalSinkConfig.
- fileCleanup: rename options for clarity — fileType → label,
  match → filter, pick → select. Update sshProcess.ts caller.
- pathResolver#getTelemetryPath: trim oversized comment.
- localJsonlSink: fold reschedule into scheduleNextFlush; use
  spread-style optional fields in serializeEvent for consistency
  with service.ts.
- package.json: tighten setting descriptions, drop confusing
  "at least local" wording.
- Tests: harden coalescing test with a deterministic appendFile
  barrier; switch warning matcher to stringContaining; rename
  symbols.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Telemetry: LocalJsonlSink with buffering, rotation, and cleanup

1 participant