feat: add SQLite database support to OpenCode analyzer#120

Merged
mike1858 merged 7 commits into main from feat/opencode-sqlite-support
Apr 5, 2026

Conversation

mike1858 (Member) commented Feb 17, 2026

Summary

OpenCode has gone through three storage format generations. This PR adds seamless support for all three alongside each other — no new tab, all data merges under the existing OpenCode tab.

  1. JSON files — Individual message files in ~/.local/share/opencode/storage/message/
  2. SQLite v1 (v1.1.53) — Initial opencode.db with basic tables
  3. SQLite v2 (current) — Evolved schema with workspaces, channel-specific DBs (opencode-{channel}.db), composite indexes, and per-step token accounting

What changed

SQLite parsing (v1 + v2)

  • Parse messages from ~/.local/share/opencode/opencode.db using the message, session, project, and part tables
  • Batch-load tool call stats from the part table with a LIKE pre-filter to avoid deserializing large non-tool parts (text, reasoning, etc.)
  • Open database read-only with WAL support and busy timeout for safe concurrent access while OpenCode is running
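The read-only open described above can be sketched with Python's stdlib sqlite3 (the analyzer itself is Rust; the function name and the 5-second timeout here are illustrative):

```python
import sqlite3

def open_readonly(db_path: str) -> sqlite3.Connection:
    # mode=ro rejects writes from this connection, so a running
    # OpenCode instance keeps sole write access; WAL mode (set by
    # the writer) lets readers proceed while writes are in flight.
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    # Retry for up to 5s on a locked DB instead of failing immediately.
    conn.execute("PRAGMA busy_timeout = 5000")
    return conn
```

Any attempt to write through this connection fails with an "attempt to write a readonly database" error, which is the safety property the bullet above relies on.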

Channel-specific database support (v2)

  • Discover and parse opencode-{channel}.db files (e.g. opencode-canary.db) in addition to the default opencode.db
  • Updated discovery, glob patterns, watch directories, and data path validation to recognize channel-specific databases
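The filename check behind that discovery can be sketched as follows (the analyzer is Rust; this Python regex is an illustrative guess at the accepted channel-name characters):

```python
import re

# Matches the default opencode.db plus channel-specific variants
# such as opencode-canary.db; the exact channel charset is assumed.
DB_NAME = re.compile(r"^opencode(-[A-Za-z0-9_]+)?\.db$")

def is_opencode_db(filename: str) -> bool:
    return DB_NAME.match(filename) is not None
```

Anchoring on `.db$` keeps WAL/SHM sidecar files (e.g. `opencode.db-wal`) out of the discovery results.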

Step-finish token aggregation (v2)

  • When a message data blob has zero tokens at the message level, the analyzer falls back to aggregating token/cost data from step-finish parts
  • Handles newer OpenCode versions where per-step accounting is the primary data source
  • Aggregates input, output, reasoning, cache.read, cache.write, and cost across all step-finish parts per message
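The fallback aggregation can be sketched like this (in Python for illustration; the nested `tokens`/`cache` blob layout mirrors the fields listed above but is an assumption about OpenCode's serialization):

```python
def aggregate_step_finish(parts: list[dict]) -> dict:
    # Sum token/cost fields across all step-finish parts of one message,
    # used only when the message-level token counts are zero.
    total = {"input": 0, "output": 0, "reasoning": 0,
             "cache_read": 0, "cache_write": 0, "cost": 0.0}
    for part in parts:
        if part.get("type") != "step-finish":
            continue
        tokens = part.get("tokens", {})
        total["input"] += tokens.get("input", 0)
        total["output"] += tokens.get("output", 0)
        total["reasoning"] += tokens.get("reasoning", 0)
        cache = tokens.get("cache", {})
        total["cache_read"] += cache.get("read", 0)
        total["cache_write"] += cache.get("write", 0)
        total["cost"] += part.get("cost", 0.0)
    return total
```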

Seamless integration

  • Deduplication: Messages are deduplicated across JSON files and all SQLite DBs during migration using the same global_hash formula (opencode_{session_id}_{msg_id})
  • Dynamic contribution strategy: MultiSession when any DB exists, SingleMessage for JSON-only installs
  • File watching: Watches both the legacy storage/message/ directory and the parent opencode/ directory for SQLite DB changes
  • Data path validation: Accepts opencode.db, opencode-{channel}.db, and legacy .json files
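The deduplication step above can be sketched as follows (Python for illustration; the `source` tag used to express DB precedence is a hypothetical representation, but the hash formula is the one stated in the bullet):

```python
def global_hash(session_id: str, msg_id: str) -> str:
    # Same formula on the JSON and SQLite paths, so both copies of one
    # migrated message collide on the same key.
    return f"opencode_{session_id}_{msg_id}"

def deduplicate(messages: list[dict]) -> list[dict]:
    # Keep one record per hash, preferring the SQLite copy when both exist.
    by_hash: dict[str, dict] = {}
    for msg in messages:
        key = global_hash(msg["session_id"], msg["id"])
        if key not in by_hash or msg.get("source") == "sqlite":
            by_hash[key] = msg
    return list(by_hash.values())
```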

Refactoring

Extracted shared logic into reusable helpers:

  • compute_message_stats() — stats computation from message + tool stats
  • build_conversation_message() — ConversationMessage construction
  • json_to_conversation_message() — legacy JSON path wrapper
  • accumulate_tool_stat() — shared tool-stat counting for both filesystem and DB paths
  • batch_load_step_finish_from_db() — aggregates step-finish part tokens per message
  • Made OpenCodeMessage.id and session_id #[serde(default)] so the same struct parses both full JSON files and DB data blobs
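The effect of that `#[serde(default)]` change can be illustrated in Python: a DB data blob omits `id`/`session_id`, so they are backfilled from the row's columns (the blob key names here are assumptions):

```python
import json

def message_from_row(row_id: str, session_id: str, data_blob: str) -> dict:
    # DB blobs omit id/session_id (hence the serde defaults on the Rust
    # struct); fill them from the DB columns, but never overwrite values
    # already present in a full JSON message file.
    msg = json.loads(data_blob)
    msg.setdefault("id", row_id)
    msg.setdefault("session_id", session_id)
    return msg
```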

Compatibility

Supports all three user states simultaneously:

  1. JSON only (older OpenCode versions) — works as before
  2. SQLite v1 (OpenCode v1.1.53+) — reads from initial DB schema
  3. SQLite v2 (current OpenCode) — reads from evolved schema with workspaces, channel DBs, and step-finish token fallback
  4. Mixed (during migration transitions) — reads from all sources, deduplicates

Tests

Added 29 new tests covering:

JSON format:

  • Boolean, object, and absent summary field parsing
  • Model name resolution (modelID vs model.modelID)

SQLite v1:

  • Data blob parsing (assistant, user, minimal)
  • Stats computation (with cost, user messages, tool stats preservation)
  • build_conversation_message with various project hash fallbacks
  • Global hash consistency between JSON and SQLite paths
  • In-memory integration tests (projects, sessions, tool stats, end-to-end)

SQLite v2:

  • Project loading with new columns (icon_color, sandboxes, commands)
  • Session loading with workspace support
  • global project_id fallback handling
  • step-finish part token aggregation
  • Step-finish fallback when message has zero tokens
  • End-to-end with real-world data shapes (SHA project IDs, parentID, mode, path)
  • Newer message format parsing (tools field, total tokens field)
  • Channel-specific DB file validation (opencode-canary.db etc.)

All 229 tests pass (up from 200 on main). Clippy, fmt, and doc checks clean.

Summary by CodeRabbit

  • New Features

    • Added SQLite database support alongside legacy JSON: automatic discovery, mixed-source parsing, unified message construction and stats computation, and global deduplication that prefers DB records.
    • More resilient parsing with optional metadata handling and consolidated tool-call, token, and cost aggregation across formats.
  • Tests

    • Expanded tests for SQLite parsing, mixed-source workflows, stats aggregation, schema variants, hashing consistency, and fallback behaviors.

OpenCode has migrated from individual JSON message files to a SQLite
database (opencode.db). This adds seamless support for the new format
alongside the existing JSON files — no new tab, all data merges under
the existing 'OpenCode' tab.

## What changed

- Parse messages from ~/.local/share/opencode/opencode.db using the
  message, session, project, and part tables
- Batch-load tool call stats from the part table with a LIKE pre-filter
  to avoid deserializing large non-tool parts
- Deduplicate messages across JSON files and SQLite DB during the
  migration period (same global_hash for identical messages)
- Dynamic contribution strategy: MultiSession when DB exists (correct
  for multi-message source), SingleMessage for JSON-only installs
- Watch both the legacy storage/message/ directory and the parent
  opencode/ directory for SQLite DB changes
- Accept opencode.db as a valid data path for file watcher events

## Refactoring

Extracted shared logic into reusable helpers:
- compute_message_stats() — stats computation from message + tool stats
- build_conversation_message() — ConversationMessage construction
- json_to_conversation_message() — legacy JSON path wrapper
- Made OpenCodeMessage.id and session_id #[serde(default)] so the same
  struct parses both full JSON files and DB data blobs (which omit those
  fields; they come from DB columns instead)

## Tests

Added 20 new tests covering:
- SQLite data blob parsing (assistant, user, minimal)
- Stats computation (with cost, user messages, tool stats preservation)
- build_conversation_message with various project hash fallbacks
- Global hash consistency between JSON and SQLite paths
- In-memory SQLite integration tests (projects, sessions, tool stats,
  end-to-end message conversion)
coderabbitai (bot) commented Feb 17, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Adds read-only SQLite opencode.db discovery and parsing alongside legacy JSON; unifies ConversationMessage construction, global-hash deduplication with DB precedence, parallel parsing of mixed sources, pre-aggregated tool/step stats from DB, relaxed serde for schema drift, and tests — implemented in src/analyzers/opencode.rs.

Changes

All changes are in src/analyzers/opencode.rs:

  • SQLite integration & loaders — Adds DB discovery/opening helpers, read-only DB access, DB models/loaders (project/session), and batch loaders for message-level and step-finish aggregates.
  • Message parsing & conversion — Introduces parse_sqlite_messages, json_to_conversation_message (renamed), build_conversation_message, compute_message_stats, and accumulate_tool_stat to unify JSON and SQLite conversion and stats computation.
  • Parsing pipeline & parallelization — Updates parse_source, parse_sources_parallel_with_paths, and parse_sources_parallel to partition by extension, parse DBs up-front, parse JSON with shared context in parallel, and apply global-hash deduplication (SQLite precedence).
  • Discovery & filesystem helpers — Extends discovery (get_data_glob_patterns, discover_data_sources, get_watch_directories, is_valid_data_path, is_available) and adds helpers (data_dir, storage_root, app_dir, db_path, has_sqlite_db, has_json_messages).
  • Serde & schema tolerance — Makes project/session/message fields optional/defaulted to tolerate schema drift and blobs that omit id/session_id/role.
  • Tests & validation — Adds tests for SQLite blob parsing (including missing ids), in-memory DB variants, stats aggregation and step-finish fallback, and JSON-vs-DB global-hash consistency.
  • Refactor / deduplication — Refactors legacy conversion naming, centralizes stats handling, consolidates message hashing/deduplication logic, and updates contribution strategy selection.

Sequence Diagram

sequenceDiagram
    participant Discovery as Discovery (FS/DB)
    participant FS as Filesystem (JSON)
    participant DB as SQLite DB
    participant Parser as Parser/Converter
    participant Stats as Stats Aggregator
    participant Dedup as Deduplicator
    participant Output as Output

    Discovery->>FS: detect legacy JSON message dirs
    Discovery->>DB: detect `opencode.db` / `opencode-*.db`

    par JSON path
        FS->>Parser: load JSON files
        Parser->>Stats: extract per-message tool stats from parts
        Stats-->>Parser: per-message tool stats
        Parser->>Parser: json_to_conversation_message / build_conversation_message
    and DB path
        DB->>Parser: open_db (read-only), load project/session/messages
        Parser->>Stats: batch_load_tool_stats_from_db & batch_load_step_finish_from_db
        Stats-->>Parser: aggregated tool & step-finish stats
        Parser->>Parser: parse_sqlite_messages -> build_conversation_message
    end

    Parser->>Dedup: emit messages with global hash (DB precedence)
    Dedup->>Output: unified, deduplicated messages

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐇 I hopped through folders, then peeked in a DB,
Collected tokens, tools, and threads with great glee.
I hashed every message, both JSON and SQL,
One path now unites them — I binkied a twirl! 🥕

🚥 Pre-merge checks | ✅ 3 passed

  • Description Check — ✅ Passed (check skipped; CodeRabbit's high-level summary is enabled)
  • Title Check — ✅ Passed (the title clearly and concisely summarizes the main change: adding SQLite database support to the OpenCode analyzer, which aligns with the refactoring and new functionality in the changeset)
  • Docstring Coverage — ✅ Passed (coverage is 100.00%; the required threshold is 80.00%)


Fixes clippy::field_reassign_with_default triggered by CI's
`cargo clippy --tests -- -D warnings`.
coderabbitai (bot) left a comment

Actionable comments posted: 3

🧹 Nitpick comments (5)
src/analyzers/opencode.rs (5)

459-477: Duplicated tool-stats accumulation logic.

The tool-name matching and stat incrementing in extract_tool_stats_from_parts (lines 459–477) and batch_load_tool_stats_from_db (lines 633–652) are identical. Extracting a shared helper would reduce the chance of future divergence.

♻️ Proposed shared helper
fn accumulate_tool_stat(stats: &mut Stats, tool_name: &str, value: &OwnedValue) {
    stats.tool_calls += 1;
    match tool_name {
        "read" => {
            stats.files_read += 1;
        }
        "glob" => {
            stats.file_searches += 1;
            if let Some(count) = value
                .get("state")
                .and_then(|s| s.get("metadata"))
                .and_then(|m| m.get("count"))
                .and_then(|c| c.as_u64())
            {
                stats.files_read += count;
            }
        }
        _ => {}
    }
}

Then both call sites become:

-        stats.tool_calls += 1;
-
-        match tool_name {
-            "read" => {
-                stats.files_read += 1;
-            }
-            "glob" => {
-                stats.file_searches += 1;
-                ...
-            }
-            _ => {}
-        }
+        accumulate_tool_stat(&mut stats, tool_name, &value);

Also applies to: 633-652

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/analyzers/opencode.rs` around lines 459 - 477, Extract the duplicated
tool-stat logic into a helper function (e.g., fn accumulate_tool_stat(stats:
&mut Stats, tool_name: &str, value: &OwnedValue)) and replace the matching
blocks in extract_tool_stats_from_parts and batch_load_tool_stats_from_db with
calls to this helper; the helper should increment stats.tool_calls and handle
"read" and "glob" cases (including extracting the nested
"state"->"metadata"->"count" as_u64 to add to stats.files_read and increment
stats.file_searches for "glob"). Ensure the function is visible to both call
sites (module-level) and use the same Stats and OwnedValue types as in the
original code.

883-927: Duplicated source-partitioning and parsing logic across methods.

get_stats_with_sources (lines 883–927) repeats the partition → load-context → parallel-parse-JSON → sequential-parse-DB pattern that already exists in parse_sources_parallel_with_paths (lines 814–860). Consider reusing parse_sources_parallel:

fn get_stats_with_sources(&self, sources: Vec<DataSource>) -> Result<AgenticCodingToolStats> {
    let messages = self.parse_sources_parallel(&sources);
    // ... aggregate stats from `messages` ...
}

This eliminates ~40 lines of duplicated logic and ensures future changes to the parsing pipeline are applied in one place.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/analyzers/opencode.rs` around lines 883 - 927, get_stats_with_sources
currently reimplements the partition/load/parallel-JSON/sequential-DB parsing
logic already implemented in parse_sources_parallel_with_paths (aka
parse_sources_parallel); replace the duplicated block in get_stats_with_sources
with a call to that parsing helper to obtain Vec<ConversationMessage> (or adapt
the helper to return that type), e.g. let messages =
self.parse_sources_parallel_with_paths(sources) and then aggregate stats from
messages; ensure any error handling or storage_root-dependent behavior is
centralized in parse_sources_parallel_with_paths and update
get_stats_with_sources to use its return value for further aggregation.

597-603: LIKE pre-filter may miss tool parts with unexpected JSON formatting.

The two LIKE patterns cover "type":"tool" and "type": "tool", but won't match other valid JSON whitespace variants (e.g., "type" : "tool" or multi-line formatting). Since the filter is an optimization and false negatives would silently drop tool stats, consider a single broader pattern or a note documenting the assumption about OpenCode's serialization format.
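The concern is easy to reproduce with a tiny in-memory database (Python sqlite3 for illustration; the `part` schema is simplified, and `json_extract` requires an SQLite build with the JSON functions, which the stock Python build includes):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE part (message_id TEXT, data TEXT)")
conn.executemany("INSERT INTO part VALUES (?, ?)", [
    ("m1", '{"type":"tool"}'),    # matched by the first LIKE pattern
    ("m2", '{"type": "tool"}'),   # matched by the second LIKE pattern
    ("m3", '{"type" : "tool"}'),  # valid JSON, missed by both
])

# The two-pattern pre-filter from the review comment above.
like_count = conn.execute(
    "SELECT COUNT(*) FROM part WHERE data LIKE '%\"type\":\"tool\"%' "
    "OR data LIKE '%\"type\": \"tool\"%'").fetchone()[0]

# Whitespace-insensitive alternative via the JSON functions.
json_count = conn.execute(
    "SELECT COUNT(*) FROM part "
    "WHERE json_extract(data, '$.type') = 'tool'").fetchone()[0]
```

Here `like_count` is 2 while `json_count` is 3, showing the silent drop the comment warns about.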

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/analyzers/opencode.rs` around lines 597 - 603, The current conn.prepare
call uses two specific LIKE patterns that miss valid JSON spacing/formatting
variants and can silently drop tool parts; replace the fragile LIKE filter with
a robust check such as using SQLite JSON functions (e.g., json_extract(data,
'$.type') = 'tool') or broaden the pattern to a single catch‑all before parsing,
so all parts with type=="tool" are reliably detected; update the SQL string
passed to conn.prepare (the query in the SELECT message_id, data FROM part ...)
accordingly and ensure subsequent code that deserializes data still handles
non-tool rows if you keep a looser pre-filter.

789-805: parse_source reloads all projects & sessions for every individual JSON file.

When called in a loop (e.g., from a watcher processing one file at a time), load_projects and load_sessions are invoked per file. This is fine for one-off parses but is worth noting; the batch path (parse_sources_parallel_with_paths) correctly loads context once.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/analyzers/opencode.rs` around lines 789 - 805, parse_source currently
calls load_projects and load_sessions on every invocation (reloading context per
JSON file); change parse_source to avoid per-file reloads by accepting preloaded
context or reusing a cached value: update parse_source signature to take
projects and sessions (e.g., add parameters like projects: &ProjectsType,
sessions: &SessionsType or a single Context struct), remove the internal calls
to load_projects/load_sessions and use the supplied preloaded data, and update
call sites (including parse_sources_parallel_with_paths and the watcher loop) to
load projects/sessions once and pass them through; alternatively implement a
small memoized/cache lookup keyed by storage_root inside parse_source if
changing the signature is impractical.

853-858: Consider structured logging instead of eprintln!.

Using eprintln! for error reporting (also at line 921) mixes analyzer output with stderr. If the project uses a logging framework (e.g., tracing), switching to tracing::warn! would allow log-level filtering and structured metadata.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/analyzers/opencode.rs` around lines 853 - 858, Replace the eprintln!
calls in the Err(e) arms (e.g., the block that prints "Failed to parse OpenCode
SQLite DB {:?}: {}" and the similar call near line 921) with structured tracing
logs: import tracing and use tracing::warn! (or trace/debug/info as appropriate)
with named fields for the path and error (for example: tracing::warn!(path =
%source.path.display(), error = %e, "Failed to parse OpenCode SQLite DB");).
Ensure you remove the eprintln! usage, add the necessary use tracing::...
import, and format the message as structured metadata so the logs can be
filtered and queried.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/analyzers/opencode.rs`:
- Around line 1206-1208: Replace the two-step reassignment of a default Stats
with a single struct initialization: instead of creating tool_stats via
Stats::default() and then setting tool_calls and files_read, construct
tool_stats using the struct init pattern (base it on Default::default()) and set
tool_calls and files_read inline; target the variable tool_stats and the Stats
type, replacing the existing two assignments with the combined initialization.
- Around line 47-50: The doc for has_sqlite_db() claims it checks existence and
schema but the implementation only checks file existence; either update the
comment to state it only checks existence, or implement a lightweight schema
check: in has_sqlite_db() (or a helper called from it) open the SQLite at
Self::db_path(), run a simple query against sqlite_master to ensure the expected
table (e.g., "message") exists (for example: SELECT name FROM sqlite_master
WHERE type='table' AND name='message' LIMIT 1), and return true only if the file
exists, the DB opens, and the table is present; ensure errors opening/queries
are handled and result in false.
- Around line 705-707: The SQLite-path fallback uses session.project_id while
the JSON-path uses session.id, causing inconsistent project_hash; update the
SQLite-path fallback to use the session id instead. Locate the variables
session_title, worktree, fallback in opencode.rs and change the fallback
assignment from session.map(|s| s.project_id.as_str()) to use session.id (e.g.,
session.map(|s| s.id.as_str()) or session.map(|s| s.id.clone()) as appropriate)
so both JSON and SQLite paths use the same session_id fallback.



coderabbitai (bot) left a comment

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/analyzers/opencode.rs (1)

878-946: 🛠️ Refactor suggestion | 🟠 Major

get_stats_with_sources duplicates parse_sources_parallel logic.

The JSON/SQLite partitioning, context loading, parallel parsing, and deduplication are duplicated here. Since parse_sources_parallel already handles all of this (including deduplication), this method could delegate to it:

♻️ Proposed refactor
     fn get_stats_with_sources(
         &self,
         sources: Vec<DataSource>,
     ) -> Result<crate::types::AgenticCodingToolStats> {
-        // Partition sources into JSON files and DB files.
-        let (db_sources, json_sources): (Vec<_>, Vec<_>) = sources
-            .iter()
-            .partition(|s| s.path.extension().is_some_and(|ext| ext == "db"));
-
-        let mut all_messages: Vec<ConversationMessage> = Vec::new();
-
-        // --- Parse JSON sources in parallel ---
-        if !json_sources.is_empty()
-            && let Some(storage_root) = Self::storage_root()
-        {
-            // ... ~30 lines of duplicated parsing ...
-        }
-
-        // --- Parse SQLite sources ---
-        for source in db_sources {
-            // ... duplicated DB parsing ...
-        }
-
-        // Deduplicate
-        let messages = crate::utils::deduplicate_by_global_hash(all_messages);
+        let messages = self.parse_sources_parallel(&sources);

         // Aggregate stats.
         let mut daily_stats = crate::utils::aggregate_by_date(&messages);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/analyzers/opencode.rs` around lines 878 - 946, get_stats_with_sources
duplicates the JSON/SQLite partitioning, context loading, parallel parsing, and
deduplication already implemented in parse_sources_parallel; replace the body of
get_stats_with_sources with a delegation to parse_sources_parallel and then
adapt its returned messages/stats into the AgenticCodingToolStats struct.
Specifically: call Self::parse_sources_parallel(sources) (ensuring
storage_root/context are handled there), use the returned deduplicated messages
to compute daily_stats and num_conversations (reuse
crate::utils::aggregate_by_date) and construct the AgenticCodingToolStats with
analyzer_name from self.display_name(); remove the duplicated json/db parsing
and parse_sqlite_messages usage from get_stats_with_sources. Ensure any helper
functions referenced (storage_root, parse_sources_parallel,
deduplicate_by_global_hash) are used rather than reimplemented.
🧹 Nitpick comments (1)
src/analyzers/opencode.rs (1)

853-858: Consider structured logging instead of eprintln!.

Using eprintln! for error reporting in a TUI application may interfere with the UI. If the project has a logging framework (e.g., tracing or log), prefer warn! or error! macros for better observability.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/analyzers/opencode.rs` around lines 853 - 858, Replace the direct stderr
print in the Err arm that uses eprintln! with a structured logging macro (e.g.,
tracing::error! or log::error!) so the error doesn't disrupt the TUI; keep the
same context by logging the source.path and the error (e), and add the
appropriate use/import (tracing::error or log::error) at the top of the module;
target the Err(e) => block that references source.path and e and swap eprintln!
for the chosen logging macro with a clear message and structured fields if using
tracing (e.g., error!(path = %source.path, error = %e, "Failed to parse OpenCode
SQLite DB")).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/analyzers/opencode.rs`:
- Around line 596-603: The LIKE pre-filter in the conn.prepare call (query
string selecting from part WHERE data LIKE '%"type":"tool"%' OR data LIKE
'%"type": "tool"%') can miss tool parts with arbitrary whitespace/newlines;
update the query to either remove the LIKE filter entirely and rely on the
post-parse type check (the code path that inspects parsed JSON around line 625)
or replace the pattern with a broader match (e.g., use REGEXP/JSON functions if
supported) so that all candidate rows are returned; modify the SQL in the
conn.prepare invocation accordingly and keep the existing post-parse filtering
logic intact.
- Around line 527-530: DbSession.title is currently String but the DB title
column can be NULL, causing row.get(2)? to fail and rows to be lost; change the
DbSession struct's title field to Option<String> and update the query extraction
to use row.get::<_, Option<String>>(2)? (replace any plain row.get(2)? calls),
then adjust downstream code that expects a String (e.g., the mapping/usage
around the former line 705) to handle Option<String> safely (provide fallback or
propagate None) so sessions with NULL titles are preserved instead of being
dropped.
- Around line 270-276: The field s.cached_tokens is incorrectly set to only
tokens.cache.read; update the assignment in the block that checks msg.tokens so
s.cached_tokens = tokens.cache.write + tokens.cache.read (sum write and read)
instead of using tokens.cache.read alone; locate the code around the msg.tokens
handling where s.input_tokens, s.output_tokens, s.reasoning_tokens,
s.cache_creation_tokens and s.cache_read_tokens are set and change the
s.cached_tokens assignment accordingly.


---

Duplicate comments:
In `@src/analyzers/opencode.rs`:
- Around line 705-707: The JSON and SQLite code paths generate inconsistent
project hashes because the JSON path uses session.id while the SQLite path uses
session.project_id; pick one and make both consistent — update the JSON path
(where fallback_project_hash is computed) to use session.project_id (matching
the SQLite path) so the same fallback value is used; locate uses of session.id
and session.project_id (and related vars like fallback_project_hash,
session_title, worktree, fallback) and replace the JSON-side session.id usage
with session.project_id.as_str() (or the equivalent) so grouping is consistent.
- Around line 47-50: The doc for has_sqlite_db() is incorrect: it claims the
function checks "exists and has the expected schema" but the implementation only
tests Self::db_path().is_some_and(|p| p.exists()); either update the doc to say
it only checks file existence or extend the function to validate schema (e.g.,
open the DB at Self::db_path(), run a PRAGMA user_version or query expected
tables/columns, and return true only if schema matches). Modify the comment or
implement the schema check in has_sqlite_db() and keep references to
Self::db_path() and has_sqlite_db() so callers remain unchanged.

---

Nitpick comments:
In `@src/analyzers/opencode.rs`:
- Around line 853-858: Replace the direct stderr print in the Err arm that uses
eprintln! with a structured logging macro (e.g., tracing::error! or log::error!)
so the error doesn't disrupt the TUI; keep the same context by logging the
source.path and the error (e), and add the appropriate use/import
(tracing::error or log::error) at the top of the module; target the Err(e) =>
block that references source.path and e and swap eprintln! for the chosen
logging macro with a clear message and structured fields if using tracing (e.g.,
error!(path = %source.path, error = %e, "Failed to parse OpenCode SQLite DB")).

- Fix doc comment on has_sqlite_db() to match implementation (only
  checks file existence, not schema) [comment 1]

- Fix inconsistent fallback_project_hash between JSON and SQLite paths:
  JSON used session.id but SQLite used session.project_id, causing
  different project_hash values for the same message depending on which
  source won deduplication. Both now use session_id. [comment 2]

- Extract shared accumulate_tool_stat() helper to deduplicate the
  tool-name matching logic between extract_tool_stats_from_parts (JSON
  filesystem) and batch_load_tool_stats_from_db (SQLite). [nitpick 1]

- Collapse get_stats_with_sources() to reuse parse_sources_parallel()
  instead of reimplementing the partition/parse/dedup pipeline, removing
  ~40 lines of duplicated logic. [nitpick 2]

- Document the LIKE pre-filter assumption: OpenCode uses JSON.stringify
  without pretty-printing, so the two patterns cover all expected
  formatting. False positives are harmless (filtered in Rust). [nitpick 3]

Skipped two nitpicks that don't apply:
- parse_source per-file reload: trait method signature is fixed, batch
  path already handles it, and this matches the pre-existing pattern.
- eprintln! vs tracing: the entire codebase uses eprintln!, not tracing.
@mike1858
Member Author

I'll do this one in the next release - after this one I'm about to do.

@Sewer56
Contributor

Sewer56 commented Mar 21, 2026

Maybe worth noting: they changed something again.

[screenshot]

I haven't peeked into it yet, but past some date, entries are returning blank.

@Sewer56
Contributor

Sewer56 commented Apr 4, 2026

Aah, dying for this one.
It's just that I've been a busy bee myself 😅

OpenCode has continued evolving its SQLite database schema since the
initial v1.1.53 migration. This adds support for the current schema
alongside the original format.

## What changed

### Channel-specific database support
- Discover and parse opencode-{channel}.db files (e.g. opencode-canary.db)
  in addition to the default opencode.db
- Updated discovery, glob patterns, watch directories, and validation
  to recognize channel-specific databases

### Step-finish token aggregation
- When a message's data blob has zero tokens at message level, fall back
  to aggregating token/cost data from step-finish parts
- This handles newer OpenCode versions where per-step accounting is the
  primary data source

### Updated test infrastructure
- create_test_db() now matches the real initial schema (with slug,
  version, sandboxes, icon_url, etc.)
- Added create_test_db_v2() matching the current schema (workspace_id,
  workspace table, composite indexes, commands column)

## Tests

Added 9 new tests:
- v2 schema project/session loading
- Workspace-linked sessions
- 'global' project_id fallback
- step-finish token aggregation
- step-finish fallback for zero-token messages
- v2 end-to-end with real-world data shapes
- Newer message format parsing (tools field, total tokens)
- Channel-specific DB file validation

All 229 tests pass. Clippy, fmt, and doc checks clean.

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

♻️ Duplicate comments (1)
src/analyzers/opencode.rs (1)

294-301: ⚠️ Potential issue | 🟡 Minor

cached_tokens is undercounted by excluding cache writes.

Line 300 only copies tokens.cache.read, so total cached usage is understated whenever cache writes are present.

Proposed fix
-        s.cached_tokens = tokens.cache.read;
+        s.cached_tokens = tokens.cache.read + tokens.cache.write;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/analyzers/opencode.rs` around lines 294 - 301, The code sets
s.cached_tokens to tokens.cache.read, which undercounts cached usage because it
omits cache writes; update the assignment in the block handling msg.tokens so
s.cached_tokens = tokens.cache.read + tokens.cache.write (or use a
checked/overflow-safe add if types require) referencing s.cached_tokens,
tokens.cache.read and tokens.cache.write to ensure total cached tokens include
both reads and writes.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/analyzers/opencode.rs`:
- Around line 800-807: The fallback condition ignores aggregates that have
nonzero reasoning/cache metrics; update the if that checks
step_finish_map.get(&id) to consider any nonzero aggregate field (not just
agg.input || agg.output). Specifically, when retrieving agg from step_finish_map
in the block that uses msg_has_tokens, change the predicate to check (agg.input
> 0 || agg.output > 0 || agg.reasoning > 0 || agg.cache_read > 0 ||
agg.cache_write > 0) so cache-only or reasoning-only aggregates trigger the
step-finish fallback for msg tokens.
- Around line 1074-1079: The contribution_strategy() method flips between
ContributionStrategy::MultiSession and ContributionStrategy::SingleMessage based
on Self::has_sqlite_db(), which allows opencode.db appearing/disappearing at
runtime to move contributions between cache buckets and corrupt incremental
counts; make the strategy deterministic for the lifetime of the analyzer by
selecting and caching the strategy once at startup (e.g., compute and store a
fixed ContributionStrategy in the analyzer struct during construction) instead
of calling Self::has_sqlite_db() on each contribution_strategy() call so the
strategy cannot change mid-run.
- Around line 948-972: The JSON path currently pushes json_results into results
before deduping, allowing legacy JSON to win; change the load/dedup ordering so
SQLite rows take priority: either (A) load/extend results with SQLite-derived
rows first and then add JSON rows while skipping any whose global_hash is
already present, or (B) when building json_results check against an existing
HashSet of global_hashes from the already-loaded SQLite results and filter them
out. Look for json_sources, storage_root, load_projects, load_sessions,
json_results, results and the dedup step that checks global_hash (around the
current dedupe logic) and implement the skip-if-hash-exists behavior so JSON
cannot shadow richer SQLite records.
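The widened fallback predicate from the first inline comment can be sketched as follows; the aggregate's field names are illustrative, not necessarily the analyzer's real ones:

```rust
// Illustrative per-message aggregate of step-finish parts; the real
// analyzer's field names may differ.
#[derive(Default)]
struct StepFinishAgg {
    input: u64,
    output: u64,
    reasoning: u64,
    cache_read: u64,
    cache_write: u64,
    cost: f64,
}

impl StepFinishAgg {
    // Trigger the fallback when *any* token field is nonzero, so that
    // reasoning-only or cache-only aggregates are not silently dropped.
    fn has_tokens(&self) -> bool {
        self.input > 0
            || self.output > 0
            || self.reasoning > 0
            || self.cache_read > 0
            || self.cache_write > 0
    }
}

fn main() {
    // A cache-only aggregate would be missed by an input/output-only check.
    let cache_only = StepFinishAgg { cache_read: 4096, ..Default::default() };
    assert!(cache_only.has_tokens());
    assert!(!StepFinishAgg::default().has_tokens());
    let _ = cache_only.cost; // cost is carried alongside tokens but not part of the predicate
}
```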

---

Duplicate comments:
In `@src/analyzers/opencode.rs`:
- Around line 294-301: The code sets s.cached_tokens to tokens.cache.read, which
undercounts cached usage because it omits cache writes; update the assignment in
the block handling msg.tokens so s.cached_tokens = tokens.cache.read +
tokens.cache.write (or use a checked/overflow-safe add if types require)
referencing s.cached_tokens, tokens.cache.read and tokens.cache.write to ensure
total cached tokens include both reads and writes.


📥 Commits

Reviewing files that changed from the base of the PR and between 6972c6d and a4a9687.

📒 Files selected for processing (1)
  • src/analyzers/opencode.rs

@Sewer56
Contributor

Sewer56 commented Apr 5, 2026

[screenshot]

Ooooh- there we go.

mike1858 added 2 commits April 4, 2026 21:03
Fix 4 issues identified by CodeRabbit review:

1. cached_tokens now sums cache.write + cache.read (was read-only)
   Consistent with all other analyzers (piebald, kilo_code, cline, etc.)

2. DbSession.title changed to Option<String> for NULL safety
   Prevents silently dropping sessions if title column is NULL.
   Updated load_sessions_from_db to use row.get::<_, Option<String>>
   and all downstream usage to use .and_then()/.as_deref().

3. Step-finish fallback now checks reasoning/cache tokens too
   Previously only checked input/output > 0, missing cases where
   step-finish parts had only reasoning or cache token data.

4. Dedup order swapped: SQLite parsed before JSON
   SQLite records are richer (tool stats, step-finish tokens) so they
   should win dedup. Since deduplicate_by_global_hash keeps the first
   seen entry, SQLite sources are now added to results first.

Dismissed 4 suggestions as not applicable:
- LIKE pre-filter whitespace: OpenCode uses JSON.stringify() (no
  pretty-printing), documented assumption, false negatives harmless
- parse_source per-file reload: trait method used by watcher for
  single-file updates; batch path already optimized
- eprintln → tracing: all analyzers use eprintln consistently
- Runtime strategy flips: dismissed by PR author as unrealistic
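Point 4 above hinges on deduplicate_by_global_hash keeping the first occurrence of each hash. A self-contained sketch of that ordering, where Msg and the helper are illustrative but the key follows the PR's `opencode_{session_id}_{msg_id}` formula:

```rust
use std::collections::HashSet;

// Minimal stand-in for a parsed message; only the dedup key and origin matter.
#[derive(Debug, Clone, PartialEq)]
struct Msg {
    global_hash: String, // shaped like "opencode_{session_id}_{msg_id}"
    source: &'static str,
}

// First occurrence of each global_hash wins, so appending SQLite rows
// before JSON rows lets the richer SQLite record shadow its JSON twin.
fn dedupe_sqlite_first(sqlite: Vec<Msg>, json: Vec<Msg>) -> Vec<Msg> {
    let mut seen = HashSet::new();
    sqlite
        .into_iter()
        .chain(json)
        .filter(|m| seen.insert(m.global_hash.clone()))
        .collect()
}

fn main() {
    let sqlite = vec![Msg { global_hash: "opencode_s1_m1".into(), source: "sqlite" }];
    let json = vec![
        Msg { global_hash: "opencode_s1_m1".into(), source: "json" },
        Msg { global_hash: "opencode_s1_m2".into(), source: "json" },
    ];
    let merged = dedupe_sqlite_first(sqlite, json);
    assert_eq!(merged.len(), 2);
    assert_eq!(merged[0].source, "sqlite"); // SQLite record won the collision
}
```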
Replace fragile LIKE pattern matching on JSON text with proper
json_extract() calls for the tool-part and step-finish-part queries.

Before:
  WHERE data LIKE '%"type":"tool"%' OR data LIKE '%"type": "tool"%'

After:
  WHERE json_extract(data, '$.type') = 'tool'

This is whitespace-agnostic and handles any valid JSON formatting.
The bundled SQLite in rusqlite includes JSON1 by default.

Addresses CodeRabbit review feedback on PR #120.
cargo update aws-lc-rs, aws-lc-sys, rustls-webpki, quinn-proto, reqwest

- aws-lc-sys 0.36.0 → 0.39.1 (RUSTSEC-2026-0044/0045/0046/0047/0048)
- rustls-webpki 0.103.8 → 0.103.10 (RUSTSEC-2026-0049)
- quinn-proto 0.11.13 → 0.11.14 (RUSTSEC-2026-0037)
- aws-lc-rs 1.15.3 → 1.16.2
- reqwest 0.13.1 → 0.13.2

cargo audit now reports 0 vulnerabilities.
@mike1858 mike1858 merged commit 63f0a33 into main Apr 5, 2026
6 checks passed
@mike1858 mike1858 deleted the feat/opencode-sqlite-support branch April 5, 2026 14:16