Skip to content

Comments

fix(drive): eliminate panic in grovedb operations logging under concurrent execution#3142

Open
thepastaclaw wants to merge 1 commit intodashpay:v3.1-devfrom
thepastaclaw:fix/grovedb-logging-panic
Open

fix(drive): eliminate panic in grovedb operations logging under concurrent execution#3142
thepastaclaw wants to merge 1 commit intodashpay:v3.1-devfrom
thepastaclaw:fix/grovedb-logging-panic

Conversation

@thepastaclaw
Copy link
Contributor

@thepastaclaw thepastaclaw commented Feb 21, 2026

Issue

drive-abci tests intermittently panic with:

thread '...' panicked at packages/rs-drive/src/util/grove_operations/grove_apply_batch_with_add_costs/v0/mod.rs:97:39:
log params should be set above

This has been observed in CI on multiple PRs (e.g. exponential_distribution_extreme_values, fixed_amount_0) and was reproduced locally under cargo test -p drive-abci --lib -- --test-threads=8.

Root Cause

The grovedb operations logging code (behind #[cfg(feature = "grovedb_operations_logging")]) uses two independent tracing::event_enabled! checks that assume they will always return the same value:

// Check 1: before the grovedb operation
let maybe_params_for_logs = if tracing::event_enabled!(target: "drive_grovedb_operations", TRACE) {
    Some((ops.clone(), root_hash))  // Clone ops for logging
} else {
    None
};

// ... grovedb operation executes here (ops.operations is moved) ...

// Check 2: after the grovedb operation
if tracing::event_enabled!(target: "drive_grovedb_operations", TRACE) && cost_context.value.is_ok() {
    let (ops, previous_root_hash) =
        maybe_params_for_logs.expect("log params should be set above");  // 💥 PANIC if Check 1 was false
    // ...
}

If Check 1 returns false (→ None) but Check 2 returns true, the .expect() panics. The grovedb_operations_logging feature is enabled for drive-abci via its Cargo.toml dependency on drive, so this code is always compiled and active in the test binary.

Why the checks can disagree

The exact mechanism is not fully pinned down, but tracing::event_enabled! evaluates the current dispatcher at call time. In the test binary, drive-abci includes a test_logging test (src/logging/mod.rs:127) that calls tracing::dispatcher::set_default() to install a thread-local subscriber with TRACE level enabled. While set_default is thread-local and guarded, the Rust test framework's thread pool reuse creates a window where dispatcher state can be inconsistent across the two checks.

The issue was reproduced locally under parallel test execution (--test-threads=8) but is non-deterministic — it depends on thread scheduling timing.

Production impact

Low but non-zero. In production, drive-abci initializes a single tracing subscriber at startup and never changes it, so the two checks should always agree. However, the .expect() means any future change that introduces runtime log level reloading (a common feature) would create a latent crash path in every grovedb batch operation — a critical hot path for Platform nodes.

Fix

Replace the second event_enabled! re-check with if let Some(...) on the already-captured Option. This guarantees the logging block only executes when the params were actually captured, regardless of dispatcher state changes:

if let Some((ops, previous_root_hash)) = maybe_params_for_logs {
    if cost_context.value.is_ok() {
        // ... log ...
    }
}

Both affected call sites fixed:

  • grove_apply_batch_with_add_costs/v0/mod.rs
  • grove_clear/v0/mod.rs

Behavioral change

The first event_enabled! check (Check 1) is unchanged — it still gates whether we clone ops and capture the pre-operation root hash. When tracing is disabled, maybe_params_for_logs is None and the if let block is skipped entirely. No extra work happens when tracing is off.

The only behavioral difference from the old code is in the (already-broken) race condition edge case: if tracing is enabled at Check 1 but disabled between Check 1 and Check 2, the old code would skip the post-op block (because Check 2 re-evaluated and saw tracing disabled). The new code will still execute the post-op block — performing one extra root_hash() lookup and emitting a tracing::trace! that the now-disabled subscriber silently drops. This is harmless and is the same timing window that caused the panic in the first place.

Normal flows are identical:

  • Tracing on → both blocks execute (same as before)
  • Tracing off → neither block executes (same as before)

The panic is eliminated entirely.

Error propagation fix (additional)

The post-operation root_hash() call in both logging blocks previously used .map_err(Error::from)?, which could cause an early return from logging-only code. This had two consequences:

  • grove_apply_batch_with_add_costs: If root_hash() errored after the batch was already applied, the ? would skip push_drive_operation_result, silently dropping the operation costs from drive_operations and skewing fee accounting.
  • grove_clear: If root_hash() errored after a successful clear, the function would return an error instead of the successful result, making the caller believe the clear failed.

The post-op root_hash() is only needed for trace output. Errors are now absorbed and logged as warnings instead of propagated, ensuring the actual operation result is always returned to the caller.

CI Failure Evidence

Summary by CodeRabbit

  • Bug Fixes

    • Removed unsafe assertions in logging operations that could cause crashes, improving system stability.
  • Refactor

    • Enhanced error handling in trace operations to ensure logs are only produced when operation conditions are properly met.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 21, 2026

📝 Walkthrough

Walkthrough

Two logging blocks in grove operations refactored to safely handle optional parameters. Replaced unconditional assumptions about maybe_params_for_logs presence with explicit if-let pattern matching, eliminating panic risks while preserving logging behavior when parameters are available.

Changes

Cohort / File(s) Summary
Grove operations logging safety
packages/rs-drive/src/util/grove_operations/grove_apply_batch_with_add_costs/v0/mod.rs, packages/rs-drive/src/util/grove_operations/grove_clear/v0/mod.rs
Refactored logging blocks to use if-let pattern matching for maybe_params_for_logs instead of unwrap/expect. Added nested conditional check for operation success (result.is_ok()) before retrieving root_hash and emitting traces. Eliminates panic risk from missing log parameters while maintaining logging semantics.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~15 minutes

Poem

🐰 A hop, a skip through logs so deep,
Where panic once could make us weep,
Now guards and patterns keep us free,
Safe branches dance in harmony,
No more expect—just graceful guard! 🌿

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main fix: eliminating a panic in grovedb operations logging that occurs under concurrent execution, which aligns directly with the PR's core objective.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@packages/rs-drive/src/util/grove_operations/grove_apply_batch_with_add_costs/v0/mod.rs`:
- Around line 89-93: The current logging block calls
self.grove.root_hash(...).unwrap().map_err(?) which can early-return and skip
the later push_drive_operation_result, losing cost accounting; change the
root_hash retrieval in the logging path to absorb errors instead of propagating
them: call self.grove.root_hash(transaction, &drive_version.grove_version) and
handle the Result with match or if let (log the root hash on Ok and log a
warning on Err) so no ? is used there, ensuring push_drive_operation_result
always runs and drive_operations cost is preserved.

In `@packages/rs-drive/src/util/grove_operations/grove_clear/v0/mod.rs`:
- Around line 52-56: The call to self.grove.root_hash(transaction,
&drive_version.grove_version) used only for logging currently uses ?/map_err and
can return an error that overrides the actual successful commit in result;
change this logging-path call to absorb any error instead of propagating it:
call self.grove.root_hash(... ) and handle the Err case (e.g., match or if let
Err(e) = ...) by logging the error with context but continuing, and only use the
Ok(root_hash) value for logging; ensure the variable names root_hash and result
are preserved so the commit result is returned unchanged.

…rrent execution

Replace independent tracing::event_enabled! re-checks with if-let on
the previously captured Option. The old code checked event_enabled!
twice independently — once before the grovedb operation to decide
whether to clone params for logging, and again after to decide whether
to log. If the tracing dispatcher state changed between the two checks,
the second path would .expect() on a None value and panic.

Additionally, replace the error-propagating ? operator in the post-op
root_hash() call with match-based error absorption. The old code used
.map_err(Error::from)? which could cause an early return from a
logging-only code path — skipping push_drive_operation_result (losing
fee accounting) or shadowing a successful result with a logging error.
The post-op root_hash is only needed for trace output, so errors are
now logged as warnings instead of propagated.

Both affected sites fixed:
- grove_apply_batch_with_add_costs/v0/mod.rs
- grove_clear/v0/mod.rs
@thepastaclaw thepastaclaw force-pushed the fix/grovedb-logging-panic branch from 697846e to 6d26b2c Compare February 21, 2026 21:09
@thepastaclaw
Copy link
Contributor Author

Addressed both CodeRabbit findings in the force-push:

? on root_hash() in logging path — Replaced .map_err(Error::from)? with match-based error absorption in both files. The post-op root_hash() is only needed for trace output, so errors are now logged as warnings instead of propagated. This prevents logging code from skipping push_drive_operation_result (losing fee accounting) or shadowing a successful result with a logging error.

Thanks for catching this — it was a pre-existing issue but good to fix while touching this code.

Copy link
Member

@PastaPastaPasta PastaPastaPasta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems fine; I'm not positive about 6d26b2c so sam / others should evaluate if that is ok. it may change behavior in a way that isn't, I'm not sure

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants