fix(drive): eliminate panic in grovedb operations logging under concurrent execution#3142
fix(drive): eliminate panic in grovedb operations logging under concurrent execution#3142thepastaclaw wants to merge 1 commit intodashpay:v3.1-devfrom
Conversation
📝 WalkthroughWalkthroughTwo logging blocks in grove operations refactored to safely handle optional parameters. Replaced unconditional assumptions about Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~15 minutes Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@packages/rs-drive/src/util/grove_operations/grove_apply_batch_with_add_costs/v0/mod.rs`:
- Around line 89-93: The current logging block calls
self.grove.root_hash(...).unwrap().map_err(?) which can early-return and skip
the later push_drive_operation_result, losing cost accounting; change the
root_hash retrieval in the logging path to absorb errors instead of propagating
them: call self.grove.root_hash(transaction, &drive_version.grove_version) and
handle the Result with match or if let (log the root hash on Ok and log a
warning on Err) so no ? is used there, ensuring push_drive_operation_result
always runs and drive_operations cost is preserved.
In `@packages/rs-drive/src/util/grove_operations/grove_clear/v0/mod.rs`:
- Around line 52-56: The call to self.grove.root_hash(transaction,
&drive_version.grove_version) used only for logging currently uses ?/map_err and
can return an error that overrides the actual successful commit in result;
change this logging-path call to absorb any error instead of propagating it:
call self.grove.root_hash(... ) and handle the Err case (e.g., match or if let
Err(e) = ...) by logging the error with context but continuing, and only use the
Ok(root_hash) value for logging; ensure the variable names root_hash and result
are preserved so the commit result is returned unchanged.
packages/rs-drive/src/util/grove_operations/grove_apply_batch_with_add_costs/v0/mod.rs
Outdated
Show resolved
Hide resolved
packages/rs-drive/src/util/grove_operations/grove_clear/v0/mod.rs
Outdated
Show resolved
Hide resolved
…rrent execution Replace independent tracing::event_enabled! re-checks with if-let on the previously captured Option. The old code checked event_enabled! twice independently — once before the grovedb operation to decide whether to clone params for logging, and again after to decide whether to log. If the tracing dispatcher state changed between the two checks, the second path would .expect() on a None value and panic. Additionally, replace the error-propagating ? operator in the post-op root_hash() call with match-based error absorption. The old code used .map_err(Error::from)? which could cause an early return from a logging-only code path — skipping push_drive_operation_result (losing fee accounting) or shadowing a successful result with a logging error. The post-op root_hash is only needed for trace output, so errors are now logged as warnings instead of propagated. Both affected sites fixed: - grove_apply_batch_with_add_costs/v0/mod.rs - grove_clear/v0/mod.rs
697846e to
6d26b2c
Compare
|
Addressed both CodeRabbit findings in the force-push:
Thanks for catching this — it was a pre-existing issue but good to fix while touching this code. |
There was a problem hiding this comment.
seems fine; I'm not positive about 6d26b2c so sam / others should evaluate if that is ok. it may change behavior in a way that isn't, I'm not sure
Issue
drive-abcitests intermittently panic with:This has been observed in CI on multiple PRs (e.g.
exponential_distribution_extreme_values,fixed_amount_0) and was reproduced locally undercargo test -p drive-abci --lib -- --test-threads=8.Root Cause
The grovedb operations logging code (behind
#[cfg(feature = "grovedb_operations_logging")]) uses two independenttracing::event_enabled!checks that assume they will always return the same value:If Check 1 returns
false(→None) but Check 2 returnstrue, the.expect()panics. Thegrovedb_operations_loggingfeature is enabled fordrive-abcivia its Cargo.toml dependency ondrive, so this code is always compiled and active in the test binary.Why the checks can disagree
The exact mechanism is not fully pinned down, but
tracing::event_enabled!evaluates the current dispatcher at call time. In the test binary,drive-abciincludes atest_loggingtest (src/logging/mod.rs:127) that callstracing::dispatcher::set_default()to install a thread-local subscriber with TRACE level enabled. Whileset_defaultis thread-local and guarded, the Rust test framework's thread pool reuse creates a window where dispatcher state can be inconsistent across the two checks.The issue was reproduced locally under parallel test execution (
--test-threads=8) but is non-deterministic — it depends on thread scheduling timing.Production impact
Low but non-zero. In production,
drive-abciinitializes a single tracing subscriber at startup and never changes it, so the two checks should always agree. However, the.expect()means any future change that introduces runtime log level reloading (a common feature) would create a latent crash path in every grovedb batch operation — a critical hot path for Platform nodes.Fix
Replace the second
event_enabled!re-check withif let Some(...)on the already-capturedOption. This guarantees the logging block only executes when the params were actually captured, regardless of dispatcher state changes:Both affected call sites fixed:
grove_apply_batch_with_add_costs/v0/mod.rsgrove_clear/v0/mod.rsBehavioral change
The first
event_enabled!check (Check 1) is unchanged — it still gates whether we clone ops and capture the pre-operation root hash. When tracing is disabled,maybe_params_for_logsisNoneand theif letblock is skipped entirely. No extra work happens when tracing is off.The only behavioral difference from the old code is in the (already-broken) race condition edge case: if tracing is enabled at Check 1 but disabled between Check 1 and Check 2, the old code would skip the post-op block (because Check 2 re-evaluated and saw tracing disabled). The new code will still execute the post-op block — performing one extra
root_hash()lookup and emitting atracing::trace!that the now-disabled subscriber silently drops. This is harmless and is the same timing window that caused the panic in the first place.Normal flows are identical:
The panic is eliminated entirely.
Error propagation fix (additional)
The post-operation
root_hash()call in both logging blocks previously used.map_err(Error::from)?, which could cause an early return from logging-only code. This had two consequences:grove_apply_batch_with_add_costs: Ifroot_hash()errored after the batch was already applied, the?would skippush_drive_operation_result, silently dropping the operation costs fromdrive_operationsand skewing fee accounting.grove_clear: Ifroot_hash()errored after a successful clear, the function would return an error instead of the successfulresult, making the caller believe the clear failed.The post-op
root_hash()is only needed for trace output. Errors are now absorbed and logged as warnings instead of propagated, ensuring the actual operation result is always returned to the caller.CI Failure Evidence
exponential_distribution_extreme_valuespanicked with"log params should be set above"cargo test -p drive-abci --lib -- "block_based" --test-threads=8—fixed_amount_0panicked with same message (non-deterministic, reproduced on first attempt but not on subsequent runs)Summary by CodeRabbit
Bug Fixes
Refactor