Skip to content

Conversation

@yzang2019
Copy link
Contributor

@yzang2019 yzang2019 commented Jan 7, 2026

Describe your changes and provide context

Purpose of this PR:

  1. Make WAL generic type, previously it is only able to store changeset. Make changelog a common implementation of WAL[ChangeLogEntry[
  2. Bump version of tidywal to latest
  3. Refactor MemIAVL to use the new interface

Testing performed to validate your change

Added and modified unit test

@yzang2019 yzang2019 requested a review from blindchaser January 7, 2026 06:56
@github-actions
Copy link

github-actions bot commented Jan 7, 2026

The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).

BuildFormatLintBreakingUpdated (UTC)
✅ passed✅ passed✅ passed✅ passedJan 16, 2026, 2:17 AM

@codecov
Copy link

codecov bot commented Jan 7, 2026

Codecov Report

❌ Patch coverage is 65.49708% with 118 lines in your changes missing coverage. Please review.
✅ Project coverage is 43.76%. Comparing base (c3b387c) to head (784f7c2).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
sei-db/wal/wal.go 66.06% 36 Missing and 20 partials ⚠️
sei-db/state_db/sc/memiavl/db.go 62.22% 21 Missing and 13 partials ⚠️
sei-db/wal/changelog.go 0.00% 8 Missing ⚠️
sei-db/state_db/sc/memiavl/multitree.go 77.41% 3 Missing and 4 partials ⚠️
sei-db/wal/utils.go 0.00% 4 Missing ⚠️
sei-db/db_engine/pebbledb/mvcc/db.go 57.14% 2 Missing and 1 partial ⚠️
sei-db/db_engine/rocksdb/mvcc/db.go 57.14% 2 Missing and 1 partial ⚠️
sei-db/state_db/sc/store.go 92.85% 2 Missing ⚠️
...-db/tools/cmd/seidb/operations/replay_changelog.go 0.00% 1 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #2671      +/-   ##
==========================================
+ Coverage   43.68%   43.76%   +0.08%     
==========================================
  Files        1913     1913              
  Lines      159450   159492      +42     
==========================================
+ Hits        69650    69798     +148     
+ Misses      83393    83277     -116     
- Partials     6407     6417      +10     
Flag Coverage Δ
sei-chain 45.78% <65.67%> (+0.17%) ⬆️
sei-cosmos 38.21% <ø> (-0.01%) ⬇️
sei-db 68.72% <57.14%> (-0.71%) ⬇️
sei-tendermint 47.25% <ø> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
sei-db/state_db/sc/memiavl/opts.go 100.00% <ø> (ø)
sei-db/state_db/ss/store.go 45.28% <100.00%> (ø)
...-db/tools/cmd/seidb/operations/replay_changelog.go 0.00% <0.00%> (ø)
sei-db/state_db/sc/store.go 91.30% <92.85%> (+91.30%) ⬆️
sei-db/db_engine/pebbledb/mvcc/db.go 55.25% <57.14%> (-0.68%) ⬇️
sei-db/db_engine/rocksdb/mvcc/db.go 57.78% <57.14%> (-1.49%) ⬇️
sei-db/wal/utils.go 56.71% <0.00%> (ø)
sei-db/state_db/sc/memiavl/multitree.go 78.54% <77.41%> (-1.01%) ⬇️
sei-db/wal/changelog.go 0.00% <0.00%> (ø)
sei-db/state_db/sc/memiavl/db.go 64.53% <62.22%> (+1.39%) ⬆️
... and 1 more

... and 25 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Collaborator

@masih masih left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this Yiming! Before a full review can I ask you to fix all the ignored errors across the PR please?

I also recommend avoiding the unnecessary package nesting, or the word "generic" in the package name. A typical package structure in Go is typically flatter and less hierarchical than languages like Java. I recommend avoiding common package naming like types because there are already many such packages across the codebase and working with the package name conflicts would require alias, which can lead to inconsistent alias naming and ultimately higher cognitive cycles when reading the code.

* main:
  feat: add generic KV interfaces + Pebble adapter (#2666)
  Make SSTORE chain param height-aware (#2667)
  fix: cosmos: protect coin denom regexp with a lock (#2660)
  Install CA certs on Ubuntu base image (#2658)
  Check storage is non-nil before attempting to close it (#2659)
@yzang2019
Copy link
Contributor Author

Thanks for this Yiming! Before a full review can I ask you to fix all the ignored errors across the PR please?

I also recommend avoiding the unnecessary package nesting, or the word "generic" in the package name. A typical package structure in Go is typically flatter and less hierarchical than languages like Java. I recommend avoiding common package naming like types because there are already many such packages across the codebase and working with the package name conflicts would require alias, which can lead to inconsistent alias naming and ultimately higher cognitive cycles when reading the code.

Great suggestion! Will look into the naming here

t.UpdateCommitInfo()
replayElapsed := time.Since(startTime).Seconds()
t.logger.Info(fmt.Sprintf("Total replayed %d entries in %.1fs (%.1f entries/sec).\n",
replayCount, replayElapsed, float64(replayCount)/replayElapsed))

Check notice

Code scanning / CodeQL

Floating point arithmetic Note

Floating point arithmetic operations are not associative and a possible source of non-determinism
return
if wal := db.GetWAL(); wal != nil {
catchupStart := time.Now()
if err := mtree.Catchup(ctx, wal, db.walIndexDelta, 0); err != nil {

Check warning

Code scanning / CodeQL

Calling the system time Warning

Calling the system time may be a possible source of non-determinism
@yzang2019 yzang2019 requested a review from blindchaser January 14, 2026 01:55
Copy link
Collaborator

@masih masih left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Main blockers:

  • Potential resource leakage.
  • Concurrent use safety of WAL

Other than those I left a bunch of feedback across the code

Thanks for working on this @yzang2019 ! 🍻 This is nearly there


for _, tc := range testCases {
t.Run(tc.name, func(t *testing.T) {
os.WriteFile(filepath.Join(dir, "00000000000000000001"), tc.logs, 0o600)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assert no error?


func TestTruncateAfter(t *testing.T) {
changelog := prepareTestData(t)
defer changelog.Close()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assert no error, i.e. graceful closure? Ditto for the rest of the tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed all

}

func TestLogPath(t *testing.T) {
path := LogPath("/some/dir")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "util" LogPath is only used once. It adds an additional 3 lines to the codebase and makes the reader jump into another file to follow what the code does. This is another example of over-refactoring that hardly adds any benefit at all I'm afraid.

Comment on lines +130 to +150
go func() {
defer walLog.wg.Done()
for entry := range ch {
bz, err := walLog.marshal(entry)
if err != nil {
walLog.recordAsyncWriteErr(err)
return
}
nextOffset, err := walLog.NextOffset()
if err != nil {
walLog.recordAsyncWriteErr(err)
return
}
err = walLog.log.Write(nextOffset, bz)
if err != nil {
walLog.recordAsyncWriteErr(err)
return
}

}
}()

Check notice

Code scanning / CodeQL

Spawning a Go routine Note

Spawning a Go routine may be a possible source of non-determinism
Comment on lines +221 to +248
go func() {
defer walLog.wg.Done()
ticker := time.NewTicker(pruneInterval)
defer ticker.Stop()
for {
select {
case <-walLog.closeCh:
return
case <-ticker.C:
lastIndex, err := walLog.log.LastIndex()
if err != nil {
walLog.logger.Error("failed to get last index for pruning", "err", err)
continue
}
firstIndex, err := walLog.log.FirstIndex()
if err != nil {
walLog.logger.Error("failed to get first index for pruning", "err", err)
continue
}
if lastIndex > keepRecent && (lastIndex-keepRecent) > firstIndex {
prunePos := lastIndex - keepRecent
if err := walLog.TruncateBefore(prunePos); err != nil {
walLog.logger.Error(fmt.Sprintf("failed to prune changelog till index %d", prunePos), "err", err)
}
}
}
}
}()

Check notice

Code scanning / CodeQL

Spawning a Go routine Note

Spawning a Go routine may be a possible source of non-determinism
@blindchaser
Copy link
Contributor

LGTM. not a blocker, but some thoughts on adding test coverage:

Concurrency testing:

  • It would be valuable to add a testings that exercises concurrent WAL operations (e.g. async writes while truncation/pruning is running, or Close() racing with in-flight writes) to ensure there are no deadlocks, panics, or log corruption under concurrent access.

Isolation testing:

  • Since WAL/changelog is now scoped per DB, it would be good to add a multi-DB isolation test:
    • open two DBs (separate dirs) in the same process,
    • interleave commits on both,
    • trigger snapshot rewrite and/or WAL truncation on DB A,
    • assert DB B’s WAL offsets, CommittedVersion, and state are unaffected.

@yzang2019 yzang2019 requested a review from masih January 14, 2026 22:26
@yzang2019
Copy link
Contributor Author

LGTM. not a blocker, but some thoughts on adding test coverage:

Concurrency testing:

  • It would be valuable to add a testings that exercises concurrent WAL operations (e.g. async writes while truncation/pruning is running, or Close() racing with in-flight writes) to ensure there are no deadlocks, panics, or log corruption under concurrent access.

Isolation testing:

  • Since WAL/changelog is now scoped per DB, it would be good to add a multi-DB isolation test:

    • open two DBs (separate dirs) in the same process,
    • interleave commits on both,
    • trigger snapshot rewrite and/or WAL truncation on DB A,
    • assert DB B’s WAL offsets, CommittedVersion, and state are unaffected.

Make sense for concurrency point, for isolation I dont think we need to test that much, since different WAL in different folder should be fully isolated, so I dont think there's any concern around that

* main:
  fix: lthash worker loop break; remove unreachable digest.Read fallback (#2698)
Copy link
Collaborator

@masih masih left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No obvious blockers that I can see.

I see you have chosen to ignore the note on the resource leak. Please remember to continuously check for usage of those constructors, because not explicitly cleaning up things would leave the correctness to chance - the correctness can rot by accident.

t.Fatalf("Open: %v", err)
}
defer func() { _ = db.Close() }()
t.Cleanup(func() { require.NoError(t, db.Close()) })
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A small note on t.Cleanup vs defer: t.Cleanup is executed after the test belonging to t ends. Whereas, defer is executed when the function returns.

We do not always want one or the other. For example, in a test helper function we usually want t.Cleanup if it creates resources for the tests that are done after function returns.

In this specific case and the remaining changes here it makes no difference in using one or the other. But I thought I point out the difference so that we are careful not to always use either of those.

streamHandler, err := changelog.NewStream(logger, utils.GetChangelogPath(opts.Dir), changelog.Config{
// MemIAVL owns changelog lifecycle: always open the WAL here.
// Even in read-only mode we may need WAL replay to reconstruct non-snapshot versions.
streamHandler, err := wal.NewChangelogWAL(logger, utils.GetChangelogPath(opts.Dir), wal.Config{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So far probably for the current implementation?

There is one way to make sure resource leaks cannot happen no matter how the constructor is used: On partial instantiation, the constructor should clean up after itself.

tree.WaitToCompleteAsyncWrite()
}

if err != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, but what async process?

Looking at the code the entire operation is executed sequentially. We just happen to pass "process" as a function. Right?

@yzang2019
Copy link
Contributor Author

yzang2019 commented Jan 15, 2026

No obvious blockers that I can see.

I see you have chosen to ignore the note on the resource leak. Please remember to continuously check for usage of those constructors, because not explicitly cleaning up things would leave the correctness to chance - the correctness can rot by accident.

Oh actually I didn't ignore that, but the new change doesn't throw any error i WAL open function because we are not checking lastOffset any more, so it won't have that issue.

Haven't really checked other places such as MemIAVL open function though

@yzang2019 yzang2019 merged commit ef2912e into main Jan 16, 2026
39 checks passed
@yzang2019 yzang2019 deleted the yzang/redesign-changelog branch January 16, 2026 06:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants