SDSTOR-21797 Propose scrub flatbuffers schema and blob scrub design adr#407

Open
JacksonYao287 wants to merge 2 commits into eBay:main from JacksonYao287:SDSTOR-21797

Conversation

@JacksonYao287
Collaborator

No description provided.

@JacksonYao287 JacksonYao287 changed the title propose scrub flatbuffers schema design SDSTOR-21797 propose scrub flatbuffers schema design Apr 10, 2026
@codecov-commenter

codecov-commenter commented Apr 10, 2026

⚠️ Please install the Codecov GitHub app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 54.30%. Comparing base (1746bcc) to head (77ec375).
⚠️ Report is 171 commits behind head on main.
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #407      +/-   ##
==========================================
- Coverage   63.15%   54.30%   -8.86%     
==========================================
  Files          32       36       +4     
  Lines        1900     5274    +3374     
  Branches      204      657     +453     
==========================================
+ Hits         1200     2864    +1664     
- Misses        600     2110    +1510     
- Partials      100      300     +200     


Comment thread docs/adr/scrub-blob-range-coverage.md Outdated
Comment thread docs/adr/scrub-flatbuffers-schema-design.md Outdated
scrub_info: ScrubInfo;
blob_scrub_results: [BlobScrubResultEntry];
}

Collaborator

Can you have an option of

table ScrubResultEntry {
  shard_id:     uint64;
  blob_id:      uint64;
  scrub_result: ScrubResult;  // default = NONE → absent for healthy blobs (saves space)
  hash:         [ubyte];      // absent for shallow scrub or corrupted blobs
}


// use a single scrub map for Blob/Shard/PG
table ScrubMap {
    reqID:        uint64;
    type:         ScrubType;
    scrub_result: [ScrubResultEntry];
}

For shard, the blob_id field is left empty.
For PG, both shard_id and blob_id are left empty.

Collaborator Author

@JacksonYao287 JacksonYao287 Apr 10, 2026

For shard, I want ScrubResultEntry to contain shard_id and blob count;
for PG, I want ScrubResultEntry to contain the shard count and blob count of this PG.

So they will not be empty.

table ScrubResultEntry {
  key_1:     uint64;
  key_2:      uint64;
  scrub_result: ScrubResult;  // default = NONE → absent for healthy blobs (saves space)
  hash:         [ubyte];      // absent for shallow scrub or corrupted blobs
}

table ScrubMap {
    scrub_info:   ScrubInfo;
    scrub_result: [ScrubResultEntry];
}

This is a more generic definition:
if the scrub type is deep_blob/shallow_blob, key_1 is shard_id and key_2 is blob_id;
if the scrub type is deep_shard/shallow_shard, key_1 is shard_id and key_2 is blob count;
if the scrub type is pg_meta, key_1 is total_shard_count and key_2 is total_blob_count.

total_shard_count and total_blob_count would be useful to compare the replicas at a high level.
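The type-dependent meaning of key_1/key_2 above can be sketched as a small decoder (a hypothetical Python sketch; the ScrubType variant names are assumed from this discussion, not taken from the actual schema):

```python
from enum import Enum

class ScrubType(Enum):
    DEEP_BLOB = 1
    SHALLOW_BLOB = 2
    DEEP_SHARD = 3
    SHALLOW_SHARD = 4
    PG_META = 5

def interpret_entry(scrub_type: ScrubType, key_1: int, key_2: int) -> dict:
    """Map the generic (key_1, key_2) pair to type-specific fields."""
    if scrub_type in (ScrubType.DEEP_BLOB, ScrubType.SHALLOW_BLOB):
        return {"shard_id": key_1, "blob_id": key_2}
    if scrub_type in (ScrubType.DEEP_SHARD, ScrubType.SHALLOW_SHARD):
        return {"shard_id": key_1, "blob_count": key_2}
    if scrub_type is ScrubType.PG_META:
        return {"total_shard_count": key_1, "total_blob_count": key_2}
    raise ValueError(f"unknown scrub type: {scrub_type}")
```

The receiver only needs the ScrubType from ScrubInfo to know how to read each entry.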

Collaborator

How do you expect the blob_count at the PG level to be used? If we can keep one result per entry, it is more straightforward.

Also, maybe we can change ScrubResult to int64, using negative values for errors and positive values for data?

Collaborator Author

I don't have a clear use case for this so far. I think they will be useful in some scenarios in the future, and the data is transferred only once (pg_meta scrub is executed only once) and is not expensive, so I put it here.

positive for value?

What does the value stand for in the case of a single blob?

Collaborator

It doesn't impact the single-blob case:
NONE: 0
ERROR: negative
positive: app-defined
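The sign convention proposed here could look like the following (hypothetical helpers; the error codes and app-defined values are placeholders, not from the actual schema):

```python
# int64 ScrubResult convention: 0 = NONE (healthy, and omitted on the wire
# thanks to the FlatBuffers default), negative = error code,
# positive = app-defined value (e.g. blob count for shard scrub,
# shard count for PG scrub).

NONE = 0

def encode_error(err_code: int) -> int:
    # Store a positive error code as a negative ScrubResult.
    assert err_code > 0
    return -err_code

def decode(result: int) -> tuple:
    """Return (kind, payload) for a ScrubResult value."""
    if result == NONE:
        return ("none", None)
    if result < 0:
        return ("error", -result)
    return ("app_defined", result)
```

This keeps a single scalar field while still distinguishing healthy, failed, and count-carrying entries.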

Collaborator Author

How do you expect the app-defined values to be used? Or are they just kept for later extension?

Collaborator

For shard it is blob_count;
for PG it is shard count (optionally, if we really want the blob count, the hash field can be used);
for blob, it is reserved for later extension.

hash: [ubyte]; // absent for shallow scrub or corrupted blobs
}
table BlobScrubMap {
scrub_info: ScrubInfo;
Collaborator

How is scrub_info useful in the response?
If only a few fields are useful, can we just take those fields?

Collaborator Author

Almost all the fields are used to check whether the BlobScrubMap is the one I want, that is, whether it matches the current request of the leader:
https://github.com/eBay/HomeObject/pull/395/changes#diff-a2f2f7f16456c62c69e3a806813dbbe974551df7c32fdce849596b10dadd215cR143-R144

Collaborator Author

If SM restarts, req_id will start from 0 again; the leader could then receive a scrub map from a stale scrub request that happens to have the same req_id.

Collaborator

I think your TO-DO clearly indicates the right path --- right now a lot of fields are used to try to make a request unique, but the same can be achieved with a randomized request ID, UUID, etc.
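A sketch of the randomized-request-ID idea (hypothetical class and method names; only the 64-bit random ID is from the discussion):

```python
import secrets

class ScrubLeader:
    """Sketch: a random 64-bit req_id makes each request unique across
    restarts, so stale responses can be dropped without comparing many
    ScrubInfo fields."""

    def __init__(self):
        self.outstanding: set[int] = set()

    def new_request(self) -> int:
        # A random 64-bit id: a collision with a stale pre-restart
        # request is astronomically unlikely, unlike a counter that
        # restarts at 0.
        req_id = secrets.randbits(64)
        self.outstanding.add(req_id)
        return req_id

    def accept_response(self, req_id: int) -> bool:
        # Drop responses that do not match any outstanding request.
        if req_id in self.outstanding:
            self.outstanding.remove(req_id)
            return True
        return False
```

A response carrying a req_id from before a restart simply fails the `outstanding` lookup and is discarded.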

Collaborator Author

requestID LGTM

shard_id: uint64;
blob_id: uint64;
scrub_result: ScrubResult; // default = NONE → absent for healthy blobs (saves space)
hash: [ubyte]; // absent for shallow scrub or corrupted blobs
Collaborator

What is the hash algorithm we are targeting?
IIRC I did some research, and it seems CRC64NVME is the best candidate, balancing computational expense and strength.

If it is CRC32/CRC64NVME, then we can use uint64 for it, avoiding the byte array.
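For illustration, a CRC-based scrub hash fits naturally in a fixed-width integer field; a minimal sketch using Python's stdlib CRC32 (CRC64NVME is not in the stdlib, but its output would fit a uint64 field the same way):

```python
import zlib

def scrub_hash(data: bytes) -> int:
    # zlib.crc32 returns an unsigned 32-bit value, which fits in a
    # FlatBuffers uint64 field with room to spare for CRC64 variants,
    # avoiding a variable-length [ubyte] vector per entry.
    return zlib.crc32(data) & 0xFFFFFFFF

h = scrub_hash(b"blob payload")
assert 0 <= h < 2**32
```

The saving matters because the scrub map carries one hash per blob, so dropping the vector length prefix and offset per entry adds up.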

Collaborator Author

For now, the hash algorithm we are using is CRC32.

However, we support multiple hash algorithms now, and blob_max_hash_len is 32.

To stay compatible if the algorithm changes, we have to use a byte array, even though we know it is 32-bit (4 B) now.

Collaborator

@xiaoxichen xiaoxichen Apr 11, 2026

The blob hash we store on disk doesn't necessarily match the hash we use for scrubbing. For example, if we decide to use SHA256 for scrubbing, we can do that by sending back the SHA256 of each blob in the BlobMap. Or the reverse: if the data stored locally uses MD5, we can still use CRC32 for scrubbing.

I think this isolation should be part of the scrubber design. For example, if we change the blob hashing algorithm on the PutBlob side, there will be a mixture of blobs with hash_algo1 and hash_algo2. Even worse, due to rolling upgrades of different FDs, a single blob's checksum algorithm might not be the same across replicas.

A shortcut can be taken if the blob hashing algo matches the scrubbing algo, but this should be an optimization, although the optimization works for 99% of cases.
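The isolation described above, recomputing one scrubbing digest from the raw data regardless of which hash PutBlob persisted, can be sketched as follows (hypothetical helpers; the algorithm choices are illustrative only):

```python
import hashlib
import zlib

def stored_hash(data: bytes, algo: str) -> bytes:
    # Whatever PutBlob persisted; may differ per replica after a
    # rolling upgrade changes the default algorithm.
    if algo == "crc32":
        return zlib.crc32(data).to_bytes(4, "big")
    if algo == "md5":
        return hashlib.md5(data).digest()
    raise ValueError(f"unknown algo: {algo}")

def scrub_digest(data: bytes) -> bytes:
    # One algorithm chosen for scrubbing, recomputed from the raw data,
    # so replicas agree even when their stored hashes use different algos.
    return hashlib.sha256(data).digest()

data = b"blob bytes"
replica_a = ("crc32", stored_hash(data, "crc32"), scrub_digest(data))
replica_b = ("md5", stored_hash(data, "md5"), scrub_digest(data))
# Stored hashes differ across replicas, scrub digests still match:
assert replica_a[1] != replica_b[1]
assert replica_a[2] == replica_b[2]
```

The "shortcut" mentioned above corresponds to skipping the recomputation when the stored algorithm already equals the scrubbing algorithm.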

Collaborator Author

LGTM, let's use uint32 here for the hash instead of a byte array. If we change the hash algorithm for put_blob in the future, we can use a separate PR to add CRC32 as the hash algorithm dedicated to scrubbing.

Collaborator

It is a separate discussion/reasoning regarding what would be the best hashing algorithm for scrubbing purposes.

Suggesting uint64 based on my previous analysis, as it points to CRC64NVME.

Collaborator Author

sure

Comment thread docs/adr/scrub-flatbuffers-schema-design.md Outdated
@JacksonYao287 JacksonYao287 changed the title SDSTOR-21797 propose scrub flatbuffers schema design SDSTOR-21797 Propose scrub flatbuffers schema and blob scrub design adr Apr 10, 2026
Comment thread docs/adr/scrub-blob-range-coverage.md Outdated
Collaborator

@xiaoxichen xiaoxichen left a comment

Skipped review of the flatbuffers doc; reviewed only the blob-range doc, as the latter depends on the former.

Comment thread docs/adr/scrub-blob-range-coverage.md Outdated
Comment thread docs/adr/scrub-blob-range-coverage.md
Comment thread docs/adr/scrub-flatbuffers-schema-design.md Outdated
/// absent (→ 0) for META, SHALLOW_BLOB, and for any corrupted blob.
table ScrubResultEntry {
shard_id: uint64;
value: uint64;
Collaborator

As default fields are not transferred, why don't we add blob_id?

Collaborator Author

I don't get this question. value is blob_id for blob scrub.
Yes, adding blob_id is harmless, but to keep the fbs itself as simple as possible, the fewer items the better.
Do you mean we can just rename value here to blob_id and use it as the blob count for shard scrub?

Collaborator

No, I think we can add a blobID... using value for blobID is very tricky and hard to follow.

Collaborator Author

Rename value here to blob_id: will this be better? For shard scrub, blob_id means the blob count.

Collaborator Author

Renamed value to blob_id; as a result, for meta scrub, blob_id holds the shard's blob count.

Comment thread docs/adr/scrub-flatbuffers-schema-design.md Outdated
Collaborator

@xiaoxichen xiaoxichen left a comment

lgtm. Ideally, if a sequence diagram could be generated by AI, it would be easier to follow.

Comment thread docs/adr/scrub-flatbuffers-schema-design.md Outdated
Comment thread docs/adr/scrub-blob-range-coverage.md Outdated
@JacksonYao287 JacksonYao287 force-pushed the SDSTOR-21797 branch 2 times, most recently from e61fab5 to 9fa82a8 Compare April 15, 2026 09:00
Comment thread docs/adr/scrub-flatbuffers-schema-design.md Outdated
/// Randomised per-request identifier used by the receiver to match a response to
/// the outstanding request and to discard stale responses from a previous scrub
/// epoch. The leader generates a distinct req_id for every (replica, request) pair:
/// two replicas receiving the same scrub round will each see a different req_id,
Collaborator Author

Added comments for req_id.

This design combines data produced during the shard scrub phase with a dynamic bin-packing algorithm and a streaming response protocol to satisfy all three constraints simultaneously.

### Step 1: Build shard_blob_count from the Shard Scrub Phase
### Step 1: META Scrub
Collaborator Author

@xiaoxichen pls take a look, I have added the description for META scrub.

Collaborator

@xiaoxichen xiaoxichen left a comment

lgtm.

A few other thinking points you might document here or in another doc, or by re-purposing the first doc into a scrub-flow doc, etc.

  1. In Step 1, when querying the PG btree, assuming all the shards are big shards, consider the lock contention with incoming IOs.
  2. How the shard metablk CRC works.
  3. When the leader starts a scrubbing task, it is not inside the state machine, which means the reads of lsn, last_shard_id, and last_blob_id are not transactional. Say LSN 101 is a create-shard (0xAA) and LSN 102 is a put-blob (0xBB); if we read <LSN=100, 0xAA, 0xBB>, that will cause some false positives of missing shard/missing PG. If we do a spot check, it is not a big deal. A 60%-confidence thought is to read the LSN last, which ensures the scrub LSN covers last_shard_id and last_blob_id, making it <LSN=102, 0xAA, 0xBB>. More careful thinking is needed to bump up the confidence level.

@JacksonYao287
Collaborator Author

1. In Step 1, when querying the PG btree, assuming all the shards are big shards, consider the lock contention with incoming IOs.

IIUC, when we do a range query, let's say the range covers 2 btree nodes: when we scan the second node, [we will not lock the first node](https://github.com/eBay/HomeStore/blob/860330af43973ff99a862f6da17329de045dedc1/src/include/homestore/btree/detail/btree_query_impl.ipp#L33)?
I need to confirm this by checking the code carefully. Thanks for putting the point here; I will check it later since the LLM is not available now (504 error), sigh...

How the shard metablk crc works

The key point for the PG/shard CRC is which fields we will use to calculate the metablk CRC, since some fields will probably differ among replicas. We can discuss it later.

lsn, last_shard_id, last_blobid is not transactional.

pls see
https://github.com/JacksonYao287/HomeObject/blob/40463ac34020dd3f06cf26c8bb80d2dbfe77e9d6/src/lib/homestore_backend/scrub_manager.cpp#L792

and

https://github.com/JacksonYao287/HomeObject/blob/40463ac34020dd3f06cf26c8bb80d2dbfe77e9d6/src/lib/homestore_backend/scrub_manager.cpp#L858

In my implementation, we get last_shard_id and last_blob_id before getting scrub_lsn. This ensures that when committing up to scrub_lsn, a replica will have seen at least last_shard_id and last_blob_id. This seems to be basically the same point as you made here.
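The ordering argument can be illustrated with a toy sketch (hypothetical Store API, not the actual scrub_manager code):

```python
# Reading last_shard_id/last_blob_id BEFORE scrub_lsn guarantees that any
# replica that has applied up to scrub_lsn has also seen those ids, so the
# scrub cannot report them as missing (no false positives from the
# non-transactional reads).

class Store:
    def __init__(self):
        self.lsn = 0
        self.last_shard_id = 0
        self.last_blob_id = 0

    def apply(self, op: str, value: int):
        # Each applied log entry bumps the LSN and updates the ids.
        self.lsn += 1
        if op == "create_shard":
            self.last_shard_id = value
        elif op == "put_blob":
            self.last_blob_id = value

def snapshot_for_scrub(store: Store):
    # Order matters: ids first, LSN last.
    shard_id = store.last_shard_id
    blob_id = store.last_blob_id
    scrub_lsn = store.lsn
    return scrub_lsn, shard_id, blob_id

store = Store()
store.apply("create_shard", 0xAA)   # LSN 1
store.apply("put_blob", 0xBB)       # LSN 2
lsn, sid, bid = snapshot_for_scrub(store)
assert lsn >= 2 and sid == 0xAA and bid == 0xBB
```

With the reversed order (LSN first), a concurrent apply between the two reads could yield ids newer than the LSN, which is exactly the <LSN=100, 0xAA, 0xBB> case above.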

@xiaoxichen
Collaborator

Makes sense; just put them into the doc as well.
