feat: persist element count in JSON serialization#31

Draft
lidel wants to merge 3 commits into master from feat/persist-elements

lidel commented Mar 4, 2026

Important

merge #30 first and rebase

add Elements *uint64 to the JSON export struct so that ElementsAdded() survives JSONMarshal/JSONUnmarshal round-trips. this enables callers to size a replacement bloom filter from a previously persisted one without re-counting.

Why

Prerequisite for provide system improvements I'm working on.

Enables smarter filter size estimation based on previous runs or cyclical processes, such as the provide sweep in Kubo. In simple terms, it lets us skip enumerating the entire datastore to learn a sensible filter size: we run the first cycle as best-effort and learn the count from that. This is especially efficient when a provide strategy other than "all" is used (reducing memory requirements for pins+mfs etc).
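As a rough illustration of the sizing idea, here is a minimal sketch that turns a previously observed element count into filter parameters using the textbook Bloom filter formulas. The helper name `sizeFor`, the 10% headroom, and the 0.1% false-positive target are assumptions made for this example; they are not part of this PR, and the library's own constructor may round or size differently.

```go
package main

import (
	"fmt"
	"math"
)

// sizeFor is a hypothetical helper (not part of bbloom). It returns the
// number of bits m and hash functions k suggested by the standard formulas
//   m = -n * ln(p) / (ln 2)^2
//   k = (m / n) * ln 2
// for n expected elements at target false-positive rate p.
func sizeFor(n uint64, p float64) (m, k uint64) {
	if n == 0 {
		n = 1 // avoid a zero-sized filter when the previous run saw nothing
	}
	bits := math.Ceil(-float64(n) * math.Log(p) / (math.Ln2 * math.Ln2))
	hashes := math.Round(bits / float64(n) * math.Ln2)
	if hashes < 1 {
		hashes = 1
	}
	return uint64(bits), uint64(hashes)
}

func main() {
	// prev stands in for the count learned from the previous cycle,
	// e.g. the value reported by ElementsAdded() on the restored filter.
	prev := uint64(1_000_000)

	// Assumed policy: add 10% headroom for growth between cycles.
	target := prev + prev/10

	m, k := sizeFor(target, 0.001)
	fmt.Printf("elements=%d bits=%d hashes=%d\n", target, m, k)
}
```

The first cycle still has to guess a size; the gain is that every later cycle can start from a measured count instead of a datastore scan.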

Changes

  • bbloom.go: add Elements field to bloomJSONImExport, store in marshal(), restore in JSONUnmarshal() (nil = old format, *0 = empty)
  • bbloom.go: clarify ElementsAdded godoc re Add vs AddIfNotHas
  • bbloom_test.go: round-trip test confirming count survives
  • bbloom_test.go: backward compat test with old JSON (no Elements)

lidel added 2 commits March 4, 2026 19:14
the siphash keys were hardcoded as 0xdeadbeaf and 0xfaebdaed.
anyone can read these from the source and craft inputs that hash
to the same bit positions, filling the filter faster and raising
false positives.

add NewWithKeys(k0, k1, ...) so callers can supply their own
random keys (e.g. generated once per node). this restores the
collision resistance that siphash is designed to provide.
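For illustration, one way a caller might generate such per-node keys is to read 16 bytes from the OS CSPRNG and split them into two 64-bit values. This is a sketch only: `randomKeys` is a hypothetical helper, and the remaining arguments of `NewWithKeys` are not spelled out in this commit message, so the constructor call itself is left out.

```go
package main

import (
	"crypto/rand"
	"encoding/binary"
	"fmt"
)

// randomKeys is a hypothetical helper (not part of bbloom). It returns two
// 64-bit values drawn from the operating system's CSPRNG, suitable as
// secret siphash keys.
func randomKeys() (k0, k1 uint64, err error) {
	var buf [16]byte
	if _, err = rand.Read(buf[:]); err != nil {
		return 0, 0, err
	}
	k0 = binary.LittleEndian.Uint64(buf[:8])
	k1 = binary.LittleEndian.Uint64(buf[8:])
	return k0, k1, nil
}

func main() {
	k0, k1, err := randomKeys()
	if err != nil {
		panic(err)
	}
	// k0 and k1 would be passed as the first two arguments of NewWithKeys.
	// They must be stored alongside the node's state: a filter can only be
	// queried correctly with the keys it was built with.
	fmt.Printf("k0=%#x k1=%#x\n", k0, k1)
}
```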

- sipHash.go: extract siphash constants and default keys, read
  k0/k1 from the Bloom struct instead of using hardcoded values
- bbloom.go: add k0/k1 fields, add NewWithKeys constructor,
  persist custom keys in JSON (omitted when using defaults)
- bbloom_test.go: tests for custom keys, JSON round-trip with
  custom keys, default keys omitted from JSON
- doc.go: mention NewWithKeys for untrusted data

add `Elements *uint64` to the JSON export struct so that
`ElementsAdded()` survives `JSONMarshal`/`JSONUnmarshal` round-trips.
this enables callers to size a replacement bloom filter from a
previously persisted one without re-counting.

- bbloom.go: add Elements field to bloomJSONImExport, store in
  marshal(), restore in JSONUnmarshal() (nil = old format, *0 = empty)
- bbloom.go: clarify ElementsAdded godoc re Add vs AddIfNotHas
- bbloom_test.go: round-trip test confirming count survives
- bbloom_test.go: backward compat test with old JSON (no Elements)
@lidel lidel marked this pull request as draft March 4, 2026 23:18

lidel commented Mar 4, 2026

On second thought, parking this as a draft.
Callers could instead read ElementsAdded() and persist that value themselves. This change only makes sense if we end up persisting filters as JSON for anything (TBD).

Base automatically changed from feat/custom-siphash-keys to master March 13, 2026 13:49