feat: persist element count in JSON serialization #31
Draft
the siphash keys were hardcoded as 0xdeadbeaf and 0xfaebdaed. anyone can read these from the source and craft inputs that hash to the same bit positions, filling the filter faster and raising false positives.

add `NewWithKeys(k0, k1, ...)` so callers can supply their own random keys (e.g. generated once per node). this restores the collision resistance that siphash is designed to provide.

- sipHash.go: extract siphash constants and default keys, read k0/k1 from the Bloom struct instead of using hardcoded values
- bbloom.go: add k0/k1 fields, add NewWithKeys constructor, persist custom keys in JSON (omitted when using defaults)
- bbloom_test.go: tests for custom keys, JSON round-trip with custom keys, default keys omitted from JSON
- doc.go: mention NewWithKeys for untrusted data
add `Elements *uint64` to the JSON export struct so that `ElementsAdded()` survives `JSONMarshal`/`JSONUnmarshal` round-trips. this enables callers to size a replacement bloom filter from a previously persisted one without re-counting.

- bbloom.go: add Elements field to bloomJSONImExport, store in marshal(), restore in JSONUnmarshal() (nil = old format, *0 = empty)
- bbloom.go: clarify ElementsAdded godoc re Add vs AddIfNotHas
- bbloom_test.go: round-trip test confirming count survives
- bbloom_test.go: backward compat test with old JSON (no Elements)
On second thought, parking as draft.
# Conflicts:
#	bbloom.go
#	bbloom_test.go
Important
merge #30 first and rebase
add `Elements *uint64` to the JSON export struct so that `ElementsAdded()` survives `JSONMarshal`/`JSONUnmarshal` round-trips. this enables callers to size a replacement bloom filter from a previously persisted one without re-counting.

Why
Prerequisite for the provide system improvements I'm working on.
Enables smarter filter size estimation based on previous runs of a cyclical process, such as the provide sweep in Kubo. In simple terms, it lets us skip enumerating the entire datastore just to learn a sensible filter size: run the first pass as best-effort, then learn the count from that. This is especially efficient when a provide strategy other than "all" is used (reducing memory requirements for pins+mfs etc.).
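To make the sizing idea concrete, here is a rough sketch of how a caller might turn a persisted count into parameters for the next filter, using the textbook Bloom filter formulas m = -n·ln(p)/(ln 2)² and k = (m/n)·ln 2. `nextRunEntries`, `bloomParams`, and the 1.25 headroom factor are hypothetical and not part of bbloom or Kubo; the library's own constructor may round or size differently.

```go
package main

import (
	"fmt"
	"math"
)

// nextRunEntries scales the count learned from the previous run by a
// headroom factor, so moderate growth does not overfill the new filter.
// The factor is an assumption chosen by the caller.
func nextRunEntries(prevCount uint64, headroom float64) uint64 {
	return uint64(math.Ceil(float64(prevCount) * headroom))
}

// bloomParams returns the textbook number of bits (m) and hash
// functions (k) for n expected entries at false-positive rate p,
// where 0 < p < 1.
func bloomParams(n uint64, p float64) (bits, hashes uint64) {
	if n == 0 {
		n = 1 // avoid dividing by zero for an empty previous run
	}
	m := math.Ceil(-float64(n) * math.Log(p) / (math.Ln2 * math.Ln2))
	k := math.Round(m / float64(n) * math.Ln2)
	if k < 1 {
		k = 1
	}
	return uint64(m), uint64(k)
}

func main() {
	// prevCount would come from ElementsAdded() on the filter restored
	// from JSON; it is hardcoded here to keep the sketch standalone.
	prevCount := uint64(1_000_000)
	n := nextRunEntries(prevCount, 1.25)
	bits, hashes := bloomParams(n, 0.001)
	fmt.Printf("entries=%d bits=%d hashes=%d\n", n, bits, hashes)
}
```

The first run still has to guess a size (hence best-effort), but every later cycle can start from the previous count instead of walking the datastore.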
Changes