[codex] fix mergeable cid encoding #1002
Draft
zxch3n wants to merge 1 commit into
Summary
Replace the unpublished recursive hex mergeable container id payload with a flattened path encoding.
The previous design encoded mergeable child identity by recursively embedding `parent.to_bytes()` in the synthetic root name. When a mergeable map was nested under another mergeable map, that parent cid already contained its own parent payload, so the size grew roughly exponentially with nesting depth. This PR changes the synthetic root name to encode the nearest non-mergeable map parent once, followed by escaped map keys. Nested mergeable cid size is now linear in logical path length.
Design: Synthetic Root Container ID
A mergeable child container is still represented as a synthetic `ContainerID::Root` whose name is the reserved `🤝:` prefix followed by a payload.
The child container type is not encoded in `payload`. It is carried by `Root.container_type`, just like ordinary root containers. This means two mergeable cids with the same parent and key but different child types have the same root name string but remain different `ContainerID` values because `container_type` differs.
There is no version byte in the payload. This replaces an unpublished format, so no old-format decoder is kept.
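To make the point about `container_type` concrete, here is a minimal sketch using simplified stand-in types. The `ContainerType` and `ContainerID` definitions below are assumptions for illustration only (the real types in `loro-common` have more variants and are not reproduced here), and `mergeable_root` is a hypothetical helper.

```rust
// Simplified stand-ins for illustration; not the real loro-common definitions.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum ContainerType {
    Map,
    Text,
    List,
}

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum ContainerID {
    // Only the root variant is modeled here.
    Root {
        name: String,
        container_type: ContainerType,
    },
}

/// Hypothetical helper: build a synthetic root from an already-encoded name.
fn mergeable_root(name: &str, container_type: ContainerType) -> ContainerID {
    ContainerID::Root {
        name: name.to_string(),
        container_type,
    }
}
```

With these stand-ins, `mergeable_root("🤝:$state>body", ContainerType::Text)` and `mergeable_root("🤝:$state>body", ContainerType::List)` share the same name string but compare as unequal, because derived equality also compares `container_type`.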
Payload Grammar
The payload is a flattened map path: a `base-parent` followed by one or more `>`-separated escaped map keys.
`base-parent` is the nearest non-mergeable map ancestor:
- `$<escaped-root-name>` means the base parent is a root map.
- `@<peer-base36>:<counter-base36>` means the base parent is a normal op-created map.
Every segment after `base-parent` is one map key. Intermediate mergeable parents are always maps, so their type is omitted. The final container's type is `Root.container_type`.
Example: parsing `Root { name: "🤝:$state>note-1>body", container_type: Text }` returns the parent `Root { name: "🤝:$state>note-1", container_type: Map }`, the key `body`, and the child type `Text`.
Encoding Algorithm
`ContainerID::new_mergeable(parent, key, child_type)` does:
1. Assert `parent.container_type() == Map`. This is a hard release assertion, not only a debug assertion, because the payload omits the parent type.
2. Start `name` with `MERGEABLE_NAMESPACE_PREFIX`, currently `🤝:`.
3. If `parent` is a valid mergeable map root, reuse its existing payload without re-encoding the full parent cid.
4. Otherwise, if `parent` is a root map, append `$` plus the escaped root name.
5. If `parent` is a normal map, append `@`, the peer id in canonical lowercase base36, `:`, then the counter in canonical lowercase base36.
6. Append `>`.
7. Append the escaped `key`.
8. Return `ContainerID::Root { name, container_type: child_type }`.
This is the key property that prevents recursive growth: a nested mergeable parent contributes only its already-flattened payload, not serialized parent bytes.
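The steps above can be sketched as a standalone function that builds only the root name. This is an illustration under stated assumptions, not the PR's implementation: `MapParent`, `escape_segment`, `to_base36`, and `mergeable_name` are hypothetical names, the escape sequences (`\>`, `\\`, `\s`, `\0`) are invented because the PR text does not show the real ones, and "valid mergeable map root" is approximated by a prefix check rather than full grammar validation.

```rust
const PREFIX: &str = "🤝:";

/// Hypothetical view of a map parent; the Map-type assertion is assumed done.
enum MapParent {
    Root { name: String },
    Normal { peer: u64, counter: i32 },
}

/// Assumed escape table: only the set of escaped characters comes from the PR.
fn escape_segment(raw: &str) -> String {
    let mut out = String::with_capacity(raw.len());
    for ch in raw.chars() {
        match ch {
            '>' => out.push_str("\\>"),
            '\\' => out.push_str("\\\\"),
            '/' => out.push_str("\\s"), // assumed: keeps raw slash out of the name
            '\0' => out.push_str("\\0"), // assumed: keeps raw NUL out of the name
            other => out.push(other),
        }
    }
    out
}

/// Canonical lowercase base36 for an unsigned magnitude.
fn to_base36(mut n: u64) -> String {
    if n == 0 {
        return "0".to_string();
    }
    let mut digits = Vec::new();
    while n > 0 {
        digits.push(std::char::from_digit((n % 36) as u32, 36).unwrap());
        n /= 36;
    }
    digits.iter().rev().collect()
}

/// Build the synthetic root name for `key` under `parent`.
fn mergeable_name(parent: &MapParent, key: &str) -> String {
    let mut name = String::from(PREFIX);
    match parent {
        MapParent::Root { name: parent_name } => match parent_name.strip_prefix(PREFIX) {
            // Simplification: treat any prefixed root as a valid mergeable
            // map root and reuse its flattened payload as-is.
            Some(payload) => name.push_str(payload),
            None => {
                name.push('$');
                name.push_str(&escape_segment(parent_name));
            }
        },
        MapParent::Normal { peer, counter } => {
            name.push('@');
            name.push_str(&to_base36(*peer));
            name.push(':');
            if *counter < 0 {
                name.push('-');
            }
            name.push_str(&to_base36(counter.unsigned_abs() as u64));
        }
    }
    name.push('>');
    name.push_str(&escape_segment(key));
    name
}
```

Calling `mergeable_name` on a root map named `state` with key `note-1`, then again on the resulting name with key `body`, extends the name by one `>`-separated segment per level, which is the linear growth the design aims for.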
Escaping Rules
Segments are escaped before being placed in the synthetic root name:
- `>` is the only structural delimiter.
- `\` introduces an escape.
- `/` and NUL are escaped so synthetic root names keep the old safety property that raw slash and raw NUL do not appear in root names.
The parser rejects:
Base36 Rules
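Normal op-created map parents use canonical lowercase base36, in the form `@<peer-base36>:<counter-base36>`.
Rules:
- Digits are `0-9a-z` only.
- Zero is encoded as `0`.
- A negative counter is `-` followed by the positive magnitude.
- `-0` is rejected.
- A value that overflows the `u64` peer or `i32` counter rejects the payload.
Examples: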
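As one illustration, the rules above can be sketched as a pair of small parsers. The function names `parse_base36_u64` and `parse_base36_i32` are hypothetical, and rejecting leading zeros is an assumption drawn from the word "canonical" rather than something the PR text states explicitly.

```rust
/// Parse an unsigned canonical lowercase base36 number (for example a peer id).
fn parse_base36_u64(s: &str) -> Option<u64> {
    // Assumption: canonical form has no leading zeros, so "0" is the only
    // accepted string that starts with '0'.
    if s.is_empty() || (s.len() > 1 && s.starts_with('0')) {
        return None;
    }
    let mut value: u64 = 0;
    for b in s.bytes() {
        let digit = match b {
            b'0'..=b'9' => b - b'0',
            b'a'..=b'z' => b - b'a' + 10,
            _ => return None, // uppercase, sign, or any other byte
        };
        // Overflow of u64 rejects the input.
        value = value.checked_mul(36)?.checked_add(digit as u64)?;
    }
    Some(value)
}

/// Parse a signed canonical lowercase base36 number (for example a counter).
fn parse_base36_i32(s: &str) -> Option<i32> {
    match s.strip_prefix('-') {
        Some(magnitude) => {
            let m = parse_base36_u64(magnitude)?;
            if m == 0 {
                return None; // "-0" is rejected
            }
            let negated = -(i64::try_from(m).ok()?);
            i32::try_from(negated).ok() // out-of-range values are rejected
        }
        None => i32::try_from(parse_base36_u64(s)?).ok(),
    }
}
```

Under this sketch, `"10"` parses as 36 and `"-1"` as -1, while `"Z"`, `"-0"`, and `"zik0zk"` (one past `i32::MAX`) are rejected for a counter.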
Decoding Algorithm
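`ContainerID::parse_mergeable()` does:
1. Require `ContainerID::Root`.
2. Strip `MERGEABLE_NAMESPACE_PREFIX`; reject if absent.
3. Split the payload at the last unescaped `>`.
4. Unescape the final segment as `key`.
5. If the remaining parent payload still contains an unescaped `>`, the parent is a mergeable map root: `Root { name: "🤝:" + parent_payload, container_type: Map }`.
6. Otherwise, parse the base parent as a root map (`$...`) or normal map (`@peer:counter`).
7. Return `(parent, key, Root.container_type)`.
`ContainerID::is_mergeable()` uses the same structural validation. A root whose name merely starts with `🤝:` but does not match the grammar is treated as an ordinary root, not as a mergeable container.
Map Slot Marker Relationship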
This PR does not change the map slot marker format.
The parent `LoroMap` slot still stores a compact binary activation marker.
The marker binds `(parent container id, key, child type)` and controls visibility of the mergeable child under that map key. The synthetic root cid controls deterministic child identity. These two layers remain separate.
The only remaining `parent.to_bytes()` in this area is inside marker CRC input. That is intentional and is not part of synthetic root name encoding.
Compatibility
This changes an unpublished mergeable cid payload format before release. Existing public `setContainer` / regular child container identity is unchanged. Existing marker bytes are unchanged.
Old hex/recursive mergeable cid name decoding is intentionally not retained. User-created root names are still rejected if they start with the reserved `🤝:` namespace.
Tests Added / Updated
Coverage includes:
- Inputs containing `>`, `\`, `/`, NUL, and embedded `🤝:` substrings.
- Base36 edge cases, including `-0`.
- `new_mergeable`.
Validation
- `rustfmt --check crates/loro-common/src/lib.rs crates/loro-internal/tests/mergeable_cid_encoding.rs crates/loro-wasm/src/lib.rs crates/loro-internal/tests/mergeable_container/events_and_paths.rs`
- `cargo test -p loro-internal --test mergeable_cid_encoding -- --nocapture`
- `cargo test -p loro-common mergeable -- --nocapture`
- `cargo test -p loro-internal --test mergeable_container -- --nocapture`
- `cargo test -p loro --test mergeable_public_api -- --nocapture`
Note: full-file `rustfmt --check` on `crates/loro-internal/src/handler.rs` reports pre-existing unrelated formatting differences, so this PR avoids formatting that whole file.