-
Notifications
You must be signed in to change notification settings - Fork 23
Description
Background
SparseBytes is the data structure used for memory representation. Instead of storing a full byte array, memory is kept as a list of chunks: empty regions (#empty(N)) and concrete byte blocks (#bytes(Bs)). The memory always starts as a single #empty(N) chunk, where N is the size of the initial memory. When writes occur, empty regions are split and #bytes chunks are inserted in their place. This structure keeps large mostly-empty memories efficient, but relies on chunks remaining as coarse-grained as possible.
SparseBytes was introduced as an optimization for the MultiverseX semantics and Kasmer, where contracts often had very large memories with huge untouched regions. This representation significantly reduced configuration size and made symbolic execution feasible on those workloads.
Description
The following replaceAtZ rule in the SparseBytes implementation introduces unnecessary fragmentation.
wasm-semantics/pykwasm/src/pykwasm/kdist/wasm-semantics/wasm-data/sparse-bytes.k
Lines 165 to 169 in 91637f9
| ///////// len(Bs) = N | |
| rule replaceAtZ(N, REST, S, Bs) | |
| => SBChunk(#bytes(Bs)) REST | |
| requires 0 ==Int S | |
| andBool lengthBytes(Bs) ==Int N |
When writing to the end of an #empty chunk, the rule always creates a new #bytes chunk, even if the next chunk in REST is already a #bytes chunk. This leads to an explosion of adjacent small chunks under workloads with many small writes (e.g., copying an array from host to memory). This rule blindly inserts a new SBChunk(#bytes(Bs)) in front of REST. If REST begins with another #bytes chunk, the two byte blocks should logically be adjacent, but the rule keeps them separate. Over time this produces significant fragmentation in the SparseBytes list.
Proposed fix
Before creating a new chunk, check whether REST begins with a #bytes chunk. If it does, merge Bs with that chunk instead of prepending a new one. This keeps consecutive byte regions contiguous and prevents unnecessary list growth.