Skip to content

Implement the build side of spec shaking#2353

Open
leighmcculloch wants to merge 39 commits intomainfrom
spec-clean
Open

Implement the build side of spec shaking#2353
leighmcculloch wants to merge 39 commits intomainfrom
spec-clean

Conversation

@leighmcculloch
Copy link
Member

@leighmcculloch leighmcculloch commented Jan 8, 2026

What

Filter unused type and event definitions from the contractspecv0 WASM section during stellar contract build using markers embedded by the SDK.

Changes:

  • Add replace_custom_section() in new soroban_spec_tools::wasm module for WASM custom section replacement
  • Integrate spec shaking into the build process using soroban_spec::shaking from the SDK
  • Gate filtering on the rssdkfeat=experimental_spec_shaking_v2 contract meta entry
  • Set SOROBAN_SDK_BUILD_SYSTEM_SUPPORTS_SPEC_SHAKING_V2 env var during builds to signal SDK support
  • Add comprehensive integration tests and test fixture

Why

Contract WASM files can contain unused type and event definitions in the contractspecv0 custom section that inflate binary size unnecessarily.

The problem: Any exported contracttypes or contractevents defined in the SDK or any library that don't get used in the importing contract still end up in the contract spec. This results in awkward behaviour where we avoid exporting contracttypes in the SDK, even though doing so would be convenient and enable using contracttypes more frequently there.

Why the SDK can't fix this alone: Because of how proc-macros run in isolation and cannot coordinate in WASM builds in Rust (the ctor crate cannot be used like we use it in non-wasm builds in tests), there's nothing the soroban-sdk can do to identify whether a contracttype that sometimes needs to be exported hasn't been used and can be excluded.

Solution: The SDK now embeds markers in the WASM data section for each type/event that is actually used. These markers survive dead code elimination only if the corresponding code is reachable. The CLI extracts these markers and filters the spec accordingly.

Examples of unused entries that get filtered:

  • Error enums from imported libraries that are never used by any function
  • Types that are defined but never referenced in reachable code
  • Events that are defined but never published
  • Transitive type dependencies that no longer apply after refactoring

Close #2030
Close stellar/rs-soroban-sdk#1569
Close stellar/rs-soroban-sdk#1330

How

The filtering uses markers embedded by the SDK (via soroban_spec::shaking):

Marker format (14 bytes):

  • Bytes 0-5: SpEcV1 magic prefix
  • Bytes 6-13: first 64 bits of SHA256 hash of the spec entry XDR bytes

Filtering process:

  1. After cargo build completes, parse the built WASM
  2. Check contract meta for rssdkfeat=experimental_spec_shaking_v2 — if absent, skip filtering (SDK does not support spec shaking)
  3. Scan the WASM data section for SpEcV1 markers via soroban_spec::shaking::find_all()
  4. Filter spec entries via soroban_spec::shaking::filter():
    • Functions: always kept (they define the contract's API)
    • Types/events: kept only if a matching marker exists in the data section
  5. Replace the contractspecv0 custom section with the filtered version

Why this works: The SDK's proc-macros generate code that references a static marker when a type/event is used. If that code path survives dead code elimination (DCE), the marker remains in the data section. If the type/event is never used, DCE removes the code and the marker disappears.

Build integration: The CLI sets SOROBAN_SDK_BUILD_SYSTEM_SUPPORTS_SPEC_SHAKING_V2=1 during builds to inform the SDK that the build system supports marker-based filtering.

WASM manipulation: The new soroban_spec_tools::wasm::replace_custom_section() function rebuilds the WASM, copying all sections verbatim except the target custom section, which is replaced with new content at the end. If multiple custom sections share the same name, all are consolidated into one.

SDK Support Required

This feature requires SDK support for embedding markers. See: stellar/rs-soroban-sdk#1571

Try It Out

Install the modified stellar-cli that strips unused specs:

cargo install --locked --git https://github.com/stellar/stellar-cli --branch spec-clean stellar-cli

Import the modified soroban-sdk that marks used specs:

[dependencies]
soroban-sdk = { git = "https://github.com/stellar/rs-soroban-sdk", branch = "spec-markers", features = ["experimental_spec_shaking_v2"] }

[dev-dependencies]
soroban-sdk = { git = "https://github.com/stellar/rs-soroban-sdk", branch = "spec-markers", features = ["testutils", "experimental_spec_shaking_v2"] }

Build using the stellar-cli:

stellar contract build

Inspect the contract interface and it should only show types actually used at the boundary of the contract (inputs, outputs, and events):

stellar contract info interface --wasm target/wasm32v1-none/release/contract.wasm

@github-project-automation github-project-automation bot moved this to Backlog (Not Ready) in DevX Jan 8, 2026
@leighmcculloch leighmcculloch requested a review from Copilot January 8, 2026 15:05
@socket-security
Copy link

socket-security bot commented Jan 8, 2026

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds automatic filtering of unused types from contract specifications during the build process. It implements reachability analysis using fixed-point iteration to identify and preserve only types that are transitively referenced by contract functions, while always preserving functions and events. The implementation replaces wasm-gen with wasm-encoder for more flexible WASM manipulation capabilities.

Key changes:

  • Implements a new filter module with reachability analysis to detect unused UDTs
  • Refactors WASM manipulation to use wasm-encoder instead of wasm-gen, enabling section replacement
  • Integrates filtering into the contract build pipeline after metadata injection

Reviewed changes

Copilot reviewed 6 out of 7 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
cmd/soroban-cli/src/commands/contract/build.rs Adds filter_spec call to build pipeline; refactors inject_meta to use wasm-encoder; adds new error types
cmd/soroban-cli/Cargo.toml Replaces wasm-gen dependency with wasm-encoder 0.235.0
cmd/crates/soroban-spec-tools/src/lib.rs Exports new filter module
cmd/crates/soroban-spec-tools/src/filter.rs Implements reachability analysis and filtering logic with comprehensive unit tests
cmd/crates/soroban-spec-tools/src/contract.rs Adds filter_unused_types, filtered_spec_xdr methods and replace_custom_section utility; includes tests
cmd/crates/soroban-spec-tools/Cargo.toml Adds wasm-encoder 0.235.0 dependency
Cargo.lock Updates dependencies with wasm-encoder and wasmparser 0.235.0; removes obsolete wasm-gen dependencies

@leighmcculloch leighmcculloch changed the title Strip unused types from contract spec during build Strip unused types/events/errors from contract spec during build Jan 9, 2026
@leighmcculloch leighmcculloch changed the title Strip unused types/events/errors from contract spec during build Add post-build stripping of unused types/events/errors from contract spec Jan 9, 2026
Copy link
Contributor

@mootz12 mootz12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Only left nitpicks.

@leighmcculloch leighmcculloch changed the title Add post-build stripping of unused types/events/errors from contract spec Implement the build side of spec shaking using dead code elimination and data section of wasm Mar 3, 2026
@leighmcculloch leighmcculloch changed the title Implement the build side of spec shaking using dead code elimination and data section of wasm Implement the build side of spec shaking Mar 3, 2026
@leighmcculloch leighmcculloch marked this pull request as ready for review March 3, 2026 11:42
@leighmcculloch leighmcculloch requested a review from a team as a code owner March 3, 2026 11:42
Copy link
Contributor

@mootz12 mootz12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Left a couple of pedantic comments inline.

Leaving review as comment for now - will revise after doing some manual testing.

@leighmcculloch
Copy link
Member Author

leighmcculloch commented Mar 4, 2026

Opened a couple related issues as a result of this work:

I've opened PRs for both, if those PRs merged before this one, this change will benefit from them.

leighmcculloch added a commit that referenced this pull request Mar 5, 2026
### What
Add `Spec::verify()` to `soroban-spec-tools` that checks every UDT
referenced in function signatures, event params, and type definitions is
defined within the spec. Integrate the check into `contract build` to
emit warnings after a successful build. Add a hidden `contract
spec-verify` subcommand for standalone verification of a WASM file.

### Why
Contracts can compile and deploy with incomplete specs when referenced
types are missing — for example, due to spec shaking in the SDK.
Downstream tooling such as bindings generators and explorers silently
fail when types are unresolvable. Surfacing this at build time gives
developers an early, actionable signal.

This will also act as a backup safety to help alert us and developers if
changes we make and experiment with wrt spec shaking in #2353 cause
specs to disappear that were otherwise needed.

Close #2425
github-merge-queue bot pushed a commit to stellar/rs-soroban-sdk that referenced this pull request Mar 5, 2026
…f wasm (#1672)

### What

Implement spec shaking using dead code elimination and data section of
wasm. Embed markers in the WASM data section for each type/event that is
used at external boundaries. These markers enable post-build tools to
filter unused entries from the `contractspecv0` section.

Also, all contracttypes, contracterrors, and contractevents are exported
in contract specs now if they are directly or indirectly referenced by a
function parameter, return value, event.

### Why

Contract WASM files can contain unused type and event definitions in the
`contractspecv0` custom section that inflate binary size unnecessarily.

No shaking of specs occurrs today.

Because of this contract imports never export types otherwise importing
one contract would export all its types.

Because of how proc-macros run in isolation and cannot coordinate in
WASM builds in Rust, there's nothing the soroban-sdk can do to identify
whether a contracttype that sometimes needs to be exported hasn't been
used and can be excluded.

The macros can however embed markers in the regular data section for
each type/event when it's used. These markers survive DCE only if the
corresponding code is reachable. A post-build tool (stellar-cli) can
extract these markers and filter the spec accordingly.

Close #1569
Close stellar/stellar-cli#2030

### Side-effects

A side-effect of this change, is that types/events brought into a
contract from a `contractimport` can now be automatically exported from
that contract if they use the types at their boundary. This is why the
change also removes the `export = false` from types/events that come
from a `contractimport`.

Close #1330

### How

The SDK embeds markers using volatile reads in conversion/publish
functions:

**Marker format (12 bytes):**
- Bytes 0-5: `SpEcV1` magic prefix
- Bytes 6-13: 64-bit truncated SHA256 hash of the spec entry XDR bytes

**Embedding mechanism:**
```rust
impl SpecShakingMarker for MyType {
    #[inline(always)]
    fn spec_shaking_marker() {
        #[cfg(target_family = "wasm")]
        {
            static MARKER: [u8; 14] = *b"SpEcV1\x93\xd5\x05\x04\x36\x68\x40\xd2";
            let _ = unsafe { ::core::ptr::read_volatile(MARKER.as_ptr()) };
        }
    }
}
```

**When markers are included:**
- Types: `spec_shaking_marker()` is called from `TryFromVal`
implementations via `IncludeSpec` trait
- Events: `spec_shaking_marker()` is called from the `publish()` method

**Why this works:** If a type/event is never used at an external
boundary, its `__include_spec_marker()` function is never called, and
DCE removes both the function and its marker. If the type is used, the
marker survives in the data section.

**Type identities:** Note that a long time problem with doing any sort
of work like this has been identifying the identity of one spec type vs
another. That is improved and worked around in this change by using a
hash of the spec entry rather than the name of the spec entry.

### CLI Support Required

This feature requires CLI support for filtering specs using these
markers. See: stellar/stellar-cli#2353

### Try It Out

Install the modified `stellar-cli` that strips unused specs:
```
cargo install --locked --git https://github.com/stellar/stellar-cli --branch spec-clean stellar-cli
```

Import the modified `soroban-sdk` that marks used specs:
```
[dependencies]
soroban-sdk = { git = "https://github.com/stellar/rs-soroban-sdk", branch = "spec-markers", features = ["experimental_spec_shaking_v2"] }

[dev-dependencies]
soroban-sdk = { git = "https://github.com/stellar/rs-soroban-sdk", branch = "spec-markers", features = ["experimental_spec_shaking_v2", "testutils"] }"
```

Build using the stellar-cli:
```
stellar contract build
```

Inspect the contract interface and it should only show types actually
used at the boundary of the contract (inputs, outputs, and events):
```
stellar contract info interface --wasm target/wasm32v1-none/release/contract.wasm
```

### Todo
- [x] Figure out a roll out plan, because you can't really use this sdk
until you use the new cli, otherwise your specs will increase in size
rather than decrease because imported types are now exported then
stripped.
- [x] Evaluate the cost of the volatile read on a large contract (expect
impact is small)
- [x] Look for alternative ways other than a volatile read to force the
marker to stick around
   - [x] Try black box
   - [x] Try `#[used]`
- [x] Markers need to be included for any nested user defined types and
types inside containers
- [x] Come up with better name for feature
- [x] Come up with better name for env var (should it just be Stellar
CLI version?)
- [x] Write out every possible test case, with dimensions:
	- [x] type: struct, struct tuple, enum, enum int, error
- [x] containers: vec, map, option, result, tuple (0-12), struct, struct
tuple, enum
	- [x] mod: lib, sub
	- [x] visibility: pub, pub(crate), private
	- [x] imported: and not exported, and exported
- [x] context: contract fn param, non-contract fn param, contract fn
return, non-contract fn return, event topic field, event data field

---------

Co-authored-by: mootz12 <38118608+mootz12@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Backlog (Not Ready)

3 participants