Skip to content

feat(pb): actors v3#4463

Draft
MasterPtato wants to merge 1 commit into03-23-fix_ups_implement_queue_subsfrom
03-20-feat_pb_actors_v3
Draft

feat(pb): actors v3#4463
MasterPtato wants to merge 1 commit into03-23-fix_ups_implement_queue_subsfrom
03-20-feat_pb_actors_v3

Conversation

@MasterPtato
Copy link
Contributor

Description

Please include a summary of the changes and the related issue. Please also include relevant motivation and context.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@railway-app
Copy link

railway-app bot commented Mar 21, 2026

🚅 Deployed to the rivet-pr-4463 environment in rivet-frontend

Service Status Web Updated (UTC)
website 😴 Sleeping (View Logs) Web Mar 26, 2026 at 1:22 am
frontend-inspector ❌ Build Failed (View Logs) Web Mar 26, 2026 at 1:19 am
frontend-cloud ❌ Build Failed (View Logs) Web Mar 26, 2026 at 1:19 am
mcp-hub ✅ Success (View Logs) Web Mar 21, 2026 at 1:56 am
ladle ❌ Build Failed (View Logs) Web Mar 21, 2026 at 1:56 am

Copy link
Contributor Author

MasterPtato commented Mar 21, 2026

@pkg-pr-new
Copy link

pkg-pr-new bot commented Mar 21, 2026

More templates

@rivetkit/cloudflare-workers

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/cloudflare-workers@4463

@rivetkit/framework-base

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/framework-base@4463

@rivetkit/next-js

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/next-js@4463

@rivetkit/react

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/react@4463

rivetkit

pnpm add https://pkg.pr.new/rivet-dev/rivet/rivetkit@4463

@rivetkit/sql-loader

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/sql-loader@4463

@rivetkit/sqlite-vfs

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/sqlite-vfs@4463

@rivetkit/traces

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/traces@4463

@rivetkit/workflow-engine

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/workflow-engine@4463

@rivetkit/virtual-websocket

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/virtual-websocket@4463

@rivetkit/engine-runner

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/engine-runner@4463

@rivetkit/engine-runner-protocol

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/engine-runner-protocol@4463

commit: 9d9049a

@claude
Copy link

claude bot commented Mar 21, 2026

PR Review: feat(pb): actors v3

This is a large, ambitious draft PR introducing actor v3 infrastructure. Since it's a draft, I'm flagging issues to address before it's ready for merge.


Critical Issues (must fix before merge)

1. Package name mismatch in pegboard-envoy/Cargo.toml
The package is named pegboard-runner but lives in the pegboard-envoy directory. routing/envoy.rs references PegboardEnvoyWsCustomServe, but lib.rs defines PegboardRunnerWsCustomServe. Pick one name and be consistent — this will cause build/dependency resolution issues.

2. Literal TODO placeholders that won't compile
Several TODO tokens appear as literal struct field values that are not valid Rust:

  • ctx.msg(Allocate { TODO }) in actor2/mod.rs
  • EnvoyStartFailed { TODO: TODO } in the ActorError enum

These will fail to compile and need to be resolved before this can be tested.

3. Undefined type UpdateStateAndDbOutput
update_state_and_db in actor2/mod.rs declares Result<UpdateStateAndDbOutput> as its return type, but this struct is never defined in the diff. Either define it or change the return type.

4. Missing ActorError::EnvoyConnectionLost variant
is_envoy_failure() and error-building code reference ActorError::EnvoyConnectionLost, but it does not appear in the ActorError enum declaration. This will be a compile error.

5. Wrong struct name in runtime.rs
Envoy::new appears to return a LifecycleRunnerState { ... } literal, which looks like a stale copy-paste from the v1 runner. It should return an Envoy { ... } struct.

6. Mismatched input type: SendMessagesToEnvoyInput vs SendMessagesToRunnerInput
mod.rs calls runtime::SendMessagesToEnvoyInput, but runtime.rs defines SendMessagesToRunnerInput. Code also references input.runner_id which is not a field on that struct (only envoy_key and messages).


Dependency Issues

7. hyper pinned outside workspace in both new Cargo.toml files
Both pegboard-envoy/Cargo.toml and pegboard-gateway2/Cargo.toml pin hyper = "1.6" with a TODO comment about the workspace version mismatch. Fix the workspace dependency rather than bypassing it.


Code Quality

8. Commented-out dead code
shared_state.rs has wrapping_lt commented out at the bottom. Remove it or use it.

9. Metrics label change is a breaking change
pegboard/src/metrics.rs changes ACTOR_ALLOCATE_DURATION labels from ["status"] to ["kind", "result"]. This is a breaking change for existing dashboards/alerts monitoring this metric — worth calling out in the PR description.

10. Duplicated lost_timeout_ts computation
In the GoingAway signal handler in actor2/mod.rs, lost_timeout_ts is computed twice identically. Verify there isn't a scoping error here.


Design / Architecture Questions

11. Naming: envoy vs runner
The PR introduces pegboard-envoy and renames the protocol in v1.bare (ToServer/ToClient to ToRivet/ToEnvoy), but many internal struct and package names still say "runner". Settle on one canonical term before merging.

12. Epoxy key reservation failure modes
The new keys.rs Epoxy-based key reservation is a significant new cross-DC consensus path. A comment explaining what happens if the reservation is committed in Epoxy but the UDB reserve_actor_key activity fails afterward would help reviewers and future maintainers.

13. Unresolved TODO comments in production paths

  • // TODO: Don't return actual error? appears multiple times in ws_to_tunnel_task.rs
  • // TODO: Add queue and bg thread for processing kv ops suggests the KV path may have ordering/concurrency issues under load

These should be resolved or tracked as follow-up issues before merge.


Minor

  • pegboard/src/workflows/actor/metrics.rs: the renamed function pegboard_actor2_metrics lives in the v1 actor module. If intentional, add a comment explaining why. If not, it likely belongs in actor2.
  • The typo fix (an serverless to a serverless) in actor/runtime.rs is unrelated to the feature and could be a separate commit.

Overall this is a substantial architectural improvement with sleep/wake support, the envoy protocol rename, and Epoxy-based key reservation. The main blockers are compile errors from placeholder TODO tokens and naming inconsistencies. Happy to re-review once those are addressed.

@claude
Copy link

claude bot commented Mar 21, 2026

Design / Logic Issues

5. Double-counted eviction metrics
Both tunnel_to_ws_task and ws_to_tunnel_task subscribe to separate eviction subjects and each increments EVICTION_TOTAL. When a runner is evicted, both tasks fire concurrently, doubling the counter per eviction.

6. Silent JSON deserialization errors in conn.rs
Actor metadata falls back to an empty map on parse failure via unwrap_or_default(). This silently swallows invalid data. At minimum, log a warning when falling back.

7. Wrong field mapping in Mk1 KV metadata
In the ws_to_tunnel_task.rs Mk1 handler, create_ts: x.update_ts looks like the wrong field is being mapped. It should likely be x.create_ts.

8. Unverified abort safety in demuxer GC
In actor_event_demuxer.rs, a TODO: Verify aborting is safe here precedes channel.handle.abort(). Aborting mid-dispatch could silently drop in-flight events. Confirm safety or use a graceful shutdown channel like shutdown() uses.

@claude
Copy link

claude bot commented Mar 21, 2026

Code Quality

9. Significant KV code duplication
handle_message_mk1 and handle_message_mk2 in ws_to_tunnel_task.rs share nearly identical KV operation handling (~400 lines), repeated for each operation type. Consider extracting a shared dispatcher that converts protocol versions once and delegates to common logic.

10. Unimplemented module
engine/packages/pegboard/src/workflows/actor2/allocator.rs is an empty file. Either add a placeholder comment or remove it until it has content.

11. TODOs indicating known gaps
The queue/bg-thread TODO means KV ops run inline in the WebSocket message loop, which could block message handling under load. The do-not-return-actual-error TODO means internal error details are exposed directly to runners in KV responses.

Minor / Style

12. Workflow rename is a breaking change
In engine/packages/pegboard/src/workflows/actor/metrics.rs, renaming from pegboard_actor_metrics to pegboard_actor2_metrics affects in-flight executions. Confirm this is intentional and that any migration path has been handled.

@claude
Copy link

claude bot commented Mar 21, 2026

PR Review: feat(pb): actors v3 (#4463)

State: Draft | Author: MasterPtato | Scale: ~11,190 additions / 118 deletions across 65 files

Summary

This PR introduces Actors V3 replacing the runner-based protocol with an envoy-based protocol, with three new packages (pegboard-envoy, pegboard-gateway2, pegboard-outbound), a new actor2 workflow, and a new BARE wire schema (engine/sdks/schemas/envoy-protocol/v1.bare). Dual-version routing in pegboard_gateway.rs dispatches to v1 or v2 based on a version field in UDB.


Compile Errors (blocking)

1. eviction_topic2 undefined — pegboard-envoy/src/lib.rs
Never defined in the tracing debug call. Leftover from a refactor.

2. init_packet undefined — pegboard-envoy/src/conn.rs
The deserialized result binds to init but the pattern match and debug log reference the nonexistent init_packet.

3. namespace_id undefined in error context — pegboard-envoy/src/conn.rs
Not in scope at the .with_context(...) call site; should be namespace.namespace_id.

4. PegboardEnvoyWsCustomServe not defined — guard/src/routing/envoy.rs
The public struct in pegboard-envoy is PegboardEnvoyWs, not PegboardEnvoyWsCustomServe.

5. protocol::mk2 module does not exist — pegboard-gateway2/src/ws_to_tunnel_task.rs
References to protocol::mk2::RequestId and related types. The mk2 submodule does not exist in rivet-envoy-protocol.

6. TODO; is not valid Rust syntax (two locations)

  • pegboard-envoy/src/ws_to_tunnel_task.rs inside the ToRivetStopping match arm
  • pegboard-outbound/src/lib.rs inside the RunnerConfigKind::Normal branch

These need to be todo!() or replaced with actual implementations.


Incomplete / Stub Implementation

7. Serverful actor path is unimplemented — pegboard-outbound/src/lib.rs
The RunnerConfigKind::Normal branch has two TODO; stubs for "Choose envoy from lb" and "Insert command to fdb". Serverful actors using actor2 will reach Allocated state then silently fail.

8. ToRivetStopping graceful drain is unimplemented — pegboard-envoy/src/ws_to_tunnel_task.rs
Envoy shutdowns will not trigger GoingAway signals on actors, breaking graceful restarts.

9. KV ops block the WebSocket receive loop — pegboard-envoy/src/ws_to_tunnel_task.rs
A TODO comment acknowledges KV operations are handled inline and will block message processing under load.


Logic / Design Issues

10. Wrong workflow subscribed in gateway2 tunnel_to_ws_task.rs
Subscribes to pegboard::workflows::actor::Stopped (v1) instead of actor2::Stopped. For v2 actors this will never fire, causing the gateway to not notice when a v2 actor stops.

11. Wake retry count reduced without explanation
In guard/src/routing/pegboard_gateway.rs, the v1 wake retry limit was reduced from 16 to 8 with no comment.

12. last_rtt loaded but never passed to update_ping op — pegboard-envoy/src/ping_task.rs
The RTT value is read but never passed to the update_ping op, so RTT is never persisted.

13. Metrics argument order inconsistency — pegboard-gateway2
record_req_metrics is called with arguments in different orders between lib.rs and metrics_task.rs, causing mismatched metrics tags.

14. Two-phase GC silent drop window — pegboard-gateway2/src/shared_state.rs
Between phase 1 (marking stopping = true) and phase 2 (removing closed entries), messages arriving at a stopping entry will be silently dropped. Should be documented.


Security Concerns

15. Auth is optional for envoy connections — guard/src/routing/envoy.rs
Auth is only checked if let Some(auth) = &ctx.config().auth. If auth is not configured, any client can connect as an envoy and inject actor events/commands. Should be documented as a required production deployment setting.

16. No limit on actor_names from envoy init — pegboard-envoy/src/conn.rs
serde_json::from_str(...).unwrap_or_default() on untrusted envoy metadata with no size limit could cause unbounded UDB writes.


Code Quality

17. Non-workspace hyper / hyper-util versions
Both pegboard-envoy/Cargo.toml and pegboard-gateway2/Cargo.toml hardcode versions with a comment "Idk how to get this working with the workspace version". Should be resolved before merging.

18. Unused variable existing_envoypegboard-envoy/src/conn.rs
The result of the UDB lookup is bound but never used.

19. Comment style violations (CLAUDE.md)
Several new comments use dash-separated fragment style that the project conventions prohibit.

20. PegboardEnvoyWs::new unnecessary intermediate variable
The constructor binds service then immediately returns it; can be inlined directly.


Test Coverage

No new tests included. Key areas needing coverage before merge:

  • actor2 workflow state machine transitions
  • Envoy connection lifecycle (connect, eviction, ping timeout, going away)
  • Gateway2 HTTP/WebSocket routing including hibernation resume
  • pegboard-outbound for serverless actors
  • Protocol versioning / backward compatibility

Verdict

Correctly marked as draft. At least 5 compile errors (#1-#5) and 2 invalid Rust syntax uses (#6) prevent the code from building. The serverful actor path (#7) and graceful shutdown path (#8) are explicitly unimplemented stubs.

The overall architecture is well-considered: the envoy protocol design, dual-version routing with version: u32 backward compatibility in UDB, the hibernation state machine, and the two-phase GC for in-flight requests are all sound. The main work needed is completing the implementation, fixing compile errors, and adding tests.


Reviewed by Claude Sonnet 4.6

@MasterPtato MasterPtato changed the base branch from 03-19-feat_cache_add_in_flight_deduping to graphite-base/4463 March 24, 2026 00:29
@MasterPtato MasterPtato force-pushed the 03-20-feat_pb_actors_v3 branch from 3f2996f to 9d9049a Compare March 24, 2026 00:30
@MasterPtato MasterPtato mentioned this pull request Mar 24, 2026
11 tasks
@MasterPtato MasterPtato force-pushed the 03-20-feat_pb_actors_v3 branch from 9d9049a to 7da9651 Compare March 24, 2026 00:36
@NathanFlurry NathanFlurry changed the title feat(pb): actors v3 feat(pb): rewrite serverless handlers Mar 24, 2026
@MasterPtato MasterPtato force-pushed the 03-20-feat_pb_actors_v3 branch from 7da9651 to dc6bd10 Compare March 25, 2026 00:05
@MasterPtato MasterPtato force-pushed the 03-20-feat_pb_actors_v3 branch from dc6bd10 to 8a1d9d6 Compare March 25, 2026 01:57
@MasterPtato MasterPtato changed the base branch from graphite-base/4463 to 03-23-fix_ups_implement_queue_subs March 25, 2026 01:57
@MasterPtato MasterPtato force-pushed the 03-23-fix_ups_implement_queue_subs branch from 4e8c22e to e5b3f53 Compare March 26, 2026 01:18
@MasterPtato MasterPtato force-pushed the 03-20-feat_pb_actors_v3 branch from 8a1d9d6 to 703e3b6 Compare March 26, 2026 01:18
@MasterPtato MasterPtato force-pushed the 03-20-feat_pb_actors_v3 branch from 703e3b6 to 446bf80 Compare March 26, 2026 01:23
@MasterPtato MasterPtato changed the title feat(pb): rewrite serverless handlers feat(pb): actors v3 Mar 26, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant