Skip to content

feat(rivetkit): actor queue with wait-for-completion support#4088

Closed
NathanFlurry wants to merge 7 commits intomainfrom
queue-wait-complete
Closed

feat(rivetkit): actor queue with wait-for-completion support#4088
NathanFlurry wants to merge 7 commits intomainfrom
queue-wait-complete

Conversation

@NathanFlurry
Copy link
Member

  • Add pull-based durable queue for actors
  • Support fire-and-forget and wait-for-completion patterns
  • Add c.queue.next() for actors to receive messages
  • Add msg.complete() for manual completion with response
  • Add persistence & redelivery with exponential backoff
  • Add HTTP API for queue operations
  • Add docs page for queue messages

@railway-app railway-app bot temporarily deployed to rivet-frontend / rivet-pr-4088 January 31, 2026 03:29 Destroyed
@railway-app
Copy link

railway-app bot commented Jan 31, 2026

🚅 Deployed to the pr-8030e0-4088 environment in rivet-frontend

Service Status Web Updated (UTC)
frontend-cloud 😴 Sleeping (View Logs) Web Feb 1, 2026 at 7:09 am
frontend-inspector 😴 Sleeping (View Logs) Web Feb 1, 2026 at 7:03 am
website ❌ Build Failed (View Logs) Web Feb 1, 2026 at 6:56 am
mcp-hub ✅ Success (View Logs) Web Feb 1, 2026 at 6:54 am

Copy link
Member Author

NathanFlurry commented Jan 31, 2026

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Learn more


How to use the Graphite Merge Queue

Add the label merge-queue to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@claude
Copy link

claude bot commented Jan 31, 2026

Pull Request Review: Actor Queue with Wait-for-Completion Support

This PR adds a sophisticated pull-based durable queue system for Rivet Actors with wait-for-completion patterns. The implementation is generally solid with good test coverage. Below are my findings:


✅ Strengths

  1. Well-designed architecture - The pull-based queue with manual completion is a good pattern for durable message processing
  2. Comprehensive test coverage - Tests cover edge cases like double completion, pending message conflicts, and timeout scenarios
  3. Good error handling - Custom error types for queue-specific scenarios (QueueCompleteNotAllowed, QueueMessagePending, QueueAlreadyCompleted)
  4. Backward compatibility - Changes to schemas are additive with optional fields
  5. Documentation - Excellent new docs page at website/src/content/docs/actors/queue.mdx
  6. Redelivery mechanism - Exponential backoff with crash recovery is well-implemented

🔴 Critical Issues

1. Race Condition in Completion Waiter Resolution

Location: queue-manager.ts:413-426

Issue: The waitForCompletion function creates a promise but only resolves it when the message doesnt exist. When the message does exist, the promise is never resolved, causing it to hang indefinitely.

Fix: Add return promise; at the end of the waitForCompletion function.


2. Incomplete Cleanup of Redelivery Timeout

Location: queue-manager.ts:671-732

Issue: The recovery process marks messages as not in-flight and schedules them for redelivery, but theres no cleanup of the #redeliveryTimeout if the actor shuts down. This could lead to callbacks executing after disposal.

Recommendation: Add cleanup in the actor disposal/shutdown flow to clear #redeliveryTimeout and #pendingWarningHandle.


⚠️ Potential Issues

3. Missing Validation in HTTP API

Location: router-endpoints.ts:880-924

Issue: When wait: true but no timeout is provided, the request could hang indefinitely. The HTTP API should enforce a maximum timeout or provide a default.

Recommendation: Add a maximum timeout constant (e.g., 5 minutes) and enforce it in the HTTP handler.


4. Unbounded Retry Without Dead Letter Queue

Location: Documentation in queue.mdx:377-386

Issue: Messages that consistently fail will retry forever with exponential backoff capped at 5 minutes. This could fill the queue with poison messages.

Recommendation:

  • Add a configurable max retry count (e.g., 10 retries)
  • Move failed messages to a dead letter queue or mark them with a failed state
  • Allow actors to manually fail/reject messages with msg.fail()

5. Memory Leak Risk with Completion Waiters

Location: queue-manager.ts:72-77

Issue: If a message is manually deleted from storage (or corrupted), completion waiters may never be resolved and will accumulate in the map.

Recommendation: Add a background cleanup task or TTL to remove stale waiters after a reasonable period (e.g., 1 hour).


🟡 Code Quality & Style Issues

6. Inconsistent Error Handling Pattern

Location: queue.ts:40-43 and fixture tests

Tests use manual type assertion instead of proper error type checking. Consider importing and using proper error types.


7. Missing Type Safety in Client API

Location: queue.ts:53-62 and client/queue.ts:1012-1027

Issue: Return type is QueueSendResult | void which forces callers to check for undefined even when wait: true.

Recommendation: Use function overloads for better type safety.


🔍 Performance Considerations

8. Inefficient Queue Scanning

Location: queue-manager.ts:470-507

For large queues, loading all messages and filtering in memory is inefficient. Consider indexing messages by name for faster lookups or document queue size limitations.


🧪 Testing

9. Missing Test Cases

Consider adding tests for:

  • Actor crash recovery with in-flight messages (integration test)
  • Concurrent completion waiters for the same message
  • Queue behavior when storage operations fail
  • Backoff calculation edge cases (very large failure counts)
  • Race condition: client timeout vs actor completion

📝 Documentation

10. Excellent Documentation, Minor Improvements

The new queue.mdx docs are comprehensive. Suggestions:

  • Add a section on monitoring queue health (size, pending duration)
  • Document memory implications of large queues
  • Add migration guide for existing queue implementations
  • Clarify what happens to in-flight messages during graceful shutdown

🔧 Minor Issues

  1. Unrelated change in pkg-pr-new.yaml - Adding node-version: 22 seems unrelated to this PR
  2. Empty line added to package.json:86 - Unnecessary whitespace change

🎯 Summary

Overall Assessment: This is a well-designed feature with good implementation quality. The critical issues around promise resolution and timeout handling should be addressed before merging.

Recommendation: Request changes for critical issues #1 and #3. Issues #2, #4, and #5 should be addressed or tracked for follow-up.


📋 Action Items

Must Fix:

Should Fix:

Nice to Have:


Generated by Claude Code - PR Review

@railway-app railway-app bot temporarily deployed to rivet-frontend / rivet-pr-4088 January 31, 2026 04:28 Destroyed
@railway-app railway-app bot temporarily deployed to rivet-frontend / rivet-pr-4088 January 31, 2026 05:36 Destroyed
@railway-app railway-app bot temporarily deployed to rivet-frontend / rivet-pr-4088 February 1, 2026 06:46 Destroyed
@railway-app railway-app bot temporarily deployed to rivet-frontend / rivet-pr-4088 February 1, 2026 06:49 Destroyed
@railway-app railway-app bot temporarily deployed to rivet-frontend / rivet-pr-4088 February 1, 2026 06:51 Destroyed
@NathanFlurry NathanFlurry reopened this Feb 1, 2026
@railway-app railway-app bot temporarily deployed to rivet-frontend / pr-8030e0-4088 February 1, 2026 06:53 Destroyed
@NathanFlurry NathanFlurry changed the base branch from actor-messages to main February 1, 2026 06:55
@pkg-pr-new
Copy link

pkg-pr-new bot commented Feb 1, 2026

More templates

@rivetkit/cloudflare-workers

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/cloudflare-workers@4088

@rivetkit/db

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/db@4088

@rivetkit/framework-base

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/framework-base@4088

@rivetkit/next-js

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/next-js@4088

@rivetkit/react

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/react@4088

rivetkit

pnpm add https://pkg.pr.new/rivet-dev/rivet/rivetkit@4088

@rivetkit/sql-loader

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/sql-loader@4088

@rivetkit/virtual-websocket

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/virtual-websocket@4088

@rivetkit/engine-runner

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/engine-runner@4088

@rivetkit/engine-runner-protocol

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/engine-runner-protocol@4088

commit: 780ddeb

@NathanFlurry NathanFlurry changed the base branch from main to graphite-base/4088 February 3, 2026 19:39
@graphite-app
Copy link
Contributor

graphite-app bot commented Feb 4, 2026

Merge activity

  • Feb 4, 8:39 PM UTC: This pull request can not be added to the Graphite merge queue. Please try rebasing and resubmitting to merge when ready.
  • Feb 4, 8:39 PM UTC: Graphite disabled "merge when ready" on this PR due to: a merge conflict with the target branch; resolve the conflict and try again..
  • Feb 4, 9:20 PM UTC: NathanFlurry added this pull request to the Graphite merge queue.
  • Feb 4, 9:21 PM UTC: CI is running for this pull request on a draft pull request (#4115) due to your merge queue CI optimization settings.
  • Feb 4, 9:21 PM UTC: Merged by the Graphite merge queue via draft PR: #4115.

NathanFlurry and others added 7 commits February 4, 2026 13:15
- Add pull-based durable queue for actors
- Support fire-and-forget and wait-for-completion patterns
- Add c.queue.next() for actors to receive messages
- Add msg.complete() for manual completion with response
- Add persistence & redelivery with exponential backoff
- Add HTTP API for queue operations
- Add docs page for queue messages
- Move Playwright install after pnpm install in website directory
- Fix websocket -> webSocket method name in docs
- Add @nocheck to queue.mdx code blocks missing context
- Add @nocheck to websocket handler code blocks with implicit any

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The rivetkit package requires Node 22+ but the workflow was using the
default Node 20, causing TypeScript type checking failures.
@NathanFlurry NathanFlurry changed the base branch from graphite-base/4088 to main February 4, 2026 21:20
@railway-app railway-app bot temporarily deployed to rivet-frontend / pr-8030e0-4088 February 4, 2026 21:20 Destroyed
graphite-app bot pushed a commit that referenced this pull request Feb 4, 2026
- Add pull-based durable queue for actors
- Support fire-and-forget and wait-for-completion patterns
- Add c.queue.next() for actors to receive messages
- Add msg.complete() for manual completion with response
- Add persistence & redelivery with exponential backoff
- Add HTTP API for queue operations
- Add docs page for queue messages
@graphite-app graphite-app bot closed this Feb 4, 2026
@graphite-app graphite-app bot deleted the queue-wait-complete branch February 4, 2026 21:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant