Merged
2 changes: 1 addition & 1 deletion src/pages/docs/ai-transport/features/agent-presence.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@ meta_description: "Show agent status in your AI application with Ably Presence.
meta_keywords: "agent presence, AI status, presence API, agent state, AI Transport, Ably"
---

Agent presence uses Ably's native [Presence](/docs/presence) API to show real-time agent status in your application. Display whether the agent is streaming, thinking, idle, or offline - across all connected clients.
Agent presence gives session participants a realtime view of which agents are active in a session. It uses Ably's native [Presence](/docs/presence) API to show real-time agent status in your application, whether that is a sole or orchestrator agent, or multiple sub-agents. Presence can convey whether an agent is streaming, thinking, idle, or offline - across all connected clients.
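The pattern above can be sketched in application code. This is a minimal, illustrative reduction of presence events into a per-agent status map - the event shapes mirror Ably presence actions, but the status values and function names are assumptions for illustration, not an SDK contract:

<Code>
```javascript
// Minimal sketch: derive a per-agent status map from presence events.
// The actions ('enter' | 'update' | 'leave') mirror Ably presence
// actions; the status values are illustrative, not an SDK contract.
function reducePresence(statuses, event) {
  const next = new Map(statuses);
  switch (event.action) {
    case 'enter':
    case 'update':
      // An agent advertises its state ('streaming', 'thinking', 'idle')
      // in its presence data.
      next.set(event.clientId, event.data.status);
      break;
    case 'leave':
      // An absent agent is shown as offline.
      next.set(event.clientId, 'offline');
      break;
  }
  return next;
}

// Example: replay a sequence of presence events for one agent.
const events = [
  { action: 'enter', clientId: 'agent-1', data: { status: 'idle' } },
  { action: 'update', clientId: 'agent-1', data: { status: 'thinking' } },
  { action: 'update', clientId: 'agent-1', data: { status: 'streaming' } },
  { action: 'leave', clientId: 'agent-1', data: {} },
];
const statuses = events.reduce(reducePresence, new Map());
```
</Code>

In a real application, the events would arrive from a presence subscription on the session channel rather than an array.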

<Aside data-type='note'>
Agent presence is not built into the AI Transport SDK. It uses the Ably Presence API alongside AI Transport to give users visibility into what the agent is doing.
Expand Down
6 changes: 4 additions & 2 deletions src/pages/docs/ai-transport/features/branching.mdx
Expand Up @@ -4,11 +4,13 @@ meta_description: "Branch conversations with edit and regenerate in Ably AI Tran
meta_keywords: "conversation branching, edit message, regenerate response, conversation tree, AI Transport, Ably"
---

Conversation branching in AI Transport lets users edit messages and regenerate responses, creating alternative branches in the conversation. The SDK maintains a tree structure where each branch is a fork - the full history is preserved, and users navigate between branches with sibling navigation.
The leading AI user experiences today expose more than a simple linear message history. It is possible to fork a conversation - that is, to submit a prompt from an earlier point in the chat so that the conversation continues from there, incorporating the prior history as context. It is also possible to regenerate a response to a given prompt, perhaps because the agent's first generation attempt failed. These branching possibilities come in addition to the structure of turns, which can be concurrent and may have been interleaved at the time.

Conversation branching in AI Transport supports these use cases by allowing users to edit messages and regenerate responses, creating alternative branches in the conversation. The SDK maintains a tree structure where each branch is a fork - the full history is preserved, and users navigate between branches with sibling navigation.

## How it works <a id="how-it-works"/>

The conversation is a tree, not a list. Every message is a node with a parent pointer. When a user edits a message or regenerates a response, the SDK creates a new branch rather than overwriting the original.
The conversation is a tree, not a list; every message is a node with a parent pointer. When a user edits a message or regenerates a response, the SDK creates a new branch rather than overwriting the original.

- `edit()` forks a user message. The original message and its descendants remain intact. A new child is added to the same parent, creating a sibling branch.
- `regenerate()` forks an assistant message. The original response stays in the tree, and a new sibling is created with a fresh turn.
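To make the tree model concrete, here is a minimal sketch of the data structure - illustrative only, assuming a simplified node shape; the SDK's internal representation differs:

<Code>
```javascript
// Sketch of a conversation tree with parent pointers. Each node is a
// message; editing forks a sibling under the same parent. Illustrative
// only - not the SDK's internal representation.
class ConversationTree {
  constructor() {
    this.nodes = new Map(); // id -> { id, parentId, role, text }
    this.nextId = 1;
  }
  append(parentId, role, text) {
    const id = String(this.nextId++);
    this.nodes.set(id, { id, parentId, role, text });
    return id;
  }
  // Editing forks a user message: the original and its descendants stay
  // intact; the new message becomes a sibling under the same parent.
  edit(messageId, newText) {
    const original = this.nodes.get(messageId);
    return this.append(original.parentId, original.role, newText);
  }
  // Sibling navigation: all messages sharing a parent, in insertion order.
  siblings(messageId) {
    const { parentId } = this.nodes.get(messageId);
    return [...this.nodes.values()].filter(n => n.parentId === parentId);
  }
}

const tree = new ConversationTree();
const root = tree.append(null, 'user', 'Hello');
const reply = tree.append(root, 'assistant', 'Hi there');
const q1 = tree.append(reply, 'user', 'Tell me about turns');
const fork = tree.edit(q1, 'Tell me about sessions'); // sibling branch
```
</Code>

After the edit, both `q1` and its fork hang off the same assistant message, and sibling navigation can switch between the two branches.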
Expand Down
4 changes: 2 additions & 2 deletions src/pages/docs/ai-transport/features/cancellation.mdx
Expand Up @@ -4,11 +4,11 @@ meta_description: "Cancel AI responses mid-stream with Ably AI Transport. Scoped
meta_keywords: "AI cancellation, cancel streaming, abort signal, AI Transport, cancel filter, Ably"
---

Cancellation in AI Transport is a channel-level operation. The client publishes a cancel signal on the Ably channel, the server receives it and aborts the matching turns. Unlike closing an HTTP connection, cancellation is an explicit signal - the session survives, other turns continue, and both sides handle cleanup gracefully.
AI Transport supports cancellation of an agent invocation. Cancellation is a turn-level operation: the client publishes a cancel signal on the Ably channel, and the server receives it and aborts the matching turns. The session remains intact, other turns continue, and both sides handle cleanup gracefully. Unlike closing an HTTP connection, cancellation is an explicit signal.

## How it works <a id="how-it-works"/>

Because sessions are bidirectional, cancel is just a signal on the channel. The client publishes a cancel message with a filter specifying which turns to cancel. The server's transport matches the filter against active turns and fires their abort signals. The LLM stream stops, the turn ends with reason `'cancelled'`, and all subscribers are notified.
Since sessions are bidirectional, cancel is just a signal on the channel. The client publishes a cancel message with a filter specifying which turns to cancel. The server's transport matches the filter against active turns and fires their abort signals. The LLM stream stops, the turn ends with reason `'cancelled'`, and all subscribers are notified.
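The server-side matching can be pictured as follows - a sketch under assumed shapes (the filter field names here are illustrative, not the transport's actual schema). Each active turn holds an `AbortController`, and an incoming cancel filter is matched against the active set:

<Code>
```javascript
// Sketch: match a cancel filter against active turns and fire their
// abort signals. The filter field ('turnId') is an illustrative
// assumption, not the transport's actual schema.
const activeTurns = new Map(); // turnId -> AbortController

function startTurn(turnId) {
  const controller = new AbortController();
  activeTurns.set(turnId, controller);
  return controller.signal; // passed to the LLM stream
}

function handleCancel(filter) {
  const cancelled = [];
  for (const [turnId, controller] of activeTurns) {
    // An empty filter cancels everything; otherwise match by turn id.
    if (!filter.turnId || filter.turnId === turnId) {
      controller.abort(); // the LLM stream observes this signal and stops
      activeTurns.delete(turnId);
      cancelled.push(turnId);
    }
  }
  return cancelled;
}

const signalA = startTurn('turn-a');
const signalB = startTurn('turn-b');
handleCancel({ turnId: 'turn-a' }); // only turn-a aborts; turn-b continues
```
</Code>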

<Code>
```javascript
Expand Down
39 changes: 34 additions & 5 deletions src/pages/docs/ai-transport/features/token-streaming.mdx
@@ -1,14 +1,43 @@
---
title: "Token streaming"
meta_description: "Stream LLM tokens through Ably AI Transport with durable delivery. Tokens survive disconnections and sync across devices automatically."
meta_keywords: "token streaming, LLM streaming, AI Transport, Ably, real-time AI, durable streaming"
meta_description: "Stream AI-generated tokens to clients in realtime using AI Transport, with support for message-per-response and message-per-token patterns."
redirect_from:
- /docs/ai-transport/token-streaming
- /docs/ai-transport/token-streaming/message-per-response
- /docs/ai-transport/token-streaming/message-per-token
- /docs/ai-transport/guides/anthropic/anthropic-message-per-response
- /docs/ai-transport/guides/anthropic/anthropic-message-per-token
- /docs/ai-transport/guides/openai/openai-message-per-response
- /docs/ai-transport/guides/openai/openai-message-per-token
- /docs/ai-transport/guides/langgraph/langgraph-message-per-response
- /docs/ai-transport/guides/langgraph/langgraph-message-per-token
- /docs/ai-transport/guides/vercel-ai-sdk/vercel-message-per-response
- /docs/ai-transport/guides/vercel-ai-sdk/vercel-message-per-token
- /docs/guides/ai-transport/anthropic-message-per-response
- /docs/guides/ai-transport/anthropic/anthropic-message-per-response
- /docs/guides/ai-transport/anthropic-message-per-token
- /docs/guides/ai-transport/anthropic/anthropic-message-per-token
- /docs/guides/ai-transport/openai-message-per-response
- /docs/guides/ai-transport/openai/openai-message-per-response
- /docs/guides/ai-transport/openai-message-per-token
- /docs/guides/ai-transport/openai/openai-message-per-token
- /docs/guides/ai-transport/langgraph-message-per-response
- /docs/guides/ai-transport/langgraph/langgraph-message-per-response
- /docs/guides/ai-transport/langgraph-message-per-token
- /docs/guides/ai-transport/langgraph/langgraph-message-per-token
- /docs/guides/ai-transport/vercel-message-per-response
- /docs/guides/ai-transport/vercel-ai-sdk/vercel-message-per-response
- /docs/guides/ai-transport/vercel-message-per-token
- /docs/guides/ai-transport/vercel-ai-sdk/vercel-message-per-token
---

Token streaming in AI Transport delivers LLM responses progressively through an Ably channel. Tokens are published as they're generated and persist on the channel - clients that disconnect and reconnect receive the complete response without gaps.
LLMs generate responses progressively - token by token - and the best user experience is achieved by delivering those tokens to clients progressively as well, minimising perceived response latency. This token-by-token delivery, while a response is still being generated, is referred to as token streaming.

## How it works <a id="how-it-works"/>
## How it works <a id="how-it-works"/>

The server pipes an LLM response stream through the server transport. The transport's codec encodes each token as an Ably message append operation. Subscribing clients receive tokens in real time through their channel subscription. The channel accumulates tokens, so reconnecting clients get the assembled response, not individual deltas to replay.
Although tokens need to be delivered individually when consumed in real time, they are still fragments of a response, not discrete, independent messages. It must be possible to consume them as coherent responses when not in real time - for example when looking at history, refreshing a client, or returning to a conversation.

A key feature of AI Transport's transport layer is that it understands this relationship between responses and their constituent tokens. This allows the service to support clients that resume an interrupted connection, or that refresh, during a streamed response. AI Transport supports token streaming by enabling agents to form responses incrementally, as a stream of appends to the message content. Each appended token can be received immediately by subscribers consuming in real time, but the durable session layer structures the conversation as responses, both completed and still in progress.
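The append relationship can be sketched as a reduction: realtime subscribers handle each delta as it arrives, while late joiners fold the same events into assembled responses. The event shapes here are illustrative, not AI Transport's wire format:

<Code>
```javascript
// Sketch: the same event stream serves both consumption modes.
// 'append' events carry token deltas; 'complete' marks the end of a
// response. Shapes are illustrative, not AI Transport's wire format.
function assembleResponses(events) {
  const responses = new Map(); // responseId -> { text, complete }
  for (const ev of events) {
    const r = responses.get(ev.responseId) ?? { text: '', complete: false };
    if (ev.type === 'append') r.text += ev.delta;   // token fragment
    if (ev.type === 'complete') r.complete = true;  // response finished
    responses.set(ev.responseId, r);
  }
  return responses;
}

// One completed response and one still in progress.
const events = [
  { responseId: 'r1', type: 'append', delta: 'Hello' },
  { responseId: 'r1', type: 'append', delta: ', ' },
  { responseId: 'r1', type: 'append', delta: 'world' },
  { responseId: 'r1', type: 'complete' },
  { responseId: 'r2', type: 'append', delta: 'Still typing' },
];
const responses = assembleResponses(events);
```
</Code>

A realtime subscriber renders each `append` as it arrives; a client loading history runs the same fold over the stored events and gets the assembled responses.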

On the server, a single call streams the entire response:

Expand Down
45 changes: 29 additions & 16 deletions src/pages/docs/ai-transport/how-it-works/index.mdx
Expand Up @@ -4,38 +4,51 @@ meta_description: "Understand the core concepts behind Ably AI Transport: sessio
meta_keywords: "AI Transport, sessions, turns, transport architecture, codec, Ably"
---

AI Transport is built on three core concepts: sessions, turns, and a two-layer transport architecture. Sessions are durable communication channels shared between agents and clients. Turns are structured request-response cycles within a session. The transport architecture separates your AI framework from the messaging infrastructure, so you can swap either side without changing the other.
AI Transport implements Durable Sessions as the layer that underpins client-to-agent interactions.

## Understand sessions <a id="sessions"/>
## Sessions <a id="sessions"/>

A session is an Ably channel shared between one or more agents and one or more clients. It represents a single conversation and persists beyond any individual connection. When a client disconnects and reconnects, the session is still there. When a second device joins, it sees the same session.
The central concept is the session: a durable, addressable communication channel that carries client-to-agent and agent-to-client events. A session may be shared between one or more agents and one or more clients. A single session usually represents a single conversation or chat, but one that can be long-lived, surviving individual client connections or episodes of client interaction. A single ChatGPT chat, for example, could be a session: you can revisit and continue it multiple times, on different devices. A second device interacting with that chat would do so by joining the same session.

Sessions provide four guarantees:
In AI Transport, a session maps to an Ably Channel. Channels support realtime pub/sub messaging, so multiple (agent or client) participants in a channel can each publish and subscribe to messages; channels support presence, so each connected participant can advertise its online status; and channels support structured state, so arbitrary data, in addition to messages, can be durably added, modified and observed by any participant.

- Events arrive in the order they were published. Token streams render correctly without reordering logic.
- Messages survive disconnections. A client that drops and reconnects picks up exactly where it left off.
- All participants see all events. Every subscribed device receives every token, every control signal, every lifecycle event.
- Clients signal agents through the same channel. Cancel requests, interrupts, and metadata flow back to the agent without a separate control plane.
Messages in a session are ordered, so subscribers see events in the order in which they were published. This makes it possible to distribute streamed token events at a high rate, knowing that clients receive them in order.

When a client disconnects, the session continues to exist. The agent keeps publishing tokens to the channel. On reconnect, Ably's connection protocol automatically resumes from the last received message with no gaps. If the client has been offline longer, it loads the full conversation from channel history using `view.loadOlder()`.
AI Transport sessions persist messages and structured state, so interaction history survives the disconnection of any client or agent. Clients can therefore retrieve historical messages as well as receive them in real time. Sessions are a unifying primitive that exposes the same message stream for consumption either in real time or after the fact.

Since sessions are based on a pub/sub primitive, multiple clients can receive messages, either serially (that is, on different devices at different times) or simultaneously.

Sessions don't just carry client prompts and agent responses; they are bidirectional and can carry multiple kinds of signal - cancellation, interruption, steering - and multiple kinds of content - chat text, artifacts, tool parameters, or other metadata.

When a client disconnects, the session continues to exist and the agent keeps publishing tokens to the channel. On reconnect, Ably's connection protocol automatically resumes from the last received message with no gaps. If the client has been offline longer, it loads the full conversation from channel history.
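The guarantee can be pictured as a merge of channel history with the live stream, deduplicated on a per-message serial. This is a simplified sketch for illustration only - Ably's connection protocol performs gapless resume itself, so no application code like this is required:

<Code>
```javascript
// Sketch: merge channel history with live messages, deduplicating on a
// per-message serial. Simplified for illustration - Ably's connection
// protocol performs gapless resume itself.
function mergeHistory(history, live) {
  const seen = new Set();
  const merged = [];
  for (const msg of [...history, ...live]) {
    if (seen.has(msg.serial)) continue; // drop the overlap
    seen.add(msg.serial);
    merged.push(msg);
  }
  return merged.sort((a, b) => a.serial - b.serial);
}

// The live stream overlaps history by one message; the merge is gapless
// and duplicate-free.
const history = [{ serial: 1, data: 'a' }, { serial: 2, data: 'b' }];
const live = [{ serial: 2, data: 'b' }, { serial: 3, data: 'c' }];
const merged = mergeHistory(history, live);
```
</Code>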

See [Sessions and turns](/docs/ai-transport/how-it-works/sessions-and-turns) for a detailed explanation of how sessions and recovery work.

## Understand turns <a id="turns"/>
## Turns <a id="turns"/>

A turn is one prompt-response cycle within a session. The user sends a prompt, the agent streams back a response. That exchange, from start to finish, is a single turn.
A turn groups a related set of interactions in a session. The simplest and most typical example is one prompt-response cycle within a session: the client sends a prompt, and the agent streams back a response. In general, however, a turn can include more complex interactions, including those initiated by agent activity, such as an autonomous agent responding to an external event.

Each turn has a lifecycle: it starts when the agent begins generating, streams tokens as they are produced, and ends when the agent completes its response. Turns have clear boundaries. Cancellation is scoped to a turn, not the whole session. Cancelling one turn does not affect other turns or the session itself. Each turn carries its own stream, its own cancel handle, and its own lifecycle events.
Turns are the principal way of structuring interactions within a single session. Each turn has a lifecycle; cancellation is scoped to a turn, so cancelling a turn does not affect other turns or the session itself. Each turn carries its own stream, its own cancel handle, and its own lifecycle events. Turns can also exist concurrently, such as when a user submits a follow-up prompt before the previous response finishes, or when multiple subagents each interact with the client at the same time. In general, you can think of turns as a way of multiplexing multiple independent threads over a shared session.

Multiple turns can be active simultaneously. A user can send a follow-up prompt before the previous response finishes, and both turns stream independently. This enables concurrent interactions without waiting for each turn to complete.
In many agent deployments, the client prompt that initiates a turn is sent as a request to the server endpoint that invokes the agent, or that initiates the agent workflow when using a durable execution framework. The workflow usually ends with the completion of that turn, so turn lifecycle is often correlated with agent invocation lifecycle.
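The multiplexing described above can be sketched as demultiplexing the single ordered channel stream into per-turn streams, keyed on a turn identifier. The field names here are illustrative assumptions, not the SDK's wire format:

<Code>
```javascript
// Sketch: demultiplex one ordered channel stream into independent
// per-turn streams, keyed on a turn identifier. Field names are
// illustrative assumptions, not the SDK's wire format.
function demuxByTurn(messages) {
  const turns = new Map(); // turnId -> ordered messages for that turn
  for (const msg of messages) {
    if (!turns.has(msg.turnId)) turns.set(msg.turnId, []);
    turns.get(msg.turnId).push(msg); // channel order preserved per turn
  }
  return turns;
}

// Two concurrent turns interleaved on the shared channel.
const channelStream = [
  { turnId: 't1', data: 'Wea' },
  { turnId: 't2', data: 'Par' },
  { turnId: 't1', data: 'ther' },
  { turnId: 't2', data: 'is' },
];
const turns = demuxByTurn(channelStream);
```
</Code>

Because the channel itself is ordered, each per-turn stream inherits that ordering without any extra sequencing logic.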

See [Sessions and turns](/docs/ai-transport/how-it-works/sessions-and-turns) for details on turn lifecycle, cancellation, and concurrent turns.

## Understand the transport architecture <a id="transport"/>
## Transport architecture <a id="transport"/>

Sessions are powered by Ably pub/sub channels. Channels provide:

- pub/sub message delivery, including message persistence and history;
- presence so that the state of participants in a channel can be observed in real time;
- structured durable and collaborative state via the LiveObjects API.

The AI Transport library implements two principal protocol layers on top of the channel primitive:

- a "transport" layer that implements the Turns abstraction. This supports the multiplexing of turns onto channels, so that turns can provide independent streams; and a conversation tree that exposes a branching conversation structure from the linear stream of channel messages. Together, these provide the rich message structure required to model AI conversations, for both realtime and historical messages.

The transport has two layers: a core transport and a codec. The core transport manages the turn lifecycle, cancellation, conversation tree, and history. The codec translates between your AI framework's event types and Ably messages.
- a "codec" that handles format conversion between the domain event and message types of the agent or client framework - for example `UIMessage` in the case of the Vercel AI SDK - and messages suitable for exchange over the Ably channel.

On the server, the transport takes the output stream from your AI framework and publishes it to the session channel. On the client, the transport subscribes to the channel and reconstructs the conversation from the incoming events. The HTTP POST that sends the user's prompt to the agent is fire-and-forget. Response tokens come back through the channel, not through the HTTP response.
On the server, the transport takes the output stream from the AI framework and publishes it to the channel via the turn's stream. On the client, the transport subscribes to the channel and reconstructs the conversation from the incoming events. Clients that initiate a turn via an HTTP request to the backend (containing the user's prompt) do not receive the agent response on that request; the response simply confirms successful invocation of the agent. Response tokens come back through the turn stream.

The SDK ships with a Vercel AI SDK codec (`UIMessageCodec`) that maps Vercel AI SDK message types to Ably messages. You can write custom codecs for other frameworks by implementing the codec interface. This keeps the core transport framework-agnostic while giving each framework a native integration surface.
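A custom codec reduces to a pair of pure functions between the framework's event type and a channel message shape. A hypothetical minimal codec illustrates the idea - the interface here is a simplification invented for this sketch, not the SDK's actual codec contract:

<Code>
```javascript
// Sketch of a custom codec: a pair of pure functions translating between
// a framework's event type and a channel message shape. This interface
// is a hypothetical simplification, not the SDK's actual contract.
const toyCodec = {
  // Framework event -> Ably-style message (name + data payload).
  encode(event) {
    return { name: `turn.${event.kind}`, data: { text: event.text ?? '' } };
  },
  // Ably-style message -> framework event.
  decode(message) {
    const kind = message.name.replace(/^turn\./, '');
    return { kind, text: message.data.text };
  },
};

// A codec must round-trip: decode(encode(e)) recovers the event.
const original = { kind: 'delta', text: 'Hello' };
const wire = toyCodec.encode(original);
const roundTripped = toyCodec.decode(wire);
```
</Code>

Keeping both directions pure and symmetric is what lets the core transport stay framework-agnostic: it only ever sees the channel message shape.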

Expand Down