Hybrid distributed runtime fabric for actors, native compute offload, and cross-language services.
Features • Architecture • Installation • Usage • Distributed Mesh
Iris is a hybrid distributed runtime built in Rust with first-class Python and Node.js bindings. It combines three execution styles in one system:
- an actor mesh for stateful, message-driven workflows,
- native compute offload/JIT for CPU-heavy hot paths,
- cross-language runtime APIs for service-oriented applications.
Iris is therefore not only an actor runtime but a runtime fabric that lets you mix coordination, messaging, and high-performance compute under a single operational model.
At its core, Iris uses a cooperative reduction-based scheduler for fairness and high concurrency, while providing built-in supervision, hot swapping, discovery, and location-transparent messaging across nodes.
Note
Node.js bindings are still in very early phases and do not yet have feature parity with Python.
Iris is designed as a hybrid platform, not a single-paradigm engine.
Iris provides two complementary execution patterns:
- Push Actors (Green Threads): ultra-lightweight handlers triggered only when messages arrive.
- Pull Actors (OS Threads): blocking mailbox workers for synchronous control flow. Python pull actors run on dedicated OS threads and block on `recv()` while releasing the GIL.
Inspired by the BEAM (Erlang VM), Iris uses a cooperative reduction scheduler for fairness.
- Reduction budgets: each actor gets a budget, then yields to Tokio via `yield_now()` when it is exhausted.
- Starvation resistance: no single high-throughput actor can monopolize a core.
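As a rough mental model (plain Python with asyncio, not Iris's actual Rust-on-Tokio scheduler), a reduction budget amounts to draining a mailbox in bounded bursts and yielding to the event loop between them:

```python
import asyncio

# Toy model of reduction-budget scheduling (illustrative only). Each actor
# drains its mailbox but yields to the event loop once its per-turn budget
# is exhausted, so no actor can monopolize the loop.
async def run_actor(name, mailbox, budget, trace):
    used = 0
    while mailbox:
        trace.append((name, mailbox.pop(0)))
        used += 1
        if used >= budget:
            used = 0
            await asyncio.sleep(0)  # analogous to Tokio's yield_now()

async def main():
    trace = []
    a = run_actor("a", list(range(4)), budget=2, trace=trace)
    b = run_actor("b", list(range(4)), budget=2, trace=trace)
    await asyncio.gather(a, b)
    return trace

trace = asyncio.run(main())
print(trace)  # the two actors interleave every 2 messages instead of running to completion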
Update live application logic without stopping the runtime.
- Zero downtime: replace Python or Node.js handlers in memory without losing mailbox state.
- Safe transition: in-flight work completes on the old logic; new messages use the new logic.
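Conceptually, a hot swap replaces only the behavior pointer while the mailbox survives. The sketch below is an illustration of that idea in plain Python, not Iris's implementation (which swaps an atomic RwLock-guarded pointer in Rust):

```python
import threading
from collections import deque

# Minimal illustration: the mailbox outlives the behavior, and swapping
# replaces only the handler reference, so queued messages survive the upgrade.
class SwappableActor:
    def __init__(self, behavior):
        self._lock = threading.Lock()
        self._behavior = behavior
        self.mailbox = deque()

    def hot_swap(self, new_behavior):
        with self._lock:  # analogous to Iris's atomic behavior-pointer swap
            self._behavior = new_behavior

    def step(self):
        msg = self.mailbox.popleft()
        with self._lock:
            handler = self._behavior
        return handler(msg)

actor = SwappableActor(lambda m: "v1:" + m)
actor.mailbox.extend(["a", "b"])
print(actor.step())                       # handled by the old logic
actor.hot_swap(lambda m: "v2:" + m)
print(actor.step())                       # queued message now handled by new logic
```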
Actors are first-class network services.
- Name registry: register human-readable names (for example, `"auth-provider"`) with `register`/`unregister` and resolve with `whereis`.
- Async discovery: resolve remote service PIDs with Python `await` or Node.js Promises without blocking runtime progress.
- Location transparency: message actors the same way whether local or remote. The runtime automatically spawns lightweight proxy actors; callers simply treat the returned PID as if it were local and the network is handled invisibly.
Built-in fault tolerance follows the “Let it Crash” model.
- Heartbeat monitoring: automatic `PING`/`PONG` (`0x02`/`0x03`) detects silent failures such as GIL stalls and half-open links.
- Structured system messages: exits, hot swaps, and heartbeats are surfaced as system events for supervisors.
- Self-healing factories: restart logic can re-resolve and reconnect automatically when remote nodes recover.
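A self-healing factory boils down to retry-with-backoff around re-resolution. The sketch below is illustrative; `resolve` is a hypothetical stand-in for "re-resolve the remote PID and reconnect", not an Iris API:

```python
import time

# Illustrative retry loop in the spirit of a "self-healing factory":
# on failure, re-resolve and reconnect with exponential backoff.
def connect_with_retry(resolve, max_attempts=5, base_delay=0.01):
    last_err = None
    for attempt in range(max_attempts):
        try:
            return resolve()  # e.g. re-resolve the remote service, then reconnect
        except ConnectionError as err:
            last_err = err
            time.sleep(base_delay * (2 ** attempt))  # back off before retrying
    raise last_err

attempts = {"n": 0}
def flaky_resolve():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("node not yet recovered")
    return "proxy-pid-7"

print(connect_with_retry(flaky_resolve))  # succeeds on the third attempt
```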
Iris actor internals vary by execution pattern:
- Push actors: state-machine driven futures in Tokio, typically ~2KB each.
- Pull actors (Python): dedicated blocking threads with higher footprint but simpler synchronous flow.
Iris uses a length-prefixed binary TCP protocol for inter-node communication.
| Packet Type | Function | Payload Structure |
|---|---|---|
| `0x00` | User Message | `[PID: u64][LEN: u32][DATA: Bytes]` |
| `0x01` | Resolve Request | `[LEN: u32][NAME: String]` → returns `[PID: u64]` |
| `0x02` | Heartbeat (Ping) | Empty — probe remote node health |
| `0x03` | Heartbeat (Pong) | Empty — acknowledge health |
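As a sketch of the framing, a `0x00` User Message could be encoded as below. The byte order and the exact placement of the packet-type byte are assumptions for illustration, not the authoritative wire format:

```python
import struct

# Frame a 0x00 User Message per the table above:
# [PID: u64][LEN: u32][DATA: Bytes], preceded here by a 1-byte packet type.
# Big-endian is assumed; check the protocol source for the real layout.
def encode_user_message(pid: int, data: bytes) -> bytes:
    header = struct.pack(">BQI", 0x00, pid, len(data))  # type, PID, LEN
    return header + data

def decode_user_message(packet: bytes):
    ptype, pid, length = struct.unpack_from(">BQI", packet)
    data = packet[13:13 + length]  # header is 1 + 8 + 4 = 13 bytes
    return ptype, pid, data

pkt = encode_user_message(42, b"hello")
print(decode_user_message(pkt))  # (0, 42, b'hello')
```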
Iris bridges Rust safety with dynamic-language ergonomics through PyO3 (Python) and N-API (Node.js):
- Membrane hardening: uses `block_in_place` and `ThreadSafeFunction` queues for safe sync/async boundaries.
- GIL management: Python `recv()` releases the GIL so other Python threads can continue running.
- Atomic RwLocks: behavior pointers are swapped safely for thread-safe hot swapping.
- Rust 1.70+
- Python 3.8+ OR Node.js 14+
- Maturin (for Python) / NAPI-RS (for Node)
- Cranelift JIT backend: included by default; needed only if you use the experimental offload/JIT APIs.
```bash
# Clone the repository
git clone https://github.com/SSL-ACTX/iris.git
cd iris

# Build and install the Python extension
maturin develop --release
```
```bash
# Clone the repository
git clone https://github.com/SSL-ACTX/iris.git
cd iris

# Build the N-API binding
npm install
npm run build
```
Iris exposes one decorator API for native acceleration:
```python
@iris.offload(strategy="jit", return_type="float")
def kernel(...):
    ...
```

- `strategy="jit"`: compiles eligible code paths with Cranelift.
- `strategy="actor"`: executes on a dedicated Rust offload pool.
```python
import iris

@iris.offload(strategy="jit", return_type="float")
def vector_magnitude(x: float, y: float, z: float) -> float:
    return (x*x + y*y + z*z) ** 0.5

result = vector_magnitude(1.0, 2.0, 3.0)
print(result)
```

- Scalar expression kernels with arithmetic/logic/comparisons/ternaries and common math calls.
- Generator reductions (`sum`/`any`/`all`) over `range(...)` and runtime containers.
- While-style reduction helpers (`sum_while`, `any_while`, `all_while`) and loop-control intrinsics.
- Scalar recurrence loops recognized by the frontend (for/while patterns), including inlined helper calls.
- JIT execution profiles are specialized by observed input shapes and data layouts.
- Supported scalar inputs: Python `float`, `int`, `bool` (lowered to the native `f64` ABI).
- Supported vectorized buffers: `f64`, `f32`, signed/unsigned integers, and bool.
- On unsupported syntax, compile miss, panic, or profile mismatch, Iris falls back safely to Python.
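To make the eligible shapes concrete, here are plain-Python kernels matching the patterns listed above. Under Iris they would carry `@iris.offload(strategy="jit", ...)`; they are shown undecorated so they run anywhere, and on a compile miss the safe fallback executes exactly this Python:

```python
def magnitude(x: float, y: float, z: float) -> float:
    # scalar expression kernel: arithmetic plus a power
    return (x * x + y * y + z * z) ** 0.5

def sum_of_squares(n: int) -> int:
    # generator reduction over range(...)
    return sum(i * i for i in range(n))

def geometric_steps(limit: float) -> int:
    # scalar recurrence loop (while-style pattern)
    value, steps = 1.0, 0
    while value < limit:
        value *= 2.0
        steps += 1
    return steps

print(magnitude(1.0, 2.0, 3.0))  # 3.7416...
print(sum_of_squares(5))         # 30
print(geometric_steps(100.0))    # 7
```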
- Enable JIT logs: `IRIS_JIT_LOG=1` or `iris.jit.set_jit_logging(...)`.
- Query logging status: `iris.jit.get_jit_logging()`.
- Optional multi-variant quantum speculation:
  - env: `IRIS_JIT_QUANTUM=1`
  - API: `iris.jit.set_quantum_speculation(...)` / `iris.jit.get_quantum_speculation()`
Note
On aarch64 targets, Iris automatically adjusts JIT module flags to avoid unsupported relocation paths.
For lower-level internals and compiler details, see JIT.md.
Iris provides a unified API across both supported languages.
Use `spawn` for maximum throughput (100k+ actors). Rust owns the scheduling and only invokes the guest language when a message arrives.
```python
import iris

rt = iris.Runtime()

def fast_worker(msg):
    print(f"Processed: {msg}")

# Spawn 1000 workers instantly (Green Threads)
for _ in range(1000):
    rt.spawn(fast_worker, budget=50)
```

```javascript
const { NodeRuntime } = require('./index.js');
const rt = new NodeRuntime();

const fastWorker = (msg) => {
  // msg is a Buffer
  console.log(`Processed: ${msg.toString()}`);
};

for (let i = 0; i < 1000; i++) {
  rt.spawn(fastWorker, 50);
}
```

Use `spawn_with_mailbox` for complex logic where you want to block and wait for specific messages. No async/await required in Python.
```python
# Runs in a dedicated OS thread. Blocking is safe.
def saga_coordinator(mailbox):
    # Blocks thread, releases GIL
    msg = mailbox.recv()
    print("Starting Saga...")

    # Wait up to 5 seconds for next message
    confirm = mailbox.recv(timeout=5.0)
    if confirm:
        print("Confirmed")
    else:
        print("Timed out")

rt.spawn_with_mailbox(saga_coordinator, budget=100)
```

```javascript
const sagaCoordinator = async (mailbox) => {
  const msg = await mailbox.recv();
  console.log("Starting Saga...");

  // Wait up to 5 seconds
  const confirm = await mailbox.recv(5.0);
  if (confirm) {
    console.log("Confirmed");
  } else {
    console.log("Timed out");
  }
};

rt.spawnWithMailbox(sagaCoordinator, 100);
```

For high-throughput workloads you can pre-spawn a pool of long-lived child actors and round-robin work across them instead of repeatedly spawning new actors. This avoids allocation and teardown overhead, giving near-linear scaling when each message performs heavy native work (e.g. JIT-offloaded math).
```python
import iris
from itertools import cycle

rt = iris.Runtime()

# Spawn a reusable pool attached to a parent PID (useful for supervision)
parent = rt.spawn(lambda m: None)  # dummy parent
workers = iris.spawn_child_pool(rt, parent, size=4)

# `workers` is a list of PIDs. Use it in a simple round-robin loop:
worker_cycle = cycle(workers)

def dispatch(price, vol, strike):
    pid = next(worker_cycle)
    # A regular send is automatically optimized internally
    rt.send(pid, serialize_args(price, vol, strike))
    # Each worker's handler can call into JIT or Python directly
```

Actors can now be spawned with a parent PID. When the parent exits (normally or by crashing), all of its direct children are automatically stopped as well. This mirrors the behaviour of many functional runtimes and makes it easy to manage lifetimes for short-lived helper tasks.
```rust
let rt = Runtime::new();
let parent = rt.spawn_actor(|mut rx| async move { /* ... */ });
let child = rt.spawn_child(parent, |mut rx| async move { /* will die with parent */ });
rt.send(parent, Message::User(Bytes::from("quit"))).unwrap();
// after the parent exits the child mailbox is closed as well
assert!(!rt.is_alive(child));
```

```python
rt = iris.Runtime()
parent = rt.spawn(lambda msg: print("parent got", msg))
child = rt.spawn_child(parent, lambda msg: print("child got", msg))
rt.send(parent, b"quit")
import time; time.sleep(0.1)
assert not rt.is_alive(child)
```

There are three variants of the API:
- `spawn_child(parent, handler)` – mailbox-based actor.
- `spawn_child_with_budget(parent, handler, budget)` – same, but with a reduction budget.
- `spawn_child_handler_with_budget(parent, handler, budget)` – message-style handler (used by the Python/Node wrappers).
Python and Node bindings expose matching helpers (`spawn_child`, `spawn_child_with_mailbox`, etc.) which accept the same arguments as their non-child counterparts plus the parent PID.
Iris now supports virtual (lazy) actors: a PID is reserved up front, but the actor is only activated on first message delivery. This keeps idle footprint low while preserving actor-style addressing.
Current semantics:
- Activation occurs automatically on the first `send`/`send_user` to the reserved PID.
- An optional idle timeout can auto-stop an activated virtual actor after inactivity.
- `stop(pid)` also works for a never-activated virtual actor and cleanly deallocates its PID.
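These activation semantics can be modeled in plain Python (illustrative only, not Iris internals): the PID is reserved immediately, but the handler is only instantiated when the first message arrives.

```python
# Conceptual model of virtual (lazy) actors: reserve the PID up front,
# activate the handler on first delivery, and allow stop() either way.
class VirtualRegistry:
    def __init__(self):
        self._next_pid = 1
        self._factories = {}   # pid -> factory (reserved, not yet activated)
        self._active = {}      # pid -> handler

    def spawn_virtual(self, factory):
        pid = self._next_pid
        self._next_pid += 1
        self._factories[pid] = factory  # reserve PID, defer activation
        return pid

    def send(self, pid, msg):
        if pid in self._factories:                          # first delivery:
            self._active[pid] = self._factories.pop(pid)()  # activate now
        self._active[pid](msg)

    def stop(self, pid):
        # works even if the actor was never activated
        self._factories.pop(pid, None)
        self._active.pop(pid, None)

reg = VirtualRegistry()
log = []
pid = reg.spawn_virtual(lambda: log.append("activated") or (lambda m: log.append(m)))
print(len(reg._active))  # 0
reg.send(pid, "hello")
print(log)               # ['activated', 'hello']
```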
```rust
let rt = Runtime::new();
let pid = rt.spawn_virtual_handler_with_budget(
    |msg| async move {
        // process message once activated
    },
    100,
    Some(std::time::Duration::from_millis(500)),
);

// Actor is activated lazily on first send
rt.send(pid, Message::User(Bytes::from("hello"))).unwrap();
```

```python
import iris

rt = iris.Runtime()
pid = rt.spawn_virtual(
    lambda msg: print("virtual got", bytes(msg)),
    budget=100,
    idle_timeout_ms=500,
)

# activates on first send
rt.send(pid, b"hello")
```

Note
Network hardening: the underlying TCP protocol now imposes a 1 MiB payload ceiling, per-operation timeouts, and diligent logging. Malformed or oversized messages are dropped rather than crashing the node, and remote resolution/send operations will fail fast instead of hanging indefinitely.
```python
# 1. Register a local actor
pid = rt.spawn(my_handler)
rt.register("auth_worker", pid)

# 2. Look it up later (Local)
target = rt.whereis("auth_worker")

# 3. Look it up remotely (Network)
async def find_remote():
    addr = "192.168.1.5:9000"
    # Non-blocking resolution returns a local *proxy* actor
    proxy_pid = await rt.resolve_remote_py(addr, "auth_worker")
    if proxy_pid:
        # send just like a local PID — the runtime forwards the message
        rt.send(proxy_pid, b"login")
```

```javascript
// 1. Register
const pid = rt.spawn(myHandler);
rt.register("auth_worker", pid);

// 2. Resolve Remote
async function findAndQuery() {
  const addr = "192.168.1.5:9000";
  // resolution returns a local proxy PID
  const proxyPid = await rt.resolveRemote(addr, "auth_worker");
  if (proxyPid) {
    // treat it exactly like a local actor
    rt.send(proxyPid, Buffer.from("login"));
  }
}
```

Iris supports hierarchical path registrations (e.g. `/svc/payment/processor`) and allows you to create a supervisor scoped to a path prefix. This is useful for grouping related actors and applying supervision policies per-service or per-tenant.
Key Python APIs:
- `rt.create_path_supervisor(path)` — create a per-path supervisor instance.
- `rt.path_supervisor_watch(path, pid)` — register an actor PID with the path supervisor.
- `rt.path_supervisor_children(path)` — list PIDs currently supervised for the path.
- `rt.remove_path_supervisor(path)` — remove the path supervisor.
- `rt.spawn_with_path_observed(budget, path)` — spawn and register an observed actor under `path` (useful for testing/monitoring).
Python example:
```python
rt = iris.Runtime()

# spawn an observed actor and register it under a hierarchical path
pid = rt.spawn_with_path_observed(10, "/svc/test/one")

# create a supervisor for the '/svc/test' prefix and register the pid
rt.create_path_supervisor("/svc/test")
rt.path_supervisor_watch("/svc/test", pid)

# inspect supervised children
children = rt.path_supervisor_children("/svc/test")
print(children)  # [pid]

# remove supervisor when done
rt.remove_path_supervisor("/svc/test")
```

This mechanism makes it easy to apply restart strategies or monitoring rules to logical groups of actors without affecting the global supervisor.
```python
messages = rt.get_messages(observer_pid)
for msg in messages:
    if isinstance(msg, iris.PySystemMessage):
        if msg.type_name == "EXIT":
            print(f"Actor {msg.target_pid} has crashed!")
```

```javascript
// Node wrappers return wrapped objects { data: Buffer, system: Object }
const messages = rt.getMessages(observerPid);
messages.forEach(msg => {
  if (msg.system) {
    if (msg.system.typeName === "EXIT") {
      console.log(`Actor ${msg.system.targetPid} has crashed!`);
    }
  } else {
    console.log(`User Data: ${msg.data.toString()}`);
  }
});
```

```python
def behavior_a(msg): print("Logic A")
def behavior_b(msg): print("Logic B (Upgraded!)")

pid = rt.spawn(behavior_a, budget=10)
rt.send(pid, b"test")  # Prints "Logic A"

rt.hot_swap(pid, behavior_b)
rt.send(pid, b"test")  # Prints "Logic B (Upgraded!)"

# Versioning + rollback
print(rt.behavior_version(pid))  # 2
rt.rollback_behavior(pid, 1)     # back to the prior hot-swapped behavior
```

Iris exposes lightweight mailbox introspection and actor-local timers so guest languages can inspect queue sizes and schedule timed messages.
- Rust: `Runtime::mailbox_size(pid: u64) -> Option<usize>` returns the number of user messages queued for `pid` (excludes system messages).
- Python: `rt.mailbox_size(pid)` mirrors the Rust API and returns `None` if the PID is unknown.

Python example:

```python
size = rt.mailbox_size(pid)
print(f"Mailbox size for {pid}: {size}")
```

You can schedule one-shot or repeating messages to an actor's mailbox. Timers are cancellable via an id returned at creation.
- Rust APIs: `send_after(pid, delay_ms, payload)`, `send_interval(pid, interval_ms, payload)`, `cancel_timer(timer_id)`
- Python: `rt.send_after(pid, ms, b'data')`, `rt.send_interval(pid, ms, b'data')`, `rt.cancel_timer(timer_id)`
Python example (one-shot):
```python
timer_id = rt.send_after(pid, 200, b'tick')  # send 'tick' after 200ms
# cancel if needed
rt.cancel_timer(timer_id)
```

Python example (repeating):

```python
timer_id = rt.send_interval(pid, 1000, b'heartbeat')  # every 1s
# stop later
rt.cancel_timer(timer_id)
```

When an actor exits, the runtime sends a structured EXIT system message that includes the reason and optional metadata. This allows supervisors and link/watch logic to make informed decisions.
Common ExitReason variants:
- `Normal` — actor finished cleanly.
- `Killed` — requested shutdown.
- `Panic` — runtime detected a panic; `ExitInfo` may include panic metadata.
- `Crash` — user code returned an unrecoverable error.
Python example receiving exit info:
```python
for msg in rt.get_messages(supervisor_pid):
    if isinstance(msg, iris.PySystemMessage) and msg.type_name == 'EXIT':
        print('from:', msg.from_pid)
        print('target:', msg.target_pid)
        print('reason:', msg.reason)      # e.g. 'Normal' | 'Panic' | 'Killed'
        print('metadata:', msg.metadata)  # optional dict/bytes with extra info
```

Node.js receives system objects with the same fields: `fromPid`, `targetPid`, `reason`, and optional `metadata`.
In addition to the environment variables documented above, the Python runtime exposes programmatic setters:
- `rt.set_release_gil_limits(max_threads: int, gil_pool_size: int)` — set the per-process cap and fallback pool size at runtime.
- `rt.set_release_gil_strict(strict: bool)` — when `true`, a `spawn(..., release_gil=True)` will return an error if the dedicated-thread cap is reached instead of falling back to the shared pool.
By default, all Iris mailboxes are unbounded. For some high-throughput workloads you may want a fixed capacity with a drop-new policy, which is useful for rate-limiting fast producers. Use `Runtime.spawn_py_handler_bounded` in Python or `runtime.spawn_bounded` in Node and specify the mailbox capacity (in messages).
Overflow policy can be changed per PID at runtime with:

- Python: `rt.set_overflow_policy(pid, policy, target_pid_or_none)`
- Rust: `runtime.set_overflow_policy(pid, OverflowPolicy::...)`
Supported policies:
- `dropnew` — reject the incoming message (`send(...)` returns `False`/`Err`).
- `dropold` — evict the oldest queued user message, then accept the new one.
- `redirect` — send overflowing messages to a fallback PID.
- `spill` — send a copy to the fallback PID and still enqueue the original on the primary.
- `block` — block the sender until mailbox capacity is available.
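As an illustration of the first two policies, here is a toy bounded mailbox in plain Python (a sketch of the semantics described above, not the Iris implementation):

```python
from collections import deque

# Bounded mailbox demonstrating dropnew vs dropold overflow behavior.
class BoundedMailbox:
    def __init__(self, capacity, policy="dropnew"):
        self.queue = deque()
        self.capacity = capacity
        self.policy = policy

    def send(self, msg):
        if len(self.queue) < self.capacity:
            self.queue.append(msg)
            return True
        if self.policy == "dropnew":
            return False             # reject the incoming message
        if self.policy == "dropold":
            self.queue.popleft()     # evict oldest, then accept the new one
            self.queue.append(msg)
            return True
        raise ValueError(self.policy)

mb = BoundedMailbox(2, "dropnew")
print(mb.send(b"one"), mb.send(b"two"), mb.send(b"three"))  # True True False

mb = BoundedMailbox(2, "dropold")
mb.send(b"one"); mb.send(b"two"); mb.send(b"three")
print(list(mb.queue))  # [b'two', b'three']
```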
Python example:
```python
from iris import Runtime

rt = Runtime()

# handler simply prints messages
pid = rt.spawn_py_handler_bounded(lambda m: print('got', m), budget=100, capacity=2)

# first two sends succeed
assert rt.send(pid, b'one')
assert rt.send(pid, b'two')

# third message is dropped and send returns False
assert not rt.send(pid, b'three')
```

Python policy examples:
```python
from iris import Runtime

rt = Runtime()
primary = rt.spawn_py_handler_bounded(lambda m: print("primary", bytes(m)), budget=100, capacity=1)
fallback = rt.spawn(lambda m: print("fallback", bytes(m)), budget=100)

# Redirect overflow to fallback actor
rt.set_overflow_policy(primary, "redirect", fallback)

# Spill copies overflow to fallback and still enqueues on primary
rt.set_overflow_policy(primary, "spill", fallback)

# Block sender until room is available
rt.set_overflow_policy(primary, "block", None)
```

You can control whether push-based Python actors run their callbacks on a blocking thread that acquires the GIL (avoiding holding the GIL on the async worker) using the `release_gil` flag on `Runtime.spawn`:
```python
from iris import Runtime
import threading
import time

rt = Runtime()

def handler_no(msg):
    print('no release', threading.get_ident())

def handler_yes(msg):
    print('release', threading.get_ident())

pid_no = rt.spawn(handler_no, budget=10, release_gil=False)
pid_yes = rt.spawn(handler_yes, budget=10, release_gil=True)

rt.send(pid_no, b'ping')
rt.send(pid_yes, b'ping')
time.sleep(0.2)
```

When `release_gil=True`, the handler runs inside a `spawn_blocking` worker that acquires the GIL; `release_gil=False` keeps the previous behavior (the callback runs directly while holding the GIL on the async worker).
Iris supports hierarchical path registrations (for example `/system/service/one`) so you can group, query, and supervise actors by logical paths.
Python APIs (examples): `rt.register_path(path, pid)`, `rt.unregister_path(path)`, `rt.whereis_path(path)`, `rt.list_children(prefix)`, `rt.list_children_direct(prefix)`, `rt.watch_path(prefix)`, `rt.spawn_with_path_observed(budget, path)`, `rt.child_pids()`, `rt.children_count()`.
Python example:
```python
from iris import Runtime

rt = Runtime()

# Spawn and register
pid = rt.spawn(lambda m: None, 10)
rt.register_path("/system/service/one", pid)

print(rt.whereis_path("/system/service/one"))
print(rt.list_children("/system/service"))
print(rt.list_children_direct("/system"))

# Shallow watch: registers current direct children with the supervisor
rt.watch_path("/system/service")
print(rt.child_pids(), rt.children_count())
```

Node.js example (conceptual):
```javascript
const { NodeRuntime } = require('./index.js');
const rt = new NodeRuntime();

const pid = rt.spawn(myHandler, 10);
rt.registerPath('/system/service/one', pid);
const children = rt.listChildren('/system/service');
```

Notes:

- `list_children` returns all descendant registrations under a prefix.
- `list_children_direct` returns only immediate children one level below the prefix.
- `watch_path` performs a shallow registration of direct children with the supervisor — path-scoped supervisors are planned as a next step.
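The `list_children` vs `list_children_direct` distinction can be modeled with a simple prefix-matching registry (an illustrative sketch, not the Iris implementation):

```python
# Plain-Python model of hierarchical path lookups.
class PathRegistry:
    def __init__(self):
        self.paths = {}  # path -> pid

    def register_path(self, path, pid):
        self.paths[path] = pid

    def list_children(self, prefix):
        # all descendant registrations under the prefix
        return sorted(p for p in self.paths if p.startswith(prefix + "/"))

    def list_children_direct(self, prefix):
        # only immediate children, one level below the prefix
        return sorted(
            p for p in self.paths
            if p.startswith(prefix + "/") and "/" not in p[len(prefix) + 1:]
        )

reg = PathRegistry()
reg.register_path("/system/service", 1)
reg.register_path("/system/service/one", 2)
reg.register_path("/system/service/one/deep", 3)

print(reg.list_children("/system"))
# ['/system/service', '/system/service/one', '/system/service/one/deep']
print(reg.list_children_direct("/system"))
# ['/system/service']
```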
Linux / macOS / Android (Termux): fully supported, with high performance via the multi-threaded Tokio runtime. Note: Android builds require the NDK or a local clang configuration.

Windows: supported. Ensure you have the latest Microsoft C++ Build Tools installed for PyO3/N-API compilation.

Important
- Production Status: Iris is currently in Alpha.
- Experimental Features: The JIT/offload APIs are extremely new and may change or break between releases. Use with caution.
- Performance Metrics (v0.3.0):
  - Push Actors: validated to scale to 100k+ concurrent actors with message throughput exceeding ~1.2M msgs/sec on modern cloud vCPUs and ~409k msgs/sec on single-core legacy hardware.
  - Pull Actors: high-performance threaded actors supporting 100k+ concurrent instances with throughput reaching ~1.5M msgs/sec, demonstrating massive scaling beyond traditional thread-pool limitations.
  - Hot-Swapping: logic upgrades validated at ~136k swaps/sec while maintaining active message processing.
- Notice: The binary protocol is subject to change.
- Stability: Always use the `Supervisor` for critical actor lifecycles to ensure automatic recovery and location transparency.
Author: Seuriin (SSL-ACTX)
v0.4.0