Hybrid distributed runtime fabric for actors, native compute offload, and cross-language services.
Features • Architecture • Installation • Usage • Distributed Mesh
Iris is a hybrid distributed runtime built in Rust with first-class Python and Node.js bindings. It combines three execution styles in one system:
- an actor mesh for stateful, message-driven workflows,
- native compute offload/JIT for CPU-heavy hot paths,
- cross-language runtime APIs for service-oriented applications.
Iris is therefore not only an actor runtime but a runtime fabric that lets you mix coordination, messaging, and high-performance compute under a single operational model.
At its core, Iris uses a cooperative reduction-based scheduler for fairness and high concurrency, while providing built-in supervision, hot swapping, discovery, and location-transparent messaging across nodes.
Note
Node.js bindings are still in very early phases and do not yet have feature parity with Python.
Iris is designed as a hybrid platform, not a single-paradigm engine.
Iris provides two complementary execution patterns:
- Push Actors (Green Threads): ultra-lightweight handlers triggered only when messages arrive.
- Pull Actors (OS Threads): blocking mailbox workers for synchronous control flow. Python pull actors run on dedicated OS threads and block on `recv()` while releasing the GIL.
Inspired by the BEAM (Erlang VM), Iris uses a cooperative reduction scheduler for fairness.
- Reduction budgets: each actor gets a budget, then yields to Tokio via `yield_now()` when it is exhausted.
- Starvation resistance: no single high-throughput actor can monopolize a core.
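As a rough mental model (plain Python with asyncio, not Iris's actual Rust-on-Tokio scheduler), a reduction budget amounts to draining a mailbox in bounded bursts and yielding to the event loop between them:

```python
import asyncio

# Toy model of reduction-budget scheduling (illustrative only). Each actor
# drains its mailbox but yields to the event loop once its per-turn budget
# is exhausted, so no actor can monopolize the loop.
async def run_actor(name, mailbox, budget, trace):
    used = 0
    while mailbox:
        trace.append((name, mailbox.pop(0)))
        used += 1
        if used >= budget:
            used = 0
            await asyncio.sleep(0)  # analogous to Tokio's yield_now()

async def main():
    trace = []
    a = run_actor("a", list(range(4)), budget=2, trace=trace)
    b = run_actor("b", list(range(4)), budget=2, trace=trace)
    await asyncio.gather(a, b)
    return trace

trace = asyncio.run(main())
print(trace)  # the two actors interleave every 2 messages instead of running to completion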
Update live application logic without stopping the runtime.
- Zero downtime: replace Python or Node.js handlers in memory without losing mailbox state.
- Safe transition: in-flight work completes on the old logic; new messages use the new logic.
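Conceptually, a hot swap replaces only the behavior pointer while the mailbox survives. The sketch below is an illustration of that idea in plain Python, not Iris's implementation (which swaps an atomic RwLock-guarded pointer in Rust):

```python
import threading
from collections import deque

# Minimal illustration: the mailbox outlives the behavior, and swapping
# replaces only the handler reference, so queued messages survive the upgrade.
class SwappableActor:
    def __init__(self, behavior):
        self._lock = threading.Lock()
        self._behavior = behavior
        self.mailbox = deque()

    def hot_swap(self, new_behavior):
        with self._lock:  # analogous to Iris's atomic behavior-pointer swap
            self._behavior = new_behavior

    def step(self):
        msg = self.mailbox.popleft()
        with self._lock:
            handler = self._behavior
        return handler(msg)

actor = SwappableActor(lambda m: "v1:" + m)
actor.mailbox.extend(["a", "b"])
print(actor.step())                       # handled by the old logic
actor.hot_swap(lambda m: "v2:" + m)
print(actor.step())                       # queued message now handled by new logic
```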
Actors are first-class network services.
- Name registry: register human-readable names (for example, `"auth-provider"`) with `register`/`unregister` and resolve with `whereis`.
- Async discovery: resolve remote service PIDs with Python `await` or Node.js Promises without blocking runtime progress.
- Location transparency: message actors the same way whether local or remote. The runtime automatically spawns lightweight proxy actors; callers simply treat the returned PID as if it were local and the network is handled invisibly.
Built-in fault tolerance follows the “Let it Crash” model.
- Heartbeat monitoring: automatic `PING`/`PONG` (`0x02`/`0x03`) detects silent failures such as GIL stalls and half-open links.
- Structured system messages: exits, hot swaps, and heartbeats are surfaced as system events for supervisors.
- Self-healing factories: restart logic can re-resolve and reconnect automatically when remote nodes recover.
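A self-healing factory boils down to retry-with-backoff around re-resolution. The sketch below is illustrative; `resolve` is a hypothetical stand-in for "re-resolve the remote PID and reconnect", not an Iris API:

```python
import time

# Illustrative retry loop in the spirit of a "self-healing factory":
# on failure, re-resolve and reconnect with exponential backoff.
def connect_with_retry(resolve, max_attempts=5, base_delay=0.01):
    last_err = None
    for attempt in range(max_attempts):
        try:
            return resolve()  # e.g. re-resolve the remote service, then reconnect
        except ConnectionError as err:
            last_err = err
            time.sleep(base_delay * (2 ** attempt))  # back off before retrying
    raise last_err

attempts = {"n": 0}
def flaky_resolve():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("node not yet recovered")
    return "proxy-pid-7"

print(connect_with_retry(flaky_resolve))  # succeeds on the third attempt
```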
Iris actor internals vary by execution pattern:
- Push actors: state-machine driven futures in Tokio, typically ~2KB each.
- Pull actors (Python): dedicated blocking threads with higher footprint but simpler synchronous flow.
Iris uses a length-prefixed binary TCP protocol for inter-node communication.
| Packet Type | Function | Payload Structure |
|---|---|---|
| `0x00` | User Message | `[PID: u64][LEN: u32][DATA: Bytes]` |
| `0x01` | Resolve Request | `[LEN: u32][NAME: String]` → returns `[PID: u64]` |
| `0x02` | Heartbeat (Ping) | Empty — probe remote node health |
| `0x03` | Heartbeat (Pong) | Empty — acknowledge health |
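As a sketch of the framing, a `0x00` User Message could be encoded as below. The byte order and the exact placement of the packet-type byte are assumptions for illustration, not the authoritative wire format:

```python
import struct

# Frame a 0x00 User Message per the table above:
# [PID: u64][LEN: u32][DATA: Bytes], preceded here by a 1-byte packet type.
# Big-endian is assumed; check the protocol source for the real layout.
def encode_user_message(pid: int, data: bytes) -> bytes:
    header = struct.pack(">BQI", 0x00, pid, len(data))  # type, PID, LEN
    return header + data

def decode_user_message(packet: bytes):
    ptype, pid, length = struct.unpack_from(">BQI", packet)
    data = packet[13:13 + length]  # header is 1 + 8 + 4 = 13 bytes
    return ptype, pid, data

pkt = encode_user_message(42, b"hello")
print(decode_user_message(pkt))  # (0, 42, b'hello')
```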
Iris bridges Rust safety with dynamic-language ergonomics through PyO3 (Python) and N-API (Node.js):
- Membrane hardening: uses `block_in_place` and `ThreadSafeFunction` queues for safe sync/async boundaries.
- GIL management: Python `recv()` releases the GIL so other Python threads can continue running.
- Atomic RwLocks: behavior pointers are swapped safely for thread-safe hot swapping.
- Rust 1.70+
- Python 3.8+ OR Node.js 14+
- Maturin (for Python) / NAPI-RS (for Node)
- Cranelift JIT backend: included by default; needed only if you use the experimental offload/JIT APIs.
```bash
# Clone the repository
git clone https://github.com/SSL-ACTX/iris.git
cd iris

# Build and install the Python extension
maturin develop --release
```
```bash
# Clone the repository
git clone https://github.com/SSL-ACTX/iris.git
cd iris

# Build the N-API binding
npm install
npm run build
```
Iris exposes one decorator API for native acceleration:
```python
@iris.offload(strategy="jit", return_type="float")
def kernel(...):
    ...
```

- `strategy="jit"`: compiles eligible code paths with Cranelift.
- `strategy="actor"`: executes on a dedicated Rust offload pool.
```python
import iris

@iris.offload(strategy="jit", return_type="float")
def vector_magnitude(x: float, y: float, z: float) -> float:
    return (x*x + y*y + z*z) ** 0.5

result = vector_magnitude(1.0, 2.0, 3.0)
print(result)
```

- Scalar expression kernels with arithmetic/logic/comparisons/ternaries and common math calls.
- Generator reductions (`sum`/`any`/`all`) over `range(...)` and runtime containers.
- While-style reduction helpers (`sum_while`, `any_while`, `all_while`) and loop-control intrinsics.
- Scalar recurrence loops recognized by the frontend (for/while patterns), including inlined helper calls.
- JIT execution profiles are specialized by observed input shapes and data layouts.
- Supported scalar inputs: Python `float`, `int`, `bool` (lowered to the native `f64` ABI).
- Supported vectorized buffers: `f64`, `f32`, signed/unsigned integers, and bool.
- On unsupported syntax, compile miss, panic, or profile mismatch, Iris falls back safely to Python.
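To make the eligible shapes concrete, here are plain-Python kernels matching the patterns listed above. Under Iris they would carry `@iris.offload(strategy="jit", ...)`; they are shown undecorated so they run anywhere, and on a compile miss the safe fallback executes exactly this Python:

```python
def magnitude(x: float, y: float, z: float) -> float:
    # scalar expression kernel: arithmetic plus a power
    return (x * x + y * y + z * z) ** 0.5

def sum_of_squares(n: int) -> int:
    # generator reduction over range(...)
    return sum(i * i for i in range(n))

def geometric_steps(limit: float) -> int:
    # scalar recurrence loop (while-style pattern)
    value, steps = 1.0, 0
    while value < limit:
        value *= 2.0
        steps += 1
    return steps

print(magnitude(1.0, 2.0, 3.0))  # 3.7416...
print(sum_of_squares(5))         # 30
print(geometric_steps(100.0))    # 7
```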
- Enable JIT logs: `IRIS_JIT_LOG=1` or `iris.jit.set_jit_logging(...)`.
- Query logging status: `iris.jit.get_jit_logging()`.
- Optional multi-variant quantum speculation:
  - env: `IRIS_JIT_QUANTUM=1`
  - API: `iris.jit.set_quantum_speculation(...)` / `iris.jit.get_quantum_speculation()`
Note
On aarch64 targets, Iris automatically adjusts JIT module flags to avoid unsupported relocation paths.
For lower-level internals and compiler details, see JIT.md.
Iris provides a unified API across both supported languages.
Use `spawn` for maximum throughput (100k+ actors). Rust owns the scheduling and only invokes the guest language when a message arrives.
```python
import iris

rt = iris.Runtime()

def fast_worker(msg):
    print(f"Processed: {msg}")

# Spawn 1000 workers instantly (Green Threads)
for _ in range(1000):
    rt.spawn(fast_worker, budget=50)
```

```javascript
const { NodeRuntime } = require('./index.js');
const rt = new NodeRuntime();

const fastWorker = (msg) => {
  // msg is a Buffer
  console.log(`Processed: ${msg.toString()}`);
};

for (let i = 0; i < 1000; i++) {
  rt.spawn(fastWorker, 50);
}
```

Use `spawn_with_mailbox` for complex logic where you want to block and wait for specific messages. No async/await required in Python.
```python
# Runs in a dedicated OS thread. Blocking is safe.
def saga_coordinator(mailbox):
    # Blocks thread, releases GIL
    msg = mailbox.recv()
    print("Starting Saga...")

    # Wait up to 5 seconds for next message
    confirm = mailbox.recv(timeout=5.0)
    if confirm:
        print("Confirmed")
    else:
        print("Timed out")

rt.spawn_with_mailbox(saga_coordinator, budget=100)
```

```javascript
const sagaCoordinator = async (mailbox) => {
  const msg = await mailbox.recv();
  console.log("Starting Saga...");

  // Wait up to 5 seconds
  const confirm = await mailbox.recv(5.0);
  if (confirm) {
    console.log("Confirmed");
  } else {
    console.log("Timed out");
  }
};

rt.spawnWithMailbox(sagaCoordinator, 100);
```

For high-throughput workloads you can pre-spawn a pool of long-lived child actors and round-robin work across them instead of repeatedly spawning new actors. This avoids allocation and teardown overhead, giving near-linear scaling when each message performs heavy native work (e.g. JIT-offloaded math).
```python
import iris
from itertools import cycle

rt = iris.Runtime()

# Spawn a reusable pool attached to a parent PID (useful for supervision)
parent = rt.spawn(lambda m: None)  # dummy parent
workers = iris.spawn_child_pool(rt, parent, size=4)

# `workers` is a list of PIDs. Use it in a simple round-robin loop:
worker_cycle = cycle(workers)

def dispatch(price, vol, strike):
    pid = next(worker_cycle)
    # A regular send is automatically optimized internally
    rt.send(pid, serialize_args(price, vol, strike))
    # Each worker's handler can call into JIT or Python directly
```

Actors can now be spawned with a parent PID. When the parent exits (normally or by crashing), all of its direct children are automatically stopped as well. This mirrors the behaviour of many functional runtimes and makes it easy to manage lifetimes for short-lived helper tasks.
```rust
let rt = Runtime::new();
let parent = rt.spawn_actor(|mut rx| async move { /* ... */ });
let child = rt.spawn_child(parent, |mut rx| async move { /* will die with parent */ });
rt.send(parent, Message::User(Bytes::from("quit"))).unwrap();
// after the parent exits the child mailbox is closed as well
assert!(!rt.is_alive(child));
```

```python
rt = iris.Runtime()
parent = rt.spawn(lambda msg: print("parent got", msg))
child = rt.spawn_child(parent, lambda msg: print("child got", msg))
rt.send(parent, b"quit")
import time; time.sleep(0.1)
assert not rt.is_alive(child)
```

There are three variants of the API:
- `spawn_child(parent, handler)` – mailbox-based actor.
- `spawn_child_with_budget(parent, handler, budget)` – same, but with a reduction budget.
- `spawn_child_handler_with_budget(parent, handler, budget)` – message-style handler (used by the Python/Node wrappers).
Python and Node bindings expose matching helpers (`spawn_child`, `spawn_child_with_mailbox`, etc.) which accept the same arguments as their non-child counterparts plus the parent PID.
Iris now supports virtual (lazy) actors: a PID is reserved up front, but the actor is only activated on first message delivery. This keeps idle footprint low while preserving actor-style addressing.
Current semantics:
- Activation occurs automatically on the first `send`/`send_user` to the reserved PID.
- An optional idle timeout can auto-stop an activated virtual actor after inactivity.
- `stop(pid)` also works for a never-activated virtual actor and cleanly deallocates its PID.
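These activation semantics can be modeled in plain Python (illustrative only, not Iris internals): the PID is reserved immediately, but the handler is only instantiated when the first message arrives.

```python
# Conceptual model of virtual (lazy) actors: reserve the PID up front,
# activate the handler on first delivery, and allow stop() either way.
class VirtualRegistry:
    def __init__(self):
        self._next_pid = 1
        self._factories = {}   # pid -> factory (reserved, not yet activated)
        self._active = {}      # pid -> handler

    def spawn_virtual(self, factory):
        pid = self._next_pid
        self._next_pid += 1
        self._factories[pid] = factory  # reserve PID, defer activation
        return pid

    def send(self, pid, msg):
        if pid in self._factories:                          # first delivery:
            self._active[pid] = self._factories.pop(pid)()  # activate now
        self._active[pid](msg)

    def stop(self, pid):
        # works even if the actor was never activated
        self._factories.pop(pid, None)
        self._active.pop(pid, None)

reg = VirtualRegistry()
log = []
pid = reg.spawn_virtual(lambda: log.append("activated") or (lambda m: log.append(m)))
print(len(reg._active))  # 0
reg.send(pid, "hello")
print(log)               # ['activated', 'hello']
```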
```rust
let rt = Runtime::new();
let pid = rt.spawn_virtual_handler_with_budget(
    |msg| async move {
        // process message once activated
    },
    100,
    Some(std::time::Duration::from_millis(500)),
);

// Actor is activated lazily on first send
rt.send(pid, Message::User(Bytes::from("hello"))).unwrap();
```

```python
import iris

rt = iris.Runtime()
pid = rt.spawn_virtual(
    lambda msg: print("virtual got", bytes(msg)),
    budget=100,
    idle_timeout_ms=500,
)

# activates on first send
rt.send(pid, b"hello")
```

Note
Network hardening: the underlying TCP protocol now imposes a 1 MiB payload ceiling, per-operation timeouts, and diligent logging. Malformed or oversized messages are dropped rather than crashing the node, and remote resolution/send operations will fail fast instead of hanging indefinitely.
```python
# 1. Register a local actor
pid = rt.spawn(my_handler)
rt.register("auth_worker", pid)

# 2. Look it up later (Local)
target = rt.whereis("auth_worker")

# 3. Look it up remotely (Network)
async def find_remote():
    addr = "192.168.1.5:9000"
    # Non-blocking resolution returns a local *proxy* actor
    proxy_pid = await rt.resolve_remote_py(addr, "auth_worker")
    if proxy_pid:
        # send just like a local PID — the runtime forwards the message
        rt.send(proxy_pid, b"login")
```

```javascript
// 1. Register
const pid = rt.spawn(myHandler);
rt.register("auth_worker", pid);

// 2. Resolve Remote
async function findAndQuery() {
  const addr = "192.168.1.5:9000";
  // resolution returns a local proxy PID
  const proxyPid = await rt.resolveRemote(addr, "auth_worker");
  if (proxyPid) {
    // treat it exactly like a local actor
    rt.send(proxyPid, Buffer.from("login"));
  }
}
```

Iris supports hierarchical path registrations (e.g. `/svc/payment/processor`) and allows you to create a supervisor scoped to a path prefix. This is useful for grouping related actors and applying supervision policies per-service or per-tenant.
Key Python APIs:
- `rt.create_path_supervisor(path)` — create a per-path supervisor instance.
- `rt.path_supervisor_watch(path, pid)` — register an actor PID with the path supervisor.
- `rt.path_supervisor_children(path)` — list PIDs currently supervised for the path.
- `rt.remove_path_supervisor(path)` — remove the path supervisor.
- `rt.spawn_with_path_observed(budget, path)` — spawn and register an observed actor under `path` (useful for testing/monitoring).
Python example:
```python
rt = iris.Runtime()

# spawn an observed actor and register it under a hierarchical path
pid = rt.spawn_with_path_observed(10, "/svc/test/one")

# create a supervisor for the '/svc/test' prefix and register the pid
rt.create_path_supervisor("/svc/test")
rt.path_supervisor_watch("/svc/test", pid)

# inspect supervised children
children = rt.path_supervisor_children("/svc/test")
print(children)  # [pid]

# remove supervisor when done
rt.remove_path_supervisor("/svc/test")
```

This mechanism makes it easy to apply restart strategies or monitoring rules to logical groups of actors without affecting the global supervisor.
```python
messages = rt.get_messages(observer_pid)
for msg in messages:
    if isinstance(msg, iris.PySystemMessage):
        if msg.type_name == "EXIT":
            print(f"Actor {msg.target_pid} has crashed!")
```

```javascript
// Node wrappers return wrapped objects { data: Buffer, system: Object }
const messages = rt.getMessages(observerPid);
messages.forEach(msg => {
  if (msg.system) {
    if (msg.system.typeName === "EXIT") {
      console.log(`Actor ${msg.system.targetPid} has crashed!`);
    }
  } else {
    console.log(`User Data: ${msg.data.toString()}`);
  }
});
```

```python
def behavior_a(msg): print("Logic A")
def behavior_b(msg): print("Logic B (Upgraded!)")

pid = rt.spawn(behavior_a, budget=10)
rt.send(pid, b"test")  # Prints "Logic A"

rt.hot_swap(pid, behavior_b)
rt.send(pid, b"test")  # Prints "Logic B (Upgraded!)"

# Versioning + rollback
print(rt.behavior_version(pid))  # 2
rt.rollback_behavior(pid, 1)     # back to the prior hot-swapped behavior
```

Iris exposes lightweight mailbox introspection and actor-local timers so guest languages can inspect queue sizes and schedule timed messages.
- Rust: `Runtime::mailbox_size(pid: u64) -> Option<usize>` returns the number of user messages queued for `pid` (excludes system messages).
- Python: `rt.mailbox_size(pid)` mirrors the Rust API and returns `None` if the PID is unknown.

Python example:

```python
size = rt.mailbox_size(pid)
print(f"Mailbox size for {pid}: {size}")
```

You can schedule one-shot or repeating messages to an actor's mailbox. Timers are cancellable via an id returned at creation.
- Rust APIs: `send_after(pid, delay_ms, payload)`, `send_interval(pid, interval_ms, payload)`, `cancel_timer(timer_id)`
- Python: `rt.send_after(pid, ms, b'data')`, `rt.send_interval(pid, ms, b'data')`, `rt.cancel_timer(timer_id)`
Python example (one-shot):
```python
timer_id = rt.send_after(pid, 200, b'tick')  # send 'tick' after 200ms
# cancel if needed
rt.cancel_timer(timer_id)
```

Python example (repeating):

```python
timer_id = rt.send_interval(pid, 1000, b'heartbeat')  # every 1s
# stop later
rt.cancel_timer(timer_id)
```

When an actor exits, the runtime sends a structured EXIT system message that includes the reason and optional metadata. This allows supervisors and link/watch logic to make informed decisions.
Common ExitReason variants:
- `Normal` — actor finished cleanly.
- `Killed` — requested shutdown.
- `Panic` — runtime detected a panic; `ExitInfo` may include panic metadata.
- `Crash` — user code returned an unrecoverable error.
Python example receiving exit info:
```python
for msg in rt.get_messages(supervisor_pid):
    if isinstance(msg, iris.PySystemMessage) and msg.type_name == 'EXIT':
        print('from:', msg.from_pid)
        print('target:', msg.target_pid)
        print('reason:', msg.reason)      # e.g. 'Normal' | 'Panic' | 'Killed'
        print('metadata:', msg.metadata)  # optional dict/bytes with extra info
```

Node.js receives system objects with the same fields: `fromPid`, `targetPid`, `reason`, and optional `metadata`.
In addition to the environment variables documented above, the Python runtime exposes programmatic setters:
- `rt.set_release_gil_limits(max_threads: int, gil_pool_size: int)` — set the per-process cap and fallback pool size at runtime.
- `rt.set_release_gil_strict(strict: bool)` — when `true`, a `spawn(..., release_gil=True)` will return an error if the dedicated-thread cap is reached instead of falling back to the shared pool.
By default, all Iris mailboxes are unbounded. For some high-throughput workloads you may want a fixed capacity with a drop-new policy, which is useful for rate-limiting fast producers. Use `Runtime.spawn_py_handler_bounded` in Python or `runtime.spawn_bounded` in Node and specify the mailbox capacity (in messages).
Overflow policy can be changed per PID at runtime with:

- Python: `rt.set_overflow_policy(pid, policy, target_pid_or_none)`
- Rust: `runtime.set_overflow_policy(pid, OverflowPolicy::...)`
Supported policies:
- `dropnew` — reject the incoming message (`send(...)` returns `False`/`Err`).
- `dropold` — evict the oldest queued user message, then accept the new one.
- `redirect` — send overflowing messages to a fallback PID.
- `spill` — send a copy to the fallback PID and still enqueue the original on the primary.
- `block` — block the sender until mailbox capacity is available.
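As an illustration of the first two policies, here is a toy bounded mailbox in plain Python (a sketch of the semantics described above, not the Iris implementation):

```python
from collections import deque

# Bounded mailbox demonstrating dropnew vs dropold overflow behavior.
class BoundedMailbox:
    def __init__(self, capacity, policy="dropnew"):
        self.queue = deque()
        self.capacity = capacity
        self.policy = policy

    def send(self, msg):
        if len(self.queue) < self.capacity:
            self.queue.append(msg)
            return True
        if self.policy == "dropnew":
            return False             # reject the incoming message
        if self.policy == "dropold":
            self.queue.popleft()     # evict oldest, then accept the new one
            self.queue.append(msg)
            return True
        raise ValueError(self.policy)

mb = BoundedMailbox(2, "dropnew")
print(mb.send(b"one"), mb.send(b"two"), mb.send(b"three"))  # True True False

mb = BoundedMailbox(2, "dropold")
mb.send(b"one"); mb.send(b"two"); mb.send(b"three")
print(list(mb.queue))  # [b'two', b'three']
```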
Python example:
```python
from iris import Runtime

rt = Runtime()

# handler simply prints messages
pid = rt.spawn_py_handler_bounded(lambda m: print('got', m), budget=100, capacity=2)

# first two sends succeed
assert rt.send(pid, b'one')
assert rt.send(pid, b'two')

# third message is dropped and send returns False
assert not rt.send(pid, b'three')
```

Python policy examples:
```python
from iris import Runtime

rt = Runtime()
primary = rt.spawn_py_handler_bounded(lambda m: print("primary", bytes(m)), budget=100, capacity=1)
fallback = rt.spawn(lambda m: print("fallback", bytes(m)), budget=100)

# Redirect overflow to fallback actor
rt.set_overflow_policy(primary, "redirect", fallback)

# Spill copies overflow to fallback and still enqueues on primary
rt.set_overflow_policy(primary, "spill", fallback)

# Block sender until room is available
rt.set_overflow_policy(primary, "block", None)
```

You can control whether push-based Python actors run their callbacks on a blocking thread that acquires the GIL (avoiding holding the GIL on the async worker) using the `release_gil` flag on `Runtime.spawn`:
```python
from iris import Runtime
import threading
import time

rt = Runtime()

def handler_no(msg):
    print('no release', threading.get_ident())

def handler_yes(msg):
    print('release', threading.get_ident())

pid_no = rt.spawn(handler_no, budget=10, release_gil=False)
pid_yes = rt.spawn(handler_yes, budget=10, release_gil=True)

rt.send(pid_no, b'ping')
rt.send(pid_yes, b'ping')
time.sleep(0.2)
```

When `release_gil=True`, the handler runs inside a `spawn_blocking` worker that acquires the GIL; `release_gil=False` keeps the previous behavior (the callback runs directly while holding the GIL on the async worker).
Iris supports hierarchical path registrations (for example `/system/service/one`) so you can group, query, and supervise actors by logical paths.
Python APIs (examples): `rt.register_path(path, pid)`, `rt.unregister_path(path)`, `rt.whereis_path(path)`, `rt.list_children(prefix)`, `rt.list_children_direct(prefix)`, `rt.watch_path(prefix)`, `rt.spawn_with_path_observed(budget, path)`, `rt.child_pids()`, `rt.children_count()`.
Python example:
```python
from iris import Runtime

rt = Runtime()

# Spawn and register
pid = rt.spawn(lambda m: None, 10)
rt.register_path("/system/service/one", pid)

print(rt.whereis_path("/system/service/one"))
print(rt.list_children("/system/service"))
print(rt.list_children_direct("/system"))

# Shallow watch: registers current direct children with the supervisor
rt.watch_path("/system/service")
print(rt.child_pids(), rt.children_count())
```

Node.js example (conceptual):
```javascript
const { NodeRuntime } = require('./index.js');
const rt = new NodeRuntime();

const pid = rt.spawn(myHandler, 10);
rt.registerPath('/system/service/one', pid);
const children = rt.listChildren('/system/service');
```

Notes:

- `list_children` returns all descendant registrations under a prefix.
- `list_children_direct` returns only immediate children one level below the prefix.
- `watch_path` performs a shallow registration of direct children with the supervisor — path-scoped supervisors are planned as a next step.
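The `list_children` vs `list_children_direct` distinction can be modeled with a simple prefix-matching registry (an illustrative sketch, not the Iris implementation):

```python
# Plain-Python model of hierarchical path lookups.
class PathRegistry:
    def __init__(self):
        self.paths = {}  # path -> pid

    def register_path(self, path, pid):
        self.paths[path] = pid

    def list_children(self, prefix):
        # all descendant registrations under the prefix
        return sorted(p for p in self.paths if p.startswith(prefix + "/"))

    def list_children_direct(self, prefix):
        # only immediate children, one level below the prefix
        return sorted(
            p for p in self.paths
            if p.startswith(prefix + "/") and "/" not in p[len(prefix) + 1:]
        )

reg = PathRegistry()
reg.register_path("/system/service", 1)
reg.register_path("/system/service/one", 2)
reg.register_path("/system/service/one/deep", 3)

print(reg.list_children("/system"))
# ['/system/service', '/system/service/one', '/system/service/one/deep']
print(reg.list_children_direct("/system"))
# ['/system/service']
```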
Linux / macOS / Android (Termux): fully supported, with high performance via the multi-threaded Tokio runtime. Note: Android builds require the NDK or a local clang configuration.

Windows: supported. Ensure you have the latest Microsoft C++ Build Tools installed for PyO3/N-API compilation.

Important
- Production Status: Iris is currently in Alpha.
- Experimental Features: The JIT/offload APIs are extremely new and may change or break between releases. Use with caution.
- Performance Metrics (v0.3.0):
  - Push Actors: validated to scale to 100k+ concurrent actors with message throughput exceeding ~1.2M msgs/sec on modern cloud vCPUs and ~409k msgs/sec on single-core legacy hardware.
  - Pull Actors: high-performance threaded actors supporting 100k+ concurrent instances with throughput reaching ~1.5M msgs/sec, demonstrating massive scaling beyond traditional thread-pool limitations.
  - Hot-Swapping: logic upgrades validated at ~136k swaps/sec while maintaining active message processing.
- Notice: The binary protocol is subject to change.
- Stability: Always use the `Supervisor` for critical actor lifecycles to ensure automatic recovery and location transparency.
Author: Seuriin (SSL-ACTX)
v0.4.0