Version: 1.0 Status: Active
The Pons Kernel is a microkernel — the single orchestration point of the system. Its responsibility is strictly limited to five things:
- Message Bus — pub/sub between modules (in-memory, fire-and-forget)
- Lifecycle Manager — spawn, kill, restart, hot-swap of module processes
- RPC Routing — request/response routing between modules with timeout enforcement
- Service Directory — dynamic resolution of who provides what, circular dependency detection
- Configuration — layered config with schema validation, hot-reload, scoped per module
Everything else lives in modules. The kernel contains no business logic.
This is the complete step-by-step sequence from pons kernel start to "system ready".
Phase 1 — Discovery
1. Read kernel manifest (version, metadata)
2. Scan <home>/.pons/modules/ for module directories
3. For each directory:
a. Read module.json
b. Validate: id, name, permissions block present
c. Verify manifest hash against stored hash (tamper detection)
d. Resolve entry point path (must exist within module dir)
e. Skip invalid modules with logged reason
4. Result: list of DiscoveredModules
Phase 2 — Configuration
5. For each discovered module with a configSchema:
a. Import/load the schema file
b. Security check: schema path must be within module directory
6. Merge all module schemas + kernel schema → unified AppSchema
7. Load <home>/.pons/config.yaml
8. Validate config against AppSchema
9. Fill missing values with schema defaults
10. If config is invalid: log warnings, continue with best-effort values
Phase 3 — Module Spawn
11. Sort modules by priority (lower = first)
12. For each module:
a. Read approved permissions from PermissionStore
b. Read approved capabilities from PermissionStore
c. Construct spawn command based on runtime (see Section 21)
d. Spawn child process
e. Send { type: "init", protocolVersion, config, workspacePath, projectRoot }
f. Wait for { type: "ready" } (max 30s)
g. On ready: validate manifest, register services, check circular deps
h. If required services available: send { type: "deps_ready" }, start health checks
i. If required services missing: hold in "waiting" state
13. Cascade: as services become available, activate waiting modules
Phase 4 — Running
14. Register signal handlers (SIGINT, SIGTERM, SIGUSR1, SIGUSR2, SIGHUP)
15. Write PID to <home>/.pons/.runtime/kernel.pid
16. Emit kernel.boot log event
17. System is ready — message bus active, RPC routing active
```
┌─────────────────────────────────────────────────────────────┐
│                           Kernel                            │
│                                                             │
│  ┌────────────┐  ┌──────────────┐  ┌────────────────────┐   │
│  │ MessageBus │  │  Lifecycle   │  │  ServiceDirectory  │   │
│  │  (pub/sub) │  │   Manager    │  │     (registry)     │   │
│  └────────────┘  └──────────────┘  └────────────────────┘   │
│  ┌────────────┐  ┌──────────────┐  ┌────────────────────┐   │
│  │   Config   │  │  Permission  │  │      Security      │   │
│  │   Manager  │  │    Store     │  │      Enforcer      │   │
│  └────────────┘  └──────────────┘  └────────────────────┘   │
└──────────────────────────┬──────────────────────────────────┘
                           │ IPC (newline-delimited JSON over stdin/stdout)
        ┌─────────┬────────┴┬──────────┬───────────┐
        │         │         │          │           │
    module-    module-   module-    module-    module-...
     agent       llm     gateway    memory     sandbox
   (process)  (process) (process)  (process)  (process)
```
Core principle: modules never import each other. All communication flows through the kernel over IPC.
Newline-delimited JSON over stdin/stdout of each child process. One JSON object per line, terminated by \n. No binary framing, no external message broker required.
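The framing layer can be sketched as a small stateful decoder that buffers partial chunks until a `\n` arrives (TypeScript; names like `FrameDecoder` are illustrative, not part of the spec):

```typescript
// Sketch of the NDJSON framing layer. Names (FrameDecoder, push) are
// illustrative, not part of the spec.
type IpcMessage = { type: string; [key: string]: unknown };

class FrameDecoder {
  private buffer = "";

  // Feed a raw chunk from the child's stdout; returns every complete
  // message contained in it. Partial lines stay buffered until \n arrives.
  push(chunk: string): IpcMessage[] {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    this.buffer = lines.pop() ?? ""; // last element is the unterminated tail
    const messages: IpcMessage[] = [];
    for (const line of lines) {
      if (line.trim() === "") continue;
      try {
        messages.push(JSON.parse(line));
      } catch {
        // Malformed JSON: per the failure-handling rules, log the line and
        // treat it as plain text; the module is not killed.
      }
    }
    return messages;
  }
}
```

Because partial reads are buffered, a message split across multiple stdout chunks is reassembled transparently, which is exactly the guarantee the failure-handling section relies on.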
| type | Fields | Description |
|---|---|---|
| `init` | `protocolVersion`, `config`, `workspacePath`, `projectRoot` | First message after spawn. Delivers protocol version and initial config. |
| `install` | — | Signals first-ever launch. Module should declare any permission requests. |
| `deps_ready` | — | All required services are available. Module may begin full operation. |
| `shutdown` | — | Graceful shutdown request. Module should clean up and exit. |
| `ping` | — | Health check. Module must reply with `pong` within the timeout. |
| `deliver` | `id`, `topic`, `payload` | Pub/sub message delivery. |
| `config:update` | `config`, `changedSections` | Config hot-reload. Contains only the module's own section. |
| `call` | `id`, `method`, `params` | Kernel calling a method on the module. |
| `rpc_request` | `id`, `from`, `service`, `method`, `params` | Proxied RPC request from another module. |
| `rpc_response` | `id`, `result?`, `error?` | Response to a previously sent RPC request. |
| `service_available` | `service` | An optional dependency just became available. |
| type | Fields | Description |
|---|---|---|
| `ready` | `manifest`, `capabilities?` | Module has initialized and is ready. Sends its parsed manifest and optionally its capabilities for kernel validation. |
| `log` | `level`, `msg`, `data?`, `topic?` | Structured log entry to be aggregated by the kernel. Optional topic enables log grouping (e.g. `agent:loop`). |
| `log-group` | `level`, `msg`, `items` | Grouped log entries (e.g. a summary with sub-items). |
| `publish` | `topic`, `payload` | Publish a message to the bus. |
| `call` | `id`, `method`, `params` | Call a method on the kernel (`config.get`, `module.list`, etc.). |
| `call:response` | `id`, `result?`, `error?` | Response to a kernel call. |
| `pong` | — | Reply to a `ping` health check. |
| `rpc_request` | `id`, `service`, `method`, `params` | Initiate an RPC call to another module. |
| `rpc_response` | `id`, `result?`, `error?` | Response to a proxied RPC request. |
| `ack` | `id` | Acknowledge successful processing of a `deliver` message. `id` matches the delivered message. |
| `nack` | `id`, `error` | Reject a `deliver` message. `error` is a human-readable string describing the failure. Informational only — kernel does not retry. |
```
Caller Module               Kernel                  Target Module
      │                        │                         │
      ├──rpc_request(id)──────>│                         │
      │   service, method,     │                         │
      │   params               ├─ capability check       │
      │                        ├─rpc_request(id)────────>│
      │                        │  from, method, params   │
      │                        │                         │ (processing)
      │                        │<──rpc_response(id)──────│
      │<─rpc_response(id)──────│   result / error        │
      │   result / error       │                         │
```
Timeout: 30 seconds. On expiry, an error response is sent back to the caller.
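The correlation table behind this flow can be sketched as follows (TypeScript; names are illustrative, and expiry is modeled as an explicit sweep for clarity rather than timers):

```typescript
// Sketch of the kernel's RPC correlation table. Responses are matched to
// requests by id; expired entries produce a timeout error for the caller.
type RpcReply = { result?: unknown; error?: string };

class PendingRpcTable {
  private pending = new Map<string, { deadline: number; resolve: (r: RpcReply) => void }>();

  register(id: string, now: number, timeoutMs: number, resolve: (r: RpcReply) => void): void {
    this.pending.set(id, { deadline: now + timeoutMs, resolve });
  }

  // Called when the target module sends rpc_response.
  // Late responses (unknown id) are silently dropped, per the spec.
  complete(id: string, reply: RpcReply): void {
    const entry = this.pending.get(id);
    if (!entry) return;
    this.pending.delete(id);
    entry.resolve(reply);
  }

  // Periodic sweep: every entry past its deadline gets an error response.
  expire(now: number): void {
    for (const [id, entry] of this.pending) {
      if (now >= entry.deadline) {
        this.pending.delete(id);
        entry.resolve({ error: "timeout" });
      }
    }
  }
}
```

Deleting the entry on timeout is what makes late responses from the target drop silently: by the time they arrive, no pending id matches.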
```
Publisher               Kernel                     Subscribers
    │                      │                      │    │    │
    ├─publish(topic)──────>│                      │    │    │
    │  payload             ├─deliver(id, topic)──>│    │    │
    │                      │  payload             │    │    │
    │                      ├─deliver(id, topic)───────>│    │
    │                      ├─deliver(id, topic)────────────>│
```
Fire-and-forget. No persistence, no retry. The ack/nack messages defined in the IPC protocol (Section 4) exist for module-to-module application-level tracking — the kernel forwards them but does not use them for delivery guarantees or retry logic. Stronger guarantees (at-least-once, persistence, replay) are the responsibility of modules.
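The flow above can be sketched as a minimal in-memory bus (TypeScript; class and method names are illustrative):

```typescript
// Sketch of the in-memory MessageBus: static subscriptions (taken from
// manifest capabilities at "ready"), at-most-once delivery, no retry.
type Deliver = { id: number; topic: string; payload: unknown };

class MessageBus {
  private seq = 0;
  private subscribers = new Map<string, Set<(msg: Deliver) => void>>();

  // Registered once, at module "ready", from capabilities.topics.
  subscribe(topic: string, deliver: (msg: Deliver) => void): void {
    if (!this.subscribers.has(topic)) this.subscribers.set(topic, new Set());
    this.subscribers.get(topic)!.add(deliver);
  }

  // Fire-and-forget: each declared subscriber gets the message once;
  // a topic with no subscribers is silently dropped.
  publish(topic: string, payload: unknown): void {
    const msg: Deliver = { id: ++this.seq, topic, payload };
    for (const deliver of this.subscribers.get(topic) ?? []) deliver(msg);
  }
}
```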
The kernel publishes lifecycle events to the message bus whenever a module changes state. This allows modules to react to system topology changes without polling.
| Topic | Payload | Published when |
|---|---|---|
| `system:module:ready` | `{ moduleId, provides, version }` | Module activated and health checks started |
| `system:module:stopped` | `{ moduleId, reason }` | Module intentionally stopped (shutdown, kill) |
| `system:module:crashed` | `{ moduleId, exitCode, restartCount }` | Module exited unexpectedly, entering restart flow |
| `system:module:stopping` | `{ moduleId, tier }` | Module is about to receive shutdown signal (during ordered drain) |
Rules:
- These are published by the kernel itself, not by modules
- They follow the same delivery semantics as all pub/sub — at-most-once, fire-and-forget
- Modules must declare `system:module:*` topics in their `capabilities.topics` to receive them
- The kernel publishes these after the state transition is complete (e.g. `system:module:ready` is sent after the module's services are registered, not before)
- During shutdown, `system:module:stopping` is sent before the `shutdown` message to the target module, giving other modules a chance to stop sending work to it
Use cases:
- Gateway can show real-time system status to connected clients
- Monitoring modules can track uptime and crash frequency
- Modules with optional dependencies can react to services appearing/disappearing beyond the built-in `service_available` mechanism
Every module has exactly one status at any given time. Transitions are triggered by events from the module process, the kernel, or external signals.
```
                     spawn()
                        │
                        ▼
                  ┌──────────┐
                  │ starting │
                  └────┬─────┘
                       │
          ┌────────────┼────────────┐
          │            │            │
      ready msg    timeout 30s  crash/exit
          │            │            │
          ▼            ▼            ▼
    ┌──────────┐  ┌────────┐  ┌─────────────┐
    │ waiting  │  │ killed │  │ restarting  │◄──┐
    │  (deps)  │  └────────┘  └──────┬──────┘   │
    └────┬─────┘                     │          │
         │                    backoff delay     │
     deps ready                      │          │
         │                        spawn()       │
         ▼                           │          │
    ┌──────────┐                     └──────────┘
    │  ready   │              (max 5 attempts)
    └────┬─────┘                     │
         │                           ▼
    ┌────┼────────┐           ┌──────────────┐
    │    │        │           │   crashed    │
  crash kill() shutdown       │  (terminal)  │
    │    │        │           └──────────────┘
    ▼    ▼        ▼
restarting stopped stopped
```
States:
| State | Description |
|---|---|
| `starting` | Process spawned, waiting for `ready` message (max 30s) |
| `waiting` | Module sent `ready` but required services are not yet available (max 30s) |
| `ready` | Module is fully operational. Health checks are active. |
| `restarting` | Module exited unexpectedly. Kernel is waiting for backoff delay before re-spawning. |
| `stopped` | Module was intentionally stopped (shutdown or kill). |
| `crashed` | Module exceeded max restart attempts. Terminal state — requires manual intervention. |
| `killed` | Module was killed due to a violation or timeout. May lead to restart or stopped. |
Transition rules:
- Only `ready` modules receive `deliver`, `rpc_request`, and `config:update` messages
- A module in `waiting` only receives `service_available` notifications
- A module in `crashed` state can only be restarted manually via `pons module restart <id>`
- `killed` is a transient state — it transitions to `restarting` (if attempts remain) or `crashed`
The full activation sequence is described in Section 2 (Boot Sequence, Phase 3).
- Every 30 seconds: kernel sends `ping`
- Module must reply with `pong` within 10 seconds
- After 3 consecutive failures: kill process → restart logic
- A single successful `pong` resets the failure counter
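These rules imply a simple per-module failure counter, sketched here (TypeScript; names are illustrative):

```typescript
// Sketch of the per-module health-check counter: 3 consecutive missed
// pongs trigger a kill; any successful pong resets the counter.
class HealthTracker {
  private failures = 0;
  constructor(private readonly maxFailures = 3) {}

  pongReceived(): void {
    this.failures = 0;
  }

  // Called when no pong arrived within the 10s response window.
  // Returns true when the module should be killed.
  pongMissed(): boolean {
    this.failures += 1;
    return this.failures >= this.maxFailures;
  }
}
```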
- Exponential backoff: 1s → 2s → 4s → 8s → 16s → 32s → 60s (max)
- Maximum 5 attempts
- If module lived less than 1 second → likely an entry point error (logged as warning)
- After 5 failed attempts → status: `crashed`, no further restarts
- Restart counter resets if module stays alive for more than 60 seconds
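The delay curve can be expressed as a one-line formula (TypeScript sketch of the doubling schedule capped at 60s; `backoffSeconds` is an illustrative name):

```typescript
// Sketch of the restart backoff curve: delay doubles with each
// consecutive failure (1s, 2s, 4s, ...) and is capped at 60s.
function backoffSeconds(consecutiveFailures: number): number {
  return Math.min(2 ** (consecutiveFailures - 1), 60);
}
```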
Hot-swap replaces a running module with a new version without restarting the kernel or other modules.
1. CLI or operator triggers hot-swap for module X
2. Kernel sends { type: "shutdown" } to module X
3. Wait up to 5s for graceful exit
4. Kill if still running
5. Unregister module X's services from the directory
6. Load new manifest from disk (re-read module.json)
7. Validate new manifest (hash, permissions, entry point)
8. Spawn new process with updated permissions
9. Wait for ready → re-register services
10. Notify modules that had optional dependency on X's services
Constraints:
- Hot-swap does not change the module ID — it's the same logical module with new code
- If the new version fails to start, the old version is NOT restored (module enters restart/crash flow)
- In-flight RPC calls to the old module will timeout and return errors to callers
The kernel shuts down modules in reverse dependency order to ensure clean drain of in-flight work. Modules that accept external traffic (e.g. gateway) stop first, allowing downstream modules (e.g. agent, LLM) to finish processing before they are stopped.
Phase 1 — Compute shutdown order
1. Build dependency graph from all running modules' `requires` and `provides`
2. Topological sort: modules with no dependents (leaves) shut down first
3. Group into tiers — modules in the same tier can shut down in parallel
Phase 2 — Tier-by-tier shutdown
For each tier (leaves first → roots last):
1. Kernel publishes { topic: "system:module:stopping", payload: { moduleId, tier } }
to the bus (allows other modules to stop sending work to this module)
2. Kernel → modules in tier: { type: "shutdown" }
3. Wait up to 5 seconds for voluntary exit
4. Kill remaining processes in this tier forcefully
5. Proceed to next tier
Phase 3 — Cleanup
1. Close message bus
2. Remove kernel PID file
3. Close log file handle
4. Exit kernel process
Example shutdown order:
Tier 1 (leaves — no one depends on them): module-gateway
Tier 2 (depended on by gateway only): module-agent, module-sandbox
Tier 3 (depended on by agent): module-llm, module-memory
Gateway stops accepting connections first. Agent finishes in-flight turns. LLM and memory stop last after all consumers are gone.
Fallback: if the dependency graph is empty or cannot be computed (e.g. no requires declared), the kernel falls back to sending shutdown to all modules simultaneously (v1 behavior).
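Phase 1 above can be sketched as iterative leaf-peeling over the dependency graph (TypeScript; the function and input shapes are illustrative, assuming `requires` entries have already been resolved to provider module ids):

```typescript
// Sketch of shutdown-order computation: peel "leaf" modules (those no
// remaining module depends on) tier by tier. deps maps a module id to
// the module ids it depends on.
function shutdownTiers(deps: Map<string, string[]>): string[][] {
  const remaining = new Set(deps.keys());
  const tiers: string[][] = [];
  while (remaining.size > 0) {
    // A module is a leaf if no other remaining module depends on it.
    const leaves = [...remaining].filter(
      (m) => ![...remaining].some((other) => other !== m && (deps.get(other) ?? []).includes(m)),
    );
    if (leaves.length === 0) break; // cycle: caller falls back to simultaneous shutdown
    tiers.push(leaves.sort());
    for (const m of leaves) remaining.delete(m);
  }
  return tiers;
}
```

Run against the example above (gateway depends on agent and sandbox, agent depends on llm and memory), this yields exactly the three tiers listed.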
Each module declares in its manifest:
```json
{
  "provides": ["service-name"],
  "requires": ["other-service"],
  "optionalRequires": ["nice-to-have-service"]
}
```

Rules:
- Each service name may only be provided by one module (duplicates are rejected)
- `requires` — module will not activate until all required services are available (timeout: 30s)
- `optionalRequires` — graceful degradation; module activates even without them, and receives a `service_available` notification if they come online later
- Circular dependency detection runs on every activation (DFS graph traversal); a detected cycle kills the module immediately
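Cycle detection over the dependency graph can be sketched as a depth-first traversal that reports the offending path (TypeScript; names are illustrative):

```typescript
// Sketch of circular-dependency detection via DFS. graph maps a module
// id to the module ids it requires. Returns the cycle path (first node
// repeated at the end) or null if the graph is acyclic.
function findCycle(graph: Map<string, string[]>): string[] | null {
  const state = new Map<string, "visiting" | "done">();
  const stack: string[] = [];

  function visit(node: string): string[] | null {
    if (state.get(node) === "done") return null;
    if (state.get(node) === "visiting") {
      // Back edge found: the cycle is the stack slice from node onward.
      return [...stack.slice(stack.indexOf(node)), node];
    }
    state.set(node, "visiting");
    stack.push(node);
    for (const dep of graph.get(node) ?? []) {
      const cycle = visit(dep);
      if (cycle) return cycle;
    }
    stack.pop();
    state.set(node, "done");
    return null;
  }

  for (const node of graph.keys()) {
    const cycle = visit(node);
    if (cycle) return cycle;
  }
  return null;
}
```

Returning the path (not just a boolean) is what lets the kernel log the cycle when it kills the participating modules.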
Topics are the pub/sub channel names used for fire-and-forget messaging. Subscription is declared statically in the module manifest via capabilities.topics:
```json
{
  "capabilities": {
    "topics": ["inbound:message", "agent:turn:end"]
  }
}
```

When a module sends `{ type: "ready" }`, the kernel reads the `capabilities.topics` list and registers the module as a subscriber for those topics in the MessageBus.
Rules:
- A module can only publish to topics listed in its `capabilities.topics` — enforced by the SecurityEnforcer
- A module can only receive `deliver` messages for topics it declared
- There is no dynamic subscribe/unsubscribe at runtime in v1. Changing topic subscriptions requires a manifest update and module restart (or hot-swap).
- Topic names are freeform strings. Convention: `domain:event` (e.g. `agent:turn:start`, `outbound:ws`)
A module calls the kernel by sending a `call` message. The kernel responds with `call:response`.
| method | Params | Returns | Description |
|---|---|---|---|
| `config.get` | `{ key }` | value | Read a value from the module's own config section. Key is a dot-separated path (e.g. `"agents.defaultModel"`). |
| `config.set` | `{ key, value }` | `{ ok: true }` | Write a value to the module's own config section. Validates against schema. Persists to config.yaml atomically. Notifies affected modules via `config:update`. |
| `config.sections` | — | list of strings | List available config section names (filtered to the module's own section only). |
| `module.list` | — | list of `{ id, status, provides }` | Get all modules with their current status and provided services. |
| `module.commands` | — | list of CommandDeclarations | Get all CLI command declarations from all module manifests. |
| `service.discover` | — | list of `{ service, moduleId }` | List all registered services and their provider modules. |
| `service.resolve` | `{ service }` | `moduleId` | Resolve which module provides a specific service. Returns error if service not found. |
| `permissions.request` | `{ permissions, reason? }` | `{ granted, pending?, denied?, requestId? }` | Request additional runtime permissions from the user. If approval is required, returns `pending: true` with a `requestId` for tracking. |
| `permissions.check` | `{ permissions }` | `{ granted, missing }` | Check which of the requested permissions are currently granted and which are missing. |
Config scoping: a module can only read/write its own config section (identified by `configKey` in the manifest). Path traversal patterns (`..`, `__proto__`, `constructor`, `prototype`) are rejected and logged as security events.
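The key guard can be sketched as a per-segment check (TypeScript; `isSafeConfigKey` is an illustrative name, not a spec API):

```typescript
// Sketch of config-key validation: a key is a dot-separated path whose
// segments must be non-empty and must not be traversal/prototype-pollution
// patterns. The forbidden list mirrors the ones named in the spec.
const FORBIDDEN_SEGMENTS = new Set(["..", "__proto__", "constructor", "prototype"]);

function isSafeConfigKey(key: string): boolean {
  const segments = key.split(".");
  return (
    segments.length > 0 &&
    segments.every((s) => s !== "" && !FORBIDDEN_SEGMENTS.has(s))
  );
}
```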
```yaml
logging:
  level: info              # trace | debug | info | warn | error | fatal
  levels:
    module-agent: debug    # per-module level override

# Module-specific sections below (each module owns its key)
models:
  providers:
    - id: openai
      apiKey: ${OPENAI_API_KEY}
```

- Each module declares a config schema (using any schema validation library available in the implementation language)
- On boot, the kernel discovers and imports all module schemas, merges them into an `AppSchema`
- Loads `config.yaml`, validates against `AppSchema`, fills in defaults
- On hot-reload signal: compares against previous config, sends `config:update` only to affected modules
- Each module receives only its own section — other sections are never forwarded
Before loading a schema file, the kernel performs security checks to prevent malicious schema files from executing arbitrary code:
- Path containment — the schema file path must resolve to a location within the module directory (after symlink resolution). Paths escaping the module dir are rejected.
- Extension whitelist — only `.ts`, `.js`, or `.json` extensions are accepted
- Static pattern scan — the kernel scans the schema file source for forbidden patterns before importing it:
  - Process spawning (e.g. `exec`, `spawn`, `Command`)
  - Network access (e.g. `listen`, `connect`, `fetch`, `WebSocket`)
  - Dynamic code execution (e.g. `eval`, dynamic `import()`)
  - File system mutations (e.g. `remove`, `unlink`)
- If any pattern is found, the schema is skipped with a warning — the module still loads but without schema validation
- The exact pattern list is implementation-specific; each runtime adds its own dangerous APIs to the scan
This is defense-in-depth: the runtime sandbox already restricts module permissions, but schema files may be imported into the kernel process itself, so additional guards are necessary.
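The pattern scan can be sketched as a list of regexes applied to the schema source before import (TypeScript; the regexes shown are illustrative, since the spec leaves the exact list to each runtime):

```typescript
// Sketch of the pre-import pattern scan for schema files. The pattern
// list is illustrative; each runtime adds its own dangerous APIs.
const FORBIDDEN_PATTERNS: [string, RegExp][] = [
  ["process spawning", /\b(exec|spawn|Command)\s*\(/],
  ["network access", /\b(listen|connect|fetch|WebSocket)\b/],
  ["dynamic code execution", /\beval\s*\(|\bimport\s*\(/],
  ["file system mutation", /\b(remove|unlink)\s*\(/],
];

// Returns the reasons a schema source is rejected; an empty array means
// the file may be imported into the kernel process.
function scanSchemaSource(source: string): string[] {
  return FORBIDDEN_PATTERNS
    .filter(([, re]) => re.test(source))
    .map(([reason]) => reason);
}
```

A static scan like this is easy to fool (obfuscated identifiers, computed property access), which is exactly why the spec treats it as one layer among several rather than a sandbox in itself.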
```json
{
  "configKey": "models",
  "configSchema": "./src/config.schema"
}
```

Layers:
- Process Sandbox — each module runs as a separate process with the minimum required OS permissions; the runtime should enforce permission boundaries (e.g. sandboxed runtimes, seccomp, containers)
- Manifest Hash — a cryptographic hash (e.g. SHA-256) of the module manifest is stored at install time and verified on every load (tamper detection)
- Permission Store — permissions are stored in `<home>/.pons/permissions.yaml` and must be explicitly approved by the user at install time
- Runtime Enforcer — every RPC call, pub/sub publish/subscribe, and config access is checked against declared capabilities before being forwarded
- Static Audit — source scan at install time for patterns that could bypass sandbox restrictions (see below)
```json
{
  "permissions": {
    "net": ["api.openai.com"],
    "read": ["~/.pons/", "./workspace/"],
    "write": ["~/.pons/data/"],
    "env": ["OPENAI_API_KEY", "HOME"],
    "run": ["git"],
    "sys": ["hostname"]
  }
}
```

These are translated into the appropriate sandbox restrictions for the target runtime (e.g. OS-level flags, container policies, seccomp profiles).
Default when no permissions declared: deny all.
```json
{
  "capabilities": {
    "services": ["llm", "memory"],
    "topics": ["agent.task", "agent.result"]
  }
}
```

- A module may only call services listed in its `capabilities.services`
- A module may only publish/subscribe to topics listed in its `capabilities.topics`
- Capabilities are stored in the Permission Store at install approval time — not self-asserted by the module at runtime
- On `ready`, the kernel loads capabilities from the store (falling back to manifest if no store entry exists for backward compatibility)
- Violation → log the event + kill the module
The permission store persists approved permissions and capabilities across kernel restarts.
```yaml
modules:
  module-agent:
    manifestHash: "sha256:a1b2c3..."     # SHA-256 of module.json at approval time
    firstSpawn: "2026-01-15T10:30:00Z"   # Timestamp of first install
    permissions:                         # Approved OS-level permissions
      net: ["api.openai.com"]
      read: ["~/.pons/"]
      write: ["~/.pons/data/"]
      env: ["HOME"]
    capabilities:                        # Approved IPC-level capabilities
      services: ["llm", "memory"]
      topics: ["inbound:message", "agent:turn:start"]
    dynamicPermissions: {}               # Runtime-requested permissions (granted)
    pendingRequests: []                  # Queued permission requests (awaiting user)
    deniedRequests: []                   # Denied requests (prevents re-prompting)
```

Manifest tamper detection flow:
1. At `pons module install` — user reviews and approves permissions. SHA-256 hash of `module.json` is computed and stored.
2. On every kernel boot — for each module, compute current hash and compare against stored hash.
3. If mismatch → refuse to load module. Log error: `"manifest hash mismatch for module-X — re-install required"`.
4. This prevents privilege escalation via silent manifest edits.
Runtime permission request flow:
1. Module sends `call("permissions.request", { permissions, reason })`
2. Kernel queues the request and sends a system notification (macOS AppleScript / Linux `notify-send`) — best-effort
3. User approves or denies via CLI: `pons permissions grant <requestId>` / `pons permissions deny <requestId>`
4. On grant: permissions added to `dynamicPermissions`, module restarted with new effective permissions
5. On deny: request moved to `deniedRequests`, module receives `{ granted: false, denied: true }`
At module install time (pons module install), the kernel scans module source files for patterns that could bypass the runtime sandbox. This is advisory only — it does not prevent installation but displays warnings to the user during the approval flow.
Scanned patterns:
- Node.js API imports: `node:fs`, `node:child_process`, `node:net`, `node:http`, `node:https`, `node:dgram`, `node:tls`
- Dynamic require: `createRequire`
- Dynamic imports: `import('node:...')`
Limitations:
- Cannot detect obfuscated or dynamically constructed imports
- Does not scan transitive dependencies
- Applies primarily to JavaScript/TypeScript modules — other runtimes rely on OS-level sandboxing
This scanner is defense-in-depth: it catches accidental sandbox bypasses. Intentional bypasses by a malicious module require OS-level containment (containers, seccomp, etc.).
The kernel listens for OS signals to trigger live updates without downtime:
| Signal | Action |
|---|---|
| `SIGINT` / `SIGTERM` | Graceful shutdown |
| `SIGUSR1` | Config hot-reload (re-read config file) |
| `SIGUSR2` | Permission hot-reload (re-read permissions file) |
| `SIGHUP` | Module discovery + hot-load newly installed modules |
The CLI sends these signals after writing updated files.
Note for non-Unix implementations: on platforms without POSIX signals, equivalent mechanisms (e.g. named pipes, admin HTTP endpoints, file watchers) should be used to trigger the same behaviors. See Section 19 (Known Limitations) for details.
The kernel aggregates its own logs together with logs forwarded from modules via IPC (`log` and `log-group` messages).
Log levels: trace, debug, info, warn, error, fatal
Log output:
- Development: human-readable colorized format to stdout + daily rotating file
- Production: structured JSON to stdout
Log file path: <home>/.pons/.runtime/logs/kernel-YYYY-MM-DD.log
Each log entry includes at minimum: level, timestamp, module (source), msg.
Every failure in the kernel has a defined response. This section is the single source of truth for what happens when things go wrong.
| Failure | Kernel response |
|---|---|
| Module crashes (exit code ≠ 0) | Log error with last stderr line. Restart with exponential backoff (see Section 7). |
| Module does not send `ready` within 30s | Kill process. Treat as crash → restart logic. |
| Module fails health check (no `pong` in 10s) | Retry up to 3 times (one per health interval). After 3 consecutive failures → kill process → restart logic. |
| Module sends unknown message type | Log warning with module ID and message type. Drop message. Module is not killed. |
| Module sends malformed JSON on stdout | Log warning. Treat line as plain text (stderr-like). Module is not killed but may time out if `ready` was never sent. |
| Module writes binary data on stdout | Same as malformed JSON — logged, dropped, not fatal. |
| Module sends message after being killed | Ignore. Process stdout/stderr may still be buffered; kernel discards messages from modules not in `ready` or `starting` state. |
| Two modules declare same service | Second module is rejected at registration. Log error. Module is killed with reason `duplicate-service`. |
| Circular dependency detected | All modules in the cycle are killed with reason `circular-dependency`. Cycle path is logged. |
| Module exceeds max restart attempts (5) | Module status set to `crashed`. No further restarts. Log error. Operator must intervene (`pons module restart <id>`). |
| Failure | Kernel response |
|---|---|
| Write to module stdin fails | Log warning. Message is lost. No retry. Module may be dead — wait for exit event. |
| Module stdout closes unexpectedly | Treat as crash → call `onShutdown()` if possible, then restart logic. |
| Very large JSON message (>1 MB) | No built-in limit in v1. Implementations should consider adding a configurable max message size. |
| Partial JSON (buffer split mid-line) | The newline-delimited protocol guarantees complete lines. Partial reads are buffered until `\n` is received. |
| Failure | Kernel response |
|---|---|
| RPC target service not found | Immediate error response to caller: `{ error: "service_not_found" }` |
| RPC target module not ready | Immediate error response to caller: `{ error: "module_not_ready" }` |
| RPC timeout (30s) | Error response to caller: `{ error: "timeout" }`. Late responses from target are silently dropped. |
| RPC caller capability violation | Error response to caller: `{ error: "forbidden" }`. Log security violation. |
| Failure | Kernel response |
|---|---|
| `config.yaml` missing at boot | Use schema defaults for all sections. Log warning. |
| `config.yaml` is invalid YAML | Reject load. Keep previous config in memory. Log error with parse details. |
| `config.yaml` fails schema validation | Log validation errors per field. Fill invalid fields with defaults. Load succeeds with warnings. |
| `config.yaml` deleted while running | No effect until next hot-reload. On hot-reload: treat as missing → use defaults. |
| Hot-reload fails mid-save | Keep previous config. Log error. No modules are notified. |
| Module requests config outside its scope | Reject call with error. Log security violation. |
| Violation | Kernel response |
|---|---|
| Module calls undeclared service | Reject RPC. Log violation with caller ID, target service. Kill module. |
| Module publishes to undeclared topic | Reject publish. Log violation. Kill module. |
| Module manifest hash mismatch | Refuse to load module. Log error with module ID. |
All timeouts, limits, and defaults in one place. These values should be configurable where noted.
| Parameter | Default | Configurable | Description |
|---|---|---|---|
| RPC timeout | 30s | Yes (per-call) | Max time to wait for an RPC response |
| Health check interval | 30s | Yes (kernel config) | Time between ping messages |
| Health check response timeout | 10s | Yes (kernel config) | Max time to wait for pong reply |
| Health check max failures | 3 | Yes (kernel config) | Consecutive failures before kill |
| Module ready timeout | 30s | Yes (kernel config) | Max time to wait for ready after spawn |
| Dependency wait timeout | 30s | Yes (kernel config) | Max time to wait for required services |
| Graceful shutdown timeout | 5s | Yes (kernel config) | Time to wait for modules to exit voluntarily |
| Parameter | Default | Configurable | Description |
|---|---|---|---|
| Max restart attempts | 5 | Yes (kernel config) | After this, module is marked crashed |
| Max restart backoff | 60s | Yes (kernel config) | Upper bound on exponential backoff delay |
| IPC write queue depth | 512 | No (v1) | If a module's stdin write queue reaches this limit, the module is marked as disconnected and all further messages to it are dropped. This prevents memory exhaustion when a module cannot keep up. |
| Max IPC message size | No limit (v1) | Planned (v2) | Recommended: implementations should cap at 10 MB |
| Max concurrent modules | No limit (v1) | Planned (v2) | Limited by OS process capacity |
| Max IPC string field length | 256 chars | No (v1) | Validated on incoming messages. Fields exceeding this limit cause message rejection. |
| Parameter | Default | Description |
|---|---|---|
| Logging level | `info` | Kernel-wide default |
| Config file | `<home>/.pons/config.yaml` | Main config location |
| Permissions file | `<home>/.pons/permissions.yaml` | Approved permissions location |
| Module directory | `<home>/.pons/modules/` | Where modules are installed |
| PID file | `<home>/.pons/.runtime/kernel.pid` | Kernel process ID |
| Default runtime | `deno` | When manifest omits `runtime` field |
The kernel provides at-most-once delivery for all message types. This is a deliberate design choice — the kernel is a router, not a broker.
- Messages are delivered in the order they are sent from a single publisher to a single subscriber (FIFO per-pair)
- RPC responses are matched to their requests by ID
- A module only receives `deliver` messages for topics it has subscribed to
- A module only receives `rpc_request` messages for services it provides
- If a module's IPC write queue exceeds 512 messages, the module is marked disconnected and messages are dropped (see Section 15)
- If a module crashes between receiving a message and processing it, the message is lost
- Pub/sub has no acknowledgment — the kernel does not know if a subscriber processed the message
- RPC responses that arrive after timeout are silently discarded
- Message ordering across multiple publishers is not guaranteed
Modules that require stronger guarantees must implement them at the application level: idempotency keys for RPC calls, acknowledgment protocols on top of pub/sub, persistent queues for critical events.
The kernel and modules negotiate protocol compatibility via the manifest and the init message.
The `init` message includes a `protocolVersion` field:

```json
{ "type": "init", "protocolVersion": "1.0", "config": {}, "workspacePath": "...", "projectRoot": "..." }
```

The module's manifest includes a `minProtocolVersion` field (optional, defaults to `"1.0"`):

```json
{
  "id": "module-agent",
  "minProtocolVersion": "1.0"
}
```

Version matching rules:
- Major version must match exactly (kernel v1.x only loads modules requiring v1.x)
- Minor version: kernel must be ≥ module's minimum (kernel v1.3 can run module requiring v1.0)
- If incompatible: kernel refuses to load module, logs error with both versions
- New message types may be added in minor versions — modules must ignore unknown types gracefully
- Existing message fields are never removed or renamed within a major version
- New optional fields may be added to existing messages in minor versions
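The matching rules reduce to a short comparison (TypeScript sketch; `isCompatible` is an illustrative name):

```typescript
// Sketch of the compatibility check between the kernel's protocolVersion
// and a module's minProtocolVersion, both "major.minor" strings.
function isCompatible(kernelVersion: string, moduleMin: string): boolean {
  const [kMajor, kMinor] = kernelVersion.split(".").map(Number);
  const [mMajor, mMinor] = moduleMin.split(".").map(Number);
  if (kMajor !== mMajor) return false; // major version must match exactly
  return kMinor >= mMinor;             // kernel must meet the module's minimum
}
```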
The CLI provides `pons status`, which reports:
- Kernel process: running / stopped (PID, uptime)
- Per-module status: `ready`, `starting`, `crashed`, `stopped`
- Per-module last health check: timestamp, pass/fail
- Service directory: which services are registered and by whom
Beyond regular log messages, the kernel emits structured events for key lifecycle moments:
| Event | Logged when | Key fields |
|---|---|---|
| `kernel.boot` | Kernel starts | version, moduleCount, configPath |
| `kernel.shutdown` | Kernel stops | reason, uptime |
| `module.spawn` | Module process started | moduleId, pid, runtime |
| `module.ready` | Module sent `ready` | moduleId, startupMs |
| `module.crash` | Module exited unexpectedly | moduleId, exitCode, stderr, restartCount |
| `module.killed` | Kernel killed a module | moduleId, reason |
| `rpc.timeout` | RPC call exceeded timeout | callerId, targetService, method, timeoutMs |
| `security.violation` | Capability/permission check failed | moduleId, action, target |
| `config.reload` | Config hot-reload completed | changedSections, affectedModules |
All events are written to the kernel log. In production mode (JSON output), they can be ingested by external monitoring systems (ELK, Datadog, Grafana Loki, etc.).
This section documents what v1 explicitly does not support. These are intentional scope decisions, not bugs.
- No high availability. The kernel is a single process, single host. If it dies, all modules die. Use an external process supervisor (systemd, launchd) for automatic restart.
- No clustering. Modules cannot span multiple machines. All modules must run on the same host as the kernel.
- No message persistence. Pub/sub is in-memory only. Messages are lost on crash. Modules that need durability must implement their own persistence.
- Limited backpressure. If a module's IPC write queue exceeds 512 messages, the module is marked disconnected and messages are dropped. The kernel does not slow down publishers — it drops messages to the slow consumer.
- No module-level resource limits. The kernel does not enforce CPU/memory quotas per module. This is delegated to OS-level mechanisms (cgroups, containers).
- No horizontal scaling. A single kernel can only be as powerful as the host it runs on.
- No built-in authentication for IPC. Modules are trusted once approved at install time. There is no per-message signing or encryption on the stdin/stdout channel.
- No Windows signal support. SIGUSR1/SIGUSR2/SIGHUP are Unix-only. Windows implementations must use alternative mechanisms (see Section 12).
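The drop-on-overflow backpressure policy described above can be sketched as follows. The 512-message limit comes from this section; clearing the queue on disconnect and all names are illustrative assumptions:

```typescript
// Sketch of per-module IPC write-queue handling: once the queue exceeds the
// limit, the module is marked disconnected and its messages are dropped.
const MAX_QUEUE = 512;

interface ModuleChannel {
  queue: string[];
  connected: boolean;
}

function enqueue(ch: ModuleChannel, msg: string): boolean {
  if (!ch.connected) return false;   // already disconnected: drop silently
  if (ch.queue.length >= MAX_QUEUE) {
    ch.connected = false;            // slow consumer: mark disconnected
    ch.queue.length = 0;             // drop pending messages (assumption)
    return false;                    // publisher is never slowed down
  }
  ch.queue.push(msg);
  return true;
}
```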
<home>/.pons/ # or $PONS_HOME
├── modules/
│ └── {module-id}/ # Module directory
│ ├── module.json # Manifest
│ └── src/ # Source code
├── config.yaml # Global config
├── permissions.yaml # Approved permissions
└── .runtime/
├── kernel.pid # Kernel PID file
└── logs/
└── kernel-YYYY-MM-DD.log
Modules can be written in any programming language. The kernel does not assume a specific runtime — it reads the runtime field from the module manifest and spawns the appropriate process.
| runtime | Spawn command | Sandbox mechanism |
|---|---|---|
| `deno` | `deno run [permission-flags] <entry>` | Deno permission flags (`--allow-net`, `--allow-read`, etc.) |
| `node` | `node <entry>` | Process-level restrictions (future: policy file) |
| `bun` | `bun run <entry>` | Process-level restrictions |
| `go` | `./<entry>` (pre-compiled binary) | OS-level (seccomp, containers) |
| `rust` | `./<entry>` (pre-compiled binary) | OS-level (seccomp, containers) |
| `python` | `python <entry>` | OS-level (seccomp, containers) |
| `php` | `php <entry>` | OS-level (seccomp, containers) |
| `binary` | `./<entry>` (any executable) | OS-level (seccomp, containers) |
If runtime is omitted, the kernel defaults to deno for backward compatibility.
- Kernel reads `runtime` and `entry` from `module.json`
- Constructs the spawn command based on the runtime table above
- For the `deno` runtime: translates `permissions` to Deno CLI flags
- For all other runtimes: spawns the process directly; sandbox enforcement is delegated to OS-level mechanisms
- Regardless of runtime, IPC is identical: newline-delimited JSON over stdin/stdout
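The spawn-command construction from the runtime table can be sketched as a small dispatcher. The Deno permission-flag translation is simplified to a pre-built flag list; type and function names are illustrative:

```typescript
// Sketch of spawn-command construction per the runtime table above.
type Runtime = "deno" | "node" | "bun" | "go" | "rust" | "python" | "php" | "binary";

interface SpawnSpec {
  runtime?: Runtime; // omitted → defaults to "deno" per the spec
  entry: string;
}

function buildSpawnCommand(spec: SpawnSpec, denoFlags: string[]): string[] {
  const runtime = spec.runtime ?? "deno";
  switch (runtime) {
    case "deno":   return ["deno", "run", ...denoFlags, spec.entry];
    case "node":   return ["node", spec.entry];
    case "bun":    return ["bun", "run", spec.entry];
    case "python": return ["python", spec.entry];
    case "php":    return ["php", spec.entry];
    // go, rust, binary: pre-compiled executables are run directly
    default:       return [spec.entry];
  }
}
```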
Each supported language needs a thin SDK (~200–400 lines) that implements the IPC protocol. See the SDK Specification (docs/specs/sdk.md) for the full contract.
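The newline-delimited JSON framing at the heart of every SDK can be sketched as a small stateful decoder. Buffering handles messages split across stdin chunks; the function name is illustrative:

```typescript
// Sketch of NDJSON decoding over stdin: accumulate chunks, emit one parsed
// message per complete line.
function createNdjsonDecoder(onMessage: (msg: unknown) => void) {
  let buffer = "";
  return (chunk: string) => {
    buffer += chunk;
    let idx: number;
    while ((idx = buffer.indexOf("\n")) !== -1) {
      const line = buffer.slice(0, idx);
      buffer = buffer.slice(idx + 1);
      if (line.trim()) onMessage(JSON.parse(line));
    }
  };
}
```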
Modules written in languages other than TypeScript cannot export a Zod schema. Instead, they declare their config schema as a JSON Schema file:
```json
{
  "configKey": "my-module",
  "configSchema": "./config.schema.json"
}
```

The kernel accepts both formats: if the schema file extension is `.json`, it is loaded as JSON Schema; if `.ts` or `.js`, it is imported as a Zod schema. Both formats are treated equally for validation.
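The extension dispatch, together with the Phase 2 containment check (the schema must live inside the module directory), can be sketched as follows. The naive ".." rejection stands in for real path normalization; names are illustrative:

```typescript
// Sketch: pick the schema format by extension and reject paths that escape
// the module directory.
function resolveSchema(moduleDir: string, schemaPath: string):
    { format: "json-schema" | "zod"; path: string } | { error: string } {
  const full = moduleDir.replace(/\/$/, "") + "/" + schemaPath.replace(/^\.\//, "");
  if (full.includes("..")) return { error: "schema path escapes module directory" };
  const format = full.endsWith(".json") ? "json-schema" as const : "zod" as const;
  return { format, path: full };
}
```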
```json
{
  "id": "module-agent",
  "name": "Agent Module",
  "version": "1.0.0",
  "runtime": "deno",
  "entry": "src/runner.ts",
  "priority": 10,
  "provides": ["agent"],
  "requires": ["llm", "memory"],
  "optionalRequires": ["sandbox"],
  "configKey": "agents",
  "configSchema": "./src/config.schema.ts",
  "permissions": {
    "net": [],
    "read": ["~/.pons/"],
    "write": ["~/.pons/data/"],
    "env": ["HOME"],
    "run": [],
    "sys": []
  },
  "capabilities": {
    "services": ["llm", "memory"],
    "topics": ["agent.task", "agent.result", "agent.status"]
  }
}
```

Fields:

- `id` — unique identifier (slug, lowercase, hyphens)
- `name` — human-readable name
- `version` — semver string
- `runtime` — execution runtime (see Section 21). Defaults to `deno` if omitted.
- `entry` — path to the entry point, relative to the module directory
- `priority` — spawn order (lower = earlier)
- `provides` / `requires` / `optionalRequires` — service graph declarations
- `configKey` — top-level key in `config.yaml` that this module owns
- `configSchema` — path to the config schema definition (`.ts`/`.js` for Zod, `.json` for JSON Schema)
- `permissions` — OS-level access requests (approved at install time)
- `capabilities` — IPC-level access declarations (enforced at runtime)
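The manifest shape above can be expressed as a TypeScript interface. The field names come from the spec; the typing itself is an illustrative sketch:

```typescript
// Sketch of module.json as a TypeScript interface.
interface ModuleManifest {
  id: string;                 // slug, lowercase, hyphens
  name: string;
  version: string;            // semver
  runtime?: string;           // defaults to "deno" if omitted
  entry: string;              // relative to the module directory
  priority: number;           // lower = spawned earlier
  provides: string[];
  requires: string[];
  optionalRequires?: string[];
  configKey?: string;
  configSchema?: string;      // .ts/.js (Zod) or .json (JSON Schema)
  permissions: {
    net: string[];
    read: string[];
    write: string[];
    env: string[];
    run: string[];
    sys: string[];
  };
  capabilities: {
    services: string[];       // services this module may call via RPC
    topics: string[];         // topics this module may publish/subscribe
  };
}
```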
- Does not store or queue messages (no persistence, no replay)
- Does not contain any business logic
- Does not integrate with LLMs
- Does not manage user workspace or files
- Does not expose an HTTP API (that is the responsibility of a gateway module)
- Does not implement skills or agents — those live in modules
The CLI includes a service subcommand for registering the kernel as a system service with automatic restart on failure and autostart on boot.
Commands: pons service install, uninstall, start, stop, status, logs
Requirements:
- Auto-detect host platform: systemd (Linux), launchd (macOS), Task Scheduler (Windows)
- User-level install by default (no root required); system-level install optional with privilege elevation
- Service must restart kernel on failure (restart delay: 5s)
- `uninstall` reverses all registration steps without removing Pons data or config
- Platform-specific service file formats and commands are implementation details of the CLI, not the kernel
Adding a new kernel call:
- Add a handler in the kernel's call dispatcher
- Add capability/permission checks in the enforcer if the call accesses sensitive resources
- Document it in Section 9 of this specification
Adding error handling for a new failure mode:
- Determine the failure category (module, IPC, RPC, config, security)
- Add a row to the appropriate table in Section 14
- Implement the response in the relevant component
Changing the IPC protocol:
- Update the shared type definitions (SDK / shared types package)
- Update message handlers in the lifecycle manager
- Bump the kernel version and document the change
Adding a new runtime:
- Add a row to the runtime table in Section 21
- Add the spawn command logic in the lifecycle manager's `forkProcess()` function
- Implement or point to an SDK for that language
- Document sandbox mechanism (if any)
Porting the kernel itself to another language: The kernel can be implemented in any language capable of spawning child processes and communicating over stdin/stdout. The IPC protocol (newline-delimited JSON), manifest format (JSON), and config format (YAML) are the only cross-language contracts that must be preserved exactly. Everything else — class names, file structure, libraries — is an implementation detail.
These examples show the exact JSON messages exchanged between the kernel and modules during real scenarios. Use them as a reference when implementing the protocol.
KERNEL → module-llm: {"type":"init","protocolVersion":"1.0","config":{"models":{"providers":[{"id":"anthropic","type":"anthropic"}]}},"workspacePath":"/home/user/.pons/workspace","projectRoot":"/home/user/project"}
module-llm → KERNEL: {"type":"ready","manifest":{"id":"module-llm","name":"LLM Module","version":"1.0.0","provides":["llm"],"requires":[],"configKey":"models"},"capabilities":{"services":[],"topics":["llm:usage"]}}
KERNEL → module-llm: {"type":"deps_ready"}
KERNEL → module-agent: {"type":"init","protocolVersion":"1.0","config":{"agents":{"defaultModel":"claude-sonnet-4"}},"workspacePath":"/home/user/.pons/workspace","projectRoot":"/home/user/project"}
module-agent → KERNEL: {"type":"ready","manifest":{"id":"module-agent","provides":["agent"],"requires":["llm","memory"]},"capabilities":{"services":["llm","memory"],"topics":["inbound:message","agent:turn:start","agent:turn:end"]}}
(kernel holds module-agent in "waiting" — llm and memory not yet available)
(module-llm and module-memory start and register their services)
KERNEL → module-agent: {"type":"deps_ready"}
KERNEL → module-llm: {"type":"ping"}
module-llm → KERNEL: {"type":"pong"}
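The ping/pong exchange above is the basis of the kernel's health checks. A sketch of a check with a timeout; the timeout default and all names are illustrative assumptions:

```typescript
// Sketch: a health check passes if the module answers "pong" before the
// timeout; a hung module fails the check.
function healthCheck(sendPing: () => Promise<"pong">, timeoutMs = 5000): Promise<boolean> {
  const timeout = new Promise<boolean>((resolve) =>
    setTimeout(() => resolve(false), timeoutMs));
  const ping = sendPing().then((r) => r === "pong").catch(() => false);
  return Promise.race([ping, timeout]);
}
```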
module-agent → KERNEL: {"type":"publish","topic":"agent:turn:start","payload":{"agentId":"support-agent","sessionId":"sess-001","runId":"run-abc"}}
(kernel looks up subscribers of "agent:turn:start" — finds module-gateway)
KERNEL → module-gateway: {"type":"deliver","id":"msg-7f3a","topic":"agent:turn:start","payload":{"agentId":"support-agent","sessionId":"sess-001","runId":"run-abc"}}
module-agent → KERNEL: {"type":"rpc_request","id":"rpc-001","service":"llm","method":"generateText","params":{"provider":"anthropic","model":"claude-sonnet-4","system":"You are a helpful assistant.","messages":[{"role":"user","content":"Hello"}]}}
(kernel checks agent's capabilities — "llm" is declared — forwards to module-llm)
KERNEL → module-llm: {"type":"rpc_request","id":"rpc-001","from":"module-agent","service":"llm","method":"generateText","params":{"provider":"anthropic","model":"claude-sonnet-4","system":"You are a helpful assistant.","messages":[{"role":"user","content":"Hello"}]}}
(module-llm processes the request, calls the LLM API)
module-llm → KERNEL: {"type":"rpc_response","id":"rpc-001","result":{"content":"Hello! How can I help you today?","usage":{"promptTokens":15,"completionTokens":9,"totalTokens":24}}}
(kernel forwards response back to caller)
KERNEL → module-agent: {"type":"rpc_response","id":"rpc-001","result":{"content":"Hello! How can I help you today?","usage":{"promptTokens":15,"completionTokens":9,"totalTokens":24}}}
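The trace above shows the kernel correlating responses to callers by request id. A minimal sketch of that bookkeeping; the class and method names are illustrative:

```typescript
// Sketch of kernel-side RPC correlation: remember which module issued each
// request id so the rpc_response can be routed back to it.
class RpcRouter {
  private pending = new Map<string, string>(); // rpc id → caller module id

  request(id: string, from: string): void {
    this.pending.set(id, from);
  }

  respond(id: string): string | undefined {
    const caller = this.pending.get(id);
    this.pending.delete(id); // each id resolves exactly once
    return caller;
  }
}
```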
module-sandbox → KERNEL: {"type":"rpc_request","id":"rpc-002","service":"memory","method":"store","params":{"content":"secret data"}}
(kernel checks sandbox's capabilities — "memory" is NOT declared)
KERNEL → module-sandbox: {"type":"rpc_response","id":"rpc-002","error":"forbidden"}
(kernel logs security violation, kills module-sandbox)
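The capability check in this trace can be sketched as a pure function: an RPC to a service not declared in the caller's `capabilities.services` is rejected. Names are illustrative:

```typescript
// Sketch of the runtime capability check applied before forwarding an RPC.
interface Capabilities {
  services: string[];
  topics: string[];
}

function checkRpcCapability(caps: Capabilities, service: string):
    { ok: true } | { ok: false; error: "forbidden" } {
  return caps.services.includes(service)
    ? { ok: true }
    : { ok: false, error: "forbidden" };
}
```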
module-agent → KERNEL: {"type":"call","id":"call-001","method":"config.get","params":{"key":"agents.defaultModel"}}
KERNEL → module-agent: {"type":"call:response","id":"call-001","result":"claude-sonnet-4"}
(operator runs: pons config set models.providers[0].model claude-opus-4 → CLI writes config.yaml, sends SIGUSR1)
KERNEL → module-llm: {"type":"config:update","config":{"models":{"providers":[{"id":"anthropic","type":"anthropic","model":"claude-opus-4"}]}},"changedSections":["models"]}
(operator runs: pons kernel stop → sends SIGTERM)
(kernel computes shutdown order from dependency graph)
── Tier 1: module-gateway (leaf — no one depends on it) ──
KERNEL → subscribers of system:module:stopping: {"type":"deliver","id":"sys-001","topic":"system:module:stopping","payload":{"moduleId":"module-gateway","tier":1}}
KERNEL → module-gateway: {"type":"shutdown"}
(module-gateway stops accepting connections, cleans up, exits within 5s)
── Tier 2: module-agent, module-sandbox ──
KERNEL → subscribers of system:module:stopping: {"type":"deliver","id":"sys-002","topic":"system:module:stopping","payload":{"moduleId":"module-agent","tier":2}}
KERNEL → module-agent: {"type":"shutdown"}
KERNEL → module-sandbox: {"type":"shutdown"}
(both finish in-flight work, exit within 5s)
── Tier 3: module-llm, module-memory (roots) ──
KERNEL → module-llm: {"type":"shutdown"}
KERNEL → module-memory: {"type":"shutdown"}
(both exit)
(kernel removes PID file, closes logs, exits)
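The tiered shutdown order above is a reverse topological sort of the service graph: modules no one depends on stop first. A sketch, where `requires` maps each module to the modules it depends on; names are illustrative:

```typescript
// Sketch of computing shutdown tiers from the dependency graph.
function shutdownTiers(requires: Map<string, string[]>): string[][] {
  // dependents[m] = how many still-running modules depend on m
  const dependents = new Map<string, number>();
  for (const id of requires.keys()) dependents.set(id, 0);
  for (const deps of requires.values())
    for (const d of deps) dependents.set(d, (dependents.get(d) ?? 0) + 1);

  const tiers: string[][] = [];
  const remaining = new Set(requires.keys());
  while (remaining.size > 0) {
    const tier = [...remaining].filter((m) => dependents.get(m) === 0);
    if (tier.length === 0) break; // cycle guard: should not happen after boot checks
    for (const m of tier) {
      remaining.delete(m);
      for (const d of requires.get(m) ?? []) dependents.set(d, dependents.get(d)! - 1);
    }
    tiers.push(tier); // leaves first, roots last
  }
  return tiers;
}
```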
This shows the complete message flow when a user sends "Hello" via the gateway to an agent:
── Step 1: Gateway receives HTTP/WS request, publishes to bus ──
module-gateway → KERNEL: {"type":"publish","topic":"inbound:message","payload":{"agentId":"support-agent","senderId":"user-42","channelType":"chat","channelId":"ws-conn-7","content":"Hello"}}
── Step 2: Kernel delivers to module-agent (subscribed to inbound:message) ──
KERNEL → module-agent: {"type":"deliver","id":"msg-a1b2","topic":"inbound:message","payload":{"agentId":"support-agent","senderId":"user-42","channelType":"chat","channelId":"ws-conn-7","content":"Hello"}}
── Step 3: Agent assembles context and calls LLM via RPC ──
module-agent → KERNEL: {"type":"rpc_request","id":"rpc-010","service":"llm","method":"generateText","params":{"provider":"anthropic","model":"claude-sonnet-4","system":"You are a support agent...","messages":[{"role":"user","content":"Hello"}],"tools":[{"name":"remember","description":"Save to memory","parameters":{}}]}}
KERNEL → module-llm: {"type":"rpc_request","id":"rpc-010","from":"module-agent","service":"llm","method":"generateText","params":{"provider":"anthropic","model":"claude-sonnet-4","system":"You are a support agent...","messages":[{"role":"user","content":"Hello"}],"tools":[{"name":"remember","description":"Save to memory","parameters":{}}]}}
── Step 4: LLM responds ──
module-llm → KERNEL: {"type":"rpc_response","id":"rpc-010","result":{"content":"Hi there! How can I help you today?","usage":{"promptTokens":42,"completionTokens":11,"totalTokens":53}}}
KERNEL → module-agent: {"type":"rpc_response","id":"rpc-010","result":{"content":"Hi there! How can I help you today?","usage":{"promptTokens":42,"completionTokens":11,"totalTokens":53}}}
── Step 5: Agent publishes response to gateway via bus ──
module-agent → KERNEL: {"type":"publish","topic":"outbound:ws","payload":{"type":"stream:final","agentId":"support-agent","sessionId":"sess-001","channelId":"ws-conn-7","content":"Hi there! How can I help you today?"}}
KERNEL → module-gateway: {"type":"deliver","id":"msg-c3d4","topic":"outbound:ws","payload":{"type":"stream:final","agentId":"support-agent","sessionId":"sess-001","channelId":"ws-conn-7","content":"Hi there! How can I help you today?"}}
── Step 6: Agent persists transcript via RPC ──
module-agent → KERNEL: {"type":"rpc_request","id":"rpc-011","service":"transcripts","method":"append","params":{"sessionId":"sess-001","messages":[{"role":"user","content":"Hello"},{"role":"assistant","content":"Hi there! How can I help you today?"}]}}
KERNEL → module-memory: {"type":"rpc_request","id":"rpc-011","from":"module-agent","service":"transcripts","method":"append","params":{"sessionId":"sess-001","messages":[{"role":"user","content":"Hello"},{"role":"assistant","content":"Hi there! How can I help you today?"}]}}
module-memory → KERNEL: {"type":"rpc_response","id":"rpc-011","result":{"ok":true}}
KERNEL → module-agent: {"type":"rpc_response","id":"rpc-011","result":{"ok":true}}
── Step 7: Agent emits turn:end ──
module-agent → KERNEL: {"type":"publish","topic":"agent:turn:end","payload":{"agentId":"support-agent","sessionId":"sess-001","runId":"run-abc","usage":{"promptTokens":42,"completionTokens":11,"totalTokens":53},"durationMs":1250}}