
Pons Kernel — Specification

Version: 1.0 Status: Active


1. Purpose

The Pons Kernel is a microkernel — the single orchestration point of the system. Its responsibility is strictly limited to five things:

  1. Message Bus — pub/sub between modules (in-memory, fire-and-forget)
  2. Lifecycle Manager — spawn, kill, restart, hot-swap of module processes
  3. RPC Routing — request/response routing between modules with timeout enforcement
  4. Service Directory — dynamic resolution of who provides what, circular dependency detection
  5. Configuration — layered config with schema validation, hot-reload, scoped per module

Everything else lives in modules. The kernel contains no business logic.


2. Boot Sequence

This is the complete step-by-step sequence from pons kernel start to "system ready".

Phase 1 — Discovery
  1. Read kernel manifest (version, metadata)
  2. Scan <home>/.pons/modules/ for module directories
  3. For each directory:
     a. Read module.json
     b. Validate: id, name, permissions block present
     c. Verify manifest hash against stored hash (tamper detection)
     d. Resolve entry point path (must exist within module dir)
     e. Skip invalid modules with logged reason
  4. Result: list of DiscoveredModules

Phase 2 — Configuration
  5. For each discovered module with a configSchema:
     a. Import/load the schema file
     b. Security check: schema path must be within module directory
  6. Merge all module schemas + kernel schema → unified AppSchema
  7. Load <home>/.pons/config.yaml
  8. Validate config against AppSchema
  9. Fill missing values with schema defaults
  10. If config is invalid: log warnings, continue with best-effort values

Phase 3 — Module Spawn
  11. Sort modules by priority (lower = first)
  12. For each module:
      a. Read approved permissions from PermissionStore
      b. Read approved capabilities from PermissionStore
      c. Construct spawn command based on runtime (see Section 21)
      d. Spawn child process
      e. Send { type: "init", protocolVersion, config, workspacePath, projectRoot }
      f. Wait for { type: "ready" } (max 30s)
      g. On ready: validate manifest, register services, check circular deps
      h. If required services available: send { type: "deps_ready" }, start health checks
      i. If required services missing: hold in "waiting" state
  13. Cascade: as services become available, activate waiting modules

Phase 4 — Running
  14. Register signal handlers (SIGINT, SIGTERM, SIGUSR1, SIGUSR2, SIGHUP)
  15. Write PID to <home>/.pons/.runtime/kernel.pid
  16. Emit kernel.boot log event
  17. System is ready — message bus active, RPC routing active

3. Architecture

┌─────────────────────────────────────────────────────────────┐
│                         Kernel                              │
│                                                             │
│  ┌────────────┐  ┌──────────────┐  ┌────────────────────┐  │
│  │ MessageBus │  │  Lifecycle   │  │  ServiceDirectory  │  │
│  │ (pub/sub)  │  │  Manager     │  │  (registry)        │  │
│  └────────────┘  └──────────────┘  └────────────────────┘  │
│  ┌────────────┐  ┌──────────────┐  ┌────────────────────┐  │
│  │  Config    │  │  Permission  │  │  Security          │  │
│  │  Manager   │  │  Store       │  │  Enforcer          │  │
│  └────────────┘  └──────────────┘  └────────────────────┘  │
└─────────────────────────────────────────────────────────────┘
         │ IPC (newline-delimited JSON over stdin/stdout)
    ┌────┴────┬─────────┬──────────┬───────────┐
    │         │         │          │           │
 module-   module-  module-   module-    module-...
 agent      llm     gateway   memory     sandbox
 (process) (process)(process) (process) (process)

Core principle: modules never import each other. All communication flows through the kernel over IPC.


4. IPC Protocol

Transport

Newline-delimited JSON over stdin/stdout of each child process. One JSON object per line, terminated by \n. No binary framing, no external message broker required.
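The transport can be sketched as a small incremental parser (illustrative only, not part of the spec — `feed` and `Frame` are names chosen here). Partial lines are buffered until a `\n` arrives, and malformed lines are logged rather than treated as fatal, matching the error-handling rules in Section 14:

```typescript
// Illustrative sketch of newline-delimited JSON framing.
type Frame = { buffer: string };

function feed(frame: Frame, chunk: string): unknown[] {
  frame.buffer += chunk;
  const messages: unknown[] = [];
  let idx: number;
  while ((idx = frame.buffer.indexOf("\n")) !== -1) {
    const line = frame.buffer.slice(0, idx);
    frame.buffer = frame.buffer.slice(idx + 1);
    if (line.trim() === "") continue;
    try {
      messages.push(JSON.parse(line));
    } catch {
      // Malformed JSON: logged and treated as plain text (Section 14).
      console.warn("non-JSON line from module:", line);
    }
  }
  return messages;
}
```

A message split across two stdout chunks yields nothing until its terminating newline is received, which is why partial reads are never misinterpreted.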

Kernel → Module messages

| type | Fields | Description |
| --- | --- | --- |
| init | protocolVersion, config, workspacePath, projectRoot | First message after spawn. Delivers protocol version and initial config. |
| install | (none) | Signals first-ever launch. Module should declare any permission requests. |
| deps_ready | (none) | All required services are available. Module may begin full operation. |
| shutdown | (none) | Graceful shutdown request. Module should clean up and exit. |
| ping | (none) | Health check. Module must reply with pong within the timeout. |
| deliver | id, topic, payload | Pub/sub message delivery. |
| config:update | config, changedSections | Config hot-reload. Contains only the module's own section. |
| call | id, method, params | Kernel calling a method on the module. |
| rpc_request | id, from, service, method, params | Proxied RPC request from another module. |
| rpc_response | id, result?, error? | Response to a previously sent RPC request. |
| service_available | service | An optional dependency just became available. |

Module → Kernel messages

| type | Fields | Description |
| --- | --- | --- |
| ready | manifest, capabilities? | Module has initialized and is ready. Sends its parsed manifest and optionally its capabilities for kernel validation. |
| log | level, msg, data?, topic? | Structured log entry to be aggregated by the kernel. Optional topic enables log grouping (e.g. agent:loop). |
| log-group | level, msg, items | Grouped log entries (e.g. a summary with sub-items). |
| publish | topic, payload | Publish a message to the bus. |
| call | id, method, params | Call a method on the kernel (config.get, module.list, etc.). |
| call:response | id, result?, error? | Response to a kernel call. |
| pong | (none) | Reply to a ping health check. |
| rpc_request | id, service, method, params | Initiate an RPC call to another module. |
| rpc_response | id, result?, error? | Response to a proxied RPC request. |
| ack | id | Acknowledge successful processing of a deliver message. id matches the delivered message. |
| nack | id, error | Reject a deliver message. error is a human-readable string describing the failure. Informational only — kernel does not retry. |

5. RPC Flow

Caller Module              Kernel                 Target Module
     │                       │                         │
     ├──rpc_request(id)─────>│                         │
     │  service, method,      │                         │
     │  params                │                         │
     │                        ├─ capability check       │
     │                        ├─rpc_request(id)────────>│
     │                        │  from, method, params   │
     │                        │                         │ (processing)
     │                        │<──rpc_response(id)──────│
     │<─rpc_response(id)──────│  result / error         │
     │  result / error        │                         │

Timeout: 30 seconds. On expiry — error response is sent back to the caller.
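The id-based correlation and timeout behavior can be sketched as a pending-call table (non-normative; `PendingCalls` and its method names are invented for illustration). Late responses are dropped rather than delivered, as Section 14 specifies:

```typescript
// Illustrative sketch: matching rpc_response messages to pending
// rpc_request IDs, with late responses silently discarded.
class PendingCalls {
  private pending = new Map<string, { sentAt: number }>();
  constructor(private timeoutMs = 30_000) {}

  register(id: string, now: number): void {
    this.pending.set(id, { sentAt: now });
  }

  // "matched" for an in-flight call, "late" for a response arriving
  // after the timeout (dropped), "unknown" for an ID never issued.
  settle(id: string, now: number): "matched" | "late" | "unknown" {
    const entry = this.pending.get(id);
    if (!entry) return "unknown";
    this.pending.delete(id);
    return now - entry.sentAt > this.timeoutMs ? "late" : "matched";
  }
}
```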


6. Pub/Sub Flow

Publisher              Kernel                    Subscribers
    │                   │                        │    │    │
    ├─publish(topic)───>│                        │    │    │
    │  payload           ├─deliver(id, topic)───>│    │    │
    │                   │  payload               │    │    │
    │                   ├─deliver(id, topic)──────────>│    │
    │                   ├─deliver(id, topic)──────────────>│

Fire-and-forget. No persistence, no retry. The ack/nack messages defined in the IPC protocol (Section 4) exist for module-to-module application-level tracking — the kernel forwards them but does not use them for delivery guarantees or retry logic. Stronger guarantees (at-least-once, persistence, replay) are the responsibility of modules.

Module Lifecycle Events

The kernel publishes lifecycle events to the message bus whenever a module changes state. This allows modules to react to system topology changes without polling.

| Topic | Payload | Published when |
| --- | --- | --- |
| system:module:ready | { moduleId, provides, version } | Module activated and health checks started |
| system:module:stopped | { moduleId, reason } | Module intentionally stopped (shutdown, kill) |
| system:module:crashed | { moduleId, exitCode, restartCount } | Module exited unexpectedly, entering restart flow |
| system:module:stopping | { moduleId, tier } | Module is about to receive shutdown signal (during ordered drain) |

Rules:

  • These are published by the kernel itself, not by modules
  • They follow the same delivery semantics as all pub/sub — at-most-once, fire-and-forget
  • Modules must declare system:module:* topics in their capabilities.topics to receive them
  • The kernel publishes these after the state transition is complete (e.g. system:module:ready is sent after the module's services are registered, not before)
  • During shutdown, system:module:stopping is sent before the shutdown message to the target module, giving other modules a chance to stop sending work to it

Use cases:

  • Gateway can show real-time system status to connected clients
  • Monitoring modules can track uptime and crash frequency
  • Modules with optional dependencies can react to services appearing/disappearing beyond the built-in service_available mechanism

7. Module Lifecycle

State Machine

Every module has exactly one status at any given time. Transitions are triggered by events from the module process, the kernel, or external signals.

                          spawn()
                            │
                            ▼
                      ┌──────────┐
                      │ starting │
                      └────┬─────┘
                           │
              ┌────────────┼────────────┐
              │            │            │
         ready msg    timeout 30s    crash/exit
              │            │            │
              ▼            ▼            ▼
        ┌──────────┐  ┌────────┐  ┌─────────────┐
        │ waiting  │  │ killed │  │ restarting  │◄──┐
        │ (deps)   │  └────────┘  └──────┬──────┘   │
        └────┬─────┘                     │           │
             │                    backoff delay      │
        deps ready                       │           │
             │                      spawn()          │
             ▼                           │           │
       ┌──────────┐                      └───────────┘
       │  ready   │                  (max 5 attempts)
       └────┬─────┘                         │
            │                               ▼
       ┌────┼────────┐              ┌──────────────┐
       │    │        │              │   crashed    │
   crash  kill()  shutdown          │ (terminal)   │
       │    │        │              └──────────────┘
       ▼    ▼        ▼
  restarting stopped stopped

States:

| State | Description |
| --- | --- |
| starting | Process spawned, waiting for ready message (max 30s) |
| waiting | Module sent ready but required services are not yet available (max 30s) |
| ready | Module is fully operational. Health checks are active. |
| restarting | Module exited unexpectedly. Kernel is waiting for backoff delay before re-spawning. |
| stopped | Module was intentionally stopped (shutdown or kill). |
| crashed | Module exceeded max restart attempts. Terminal state — requires manual intervention. |
| killed | Module was killed due to a violation or timeout. May lead to restart or stopped. |

Transition rules:

  • Only ready modules receive deliver, rpc_request, and config:update messages
  • A module in waiting only receives service_available notifications
  • A module in crashed state can only be restarted manually via pons module restart <id>
  • killed is a transient state — it transitions to restarting (if attempts remain) or crashed

The full activation sequence is described in Section 2 (Boot Sequence, Phase 3).

Health Checks

  • Every 30 seconds: kernel sends ping
  • Module must reply with pong within 10 seconds
  • After 3 consecutive failures: kill process → restart logic
  • A single successful pong resets the failure counter
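The counter rules above can be condensed into one pure transition function (a sketch — `nextHealthState` is a name chosen here, not kernel API):

```typescript
// Illustrative sketch: 3 consecutive missed pongs trigger a kill;
// any successful pong resets the failure counter.
function nextHealthState(
  consecutiveFailures: number,
  gotPong: boolean,
  maxFailures = 3,
): { failures: number; action: "ok" | "kill" } {
  if (gotPong) return { failures: 0, action: "ok" };
  const failures = consecutiveFailures + 1;
  return { failures, action: failures >= maxFailures ? "kill" : "ok" };
}
```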

Crash & Restart

  • Exponential backoff: 1s → 2s → 4s → 8s → 16s → 32s → 60s (max)
  • Maximum 5 attempts
  • If module lived less than 1 second → likely an entry point error (logged as warning)
  • After 5 failed attempts → status: crashed, no further restarts
  • Restart counter resets if module stays alive for more than 60 seconds
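The backoff sequence is a plain doubling capped at 60s; as a sketch (function name is illustrative):

```typescript
// Illustrative sketch of the restart backoff:
// 1s -> 2s -> 4s -> 8s -> 16s -> 32s -> 60s (cap).
function restartDelayMs(attempt: number, capMs = 60_000): number {
  return Math.min(1000 * 2 ** attempt, capMs);
}
```

Note that with a maximum of 5 attempts the cap is never reached in practice; it matters only if the limit is raised via kernel config (Section 15).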

Hot-Swap

Hot-swap replaces a running module with a new version without restarting the kernel or other modules.

1. CLI or operator triggers hot-swap for module X
2. Kernel sends { type: "shutdown" } to module X
3. Wait up to 5s for graceful exit
4. Kill if still running
5. Unregister module X's services from the directory
6. Load new manifest from disk (re-read module.json)
7. Validate new manifest (hash, permissions, entry point)
8. Spawn new process with updated permissions
9. Wait for ready → re-register services
10. Notify modules that had optional dependency on X's services

Constraints:

  • Hot-swap does not change the module ID — it's the same logical module with new code
  • If the new version fails to start, the old version is NOT restored (module enters restart/crash flow)
  • In-flight RPC calls to the old module will timeout and return errors to callers

Graceful Shutdown (Ordered Drain)

The kernel shuts down modules in reverse dependency order to ensure clean drain of in-flight work. Modules that accept external traffic (e.g. gateway) stop first, allowing downstream modules (e.g. agent, LLM) to finish processing before they are stopped.

Phase 1 — Compute shutdown order
  1. Build dependency graph from all running modules' `requires` and `provides`
  2. Topological sort: modules with no dependents (leaves) shut down first
  3. Group into tiers — modules in the same tier can shut down in parallel

Phase 2 — Tier-by-tier shutdown
  For each tier (leaves first → roots last):
    1. Kernel publishes { topic: "system:module:stopping", payload: { moduleId, tier } }
       to the bus (allows other modules to stop sending work to this module)
    2. Kernel → modules in tier: { type: "shutdown" }
    3. Wait up to 5 seconds for voluntary exit
    4. Kill remaining processes in this tier forcefully
    5. Proceed to next tier

Phase 3 — Cleanup
  1. Close message bus
  2. Remove kernel PID file
  3. Close log file handle
  4. Exit kernel process

Example shutdown order:

Tier 1 (leaves — no one depends on them):  module-gateway
Tier 2 (depended on by gateway only):      module-agent, module-sandbox
Tier 3 (depended on by agent):             module-llm, module-memory

Gateway stops accepting connections first. Agent finishes in-flight turns. LLM and memory stop last after all consumers are gone.

Fallback: if the dependency graph is empty or cannot be computed (e.g. no requires declared), the kernel falls back to sending shutdown to all modules simultaneously (v1 behavior).
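The tier computation can be sketched as iteratively peeling off modules with no remaining dependents (illustrative only — `shutdownTiers` and the `Mod` shape are invented here; the real kernel derives the graph from manifests):

```typescript
// Illustrative sketch: group modules into shutdown tiers, leaves first.
interface Mod { id: string; provides: string[]; requires: string[] }

function shutdownTiers(mods: Mod[]): string[][] {
  const provider = new Map<string, string>();
  for (const m of mods) for (const s of m.provides) provider.set(s, m.id);
  // dependents[x] = IDs of modules that require a service provided by x
  const dependents = new Map<string, Set<string>>();
  for (const m of mods) dependents.set(m.id, new Set());
  for (const m of mods)
    for (const s of m.requires) {
      const p = provider.get(s);
      if (p && p !== m.id) dependents.get(p)!.add(m.id);
    }
  const tiers: string[][] = [];
  const remaining = new Set(mods.map((m) => m.id));
  while (remaining.size > 0) {
    const tier = [...remaining].filter((id) =>
      [...dependents.get(id)!].every((d) => !remaining.has(d)),
    );
    if (tier.length === 0) break; // cycle: fall back to simultaneous shutdown
    tier.forEach((id) => remaining.delete(id));
    tiers.push(tier);
  }
  if (remaining.size > 0) tiers.push([...remaining]); // fallback tier
  return tiers;
}
```

Run against the example topology above, this yields gateway first, then agent and sandbox in parallel, then llm and memory.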


8. Service Directory

Each module declares in its manifest:

{
  "provides": ["service-name"],
  "requires": ["other-service"],
  "optionalRequires": ["nice-to-have-service"]
}

Rules:

  • Each service name may only be provided by one module (duplicates are rejected)
  • requires — module will not activate until all required services are available (timeout: 30s)
  • optionalRequires — graceful degradation; module activates even without them, and receives a service_available notification if they come online later
  • Circular dependency detection runs on every activation (DFS graph traversal); a detected cycle kills the module immediately
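The DFS traversal mentioned above can be sketched as follows (non-normative; `findCycle` is a name chosen for illustration, operating on a module-to-dependencies adjacency map):

```typescript
// Illustrative sketch: DFS cycle detection over the dependency graph.
// Returns the cycle path (first node repeated at the end) or null.
function findCycle(graph: Map<string, string[]>): string[] | null {
  const visiting = new Set<string>();
  const done = new Set<string>();
  const path: string[] = [];
  function dfs(node: string): string[] | null {
    if (done.has(node)) return null;
    if (visiting.has(node)) return [...path.slice(path.indexOf(node)), node];
    visiting.add(node);
    path.push(node);
    for (const dep of graph.get(node) ?? []) {
      const cycle = dfs(dep);
      if (cycle) return cycle;
    }
    path.pop();
    visiting.delete(node);
    done.add(node);
    return null;
  }
  for (const node of graph.keys()) {
    const cycle = dfs(node);
    if (cycle) return cycle;
  }
  return null;
}
```

The returned path is what the kernel would log before killing the modules in the cycle.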

Topic Subscription

Topics are the pub/sub channel names used for fire-and-forget messaging. Subscription is declared statically in the module manifest via capabilities.topics:

{
  "capabilities": {
    "topics": ["inbound:message", "agent:turn:end"]
  }
}

When a module sends { type: "ready" }, the kernel reads the capabilities.topics list and registers the module as a subscriber for those topics in the MessageBus.

Rules:

  • A module can only publish to topics listed in its capabilities.topics — enforced by the SecurityEnforcer
  • A module can only receive deliver messages for topics it declared
  • There is no dynamic subscribe/unsubscribe at runtime in v1. Changing topic subscriptions requires a manifest update and module restart (or hot-swap).
  • Topic names are freeform strings. Convention: domain:event (e.g. agent:turn:start, outbound:ws)

9. Kernel Call API (Module → Kernel)

A module calls the kernel by sending a call message. The kernel responds with call:response.

| method | Params | Returns | Description |
| --- | --- | --- | --- |
| config.get | { key } | value | Read a value from the module's own config section. Key is a dot-separated path (e.g. "agents.defaultModel"). |
| config.set | { key, value } | { ok: true } | Write a value to the module's own config section. Validates against schema. Persists to config.yaml atomically. Notifies affected modules via config:update. |
| config.sections | (none) | list of strings | List available config section names (filtered to the module's own section only). |
| module.list | (none) | list of { id, status, provides } | Get all modules with their current status and provided services. |
| module.commands | (none) | list of CommandDeclarations | Get all CLI command declarations from all module manifests. |
| service.discover | (none) | list of { service, moduleId } | List all registered services and their provider modules. |
| service.resolve | { service } | moduleId | Resolve which module provides a specific service. Returns error if service not found. |
| permissions.request | { permissions, reason? } | { granted, pending?, denied?, requestId? } | Request additional runtime permissions from the user. If approval is required, returns pending: true with a requestId for tracking. |
| permissions.check | { permissions } | { granted, missing } | Check which of the requested permissions are currently granted and which are missing. |

Config scoping: a module can only read/write its own config section (identified by configKey in the manifest). Path traversal patterns (.., __proto__, constructor, prototype) are rejected. Violations are rejected and logged as security events.
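The key check can be sketched as a segment-wise filter (illustrative; `isSafeConfigKey` is a name invented here, and whether the key is relative to the module's section is an assumption of this sketch):

```typescript
// Illustrative sketch: reject prototype-pollution and traversal
// segments in a dot-separated config key.
function isSafeConfigKey(key: string): boolean {
  const FORBIDDEN = new Set(["__proto__", "constructor", "prototype"]);
  const parts = key.split(".");
  // An empty segment also covers ".." (".." splits into empty strings).
  return parts.length > 0 && parts.every((p) => p !== "" && !FORBIDDEN.has(p));
}
```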


10. Configuration System

Config file: <home>/.pons/config.yaml

logging:
  level: info            # trace | debug | info | warn | error | fatal
  levels:
    module-agent: debug  # per-module level override

# Module-specific sections below (each module owns its key)
models:
  providers:
    - id: openai
      apiKey: ${OPENAI_API_KEY}

How it works

  1. Each module declares a config schema (using any schema validation library available in the implementation language)
  2. On boot, the kernel discovers and imports all module schemas, merges them into an AppSchema
  3. Loads config.yaml, validates against AppSchema, fills in defaults
  4. On hot-reload signal: compares against previous config, sends config:update only to affected modules
  5. Each module receives only its own section — other sections are never forwarded
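Step 4 (diffing to find affected modules) can be sketched by comparing top-level sections (illustrative; `changedSections` is a name invented here, and deep equality via JSON.stringify is a simplification that is sensitive to key order):

```typescript
// Illustrative sketch: which top-level config sections changed
// between the previous and reloaded config. Only the modules owning
// those sections receive config:update.
function changedSections(
  prev: Record<string, unknown>,
  next: Record<string, unknown>,
): string[] {
  const keys = new Set([...Object.keys(prev), ...Object.keys(next)]);
  return [...keys].filter(
    (k) => JSON.stringify(prev[k]) !== JSON.stringify(next[k]),
  );
}
```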

Schema file security

Before loading a schema file, the kernel performs security checks to prevent malicious schema files from executing arbitrary code:

  1. Path containment — the schema file path must resolve to a location within the module directory (after symlink resolution). Paths escaping the module dir are rejected.
  2. Extension whitelist — only .ts, .js, or .json extensions are accepted
  3. Static pattern scan — the kernel scans the schema file source for forbidden patterns before importing it:
    • Process spawning (e.g. exec, spawn, Command)
    • Network access (e.g. listen, connect, fetch, WebSocket)
    • Dynamic code execution (e.g. eval, dynamic import())
    • File system mutations (e.g. remove, unlink)
    • If any pattern is found, the schema is skipped with a warning — the module still loads but without schema validation
    • The exact pattern list is implementation-specific; each runtime adds its own dangerous APIs to the scan

This is defense-in-depth: the runtime sandbox already restricts module permissions, but schema files may be imported into the kernel process itself, so additional guards are necessary.
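A minimal version of the pattern scan might look like this (the actual pattern list is implementation-specific per the spec; the regexes below are examples chosen here, not the kernel's real list):

```typescript
// Illustrative sketch: scan schema source for forbidden patterns
// before importing it into the kernel process.
const SCHEMA_FORBIDDEN: RegExp[] = [
  /\beval\s*\(/,                            // dynamic code execution
  /\bimport\s*\(/,                          // dynamic import()
  /child_process|\bspawn\s*\(|\bexec\s*\(/, // process spawning
  /\bfetch\s*\(|WebSocket|\blisten\s*\(/,   // network access
  /\bunlink\s*\(|\brmSync\b/,               // file system mutations
];

function schemaSourceIsSafe(source: string): boolean {
  return !SCHEMA_FORBIDDEN.some((re) => re.test(source));
}
```

Per the spec, a failed scan skips the schema with a warning; the module still loads without schema validation.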

Module config declaration (in manifest)

{
  "configKey": "models",
  "configSchema": "./src/config.schema"
}

11. Security Model

Principle: Fail-Closed, Defense-in-Depth

Layers:

  1. Process Sandbox — each module runs as a separate process with the minimum required OS permissions; the runtime should enforce permission boundaries (e.g. sandboxed runtimes, seccomp, containers)
  2. Manifest Hash — a cryptographic hash (e.g. SHA-256) of the module manifest is stored at install time and verified on every load (tamper detection)
  3. Permission Store — permissions are stored in <home>/.pons/permissions.yaml and must be explicitly approved by the user at install time
  4. Runtime Enforcer — every RPC call, pub/sub publish/subscribe, and config access is checked against declared capabilities before being forwarded
  5. Static Audit — source scan at install time for patterns that could bypass sandbox restrictions (see below)

Permission Types (in manifest → permissions)

{
  "permissions": {
    "net": ["api.openai.com"],
    "read": ["~/.pons/", "./workspace/"],
    "write": ["~/.pons/data/"],
    "env": ["OPENAI_API_KEY", "HOME"],
    "run": ["git"],
    "sys": ["hostname"]
  }
}

These are translated into the appropriate sandbox restrictions for the target runtime (e.g. OS-level flags, container policies, seccomp profiles).

Default when no permissions declared: deny all.

Capabilities (RPC and topic access)

{
  "capabilities": {
    "services": ["llm", "memory"],
    "topics": ["agent.task", "agent.result"]
  }
}
  • A module may only call services listed in its capabilities.services
  • A module may only publish/subscribe to topics listed in its capabilities.topics
  • Capabilities are stored in the Permission Store at install approval time — not self-asserted by the module at runtime
  • On ready, the kernel loads capabilities from the store (falling back to manifest if no store entry exists for backward compatibility)
  • Violation → log the event + kill the module

Permission Store (permissions.yaml)

The permission store persists approved permissions and capabilities across kernel restarts.

modules:
  module-agent:
    manifestHash: "sha256:a1b2c3..."       # SHA-256 of module.json at approval time
    firstSpawn: "2026-01-15T10:30:00Z"      # Timestamp of first install
    permissions:                             # Approved OS-level permissions
      net: ["api.openai.com"]
      read: ["~/.pons/"]
      write: ["~/.pons/data/"]
      env: ["HOME"]
    capabilities:                            # Approved IPC-level capabilities
      services: ["llm", "memory"]
      topics: ["inbound:message", "agent:turn:start"]
    dynamicPermissions: {}                   # Runtime-requested permissions (granted)
    pendingRequests: []                      # Queued permission requests (awaiting user)
    deniedRequests: []                       # Denied requests (prevents re-prompting)

Manifest tamper detection flow:

  1. At pons module install — user reviews and approves permissions. SHA-256 hash of module.json is computed and stored.
  2. On every kernel boot — for each module, compute current hash and compare against stored hash.
  3. If mismatch → refuse to load module. Log error: "manifest hash mismatch for module-X — re-install required".
  4. This prevents privilege escalation via silent manifest edits.
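The hash computation and comparison can be sketched directly (illustrative; function names are invented here, but the `sha256:<hex>` format matches the permissions.yaml example above):

```typescript
import { createHash } from "node:crypto";

// Illustrative sketch: compute and verify the manifest hash stored
// at approval time.
function manifestHash(manifestJson: string): string {
  return "sha256:" + createHash("sha256").update(manifestJson).digest("hex");
}

function verifyManifest(manifestJson: string, storedHash: string): boolean {
  return manifestHash(manifestJson) === storedHash;
}
```

Any byte-level edit to module.json after approval changes the digest, so the boot-time comparison fails and the module is refused.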

Runtime permission request flow:

  1. Module sends call("permissions.request", { permissions, reason })
  2. Kernel queues the request and sends a system notification (macOS AppleScript / Linux notify-send) — best-effort
  3. User approves or denies via CLI: pons permissions grant <requestId> / pons permissions deny <requestId>
  4. On grant: permissions added to dynamicPermissions, module restarted with new effective permissions
  5. On deny: request moved to deniedRequests, module receives { granted: false, denied: true }

Static Audit Scanner

At module install time (pons module install), the kernel scans module source files for patterns that could bypass the runtime sandbox. This is advisory only — it does not prevent installation but displays warnings to the user during the approval flow.

Scanned patterns:

  • Node.js API imports: node:fs, node:child_process, node:net, node:http, node:https, node:dgram, node:tls
  • Dynamic require: createRequire
  • Dynamic imports: import('node:...')

Limitations:

  • Cannot detect obfuscated or dynamically constructed imports
  • Does not scan transitive dependencies
  • Applies primarily to JavaScript/TypeScript modules — other runtimes rely on OS-level sandboxing

This scanner is defense-in-depth: it catches accidental sandbox bypasses. Intentional bypasses by a malicious module require OS-level containment (containers, seccomp, etc.).


12. Reload Signals

The kernel listens for OS signals to trigger live updates without downtime:

| Signal | Action |
| --- | --- |
| SIGINT / SIGTERM | Graceful shutdown |
| SIGUSR1 | Config hot-reload (re-read config file) |
| SIGUSR2 | Permission hot-reload (re-read permissions file) |
| SIGHUP | Module discovery + hot-load newly installed modules |

The CLI sends these signals after writing updated files.

Note for non-Unix implementations: on platforms without POSIX signals, equivalent mechanisms (e.g. named pipes, admin HTTP endpoints, file watchers) should be used to trigger the same behaviors. See Section 19 (Known Limitations) for details.


13. Logging

The kernel aggregates its own logs together with logs forwarded from modules via IPC (log and log-group messages).

Log levels: trace, debug, info, warn, error, fatal

Log output:

  • Development: human-readable colorized format to stdout + daily rotating file
  • Production: structured JSON to stdout

Log file path: <home>/.pons/.runtime/logs/kernel-YYYY-MM-DD.log

Each log entry includes at minimum: level, timestamp, module (source), msg.


14. Error Handling

Every failure in the kernel has a defined response. This section is the single source of truth for what happens when things go wrong.

Module failures

| Failure | Kernel response |
| --- | --- |
| Module crashes (exit code ≠ 0) | Log error with last stderr line. Restart with exponential backoff (see Section 7). |
| Module does not send ready within 30s | Kill process. Treat as crash → restart logic. |
| Module fails health check (no pong in 10s) | Retry up to 3 times (one per health interval). After 3 consecutive failures → kill process → restart logic. |
| Module sends unknown message type | Log warning with module ID and message type. Drop message. Module is not killed. |
| Module sends malformed JSON on stdout | Log warning. Treat line as plain text (stderr-like). Module is not killed but may time out if ready was never sent. |
| Module writes binary data on stdout | Same as malformed JSON — logged, dropped, not fatal. |
| Module sends message after being killed | Ignore. Process stdout/stderr may still be buffered; kernel discards messages from modules not in ready or starting state. |
| Two modules declare same service | Second module is rejected at registration. Log error. Module is killed with reason duplicate-service. |
| Circular dependency detected | All modules in the cycle are killed with reason circular-dependency. Cycle path is logged. |
| Module exceeds max restart attempts (5) | Module status set to crashed. No further restarts. Log error. Operator must intervene (pons module restart <id>). |

IPC failures

| Failure | Kernel response |
| --- | --- |
| Write to module stdin fails | Log warning. Message is lost. No retry. Module may be dead — wait for exit event. |
| Module stdout closes unexpectedly | Treat as crash → call onShutdown() if possible, then restart logic. |
| Very large JSON message (>1 MB) | No built-in limit in v1. Implementations should consider adding a configurable max message size. |
| Partial JSON (buffer split mid-line) | The newline-delimited protocol guarantees complete lines. Partial reads are buffered until \n is received. |

RPC failures

| Failure | Kernel response |
| --- | --- |
| RPC target service not found | Immediate error response to caller: { error: "service_not_found" } |
| RPC target module not ready | Immediate error response to caller: { error: "module_not_ready" } |
| RPC timeout (30s) | Error response to caller: { error: "timeout" }. Late responses from target are silently dropped. |
| RPC caller capability violation | Error response to caller: { error: "forbidden" }. Log security violation. |

Config failures

| Failure | Kernel response |
| --- | --- |
| config.yaml missing at boot | Use schema defaults for all sections. Log warning. |
| config.yaml is invalid YAML | Reject load. Keep previous config in memory. Log error with parse details. |
| config.yaml fails schema validation | Log validation errors per field. Fill invalid fields with defaults. Load succeeds with warnings. |
| config.yaml deleted while running | No effect until next hot-reload. On hot-reload: treat as missing → use defaults. |
| Hot-reload fails mid-save | Keep previous config. Log error. No modules are notified. |
| Module requests config outside its scope | Reject call with error. Log security violation. |

Security violations

| Violation | Kernel response |
| --- | --- |
| Module calls undeclared service | Reject RPC. Log violation with caller ID, target service. Kill module. |
| Module publishes to undeclared topic | Reject publish. Log violation. Kill module. |
| Module manifest hash mismatch | Refuse to load module. Log error with module ID. |

15. Limits & Defaults

All timeouts, limits, and defaults in one place. These values should be configurable where noted.

Timeouts

| Parameter | Default | Configurable | Description |
| --- | --- | --- | --- |
| RPC timeout | 30s | Yes (per-call) | Max time to wait for an RPC response |
| Health check interval | 30s | Yes (kernel config) | Time between ping messages |
| Health check response timeout | 10s | Yes (kernel config) | Max time to wait for pong reply |
| Health check max failures | 3 | Yes (kernel config) | Consecutive failures before kill |
| Module ready timeout | 30s | Yes (kernel config) | Max time to wait for ready after spawn |
| Dependency wait timeout | 30s | Yes (kernel config) | Max time to wait for required services |
| Graceful shutdown timeout | 5s | Yes (kernel config) | Time to wait for modules to exit voluntarily |

Limits

| Parameter | Default | Configurable | Description |
| --- | --- | --- | --- |
| Max restart attempts | 5 | Yes (kernel config) | After this, module is marked crashed |
| Max restart backoff | 60s | Yes (kernel config) | Upper bound on exponential backoff delay |
| IPC write queue depth | 512 | No (v1) | If a module's stdin write queue reaches this limit, the module is marked as disconnected and all further messages to it are dropped. This prevents memory exhaustion when a module cannot keep up. |
| Max IPC message size | No limit (v1) | Planned (v2) | Recommended: implementations should cap at 10 MB |
| Max concurrent modules | No limit (v1) | Planned (v2) | Limited by OS process capacity |
| Max IPC string field length | 256 chars | No (v1) | Validated on incoming messages. Fields exceeding this limit cause message rejection. |

Defaults

| Parameter | Default | Description |
| --- | --- | --- |
| Logging level | info | Kernel-wide default |
| Config file | <home>/.pons/config.yaml | Main config location |
| Permissions file | <home>/.pons/permissions.yaml | Approved permissions location |
| Module directory | <home>/.pons/modules/ | Where modules are installed |
| PID file | <home>/.pons/.runtime/kernel.pid | Kernel process ID |
| Default runtime | deno | When manifest omits runtime field |

16. Message Delivery Guarantees

The kernel provides at-most-once delivery for all message types. This is a deliberate design choice — the kernel is a router, not a broker.

What is guaranteed

  • Messages are delivered in the order they are sent from a single publisher to a single subscriber (FIFO per-pair)
  • RPC responses are matched to their requests by ID
  • A module only receives deliver messages for topics it has subscribed to
  • A module only receives rpc_request messages for services it provides

What is NOT guaranteed

  • If a module's IPC write queue exceeds 512 messages, the module is marked disconnected and messages are dropped (see Section 15)
  • If a module crashes between receiving a message and processing it, the message is lost
  • Pub/sub has no acknowledgment — the kernel does not know if a subscriber processed the message
  • RPC responses that arrive after timeout are silently discarded
  • Message ordering across multiple publishers is not guaranteed

Implications for module developers

Modules that require stronger guarantees must implement them at the application level: idempotency keys for RPC calls, acknowledgment protocols on top of pub/sub, persistent queues for critical events.
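For example, idempotency keys can be layered on top of at-most-once RPC entirely in module code: the caller attaches a stable key to each logical operation, and the service replays the cached result for retries instead of repeating the side effect. A hedged sketch (the class and key scheme are illustrative, not part of any Pons SDK):

```typescript
// Application-level idempotency on top of at-most-once RPC (illustrative).
// A retry that arrives with an already-served key gets the cached result;
// the underlying handler runs at most once per key.
class IdempotentService {
  private served = new Map<string, unknown>(); // idempotencyKey -> cached result

  constructor(private handler: (params: unknown) => unknown) {}

  handle(idempotencyKey: string, params: unknown): unknown {
    if (this.served.has(idempotencyKey)) {
      return this.served.get(idempotencyKey); // duplicate retry: replay result
    }
    const result = this.handler(params);
    this.served.set(idempotencyKey, result);
    return result;
  }
}
```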


17. Protocol Versioning

Kernel ↔ Module compatibility

The kernel and modules negotiate protocol compatibility via the manifest and the init message.

The init message includes a protocolVersion field:

{ "type": "init", "protocolVersion": "1.0", "config": {}, "workspacePath": "...", "projectRoot": "..." }

The module's manifest includes a minProtocolVersion field (optional, defaults to "1.0"):

{
  "id": "module-agent",
  "minProtocolVersion": "1.0"
}

Version matching rules:

  • Major version must match exactly (kernel v1.x only loads modules requiring v1.x)
  • Minor version: kernel must be ≥ module's minimum (kernel v1.3 can run module requiring v1.0)
  • If incompatible: kernel refuses to load module, logs error with both versions
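These rules reduce to a small comparison, sketched here as an illustrative helper (not actual kernel code):

```typescript
// Versions are "major.minor" strings. Compatible iff the major versions
// match exactly and the kernel's minor version is >= the module's minimum.
function isProtocolCompatible(kernelVersion: string, minProtocolVersion: string): boolean {
  const [kMajor, kMinor] = kernelVersion.split(".").map(Number);
  const [mMajor, mMinor] = minProtocolVersion.split(".").map(Number);
  return kMajor === mMajor && kMinor >= mMinor;
}
```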

Backward compatibility contract

  • New message types may be added in minor versions — modules must ignore unknown types gracefully
  • Existing message fields are never removed or renamed within a major version
  • New optional fields may be added to existing messages in minor versions

18. Observability

Kernel health

The CLI provides pons status which reports:

  • Kernel process: running / stopped (PID, uptime)
  • Per-module status: ready, starting, crashed, stopped
  • Per-module last health check: timestamp, pass/fail
  • Service directory: which services are registered and by whom

Structured log events

Beyond regular log messages, the kernel emits structured events for key lifecycle moments:

| Event | Logged when | Key fields |
| --- | --- | --- |
| `kernel.boot` | Kernel starts | version, moduleCount, configPath |
| `kernel.shutdown` | Kernel stops | reason, uptime |
| `module.spawn` | Module process started | moduleId, pid, runtime |
| `module.ready` | Module sent ready | moduleId, startupMs |
| `module.crash` | Module exited unexpectedly | moduleId, exitCode, stderr, restartCount |
| `module.killed` | Kernel killed a module | moduleId, reason |
| `rpc.timeout` | RPC call exceeded timeout | callerId, targetService, method, timeoutMs |
| `security.violation` | Capability/permission check failed | moduleId, action, target |
| `config.reload` | Config hot-reload completed | changedSections, affectedModules |

All events are written to the kernel log. In production mode (JSON output), they can be ingested by external monitoring systems (ELK, Datadog, Grafana Loki, etc.).


19. Known Limitations (v1)

This section documents what v1 explicitly does not support. These are intentional scope decisions, not bugs.

  • No high availability. The kernel is a single process, single host. If it dies, all modules die. Use an external process supervisor (systemd, launchd) for automatic restart.
  • No clustering. Modules cannot span multiple machines. All modules must run on the same host as the kernel.
  • No message persistence. Pub/sub is in-memory only. Messages are lost on crash. Modules that need durability must implement their own persistence.
  • Limited backpressure. If a module's IPC write queue exceeds 512 messages, the module is marked disconnected and messages are dropped. The kernel does not slow down publishers — it drops messages to the slow consumer.
  • No module-level resource limits. The kernel does not enforce CPU/memory quotas per module. This is delegated to OS-level mechanisms (cgroups, containers).
  • No horizontal scaling. A single kernel can only be as powerful as the host it runs on.
  • No built-in authentication for IPC. Modules are trusted once approved at install time. There is no per-message signing or encryption on the stdin/stdout channel.
  • No Windows signal support. SIGUSR1/SIGUSR2/SIGHUP are Unix-only. Windows implementations must use alternative mechanisms (see Section 12).

20. File System Layout

```
<home>/.pons/                     # or $PONS_HOME
├── modules/
│   └── {module-id}/              # Module directory
│       ├── module.json           # Manifest
│       └── src/                  # Source code
├── config.yaml                   # Global config
├── permissions.yaml              # Approved permissions
└── .runtime/
    ├── kernel.pid                # Kernel PID file
    └── logs/
        └── kernel-YYYY-MM-DD.log
```

21. Multi-Runtime Module Support

Modules can be written in any programming language. The kernel does not assume a specific runtime — it reads the runtime field from the module manifest and spawns the appropriate process.

Supported runtimes

| runtime | Spawn command | Sandbox mechanism |
| --- | --- | --- |
| `deno` | `deno run [permission-flags] <entry>` | Deno permission flags (`--allow-net`, `--allow-read`, etc.) |
| `node` | `node <entry>` | Process-level restrictions (future: policy file) |
| `bun` | `bun run <entry>` | Process-level restrictions |
| `go` | `./<entry>` (pre-compiled binary) | OS-level (seccomp, containers) |
| `rust` | `./<entry>` (pre-compiled binary) | OS-level (seccomp, containers) |
| `python` | `python <entry>` | OS-level (seccomp, containers) |
| `php` | `php <entry>` | OS-level (seccomp, containers) |
| `binary` | `./<entry>` (any executable) | OS-level (seccomp, containers) |

If runtime is omitted, the kernel defaults to deno for backward compatibility.

How it works

  1. Kernel reads runtime and entry from module.json
  2. Constructs the spawn command based on the runtime table above
  3. For deno runtime: translates permissions to Deno CLI flags
  4. For all other runtimes: spawns the process directly; sandbox enforcement is delegated to OS-level mechanisms
  5. Regardless of runtime, IPC is identical: newline-delimited JSON over stdin/stdout
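Step 2 can be sketched as a mapping from the runtime table to a spawn spec. The names below are illustrative, and the translation of the permissions block into Deno flags is elided to a pre-built flag list:

```typescript
// Derive a spawn command from the manifest's runtime field (sketch).
interface SpawnSpec { cmd: string; args: string[]; }

function buildSpawnCommand(runtime: string, entry: string, denoFlags: string[] = []): SpawnSpec {
  switch (runtime) {
    case "deno":   return { cmd: "deno", args: ["run", ...denoFlags, entry] };
    case "node":   return { cmd: "node", args: [entry] };
    case "bun":    return { cmd: "bun", args: ["run", entry] };
    case "python": return { cmd: "python", args: [entry] };
    case "php":    return { cmd: "php", args: [entry] };
    case "go":
    case "rust":
    case "binary": return { cmd: entry, args: [] }; // pre-compiled executable
    // Omitted runtime falls back to the deno default described above.
    default:       return { cmd: "deno", args: ["run", ...denoFlags, entry] };
  }
}
```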

SDK per language

Each supported language needs a thin SDK (~200–400 lines) that implements the IPC protocol. See the SDK Specification (docs/specs/sdk.md) for the full contract.

Config schema for non-TypeScript modules

Modules written in languages other than TypeScript cannot export a Zod schema. Instead, they declare their config schema as a JSON Schema file:

```json
{
  "configKey": "my-module",
  "configSchema": "./config.schema.json"
}
```

The kernel accepts both formats: a schema file with a .json extension is loaded as JSON Schema, while .ts or .js files are imported as Zod schemas. Both are treated identically for validation purposes.
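The extension dispatch can be sketched as follows (illustrative helper names; the actual importing of the Zod module or parsing of the JSON Schema file is omitted):

```typescript
// Decide how a module's declared config schema should be loaded,
// based purely on the schema file's extension (sketch).
type SchemaKind = "json-schema" | "zod";

function schemaKindFor(schemaPath: string): SchemaKind {
  if (schemaPath.endsWith(".json")) return "json-schema";
  if (schemaPath.endsWith(".ts") || schemaPath.endsWith(".js")) return "zod";
  throw new Error(`unsupported config schema file: ${schemaPath}`);
}
```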


22. Module Manifest (module.json)

```json
{
  "id": "module-agent",
  "name": "Agent Module",
  "version": "1.0.0",
  "runtime": "deno",
  "entry": "src/runner.ts",
  "priority": 10,
  "provides": ["agent"],
  "requires": ["llm", "memory"],
  "optionalRequires": ["sandbox"],
  "configKey": "agents",
  "configSchema": "./src/config.schema.ts",
  "permissions": {
    "net": [],
    "read": ["~/.pons/"],
    "write": ["~/.pons/data/"],
    "env": ["HOME"],
    "run": [],
    "sys": []
  },
  "capabilities": {
    "services": ["llm", "memory"],
    "topics": ["agent.task", "agent.result", "agent.status"]
  }
}
```

Fields:

  • id — unique identifier (slug, lowercase, hyphens)
  • name — human-readable name
  • version — semver string
  • runtime — execution runtime (see Section 21). Defaults to deno if omitted.
  • entry — path to the entry point, relative to the module directory
  • priority — spawn order (lower = earlier)
  • provides / requires / optionalRequires — service graph declarations
  • configKey — top-level key in config.yaml that this module owns
  • configSchema — path to the config schema definition (.ts/.js for Zod, .json for JSON Schema)
  • permissions — OS-level access requests (approved at install time)
  • capabilities — IPC-level access declarations (enforced at runtime)
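A minimal check of the required fields might look like this. This is a hedged sketch; the kernel's real Phase 1 validation also covers manifest hash verification and entry-point path resolution:

```typescript
// Illustrative manifest validation against the field rules above.
interface ModuleManifest {
  id: string;
  name: string;
  permissions?: Record<string, string[]>;
  entry?: string;
}

function validateManifest(m: ModuleManifest): string[] {
  const errors: string[] = [];
  // id: slug, lowercase, hyphens
  if (!/^[a-z][a-z0-9]*(-[a-z0-9]+)*$/.test(m.id)) errors.push("id must be a lowercase hyphenated slug");
  if (!m.name) errors.push("name is required");
  if (!m.permissions) errors.push("permissions block is required");
  // entry must stay within the module directory (cheap check only)
  if (m.entry !== undefined && (m.entry.startsWith("/") || m.entry.includes("..")))
    errors.push("entry must resolve inside the module directory");
  return errors;
}
```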

23. What the Kernel Does NOT Do

  • Does not store or queue messages (no persistence, no replay)
  • Does not contain any business logic
  • Does not integrate with LLMs
  • Does not manage user workspace or files
  • Does not expose an HTTP API (that is the responsibility of a gateway module)
  • Does not implement skills or agents — those live in modules

24. CLI — Service Registration

The CLI includes a service subcommand for registering the kernel as a system service with automatic restart on failure and autostart on boot.

Commands: pons service install, uninstall, start, stop, status, logs

Requirements:

  • Auto-detect host platform: systemd (Linux), launchd (macOS), Task Scheduler (Windows)
  • User-level install by default (no root required); system-level install optional with privilege elevation
  • Service must restart kernel on failure (restart delay: 5s)
  • uninstall reverses all registration steps without removing Pons data or config
  • Platform-specific service file formats and commands are implementation details of the CLI, not the kernel

25. Extension Points

Adding a new kernel call:

  1. Add a handler in the kernel's call dispatcher
  2. Add capability/permission checks in the enforcer if the call accesses sensitive resources
  3. Document it in Section 9 of this specification

Adding error handling for a new failure mode:

  1. Determine the failure category (module, IPC, RPC, config, security)
  2. Add a row to the appropriate table in Section 14
  3. Implement the response in the relevant component

Changing the IPC protocol:

  1. Update the shared type definitions (SDK / shared types package)
  2. Update message handlers in the lifecycle manager
  3. Bump the kernel version and document the change

Adding a new runtime:

  1. Add a row to the runtime table in Section 21
  2. Add the spawn command logic in the lifecycle manager's forkProcess() function
  3. Implement or point to an SDK for that language
  4. Document sandbox mechanism (if any)

Porting the kernel itself to another language: The kernel can be implemented in any language capable of spawning child processes and communicating over stdin/stdout. The IPC protocol (newline-delimited JSON), manifest format (JSON), and config format (YAML) are the only cross-language contracts that must be preserved exactly. Everything else — class names, file structure, libraries — is an implementation detail.


Appendix A. IPC Session Examples

These examples show the exact JSON messages exchanged between the kernel and modules during real scenarios. Use them as a reference when implementing the protocol.
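All of these sessions share one framing rule: each message is a single JSON object terminated by a newline. An incremental decoder for that framing might look like this (assumed names, not SDK code):

```typescript
// Buffers partial stdin chunks and emits one parsed message per complete
// line of newline-delimited JSON (sketch).
function makeFrameDecoder(onMessage: (msg: unknown) => void): (chunk: string) => void {
  let buffer = "";
  return (chunk: string) => {
    buffer += chunk;
    let newline: number;
    while ((newline = buffer.indexOf("\n")) >= 0) {
      const line = buffer.slice(0, newline);
      buffer = buffer.slice(newline + 1);
      if (line.trim().length > 0) onMessage(JSON.parse(line));
    }
  };
}
```

Because a chunk boundary can land anywhere, the decoder must buffer until a newline arrives rather than parse chunks directly.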

A.1. Module startup — happy path

KERNEL → module-llm:   {"type":"init","protocolVersion":"1.0","config":{"models":{"providers":[{"id":"anthropic","type":"anthropic"}]}},"workspacePath":"/home/user/.pons/workspace","projectRoot":"/home/user/project"}
module-llm → KERNEL:   {"type":"ready","manifest":{"id":"module-llm","name":"LLM Module","version":"1.0.0","provides":["llm"],"requires":[],"configKey":"models"},"capabilities":{"services":[],"topics":["llm:usage"]}}
KERNEL → module-llm:   {"type":"deps_ready"}

A.2. Module startup — waiting for dependencies

KERNEL → module-agent:  {"type":"init","protocolVersion":"1.0","config":{"agents":{"defaultModel":"claude-sonnet-4"}},"workspacePath":"/home/user/.pons/workspace","projectRoot":"/home/user/project"}
module-agent → KERNEL:  {"type":"ready","manifest":{"id":"module-agent","provides":["agent"],"requires":["llm","memory"]},"capabilities":{"services":["llm","memory"],"topics":["inbound:message","agent:turn:start","agent:turn:end"]}}

  (kernel holds module-agent in "waiting" — llm and memory not yet available)
  (module-llm and module-memory start and register their services)

KERNEL → module-agent:  {"type":"deps_ready"}

A.3. Health check

KERNEL → module-llm:   {"type":"ping"}
module-llm → KERNEL:   {"type":"pong"}

A.4. Pub/sub — agent publishes turn start, gateway receives it

module-agent → KERNEL:  {"type":"publish","topic":"agent:turn:start","payload":{"agentId":"support-agent","sessionId":"sess-001","runId":"run-abc"}}

  (kernel looks up subscribers of "agent:turn:start" — finds module-gateway)

KERNEL → module-gateway: {"type":"deliver","id":"msg-7f3a","topic":"agent:turn:start","payload":{"agentId":"support-agent","sessionId":"sess-001","runId":"run-abc"}}

A.5. RPC — agent calls LLM to generate text

module-agent → KERNEL:  {"type":"rpc_request","id":"rpc-001","service":"llm","method":"generateText","params":{"provider":"anthropic","model":"claude-sonnet-4","system":"You are a helpful assistant.","messages":[{"role":"user","content":"Hello"}]}}

  (kernel checks agent's capabilities — "llm" is declared — forwards to module-llm)

KERNEL → module-llm:   {"type":"rpc_request","id":"rpc-001","from":"module-agent","service":"llm","method":"generateText","params":{"provider":"anthropic","model":"claude-sonnet-4","system":"You are a helpful assistant.","messages":[{"role":"user","content":"Hello"}]}}

  (module-llm processes the request, calls the LLM API)

module-llm → KERNEL:   {"type":"rpc_response","id":"rpc-001","result":{"content":"Hello! How can I help you today?","usage":{"promptTokens":15,"completionTokens":9,"totalTokens":24}}}

  (kernel forwards response back to caller)

KERNEL → module-agent: {"type":"rpc_response","id":"rpc-001","result":{"content":"Hello! How can I help you today?","usage":{"promptTokens":15,"completionTokens":9,"totalTokens":24}}}

A.6. RPC — capability violation

module-sandbox → KERNEL: {"type":"rpc_request","id":"rpc-002","service":"memory","method":"store","params":{"content":"secret data"}}

  (kernel checks sandbox's capabilities — "memory" is NOT declared)

KERNEL → module-sandbox: {"type":"rpc_response","id":"rpc-002","error":"forbidden"}

  (kernel logs security violation, kills module-sandbox)

A.7. Kernel call — module reads its own config

module-agent → KERNEL:  {"type":"call","id":"call-001","method":"config.get","params":{"key":"agents.defaultModel"}}
KERNEL → module-agent:  {"type":"call:response","id":"call-001","result":"claude-sonnet-4"}

A.8. Config hot-reload

  (operator runs: pons config set models.providers[0].model claude-opus-4 → CLI writes config.yaml, sends SIGUSR1)

KERNEL → module-llm:   {"type":"config:update","config":{"models":{"providers":[{"id":"anthropic","type":"anthropic","model":"claude-opus-4"}]}},"changedSections":["models"]}

A.9. Graceful shutdown (ordered drain)

  (operator runs: pons kernel stop → sends SIGTERM)
  (kernel computes shutdown order from dependency graph)

  ── Tier 1: module-gateway (leaf — no one depends on it) ──

KERNEL → subscribers of system:module:stopping: {"type":"deliver","id":"sys-001","topic":"system:module:stopping","payload":{"moduleId":"module-gateway","tier":1}}
KERNEL → module-gateway: {"type":"shutdown"}
  (module-gateway stops accepting connections, cleans up, exits within 5s)

  ── Tier 2: module-agent, module-sandbox ──

KERNEL → subscribers of system:module:stopping: {"type":"deliver","id":"sys-002","topic":"system:module:stopping","payload":{"moduleId":"module-agent","tier":2}}
KERNEL → module-agent:   {"type":"shutdown"}
KERNEL → module-sandbox: {"type":"shutdown"}
  (both finish in-flight work, exit within 5s)

  ── Tier 3: module-llm, module-memory (roots) ──

KERNEL → module-llm:    {"type":"shutdown"}
KERNEL → module-memory:  {"type":"shutdown"}
  (both exit)
  (kernel removes PID file, closes logs, exits)
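The tier order falls out of the dependency graph: a module may stop only once every module that requires it has stopped. A sketch with a hypothetical graph keyed by module id (the real kernel resolves requires through the service directory, not module ids):

```typescript
// Compute shutdown tiers from requires declarations (illustrative).
// Tier N contains every not-yet-stopped module whose dependents have all
// stopped: leaves go first, shared roots go last.
function shutdownTiers(requires: Record<string, string[]>): string[][] {
  const modules = Object.keys(requires);
  // dependents.get(x) = modules that declared x in their requires
  const dependents = new Map<string, Set<string>>();
  for (const m of modules) dependents.set(m, new Set());
  for (const [mod, deps] of Object.entries(requires)) {
    for (const dep of deps) dependents.get(dep)?.add(mod);
  }
  const stopped = new Set<string>();
  const tiers: string[][] = [];
  while (stopped.size < modules.length) {
    const tier = modules.filter(
      (m) => !stopped.has(m) && [...dependents.get(m)!].every((d) => stopped.has(d)),
    );
    if (tier.length === 0) throw new Error("circular dependency in module graph");
    for (const m of tier) stopped.add(m);
    tiers.push(tier);
  }
  return tiers;
}
```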

A.10. Full scenario — user sends message to agent

This shows the complete message flow when a user sends "Hello" via the gateway to an agent:

  ── Step 1: Gateway receives HTTP/WS request, publishes to bus ──

module-gateway → KERNEL: {"type":"publish","topic":"inbound:message","payload":{"agentId":"support-agent","senderId":"user-42","channelType":"chat","channelId":"ws-conn-7","content":"Hello"}}

  ── Step 2: Kernel delivers to module-agent (subscribed to inbound:message) ──

KERNEL → module-agent:  {"type":"deliver","id":"msg-a1b2","topic":"inbound:message","payload":{"agentId":"support-agent","senderId":"user-42","channelType":"chat","channelId":"ws-conn-7","content":"Hello"}}

  ── Step 3: Agent assembles context and calls LLM via RPC ──

module-agent → KERNEL:  {"type":"rpc_request","id":"rpc-010","service":"llm","method":"generateText","params":{"provider":"anthropic","model":"claude-sonnet-4","system":"You are a support agent...","messages":[{"role":"user","content":"Hello"}],"tools":[{"name":"remember","description":"Save to memory","parameters":{}}]}}
KERNEL → module-llm:    {"type":"rpc_request","id":"rpc-010","from":"module-agent","service":"llm","method":"generateText","params":{"provider":"anthropic","model":"claude-sonnet-4","system":"You are a support agent...","messages":[{"role":"user","content":"Hello"}],"tools":[{"name":"remember","description":"Save to memory","parameters":{}}]}}

  ── Step 4: LLM responds ──

module-llm → KERNEL:    {"type":"rpc_response","id":"rpc-010","result":{"content":"Hi there! How can I help you today?","usage":{"promptTokens":42,"completionTokens":11,"totalTokens":53}}}
KERNEL → module-agent:  {"type":"rpc_response","id":"rpc-010","result":{"content":"Hi there! How can I help you today?","usage":{"promptTokens":42,"completionTokens":11,"totalTokens":53}}}

  ── Step 5: Agent publishes response to gateway via bus ──

module-agent → KERNEL:  {"type":"publish","topic":"outbound:ws","payload":{"type":"stream:final","agentId":"support-agent","sessionId":"sess-001","channelId":"ws-conn-7","content":"Hi there! How can I help you today?"}}
KERNEL → module-gateway: {"type":"deliver","id":"msg-c3d4","topic":"outbound:ws","payload":{"type":"stream:final","agentId":"support-agent","sessionId":"sess-001","channelId":"ws-conn-7","content":"Hi there! How can I help you today?"}}

  ── Step 6: Agent persists transcript via RPC ──

module-agent → KERNEL:  {"type":"rpc_request","id":"rpc-011","service":"transcripts","method":"append","params":{"sessionId":"sess-001","messages":[{"role":"user","content":"Hello"},{"role":"assistant","content":"Hi there! How can I help you today?"}]}}
KERNEL → module-memory:  {"type":"rpc_request","id":"rpc-011","from":"module-agent","service":"transcripts","method":"append","params":{"sessionId":"sess-001","messages":[{"role":"user","content":"Hello"},{"role":"assistant","content":"Hi there! How can I help you today?"}]}}
module-memory → KERNEL:  {"type":"rpc_response","id":"rpc-011","result":{"ok":true}}
KERNEL → module-agent:   {"type":"rpc_response","id":"rpc-011","result":{"ok":true}}

  ── Step 7: Agent emits turn:end ──

module-agent → KERNEL:  {"type":"publish","topic":"agent:turn:end","payload":{"agentId":"support-agent","sessionId":"sess-001","runId":"run-abc","usage":{"promptTokens":42,"completionTokens":11,"totalTokens":53},"durationMs":1250}}