Description
Problem Statement
With the openshell-ocsf crate built and tested (see #392), we need to wire it into the sandbox supervisor. This means:
- replacing all 93 file-level log sites with typed OCSF events;
- setting up dual-file output (openshell.log shorthand + openshell-ocsf.log JSONL);
- adding a SandboxConfig proto message that lets the gateway toggle OCSF JSONL output, hot-reloaded in the sandbox.
Proposed Design
This issue covers all integration work: proto changes, gateway config plumbing, sandbox subscriber wiring, config poll hot-reload, migration of every log site from ad-hoc info!()/warn!() to builder → ocsf_emit!(), OCSF profile enrichment, and E2E verification.
The full design is documented in .opencode/plans/ocsf-log-export.md. Key sections: "SandboxConfig: Gateway → Sandbox Operational Config", "Tracing Layer Integration", "Implementation Plan", "Delivery Plan — Part 2".
Architecture
    tracing event
        │
        ▼
    OcsfEvent struct (built in openshell-sandbox using openshell-ocsf builders)
        │
        ├──► OcsfShorthandLayer ──► /var/log/openshell.log (always on)
        │    (openshell-ocsf)   └──► gRPC log push to gateway
        │
        └──► OcsfJsonlLayer ──► /var/log/openshell-ocsf.log (toggle via SandboxConfig)
             (openshell-ocsf), wrapped in reload::Layer for hot-reload
SandboxConfig Proto
    message SandboxConfig {
      uint64 config_revision = 1;
      LoggingConfig logging = 2;
      // Future: DiagnosticsConfig, FeatureFlags, BroadcastMessage
    }
    message LoggingConfig {
      bool ocsf_enabled = 1;
      // Future: log_level_override, rotation
    }
Added to GetSandboxPolicyResponse as optional SandboxConfig sandbox_config = 4. Its config_revision is tracked independently of the policy version. The message is designed as a general-purpose gateway → sandbox operational config channel.
Log Site Migration Scope
93 file-level log sites across 18 source files:
| Event Class | Count | Primary Files |
|---|---|---|
| Network Activity [4001] | 19 | proxy.rs, bypass_monitor.rs |
| HTTP Activity [4002] | 7 | proxy.rs, l7/relay.rs |
| SSH Activity [4007] | 10 | ssh.rs |
| Process Activity [1007] | 4 | lib.rs, process.rs |
| Detection Finding [2004] | 9 | ssh.rs, opa.rs, l7/relay.rs, proxy.rs, bypass_monitor.rs |
| Application Lifecycle [6002] | 18 | main.rs, lib.rs, netns.rs |
| Device Config State Change [5019] | 24 | lib.rs |
| Base Event [0] | 20 | netns.rs, mechanistic_mapper.rs, proxy.rs, bypass_monitor.rs |
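The class UIDs in the table also determine two derived OCSF fields. category_uid is the thousands digit of class_uid, and type_uid is class_uid * 100 + activity_id. The sketch below is illustrative only. EventClass is a hypothetical enum; the real constants belong to openshell-ocsf.

```rust
/// OCSF event classes used by the sandbox, keyed by class_uid (hypothetical enum).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum EventClass {
    Base = 0,
    ProcessActivity = 1007,
    DetectionFinding = 2004,
    NetworkActivity = 4001,
    HttpActivity = 4002,
    SshActivity = 4007,
    DeviceConfigStateChange = 5019,
    ApplicationLifecycle = 6002,
}

impl EventClass {
    pub fn class_uid(self) -> u32 {
        self as u32
    }

    /// OCSF groups classes into categories by the thousands digit of class_uid.
    pub fn category_uid(self) -> u32 {
        self.class_uid() / 1000
    }

    /// OCSF type_uid combines class and activity: class_uid * 100 + activity_id.
    pub fn type_uid(self, activity_id: u32) -> u32 {
        self.class_uid() * 100 + activity_id
    }
}
```

For example, an HTTP Activity event with activity_id 3 gets type_uid 400203.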
Dual-emit events: BYPASS_DETECT (Network Activity + Detection Finding), NSSH1 nonce replay (SSH Activity + Detection Finding).
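A dual-emit site is a single log site that pushes two events for one occurrence. The sketch below uses BYPASS_DETECT as the example. Event and emit_bypass_detect are hypothetical stand-ins, and a Vec plays the role of the tracing dispatch. The observation_point_id and MITRE values come from this plan.

```rust
/// Minimal stand-in for OCSF events; the real types live in openshell-ocsf.
#[derive(Clone, Debug, PartialEq)]
pub enum Event {
    NetworkActivity { dst: String, observation_point_id: u8 },
    DetectionFinding { title: String, technique: &'static str, tactic: &'static str },
}

/// Hypothetical helper: one BYPASS_DETECT site emits exactly two events.
pub fn emit_bypass_detect(dst: &str, sink: &mut Vec<Event>) {
    // Network side: the connection as observed inline (observation_point_id = 3).
    sink.push(Event::NetworkActivity {
        dst: dst.to_string(),
        observation_point_id: 3,
    });
    // Finding side: the same occurrence, tagged with MITRE ATT&CK T1090.003 / TA0011.
    sink.push(Event::DetectionFinding {
        title: format!("proxy bypass attempt to {dst}"),
        technique: "T1090.003",
        tactic: "TA0011",
    });
}
```

Keeping both pushes in one function makes the "exactly 2 events" acceptance check a simple length assertion.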
Dependencies
- Blocked by #392 (feat(ocsf): create openshell-ocsf crate — standalone OCSF event types, formatters, and tracing layers). The openshell-ocsf crate must be complete: all types, builders, formatters, and layers exist and are tested.
Order of Battle
Each step depends on prior steps unless noted.
Step 1: Proto changes (~0.5 day)
- Add SandboxConfig and LoggingConfig messages to proto/sandbox.proto: message SandboxConfig { uint64 config_revision = 1; LoggingConfig logging = 2; } message LoggingConfig { bool ocsf_enabled = 1; }
- Add optional SandboxConfig sandbox_config = 4 to GetSandboxPolicyResponse
- Regenerate Rust proto bindings (mise run proto:gen or equivalent)
- Verify proto compilation succeeds and generated Rust types are accessible
- Verify existing proto tests still pass
Done when: Proto compiles. SandboxConfig and LoggingConfig types exist in generated Rust code. GetSandboxPolicyResponse has an optional sandbox_config field. Existing tests pass.
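For orientation, here is a hand-written approximation of the generated Rust types. It assumes prost-style codegen, where a singular message field becomes Option<T>. Existing GetSandboxPolicyResponse fields are omitted. The ocsf_enabled helper is hypothetical. It shows the intended default: false when sandbox_config or logging is absent.

```rust
/// Approximation of the generated LoggingConfig (assumes prost-style codegen).
#[derive(Clone, Debug, Default, PartialEq)]
pub struct LoggingConfig {
    pub ocsf_enabled: bool,
}

/// Approximation of the generated SandboxConfig; message fields become Option<T>.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct SandboxConfig {
    pub config_revision: u64,
    pub logging: Option<LoggingConfig>,
}

/// Only the new field 4 is modeled; the existing response fields are omitted.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct GetSandboxPolicyResponse {
    pub sandbox_config: Option<SandboxConfig>,
}

/// Hypothetical helper: effective OCSF toggle, defaulting to false when the
/// gateway (e.g. an older one) omits sandbox_config or its logging sub-message.
pub fn ocsf_enabled(resp: &GetSandboxPolicyResponse) -> bool {
    resp.sandbox_config
        .as_ref()
        .and_then(|cfg| cfg.logging.as_ref())
        .map_or(false, |logging| logging.ocsf_enabled)
}
```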
Step 2: Gateway config plumbing (~1-1.5 days)
- In openshell-server, read ocsf_logging_enabled from gateway config sources (priority: YAML > CLI flag > env var OPENSHELL_OCSF_LOGGING)
- Maintain a config_revision: u64 counter in the gateway, initialized to 1 on startup and incremented on any config change
- Populate SandboxConfig in every GetSandboxPolicyResponse with the current config_revision and ocsf_enabled value
- When ocsf_logging_enabled is not configured, default to false
- Add unit tests: config parsing from each source, revision counter increments, response population
Done when: Gateway populates SandboxConfig in every GetSandboxPolicyResponse. Config revision starts at 1 and increments correctly. Tests verify config source priority and default behavior.
Depends on: Step 1.
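The two gateway-side rules in this step are source precedence (YAML > CLI flag > env var > false) and a revision counter that starts at 1. A compact sketch of both follows. The function and type names are hypothetical. The set of env var spellings that parse_env_bool accepts is an assumption, not something this plan specifies.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Resolve ocsf_logging_enabled with this plan's precedence:
/// YAML > CLI flag > OPENSHELL_OCSF_LOGGING env var > default false.
pub fn resolve_ocsf_enabled(yaml: Option<bool>, cli: Option<bool>, env: Option<&str>) -> bool {
    yaml.or(cli)
        .or_else(|| env.and_then(parse_env_bool))
        .unwrap_or(false)
}

/// Assumed env parsing: common true/false spellings; anything else is ignored.
fn parse_env_bool(value: &str) -> Option<bool> {
    match value.trim().to_ascii_lowercase().as_str() {
        "1" | "true" | "yes" | "on" => Some(true),
        "0" | "false" | "no" | "off" => Some(false),
        _ => None,
    }
}

/// Gateway-side config revision: starts at 1 and is bumped on every config change.
pub struct ConfigRevision(AtomicU64);

impl ConfigRevision {
    pub fn new() -> Self {
        Self(AtomicU64::new(1))
    }

    pub fn current(&self) -> u64 {
        self.0.load(Ordering::SeqCst)
    }

    /// Record a config change and return the new revision.
    pub fn bump(&self) -> u64 {
        self.0.fetch_add(1, Ordering::SeqCst) + 1
    }
}
```

A caller would pass std::env::var("OPENSHELL_OCSF_LOGGING").ok().as_deref() as the env argument.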
Step 3: Sandbox subscriber wiring (~1-1.5 days)
- Add openshell-ocsf as a dependency of openshell-sandbox in Cargo.toml
- In main.rs, replace the existing fmt::Full file layer for /var/log/openshell.log with OcsfShorthandLayer
- Set up OcsfJsonlLayer for /var/log/openshell-ocsf.log, wrapped in tracing_subscriber::reload::Layer
- Initialize the JSONL layer as None (disabled) or Some(...) (enabled) based on the initial SandboxConfig from the first GetSandboxPolicy response
- Create SandboxContext from sandbox config values (sandbox ID, name, image, hostname, proxy address) and store it for use by all log sites
- Store the reload::Handle for the JSONL layer for use in the config poll loop
- Wire the subscriber: registry().with(shorthand_layer).with(jsonl_reload_layer).with(stdout_layer).with(log_push_layer)
- Add integration tests: subscriber setup with OCSF on/off; verify shorthand is always active and JSONL is conditional
Done when: Sandbox starts with new subscriber stack. OcsfShorthandLayer writes to openshell.log. OcsfJsonlLayer writes to openshell-ocsf.log only when enabled. SandboxContext created and accessible. Existing stdout and log push layers remain functional.
Depends on: Steps 1, 2.
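In tracing-subscriber, Option<L> is itself a Layer. Wrapping it in reload::Layer::new yields a layer plus a reload::Handle whose modify method can swap None and Some at runtime. To keep this runnable without dependencies, the sketch below models that behavior with std only. Stack, JsonlHandle and MemSink are hypothetical stand-ins, not the real layer types.

```rust
use std::sync::{Arc, Mutex, RwLock};

/// In-memory sink standing in for a log file in this model.
#[derive(Default)]
pub struct MemSink(Mutex<Vec<String>>);

impl MemSink {
    pub fn push(&self, line: String) {
        self.0.lock().unwrap().push(line);
    }

    pub fn lines(&self) -> Vec<String> {
        self.0.lock().unwrap().clone()
    }
}

type JsonlSlot = Arc<RwLock<Option<Arc<MemSink>>>>;

/// Models reload::Handle: a cloneable handle to the optional JSONL layer.
#[derive(Clone)]
pub struct JsonlHandle(JsonlSlot);

impl JsonlHandle {
    /// Like reload::Handle::modify: mutate the Option in place under a write lock.
    pub fn modify(&self, f: impl FnOnce(&mut Option<Arc<MemSink>>)) {
        let mut slot = self.0.write().unwrap();
        f(&mut *slot);
    }
}

/// Models the subscriber stack: shorthand always on, JSONL behind the reloadable slot.
pub struct Stack {
    shorthand: Arc<MemSink>,
    jsonl: JsonlSlot,
}

impl Stack {
    pub fn new(shorthand: Arc<MemSink>, initial_jsonl: Option<Arc<MemSink>>) -> (Self, JsonlHandle) {
        let jsonl: JsonlSlot = Arc::new(RwLock::new(initial_jsonl));
        (Self { shorthand, jsonl: Arc::clone(&jsonl) }, JsonlHandle(jsonl))
    }

    /// Fan one event out to every active layer (toy formatting, no JSON escaping).
    pub fn dispatch(&self, event: &str) {
        self.shorthand.push(format!("[short] {event}"));
        if let Some(jsonl) = self.jsonl.read().unwrap().as_ref() {
            jsonl.push(format!("{{\"message\":\"{event}\"}}"));
        }
    }
}
```

The shorthand sink sees every event. The JSONL sink sees only events dispatched while its slot holds Some.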
Step 4: Config poll integration (~0.5-1 day)
- Add SandboxConfig handling to the existing policy poll loop in lib.rs
- Track current_config_revision: u64 in sandbox state (initialized to 0)
- On each poll response: if sandbox_config is present and config_revision > current_config_revision, apply the changes
- For the OCSF toggle, use jsonl_reload_handle.modify(|layer| ...) to hot-reload:
  - Enable: create the JSONL file + non-blocking writer + Some(OcsfJsonlLayer)
  - Disable: None
- Emit a CONFIG:UPDATED event (via ConfigStateChangeBuilder) recording the toggle change
- Handle an absent sandbox_config (older gateway) gracefully: no-op, keep the current config
- Add unit tests: revision tracking, toggle on/off via reload handle, backward compat with absent config
Done when: The config poll loop correctly tracks revisions and hot-reloads the JSONL layer. Toggling ocsf_enabled in gateway config starts or stops JSONL output at runtime without a sandbox restart; the JSONL file is created on enable. A CONFIG:UPDATED event is emitted on change. An older gateway (no sandbox_config) does not break the sandbox.
Depends on: Steps 2, 3.
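The revision gate can be isolated as a pure decision function, which makes the unit tests in this step easy to write. The sketch below is hypothetical: PollState and JsonlAction are illustrative names. The sandbox_config payload is reduced to a (config_revision, ocsf_enabled) tuple, with None meaning an older gateway that omits the field.

```rust
/// What the poll loop should do with the JSONL reload handle after one response.
#[derive(Debug, PartialEq, Eq)]
pub enum JsonlAction {
    Keep,
    Enable,
    Disable,
}

/// Sandbox-side config state; revision 0 means "nothing applied yet".
#[derive(Debug, Default)]
pub struct PollState {
    pub current_config_revision: u64,
    pub ocsf_enabled: bool,
}

impl PollState {
    /// config is (config_revision, ocsf_enabled), or None if sandbox_config was absent.
    pub fn on_poll(&mut self, config: Option<(u64, bool)>) -> JsonlAction {
        // Older gateway: no sandbox_config, keep whatever is currently active.
        let Some((revision, enabled)) = config else {
            return JsonlAction::Keep;
        };
        // Stale or repeated revision: nothing new to apply.
        if revision <= self.current_config_revision {
            return JsonlAction::Keep;
        }
        self.current_config_revision = revision;
        let action = match (self.ocsf_enabled, enabled) {
            (false, true) => JsonlAction::Enable,
            (true, false) => JsonlAction::Disable,
            _ => JsonlAction::Keep,
        };
        self.ocsf_enabled = enabled;
        action
    }
}
```

In the sandbox, Enable and Disable would translate into jsonl_reload_handle.modify calls that set Some(...) or None, followed by a CONFIG:UPDATED event. Keep is a no-op.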
Step 5: Log site migration — Network + HTTP events (25 sites) (~1.5-2 days)
- Refactor all 19 Network Activity [4001] log sites in proxy.rs and bypass_monitor.rs:
  - CONNECT allow (L4-only), CONNECT_L7 (allow, L7 follows), CONNECT deny → NetworkActivityBuilder
  - BYPASS_DETECT network event → NetworkActivityBuilder (with observation_point_id=3)
  - Proxy listen, connection errors, relay errors → NetworkActivityBuilder
  - FORWARD parse/reject/upstream errors → NetworkActivityBuilder
  - SSRF blocks (allowed_ips failed, invalid config, internal IP) → NetworkActivityBuilder
- Refactor all 7 HTTP Activity [4002] log sites in proxy.rs and l7/relay.rs:
  - FORWARD allow/deny → HttpActivityBuilder with HTTP method → activity_id mapping
  - L7_REQUEST → HttpActivityBuilder
  - SSRF blocks → HttpActivityBuilder
  - Non-inference request at inference.local → HttpActivityBuilder
- For each refactored site: construct the builder → ocsf_emit!(), then remove the old info!()/warn!() call
- Verify shorthand output matches expected patterns for proxy events
Done when: All 25 Network + HTTP log sites use builder → ocsf_emit!(). No ad-hoc info!()/warn!() calls remain for these event types. cargo test passes.
Depends on: Steps 3, 4 (subscriber wired, SandboxContext available).
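The HTTP method → activity_id mapping for FORWARD events could be as small as the sketch below. The numeric values reflect our reading of the OCSF HTTP Activity enum (0 Unknown, 1 Connect through 8 Trace, 99 Other). Check them against the 1.7.0 schema before relying on them. PATCH and other methods fall through to Other here, which may differ from the schema. http_activity_id is a hypothetical name.

```rust
/// Map an HTTP method to an OCSF HTTP Activity [4002] activity_id.
/// Values are our reading of the schema; verify against OCSF 1.7.0.
pub fn http_activity_id(method: &str) -> u8 {
    // Methods are case-sensitive on the wire; normalizing here is a lenient choice.
    match method.to_ascii_uppercase().as_str() {
        "" => 0, // Unknown
        "CONNECT" => 1,
        "DELETE" => 2,
        "GET" => 3,
        "HEAD" => 4,
        "OPTIONS" => 5,
        "POST" => 6,
        "PUT" => 7,
        "TRACE" => 8,
        _ => 99, // Other (PATCH lands here unless the schema defines a value)
    }
}
```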
Step 6: Log site migration — SSH + Process + Finding events (22 sites) (~1.5-2 days)
- Refactor 10 SSH Activity [4007] log sites in ssh.rs:
  - SSH listen, handshake read/verify/accepted/failed → SshActivityBuilder
  - NSSH1 nonce replay (SSH side of dual-emit) → SshActivityBuilder
  - direct-tcpip refuse/fail, unsupported subsystem → SshActivityBuilder
- Refactor 4 Process Activity [1007] log sites in lib.rs and process.rs:
  - Process started → ProcessActivityBuilder with launch_type_id=1 (Spawn)
  - Process exited → ProcessActivityBuilder with exit_code
  - Process timed out → ProcessActivityBuilder with forced kill
  - SIGTERM failed → ProcessActivityBuilder
- Refactor 9 Detection Finding [2004] log sites:
  - NSSH1 nonce replay finding (dual-emit with SSH) → DetectionFindingBuilder with MITRE T1550/TA0008
  - BYPASS_DETECT finding (dual-emit with Network) → DetectionFindingBuilder with MITRE T1090.003/TA0011, remediation.desc from hint
  - Unsafe disk policy → DetectionFindingBuilder
  - L7 policy validation warnings (2x) → DetectionFindingBuilder
  - SQL L7 not implemented, HTTP parse error → DetectionFindingBuilder
  - Inference interception, upstream chunk error → DetectionFindingBuilder
- Verify dual-emit: NSSH1 replay produces 1 SSH Activity + 1 Detection Finding
- Verify dual-emit: BYPASS_DETECT produces 1 Network Activity (step 5) + 1 Detection Finding
Done when: All 22 log sites use builder → ocsf_emit!(). Dual-emit sites produce exactly 2 events each. No ad-hoc calls remain for these event types. cargo test passes.
Depends on: Step 5 (BYPASS_DETECT network event already migrated; finding side added here).
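The four Process Activity sites differ mainly in which fields they set. The sketch below maps each site to field values. It assumes OCSF activity_id 1 = Launch and 2 = Terminate, status_id 1 = Success and 2 = Failure, and launch_type_id 1 = Spawn (as stated in this step). ProcessEvent, ProcessFields and the forced_kill flag are hypothetical; the real builder API may look different.

```rust
/// The four process log sites handled in this step (hypothetical enum).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ProcessEvent {
    Started,
    Exited { exit_code: i32 },
    TimedOut,
    SigtermFailed,
}

/// Values a ProcessActivityBuilder call might receive (hypothetical shape).
#[derive(Debug, PartialEq, Eq)]
pub struct ProcessFields {
    pub activity_id: u8,            // 1 = Launch, 2 = Terminate (assumed)
    pub launch_type_id: Option<u8>, // 1 = Spawn; only set on launch
    pub exit_code: Option<i32>,
    pub forced_kill: bool,          // hypothetical flag for the timeout path
    pub status_id: u8,              // 1 = Success, 2 = Failure
}

pub fn process_fields(ev: ProcessEvent) -> ProcessFields {
    // Baseline for the termination-side sites; each arm overrides what differs.
    let terminate = ProcessFields {
        activity_id: 2,
        launch_type_id: None,
        exit_code: None,
        forced_kill: false,
        status_id: 1,
    };
    match ev {
        ProcessEvent::Started => ProcessFields { activity_id: 1, launch_type_id: Some(1), ..terminate },
        ProcessEvent::Exited { exit_code } => ProcessFields { exit_code: Some(exit_code), ..terminate },
        ProcessEvent::TimedOut => ProcessFields { forced_kill: true, ..terminate },
        // The supervisor's SIGTERM attempt failed, so the terminate action is a failure.
        ProcessEvent::SigtermFailed => ProcessFields { status_id: 2, ..terminate },
    }
}
```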
Step 7: Log site migration — Lifecycle + Config + Base events (46 sites) (~1.5-2 days)
- Refactor 18 Application Lifecycle [6002] log sites in main.rs, lib.rs, netns.rs:
  - Sandbox start, log file fallback → AppLifecycleBuilder
  - SSH server ready/failed → AppLifecycleBuilder
  - Provider env fetch success/failure → AppLifecycleBuilder
  - TLS init success/failure → AppLifecycleBuilder
  - SIGCHLD handler, image validation, platform warning → AppLifecycleBuilder
  - Config validation (zero/invalid interval) → AppLifecycleBuilder
  - Bypass detection setup: installing rules, rules installed, iptables not found, install failed → AppLifecycleBuilder
- Refactor 24+ Device Config State Change [5019] log sites in lib.rs:
  - Policy: load/fetch/reload/fallback/enrichment/poll/report → ConfigStateChangeBuilder
  - Inference routes: file load/gateway fetch/update/bundle status → ConfigStateChangeBuilder
- Refactor 20 Base Event [0] log sites in netns.rs, mechanistic_mapper.rs, proxy.rs, bypass_monitor.rs:
  - Netns create/cleanup (6 events) → BaseEventBuilder
  - Denial flush events → BaseEventBuilder
  - DNS resolution failures → BaseEventBuilder
  - Proxy operational errors → BaseEventBuilder
  - Bypass detection operational events (rule failures, dmesg failures) → BaseEventBuilder
Done when: All remaining 46 log sites use builder → ocsf_emit!(). Every file-level log statement in the sandbox (93 total) now goes through OCSF builders. No ad-hoc info!()/warn!() calls remain for events that reach the file layer. cargo test passes.
Depends on: Steps 5, 6.
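Each migrated site follows the same builder → ocsf_emit!() shape. The self-contained sketch below uses a policy reload as the example. ConfigStateChange, its builder methods, the string activity field, and this ocsf_emit! definition are all hypothetical stand-ins. The real builder and macro live in openshell-ocsf and emit through tracing, not into a thread-local Vec.

```rust
use std::cell::RefCell;

/// Hypothetical stand-in for the event a ConfigStateChangeBuilder produces.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct ConfigStateChange {
    pub activity: String, // illustrative label; real OCSF events carry activity_id
    pub message: String,
    pub severity_id: u8,  // 1 = Informational
}

#[derive(Default)]
pub struct ConfigStateChangeBuilder {
    ev: ConfigStateChange,
}

impl ConfigStateChangeBuilder {
    pub fn new() -> Self {
        Self::default()
    }
    pub fn activity(mut self, activity: &str) -> Self {
        self.ev.activity = activity.to_string();
        self
    }
    pub fn message(mut self, message: impl Into<String>) -> Self {
        self.ev.message = message.into();
        self
    }
    pub fn severity_id(mut self, severity_id: u8) -> Self {
        self.ev.severity_id = severity_id;
        self
    }
    pub fn build(self) -> ConfigStateChange {
        self.ev
    }
}

thread_local! {
    /// In-memory sink standing in for the tracing dispatch.
    pub static EMITTED: RefCell<Vec<ConfigStateChange>> = RefCell::new(Vec::new());
}

/// Hypothetical ocsf_emit!: finish the builder and hand the event to the sink.
macro_rules! ocsf_emit {
    ($builder:expr) => {
        EMITTED.with(|sink| sink.borrow_mut().push($builder.build()))
    };
}
```

A call site that used to read info!(..., "policy reloaded") would become ocsf_emit!(ConfigStateChangeBuilder::new().activity("UPDATED").message("policy reloaded").severity_id(1)).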
Step 8: Profile enrichment (~1 day)
- Verify and apply OCSF profiles to all events:
  - Container profile: all events include container from SandboxContext::container()
  - Network Proxy profile: Network Activity and HTTP Activity include proxy_endpoint from SandboxContext::proxy_endpoint()
  - Security Control profile: Network, HTTP, SSH, and Detection Finding events include action_id, disposition_id, firewall_rule where applicable
  - Host profile: all events include device from SandboxContext::device()
- Populate v1.6.0+ fields:
  - observation_point_id=2 (Destination) on proxy network events
  - observation_point_id=3 (Inline) on BYPASS_DETECT network events
  - is_src_dst_assignment_known=true on all network events
- Verify the metadata.profiles array is correctly populated per event class
- Schema validation tests confirm profile fields are present
Done when: Every event includes correct profile fields. metadata.profiles lists profiles applied. Schema validation confirms profile fields present.
Depends on: Steps 5, 6, 7.
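The profile rules above reduce to a per-class lookup that can fill metadata.profiles and be asserted in tests. A sketch follows, assuming the OCSF profile identifiers are container, host, network_proxy and security_control; confirm them against the 1.7.0 schema. profiles_for and observation_point_id are hypothetical helpers. The observation point values are the ones this plan specifies.

```rust
/// OCSF profiles this plan applies to an event class, by class_uid (hypothetical helper).
pub fn profiles_for(class_uid: u32) -> Vec<&'static str> {
    // Container and Host apply to every event.
    let mut profiles = vec!["container", "host"];
    // Network Proxy: Network Activity and HTTP Activity.
    if matches!(class_uid, 4001 | 4002) {
        profiles.push("network_proxy");
    }
    // Security Control: Network, HTTP, SSH, and Detection Finding.
    if matches!(class_uid, 4001 | 4002 | 4007 | 2004) {
        profiles.push("security_control");
    }
    profiles
}

/// observation_point_id per this plan: 2 = Destination (proxy), 3 = Inline (BYPASS_DETECT).
pub fn observation_point_id(is_bypass_detect: bool) -> u8 {
    if is_bypass_detect { 3 } else { 2 }
}
```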
Step 9: Log push channel update (~0.5 day)
- Ensure the gRPC log push layer receives shorthand text from OcsfShorthandLayer output
- Verify the message field in the SandboxLogLine proto contains the shorthand line
- Verify the openshell sandbox logs CLI command displays shorthand output correctly
- Verify the TUI log panel renders shorthand lines
- No proto changes needed: same SandboxLogLine message, better formatted content
Done when: Log push delivers shorthand text to gateway. openshell sandbox logs shows shorthand lines. TUI displays correctly. No regressions.
Depends on: Steps 3, 5-7.
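One way to guarantee the pushed text matches the file is to format each shorthand line once and send the same string to both destinations. The sketch below shows that with std only. ShorthandTee is hypothetical, and an mpsc channel stands in for the gRPC push task that would copy the line into SandboxLogLine.message. The sample line is invented for illustration.

```rust
use std::io::Write;
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical tee: one formatted shorthand line goes to the file and to log push.
pub struct ShorthandTee<W: Write> {
    file: W,
    push: Sender<String>,
}

impl<W: Write> ShorthandTee<W> {
    pub fn new(file: W) -> (Self, Receiver<String>) {
        let (tx, rx) = channel();
        (Self { file, push: tx }, rx)
    }

    pub fn emit(&mut self, line: &str) -> std::io::Result<()> {
        writeln!(self.file, "{line}")?;
        // If the push task is gone, file logging must keep working, so ignore send errors.
        let _ = self.push.send(line.to_string());
        Ok(())
    }

    pub fn into_inner(self) -> W {
        self.file
    }
}
```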
Step 10: E2E verification (~1 day)
- Run mise run e2e and verify all existing tests pass (update assertions as needed for the shorthand format)
- Verify dual-file output:
  - /var/log/openshell.log contains shorthand text lines (not JSON, not the old format)
  - /var/log/openshell-ocsf.log (when enabled) contains OCSF JSONL with correct schema structure
  - Line counts match between files (accounting for dual-emit events)
- Verify config toggle: enable/disable OCSF via gateway config, observe JSONL file creation/cessation without sandbox restart
- Update test_sandbox_policy.py assertions for shorthand patterns
- Verify log push and TUI end-to-end
Done when: mise run e2e passes. Manual verification of dual-file output, log push, TUI display, and config toggle all succeed.
Depends on: All prior steps.
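The dual-file checks can be expressed as one helper over the two file contents. The sketch below assumes JSONL was enabled for the whole run, so every event, including both halves of each dual-emit, produced one line in each file. It uses a cheap shape check; a real test would parse each JSONL line with a JSON library such as serde_json. check_dual_file is a hypothetical name.

```rust
/// Hypothetical E2E helper: compare openshell.log (shorthand) with openshell-ocsf.log (JSONL).
/// Returns the number of events on success.
pub fn check_dual_file(shorthand: &str, jsonl: &str) -> Result<usize, String> {
    let short: Vec<&str> = shorthand.lines().filter(|l| !l.trim().is_empty()).collect();
    let json: Vec<&str> = jsonl.lines().filter(|l| !l.trim().is_empty()).collect();

    if short.len() != json.len() {
        return Err(format!(
            "line count mismatch: {} shorthand vs {} jsonl",
            short.len(),
            json.len()
        ));
    }
    for (i, line) in short.iter().enumerate() {
        // Shorthand lines are plain text; a leading '{' suggests JSON leaked into the file.
        if line.trim_start().starts_with('{') {
            return Err(format!("shorthand line {} looks like JSON", i + 1));
        }
    }
    for (i, line) in json.iter().enumerate() {
        // Shape check only: each JSONL line should at least look like a JSON object.
        let t = line.trim();
        if !(t.starts_with('{') && t.ends_with('}')) {
            return Err(format!("jsonl line {} is not a JSON object", i + 1));
        }
    }
    Ok(short.len())
}
```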
Step 11: Cleanup (~0.5 day)
- Remove all replaced ad-hoc info!()/warn!() calls superseded by ocsf_emit!()
- Verify no OCSF formatting or serialization code exists in openshell-sandbox; all of it lives in openshell-ocsf
- Verify openshell-sandbox only contains: builder call sites, subscriber wiring in main.rs, SandboxContext creation, config poll integration
- Run mise run pre-commit and fix any issues
- Run cargo clippy across the workspace: zero warnings
- Update architecture/ docs to describe the new logging architecture (dual-file output, OCSF adoption, SandboxConfig)
- Update docs/ for any user-facing log format changes
Done when: No dead code remains. mise run pre-commit passes. cargo clippy zero warnings. Architecture docs updated. No OCSF logic in sandbox outside call sites and wiring.
Depends on: Steps 5-10.
Acceptance Criteria
- All 93 file-level log sites in openshell-sandbox emit typed OCSF events via builder → ocsf_emit!()
- /var/log/openshell.log contains shorthand-formatted text (not the old tracing::fmt::Full output)
- /var/log/openshell-ocsf.log is created only when SandboxConfig.logging.ocsf_enabled=true and contains valid OCSF JSONL
- The OCSF JSONL toggle is hot-reloadable via SandboxConfig.config_revision; no sandbox restart needed
- Backward compatible: the sandbox works correctly with a gateway that does not populate sandbox_config
- gRPC log push delivers shorthand text; the TUI and openshell sandbox logs display it correctly
- Dual-emit events (BYPASS_DETECT, NSSH1 replay) produce exactly 2 OCSF events each
- All OCSF profiles (Container, Network Proxy, Security Control, Host) are correctly applied
- mise run e2e passes with updated assertions
- mise run pre-commit and cargo clippy pass with zero issues
- Architecture docs are updated to describe dual-file logging and OCSF adoption
Estimated Effort
~9-12 days (after Part 1 is complete)
References
- Full plan: .opencode/plans/ocsf-log-export.md
- Part 1 (dependency): #392, feat(ocsf): create openshell-ocsf crate — standalone OCSF event types, formatters, and tracing layers
- OCSF v1.7.0 schema: https://schema.ocsf.io/1.7.0/