feat: report supervisor startup errors to CLI via gateway (#289)

## Problem

When a sandbox starts with `--from` (a custom container image), the sandbox supervisor can fail to start for various reasons:

1. **Policy parse failure** — the container ships a stale `policy.yaml` with removed fields (e.g. `inference`). `discover_policy_from_path` catches the error and silently falls back to the restrictive default. The user sees a working sandbox with the wrong policy and no explanation.
2. **User validation failure** — the container doesn't have a `sandbox` user. The pod crashloops with no actionable message.
3. **Network namespace setup failure** — same crashloop-with-no-message outcome.
4. **Any other fatal startup error** — OPA engine construction, TLS generation, etc. Same outcome.

The root cause is that the supervisor has no channel to report errors back to the gateway. When `run_sandbox` returns `Err(...)`, the process exits, the pod restarts, and the CRD watcher sees `DependenciesNotReady` → `Provisioning` forever. The CLI eventually times out (120s) with a generic message.

### Regression context

Commit `e3ea796` removed the `inference` field from the `PolicyFile` serde struct (which uses `deny_unknown_fields`). Any container image that still ships a `policy.yaml` with an `inference:` section fails to parse. The error is swallowed and the restrictive default (all network blocked) is synced to the gateway as the baseline. The intent is NOT to add backward-compat serde fields — the YAML schema is the schema. Instead, the error should be reported to the user.
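The difference between swallowing the parse error and propagating it can be sketched with the standard library alone. This is an illustrative stand-in, not the real code: `parse_policy` is a toy check of top-level keys that only mimics what `deny_unknown_fields` does, and `Policy`, `PolicyError`, `discover_lenient`, `discover_strict`, and the allowed-key list are all hypothetical names.

```rust
/// Hypothetical stand-in for the parsed policy (the real `PolicyFile` is a serde struct).
#[derive(Debug, PartialEq)]
struct Policy {
    source: &'static str,
}

/// Hypothetical error type carrying the parse failure message.
#[derive(Debug, PartialEq)]
struct PolicyError(String);

/// Toy parser: rejects any top-level `key:` that is not in `allowed`,
/// mimicking the effect of `deny_unknown_fields` without parsing real YAML.
fn parse_policy(text: &str, allowed: &[&str]) -> Result<Policy, PolicyError> {
    for line in text.lines() {
        // Skip blank lines, comments, and indented (nested) lines.
        if line.trim().is_empty()
            || line.starts_with('#')
            || line.starts_with(char::is_whitespace)
        {
            continue;
        }
        if let Some((key, _)) = line.split_once(':') {
            let key = key.trim();
            if !allowed.contains(&key) {
                return Err(PolicyError(format!("unknown field `{}`", key)));
            }
        }
    }
    Ok(Policy { source: "container" })
}

/// Behavior described in the regression: the error is swallowed and the
/// restrictive default is used instead.
fn discover_lenient(text: &str, allowed: &[&str]) -> Policy {
    parse_policy(text, allowed).unwrap_or(Policy { source: "restrictive-default" })
}

/// Proposed behavior: the error propagates to the caller, which can report it.
fn discover_strict(text: &str, allowed: &[&str]) -> Result<Policy, PolicyError> {
    parse_policy(text, allowed)
}
```

With a policy that still contains `inference:`, the lenient variant returns the default with no trace of the failure, while the strict variant returns an error naming the offending field.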

## Solution

Add a generalized supervisor startup error reporting path:

1. Sandbox supervisor hits a fatal startup error
2. Before exiting, it calls a new `ReportSupervisorError` gRPC method (fire-and-forget, 5s timeout)
3. Gateway sets `phase=Error` with condition `Ready=False, reason=SupervisorError, message=<the error>`, persists, notifies watch bus, acks immediately, then spawns async K8s resource deletion
4. CLI watch loop (existing) sees `Error` phase, displays the real error message, exits
5. Supervisor exits non-zero — pod is already being deleted, no crashloop
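Step 2 above, a single bounded attempt that never blocks the exit, can be sketched as follows. This is a minimal illustration under stated assumptions: it uses a std thread and channel in place of the actual gRPC client, and `report_once` and `ReportOutcome` are hypothetical names that do not come from the codebase.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Outcome of one best-effort report attempt (hypothetical type).
#[derive(Debug, PartialEq)]
enum ReportOutcome {
    Acked,
    Failed(String),
    TimedOut,
}

/// Runs `send` on a helper thread and waits at most `timeout` for the result.
/// The caller proceeds to exit regardless of which outcome is returned.
fn report_once<F>(send: F, timeout: Duration) -> ReportOutcome
where
    F: FnOnce() -> Result<(), String> + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The receiver may already be gone if the caller timed out; ignore that.
        let _ = tx.send(send());
    });
    match rx.recv_timeout(timeout) {
        Ok(Ok(())) => ReportOutcome::Acked,
        Ok(Err(e)) => ReportOutcome::Failed(e),
        Err(mpsc::RecvTimeoutError::Timeout) => ReportOutcome::TimedOut,
        Err(mpsc::RecvTimeoutError::Disconnected) => {
            ReportOutcome::Failed("reporter thread ended without a result".to_string())
        }
    }
}
```

In the supervisor the closure would wrap the `ReportSupervisorError` call and the timeout would be the 5s mentioned above. The return value is only useful for logging, since the process exits non-zero in every case.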

### Key design decisions

- **Fire-and-forget RPC** — supervisor tries once with a short timeout, then exits regardless
- **Async resource deletion** — gateway acks the RPC immediately, then deletes K8s resources in a background task started with `tokio::spawn` (avoids a deadlock where the pod can't terminate while it waits for the RPC response)
- **`handle_applied` preservation** — CRD watcher must not overwrite a gateway-set `SupervisorError` phase with `Provisioning`
- **Error truncation** — messages truncated to ~4KB on the gateway
- **`discover_policy_from_path` becomes fallible** — parse/validation errors propagate instead of being silently swallowed
- **Zero CLI changes** — existing watch loop at `run.rs:2030` already handles `Error` phase
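The error truncation decision needs some care with UTF-8, because cutting a string at an arbitrary byte offset can split a multi-byte character. A minimal sketch, assuming a 4096-byte cap (the issue only says "~4KB") and the hypothetical names `MAX_ERROR_BYTES` and `truncate_message`:

```rust
/// Assumed cap; the issue only specifies "~4KB".
const MAX_ERROR_BYTES: usize = 4096;

/// Returns a prefix of `msg` that is at most `max_bytes` long and ends on a
/// character boundary, so the result is always valid UTF-8.
fn truncate_message(msg: &str, max_bytes: usize) -> &str {
    if msg.len() <= max_bytes {
        return msg;
    }
    // Walk back to the nearest char boundary; index 0 is always a boundary,
    // so this loop terminates.
    let mut end = max_bytes;
    while !msg.is_char_boundary(end) {
        end -= 1;
    }
    &msg[..end]
}
```

The gateway would apply this to the reported message before storing it in the condition, for example `truncate_message(&reported, MAX_ERROR_BYTES)`.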

## Implementation Plan

Full plan with code sketches: [`architecture/plans/supervisor-startup-error-reporting.md`](https://github.com/anomalyco/openshell/blob/main/architecture/plans/supervisor-startup-error-reporting.md)

### Sequenced work items

| Step | Scope | Files | Independently Mergeable? |
|------|-------|-------|--------------------------|
| 1 | Proto: add `ReportSupervisorError` RPC + messages | `proto/openshell.proto` | Yes (additive) |
| 2 | Supervisor: fire-and-forget gRPC client function | `grpc_client.rs`, `lib.rs` | Yes (dead code until step 3) |
| 3 | Supervisor: wrap `main.rs` error path to call error reporter | `main.rs` | Yes (fire-and-forget, gateway rejects until step 5) |
| 4 | Supervisor: make `discover_policy_from_path` fallible | `lib.rs` (3 functions) | Yes (behavior change, testable standalone) |
| 5 | Gateway: implement `ReportSupervisorError` handler | `grpc.rs` | Yes (no callers until steps 2+3 are deployed) |
| 6 | Gateway: protect `handle_applied` from overwriting `SupervisorError` | `sandbox/mod.rs` | Yes (purely defensive) |

**Recommended merge order:** 1 → 2+5 (parallel) → 3+6 (parallel) → 4 (last — this is the behavior change)

## Acceptance Criteria

- [ ] Sandbox created with `--from` using a container with a stale/malformed `policy.yaml` shows the parse error to the user during `openshell sandbox create`
- [ ] Sandbox transitions to `Error` phase with the actual error message visible via `openshell sandbox get`
- [ ] K8s resources are cleaned up automatically (no orphaned crashlooping pods)
- [ ] If the gateway is unreachable when the supervisor tries to report, the supervisor still exits (no hang)
- [ ] Other supervisor startup errors (user validation, netns setup, etc.) are also reported through the same channel
- [ ] `handle_applied` does not overwrite a `SupervisorError` phase with `Provisioning`
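The last criterion amounts to a guard on the phase transition. A minimal sketch of that guard, assuming simplified stand-in types: `Phase`, `Status`, and `merge_watcher_status` are hypothetical, the `Ready` phase is included only for illustration, and the real `handle_applied` works on the CRD status rather than on these structs.

```rust
/// Simplified phases; `Ready` is an assumption added for illustration.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Phase {
    Provisioning,
    Ready,
    Error,
}

/// Simplified status: a phase plus the condition reason, if any.
#[derive(Debug, Clone, PartialEq)]
struct Status {
    phase: Phase,
    reason: Option<String>,
}

/// Decides which status to persist when the watcher derives `derived`
/// from the CRD while `current` is already stored.
fn merge_watcher_status(current: &Status, derived: Status) -> Status {
    let supervisor_error = current.phase == Phase::Error
        && current.reason.as_deref() == Some("SupervisorError");
    if supervisor_error && derived.phase == Phase::Provisioning {
        // Keep the gateway-set error; the watcher's view is stale.
        current.clone()
    } else {
        derived
    }
}
```

Only the `SupervisorError` to `Provisioning` overwrite is blocked here. Every other transition passes through unchanged, which keeps the guard purely defensive.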
