Bayesian-Agent treats each Skill or SOP as a hypothesis about agent success under a task context. The method is harness-agnostic: it can bootstrap Skills in a full run, repair existing agents incrementally, or transfer the same posterior Skill registry across compatible harnesses.
P(success | theta, C, h)
- theta: frozen base model parameters
- C: inference condition, including prompt, memory, tools, retrieved context, and harness feedback
- h: Skill/SOP hypothesis
The framework does not train the base model and does not require replacing the agent runtime. It changes the inference environment by maintaining posterior-weighted Skill context that can be injected through adapters.
Each agent run emits TrajectoryEvidence:
- task id
- skill id
- context
- success or failure outcome
- token counts
- latency and turns
- failure mode
- task metadata
Evidence should be action-verified. For example, a benchmark grader, unit test, or deterministic checker should decide whether a run succeeded.
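As a rough illustration, the evidence record and the action-verified outcome could be modeled as below. This is a minimal sketch, not the project's actual schema: the field names, the split of token counts into `prompt_tokens` and `completion_tokens`, and the `make_evidence` helper are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional


@dataclass(frozen=True)
class TrajectoryEvidence:
    """One agent run, reduced to the fields listed above (names are hypothetical)."""
    task_id: str
    skill_id: str
    context: str
    success: bool
    prompt_tokens: int = 0
    completion_tokens: int = 0
    latency_s: float = 0.0
    turns: int = 0
    failure_mode: Optional[str] = None
    task_metadata: Dict[str, Any] = field(default_factory=dict)


def make_evidence(task_id: str, skill_id: str, context: str, output: str,
                  checker: Callable[[str], bool],
                  failure_mode: Optional[str] = None,
                  **stats: Any) -> TrajectoryEvidence:
    """Build evidence whose outcome is decided by a deterministic checker."""
    # The checker (grader, unit test, ...) decides success, not the agent.
    success = bool(checker(output))
    return TrajectoryEvidence(
        task_id=task_id,
        skill_id=skill_id,
        context=context,
        success=success,
        # A failure mode is only recorded for failed runs.
        failure_mode=None if success else (failure_mode or "unknown"),
        **stats,
    )
```

A caller would pass the agent's raw output together with whatever checker the benchmark provides, for example `make_evidence("t1", "s1", "arithmetic", "4", lambda out: out.strip() == "4", turns=3)`.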
Each Skill uses a Beta posterior:
success: alpha_i += 1
failure: beta_i += 1
E[p_i] = alpha_i / (alpha_i + beta_i)
The registry also tracks cost, context distribution, and failure modes. These statistics guide what gets injected into future context.
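The update rule above can be sketched in a few lines. The class name `SkillPosterior` and the uniform Beta(1, 1) starting prior are assumptions for illustration; the source does not state which prior the registry uses.

```python
from dataclasses import dataclass


@dataclass
class SkillPosterior:
    """Beta posterior over one Skill's success probability."""
    alpha: float = 1.0  # assumed prior pseudo-count for successes
    beta: float = 1.0   # assumed prior pseudo-count for failures

    def update(self, success: bool) -> None:
        # success: alpha_i += 1, failure: beta_i += 1
        if success:
            self.alpha += 1
        else:
            self.beta += 1

    @property
    def mean(self) -> float:
        # E[p_i] = alpha_i / (alpha_i + beta_i)
        return self.alpha / (self.alpha + self.beta)

    @property
    def variance(self) -> float:
        # Standard Beta variance; shrinks as evidence accumulates.
        n = self.alpha + self.beta
        return (self.alpha * self.beta) / (n * n * (n + 1))


posterior = SkillPosterior()
for outcome in (True, True, False, True):
    posterior.update(outcome)
# Three successes and one failure on a Beta(1, 1) prior give Beta(4, 2).
```

Under these assumptions the posterior mean after the four runs is 4 / 6, and the variance gives one possible signal for how uncertain the estimate still is.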
The default policy maps posterior state to actions:
- compress: repeated success suggests the Skill is stable
- patch: failures cluster around a recurring failure mode
- split: evidence spans different contexts
- retire: failures dominate the posterior
- explore: evidence is still sparse or uncertain
The policy is intentionally small in v0.4. It is designed to be replaced by project-specific policies.
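One way such a policy could look is sketched below. Everything numeric here is hypothetical: the thresholds (`min_obs`, `hi`, `lo`, `cluster_share`, `max_contexts`), the order in which the rules are checked, and the assumption of a Beta(1, 1) prior are illustrative choices, not the defaults shipped with the project.

```python
from collections import Counter
from typing import Sequence


def choose_action(alpha: float, beta: float,
                  failure_modes: Sequence[str],
                  contexts: Sequence[str],
                  *, min_obs: int = 5, hi: float = 0.8, lo: float = 0.3,
                  cluster_share: float = 0.5, max_contexts: int = 2) -> str:
    """Map one Skill's posterior state to an action (illustrative rules)."""
    # Observed runs, assuming the posterior started from Beta(1, 1).
    n_obs = (alpha - 1) + (beta - 1)
    mean = alpha / (alpha + beta)

    if n_obs < min_obs:
        return "explore"   # evidence is still sparse
    if mean <= lo:
        return "retire"    # failures dominate the posterior
    if len(failure_modes) >= 2:
        _, top_count = Counter(failure_modes).most_common(1)[0]
        if top_count / len(failure_modes) >= cluster_share:
            return "patch"  # failures cluster around one mode
    if len(set(contexts)) > max_contexts:
        return "split"     # evidence spans different contexts
    if mean >= hi:
        return "compress"  # repeated success, Skill looks stable
    return "explore"       # enough data, but still uncertain
```

A project-specific policy could keep the same signature and swap in its own thresholds or a different rule order, for example checking `split` before `patch` when contexts are known to differ sharply.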
Full self-evolving mode runs all tasks and updates Skill beliefs online. This mode tests whether Bayesian Skill Evolution can improve an agent from scratch.
Incremental repair mode starts from a baseline agent's traces:
baseline traces -> failure ids -> Bayesian context -> rerun failures -> merged final result
This mode is the recommended production path because it adds Bayesian-Agent as a plug-in repair layer instead of replacing the base agent.
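The pipeline above can be read as a small function over the baseline results. This is a sketch under stated assumptions: `incremental_repair`, `build_context`, and `rerun` are hypothetical names, and baseline traces are simplified to a mapping from task id to pass/fail.

```python
from typing import Callable, Dict, List


def incremental_repair(baseline: Dict[str, bool],
                       build_context: Callable[[List[str]], str],
                       rerun: Callable[[str, str], bool]) -> Dict[str, bool]:
    """baseline traces -> failure ids -> Bayesian context -> rerun -> merge."""
    # 1. Collect the ids of tasks the baseline agent failed.
    failure_ids = sorted(t for t, ok in baseline.items() if not ok)
    # 2. Build the posterior-weighted Skill context for those failures.
    context = build_context(failure_ids)
    # 3. Rerun only the failures; baseline successes are kept untouched.
    merged = dict(baseline)
    for task_id in failure_ids:
        merged[task_id] = rerun(task_id, context)
    return merged


# Stub callables stand in for the real registry and agent harness.
baseline = {"t1": True, "t2": False, "t3": False}
final = incremental_repair(
    baseline,
    build_context=lambda ids: "skills for " + ",".join(ids),
    rerun=lambda task_id, ctx: task_id == "t2",
)
```

Because only failed tasks are rerun, the base agent's passing results are never put at risk, which matches the plug-in role described above.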