How Skills Get Promoted to Procedural Memory

Most AI memory systems just store and retrieve text. AMS goes further — it learns executable patterns and validates them before promotion.

Here's how a raw action becomes a trusted skill:

1 Pattern Detection (Episodic → Candidate)

When an agent performs an action successfully, AMS logs it to Episodic Memory:

{
  "action": "diagnose_low_water_cutoff",
  "context": {"equipment": "CB-700", "symptom": "false_trip"},
  "result": "success",
  "steps": ["check_probe_continuity", "inspect_for_scale", "verify_water_level"],
  "timestamp": "2026-01-27T14:32:00Z"
}

When the same pattern succeeds 3+ times across different contexts, AMS flags it as a Candidate Skill.
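That detection pass can be sketched in a few lines. This is a minimal illustration, not AMS's actual implementation: the `EpisodicRecord` shape, the `context_key` field (standing in for a hash of the context dict), and the threshold constant are assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical record shape mirroring the episodic log entry above
@dataclass(frozen=True)
class EpisodicRecord:
    action: str
    context_key: str   # assumed: a stable key derived from the context dict
    result: str

CANDIDATE_THRESHOLD = 3  # "3+ times across different contexts"

def find_candidate_skills(log):
    """Group successful actions and flag those repeated in distinct contexts."""
    contexts_by_action = defaultdict(set)
    for rec in log:
        if rec.result == "success":
            contexts_by_action[rec.action].add(rec.context_key)
    return [action for action, ctxs in contexts_by_action.items()
            if len(ctxs) >= CANDIDATE_THRESHOLD]
```

Counting distinct contexts (a set, not a raw tally) is what makes the flag meaningful: three successes in the same context prove much less than one success in each of three.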

2 Bayesian Confidence Scoring

Each candidate skill gets a confidence score using Bayesian inference:

P(Skill works | Evidence) = P(Evidence | Skill works) × P(Skill works) / P(Evidence)

In plain English:

  • Prior P(Skill works): How often do skills like this succeed in general? (Base rate)
  • Likelihood P(Evidence | Skill works): Given this skill works, how likely is the observed evidence?
  • Posterior: Updated confidence after each execution
Execution   Outcome      Prior   Posterior
1           ✅ Success   0.50    0.67
2           ✅ Success   0.67    0.80
3           ❌ Failure   0.80    0.71
4           ✅ Success   0.71    0.81
5           ✅ Success   0.81    0.89

Promotion threshold: 0.85 confidence after 5+ executions
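One straightforward way to run this update is in odds space. The sketch below is illustrative: the likelihood ratios (2.0 for a success, 0.6 for a failure) are assumed values, not AMS's internal likelihood model, so the posteriors come out close to, but not identical with, the table above.

```python
LR_SUCCESS = 2.0   # assumed: observed evidence is twice as likely if the skill works
LR_FAILURE = 0.6   # assumed: a failure down-weights the hypothesis

def update_confidence(prior: float, success: bool) -> float:
    """One Bayesian update, done in odds space to keep the arithmetic simple."""
    odds = prior / (1.0 - prior)
    odds *= LR_SUCCESS if success else LR_FAILURE
    return odds / (1.0 + odds)

trajectory = []
p = 0.50  # base-rate prior
for outcome in [True, True, False, True, True]:  # the five executions above
    p = update_confidence(p, outcome)
    trajectory.append(round(p, 2))
# With these assumed ratios the run ends at 0.91, clearing the 0.85 threshold
```

Note the shape matches the table: confidence climbs with successes, dips on the failure, then recovers past the promotion threshold.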

3 Code Crystallization

Once a pattern hits the confidence threshold, AMS generates executable code:

# Auto-generated Automaton: diagnose_low_water_cutoff
# Confidence: 0.89 | Executions: 12 | Success Rate: 91.7%
# Source: Episodic memories [em_4821, em_4856, em_4901, ...]

from dataclasses import dataclass, field
from typing import Dict, List

BAYESIAN_CONFIDENCE = 0.89                              # refreshed by the tracker after each run
SOURCE_MEMORY_IDS = ["em_4821", "em_4856", "em_4901"]   # further IDs elided

@dataclass
class DiagnosisResult:
    cause: str
    confidence: float
    source_memories: List[str] = field(default_factory=list)

async def diagnose_low_water_cutoff(context: Dict) -> DiagnosisResult:
    """
    Diagnoses false trips on low water cutoffs for CB boilers.
    Bayesian confidence: 0.89 (validated across 12 executions)
    """
    steps = [
        ("check_probe_continuity", check_probe_continuity),
        ("inspect_for_scale", inspect_for_scale),
        ("verify_water_level", verify_water_level),
    ]

    for step_name, step_fn in steps:
        result = await step_fn(context)
        if result.is_diagnostic:
            return DiagnosisResult(
                cause=result.finding,
                confidence=BAYESIAN_CONFIDENCE,
                source_memories=SOURCE_MEMORY_IDS,
            )

    return DiagnosisResult(cause="unknown", confidence=0.0)

Key Details:

  • Code includes its confidence score and source memories
  • Every execution updates the Bayesian tracker
  • If confidence drops below 0.70, the skill is demoted back to Candidate status
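The promotion and demotion rules above fit naturally in a small tracker. A minimal sketch, assuming the odds-space update and likelihood ratios used earlier (the class name and status strings are illustrative; the 0.85/0.70 thresholds and 5-execution minimum come from the text):

```python
DEMOTE_BELOW = 0.70
PROMOTE_AT = 0.85
MIN_EXECUTIONS = 5

class SkillTracker:
    """Keeps a skill's confidence current and demotes it when it decays."""

    def __init__(self, prior: float = 0.5):
        self.confidence = prior
        self.executions = 0
        self.status = "CANDIDATE"

    def record(self, success: bool,
               lr_success: float = 2.0, lr_failure: float = 0.6) -> str:
        # Odds-space Bayesian update (likelihood ratios are assumed values)
        odds = self.confidence / (1.0 - self.confidence)
        odds *= lr_success if success else lr_failure
        self.confidence = odds / (1.0 + odds)
        self.executions += 1
        if self.status == "PROMOTED" and self.confidence < DEMOTE_BELOW:
            self.status = "CANDIDATE"        # demoted back for re-validation
        elif self.executions >= MIN_EXECUTIONS and self.confidence >= PROMOTE_AT:
            self.status = "PROMOTED"
        return self.status
```

Keeping demotion inside the same `record` call as the update means a skill can never report a stale status: every execution both moves the posterior and re-checks the thresholds.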

4 Continuous Validation

Promoted skills aren't static. The Bayesian tracker keeps learning:

┌─────────────────────────────────────────────────────────┐
│ SKILL: diagnose_low_water_cutoff                        │
│ Status: PROMOTED (Procedural Memory)                    │
│                                                         │
│ Confidence: 0.89 ████████████████████░░ (89%)           │
│ Executions: 12 total (11 success, 1 failure)            │
│ Last used: 2 hours ago                                  │
│ Trend: ↗ Improving (+0.03 over last 5 runs)             │
│                                                         │
│ [View Source Memories] [Execution History] [Demote]     │
└─────────────────────────────────────────────────────────┘

Why This Matters for Enterprise

The Math (For the Technical Audience)

For each skill S with evidence history E = {e₁, e₂, ..., eₙ}:

P(S | E) = P(S) × ∏ᵢ P(eᵢ | S) / P(E)

Where:

  • P(S) — the prior: the base rate of success for skills of this type
  • P(eᵢ | S) — the likelihood of observing outcome eᵢ given that the skill works, with outcomes treated as conditionally independent across executions
  • P(E) — the normalizing probability of the evidence

Promotion Criteria

  • Posterior confidence ≥ 0.85
  • At least 5 logged executions
  • Demotion back to Candidate status if confidence later falls below 0.70

Integration with DLPFC Orchestrator

When an agent needs to perform an action, the DLPFC checks Procedural Memory first:

1. Agent receives task: "Diagnose this CB-700 low water cutoff issue"
2. DLPFC queries Procedural Memory for relevant skills
3. Finds: diagnose_low_water_cutoff (confidence: 0.89)
4. Executes skill instead of generating from scratch
5. Logs result → Updates Bayesian tracker
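The dispatch loop above might look like the following sketch. The store API (`register`/`lookup`), the execution floor, and the `generate_from_scratch` fallback are assumptions for illustration, not the DLPFC's actual interface:

```python
from typing import Callable, Dict, Optional, Tuple

EXECUTE_ABOVE = 0.70  # hypothetical floor for trusting a stored skill

class ProceduralMemory:
    """Toy in-memory skill store keyed by action name."""

    def __init__(self):
        self._skills: Dict[str, Tuple[Callable, float]] = {}

    def register(self, name: str, fn: Callable, confidence: float) -> None:
        self._skills[name] = (fn, confidence)

    def lookup(self, name: str) -> Optional[Tuple[Callable, float]]:
        return self._skills.get(name)

def handle_task(memory: ProceduralMemory, action: str, context: Dict,
                generate_from_scratch: Callable):
    """Procedural-memory-first dispatch, as in the DLPFC flow above."""
    hit = memory.lookup(action)
    if hit is not None:
        fn, confidence = hit
        if confidence >= EXECUTE_ABOVE:
            return fn(context)                   # reuse the promoted skill
    return generate_from_scratch(action, context)  # fall back to planning
```

Checking confidence at dispatch time, not just at promotion time, means a skill whose tracker has decayed is quietly bypassed rather than blindly reused.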

Result: Faster execution, consistent behavior, continuously improving accuracy.

This is how AMS turns experience into expertise — automatically, transparently, and accountably.

See Bayesian Learning in Action

Watch skills form in real-time during your demo.

Request Technical Demo → View Benchmarks