Learning Loop

How M3, M5, and M1 work together

The Feedback Cycle

Every task execution creates learning data that improves future tool selection:

M1 Core (Execute Task)
    -> M3 Observability (Log Event)
    -> M5 Learning (Store Outcome)
    -> M5 Learning (Update Score Boost)

Next Request -> M1 uses boosted scores
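A minimal sketch of one pass through the cycle. The classes and method names below (select_tool, execute, log_event, store_outcome, update_boost) are hypothetical stand-ins for the real M1/M3/M5 APIs:

```python
import time
from dataclasses import dataclass

@dataclass
class Outcome:
    task: str
    tool: str
    success: bool
    timestamp: float

def run_task(m1, m3, m5, task: str) -> None:
    tool = m1.select_tool(task)        # M1 ranks tools using current boosts
    success = m1.execute(task, tool)   # M1 Core: execute task
    m3.log_event(task, tool, success)  # M3 Observability: log event
    m5.store_outcome(Outcome(task, tool, success, time.time()))
    m5.update_boost(tool)              # M5 Learning: recompute score boost
```

On the next request, select_tool sees the updated boost, which closes the loop.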

What Gets Stored

M5 maintains two data stores:

Store      Technology  Purpose
Vector DB  LanceDB     Task embeddings for semantic similarity
Graph DB   Neo4j       Tool-Task-Outcome relationships

For each execution, we store the task description and its embedding, the tool invoked, the outcome (success or failure), and a timestamp for temporal weighting.
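A sketch of how one outcome record might land in both stores. The table name, node labels, connection details, and the embed() helper are illustrative assumptions, not the actual M5 schema:

```python
import time

import lancedb
from neo4j import GraphDatabase

def embed(text: str) -> list[float]:
    ...  # stand-in for the real embedding model

def store_outcome(task: str, tool: str, success: bool) -> None:
    record = {
        "task": task,
        "tool": tool,
        "success": success,
        "timestamp": time.time(),
        "vector": embed(task),  # enables semantic-similarity lookup
    }

    # Vector DB: task embedding plus outcome metadata
    db = lancedb.connect("./m5_outcomes")
    if "outcomes" in db.table_names():
        db.open_table("outcomes").add([record])
    else:
        db.create_table("outcomes", data=[record])

    # Graph DB: Tool-Task-Outcome relationship
    driver = GraphDatabase.driver("bolt://localhost:7687",
                                  auth=("neo4j", "password"))
    with driver.session() as session:
        session.run(
            "MERGE (t:Tool {name: $tool}) "
            "MERGE (k:Task {text: $task}) "
            "CREATE (t)-[:EXECUTED {success: $success, ts: $ts}]->(k)",
            tool=tool, task=task, success=success, ts=record["timestamp"],
        )
    driver.close()
```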

Score Boost Calculation

The learning boost ranges from 0.5x to 2.0x based on historical success:

    success_rate    = successes / total_executions
    temporal_weight = decay_factor ^ days_since_execution
    weighted_rate   = sum(outcome * temporal_weight) / sum(temporal_weight)

    # Map to boost range [0.5, 2.0]
    learning_boost  = 0.5 + (weighted_rate * 1.5)
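A runnable version of the same calculation. The input shape, a list of (success, age-in-days) pairs, and the neutral default of 1.0 for tools with no history are assumptions:

```python
def learning_boost(outcomes: list[tuple[bool, float]],
                   decay_factor: float = 0.95) -> float:
    """Map a temporally weighted success rate onto [0.5, 2.0]."""
    if not outcomes:
        return 1.0  # assumed neutral boost when there is no history
    weights = [decay_factor ** age for _, age in outcomes]
    weighted_rate = sum(
        w for (ok, _), w in zip(outcomes, weights) if ok
    ) / sum(weights)
    return 0.5 + weighted_rate * 1.5

# Two recent successes and one ten-day-old failure
print(round(learning_boost([(True, 0), (True, 1), (False, 10)]), 2))  # 1.65
```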

Temporal Decay

Recent outcomes matter more than old ones. Default decay factor: 0.95 per day.

After 30 days, an outcome carries only about 21% of its original weight (0.95^30 ≈ 0.215). After 90 days, data is cleaned up (M5-FIX-001).
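A quick check of the default schedule also shows why 90 days is a reasonable cleanup threshold: by then an outcome retains only about 1% of its weight:

```python
# Weight of an outcome after n days with the default decay factor
decay_factor = 0.95
for days in (1, 7, 30, 90):
    print(f"day {days}: weight {decay_factor ** days:.3f}")
# day 1: weight 0.950
# day 7: weight 0.698
# day 30: weight 0.215
# day 90: weight 0.010
```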

Example Timeline

How tool scores evolve over time:

When             Task                           Tool                    Outcome  Boost after
Day 1, 10:00 AM  "Generate Fibonacci function"  claude_code_generation  Success  1.10x
Day 1, 2:00 PM   "Write sorting algorithm"      claude_code_generation  Success  1.25x
Day 2, 9:00 AM   "Generate complex regex"       claude_code_generation  Failure  1.10x
Day 3, 11:00 AM  "Create REST API handler"      claude_code_generation  Success  1.20x
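Replaying this history through the learning_boost sketch above (ages measured back from Day 3) gives a feel for the mechanics; the result differs from the timeline's illustrative numbers because those also reflect the tool's earlier history:

```python
# (success, approximate age in days) measured back from Day 3, 11:00 AM
history = [(True, 2.0), (True, 1.8), (False, 1.1), (True, 0.0)]
print(round(learning_boost(history), 2))  # 1.62
```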