Learning Loop

How M3, M5, and M1 work together

The Feedback Cycle

Every task execution creates learning data that improves future tool selection:

M1 Core (Execute Task)
    -> M3 Observability (Log Event)
    -> M5 Learning (Store Outcome)
    -> M5 Learning (Update Score Boost)

Next Request -> M1 uses boosted scores
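A minimal sketch of one pass through the cycle. The classes and method names below (select_tool, execute, log_event, store_outcome, update_boost) are hypothetical stand-ins for the real M1/M3/M5 APIs:

```python
import time
from dataclasses import dataclass

@dataclass
class Outcome:
    task: str
    tool: str
    success: bool
    timestamp: float

def run_task(m1, m3, m5, task: str) -> None:
    tool = m1.select_tool(task)        # M1 ranks tools using current boosts
    success = m1.execute(task, tool)   # M1 Core: execute task
    m3.log_event(task, tool, success)  # M3 Observability: log event
    m5.store_outcome(Outcome(task, tool, success, time.time()))
    m5.update_boost(tool)              # M5 Learning: recompute score boost
```

On the next request, select_tool sees the updated boost, which closes the loop.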

What Gets Stored

M5 maintains two data stores:

Store      Technology  Purpose
Vector DB  LanceDB     Task embeddings for semantic similarity
Graph DB   Neo4j       Tool-Task-Outcome relationships

For each execution, we store the task description and its embedding, the tool invoked, the outcome (success or failure), and a timestamp for temporal weighting.
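A sketch of how one outcome record might land in both stores. The table name, node labels, connection details, and the embed() helper are illustrative assumptions, not the actual M5 schema:

```python
import time

import lancedb
from neo4j import GraphDatabase

def embed(text: str) -> list[float]:
    ...  # stand-in for the real embedding model

def store_outcome(task: str, tool: str, success: bool) -> None:
    record = {
        "task": task,
        "tool": tool,
        "success": success,
        "timestamp": time.time(),
        "vector": embed(task),  # enables semantic-similarity lookup
    }

    # Vector DB: task embedding plus outcome metadata
    db = lancedb.connect("./m5_outcomes")
    if "outcomes" in db.table_names():
        db.open_table("outcomes").add([record])
    else:
        db.create_table("outcomes", data=[record])

    # Graph DB: Tool-Task-Outcome relationship
    driver = GraphDatabase.driver("bolt://localhost:7687",
                                  auth=("neo4j", "password"))
    with driver.session() as session:
        session.run(
            "MERGE (t:Tool {name: $tool}) "
            "MERGE (k:Task {text: $task}) "
            "CREATE (t)-[:EXECUTED {success: $success, ts: $ts}]->(k)",
            tool=tool, task=task, success=success, ts=record["timestamp"],
        )
    driver.close()
```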

Score Boost Calculation

The learning boost ranges from 0.5x to 2.0x based on historical success:

    success_rate    = successes / total_executions
    temporal_weight = decay_factor ^ days_since_execution
    weighted_rate   = sum(outcome * temporal_weight) / sum(temporal_weight)

    # Map to boost range [0.5, 2.0]
    learning_boost  = 0.5 + (weighted_rate * 1.5)
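A runnable version of the same calculation. The input shape, a list of (success, age-in-days) pairs, and the neutral default of 1.0 for tools with no history are assumptions:

```python
def learning_boost(outcomes: list[tuple[bool, float]],
                   decay_factor: float = 0.95) -> float:
    """Map a temporally weighted success rate onto [0.5, 2.0]."""
    if not outcomes:
        return 1.0  # assumed neutral boost when there is no history
    weights = [decay_factor ** age for _, age in outcomes]
    weighted_rate = sum(
        w for (ok, _), w in zip(outcomes, weights) if ok
    ) / sum(weights)
    return 0.5 + weighted_rate * 1.5

# Two recent successes and one ten-day-old failure
print(round(learning_boost([(True, 0), (True, 1), (False, 10)]), 2))  # 1.65
```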

Temporal Decay

Recent outcomes matter more than old ones. Default decay factor: 0.95 per day.

After 30 days, an outcome carries only about 21% of its original weight (0.95^30 ≈ 0.215). After 90 days, data is cleaned up (M5-FIX-001).
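A quick check of the default schedule also shows why 90 days is a reasonable cleanup threshold: by then an outcome retains only about 1% of its weight:

```python
# Weight of an outcome after n days with the default decay factor
decay_factor = 0.95
for days in (1, 7, 30, 90):
    print(f"day {days}: weight {decay_factor ** days:.3f}")
# day 1: weight 0.950
# day 7: weight 0.698
# day 30: weight 0.215
# day 90: weight 0.010
```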

Example Timeline

How tool scores evolve over time:

When             Task                           Tool                    Outcome  Boost after
Day 1, 10:00 AM  "Generate Fibonacci function"  claude_code_generation  Success  1.10x
Day 1, 2:00 PM   "Write sorting algorithm"      claude_code_generation  Success  1.25x
Day 2, 9:00 AM   "Generate complex regex"       claude_code_generation  Failure  1.10x
Day 3, 11:00 AM  "Create REST API handler"      claude_code_generation  Success  1.20x
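Replaying this history through the learning_boost sketch above (ages measured back from Day 3) gives a feel for the mechanics; the result differs from the timeline's illustrative numbers because those also reflect the tool's earlier history:

```python
# (success, approximate age in days) measured back from Day 3, 11:00 AM
history = [(True, 2.0), (True, 1.8), (False, 1.1), (True, 0.0)]
print(round(learning_boost(history), 2))  # 1.62
```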