The Feedback Cycle
Every task execution creates learning data that improves future tool selection:
Task execution
  -> M3 Observability: log event
  -> M5 Learning: store outcome
  -> M5 Learning: update score boost
  -> Next request: M1 uses the boosted scores
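A minimal self-contained sketch of this loop in Python; the class and method names are assumptions for illustration, not the real M1/M3/M5 interfaces:

```python
# Illustrative stand-ins for the M3 and M5 modules; names are assumptions.
class M3Observability:
    def log_event(self, task: str, tool: str, success: bool) -> dict:
        return {"task": task, "tool": tool, "success": success}

class M5Learning:
    def __init__(self) -> None:
        self.outcomes: list[dict] = []

    def store_outcome(self, event: dict) -> None:
        self.outcomes.append(event)  # stands in for the LanceDB + Neo4j writes

    def update_score_boost(self, tool: str) -> float:
        runs = [e for e in self.outcomes if e["tool"] == tool]
        rate = sum(e["success"] for e in runs) / len(runs)
        return 0.5 + rate * 1.5  # the boost M1 reads on the next request
```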
What Gets Stored
M5 maintains two data stores:
| Store | Technology | Purpose |
|-------|------------|---------|
| Vector DB | LanceDB | Task embeddings for semantic similarity |
| Graph DB | Neo4j | Tool-Task-Outcome relationships |
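A hedged sketch of a write that touches both stores, using the two libraries named above; the table name, node labels, and connection details are assumptions, and the vector table is assumed to already exist:

```python
import lancedb
from neo4j import GraphDatabase

db = lancedb.connect("./m5_learning")  # LanceDB vector store
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def store_outcome(task: str, embedding: list, tool: str, success: bool) -> None:
    # Vector side: persist the task embedding for semantic-similarity lookups.
    db.open_table("outcomes").add(
        [{"task": task, "vector": embedding, "tool": tool, "success": success}]
    )
    # Graph side: record the Tool-Task-Outcome relationship.
    with driver.session() as session:
        session.run(
            "MERGE (t:Tool {name: $tool}) "
            "MERGE (k:Task {description: $task}) "
            "CREATE (t)-[:EXECUTED {success: $success, at: datetime()}]->(k)",
            tool=tool, task=task, success=success,
        )
```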
For each execution, we store (see the record sketch after this list):
- Task description (embedded as a 384-dim vector)
- Tool selected
- Capabilities matched
- Success/failure outcome
- Timestamp (for temporal decay)
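One way to picture the stored record; the field names here are illustrative, not the actual schema:

```python
from dataclasses import dataclass

@dataclass
class ExecutionOutcome:  # field names are assumptions
    task_description: str        # embedded as a 384-dim vector
    embedding: list[float]       # len(embedding) == 384
    tool_selected: str           # e.g. "claude_code_generation"
    capabilities_matched: list[str]
    success: bool
    timestamp: float             # used for temporal decay
```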
Score Boost Calculation
The learning boost ranges from 0.5x to 2.0x based on historical success:
```python
def learning_boost(outcomes: list[tuple[int, float]], decay_factor: float = 0.95) -> float:
    """outcomes: (success, days_since_execution) pairs, success in {0, 1}."""
    weights = [decay_factor ** days for _, days in outcomes]
    # Recency-weighted success rate in [0, 1]
    weighted_rate = sum(s * w for (s, _), w in zip(outcomes, weights)) / sum(weights)
    # Map to boost range [0.5, 2.0]
    return 0.5 + weighted_rate * 1.5
```
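For example, three recent successes and one failure (illustrative values):

```python
# (success, days_since_execution) pairs
outcomes = [(1, 0), (0, 1), (1, 2), (1, 3)]
print(round(learning_boost(outcomes), 2))  # 1.62
```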
Temporal Decay
Recent outcomes matter more than old ones. Default decay factor: 0.95 per day.
After 30 days, an outcome carries only ~21% of its original weight.
After 90 days, data is cleaned up (M5-FIX-001).
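These weights follow directly from the decay factor; a quick check:

```python
decay = 0.95
for days in (7, 30, 90):
    print(days, round(decay ** days, 3))
# prints: 7 0.698, 30 0.215, 90 0.01
```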
Example Timeline
How tool scores evolve over time:
| When | Task | Tool | Outcome | Boost after |
|------|------|------|---------|-------------|
| Day 1, 10:00 AM | "Generate Fibonacci function" | claude_code_generation | Success | 1.10x |
| Day 1, 2:00 PM | "Write sorting algorithm" | claude_code_generation | Success | 1.25x |
| Day 2, 9:00 AM | "Generate complex regex" | claude_code_generation | Failure | 1.10x |
| Day 3, 11:00 AM | "Create REST API handler" | claude_code_generation | Success | 1.20x |