Self-Play & Oscillation

Internal debate patterns for quality assurance.


Self-Play Pattern

Two “perspectives”, a proposer and a critic, debate until the critic approves or a round limit is reached:

┌─────────────────────────────────────────────────────────┐
│                    Self-Play Pattern                    │
├─────────────────────────────────────────────────────────┤
│                                                         │
│    ┌──────────────┐         ┌──────────────┐            │
│    │   PROPOSER   │◄───────►│   CRITIC     │            │
│    │   "Build X"  │         │   "Check X"  │            │
│    └──────┬───────┘         └───────┬──────┘            │
│           │                         │                   │
│           ▼                         ▼                   │
│    ┌──────────────┐         ┌──────────────┐            │
│    │   Proposal   │────────►│   Critique   │            │
│    └──────────────┘         └──────┬───────┘            │
│                                    │                    │
│              ┌─────────────────────┴───────┐            │
│              │ APPROVED                    │ REJECTED   │
│              ▼                             ▼            │
│       ┌──────────────┐              ┌──────────────┐    │
│       │   COMMIT     │              │   REFINE     │    │
│       └──────────────┘              └──────────────┘    │
│                                                         │
└─────────────────────────────────────────────────────────┘

Implementation

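def self_play_orchestration(task, max_rounds=5):
    """
    Internal debate between proposer and critic.
    Converges when the critic approves or max_rounds is reached.

    proposer_generate, proposer_refine and critic_evaluate must be
    provided elsewhere; critic_evaluate returns an object with an
    `approved` attribute.
    """
    proposal = None
    critique = None  # set after the first evaluation
    history = []

    # round_num avoids shadowing the built-in round()
    for round_num in range(1, max_rounds + 1):
        # Phase 1: Proposer generates/refines
        if proposal is None:
            proposal = proposer_generate(task)
        else:
            proposal = proposer_refine(task, proposal, critique)

        # Phase 2: Critic evaluates
        critique = critic_evaluate(task, proposal)

        history.append({
            "round": round_num,
            "proposal": proposal,
            "critique": critique
        })

        # Check for convergence
        if critique.approved:
            return {
                "status": "converged",
                "result": proposal,
                "rounds": round_num,
                "history": history
            }

    # Round limit hit without approval: return the last proposal as-is
    return {
        "status": "max_rounds",
        "result": proposal,
        "rounds": max_rounds,
        "history": history
    }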
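
The function above leaves the proposer and critic undefined. To make the loop concrete, here is a minimal runnable sketch with toy stand-ins. Everything named here (`Critique`, `run_self_play`, the `toy_*` functions, the keyword-list task) is hypothetical and only illustrates the control flow; a real proposer and critic would presumably be model calls.

```python
from dataclasses import dataclass, field


@dataclass
class Critique:
    """Hypothetical critic verdict; the source only implies `approved`."""
    approved: bool
    issues: list = field(default_factory=list)


def run_self_play(task, generate, refine, evaluate, max_rounds=5):
    """Same propose/critique loop, with the three roles passed in."""
    proposal, critique = None, None
    for round_num in range(1, max_rounds + 1):
        if proposal is None:
            proposal = generate(task)
        else:
            proposal = refine(task, proposal, critique)
        critique = evaluate(task, proposal)
        if critique.approved:
            return {"status": "converged", "result": proposal, "rounds": round_num}
    return {"status": "max_rounds", "result": proposal, "rounds": max_rounds}


# Toy roles: the task is a list of keywords the proposal must mention.
def toy_generate(task):
    return task[0]


def toy_refine(task, proposal, critique):
    # Address one outstanding issue per round.
    return proposal + " " + critique.issues[0]


def toy_evaluate(task, proposal):
    missing = [word for word in task if word not in proposal.split()]
    return Critique(approved=not missing, issues=missing)


keywords = ["cache", "retry", "timeout"]
result = run_self_play(keywords, toy_generate, toy_refine, toy_evaluate)
capped = run_self_play(keywords, toy_generate, toy_refine, toy_evaluate, max_rounds=2)
print(result)
print(capped)
```

With three keywords the toy critic approves in the third round; capping the run at two rounds returns the unfinished proposal with status `max_rounds`, which is the case a caller needs to handle explicitly.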

DIALECTIC Methodology

SPINE’s self-play pattern uses thesis/antithesis/synthesis:

    THESIS ──────────► ANTITHESIS ──────────► SYNTHESIS
    (Propose)           (Critique)             (Merge)

    ┌──────────┐      ┌───────────┐      ┌──────────┐
    │ Generate │─────►│ Challenge │─────►│ Resolve  │
    │ Solution │      │ Solution  │      │ Conflict │
    └──────────┘      └───────────┘      └──────────┘
         │                  │                  │
         ▼                  ▼                  ▼
    "Here's my         "But what           "Final
     approach"          about X?"           version"
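
The three stages can be sketched as three plain functions. This is only an illustration of the thesis/antithesis/synthesis flow, not SPINE's actual API: the function names, the hard-coded claims and the `resolutions` mapping are all assumptions made for the example.

```python
def thesis(task):
    """Propose: an initial list of design claims (toy data)."""
    return ["store sessions in memory", "run on a single server"]


def antithesis(claims):
    """Critique: map each challenged claim to an objection."""
    known_objections = {
        "run on a single server": "no failover if the server dies",
    }
    return {c: known_objections[c] for c in claims if c in known_objections}


def synthesis(claims, objections, resolutions):
    """Merge: keep unchallenged claims, swap in resolved ones."""
    final, unresolved = [], []
    for claim in claims:
        if claim not in objections:
            final.append(claim)               # survived the critique
        elif claim in resolutions:
            final.append(resolutions[claim])  # conflict resolved
        else:
            unresolved.append(claim)          # still contested
    return final, unresolved


claims = thesis("design session handling")
objections = antithesis(claims)
final, unresolved = synthesis(
    claims,
    objections,
    resolutions={"run on a single server": "run two servers behind a load balancer"},
)
print(final)
print(unresolved)
```

Returning the unresolved claims separately keeps open conflicts visible instead of silently dropping them from the final version.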

Oscillation Pattern

Alternating between perspectives to refine understanding:

    Round 1    Round 2    Round 3    Round 4
    ┌─────┐    ┌─────┐    ┌─────┐    ┌─────┐
    │  A  │───►│  B  │───►│  A  │───►│  B  │──► ...
    └─────┘    └─────┘    └─────┘    └─────┘

    Expand     Contract   Expand     Contract
    Explore    Focus      Explore    Focus
    Diverge    Converge   Diverge    Converge

    ─────────────────────────────────────────►
              Stability Increases
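
One way to read the diagram: an expand step widens the set of candidates, a contract step narrows it, and the process is stable once a full expand/contract cycle no longer changes the set. The sketch below makes that concrete on integers; the `expand`, `contract` and `oscillate` names, the neighbour rule and the scoring function are all assumptions chosen for illustration.

```python
def expand(candidates, step):
    """Diverge: add the neighbours of every current candidate."""
    return candidates | {c + step for c in candidates} | {c - step for c in candidates}


def contract(candidates, score, keep):
    """Converge: keep only the `keep` best candidates (lower score is better)."""
    # Ties are broken by value so the result does not depend on set ordering.
    ranked = sorted(candidates, key=lambda c: (score(c), c))
    return set(ranked[:keep])


def oscillate(seed, score, step=1, keep=2, max_rounds=20):
    """Alternate expand/contract until a full cycle leaves the set unchanged."""
    current = set(seed)
    for round_num in range(1, max_rounds + 1):
        narrowed = contract(expand(current, step), score, keep)
        if narrowed == current:  # nothing changed: stable
            return current, round_num
        current = narrowed
    return current, max_rounds


# Toy objective: get as close to 7 as possible, starting from 0.
best, rounds = oscillate(seed={0}, score=lambda x: abs(x - 7))
print(sorted(best), rounds)
```

The `max_rounds` cap matters here for the same reason as in self-play: a scoring function that never settles would otherwise loop forever.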

Oscillation Detection

Detect when execution is going in circles:

class OscillationDetector:
    """Flags runs that keep revisiting the same one or two states."""

    def __init__(self, window_size=5):
        self.history = []
        self.window_size = window_size

    def record(self, state):
        # Store a hash of the state's string form, not the state itself
        state_hash = hash(str(state))
        self.history.append(state_hash)

    def is_oscillating(self) -> bool:
        # Not enough observations yet to judge
        if len(self.history) < self.window_size:
            return False

        recent = self.history[-self.window_size:]
        unique = set(recent)

        # At most 2 unique states in the window: either stuck on one
        # state or bouncing between two (A → B → A → B)
        return len(unique) <= 2
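
Counting unique states also flags windows such as A, A, A, B, B, which contain two states but no back-and-forth. If that matters, a stricter check can look for an exact repeating cycle at the end of the history. The function below is a hypothetical variant, not part of the detector above; `find_cycle_period`, `max_period` and `repeats` are names invented for this sketch.

```python
from typing import Hashable, Optional, Sequence


def find_cycle_period(
    states: Sequence[Hashable], max_period: int = 4, repeats: int = 3
) -> Optional[int]:
    """Return the shortest period p such that the last p * repeats states
    repeat exactly with period p, or None if no such cycle is found.

    A period of 1 means the run is stuck on one state; a period of 2 is
    the A → B → A → B pattern.
    """
    for period in range(1, max_period + 1):
        window = period * repeats
        if len(states) < window:
            break  # longer periods need even more history
        tail = states[-window:]
        if all(tail[i] == tail[i + period] for i in range(window - period)):
            return period
    return None


print(find_cycle_period(["A", "B", "A", "B", "A", "B"]))
print(find_cycle_period(["A", "A", "A", "B", "B"]))
print(find_cycle_period(["X", "X", "X"]))
```

The trade-off is latency: this variant needs `period * repeats` observations before it can report a cycle, so it reacts later than the unique-count check but raises fewer false alarms.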

When to Use

Scenario             Recommended
-----------------    -----------------------------
Code review          Self-play (proposer/critic)
Design decisions     Oscillation (expand/contract)
Complex reasoning    DIALECTIC
Quality assurance    Self-play

