Self-Play & Oscillation

Internal debate patterns for quality assurance.

Self-Play Pattern

Two “perspectives”, a proposer and a critic, debate until the critic approves the proposal or a round limit is reached:

graph TB
    PROPOSER["PROPOSER<br/>Build X"]:::primary
    CRITIC["CRITIC<br/>Check X"]:::secondary
    PROPOSAL["Proposal"]:::tertiary
    CRITIQUE["Critique"]:::tertiary
    COMMIT["COMMIT"]:::primary
    REFINE["REFINE"]:::accent

    PROPOSER <--> CRITIC
    PROPOSER --> PROPOSAL
    CRITIC --> CRITIQUE
    PROPOSAL --> CRITIQUE
    CRITIQUE -- APPROVED --> COMMIT
    CRITIQUE -- REJECTED --> REFINE
    REFINE --> PROPOSER

    classDef primary fill:#2563eb,color:#fff
    classDef secondary fill:#7c3aed,color:#fff
    classDef tertiary fill:#0d9488,color:#fff
    classDef accent fill:#f59e0b,color:#000

Implementation

def self_play_orchestration(task, max_rounds=5):
    """
    Internal debate between proposer and critic.
    Converges when critic approves or max rounds reached.

    proposer_generate, proposer_refine, and critic_evaluate are
    assumed to be defined elsewhere.
    """
    proposal = None
    critique = None
    history = []

    # round_num avoids shadowing the built-in round()
    for round_num in range(max_rounds):
        # Phase 1: Proposer generates/refines
        if proposal is None:
            proposal = proposer_generate(task)
        else:
            proposal = proposer_refine(task, proposal, critique)

        # Phase 2: Critic evaluates
        critique = critic_evaluate(task, proposal)

        history.append({
            "round": round_num,
            "proposal": proposal,
            "critique": critique
        })

        # Check for convergence
        if critique.approved:
            return {
                "status": "converged",
                "result": proposal,
                "rounds": round_num + 1,
                "history": history
            }

    # Round limit reached without approval: return the last proposal
    return {
        "status": "max_rounds",
        "result": proposal,
        "rounds": max_rounds,
        "history": history
    }
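
The function above relies on three helpers that this page does not define. As a minimal sketch of how the loop behaves end to end, the variant below injects the proposer and critic as callables and runs them against toy stand-ins. The names `Critique`, `run_self_play`, `toy_propose`, and `toy_evaluate` are hypothetical and exist only for illustration; a real proposer and critic would typically be model calls.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Critique:
    """Hypothetical critique shape: the loop only requires `approved`."""
    approved: bool
    feedback: str = ""


def run_self_play(
    task: str,
    propose: Callable[[str, Optional[str], Optional[Critique]], str],
    evaluate: Callable[[str, str], Critique],
    max_rounds: int = 5,
) -> dict:
    """Proposer/critic loop with both roles injected as callables."""
    proposal: Optional[str] = None
    critique: Optional[Critique] = None
    for round_num in range(1, max_rounds + 1):
        # The proposer sees its previous proposal and the critic's feedback
        proposal = propose(task, proposal, critique)
        critique = evaluate(task, proposal)
        if critique.approved:
            return {"status": "converged", "result": proposal, "rounds": round_num}
    return {"status": "max_rounds", "result": proposal, "rounds": max_rounds}


def toy_propose(task, previous, critique):
    # First round: produce a draft; later rounds: act on the feedback
    if previous is None:
        return "draft"
    return previous + " + tests"


def toy_evaluate(task, proposal):
    # Toy rule (assumption): approve only proposals that mention tests
    if "tests" in proposal:
        return Critique(approved=True)
    return Critique(approved=False, feedback="add tests")


outcome = run_self_play("build X", toy_propose, toy_evaluate)
print(outcome)
```

With these stand-ins the first proposal is rejected and the refined one is approved, so the loop converges in the second round. Passing the roles in as arguments keeps the loop testable without any model behind it.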

DIALECTIC Methodology

SPINE’s self-play pattern uses thesis/antithesis/synthesis:

graph LR
    T["THESIS<br/>Propose"]:::primary
    A["ANTITHESIS<br/>Critique"]:::secondary
    S["SYNTHESIS<br/>Merge"]:::tertiary

    T --> A --> S

    T -.- T1["Generate Solution"]:::primary
    A -.- A1["Challenge Solution"]:::secondary
    S -.- S1["Resolve Conflict"]:::tertiary

    classDef primary fill:#2563eb,color:#fff
    classDef secondary fill:#7c3aed,color:#fff
    classDef tertiary fill:#0d9488,color:#fff
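
One pass through the three stages in the diagram can be sketched as a function that chains three callables. This is an illustrative assumption about the shape of the flow, not SPINE's actual API: `dialectic_step` and the three `*_fn` parameters are hypothetical names, and the lambdas stand in for whatever generates, challenges, and merges solutions in practice.

```python
def dialectic_step(task, thesis_fn, antithesis_fn, synthesis_fn):
    """Run one thesis -> antithesis -> synthesis pass and keep all three outputs."""
    thesis = thesis_fn(task)                            # generate a solution
    antithesis = antithesis_fn(task, thesis)            # challenge that solution
    synthesis = synthesis_fn(task, thesis, antithesis)  # resolve the conflict
    return {"thesis": thesis, "antithesis": antithesis, "synthesis": synthesis}


# Toy stand-ins (hypothetical) for the three roles
result = dialectic_step(
    "cache user sessions",
    thesis_fn=lambda task: "keep sessions in process memory",
    antithesis_fn=lambda task, thesis: "sessions are lost on restart",
    synthesis_fn=lambda task, thesis, antithesis: f"{thesis}; mitigate: {antithesis}",
)
print(result["synthesis"])
```

Unlike the proposer/critic loop, the third stage here produces a merged result instead of a plain approve or reject decision.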

Oscillation Pattern

Alternate between a divergent perspective (A: expand, explore) and a convergent one (B: contract, focus) to refine understanding:

graph LR
    R1["Round 1: A<br/>Expand / Explore / Diverge"]:::primary
    R2["Round 2: B<br/>Contract / Focus / Converge"]:::secondary
    R3["Round 3: A<br/>Expand / Explore / Diverge"]:::primary
    R4["Round 4: B<br/>Contract / Focus / Converge"]:::secondary
    MORE["..."]:::dark

    R1 --> R2 --> R3 --> R4 --> MORE

    classDef primary fill:#2563eb,color:#fff
    classDef secondary fill:#7c3aed,color:#fff
    classDef dark fill:#1e293b,color:#fff
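
The alternation in the diagram can be sketched as a loop that applies an expand step on odd rounds and a contract step on even rounds. Everything here is a hypothetical illustration: `oscillate`, `expand`, and `contract` are made-up names, and the integer candidates with a numeric target stand in for real ideas and a real scoring rule.

```python
def oscillate(seed, expand, contract, rounds=4):
    """Alternate expand (diverge) and contract (converge) phases."""
    candidates = list(seed)
    trace = []
    for i in range(rounds):
        if i % 2 == 0:
            phase = "expand"
            candidates = expand(candidates)
        else:
            phase = "contract"
            candidates = contract(candidates)
        trace.append((phase, list(candidates)))
    return candidates, trace


TARGET = 7  # toy objective (assumption): candidates closer to 7 are better


def expand(candidates):
    # Diverge: add the neighbours of every candidate
    return sorted({y for x in candidates for y in (x - 1, x, x + 1)})


def contract(candidates):
    # Converge: keep the two candidates closest to the target
    return sorted(candidates, key=lambda x: abs(x - TARGET))[:2]


final, trace = oscillate([3], expand, contract, rounds=4)
for phase, snapshot in trace:
    print(phase, snapshot)
```

Each expand round widens the candidate set and each contract round prunes it, so the surviving candidates drift toward the target over successive pairs of rounds.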

Oscillation Detection

Detect when execution is going in circles:

from collections import deque


class OscillationDetector:
    def __init__(self, window_size=5):
        # Bounded buffer: only the most recent window_size hashes are kept
        self.history = deque(maxlen=window_size)
        self.window_size = window_size

    def record(self, state):
        # Fingerprint the state by its string form
        state_hash = hash(str(state))
        self.history.append(state_hash)

    def is_oscillating(self) -> bool:
        # Not enough observations yet to judge
        if len(self.history) < self.window_size:
            return False

        # At most 2 unique states in the window: either stuck on one
        # state or bouncing between two (A → B → A → B)
        return len(set(self.history)) <= 2
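
The window check above counts unique states, so it also fires when execution is stuck on a single state or produces a run such as A, A, B, B. Where strict A, B, A, B alternation needs to be told apart from those cases, a stricter test could look like the sketch below. `alternates_between_two` and `min_cycles` are hypothetical names, not part of the detector above.

```python
def alternates_between_two(states, min_cycles=2):
    """True if the last 2 * min_cycles states strictly alternate A, B, A, B."""
    needed = 2 * min_cycles
    if len(states) < needed:
        return False  # not enough observations to judge
    tail = states[-needed:]
    a, b = tail[0], tail[1]
    if a == b:
        return False  # stuck on one state, not alternating
    # Even positions must match the first state, odd positions the second
    return all(s == (a if i % 2 == 0 else b) for i, s in enumerate(tail))


print(alternates_between_two(["start", "A", "B", "A", "B"]))
print(alternates_between_two(["A", "A", "B", "B"]))
```

The first call reports alternation because the last four states are A, B, A, B. The second does not, even though the window holds only two unique states.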

When to Use

Scenario	Recommended pattern
Code review	Self-play (proposer/critic)
Design decisions	Oscillation (expand/contract)
Complex reasoning	DIALECTIC
Quality assurance	Self-play

Next Steps

Learn about Gating Mechanisms
See implementation: Lab 11: Self-Play
