Nonstop CI Autopilot ==================== The autopilot is a self-sustaining loop that continuously proposes, runs, and records coil-optimization cases on a self-hosted CI runner. Its purpose is to **autonomously explore the design space** across multiple plasma surfaces and constraint settings, building the leaderboard without manual intervention. How it works ------------ Three phases repeat indefinitely: 1. **Propose** — ``python -m tools.propose_batch`` generates a batch of cases (default 10; configurable via ``policy/proposer_policy.yaml``). 2. **Run** — The CI runner executes each case via ``stellcoilbench run-ci-case``. Cases run in parallel (up to 10 concurrent jobs) with per-case timeouts. 3. **Record** — Results are written to ``cases/done//summary.json`` (locally, for runner polling; not committed) and to ``submissions//auto//``. Failures are recorded in ``policy/autopilot_failures.json``. The completed results feed the next proposal round. A **batch barrier** ensures that the proposer never writes new cases while ``cases/pending/`` still contains unfinished work. The CI workflow ``update-db-self-hosted.yml`` drives the loop: each push to ``main`` (or a 10-minute cron safety net) triggers ``run_autopilot_cases`` → ``propose_autopilot_batch``. Autopilot commits (``submissions/``, ``policy/autopilot_failures.json``, ``cases/pending/``) are filtered via ``paths-ignore`` so they do not re-trigger benchmark jobs, keeping the loop self-sustaining. Proposer modes -------------- The proposer supports two modes. Both share the same guardrails, validation, and output format; only the case-generation strategy differs. Deterministic (genetic-algorithm) policy — default ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The default proposer is a lightweight, deterministic genetic-algorithm (GA) style optimizer. It requires **no external API** and is fully reproducible with a ``--seed`` flag. Each batch is split into two halves (ratio set by ``exploit_fraction``): - **Exploit (mutation)** — The proposer selects a parent from the top-:math:`k` feasible results (ranked by composite score), clones its ``case_config``, and applies random perturbations: - **Threshold jitter** — Each constraint threshold is multiplied by a log-normal factor (:math:`\sigma` from ``threshold_sigma``), nudging the optimizer toward a different trade-off on the Pareto frontier. - **Structural mutation** — With probability ``structural_mutation_prob``, the number of base coils or Fourier order is changed to an adjacent value. - A **novelty check** (config hash) rejects duplicates of recent runs. - **Explore** — New random cases with surface, coil count, and order drawn from the policy's allowed sets. Thresholds are sampled from log-uniform ranges to create diversity across the design space. .. code-block:: bash # Default (deterministic) proposer python -m tools.propose_batch --batch-size 10 --dry-run --seed 42 LLM-powered policy — opt-in ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ When the ``--llm`` flag is passed, the proposer calls the LLM directly with context from ``build_context`` (top parents, failure statistics, surface exploration counts, run cards, postmortems) and ``llm_context.md``. The LLM returns a list of ``mutate`` / ``explore`` **actions** that are converted into CI case dicts by ``apply_llm_action()``. Requires ``ANTHROPIC_API_KEY`` (or ``KB_LLM_*`` env vars). If the LLM returns an error or fewer than ``batch_size`` valid cases, the proposer **falls back automatically** to the deterministic GA policy. .. code-block:: bash # LLM proposer (direct mode) python -m tools.propose_batch --batch-size 10 --llm --dry-run The LLM mode is useful when you want the proposer to reason about which surfaces are under-explored, which constraint combinations have not yet been tried, or which failure modes to avoid — tasks that go beyond what a fixed mutation/exploration policy can do. Guardrails and safe mode ------------------------ The proposer checks a sliding window (default 30) of recent results before every batch: - **Fail-rate guard** — If the failure rate exceeds ``max_fail_rate`` (default 0.6), proposals halt and ``PAUSE_AUTORUN`` is written. - **Repeated-failure guard** — If the most common failure reason repeats more than ``max_common_failure_count`` times, proposals halt. - **Critical-class guard** — Tracks classes like ``vmec_nonconverged``, ``nan_in_objective``, ``timeout``; halts if any exceeds ``max_critical_class_count``. Between the safe-mode threshold (0.35) and the hard halt (0.6), the proposer enters **safe mode**: mutation sigma is reduced, iteration caps are lowered, and exploration is restricted to preferred (simpler) surfaces. Pausing and resuming -------------------- Create the file ``PAUSE_AUTORUN`` in the repo root to halt all proposals. The proposer checks for this file on every invocation and exits immediately if it exists. Guardrail triggers can also create this file automatically (controlled by ``cooldown.write_pause_file`` in the policy). To resume, simply delete the file: .. code-block:: bash # Pause touch PAUSE_AUTORUN # Resume rm PAUSE_AUTORUN Directory layout ---------------- :: cases/pending/ Proposer writes new case JSONs here cases/done// summary.json, case.yaml, coils.json submissions//auto// Results for leaderboard policy/proposer_policy.yaml Proposer configuration tools/propose_batch/ Proposer package (python -m tools.propose_batch) tools/build_context.py Context builder Pending-case format ------------------- Each file in ``cases/pending/`` is a JSON dict with: - ``case_id`` — unique timestamp + random suffix - ``case_config`` — full case YAML equivalent (surface, coils, optimizer, objective terms) - ``resource`` — ``max_total_iterations`` (capped at 10 000) and ``timeout_minutes`` - ``parent_ids`` — list of parent case IDs (empty for exploration) - ``tags`` — ``["exploit"]``, ``["explore"]``, ``["exploit", "llm"]``, etc. - ``random_seed`` — for reproducibility Commands -------- .. code-block:: bash # Preview a batch (dry run) python -m tools.propose_batch --batch-size 10 --dry-run --seed 42 # Build context payload from completed cases python tools/build_context.py [--out context.json] # Run a single CI case locally stellcoilbench run-ci-case [--output-dir cases/done] \ [--policy policy/proposer_policy.yaml] # Emergency stop touch PAUSE_AUTORUN # Resume after stop rm PAUSE_AUTORUN