Add /skill-test suite: linter, behavioral specs, and coverage catalog for 52 skills

- New skill: /skill-test (static | spec | audit modes)
  - static: 7-check structural linter per skill file
  - spec: Claude-evaluated behavioral assertions against test specs
  - audit: coverage report across all 52 skills with priority gaps
- New hook: validate-skill-change.sh, an advisory reminder to lint after skill edits
- New template: skill-test-spec.md, the standard structure for authoring test specs
- New: tests/skills/catalog.yaml, a machine-readable coverage index (52 skills)
- New: tests/skills/_fixtures/ with shared fixtures (complete concept, incomplete GDD)
- New: 4 seed test specs for critical gate skills (gate-check, design-review,
  story-readiness, story-done), 4 cases each
- Modified: settings.json adds validate-skill-change.sh to the PostToolUse hook

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-27 04:51:46 +00:00 · 2026-03-13 17:05:08 +11:00
parent cdb1aa83b7
commit af2b864796
11 changed files with 1587 additions and 0 deletions
--- a/.claude/docs/templates/skill-test-spec.md
+++ b/.claude/docs/templates/skill-test-spec.md
@@ -0,0 +1,96 @@
 # Skill Test Spec: /[skill-name]
 ## Skill Summary
 [One paragraph: what this skill does, when to use it, what it produces. Include
 the primary output artifact, the verdict format it uses, and which pipeline stage
 it belongs to.]
 ---
 ## Static Assertions (Structural)
 Verified automatically by `/skill-test static` — no fixture needed.
 - [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
 - [ ] Has ≥2 phase headings (## Phase N or numbered ## sections)
 - [ ] Contains verdict keywords: [list the ones expected, e.g., PASS, FAIL, CONCERNS]
 - [ ] Contains "May I write" collaborative protocol language (if skill writes files)
 - [ ] Has a next-step handoff at the end
 ---
 ## Test Cases
 ### Case 1: Happy Path — [short description]
 **Fixture:** [Describe the assumed project state. Which files exist? What do they
 contain? E.g., "game-concept.md exists with all 8 required sections complete.
 systems-index.md exists. All MVP GDDs are present and individually reviewed."]
 **Input:** `/[skill-name] [args]`
 **Expected behavior:**
 1. [Phase 1 action — what the skill should read or check]
 2. [Phase 2 action — what the skill should evaluate]
 3. [Phase N action — what the skill should output]
 **Assertions:**
 - [ ] Skill reads [specific file] before producing output
 - [ ] Output includes verdict keyword [PASS/FAIL/etc.]
 - [ ] Output lists [specific content] from the fixture
 - [ ] Skill asks for approval before writing any file
 ---
 ### Case 2: Failure Path — [short description, e.g., "Missing required artifact"]
 **Fixture:** [Describe the failure state. E.g., "game-concept.md is missing.
 No files exist in design/gdd/."]
 **Input:** `/[skill-name] [args]`
 **Expected behavior:**
 1. [Phase 1: skill detects missing file]
 2. [Phase 2: skill surfaces the gap rather than assuming OK]
 3. [Output: FAIL or BLOCKED verdict with specific blocker named]
 **Assertions:**
 - [ ] Skill does NOT output PASS when the fixture is incomplete
 - [ ] Skill names the specific missing artifact
 - [ ] Skill suggests a remediation action (e.g., "Run /[other-skill]")
 - [ ] Skill does not create files to fill in the gap without asking
 ---
 ### Case 3: Edge Case — [short description, e.g., "No argument provided"]
 **Fixture:** [State of project files for this case]
 **Input:** `/[skill-name]` (no argument)
 **Expected behavior:**
 1. [What the skill should do when invoked without arguments]
 **Assertions:**
 - [ ] [assertion]
 ---
 ## Protocol Compliance
 - [ ] Uses "May I write" before all file writes
 - [ ] Presents findings or report before asking for write approval
 - [ ] Ends with a recommended next step or follow-up skill
 - [ ] Never auto-creates files without explicit user approval
 - [ ] Does not skip phases or jump straight to a verdict without checking
 ---
 ## Coverage Notes
 [Document what is intentionally NOT tested in this spec and why. Examples:
 - "Case 3 (all-mode) is not covered because it runs too many checks to evaluate
  in a single spec — test each sub-mode individually."
 - "The database integration path is not covered as it requires a live environment."
 - "Edge cases involving corrupted YAML files are deferred to a future spec."]
--- a/.claude/hooks/validate-skill-change.sh
+++ b/.claude/hooks/validate-skill-change.sh
@@ -0,0 +1,39 @@
 #!/bin/bash
 # Claude Code PostToolUse hook: Advises running skill-test after skill file changes
 # Fires when any file inside .claude/skills/ is written or edited.
 #
 # Exit behavior:
 #   exit 0 = advisory only (non-blocking)
 #
 # Input schema (PostToolUse for Write|Edit):
 # { "tool_name": "Write", "tool_input": { "file_path": "...", "content": "..." } }
 INPUT=$(cat)
 # Parse file path -- use jq if available, fall back to grep
 if command -v jq >/dev/null 2>&1; then
    FILE_PATH=$(echo "$INPUT" | jq -r '.tool_input.file_path // empty')
 else
    FILE_PATH=$(echo "$INPUT" | grep -oE '"file_path"[[:space:]]*:[[:space:]]*"[^"]*"' | sed 's/"file_path"[[:space:]]*:[[:space:]]*"//;s/"$//')
 fi
 # Normalize path separators (Windows backslash to forward slash)
 FILE_PATH=$(echo "$FILE_PATH" | sed 's|\\|/|g')
 # Only act on files inside .claude/skills/
 if ! echo "$FILE_PATH" | grep -qE '(^|/)\.claude/skills/'; then
    exit 0
 fi
 # Extract skill name from path (.claude/skills/[skill-name]/SKILL.md)
 SKILL_NAME=$(echo "$FILE_PATH" | grep -oE '\.claude/skills/[^/]+' | sed 's|\.claude/skills/||')
 if [ -z "$SKILL_NAME" ]; then
    exit 0
 fi
 echo "=== Skill Modified: $SKILL_NAME ===" >&2
 echo "Run /skill-test static $SKILL_NAME to validate structural compliance." >&2
 echo "====================================" >&2
 exit 0
--- a/.claude/settings.json
+++ b/.claude/settings.json
@@ -74,6 +74,11 @@
            "type": "command",
            "command": "bash .claude/hooks/validate-assets.sh",
            "timeout": 10
          },
          {
            "type": "command",
            "command": "bash .claude/hooks/validate-skill-change.sh",
            "timeout": 5
          }
        ]
      }
--- a/.claude/skills/skill-test/SKILL.md
+++ b/.claude/skills/skill-test/SKILL.md
@@ -0,0 +1,290 @@
 ---
 name: skill-test
 description: "Validate skill files for structural compliance and behavioral correctness. Three modes: static (linter), spec (behavioral), audit (coverage report)."
 argument-hint: "static [skill-name | all] | spec [skill-name] | audit"
 user-invocable: true
 allowed-tools: Read, Glob, Grep, Write
 context: fork
 ---
 # Skill Test
 Validates `.claude/skills/*/SKILL.md` files for structural compliance and
 behavioral correctness. No external dependencies — runs entirely within the
 existing skill/hook/template architecture.
 **Three modes:**
 | Mode | Command | Purpose | Token Cost |
 |------|---------|---------|------------|
 | `static` | `/skill-test static [name\|all]` | Structural linter — 7 compliance checks per skill | Low (~1k/skill) |
 | `spec` | `/skill-test spec [name]` | Behavioral verifier — evaluates assertions in test spec | Medium (~5k/skill) |
 | `audit` | `/skill-test audit` | Coverage report — which skills have specs, last test dates | Low (~2k total) |
 ---
 ## Phase 1: Parse Arguments
 Determine mode from the first argument:
 - `static [name]` → run 7 structural checks on one skill
 - `static all` → run 7 structural checks on all skills (Glob `.claude/skills/*/SKILL.md`)
 - `spec [name]` → read skill + test spec, evaluate assertions
 - `audit` (or no argument) → read catalog, list all skills, show coverage
 If the first argument is present but unrecognized, output usage and stop.
 ---
 ## Phase 2A: Static Mode — Structural Linter
 For each skill being tested, read its `SKILL.md` fully and run all 7 checks:
 ### Check 1 — Required Frontmatter Fields
 The file must contain all of these in the YAML frontmatter block:
 - `name:`
 - `description:`
 - `argument-hint:`
 - `user-invocable:`
 - `allowed-tools:`
 **FAIL** if any are absent.
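A minimal bash sketch of Check 1, assuming LF line endings and frontmatter delimited by the first two lines that are exactly `---`. The helper name `check_frontmatter` is hypothetical; the skill performs this check by reading the file, and the code only illustrates the rule.

```bash
# Hypothetical helper: Check 1 (required frontmatter fields).
# Assumes LF line endings and that the frontmatter sits between the
# first two lines consisting of exactly "---".
check_frontmatter() {
  local file="$1" fm key missing=""
  # Print only the lines between the first and second "---" delimiters.
  fm=$(awk '/^---$/ { n++; if (n == 2) exit; next } n == 1' "$file")
  for key in name description argument-hint user-invocable allowed-tools; do
    if ! printf '%s\n' "$fm" | grep -q "^${key}:"; then
      missing="$missing $key"
    fi
  done
  if [ -z "$missing" ]; then
    echo "PASS"
  else
    echo "FAIL (missing:$missing)"
  fi
}
```

For example, `check_frontmatter .claude/skills/skill-test/SKILL.md` should print `PASS`, since all five keys appear in that file's frontmatter.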
 ### Check 2 — Multiple Phases
 The skill must have ≥2 phase-like headings. Look for patterns like:
 - `## Phase N` or `## Phase N:`
 - `## N.` (numbered top-level sections)
 - At least 2 distinct `##` headings if phases aren't explicitly numbered
 **FAIL** if fewer than 2 phase-like headings are found.
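A rough bash sketch of Check 2 (hypothetical `check_phases`). It counts `## Phase N` and `## N.` headings and, only when none are numbered, falls back to distinct `##` headings. Headings inside code fences are not excluded, which a careful implementation would need to handle.

```bash
# Hypothetical helper: Check 2 (multiple phases).
# Counts "## Phase N" / "## N." headings; if none are numbered, falls back
# to counting distinct "## " headings. Code fences are not skipped.
check_phases() {
  local file="$1" n
  n=$(grep -cE '^## (Phase [0-9]+[A-Z]?|[0-9]+\.)' "$file")
  if [ "$n" -eq 0 ]; then
    n=$(grep -E '^## ' "$file" | sort -u | wc -l | tr -d ' ')
  fi
  if [ "$n" -ge 2 ]; then
    echo "PASS ($n phases found)"
  else
    echo "FAIL ($n phases found)"
  fi
}
```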
 ### Check 3 — Verdict Keywords
 The skill must contain at least one of: `PASS`, `FAIL`, `CONCERNS`, `APPROVED`,
 `BLOCKED`, `COMPLETE`, `READY`, `COMPLIANT`, `NON-COMPLIANT`
 **FAIL** if none are present.
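A bash sketch of Check 3 using whole-word, case-sensitive matching (`grep -ow`), so `PASSED` or `Pass` do not count. `check_verdicts` is a hypothetical name; its output mimics the `PASS (PASS, FAIL, CONCERNS)` line in the Static Mode Output Format.

```bash
# Hypothetical helper: Check 3 (verdict keywords).
# grep -o consumes NON-COMPLIANT as a single match, so its COMPLIANT
# suffix is not reported separately.
check_verdicts() {
  local found
  found=$(grep -owE 'NON-COMPLIANT|COMPLIANT|PASS|FAIL|CONCERNS|APPROVED|BLOCKED|COMPLETE|READY' "$1" \
    | sort -u | paste -sd, - | sed 's/,/, /g')
  if [ -n "$found" ]; then
    echo "PASS ($found)"
  else
    echo "FAIL (no verdict keywords)"
  fi
}
```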
 ### Check 4 — Collaborative Protocol Language
 The skill must contain ask-before-write language. Look for:
 - `"May I write"` (canonical form)
 - `"before writing"` or `"approval"` near file-write instructions
 - `"ask"` + `"write"` in close proximity (within same section)
 **WARN** if absent and `allowed-tools` excludes both `Write` and `Edit` (some read-only skills legitimately skip this).
 **FAIL** if `allowed-tools` includes `Write` or `Edit` but no ask-before-write language is found.
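A bash sketch of Check 4 (hypothetical `check_protocol`). It assumes `allowed-tools` is a single comma-separated frontmatter line and only searches for the literal phrases `May I write` and `before writing`; the same-section proximity heuristic is left to the reader.

```bash
# Hypothetical helper: Check 4 (collaborative protocol).
# Phrase search only; "ask" + "write" proximity is not evaluated here.
check_protocol() {
  local file="$1" writes=no
  # Assumes allowed-tools is a single comma-separated frontmatter line.
  if grep -E '^allowed-tools:' "$file" | grep -qwE 'Write|Edit'; then
    writes=yes
  fi
  if grep -qiE 'May I write|before writing' "$file"; then
    echo "PASS"
  elif [ "$writes" = yes ]; then
    echo "FAIL (Write/Edit allowed but no ask-before-write language)"
  else
    echo "WARN (no ask-before-write language)"
  fi
}
```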
 ### Check 5 — Next-Step Handoff
 The skill should end with a recommended next action or follow-up path. Look for:
 - A final section mentioning another skill (e.g., `/story-done`, `/gate-check`)
 - "Recommended next" or "next step" phrasing
 - A "Follow-Up" or "After this" section
 **WARN** if absent.
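A bash sketch of Check 5 (hypothetical `check_handoff`), assuming the handoff lives in the last `##` section. It accepts handoff phrasing or a backticked skill reference such as `` `/gate-check` ``.

```bash
# Hypothetical helper: Check 5 (next-step handoff).
# Inspects only the last "## " section of the file.
check_handoff() {
  local last
  # Reset the buffer at each "## " heading, so only the final section remains.
  last=$(awk '/^## / { buf = "" } { buf = buf $0 "\n" } END { printf "%s", buf }' "$1")
  if printf '%s' "$last" | grep -qiE 'next step|recommended next|follow-up|after this|`/[a-z][a-z-]*'; then
    echo "PASS"
  else
    echo "WARN (no follow-up section found)"
  fi
}
```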
 ### Check 6 — Fork Context Complexity
 If frontmatter contains `context: fork`, the skill should have ≥5 phase headings
 (`##` level or numbered Phase N headers). Fork context is for complex multi-phase
 skills; simple skills should not use it.
 **WARN** if `context: fork` is set but fewer than 5 phases found.
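A bash sketch of Check 6 (hypothetical `check_fork`), taking the loosest reading of "phase": every `##` heading counts.

```bash
# Hypothetical helper: Check 6 (fork context complexity).
check_fork() {
  local n
  if ! grep -qE '^context:[[:space:]]*fork' "$1"; then
    echo "PASS (no fork context)"
    return
  fi
  # Every "## " heading is treated as a phase in this sketch.
  n=$(grep -cE '^## ' "$1")
  if [ "$n" -ge 5 ]; then
    echo "PASS ($n phases, context: fork set)"
  else
    echo "WARN ($n phases, context: fork set)"
  fi
}
```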
 ### Check 7 — Argument Hint Plausibility
 `argument-hint` must be non-empty. If the skill body mentions multiple modes
 (e.g., "Mode A | Mode B"), the hint should reflect them. Cross-reference the
 hint against the first phase's "Parse Arguments" section.
 **WARN** if hint is `""` or if documented modes don't match hint.
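Only the non-empty half of Check 7 is mechanical; comparing the hint with documented modes needs judgment. A bash sketch of that half, with `check_hint` as a hypothetical name that strips surrounding quotes before testing for emptiness:

```bash
# Hypothetical helper: Check 7 (argument hint is non-empty).
check_hint() {
  local hint
  # Take the first argument-hint line, drop the key, then strip one pair of quotes.
  hint=$(grep -m1 -E '^argument-hint:' "$1" \
    | sed -E 's/^argument-hint:[[:space:]]*//; s/^"(.*)"$/\1/')
  if [ -z "$hint" ]; then
    echo "WARN (argument-hint is empty)"
  else
    echo "PASS ($hint)"
  fi
}
```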
 ---
 ### Static Mode Output Format
 For a single skill:
 ```
 === Skill Static Check: /[name] ===
 Check 1 — Frontmatter Fields:        PASS
 Check 2 — Multiple Phases:           PASS (7 phases found)
 Check 3 — Verdict Keywords:          PASS (PASS, FAIL, CONCERNS)
 Check 4 — Collaborative Protocol:    PASS ("May I write" found)
 Check 5 — Next-Step Handoff:         WARN (no follow-up section found)
 Check 6 — Fork Context Complexity:   PASS (8 phases, context: fork set)
 Check 7 — Argument Hint:             PASS
 Verdict: WARNINGS (1 warning, 0 failures)
 Recommended: Add a "Follow-Up Actions" section at the end of the skill.
 ```
 For `static all`, produce a summary table then list any non-compliant skills:
 ```
 === Skill Static Check: All 52 Skills ===
 Skill                  | Result       | Issues
 -----------------------|--------------|-------
 gate-check             | COMPLIANT    |
 design-review          | COMPLIANT    |
 story-readiness        | WARNINGS     | Check 5: no handoff
 ...
 Summary: 48 COMPLIANT, 3 WARNINGS, 1 NON-COMPLIANT
 Aggregate Verdict: N WARNINGS / N FAILURES
 ```
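The summary labels can be derived from the seven per-check results. A bash sketch under an assumed rule inferred from the example output (any FAIL makes a skill NON-COMPLIANT, otherwise any WARN makes it WARNINGS); `classify_skill` and `summarize_skills` are hypothetical names:

```bash
# Hypothetical helpers for `static all` (assumed FAIL > WARN > PASS precedence).
classify_skill() {
  # stdin: one check result per line, e.g. "PASS", "WARN (...)", "FAIL (...)"
  awk '/^FAIL/ { f++ } /^WARN/ { w++ }
       END { print (f ? "NON-COMPLIANT" : (w ? "WARNINGS" : "COMPLIANT")) }'
}

summarize_skills() {
  # stdin: "<skill> <RESULT>" per line; tallies the last field
  awk '{ c[$NF]++ }
       END { printf "Summary: %d COMPLIANT, %d WARNINGS, %d NON-COMPLIANT\n",
             c["COMPLIANT"], c["WARNINGS"], c["NON-COMPLIANT"] }'
}
```

Piping a skill's seven check lines through `classify_skill` yields its table label; piping the resulting `name LABEL` lines through `summarize_skills` yields the Summary line.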
 ---
 ## Phase 2B: Spec Mode — Behavioral Verifier
 ### Step 1 — Locate Files
 Find skill at `.claude/skills/[name]/SKILL.md`.
 Find spec at `tests/skills/[name].md`.
 If either is missing:
 - Missing skill: "Skill '[name]' not found in `.claude/skills/`."
 - Missing spec: "No test spec found for '[name]'. Run `/skill-test audit` to see
  coverage gaps, or create a spec using the template at
  `.claude/docs/templates/skill-test-spec.md`."
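A bash sketch of Step 1 path resolution, run from the repository root. `locate_spec_files` is hypothetical; it prints both paths, or a shortened form of the messages above and returns 1.

```bash
# Hypothetical helper: resolve skill and spec paths for spec mode.
locate_spec_files() {
  local name="$1"
  local skill=".claude/skills/$name/SKILL.md" spec="tests/skills/$name.md"
  if [ ! -f "$skill" ]; then
    echo "Skill '$name' not found in .claude/skills/."
    return 1
  fi
  if [ ! -f "$spec" ]; then
    echo "No test spec found for '$name'."
    return 1
  fi
  printf '%s\n%s\n' "$skill" "$spec"
}
```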
 ### Step 2 — Read Both Files
 Read the skill file and test spec file completely.
 ### Step 3 — Evaluate Assertions
 For each **Test Case** in the spec:
 1. Read the **Fixture** description (assumed state of project files)
 2. Read the **Expected behavior** steps
 3. Read each **Assertion** checkbox
 For each assertion, evaluate whether the skill's written instructions, if
 followed correctly given the fixture state, would satisfy it. This is a
 Claude-evaluated reasoning check, not code execution.
 Mark each assertion:
 - **PASS** — skill instructions clearly satisfy this assertion
 - **PARTIAL** — skill instructions partially address it, but with ambiguity
 - **FAIL** — skill instructions would NOT satisfy this assertion given the fixture
 For **Protocol Compliance** assertions (always present):
 - Check whether the skill requires "May I write" before file writes
 - Check whether the skill presents findings before requesting approval
 - Check whether the skill ends with a recommended next step
 - Check whether the skill avoids auto-creating files without approval
 ### Step 4 — Build Report
 ```
 === Skill Spec Test: /[name] ===
 Date: [date]
 Spec: tests/skills/[name].md
 Case 1: [Happy Path — name]
  Fixture: [summary]
  Assertions:
    [PASS] [assertion text]
    [FAIL] [assertion text]
       Reason: The skill's Phase 3 says "..." but the fixture state means "..."
  Case Verdict: FAIL
 Case 2: [Edge Case — name]
  ...
  Case Verdict: PASS
 Protocol Compliance:
  [PASS] Uses "May I write" before file writes
  [PASS] Presents findings before asking approval
  [PARTIAL] Next-step handoff implied but not explicit at end
 Overall Verdict: FAIL (1 case failed, 1 partial)
 ```
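The case verdicts in this report can be derived from the assertion marks. A bash sketch, assuming a precedence the spec format does not state (any `[FAIL]` fails the case, otherwise any `[PARTIAL]` makes it PARTIAL); `case_verdict` is hypothetical:

```bash
# Hypothetical helper for Step 4 (assumed FAIL > PARTIAL > PASS precedence).
case_verdict() {
  # stdin: the assertion lines of one case, e.g. "  [PASS] Skill reads ..."
  awk '/\[FAIL\]/ { f++ } /\[PARTIAL\]/ { p++ }
       END { print (f ? "FAIL" : (p ? "PARTIAL" : "PASS")) }'
}
```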
 ### Step 5 — Offer to Write Results
 "May I write these results to `tests/results/skill-test-spec-[name]-[date].md`
 and update `tests/skills/catalog.yaml`?"
 If yes:
 - Write results file to `tests/results/`
 - Update the skill's entry in `tests/skills/catalog.yaml`:
  - `last_spec: [date]`
  - `last_spec_result: PASS|PARTIAL|FAIL`
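A bash sketch of the catalog update, assuming the catalog keeps its current layout (`- name: <skill>` followed by that skill's keys). It is a line-oriented `awk` edit, not a YAML parser, and `update_catalog` is a hypothetical name:

```bash
# Hypothetical helper: set last_spec / last_spec_result for one skill.
# Relies on the catalog's layout; it does not parse YAML.
update_catalog() {
  local catalog="$1" name="$2" date="$3" result="$4" tmp
  tmp=$(mktemp) || return 1
  awk -v name="$name" -v date="$date" -v result="$result" '
    /^ *- name:/ { in_entry = ($NF == name) }
    in_entry && /^ *last_spec:/        { sub(/last_spec:.*/, "last_spec: \"" date "\"") }
    in_entry && /^ *last_spec_result:/ { sub(/last_spec_result:.*/, "last_spec_result: \"" result "\"") }
    { print }
  ' "$catalog" > "$tmp" && mv "$tmp" "$catalog"
}
```

For example, `update_catalog tests/skills/catalog.yaml gate-check 2026-03-13 PASS` would touch only the `gate-check` entry.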
 ---
 ## Phase 2C: Audit Mode — Coverage Report
 ### Step 1 — Read Catalog
 Read `tests/skills/catalog.yaml`. If missing, note that catalog doesn't exist
 yet (first-run state).
 ### Step 2 — Enumerate All Skills
 Glob `.claude/skills/*/SKILL.md` to get the complete list of skills.
 Extract skill name from each path (directory name).
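A bash sketch of Step 2; `list_skills` is hypothetical and defaults to `.claude/skills` relative to the current directory:

```bash
# Hypothetical helper: print the directory name of every skill that has a SKILL.md.
list_skills() {
  local root="${1:-.claude/skills}" f
  for f in "$root"/*/SKILL.md; do
    [ -e "$f" ] || continue   # an unmatched glob stays literal; skip it
    basename "$(dirname "$f")"
  done
}
```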
 ### Step 3 — Build Coverage Table
 For each skill:
 - Check if a spec file exists at `tests/skills/[name].md`
 - Look up `last_static`, `last_static_result`, `last_spec`, `last_spec_result`
  from catalog (or mark as "never" if not in catalog)
 - Assign priority:
  - `critical`: gate-check, design-review, story-readiness, story-done, review-all-gdds, architecture-review
  - `high`: create-epics-stories, create-control-manifest, propagate-design-change, architecture-decision, map-systems, design-system
  - `medium`: team-* skills, sprint-plan, sprint-status
  - `low`: all others
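A bash sketch of the priority rule, using the critical and high tiers recorded in `tests/skills/catalog.yaml` plus the team-*/sprint medium rule; `skill_priority` is hypothetical. Because `case` takes the first matching pattern, a skill listed in two tiers resolves to the higher one.

```bash
# Hypothetical helper: map a skill name to its audit priority.
skill_priority() {
  case "$1" in
    gate-check|design-review|story-readiness|story-done|review-all-gdds|architecture-review)
      echo critical ;;
    create-epics-stories|create-control-manifest|propagate-design-change|architecture-decision|map-systems|design-system)
      echo high ;;
    team-*|sprint-plan|sprint-status)
      echo medium ;;
    *)
      echo low ;;
  esac
}
```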
 ### Step 4 — Output Report
 ```
 === Skill Test Coverage Audit ===
 Date: [date]
 Total skills: 52
 Specs written: 4 (7.7%)
 Never tested (static): 52
 Coverage Table:
 Skill                  | Has Spec | Last Static      | Static Result | Last Spec        | Spec Result | Priority
 -----------------------|----------|------------------|---------------|------------------|-------------|----------
 gate-check             | YES      | never            | —             | never            | —           | critical
 design-review          | YES      | never            | —             | never            | —           | critical
 story-readiness        | YES      | never            | —             | never            | —           | critical
 story-done             | YES      | never            | —             | never            | —           | critical
 architecture-review    | NO       | never            | —             | never            | —           | critical
 review-all-gdds        | NO       | never            | —             | never            | —           | critical
 ...
 Top 5 Priority Gaps (no spec, critical/high priority):
 1. /architecture-review (critical, no spec)
 2. /review-all-gdds (critical, no spec)
 3. /create-epics-stories (high, no spec)
 4. /create-control-manifest (high, no spec)
 5. /propagate-design-change (high, no spec)
 Coverage: 4/52 specs (7.7%)
 ```
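The coverage percentage is specs divided by total skills, times 100, to one decimal place (4/52 gives 7.7%). A bash sketch with a hypothetical `coverage_line` helper:

```bash
# Hypothetical helper: format the coverage line from spec and skill counts.
coverage_line() {
  awk -v s="$1" -v t="$2" 'BEGIN {
    printf "Coverage: %d/%d specs (%.1f%%)\n", s, t, (t > 0 ? 100 * s / t : 0)
  }'
}
```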
 No file writes in audit mode.
 Offer: "Would you like to run `/skill-test static all` to check structural
 compliance across all skills? Or `/skill-test spec [name]` to run a specific
 behavioral test?"
 ---
 ## Phase 3: Recommended Next Steps
 After any mode completes, offer contextual follow-up:
 - After `static [name]`: "Run `/skill-test spec [name]` to validate behavioral
  correctness if a test spec exists."
 - After `static all` with failures: "Address NON-COMPLIANT skills first. Run
  `/skill-test static [name]` individually for detailed remediation guidance."
 - After `spec [name]` PASS: "If the results were not written in Phase 2B Step 5, record
  this pass date in `tests/skills/catalog.yaml`. Consider running `/skill-test audit` to
  find the next spec gap."
 - After `spec [name]` FAIL: "Review the failing assertions and update the skill
  or the test spec to resolve the mismatch."
 - After `audit`: "Start with the critical-priority gaps. Use the spec template
  at `.claude/docs/templates/skill-test-spec.md` to create new specs."
--- a/tests/skills/_fixtures/incomplete-gdd.md
+++ b/tests/skills/_fixtures/incomplete-gdd.md
@@ -0,0 +1,51 @@
 # GDD: Light Manipulation System
 ## Overview
 The light manipulation system allows players to interact with bioluminescent
 organisms and ancient light conduits to redirect beams of light. Light beams
 illuminate dark areas, power ancient mechanisms, and reveal hidden surfaces.
 ## Player Fantasy
 The player should feel like a puzzle archaeologist — discovering the logic of
 an alien but internally consistent technology. The "aha" moment when a complex
 light path clicks into place should feel earned and satisfying.
 ## Detailed Rules
 - Players can pick up portable light sources (max 3 carried at once)
 - Stationary conduits redirect beams at fixed angles (45°/90°/135°/180°)
 - Light beams are blocked by solid terrain and most objects
 - Living bioluminescent organisms pulse light on a 3-second cycle
 - Ancient mirrors rotate freely and redirect any light beam that touches them
 - A beam must reach a receptor to activate a mechanism
 ## Formulas
 [SECTION MISSING — not yet authored]
 ## Edge Cases
 [SECTION MISSING — not yet authored]
 ## Dependencies
 - **Oxygen System**: Light sources consume no oxygen but picking them up takes
  time (opportunity cost with oxygen drain)
 - **Cave Navigation**: Illuminated paths reveal branching routes not visible
  in darkness
 - Player Inventory System (not yet designed)
 ## Tuning Knobs
 [SECTION MISSING — not yet authored]
 ## Acceptance Criteria
 [SECTION MISSING — not yet authored]
 ---
 *Status: Draft — 4/8 required sections populated*
 *Last updated: 2026-03-13*
--- a/tests/skills/_fixtures/minimal-game-concept.md
+++ b/tests/skills/_fixtures/minimal-game-concept.md
@@ -0,0 +1,62 @@
 # Game Concept: Echoes of the Deep
 ## Overview
 Echoes of the Deep is a single-player atmospheric puzzle-platformer set in
 a bioluminescent underwater cave network. Players control a deep-sea diver
 exploring ancient ruins while managing oxygen supplies and manipulating light
 sources to reveal hidden paths and solve environmental puzzles.
 ## Player Fantasy
 The player should feel like a lone explorer uncovering a lost civilization,
 experiencing wonder at beautiful environments, and the satisfying "aha" moment
 when a clever puzzle clicks into place. The oxygen mechanic creates gentle
 pressure without punishing failure harshly.
 ## Core Loop
 1. **Explore** — navigate branching cave sections using light and movement
 2. **Discover** — find oxygen caches, light sources, and ancient mechanisms
 3. **Solve** — manipulate light and environment to unlock new areas
 4. **Progress** — unlock deeper cave sections with escalating complexity
 ## Game Pillars
 1. **Wonder** — every area should contain something visually or mechanically surprising
 2. **Accessibility** — the game should be completable without frustration; oxygen
   manages pacing, not punishment
 3. **Environmental Storytelling** — the ruins tell a story without text exposition
 ## Target Audience
 Casual-to-midcore players who enjoy relaxed exploration games (Subnautica,
 Journey, ABZÛ) and puzzle games that reward observation over reflexes.
 Target age: 16+. Target sessions: 30–90 minutes.
 ## Unique Selling Points
 - Bioluminescent light manipulation as the core puzzle mechanic
 - No enemies — tension comes from environment and resource management
 - Procedurally decorated (handcrafted levels, procedural detail pass)
 ## Technical Scope
 - **Engine**: Godot 4.6
 - **Platform**: PC (Steam), with console ports post-launch
 - **Team size**: Solo developer
 - **Target completion**: 12-month development cycle
 - **Scope**: 4–6 hours main story, 8–12 hours completionist
 ## Art Direction
 Darkly atmospheric with vibrant bioluminescence providing the primary color
 palette. Deep blues, purples, and blacks punctuated by greens, teals, and
 ambers from living organisms and ancient technology.
 ## Fun Hypothesis
 Players will feel rewarded by the combination of visual beauty and the
 satisfying moment of discovering how light manipulation solves each puzzle.
 The oxygen system will create just enough pressure to make exploration feel
 meaningful without making death feel punishing.
--- a/tests/skills/catalog.yaml
+++ b/tests/skills/catalog.yaml
@@ -0,0 +1,438 @@
 version: 1
 last_updated: ""
 skills:
  # Critical — gate skills that control phase transitions
  - name: gate-check
    spec: tests/skills/gate-check.md
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: critical
  - name: design-review
    spec: tests/skills/design-review.md
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: critical
  - name: story-readiness
    spec: tests/skills/story-readiness.md
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: critical
  - name: story-done
    spec: tests/skills/story-done.md
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: critical
  - name: review-all-gdds
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: critical
  - name: architecture-review
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: critical
  # High — pipeline-critical skills
  - name: create-epics-stories
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: high
  - name: create-control-manifest
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: high
  - name: propagate-design-change
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: high
  - name: architecture-decision
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: high
  - name: map-systems
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: high
  - name: design-system
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: high
  # Medium — team and sprint management skills
  - name: sprint-plan
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: medium
  - name: sprint-status
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: medium
  - name: team-ui
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: medium
  - name: team-combat
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: medium
  - name: team-narrative
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: medium
  - name: team-audio
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: medium
  - name: team-level
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: medium
  - name: team-polish
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: medium
  - name: team-release
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: medium
  - name: team-live-ops
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: medium
  # Low — analysis, reporting, utility skills
  - name: skill-test
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: low
  - name: start
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: low
  - name: help
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: low
  - name: brainstorm
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: low
  - name: project-stage-detect
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: low
  - name: setup-engine
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: low
  - name: quick-design
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: low
  - name: ux-design
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: low
  - name: ux-review
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: low
  - name: code-review
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: low
  - name: balance-check
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: low
  - name: asset-audit
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: low
  - name: reverse-document
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: low
  - name: create-architecture
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: low
  - name: content-audit
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: low
  - name: bug-report
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: low
  - name: hotfix
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: low
  - name: prototype
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: low
  - name: playtest-report
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: low
  - name: perf-profile
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: low
  - name: tech-debt
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: low
  - name: scope-check
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: low
  - name: estimate
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: low
  - name: milestone-review
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: low
  - name: retrospective
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: low
  - name: changelog
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: low
  - name: patch-notes
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: low
  - name: onboard
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: low
  - name: localize
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: low
  - name: launch-checklist
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: low
  - name: release-checklist
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: low
  - name: adopt
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: low
--- a/tests/skills/design-review.md
+++ b/tests/skills/design-review.md
@@ -0,0 +1,144 @@
 # Skill Test Spec: /design-review
 ## Skill Summary
 `/design-review` reads a game design document (GDD) and evaluates it against
 the project's 8-section design standard (Overview, Player Fantasy, Detailed
 Rules, Formulas, Edge Cases, Dependencies, Tuning Knobs, Acceptance Criteria).
 It checks for internal consistency, implementability, and cross-system
 conflicts. It produces a verdict of APPROVED, NEEDS REVISION, or MAJOR
 REVISION NEEDED. It is a read-only skill (no file writes) and runs as a
 `context: fork` subagent.
 ---
 ## Static Assertions (Structural)
 Verified automatically by `/skill-test static` — no fixture needed.
 - [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
 - [ ] Has ≥2 phase headings or numbered steps
 - [ ] Contains verdict keywords: APPROVED, NEEDS REVISION, MAJOR REVISION NEEDED
 - [ ] Does NOT require "May I write" language (read-only skill — `allowed-tools` excludes Write/Edit)
 - [ ] Output format is documented (review template shown in skill body)
 ---
 ## Test Cases
 ### Case 1: Happy Path — Complete GDD, all 8 sections present
 **Fixture:**
 - `design/gdd/light-manipulation.md` exists as a complete GDD: the content of
  `tests/skills/_fixtures/incomplete-gdd.md` with its four missing sections
  (Formulas, Edge Cases, Tuning Knobs, Acceptance Criteria) fully authored
 - All 8 required sections are populated with substantive content
 - Formulas section contains at least one formula with defined variables
 - Acceptance Criteria section contains at least 3 testable criteria
 **Input:** `/design-review design/gdd/light-manipulation.md`
 **Expected behavior:**
 1. Skill reads the target document in full
 2. Skill reads CLAUDE.md for project context and standards
 3. Skill evaluates all 8 required sections (present/absent check)
 4. Skill checks internal consistency (formulas match described behavior)
 5. Skill checks implementability (rules are precise enough to code)
 6. Skill outputs structured review with section-by-section status
 7. Skill outputs APPROVED verdict
 **Assertions:**
 - [ ] Skill reads the target file before producing any output
 - [ ] Output includes a "Completeness" section showing X/8 sections present
 - [ ] Output includes an "Internal Consistency" section
 - [ ] Output includes an "Implementability" section
 - [ ] Output ends with a verdict line: APPROVED / NEEDS REVISION / MAJOR REVISION NEEDED
 - [ ] APPROVED verdict is given when all 8 sections are present and consistent
 ---
 ### Case 2: Failure Path — Incomplete GDD (4/8 sections)
 **Fixture:**
 - `design/gdd/light-manipulation.md` exists using content from
  `tests/skills/_fixtures/incomplete-gdd.md` (4 of 8 sections populated;
  Formulas, Edge Cases, Tuning Knobs, Acceptance Criteria are missing)
 **Input:** `/design-review design/gdd/light-manipulation.md`
 **Expected behavior:**
 1. Skill reads the document
 2. Skill identifies 4 missing sections
 3. Skill outputs "Completeness: 4/8 sections present"
 4. Skill lists specifically which 4 sections are missing
 5. Skill outputs MAJOR REVISION NEEDED verdict (not APPROVED or NEEDS REVISION)
 **Assertions:**
 - [ ] Output shows "4/8" in the completeness section (not a higher number)
 - [ ] Output explicitly names each missing section (Formulas, Edge Cases, Tuning Knobs, Acceptance Criteria)
 - [ ] Verdict is MAJOR REVISION NEEDED (not APPROVED or NEEDS REVISION) when ≥3 sections are missing
 - [ ] Output does not suggest the document is implementation-ready
 - [ ] Skill does not write any files (read-only enforcement)
 ---
 ### Case 3: Partial Path — 7/8 sections, minor inconsistency
 **Fixture:**
 - GDD has all sections except Formulas
 - The described behavior mentions numeric values but no formulas are defined
 - Acceptance Criteria exist but are vague ("feels good" rather than measurable)
 **Input:** `/design-review design/gdd/[document].md`
 **Expected behavior:**
 1. Skill identifies missing Formulas section
 2. Skill flags vague acceptance criteria as an implementability issue
 3. Skill outputs NEEDS REVISION verdict (not APPROVED, not MAJOR REVISION NEEDED)
 4. Skill provides specific remediation notes for each issue
 **Assertions:**
 - [ ] Verdict is NEEDS REVISION (not APPROVED, not MAJOR REVISION NEEDED) for 7/8 with issues
 - [ ] Output identifies the missing Formulas section specifically
 - [ ] Output flags the vague acceptance criteria as an implementability gap
 - [ ] Each flagged issue has a specific, actionable remediation note
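The verdict thresholds asserted in Cases 1 to 3 can be summarized as a small oracle for grading a run by hand. This sketch captures the spec's expectations and is not code taken from `/design-review`. The spec pins only the 8/8, 7/8 and 4/8 cases. Treating 1 or 2 missing sections, or a complete document with flagged issues, as NEEDS REVISION is therefore an assumption.

```python
REQUIRED_SECTIONS = 8


def design_review_verdict(sections_present: int, other_issues: int) -> str:
    """Expected /design-review verdict, as asserted by Cases 1-3 of this spec."""
    missing = REQUIRED_SECTIONS - sections_present
    if missing >= 3:
        return "MAJOR REVISION NEEDED"  # Case 2: 4/8 present
    if missing == 0 and other_issues == 0:
        return "APPROVED"  # Case 1: complete and consistent
    return "NEEDS REVISION"  # Case 3: 7/8 plus vague criteria (other counts assumed)
```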
 ---
 ### Case 4: Edge Case — File not found
 **Fixture:**
 - The path provided does not exist in the project
 **Input:** `/design-review design/gdd/nonexistent.md`
 **Expected behavior:**
 1. Skill attempts to read the file
 2. File not found
 3. Skill outputs an error message naming the missing file
 4. Skill suggests checking the path or listing files in `design/gdd/`
 5. Skill does NOT produce a verdict
 **Assertions:**
 - [ ] Skill outputs a clear error when the file is not found
 - [ ] Skill does NOT output APPROVED, NEEDS REVISION, or MAJOR REVISION NEEDED when file is missing
 - [ ] Skill suggests a corrective action (check path, list available GDDs)
 ---
 ## Protocol Compliance
 - [ ] Does NOT use Write or Edit tools (read-only skill)
 - [ ] Presents complete findings before any verdict
 - [ ] Does not ask for approval before producing output (no writes to approve)
 - [ ] Ends with recommended next step (e.g., fix issues and re-run, or proceed to `/map-systems`)
 ---
 ## Coverage Notes
 - Cross-system consistency checking (item 3 in the skill's own phase list) is
  not directly tested here because it requires multiple GDD files to compare;
  this is covered by the `/review-all-gdds` spec instead.
 - The skill's `context: fork` behavior (running as a subagent) is not tested
  at the spec level — this is a runtime behavior verified manually.
 - Performance and edge cases involving very large GDD files are not in scope.
--- a/tests/skills/gate-check.md
+++ b/tests/skills/gate-check.md
@@ -0,0 +1,144 @@
 # Skill Test Spec: /gate-check
 ## Skill Summary
 `/gate-check` validates whether the project is ready to advance to the next
 development phase. It checks for required artifacts, runs quality checks, asks
 the user about unverifiable items, and produces a PASS/CONCERNS/FAIL verdict.
 On PASS with user confirmation, it writes the new stage name to
 `production/stage.txt`. It governs all 6 phase transitions and is the most
 critical gate-keeping skill in the pipeline.
 ---
 ## Static Assertions (Structural)
 Verified automatically by `/skill-test static` — no fixture needed.
 - [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
 - [ ] Has ≥2 phase headings (numbered Phase N or ## sections)
 - [ ] Contains verdict keywords: PASS, CONCERNS, FAIL
 - [ ] Contains "May I write" collaborative protocol language
 - [ ] Has a next-step handoff at the end (Follow-Up Actions section)
 ---
 ## Test Cases
 ### Case 1: Happy Path — All Concept artifacts present, advancing to Systems Design
 **Fixture:**
 - `design/gdd/game-concept.md` exists and contains all required sections
 - `design/gdd/game-pillars.md` exists (or pillars defined within concept doc)
 - No systems index yet (which is correct for this stage)
 **Input:** `/gate-check systems-design`
 **Expected behavior:**
 1. Skill reads `design/gdd/game-concept.md` and verifies it has content
 2. Skill checks for game pillars (in concept or separate file)
 3. Skill checks quality items (core loop described, target audience identified)
 4. Skill outputs structured checklist with all items marked
 5. Skill presents PASS/CONCERNS/FAIL verdict
 6. If PASS: skill asks "May I update `production/stage.txt` to 'Systems Design'?"
 **Assertions:**
 - [ ] Skill uses Glob or Read to verify `design/gdd/game-concept.md` exists before marking it checked
 - [ ] Output includes a "Required Artifacts" section with check status per item
 - [ ] Output includes a "Quality Checks" section with check status per item
 - [ ] Output includes a "Verdict" line with one of PASS / CONCERNS / FAIL
 - [ ] Skill asks about unverifiable quality items (e.g., "Has this been reviewed?") rather than assuming PASS
 - [ ] Skill asks "May I write" before updating `production/stage.txt`
 - [ ] Skill does NOT write `production/stage.txt` without explicit user confirmation
 ---
 ### Case 2: Failure Path — Missing required artifacts for Concept → Systems Design
 **Fixture:**
 - `design/gdd/game-concept.md` does NOT exist
 - No game pillars document exists
 - `design/gdd/` directory is empty or absent
 **Input:** `/gate-check systems-design`
 **Expected behavior:**
 1. Skill attempts to read `design/gdd/game-concept.md` — file not found
 2. Skill marks the required artifact as missing
 3. Skill outputs FAIL verdict
 4. Skill lists blocker: "No game concept document found"
 5. Skill suggests remediation: run `/brainstorm` to create one
 **Assertions:**
 - [ ] Verdict is FAIL (not PASS or CONCERNS) when required artifacts are missing
 - [ ] Output explicitly names `design/gdd/game-concept.md` as missing
 - [ ] Output includes a "Blockers" section with at least 1 item
 - [ ] Output recommends `/brainstorm` as the remediation action
 - [ ] Skill does NOT write `production/stage.txt` when verdict is FAIL
 ---
 ### Case 3: No Argument — Auto-detect current stage
 **Fixture:**
 - `production/stage.txt` contains `Concept`
 - `design/gdd/game-concept.md` exists with content
 - No systems index yet
 **Input:** `/gate-check` (no argument)
 **Expected behavior:**
 1. Skill reads `production/stage.txt` to determine current stage
 2. Skill determines the next gate is Concept → Systems Design
 3. Skill proceeds with the Systems Design gate checks
 4. Output clearly states which transition is being validated
 **Assertions:**
 - [ ] Skill reads `production/stage.txt` (or uses project-stage-detect heuristics) to determine current stage
 - [ ] Output header names both current and target phases (e.g., "Gate Check: Concept → Systems Design")
 - [ ] Skill does not ask the user which gate to check if current stage is determinable
 ---
 ### Case 4: Edge Case — Manual check items flagged correctly
 **Fixture:**
 - All required artifacts for Concept → Systems Design are present
 - No playtest or review record exists, so quality checks cannot be auto-verified
 **Input:** `/gate-check systems-design`
 **Expected behavior:**
 1. Skill verifies all artifact files exist
 2. Skill encounters quality check: "Game concept reviewed (not MAJOR REVISION NEEDED)"
 3. Since no review record exists, skill marks item as MANUAL CHECK NEEDED
 4. Skill asks the user: "Has the game concept been reviewed for design quality?"
 5. Skill waits for user input before finalizing verdict
 **Assertions:**
 - [ ] Items that cannot be auto-verified are marked `[?] MANUAL CHECK NEEDED` rather than assumed PASS
 - [ ] Skill uses a question to the user for at least one unverifiable quality item
 - [ ] Skill does not mark unverifiable items as PASS by default
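The aggregation implied by Cases 1, 2 and 4 can be sketched as follows. The names `Check` and `gate_verdict` are hypothetical. Mapping failed quality checks to CONCERNS is an assumption based on the Coverage Notes, which describe CONCERNS as "minor gaps, not blocking". Raising an error on an unanswered manual item models step 5: the verdict is not final until the user has replied.

```python
from enum import Enum


class Check(Enum):
    PASS = "PASS"
    FAIL = "FAIL"
    MANUAL = "MANUAL CHECK NEEDED"  # rendered as "[?] MANUAL CHECK NEEDED"


def gate_verdict(artifacts: dict, quality: dict) -> str:
    """artifacts maps each required file to whether it exists;
    quality maps each quality check to its Check status."""
    if not all(artifacts.values()):
        return "FAIL"  # Case 2: a missing required artifact blocks the gate
    if any(status is Check.MANUAL for status in quality.values()):
        # Case 4: ask the user about these items before deciding anything
        raise ValueError("unanswered manual checks remain")
    if any(status is Check.FAIL for status in quality.values()):
        return "CONCERNS"  # assumption: quality gaps are treated as non-blocking
    return "PASS"
```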
 ---
 ## Protocol Compliance
 - [ ] Uses "May I write" before updating `production/stage.txt`
 - [ ] Presents the full checklist report before asking for write approval
 - [ ] Ends with a "Follow-Up Actions" section listing next steps per verdict
 - [ ] Never advances the stage without explicit user confirmation
 - [ ] Never creates `production/stage.txt` without asking, even if the file does not exist yet
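The write-approval rules above amount to a single guard around every write. This is a minimal sketch that assumes the prompt mechanism can be modeled as a callable `ask` returning the user's reply. `write_with_consent` is a hypothetical name and is not part of the skill.

```python
from pathlib import Path
from typing import Callable


def write_with_consent(path: Path, content: str, ask: Callable[[str], str]) -> bool:
    """Write content to path only after an explicit yes from the user.

    `ask` stands in for the skill's prompt and returns the user's reply.
    Creating a file that does not exist yet goes through the same prompt.
    """
    note = "" if path.exists() else " (the file does not exist yet and will be created)"
    reply = ask(f"May I write `{path}`?{note}")
    if reply.strip().lower() not in {"y", "yes"}:
        return False  # no explicit yes, so nothing is written
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(content, encoding="utf-8")
    return True
```

Treating any reply other than an explicit yes as a refusal is the conservative choice, because an ambiguous answer never advances the stage.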
 ---
 ## Coverage Notes
 - The Production → Polish and Polish → Release gates are not covered here
  because they require complex multi-artifact setups (sprint plans, playtest
  data, QA sign-off); these are deferred to dedicated follow-up specs.
 - The "CONCERNS" verdict path (minor gaps, not blocking) is not explicitly
  tested here; it falls between Case 1 and Case 2 and follows the same pattern.
 - The Vertical Slice validation block (Pre-Production → Production gate) is not
  covered because it requires a playable build context that cannot be expressed
  as a document fixture.
--- a/tests/skills/story-done.md
+++ b/tests/skills/story-done.md
@@ -0,0 +1,165 @@
 # Skill Test Spec: /story-done
 ## Skill Summary
 `/story-done` closes the loop between design and implementation. Run at the
 end of implementing a story, it reads the story file and verifies each
 acceptance criterion against the implementation. It checks for GDD and ADR
 deviations, prompts a code review, updates the story status to `Complete`,
 logs any tech debt, and surfaces the next ready story from the sprint. It
 produces a COMPLETE / COMPLETE WITH NOTES / BLOCKED verdict and writes to
 the story file and optionally to `docs/tech-debt-register.md`.
 ---
 ## Static Assertions (Structural)
 Verified automatically by `/skill-test static` — no fixture needed.
 - [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
 - [ ] Has ≥5 phase headings (complex skill warranting `context: fork` if applicable)
 - [ ] Contains verdict keywords: COMPLETE, COMPLETE WITH NOTES, BLOCKED
 - [ ] Contains "May I write" collaborative protocol language (writes to story file and tech-debt register)
 - [ ] Has a next-step handoff (surfaces next story from sprint)
 ---
 ## Test Cases
 ### Case 1: Happy Path — All acceptance criteria met, no deviations
 **Fixture:**
 - Story file at `production/epics/core/story-light-pickup.md` with:
  - 3 acceptance criteria, all implemented as described
  - `TR-ID: TR-light-001` referencing a GDD requirement
  - `ADR: docs/architecture/adr-003-inventory.md` (Accepted)
  - `Status: In Progress`
 - Implementation files listed in story exist in `src/`
 - GDD requirement text at TR-light-001 matches how the feature was implemented
 - ADR guidance was followed (no deviations)
 **Input:** `/story-done production/epics/core/story-light-pickup.md`
 **Expected behavior:**
 1. Skill reads the story file and extracts all key fields
 2. Skill reads the GDD requirement fresh from `tr-registry.yaml` (not from story's quoted text)
 3. Skill reads the referenced ADR to understand implementation constraints
 4. Skill evaluates each acceptance criterion (auto where possible, manual prompt where not)
 5. Skill checks for GDD requirement deviations
 6. Skill checks for ADR guideline deviations
 7. Skill prompts user: "Please provide the code review outcome for this story"
 8. Skill presents COMPLETE verdict
 9. Skill asks "May I update story Status to Complete and add Completion Notes?"
 10. If yes: skill updates the story file
 11. Skill surfaces the next `Ready for Dev` story from the sprint
 **Assertions:**
 - [ ] Skill reads `docs/architecture/tr-registry.yaml` for TR-ID requirement text (not just story)
 - [ ] Skill reads the referenced ADR file (not just the story reference)
 - [ ] Each acceptance criterion is listed with VERIFIED / DEFERRED / FAILED status
 - [ ] Skill prompts the user for code review outcome (does not skip this step)
 - [ ] Verdict is COMPLETE when all criteria are verified and no deviations exist
 - [ ] Skill asks "May I write" before updating the story file
 - [ ] Skill does NOT auto-update story status without user confirmation
 - [ ] After completion, skill surfaces the next ready story from `production/sprints/`
 ---
 ### Case 2: Partial Path — Acceptance criterion cannot be auto-verified
 **Fixture:**
 - Story file has an acceptance criterion: "Player sees correct animation on pickup"
 - No automated test for this criterion exists
 - Manual verification has not been performed
 - All other criteria are met
 **Input:** `/story-done production/epics/core/story-light-pickup.md`
 **Expected behavior:**
 1. Skill processes all acceptance criteria
 2. Reaches the animation criterion — cannot auto-verify
 3. Skill asks the user: "Acceptance criterion 'Player sees correct animation on
   pickup' cannot be auto-verified. Has this been manually tested?"
 4. If user says No: criterion is marked DEFERRED, verdict becomes COMPLETE WITH NOTES
 5. Skill records the deferred criterion in completion notes
 6. Skill asks "May I write the updated story with the deferred criterion noted?"
 **Assertions:**
 - [ ] Skill asks the user about unverifiable criteria rather than assuming they are VERIFIED
 - [ ] Deferred criteria result in COMPLETE WITH NOTES (not COMPLETE or BLOCKED)
 - [ ] The deferred criterion is explicitly named in the completion notes
 - [ ] Skill still asks "May I write" before updating the story file
 ---
 ### Case 3: Blocked Path — GDD deviation detected
 **Fixture:**
 - Story TR-ID points to requirement: "Player can carry max 3 light sources"
 - Implementation in `src/` uses a variable `MAX_CARRIED_LIGHTS = 5`
 - This is a deliberate deviation from the GDD
 **Input:** `/story-done production/epics/core/story-light-pickup.md`
 **Expected behavior:**
 1. Skill reads the GDD requirement text (max 3)
 2. Skill detects discrepancy between requirement and implementation value (5)
 3. Skill flags this as a GDD deviation and asks the user to classify it:
   - INTENTIONAL: document the deviation and reason
   - ERROR: implementation must be fixed before story can be marked Complete
   - OUT OF SCOPE: requirement changed and GDD needs updating
 4. If INTENTIONAL: skill records deviation in completion notes, verdict is COMPLETE WITH NOTES
 5. If ERROR: verdict is BLOCKED until implementation is corrected
 **Assertions:**
 - [ ] Skill detects the mismatch between GDD requirement and implementation value
 - [ ] Skill asks the user to classify the deviation (not auto-assumes either way)
 - [ ] INTENTIONAL deviation → COMPLETE WITH NOTES (not BLOCKED)
 - [ ] ERROR deviation → BLOCKED verdict until fixed
 - [ ] Detected deviations are recorded in completion notes or tech debt register
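Cases 1 to 3 together define how criterion statuses and deviation classifications combine into a verdict. The sketch below shows that mapping using a hypothetical function. Two branches are assumptions that the spec does not state: a FAILED criterion yields BLOCKED, and an OUT OF SCOPE deviation yields COMPLETE WITH NOTES.

```python
def story_done_verdict(criteria: list, deviations: list) -> str:
    """criteria: VERIFIED / DEFERRED / FAILED for each acceptance criterion.
    deviations: INTENTIONAL / ERROR / OUT OF SCOPE, as classified by the user."""
    if "FAILED" in criteria or "ERROR" in deviations:
        return "BLOCKED"  # must be fixed before the story can be marked Complete
    if "DEFERRED" in criteria or deviations:
        return "COMPLETE WITH NOTES"  # deferred items or documented deviations
    return "COMPLETE"
```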
 ---
 ### Case 4: Edge Case — No argument, auto-detect current story
 **Fixture:**
 - `production/session-state/active.md` contains a reference to
  `production/epics/core/story-oxygen-drain.md` as the active story
 - That story file exists with `Status: In Progress`
 **Input:** `/story-done` (no argument)
 **Expected behavior:**
 1. Skill reads `production/session-state/active.md`
 2. Skill finds the active story reference
 3. Skill reads that story file and proceeds normally
 4. Output confirms which story was auto-detected
 **Assertions:**
 - [ ] Skill reads `production/session-state/active.md` when no argument is given
 - [ ] Skill identifies and confirms the auto-detected story before proceeding
 - [ ] If no story is found in session state, skill asks the user to provide a path
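One way to implement the lookup in Case 4 is to scan the session-state file for the first story path. This sketch assumes that `active.md` is free-form text mentioning the path literally and that stories live under `production/epics/`. `detect_active_story` is a hypothetical helper. A return value of `None` corresponds to the third assertion, where the skill should ask the user for a path.

```python
import re
from pathlib import Path
from typing import Optional

# Assumption: stories live under production/epics/ and end in .md
STORY_PATH = re.compile(r"production/epics/[\w./-]+\.md")


def detect_active_story(session_state: Path) -> Optional[str]:
    """Return the first story path mentioned in the session-state file.

    None means no story was found, so the skill should ask the user for a path.
    """
    if not session_state.exists():
        return None
    match = STORY_PATH.search(session_state.read_text(encoding="utf-8"))
    return match.group(0) if match else None
```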
 ---
 ## Protocol Compliance
 - [ ] Uses "May I write" before updating the story file
 - [ ] Uses "May I write" before adding entries to `docs/tech-debt-register.md`
 - [ ] Presents complete findings (criteria check, deviation check) before asking approval
 - [ ] Ends by surfacing the next ready story from the sprint plan
 - [ ] Does not mark a story Complete if any criterion is FAILED or any deviation is classified ERROR
 - [ ] Does not skip the code review prompt
 ---
 ## Coverage Notes
 - The full 8-phase flow of the skill is exercised across Cases 1-3; not all
  edge cases within each phase are covered.
 - Tech debt logging (deferred items written to `docs/tech-debt-register.md`)
  is mentioned in Case 2 but not the primary assertion focus; dedicated
  coverage deferred.
 - The `sprint-status.yaml` update (Phase 7 in the skill) is implied by Case 1
  but not the primary assertion; assumed to follow the same "May I write" pattern.
 - Stories with multiple TR-IDs or multiple ADRs are not explicitly tested.
--- a/tests/skills/story-readiness.md
+++ b/tests/skills/story-readiness.md
@@ -0,0 +1,153 @@
 # Skill Test Spec: /story-readiness
 ## Skill Summary
 `/story-readiness` validates that a story file is ready for a developer to
 pick up and implement. It checks four dimensions: Design (embedded GDD
 requirements), Architecture (ADR references and status), Scope (clear
 boundaries and DoD), and Definition of Done (testable criteria). It produces
 a READY / NEEDS WORK / BLOCKED verdict. It is a read-only skill and runs
 before any developer picks up a story.
 ---
 ## Static Assertions (Structural)
 Verified automatically by `/skill-test static` — no fixture needed.
 - [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
 - [ ] Has ≥2 phase headings or numbered check sections
 - [ ] Contains verdict keywords: READY, NEEDS WORK, BLOCKED
 - [ ] Does NOT require "May I write" language (read-only skill)
 - [ ] Has a next-step handoff (what to do after verdict)
 ---
 ## Test Cases
 ### Case 1: Happy Path — Fully ready story
 **Fixture:**
 - Story file exists at `production/epics/core/story-light-pickup.md`
 - Story contains:
  - `TR-ID: TR-light-001` (GDD requirement reference)
  - `ADR: docs/architecture/adr-003-inventory.md`
  - Referenced ADR exists and has status `Accepted`
  - Referenced TR-ID exists in `docs/architecture/tr-registry.yaml`
  - Story has `## Acceptance Criteria` with ≥3 testable items
  - Story has `## Definition of Done` section
  - Story has `Status: Ready for Dev`
  - Manifest version in story header matches current `docs/architecture/control-manifest.md`
 **Input:** `/story-readiness production/epics/core/story-light-pickup.md`
 **Expected behavior:**
 1. Skill reads the story file
 2. Skill reads the referenced ADR — verifies status is `Accepted`
 3. Skill reads `docs/architecture/tr-registry.yaml` — verifies TR-ID exists
 4. Skill reads `docs/architecture/control-manifest.md` — verifies manifest version matches
 5. Skill evaluates all 4 dimensions (Design, Architecture, Scope, DoD)
 6. Skill outputs READY verdict with all checks passing
 **Assertions:**
 - [ ] Skill reads the referenced ADR file (not just the story)
 - [ ] Skill verifies ADR status is `Accepted` (not `Proposed`)
 - [ ] Skill reads `tr-registry.yaml` to verify TR-ID exists
 - [ ] Output includes check results for all 4 dimensions
 - [ ] Verdict is READY when all checks pass
 - [ ] Skill does not write any files
 ---
 ### Case 2: Blocked Path — Referenced ADR is Proposed (not Accepted)
 **Fixture:**
 - Story file exists with `ADR: docs/architecture/adr-005-light-system.md`
 - `adr-005-light-system.md` exists but has `Status: Proposed`
 - All other story content is otherwise complete
 **Input:** `/story-readiness production/epics/core/story-light-system.md`
 **Expected behavior:**
 1. Skill reads the story
 2. Skill reads `adr-005-light-system.md` — finds `Status: Proposed`
 3. Skill flags this as a BLOCKING issue (cannot implement against unaccepted ADR)
 4. Skill outputs BLOCKED verdict
 5. Skill recommends: accept or reject the ADR before picking up the story
 **Assertions:**
 - [ ] Verdict is BLOCKED (not NEEDS WORK or READY) when ADR is Proposed
 - [ ] Output explicitly names the Proposed ADR as the blocker
 - [ ] Output recommends resolving ADR status before proceeding
 - [ ] Skill does not output READY, even if all other checks pass
 ---
 ### Case 3: Needs Work — Missing Acceptance Criteria
 **Fixture:**
 - Story file exists but has no `## Acceptance Criteria` section
 - ADR reference exists and is `Accepted`
 - TR-ID exists in registry
 - Manifest version matches
 **Input:** `/story-readiness production/epics/core/story-oxygen-drain.md`
 **Expected behavior:**
 1. Skill reads the story
 2. Skill finds no Acceptance Criteria section
 3. Skill flags this as a NEEDS WORK issue (story is incomplete, not blocked)
 4. Skill outputs NEEDS WORK verdict
 5. Skill names the missing section and suggests adding measurable criteria
 **Assertions:**
 - [ ] Verdict is NEEDS WORK (not BLOCKED or READY) when Acceptance Criteria section is absent
 - [ ] Output identifies the missing Acceptance Criteria section specifically
 - [ ] Output suggests adding testable/measurable criteria
 - [ ] Skill distinguishes NEEDS WORK (fixable without external dependencies) from BLOCKED (requires outside action)
 ---
 ### Case 4: Edge Case — Stale manifest version
 **Fixture:**
 - Story file has `Manifest Version: 2026-01-15` in its header
 - `docs/architecture/control-manifest.md` has `Manifest Version: 2026-03-10`
 - Versions do not match (story was created before manifest was updated)
 **Input:** `/story-readiness production/epics/core/story-mirror-rotation.md`
 **Expected behavior:**
 1. Skill reads the story and extracts manifest version `2026-01-15`
 2. Skill reads control manifest header and extracts current version `2026-03-10`
 3. Skill detects version mismatch
 4. Skill flags this as an ADVISORY issue (not blocking, but worth noting)
 5. Verdict is NEEDS WORK with manifest staleness noted
 **Assertions:**
 - [ ] Skill reads `docs/architecture/control-manifest.md` to get current version
 - [ ] Skill compares story's embedded manifest version against current manifest version
 - [ ] Stale manifest version results in NEEDS WORK (not BLOCKED, not READY)
 - [ ] Output explains that the story's embedded guidance may be outdated
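The comparison in Case 4 reduces to extracting the same header field from two files. This minimal sketch assumes both files carry a line of the form `Manifest Version: YYYY-MM-DD`, optionally bolded, and treats a missing version as stale. Both function names are hypothetical.

```python
import re
from typing import Optional

# Matches "Manifest Version: 2026-03-10" and the bolded "**Manifest Version:** 2026-03-10".
VERSION = re.compile(r"Manifest Version:\**\s*(\d{4}-\d{2}-\d{2})")


def manifest_version(text: str) -> Optional[str]:
    match = VERSION.search(text)
    return match.group(1) if match else None


def manifest_is_stale(story_text: str, manifest_text: str) -> bool:
    """True when the story's embedded version differs from the current manifest."""
    embedded = manifest_version(story_text)
    current = manifest_version(manifest_text)
    if embedded is None or current is None:
        return True  # assumption: an unreadable version is treated as stale
    return embedded != current
```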
 ---
 ## Protocol Compliance
 - [ ] Does NOT use Write or Edit tools (read-only skill)
 - [ ] Presents complete check results before verdict
 - [ ] Does not ask for approval (no file writes)
 - [ ] Ends with recommended next step (fix issues or proceed to implementation)
 - [ ] Distinguishes three verdict levels clearly (READY vs NEEDS WORK vs BLOCKED)
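The three verdict levels can be expressed as a precedence rule: blockers that need outside action come first, followed by gaps the story author can fix. This sketch illustrates the spec's cases and is not the skill's code. Two choices are assumptions: any ADR status other than `Accepted` counts as blocking, and a story with no ADR references is not blocked. Requiring every referenced ADR to be Accepted follows the Coverage Notes.

```python
def readiness_verdict(adr_statuses: list, tr_id_registered: bool,
                      has_acceptance_criteria: bool, manifest_current: bool) -> str:
    """Expected /story-readiness verdict for the situations in Cases 1-4."""
    # Needs a decision outside the story (an ADR must be accepted): BLOCKED, Case 2.
    # Every referenced ADR must be Accepted, per the Coverage Notes.
    if any(status != "Accepted" for status in adr_statuses):
        return "BLOCKED"
    # Fixable by editing the story alone: NEEDS WORK, Cases 3 and 4.
    if not (tr_id_registered and has_acceptance_criteria and manifest_current):
        return "NEEDS WORK"
    return "READY"
```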
 ---
 ## Coverage Notes
 - The case where a TR-ID is missing from the registry entirely is not explicitly
  tested here; it follows the same NEEDS WORK pattern as Case 3.
 - The "no argument" path (skill auto-detecting the current story) is not
  tested because it depends on `production/session-state/active.md` content,
  which is hard to fixture reliably.
 - Stories with multiple ADR references are not tested; behavior is assumed to
  be additive (all ADRs must be Accepted for READY verdict).