Add /skill-test suite: linter, behavioral specs, and coverage catalog for 52 skills

- New skill: /skill-test (static | spec | audit modes)
  - static: 7-check structural linter per skill file
  - spec: Claude-evaluated behavioral assertions against test specs
  - audit: coverage report across all 52 skills with priority gaps
- New hook: validate-skill-change.sh — advisory reminder to lint after skill edits
- New template: skill-test-spec.md — standard structure for authoring test specs
- New: tests/skills/catalog.yaml — machine-readable coverage index (52 skills)
- New: tests/skills/_fixtures/ — shared fixtures (complete concept, incomplete GDD)
- New: 4 seed test specs for critical gate skills (gate-check, design-review, story-readiness, story-done) — 4 cases each
- Modified: settings.json — validate-skill-change.sh added to PostToolUse hook

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-27 13:01:50 +00:00 · 2026-03-13 17:05:08 +11:00
parent cdb1aa83b7
commit af2b864796
11 changed files with 1587 additions and 0 deletions
--- a/tests/skills/_fixtures/incomplete-gdd.md
+++ b/tests/skills/_fixtures/incomplete-gdd.md
@@ -0,0 +1,51 @@
+# GDD: Light Manipulation System
+
+## Overview
+
+The light manipulation system allows players to interact with bioluminescent
+organisms and ancient light conduits to redirect beams of light. Light beams
+illuminate dark areas, power ancient mechanisms, and reveal hidden surfaces.
+
+## Player Fantasy
+
+The player should feel like a puzzle archaeologist — discovering the logic of
+an alien but internally consistent technology. The "aha" moment when a complex
+light path clicks into place should feel earned and satisfying.
+
+## Detailed Rules
+
+- Players can pick up portable light sources (max 3 carried at once)
+- Stationary conduits redirect beams at fixed angles (45°/90°/135°/180°)
+- Light beams are blocked by solid terrain and most objects
+- Living bioluminescent organisms pulse light on a 3-second cycle
+- Ancient mirrors rotate freely and redirect any light beam that touches them
+- A beam must reach a receptor to activate a mechanism
+
+## Formulas
+
+[SECTION MISSING — not yet authored]
+
+## Edge Cases
+
+[SECTION MISSING — not yet authored]
+
+## Dependencies
+
+- **Oxygen System**: Light sources consume no oxygen but picking them up takes
+  time (opportunity cost with oxygen drain)
+- **Cave Navigation**: Illuminated paths reveal branching routes not visible
+  in darkness
+- Player Inventory System (not yet designed)
+
+## Tuning Knobs
+
+[SECTION MISSING — not yet authored]
+
+## Acceptance Criteria
+
+[SECTION MISSING — not yet authored]
+
+---
+
+*Status: Draft — 4/8 required sections populated*
+*Last updated: 2026-03-13*
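
Reviewer sketch: the fixture above marks unauthored sections with a `[SECTION MISSING ...]` placeholder and reports "4/8 required sections populated". One way such a count could be derived is shown below. This is a minimal sketch, assuming sections are `## ` headings and that the placeholder convention of this fixture is used; the names `split_sections` and `completeness` are hypothetical and are not part of `/skill-test`.

```python
import re

# Section names taken from the 8-section design standard listed in the
# design-review spec of this commit.
REQUIRED_SECTIONS = [
    "Overview", "Player Fantasy", "Detailed Rules", "Formulas",
    "Edge Cases", "Dependencies", "Tuning Knobs", "Acceptance Criteria",
]

def split_sections(markdown: str) -> dict[str, str]:
    """Map each level-2 ('## ') heading to the body text that follows it."""
    sections: dict[str, list[str]] = {}
    current = None
    for line in markdown.splitlines():
        match = re.match(r"^##\s+(.+?)\s*$", line)
        if match:
            current = match.group(1)
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {name: "\n".join(body).strip() for name, body in sections.items()}

def completeness(markdown: str) -> tuple[list[str], list[str]]:
    """Return (present, missing) required sections.

    Assumption: a section counts as missing when its heading is absent,
    its body is empty, or its body starts with the fixture's placeholder.
    """
    sections = split_sections(markdown)
    present, missing = [], []
    for name in REQUIRED_SECTIONS:
        body = sections.get(name, "")
        if body and not body.startswith("[SECTION MISSING"):
            present.append(name)
        else:
            missing.append(name)
    return present, missing
```

Calling `completeness()` on the text of `incomplete-gdd.md` and formatting `len(present)` over `len(REQUIRED_SECTIONS)` would give the kind of "X/8" line the design-review spec asserts on.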
--- a/tests/skills/_fixtures/minimal-game-concept.md
+++ b/tests/skills/_fixtures/minimal-game-concept.md
@@ -0,0 +1,62 @@
+# Game Concept: Echoes of the Deep
+
+## Overview
+
+Echoes of the Deep is a single-player atmospheric puzzle-platformer set in
+a bioluminescent underwater cave network. Players control a deep-sea diver
+exploring ancient ruins while managing oxygen supplies and manipulating light
+sources to reveal hidden paths and solve environmental puzzles.
+
+## Player Fantasy
+
+The player should feel like a lone explorer uncovering a lost civilization,
+experiencing wonder at beautiful environments, and the satisfying "aha" moment
+when a clever puzzle clicks into place. The oxygen mechanic creates gentle
+pressure without punishing failure harshly.
+
+## Core Loop
+
+1. **Explore** — navigate branching cave sections using light and movement
+2. **Discover** — find oxygen caches, light sources, and ancient mechanisms
+3. **Solve** — manipulate light and environment to unlock new areas
+4. **Progress** — unlock deeper cave sections with escalating complexity
+
+## Game Pillars
+
+1. **Wonder** — every area should contain something visually or mechanically surprising
+2. **Accessibility** — the game should be completable without frustration; oxygen
+   manages pacing, not punishment
+3. **Environmental Storytelling** — the ruins tell a story without text exposition
+
+## Target Audience
+
+Casual-to-midcore players who enjoy relaxed exploration games (Subnautica,
+Journey, ABZÛ) and puzzle games that reward observation over reflexes.
+Target age: 16+. Target sessions: 30–90 minutes.
+
+## Unique Selling Points
+
+- Bioluminescent light manipulation as the core puzzle mechanic
+- No enemies — tension comes from environment and resource management
+- Procedurally decorated (handcrafted levels, procedural detail pass)
+
+## Technical Scope
+
+- **Engine**: Godot 4.6
+- **Platform**: PC (Steam), with console ports post-launch
+- **Team size**: Solo developer
+- **Target completion**: 12-month development cycle
+- **Scope**: 4–6 hours main story, 8–12 hours completionist
+
+## Art Direction
+
+Darkly atmospheric with vibrant bioluminescence providing the primary color
+palette. Deep blues, purples, and blacks punctuated by greens, teals, and
+ambers from living organisms and ancient technology.
+
+## Fun Hypothesis
+
+Players will feel rewarded by the combination of visual beauty and the
+satisfying moment of discovering how light manipulation solves each puzzle.
+The oxygen system will create just enough pressure to make exploration feel
+meaningful without making death feel punishing.
--- a/tests/skills/catalog.yaml
+++ b/tests/skills/catalog.yaml
@@ -0,0 +1,438 @@
+version: 1
+last_updated: ""
+skills:
+  # Critical — gate skills that control phase transitions
+  - name: gate-check
+    spec: tests/skills/gate-check.md
+    last_static: ""
+    last_static_result: ""
+    last_spec: ""
+    last_spec_result: ""
+    priority: critical
+
+  - name: design-review
+    spec: tests/skills/design-review.md
+    last_static: ""
+    last_static_result: ""
+    last_spec: ""
+    last_spec_result: ""
+    priority: critical
+
+  - name: story-readiness
+    spec: tests/skills/story-readiness.md
+    last_static: ""
+    last_static_result: ""
+    last_spec: ""
+    last_spec_result: ""
+    priority: critical
+
+  - name: story-done
+    spec: tests/skills/story-done.md
+    last_static: ""
+    last_static_result: ""
+    last_spec: ""
+    last_spec_result: ""
+    priority: critical
+
+  - name: review-all-gdds
+    spec: ""
+    last_static: ""
+    last_static_result: ""
+    last_spec: ""
+    last_spec_result: ""
+    priority: critical
+
+  - name: architecture-review
+    spec: ""
+    last_static: ""
+    last_static_result: ""
+    last_spec: ""
+    last_spec_result: ""
+    priority: critical
+
+  # High — pipeline-critical skills
+  - name: create-epics-stories
+    spec: ""
+    last_static: ""
+    last_static_result: ""
+    last_spec: ""
+    last_spec_result: ""
+    priority: high
+
+  - name: create-control-manifest
+    spec: ""
+    last_static: ""
+    last_static_result: ""
+    last_spec: ""
+    last_spec_result: ""
+    priority: high
+
+  - name: propagate-design-change
+    spec: ""
+    last_static: ""
+    last_static_result: ""
+    last_spec: ""
+    last_spec_result: ""
+    priority: high
+
+  - name: architecture-decision
+    spec: ""
+    last_static: ""
+    last_static_result: ""
+    last_spec: ""
+    last_spec_result: ""
+    priority: high
+
+  - name: map-systems
+    spec: ""
+    last_static: ""
+    last_static_result: ""
+    last_spec: ""
+    last_spec_result: ""
+    priority: high
+
+  - name: design-system
+    spec: ""
+    last_static: ""
+    last_static_result: ""
+    last_spec: ""
+    last_spec_result: ""
+    priority: high
+
+  # Medium — team and sprint management skills
+  - name: sprint-plan
+    spec: ""
+    last_static: ""
+    last_static_result: ""
+    last_spec: ""
+    last_spec_result: ""
+    priority: medium
+
+  - name: sprint-status
+    spec: ""
+    last_static: ""
+    last_static_result: ""
+    last_spec: ""
+    last_spec_result: ""
+    priority: medium
+
+  - name: team-ui
+    spec: ""
+    last_static: ""
+    last_static_result: ""
+    last_spec: ""
+    last_spec_result: ""
+    priority: medium
+
+  - name: team-combat
+    spec: ""
+    last_static: ""
+    last_static_result: ""
+    last_spec: ""
+    last_spec_result: ""
+    priority: medium
+
+  - name: team-narrative
+    spec: ""
+    last_static: ""
+    last_static_result: ""
+    last_spec: ""
+    last_spec_result: ""
+    priority: medium
+
+  - name: team-audio
+    spec: ""
+    last_static: ""
+    last_static_result: ""
+    last_spec: ""
+    last_spec_result: ""
+    priority: medium
+
+  - name: team-level
+    spec: ""
+    last_static: ""
+    last_static_result: ""
+    last_spec: ""
+    last_spec_result: ""
+    priority: medium
+
+  - name: team-polish
+    spec: ""
+    last_static: ""
+    last_static_result: ""
+    last_spec: ""
+    last_spec_result: ""
+    priority: medium
+
+  - name: team-release
+    spec: ""
+    last_static: ""
+    last_static_result: ""
+    last_spec: ""
+    last_spec_result: ""
+    priority: medium
+
+  - name: team-live-ops
+    spec: ""
+    last_static: ""
+    last_static_result: ""
+    last_spec: ""
+    last_spec_result: ""
+    priority: medium
+
+  - name: skill-test
+    spec: ""
+    last_static: ""
+    last_static_result: ""
+    last_spec: ""
+    last_spec_result: ""
+    priority: medium
+
+  # Low — analysis, reporting, utility skills
+  - name: start
+    spec: ""
+    last_static: ""
+    last_static_result: ""
+    last_spec: ""
+    last_spec_result: ""
+    priority: low
+
+  - name: help
+    spec: ""
+    last_static: ""
+    last_static_result: ""
+    last_spec: ""
+    last_spec_result: ""
+    priority: low
+
+  - name: brainstorm
+    spec: ""
+    last_static: ""
+    last_static_result: ""
+    last_spec: ""
+    last_spec_result: ""
+    priority: low
+
+  - name: project-stage-detect
+    spec: ""
+    last_static: ""
+    last_static_result: ""
+    last_spec: ""
+    last_spec_result: ""
+    priority: low
+
+  - name: setup-engine
+    spec: ""
+    last_static: ""
+    last_static_result: ""
+    last_spec: ""
+    last_spec_result: ""
+    priority: low
+
+  - name: quick-design
+    spec: ""
+    last_static: ""
+    last_static_result: ""
+    last_spec: ""
+    last_spec_result: ""
+    priority: low
+
+  - name: ux-design
+    spec: ""
+    last_static: ""
+    last_static_result: ""
+    last_spec: ""
+    last_spec_result: ""
+    priority: low
+
+  - name: ux-review
+    spec: ""
+    last_static: ""
+    last_static_result: ""
+    last_spec: ""
+    last_spec_result: ""
+    priority: low
+
+  - name: code-review
+    spec: ""
+    last_static: ""
+    last_static_result: ""
+    last_spec: ""
+    last_spec_result: ""
+    priority: low
+
+  - name: balance-check
+    spec: ""
+    last_static: ""
+    last_static_result: ""
+    last_spec: ""
+    last_spec_result: ""
+    priority: low
+
+  - name: asset-audit
+    spec: ""
+    last_static: ""
+    last_static_result: ""
+    last_spec: ""
+    last_spec_result: ""
+    priority: low
+
+  - name: reverse-document
+    spec: ""
+    last_static: ""
+    last_static_result: ""
+    last_spec: ""
+    last_spec_result: ""
+    priority: low
+
+  - name: create-architecture
+    spec: ""
+    last_static: ""
+    last_static_result: ""
+    last_spec: ""
+    last_spec_result: ""
+    priority: low
+
+  - name: content-audit
+    spec: ""
+    last_static: ""
+    last_static_result: ""
+    last_spec: ""
+    last_spec_result: ""
+    priority: low
+
+  - name: bug-report
+    spec: ""
+    last_static: ""
+    last_static_result: ""
+    last_spec: ""
+    last_spec_result: ""
+    priority: low
+
+  - name: hotfix
+    spec: ""
+    last_static: ""
+    last_static_result: ""
+    last_spec: ""
+    last_spec_result: ""
+    priority: low
+
+  - name: prototype
+    spec: ""
+    last_static: ""
+    last_static_result: ""
+    last_spec: ""
+    last_spec_result: ""
+    priority: low
+
+  - name: playtest-report
+    spec: ""
+    last_static: ""
+    last_static_result: ""
+    last_spec: ""
+    last_spec_result: ""
+    priority: low
+
+  - name: perf-profile
+    spec: ""
+    last_static: ""
+    last_static_result: ""
+    last_spec: ""
+    last_spec_result: ""
+    priority: low
+
+  - name: tech-debt
+    spec: ""
+    last_static: ""
+    last_static_result: ""
+    last_spec: ""
+    last_spec_result: ""
+    priority: low
+
+  - name: scope-check
+    spec: ""
+    last_static: ""
+    last_static_result: ""
+    last_spec: ""
+    last_spec_result: ""
+    priority: low
+
+  - name: estimate
+    spec: ""
+    last_static: ""
+    last_static_result: ""
+    last_spec: ""
+    last_spec_result: ""
+    priority: low
+
+  - name: milestone-review
+    spec: ""
+    last_static: ""
+    last_static_result: ""
+    last_spec: ""
+    last_spec_result: ""
+    priority: low
+
+  - name: retrospective
+    spec: ""
+    last_static: ""
+    last_static_result: ""
+    last_spec: ""
+    last_spec_result: ""
+    priority: low
+
+  - name: changelog
+    spec: ""
+    last_static: ""
+    last_static_result: ""
+    last_spec: ""
+    last_spec_result: ""
+    priority: low
+
+  - name: patch-notes
+    spec: ""
+    last_static: ""
+    last_static_result: ""
+    last_spec: ""
+    last_spec_result: ""
+    priority: low
+
+  - name: onboard
+    spec: ""
+    last_static: ""
+    last_static_result: ""
+    last_spec: ""
+    last_spec_result: ""
+    priority: low
+
+  - name: localize
+    spec: ""
+    last_static: ""
+    last_static_result: ""
+    last_spec: ""
+    last_spec_result: ""
+    priority: low
+
+  - name: launch-checklist
+    spec: ""
+    last_static: ""
+    last_static_result: ""
+    last_spec: ""
+    last_spec_result: ""
+    priority: low
+
+  - name: release-checklist
+    spec: ""
+    last_static: ""
+    last_static_result: ""
+    last_spec: ""
+    last_spec_result: ""
+    priority: low
+
+  - name: adopt
+    spec: ""
+    last_static: ""
+    last_static_result: ""
+    last_spec: ""
+    last_spec_result: ""
+    priority: low
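
Reviewer sketch: the catalog above is described as a machine-readable coverage index, so an audit pass can be reduced to "which entries have an empty `spec`, ordered by `priority`". The following is a minimal sketch, assuming the file has already been parsed into a list of dictionaries by some YAML loader (the loader itself is not shown); `coverage_report` and `PRIORITY_ORDER` are hypothetical names, not the actual `/skill-test audit` implementation.

```python
# Lower number means the gap is reported first.
PRIORITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}

def coverage_report(skills: list[dict]) -> dict:
    """Summarize spec coverage for entries shaped like catalog.yaml items."""
    covered = [s["name"] for s in skills if s.get("spec")]
    gaps = sorted(
        (s for s in skills if not s.get("spec")),
        # Unknown or absent priorities sort last; sorted() is stable, so
        # entries keep their catalog order within one priority level.
        key=lambda s: PRIORITY_ORDER.get(s.get("priority", "low"), len(PRIORITY_ORDER)),
    )
    return {
        "total": len(skills),
        "covered": covered,
        "gaps": [(s["name"], s.get("priority", "low")) for s in gaps],
    }

# A few entries copied from the catalog above, in already-parsed form.
sample = [
    {"name": "gate-check", "spec": "tests/skills/gate-check.md", "priority": "critical"},
    {"name": "start", "spec": "", "priority": "low"},
    {"name": "review-all-gdds", "spec": "", "priority": "critical"},
    {"name": "map-systems", "spec": "", "priority": "high"},
]
report = coverage_report(sample)
```

Sorting by priority first keeps uncovered critical skills such as `review-all-gdds` at the top of the gap list.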
--- a/tests/skills/design-review.md
+++ b/tests/skills/design-review.md
@@ -0,0 +1,144 @@
+# Skill Test Spec: /design-review
+
+## Skill Summary
+
+`/design-review` reads a game design document (GDD) and evaluates it against
+the project's 8-section design standard (Overview, Player Fantasy, Detailed
+Rules, Formulas, Edge Cases, Dependencies, Tuning Knobs, Acceptance Criteria).
+It checks for internal consistency, implementability, and cross-system
+conflicts. It produces a verdict of APPROVED, NEEDS REVISION, or MAJOR
+REVISION NEEDED. It is a read-only skill (no file writes) and runs as a
+`context: fork` subagent.
+
+---
+
+## Static Assertions (Structural)
+
+Verified automatically by `/skill-test static` — no fixture needed.
+
+- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
+- [ ] Has ≥2 phase headings or numbered steps
+- [ ] Contains verdict keywords: APPROVED, NEEDS REVISION, MAJOR REVISION NEEDED
+- [ ] Does NOT require "May I write" language (read-only skill — `allowed-tools` excludes Write/Edit)
+- [ ] Output format is documented (review template shown in skill body)
+
+---
+
+## Test Cases
+
+### Case 1: Happy Path — Complete GDD, all 8 sections present
+
+**Fixture:**
+- `design/gdd/light-manipulation.md` exists (use `tests/skills/_fixtures/minimal-game-concept.md`
+  as a stand-in — represents a complete document with all required content)
+- All 8 required sections are populated with substantive content
+- Formulas section contains at least one formula with defined variables
+- Acceptance Criteria section contains at least 3 testable criteria
+
+**Input:** `/design-review design/gdd/light-manipulation.md`
+
+**Expected behavior:**
+1. Skill reads the target document in full
+2. Skill reads CLAUDE.md for project context and standards
+3. Skill evaluates all 8 required sections (present/absent check)
+4. Skill checks internal consistency (formulas match described behavior)
+5. Skill checks implementability (rules are precise enough to code)
+6. Skill outputs structured review with section-by-section status
+7. Skill outputs APPROVED verdict
+
+**Assertions:**
+- [ ] Skill reads the target file before producing any output
+- [ ] Output includes a "Completeness" section showing X/8 sections present
+- [ ] Output includes an "Internal Consistency" section
+- [ ] Output includes an "Implementability" section
+- [ ] Output ends with a verdict line: APPROVED / NEEDS REVISION / MAJOR REVISION NEEDED
+- [ ] APPROVED verdict is given when all 8 sections are present and consistent
+
+---
+
+### Case 2: Failure Path — Incomplete GDD (4/8 sections)
+
+**Fixture:**
+- `design/gdd/light-manipulation.md` exists using content from
+  `tests/skills/_fixtures/incomplete-gdd.md` (4 of 8 sections populated;
+  Formulas, Edge Cases, Tuning Knobs, Acceptance Criteria are missing)
+
+**Input:** `/design-review design/gdd/light-manipulation.md`
+
+**Expected behavior:**
+1. Skill reads the document
+2. Skill identifies 4 missing sections
+3. Skill outputs "Completeness: 4/8 sections present"
+4. Skill lists specifically which 4 sections are missing
+5. Skill outputs MAJOR REVISION NEEDED verdict (not APPROVED or NEEDS REVISION)
+
+**Assertions:**
+- [ ] Output shows "4/8" in the completeness section (not a higher number)
+- [ ] Output explicitly names each missing section (Formulas, Edge Cases, Tuning Knobs, Acceptance Criteria)
+- [ ] Verdict is MAJOR REVISION NEEDED (not APPROVED or NEEDS REVISION) when ≥3 sections are missing
+- [ ] Output does not suggest the document is implementation-ready
+- [ ] Skill does not write any files (read-only enforcement)
+
+---
+
+### Case 3: Partial Path — 7/8 sections, minor inconsistency
+
+**Fixture:**
+- GDD has all sections except Formulas
+- The described behavior mentions numeric values but no formulas are defined
+- Acceptance Criteria exist but are vague ("feels good" rather than measurable)
+
+**Input:** `/design-review design/gdd/[document].md`
+
+**Expected behavior:**
+1. Skill identifies missing Formulas section
+2. Skill flags vague acceptance criteria as an implementability issue
+3. Skill outputs NEEDS REVISION verdict (not APPROVED, not MAJOR REVISION NEEDED)
+4. Skill provides specific remediation notes for each issue
+
+**Assertions:**
+- [ ] Verdict is NEEDS REVISION (not APPROVED, not MAJOR REVISION NEEDED) for 7/8 with issues
+- [ ] Output identifies the missing Formulas section specifically
+- [ ] Output flags the vague acceptance criteria as an implementability gap
+- [ ] Each flagged issue has a specific, actionable remediation note
+
+---
+
+### Case 4: Edge Case — File not found
+
+**Fixture:**
+- The path provided does not exist in the project
+
+**Input:** `/design-review design/gdd/nonexistent.md`
+
+**Expected behavior:**
+1. Skill attempts to read the file
+2. File not found
+3. Skill outputs an error message naming the missing file
+4. Skill suggests checking the path or listing files in `design/gdd/`
+5. Skill does NOT produce a verdict
+
+**Assertions:**
+- [ ] Skill outputs a clear error when the file is not found
+- [ ] Skill does NOT output APPROVED, NEEDS REVISION, or MAJOR REVISION NEEDED when file is missing
+- [ ] Skill suggests a corrective action (check path, list available GDDs)
+
+---
+
+## Protocol Compliance
+
+- [ ] Does NOT use Write or Edit tools (read-only skill)
+- [ ] Presents complete findings before any verdict
+- [ ] Does not ask for approval before producing output (no writes to approve)
+- [ ] Ends with recommended next step (e.g., fix issues and re-run, or proceed to `/map-systems`)
+
+---
+
+## Coverage Notes
+
+- Cross-system consistency checking (Phase 3 in the skill's own phase list) is
+  not directly tested here because it requires multiple GDD files to compare;
+  this is covered by the `/review-all-gdds` spec instead.
+- The skill's `context: fork` behavior (running as a subagent) is not tested
+  at the spec level — this is a runtime behavior verified manually.
+- Performance and edge cases involving very large GDD files are not in scope.
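
Reviewer sketch: Cases 1 to 3 of the spec above imply a threshold mapping from review findings to a verdict. It can be written down as below, as a minimal sketch only. The spec states the outcome for 0 missing sections, for 1 missing section with issues, and for 3 or more missing sections; treating exactly 2 missing sections as NEEDS REVISION is an assumption, and `review_verdict` is a hypothetical helper, not code from the skill.

```python
def review_verdict(missing_sections: list[str], other_issues: int = 0) -> str:
    """Map completeness and consistency findings to a design-review verdict."""
    if len(missing_sections) >= 3:
        # Case 2: three or more missing sections is a major revision.
        return "MAJOR REVISION NEEDED"
    if missing_sections or other_issues > 0:
        # Case 3: a small gap or a flagged issue blocks approval.
        # Assumption: two missing sections also land here.
        return "NEEDS REVISION"
    # Case 1: all sections present and no issues flagged.
    return "APPROVED"
```

Case 4 (file not found) is deliberately outside this function: the spec requires an error message and no verdict at all in that situation.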
--- a/tests/skills/gate-check.md
+++ b/tests/skills/gate-check.md
@@ -0,0 +1,144 @@
+# Skill Test Spec: /gate-check
+
+## Skill Summary
+
+`/gate-check` validates whether the project is ready to advance to the next
+development phase. It checks for required artifacts, runs quality checks, asks
+the user about unverifiable items, and produces a PASS/CONCERNS/FAIL verdict.
+On PASS with user confirmation, it writes the new stage name to
+`production/stage.txt`. It governs all 6 phase transitions and is the most
+critical gate-keeping skill in the pipeline.
+
+---
+
+## Static Assertions (Structural)
+
+Verified automatically by `/skill-test static` — no fixture needed.
+
+- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
+- [ ] Has ≥2 phase headings (numbered Phase N or ## sections)
+- [ ] Contains verdict keywords: PASS, CONCERNS, FAIL
+- [ ] Contains "May I write" collaborative protocol language
+- [ ] Has a next-step handoff at the end (Follow-Up Actions section)
+
+---
+
+## Test Cases
+
+### Case 1: Happy Path — All Concept artifacts present, advancing to Systems Design
+
+**Fixture:**
+- `design/gdd/game-concept.md` exists, has content including all required sections
+- `design/gdd/game-pillars.md` exists (or pillars defined within concept doc)
+- No systems index yet (which is correct for this stage)
+
+**Input:** `/gate-check systems-design`
+
+**Expected behavior:**
+1. Skill reads `design/gdd/game-concept.md` and verifies it has content
+2. Skill checks for game pillars (in concept or separate file)
+3. Skill checks quality items (core loop described, target audience identified)
+4. Skill outputs structured checklist with all items marked
+5. Skill presents PASS/CONCERNS/FAIL verdict
+6. If PASS: skill asks "May I update `production/stage.txt` to 'Systems Design'?"
+
+**Assertions:**
+- [ ] Skill uses Glob or Read to verify `design/gdd/game-concept.md` exists before marking it checked
+- [ ] Output includes a "Required Artifacts" section with check status per item
+- [ ] Output includes a "Quality Checks" section with check status per item
+- [ ] Output includes a "Verdict" line with one of PASS / CONCERNS / FAIL
+- [ ] Skill asks about unverifiable quality items (e.g., "Has this been reviewed?") rather than assuming PASS
+- [ ] Skill asks "May I write" before updating `production/stage.txt`
+- [ ] Skill does NOT write `production/stage.txt` without explicit user confirmation
+
+---
+
+### Case 2: Failure Path — Missing required artifacts for Concept → Systems Design
+
+**Fixture:**
+- `design/gdd/game-concept.md` does NOT exist
+- No game pillars document exists
+- `design/gdd/` directory is empty or absent
+
+**Input:** `/gate-check systems-design`
+
+**Expected behavior:**
+1. Skill attempts to read `design/gdd/game-concept.md` — file not found
+2. Skill marks required artifact as missing (not present)
+3. Skill outputs FAIL verdict
+4. Skill lists blocker: "No game concept document found"
+5. Skill suggests remediation: run `/brainstorm` to create one
+
+**Assertions:**
+- [ ] Verdict is FAIL (not PASS or CONCERNS) when required artifacts are missing
+- [ ] Output explicitly names `design/gdd/game-concept.md` as missing
+- [ ] Output includes a "Blockers" section with at least 1 item
+- [ ] Output recommends `/brainstorm` as the remediation action
+- [ ] Skill does NOT write `production/stage.txt` when verdict is FAIL
+
+---
+
+### Case 3: No Argument — Auto-detect current stage
+
+**Fixture:**
+- `production/stage.txt` contains `Concept`
+- `design/gdd/game-concept.md` exists with content
+- No systems index yet
+
+**Input:** `/gate-check` (no argument)
+
+**Expected behavior:**
+1. Skill reads `production/stage.txt` to determine current stage
+2. Skill determines the next gate is Concept → Systems Design
+3. Skill proceeds with the Systems Design gate checks
+4. Output clearly states which transition is being validated
+
+**Assertions:**
+- [ ] Skill reads `production/stage.txt` (or uses project-stage-detect heuristics) to determine current stage
+- [ ] Output header names both current and target phases (e.g., "Gate Check: Concept → Systems Design")
+- [ ] Skill does not ask the user which gate to check if current stage is determinable
+
+---
+
+### Case 4: Edge Case — Manual check items flagged correctly
+
+**Fixture:**
+- All required artifacts for Concept → Systems Design are present
+- No playtest or review record exists (can't auto-verify quality checks)
+
+**Input:** `/gate-check systems-design`
+
+**Expected behavior:**
+1. Skill verifies all artifact files exist
+2. Skill encounters quality check: "Game concept reviewed (not MAJOR REVISION NEEDED)"
+3. Since no review record exists, skill marks item as MANUAL CHECK NEEDED
+4. Skill asks the user: "Has the game concept been reviewed for design quality?"
+5. Skill waits for user input before finalizing verdict
+
+**Assertions:**
+- [ ] Items that cannot be auto-verified are marked `[?] MANUAL CHECK NEEDED` rather than assumed PASS
+- [ ] Skill uses a question to the user for at least one unverifiable quality item
+- [ ] Skill does not mark unverifiable items as PASS by default
+
+---
+
+## Protocol Compliance
+
+- [ ] Uses "May I write" before updating `production/stage.txt`
+- [ ] Presents the full checklist report before asking for write approval
+- [ ] Ends with a "Follow-Up Actions" section listing next steps per verdict
+- [ ] Never advances the stage without explicit user confirmation
+- [ ] Never auto-creates `production/stage.txt` if it doesn't exist without asking
+
+---
+
+## Coverage Notes
+
+- The Production → Polish and Polish → Release gates are not covered here
+  because they require complex multi-artifact setups (sprint plans, playtest
+  data, QA sign-off); these are deferred to dedicated follow-up specs.
+- The "CONCERNS" verdict path (minor gaps, not blocking) is not explicitly
+  tested here; it falls between Case 1 and Case 2 and follows the same pattern.
+- The Vertical Slice validation block (Pre-Production → Production gate) is not
+  covered because it requires a playable build context that cannot be expressed
+  as a document fixture.
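
Reviewer sketch: the gate-check cases above describe three rules: a missing required artifact forces FAIL, an unverifiable item must be put to the user before any verdict is finalized, and `production/stage.txt` is written only on PASS with explicit confirmation. A minimal sketch of that decision logic follows. The `CheckItem` type, the status strings, and both function names are hypothetical, and mapping a missing non-required item to CONCERNS is an assumption based on the "minor gaps, not blocking" wording in the coverage notes.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CheckItem:
    label: str
    required: bool
    status: str  # "ok", "missing", or "manual" (cannot be auto-verified)

def gate_verdict(items: list[CheckItem]) -> Optional[str]:
    """Return PASS / CONCERNS / FAIL, or None while user input is pending."""
    if any(i.required and i.status == "missing" for i in items):
        return "FAIL"  # Case 2: a required artifact is absent
    if any(i.status == "manual" for i in items):
        return None  # Case 4: ask the user first, never assume PASS
    if any(i.status == "missing" for i in items):
        return "CONCERNS"  # assumption: non-blocking gap
    return "PASS"

def may_write_stage(verdict: Optional[str], user_confirmed: bool) -> bool:
    """The stage file may be updated only on PASS plus explicit confirmation."""
    return verdict == "PASS" and user_confirmed
```

Returning `None` rather than a provisional verdict mirrors the assertion that unverifiable items are marked `[?] MANUAL CHECK NEEDED` and resolved by a question to the user.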
--- a/tests/skills/story-done.md
+++ b/tests/skills/story-done.md
@@ -0,0 +1,165 @@
+# Skill Test Spec: /story-done
+
+## Skill Summary
+
+`/story-done` closes the loop between design and implementation. Run at the
+end of implementing a story, it reads the story file and verifies each
+acceptance criterion against the implementation. It checks for GDD and ADR
+deviations, prompts a code review, updates the story status to `Complete`,
+logs any tech debt, and surfaces the next ready story from the sprint. It
+produces a COMPLETE / COMPLETE WITH NOTES / BLOCKED verdict and writes to
+the story file and optionally to `docs/tech-debt-register.md`.
+
+---
+
+## Static Assertions (Structural)
+
+Verified automatically by `/skill-test static` — no fixture needed.
+
+- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
+- [ ] Has ≥5 phase headings (complex skill warranting `context: fork` if applicable)
+- [ ] Contains verdict keywords: COMPLETE, BLOCKED
+- [ ] Contains "May I write" collaborative protocol language (writes to story file and tech-debt register)
+- [ ] Has a next-step handoff (surfaces next story from sprint)
+
+---
+
+## Test Cases
+
+### Case 1: Happy Path — All acceptance criteria met, no deviations
+
+**Fixture:**
+- Story file at `production/epics/core/story-light-pickup.md` with:
+  - 3 acceptance criteria, all implemented as described
+  - `TR-ID: TR-light-001` referencing a GDD requirement
+  - `ADR: docs/architecture/adr-003-inventory.md` (Accepted)
+  - `Status: In Progress`
+- Implementation files listed in story exist in `src/`
+- GDD requirement text at TR-light-001 matches how the feature was implemented
+- ADR guidance was followed (no deviations)
+
+**Input:** `/story-done production/epics/core/story-light-pickup.md`
+
+**Expected behavior:**
+1. Skill reads the story file and extracts all key fields
+2. Skill reads the GDD requirement fresh from `tr-registry.yaml` (not from story's quoted text)
+3. Skill reads the referenced ADR to understand implementation constraints
+4. Skill evaluates each acceptance criterion (auto where possible, manual prompt where not)
+5. Skill checks for GDD requirement deviations
+6. Skill checks for ADR guideline deviations
+7. Skill prompts user: "Please provide the code review outcome for this story"
+8. Skill presents COMPLETE verdict
+9. Skill asks "May I update story Status to Complete and add Completion Notes?"
+10. If yes: skill updates the story file
+11. Skill surfaces the next `Ready for Dev` story from the sprint
+
+**Assertions:**
+- [ ] Skill reads `docs/architecture/tr-registry.yaml` for TR-ID requirement text (not just story)
+- [ ] Skill reads the referenced ADR file (not just the story reference)
+- [ ] Each acceptance criterion is listed with VERIFIED / DEFERRED / FAILED status
+- [ ] Skill prompts the user for code review outcome (does not skip this step)
+- [ ] Verdict is COMPLETE when all criteria are verified and no deviations exist
+- [ ] Skill asks "May I write" before updating the story file
+- [ ] Skill does NOT auto-update story status without user confirmation
+- [ ] After completion, skill surfaces the next ready story from `production/sprints/`
+
+---
+
+### Case 2: Deferred Path — Acceptance criterion cannot be verified
+
+**Fixture:**
+- Story file has an acceptance criterion: "Player sees correct animation on pickup"
+- No automated test for this criterion exists
+- Manual verification has not been performed
+- All other criteria are met
+
+**Input:** `/story-done production/epics/core/story-light-pickup.md`
+
+**Expected behavior:**
+1. Skill processes all acceptance criteria
+2. Reaches the animation criterion — cannot auto-verify
+3. Skill asks the user: "Acceptance criterion 'Player sees correct animation on
+   pickup' cannot be auto-verified. Has this been manually tested?"
+4. If user says No: criterion is marked DEFERRED, verdict becomes COMPLETE WITH NOTES
+5. Skill records the deferred criterion in completion notes
+6. Asks "May I write updated story with deferred criterion noted?"
+
+**Assertions:**
+- [ ] Skill asks the user about unverifiable criteria rather than assuming PASS
+- [ ] Deferred criteria result in COMPLETE WITH NOTES (not COMPLETE or BLOCKED)
+- [ ] The deferred criterion is explicitly named in the completion notes
+- [ ] Skill still asks "May I write" before updating the story file
+
+---
+
+### Case 3: Blocked Path — GDD deviation detected
+
+**Fixture:**
+- Story TR-ID points to requirement: "Player can carry max 3 light sources"
+- Implementation in `src/` uses a variable `MAX_CARRIED_LIGHTS = 5`
+- This is a deliberate deviation from the GDD
+
+**Input:** `/story-done production/epics/core/story-light-pickup.md`
+
+**Expected behavior:**
+1. Skill reads the GDD requirement text (max 3)
+2. Skill detects discrepancy between requirement and implementation value (5)
+3. Skill flags this as a GDD deviation and asks the user to classify it:
+   - INTENTIONAL: document the deviation and reason
+   - ERROR: implementation must be fixed before story can be marked Complete
+   - OUT OF SCOPE: requirement changed and GDD needs updating
+4. If INTENTIONAL: skill records deviation in completion notes, verdict is COMPLETE WITH NOTES
+5. If ERROR: verdict is BLOCKED until implementation is corrected
+
+**Assertions:**
+- [ ] Skill detects the mismatch between GDD requirement and implementation value
+- [ ] Skill asks the user to classify the deviation (not auto-assumes either way)
+- [ ] INTENTIONAL deviation → COMPLETE WITH NOTES (not BLOCKED)
+- [ ] ERROR deviation → BLOCKED verdict until fixed
+- [ ] Detected deviations are recorded in completion notes or tech debt register
+
+---
+
+### Case 4: Edge Case — No argument, auto-detect current story
+
+**Fixture:**
+- `production/session-state/active.md` contains a reference to
+  `production/epics/core/story-oxygen-drain.md` as the active story
+- That story file exists with `Status: In Progress`
+
+**Input:** `/story-done` (no argument)
+
+**Expected behavior:**
+1. Skill reads `production/session-state/active.md`
+2. Skill finds the active story reference
+3. Skill reads that story file and proceeds normally
+4. Output confirms which story was auto-detected
+
+**Assertions:**
+- [ ] Skill reads `production/session-state/active.md` when no argument is given
+- [ ] Skill identifies and confirms the auto-detected story before proceeding
+- [ ] If no story is found in session state, skill asks the user to provide a path
+
+---
+
+## Protocol Compliance
+
+- [ ] Uses "May I write" before updating the story file
+- [ ] Uses "May I write" before adding entries to `docs/tech-debt-register.md`
+- [ ] Presents complete findings (criteria check, deviation check) before asking approval
+- [ ] Ends by surfacing the next ready story from the sprint plan
+- [ ] Does not mark a story Complete if any criteria are in ERROR state
+- [ ] Does not skip the code review prompt
+
+---
+
+## Coverage Notes
+
+- The full 8-phase flow of the skill is exercised across Cases 1-3; not all
+  edge cases within each phase are covered.
+- Tech debt logging (deferred items written to `docs/tech-debt-register.md`)
+  is mentioned in Case 2 but is not the primary assertion focus; dedicated
+  coverage is deferred.
+- The `sprint-status.yaml` update (Phase 7 in the skill) is implied by Case 1
+  but is not a primary assertion; it is assumed to follow the same "May I write" pattern.
+- Stories with multiple TR-IDs or multiple ADRs are not explicitly tested.
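
The verdict rules stated in Cases 2 and 3 above can be sketched as a small decision function. This is an illustrative sketch only, not the skill's implementation: `Verdict`, `story_done_verdict`, and the note strings are hypothetical names. The spec gives no verdict for an OUT OF SCOPE deviation, so the sketch raises an error for it rather than guessing.

```python
from enum import Enum


class Verdict(Enum):
    COMPLETE = "COMPLETE"
    COMPLETE_WITH_NOTES = "COMPLETE WITH NOTES"
    BLOCKED = "BLOCKED"


def story_done_verdict(deferred_criteria, deviations):
    """Hypothetical helper mirroring the Case 2 and Case 3 rules.

    deferred_criteria: names of criteria the user could not confirm as tested.
    deviations: (description, classification) pairs as classified by the user.
    """
    # Case 2: every deferred criterion is named explicitly in the notes.
    notes = [f"Deferred criterion: {name}" for name in deferred_criteria]
    blocked = False
    for description, classification in deviations:
        if classification == "ERROR":
            # Case 3: an ERROR deviation blocks completion until fixed.
            blocked = True
        elif classification != "INTENTIONAL":
            # The spec defines no verdict for OUT OF SCOPE, so refuse to guess.
            raise ValueError(f"No verdict rule for classification: {classification}")
        notes.append(f"Deviation ({classification}): {description}")
    if blocked:
        return Verdict.BLOCKED, notes
    if notes:
        return Verdict.COMPLETE_WITH_NOTES, notes
    return Verdict.COMPLETE, notes


verdict, notes = story_done_verdict(
    [], [("MAX_CARRIED_LIGHTS = 5, GDD says max 3", "INTENTIONAL")]
)
print(verdict.value)  # COMPLETE WITH NOTES
```

Returning the notes with the verdict keeps the "explicitly named in the completion notes" assertions checkable from the same result.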
--- a/tests/skills/story-readiness.md
+++ b/tests/skills/story-readiness.md
@@ -0,0 +1,153 @@
+# Skill Test Spec: /story-readiness
+
+## Skill Summary
+
+`/story-readiness` validates that a story file is ready for a developer to
+pick up and implement. It checks four dimensions: Design (embedded GDD
+requirements), Architecture (ADR references and status), Scope (clear
+boundaries and DoD), and Definition of Done (testable criteria). It produces
+a READY / NEEDS WORK / BLOCKED verdict. It is a read-only skill and runs
+before any developer picks up a story.
+
+---
+
+## Static Assertions (Structural)
+
+Verified automatically by `/skill-test static` — no fixture needed.
+
+- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
+- [ ] Has ≥2 phase headings or numbered check sections
+- [ ] Contains verdict keywords: READY, NEEDS WORK, BLOCKED
+- [ ] Does NOT require "May I write" language (read-only skill)
+- [ ] Has a next-step handoff (what to do after verdict)
+
+---
+
+## Test Cases
+
+### Case 1: Happy Path — Fully ready story
+
+**Fixture:**
+- Story file exists at `production/epics/core/story-light-pickup.md`
+- Story contains:
+  - `TR-ID: TR-light-001` (GDD requirement reference)
+  - `ADR: docs/architecture/adr-003-inventory.md`
+  - Referenced ADR exists and has status `Accepted`
+  - Referenced TR-ID exists in `docs/architecture/tr-registry.yaml`
+  - Story has `## Acceptance Criteria` with ≥3 testable items
+  - Story has `## Definition of Done` section
+  - Story has `Status: Ready for Dev`
+  - Manifest version in story header matches current `docs/architecture/control-manifest.md`
+
+**Input:** `/story-readiness production/epics/core/story-light-pickup.md`
+
+**Expected behavior:**
+1. Skill reads the story file
+2. Skill reads the referenced ADR — verifies status is `Accepted`
+3. Skill reads `docs/architecture/tr-registry.yaml` — verifies TR-ID exists
+4. Skill reads `docs/architecture/control-manifest.md` — verifies manifest version matches
+5. Skill evaluates all 4 dimensions (Design, Architecture, Scope, DoD)
+6. Skill outputs READY verdict with all checks passing
+
+**Assertions:**
+- [ ] Skill reads the referenced ADR file (not just the story)
+- [ ] Skill verifies ADR status is `Accepted` (not `Proposed`)
+- [ ] Skill reads `tr-registry.yaml` to verify TR-ID exists
+- [ ] Output includes check results for all 4 dimensions
+- [ ] Verdict is READY when all checks pass
+- [ ] Skill does not write any files
+
+---
+
+### Case 2: Blocked Path — Referenced ADR is Proposed (not Accepted)
+
+**Fixture:**
+- Story file exists at `production/epics/core/story-light-system.md` with `ADR: docs/architecture/adr-005-light-system.md`
+- `adr-005-light-system.md` exists but has `Status: Proposed`
+- All other story content is complete
+
+**Input:** `/story-readiness production/epics/core/story-light-system.md`
+
+**Expected behavior:**
+1. Skill reads the story
+2. Skill reads `adr-005-light-system.md` — finds `Status: Proposed`
+3. Skill flags this as a BLOCKING issue (cannot implement against unaccepted ADR)
+4. Skill outputs BLOCKED verdict
+5. Skill recommends resolving the ADR's status (accept or reject it) before the story is picked up
+
+**Assertions:**
+- [ ] Verdict is BLOCKED (not NEEDS WORK or READY) when ADR is Proposed
+- [ ] Output explicitly names the Proposed ADR as the blocker
+- [ ] Output recommends resolving ADR status before proceeding
+- [ ] Skill does not output READY even when every other check passes
+
+---
+
+### Case 3: Needs Work — Missing Acceptance Criteria
+
+**Fixture:**
+- Story file exists at `production/epics/core/story-oxygen-drain.md` but has no `## Acceptance Criteria` section
+- ADR reference exists and is `Accepted`
+- TR-ID exists in registry
+- Manifest version matches
+
+**Input:** `/story-readiness production/epics/core/story-oxygen-drain.md`
+
+**Expected behavior:**
+1. Skill reads the story
+2. Skill finds no Acceptance Criteria section
+3. Skill flags this as a NEEDS WORK issue (story is incomplete, not blocked)
+4. Skill outputs NEEDS WORK verdict
+5. Skill names the missing section and suggests adding measurable criteria
+
+**Assertions:**
+- [ ] Verdict is NEEDS WORK (not BLOCKED or READY) when Acceptance Criteria section is absent
+- [ ] Output identifies the missing Acceptance Criteria section specifically
+- [ ] Output suggests adding testable/measurable criteria
+- [ ] Skill distinguishes NEEDS WORK (fixable without external dependencies) from BLOCKED (requires outside action)
+
+---
+
+### Case 4: Edge Case — Stale manifest version
+
+**Fixture:**
+- Story file has `Manifest Version: 2026-01-15` in its header
+- `docs/architecture/control-manifest.md` has `Manifest Version: 2026-03-10`
+- Versions do not match (story was created before manifest was updated)
+
+**Input:** `/story-readiness production/epics/core/story-mirror-rotation.md`
+
+**Expected behavior:**
+1. Skill reads the story and extracts manifest version `2026-01-15`
+2. Skill reads control manifest header and extracts current version `2026-03-10`
+3. Skill detects version mismatch
+4. Skill flags this as an ADVISORY issue (not blocking, but worth noting)
+5. Verdict is NEEDS WORK with manifest staleness noted
+
+**Assertions:**
+- [ ] Skill reads `docs/architecture/control-manifest.md` to get current version
+- [ ] Skill compares story's embedded manifest version against current manifest version
+- [ ] Stale manifest version results in NEEDS WORK (not BLOCKED, not READY)
+- [ ] Output explains that the story's embedded guidance may be outdated
+
+---
+
+## Protocol Compliance
+
+- [ ] Does NOT use Write or Edit tools (read-only skill)
+- [ ] Presents complete check results before verdict
+- [ ] Does not ask for approval (no file writes)
+- [ ] Ends with recommended next step (fix issues or proceed to implementation)
+- [ ] Distinguishes three verdict levels clearly (READY vs NEEDS WORK vs BLOCKED)
+
+---
+
+## Coverage Notes
+
+- The case where a TR-ID is missing from the registry entirely is not explicitly
+  tested here; it is assumed to follow the same NEEDS WORK pattern as Case 3.
+- The "no argument" path (skill auto-detecting the current story) is not
+  tested because it depends on `production/session-state/active.md` content,
+  which is hard to fixture reliably.
+- Stories with multiple ADR references are not tested; behavior is assumed to
+  be additive (all ADRs must be Accepted for READY verdict).
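
The three verdict levels exercised by Cases 1 to 4 above can be sketched as a severity roll-up. This is a minimal sketch, not the skill's actual logic: `readiness_verdict`, its parameters, and the finding labels are hypothetical. It also assumes that any ADR status other than `Accepted` blocks the story, whereas the spec only tests `Proposed`.

```python
def readiness_verdict(adr_statuses, has_acceptance_criteria,
                      story_manifest, current_manifest):
    """Hypothetical roll-up of story-readiness findings into one verdict.

    adr_statuses: mapping of ADR path to its status string.
    Returns (verdict, findings), where findings are (severity, message) pairs.
    """
    findings = []
    for path, status in adr_statuses.items():
        # Case 2: an ADR that is not Accepted blocks implementation.
        # Treating every non-Accepted status as blocking is an assumption.
        if status != "Accepted":
            findings.append(("BLOCKING", f"{path} has status {status}"))
    if not has_acceptance_criteria:
        # Case 3: fixable inside the story itself, so NEEDS WORK rather than BLOCKED.
        findings.append(("NEEDS WORK", "Missing '## Acceptance Criteria' section"))
    if story_manifest != current_manifest:
        # Case 4: advisory only, but it still prevents a READY verdict.
        findings.append((
            "ADVISORY",
            f"Story manifest {story_manifest} does not match current {current_manifest}",
        ))

    severities = {severity for severity, _ in findings}
    if "BLOCKING" in severities:
        return "BLOCKED", findings
    if severities:
        return "NEEDS WORK", findings
    return "READY", findings


verdict, findings = readiness_verdict(
    {"docs/architecture/adr-005-light-system.md": "Proposed"},
    has_acceptance_criteria=True,
    story_manifest="2026-03-10",
    current_manifest="2026-03-10",
)
print(verdict)  # BLOCKED
```

Collecting every finding before choosing the verdict matches the protocol requirement to present complete check results before the verdict.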