Add /skill-test suite: linter, behavioral specs, and coverage catalog for 52 skills

- New skill: /skill-test (static | spec | audit modes)
  - static: 7-check structural linter per skill file
  - spec: Claude-evaluated behavioral assertions against test specs
  - audit: coverage report across all 52 skills with priority gaps
- New hook: validate-skill-change.sh — advisory reminder to lint after skill edits
- New template: skill-test-spec.md — standard structure for authoring test specs
- New: tests/skills/catalog.yaml — machine-readable coverage index (52 skills)
- New: tests/skills/_fixtures/ — shared fixtures (complete concept, incomplete GDD)
- New: 4 seed test specs for critical gate skills (gate-check, design-review,
  story-readiness, story-done) — 4 cases each
- Modified: settings.json — validate-skill-change.sh added to PostToolUse hook

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Donchitos
2026-03-13 17:05:08 +11:00
parent cdb1aa83b7
commit af2b864796
11 changed files with 1587 additions and 0 deletions


@@ -0,0 +1,51 @@
# GDD: Light Manipulation System
## Overview
The light manipulation system allows players to interact with bioluminescent
organisms and ancient light conduits to redirect beams of light. Light beams
illuminate dark areas, power ancient mechanisms, and reveal hidden surfaces.
## Player Fantasy
The player should feel like a puzzle archaeologist — discovering the logic of
an alien but internally consistent technology. The "aha" moment when a complex
light path clicks into place should feel earned and satisfying.
## Detailed Rules
- Players can pick up portable light sources (max 3 carried at once)
- Stationary conduits redirect beams at fixed angles (45°/90°/135°/180°)
- Light beams are blocked by solid terrain and most objects
- Living bioluminescent organisms pulse light on a 3-second cycle
- Ancient mirrors rotate freely and redirect any light beam that touches them
- A beam must reach a receptor to activate a mechanism
## Formulas
[SECTION MISSING — not yet authored]
## Edge Cases
[SECTION MISSING — not yet authored]
## Dependencies
- **Oxygen System**: Light sources consume no oxygen but picking them up takes
time (opportunity cost with oxygen drain)
- **Cave Navigation**: Illuminated paths reveal branching routes not visible
in darkness
- Player Inventory System (not yet designed)
## Tuning Knobs
[SECTION MISSING — not yet authored]
## Acceptance Criteria
[SECTION MISSING — not yet authored]
---
*Status: Draft — 4/8 required sections populated*
*Last updated: 2026-03-13*


@@ -0,0 +1,62 @@
# Game Concept: Echoes of the Deep
## Overview
Echoes of the Deep is a single-player atmospheric puzzle-platformer set in
a bioluminescent underwater cave network. Players control a deep-sea diver
exploring ancient ruins while managing oxygen supplies and manipulating light
sources to reveal hidden paths and solve environmental puzzles.
## Player Fantasy
The player should feel like a lone explorer uncovering a lost civilization,
experiencing wonder at beautiful environments, and the satisfying "aha" moment
when a clever puzzle clicks into place. The oxygen mechanic creates gentle
pressure without punishing failure harshly.
## Core Loop
1. **Explore** — navigate branching cave sections using light and movement
2. **Discover** — find oxygen caches, light sources, and ancient mechanisms
3. **Solve** — manipulate light and environment to unlock new areas
4. **Progress** — unlock deeper cave sections with escalating complexity
## Game Pillars
1. **Wonder** — every area should contain something visually or mechanically surprising
2. **Accessibility** — the game should be completable without frustration; oxygen
manages pacing, not punishment
3. **Environmental Storytelling** — the ruins tell a story without text exposition
## Target Audience
Casual-to-midcore players who enjoy relaxed exploration games (Subnautica,
Journey, ABZÛ) and puzzle games that reward observation over reflexes.
Target age: 16+. Target sessions: 30–90 minutes.
## Unique Selling Points
- Bioluminescent light manipulation as the core puzzle mechanic
- No enemies — tension comes from environment and resource management
- Procedurally decorated (handcrafted levels, procedural detail pass)
## Technical Scope
- **Engine**: Godot 4.6
- **Platform**: PC (Steam), with console ports post-launch
- **Team size**: Solo developer
- **Target completion**: 12-month development cycle
- **Scope**: 4–6 hours main story, 8–12 hours completionist
## Art Direction
Darkly atmospheric with vibrant bioluminescence providing the primary color
palette. Deep blues, purples, and blacks punctuated by greens, teals, and
ambers from living organisms and ancient technology.
## Fun Hypothesis
Players will feel rewarded by the combination of visual beauty and the
satisfying moment of discovering how light manipulation solves each puzzle.
The oxygen system will create just enough pressure to make exploration feel
meaningful without making death feel punishing.

tests/skills/catalog.yaml

@@ -0,0 +1,438 @@
version: 1
last_updated: ""
skills:
  # Critical — gate skills that control phase transitions
  - name: gate-check
    spec: tests/skills/gate-check.md
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: critical
  - name: design-review
    spec: tests/skills/design-review.md
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: critical
  - name: story-readiness
    spec: tests/skills/story-readiness.md
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: critical
  - name: story-done
    spec: tests/skills/story-done.md
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: critical
  - name: review-all-gdds
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: critical
  - name: architecture-review
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: critical
  # High — pipeline-critical skills
  - name: create-epics-stories
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: high
  - name: create-control-manifest
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: high
  - name: propagate-design-change
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: high
  - name: architecture-decision
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: high
  - name: map-systems
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: high
  - name: design-system
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: high
  # Medium — team and sprint management skills
  - name: sprint-plan
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: medium
  - name: sprint-status
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: medium
  - name: team-ui
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: medium
  - name: team-combat
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: medium
  - name: team-narrative
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: medium
  - name: team-audio
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: medium
  - name: team-level
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: medium
  - name: team-polish
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: medium
  - name: team-release
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: medium
  - name: team-live-ops
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: medium
  # Low — analysis, reporting, utility skills
  - name: skill-test
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: medium
  - name: start
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: low
  - name: help
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: low
  - name: brainstorm
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: low
  - name: project-stage-detect
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: low
  - name: setup-engine
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: low
  - name: quick-design
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: low
  - name: ux-design
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: low
  - name: ux-review
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: low
  - name: code-review
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: low
  - name: balance-check
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: low
  - name: asset-audit
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: low
  - name: reverse-document
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: low
  - name: create-architecture
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: low
  - name: content-audit
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: low
  - name: bug-report
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: low
  - name: hotfix
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: low
  - name: prototype
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: low
  - name: playtest-report
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: low
  - name: perf-profile
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: low
  - name: tech-debt
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: low
  - name: scope-check
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: low
  - name: estimate
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: low
  - name: milestone-review
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: low
  - name: retrospective
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: low
  - name: changelog
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: low
  - name: patch-notes
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: low
  - name: onboard
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: low
  - name: localize
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: low
  - name: launch-checklist
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: low
  - name: release-checklist
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: low
  - name: adopt
    spec: ""
    last_static: ""
    last_static_result: ""
    last_spec: ""
    last_spec_result: ""
    priority: low


@@ -0,0 +1,144 @@
# Skill Test Spec: /design-review
## Skill Summary
`/design-review` reads a game design document (GDD) and evaluates it against
the project's 8-section design standard (Overview, Player Fantasy, Detailed
Rules, Formulas, Edge Cases, Dependencies, Tuning Knobs, Acceptance Criteria).
It checks for internal consistency, implementability, and cross-system
conflicts. It produces a verdict of APPROVED, NEEDS REVISION, or MAJOR
REVISION NEEDED. It is a read-only skill (no file writes) and runs as a
`context: fork` subagent.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings or numbered steps
- [ ] Contains verdict keywords: APPROVED, NEEDS REVISION, MAJOR REVISION NEEDED
- [ ] Does NOT require "May I write" language (read-only skill — `allowed-tools` excludes Write/Edit)
- [ ] Output format is documented (review template shown in skill body)
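
A minimal sketch of how a few of the structural assertions above might be checked mechanically. This is not the actual `/skill-test static` linter: the frontmatter parser only handles flat `key: value` lines, the function names are hypothetical, and the sample skill text (including its frontmatter values) is invented for illustration.

```python
import re

REQUIRED_FIELDS = ["name", "description", "argument-hint", "user-invocable", "allowed-tools"]


def parse_frontmatter(text: str) -> dict[str, str]:
    """Return flat `key: value` pairs from a leading `---` block (assumed format)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        match = re.match(r"^([A-Za-z][\w-]*):\s*(.*)$", line)
        if match:
            fields[match.group(1)] = match.group(2).strip()
    return fields


def static_check(text: str, verdicts: list[str], read_only: bool) -> list[str]:
    """Return failure messages; an empty list means every check passed."""
    failures = []
    fields = parse_frontmatter(text)
    for name in REQUIRED_FIELDS:
        if name not in fields:
            failures.append(f"missing frontmatter field: {name}")
    # Count `##`-level headings and numbered steps ("1. ...").
    steps = re.findall(r"^(?:#{2,}\s+\S|\d+\.\s+\S)", text, flags=re.MULTILINE)
    if len(steps) < 2:
        failures.append("fewer than 2 phase headings or numbered steps")
    for keyword in verdicts:
        if keyword not in text:
            failures.append(f"missing verdict keyword: {keyword}")
    # A read-only skill should not list Write or Edit among its tools.
    if read_only and re.search(r"\b(Write|Edit)\b", fields.get("allowed-tools", "")):
        failures.append("read-only skill lists Write/Edit in allowed-tools")
    return failures


# Hypothetical skill file used only to exercise the checks.
SAMPLE = """---
name: design-review
description: Review a GDD
argument-hint: "[path]"
user-invocable: true
allowed-tools: Read, Glob, Grep
---
## Phase 1: Read the document
## Phase 2: Evaluate
Verdict: APPROVED, NEEDS REVISION, or MAJOR REVISION NEEDED
"""

VERDICTS = ["APPROVED", "NEEDS REVISION", "MAJOR REVISION NEEDED"]
print(static_check(SAMPLE, VERDICTS, read_only=True))
```

Returning a list of messages rather than a single boolean keeps each failed assertion individually reportable, which matches the checklist style used in this spec.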
---
## Test Cases
### Case 1: Happy Path — Complete GDD, all 8 sections present
**Fixture:**
- `design/gdd/light-manipulation.md` exists (use `tests/skills/_fixtures/minimal-game-concept.md`
as a stand-in; it represents a complete document with all required content)
- All 8 required sections are populated with substantive content
- Formulas section contains at least one formula with defined variables
- Acceptance Criteria section contains at least 3 testable criteria
**Input:** `/design-review design/gdd/light-manipulation.md`
**Expected behavior:**
1. Skill reads the target document in full
2. Skill reads CLAUDE.md for project context and standards
3. Skill evaluates all 8 required sections (present/absent check)
4. Skill checks internal consistency (formulas match described behavior)
5. Skill checks implementability (rules are precise enough to code)
6. Skill outputs structured review with section-by-section status
7. Skill outputs APPROVED verdict
**Assertions:**
- [ ] Skill reads the target file before producing any output
- [ ] Output includes a "Completeness" section showing X/8 sections present
- [ ] Output includes an "Internal Consistency" section
- [ ] Output includes an "Implementability" section
- [ ] Output ends with a verdict line: APPROVED / NEEDS REVISION / MAJOR REVISION NEEDED
- [ ] APPROVED verdict is given when all 8 sections are present and consistent
---
### Case 2: Failure Path — Incomplete GDD (4/8 sections)
**Fixture:**
- `design/gdd/light-manipulation.md` exists using content from
`tests/skills/_fixtures/incomplete-gdd.md` (4 of 8 sections populated;
Formulas, Edge Cases, Tuning Knobs, Acceptance Criteria are missing)
**Input:** `/design-review design/gdd/light-manipulation.md`
**Expected behavior:**
1. Skill reads the document
2. Skill identifies 4 missing sections
3. Skill outputs "Completeness: 4/8 sections present"
4. Skill lists specifically which 4 sections are missing
5. Skill outputs MAJOR REVISION NEEDED verdict (not APPROVED or NEEDS REVISION)
**Assertions:**
- [ ] Output shows "4/8" in the completeness section (not a higher number)
- [ ] Output explicitly names each missing section (Formulas, Edge Cases, Tuning Knobs, Acceptance Criteria)
- [ ] Verdict is MAJOR REVISION NEEDED (not APPROVED or NEEDS REVISION) when ≥3 sections are missing
- [ ] Output does not suggest the document is implementation-ready
- [ ] Skill does not write any files (read-only enforcement)
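
The completeness count this case expects can be illustrated with a short sketch. It assumes sections are `##` headings and that an unauthored section is marked with a `[SECTION MISSING` placeholder, as in the incomplete-GDD fixture; the helper names are hypothetical and the sample document is an abbreviated stand-in, not the fixture itself.

```python
import re

REQUIRED_SECTIONS = [
    "Overview", "Player Fantasy", "Detailed Rules", "Formulas",
    "Edge Cases", "Dependencies", "Tuning Knobs", "Acceptance Criteria",
]


def section_bodies(text: str) -> dict[str, str]:
    """Map each `## Heading` to the text under it, up to the next heading or `---` rule."""
    bodies: dict[str, list[str]] = {}
    current = None
    for line in text.splitlines():
        heading = re.match(r"^##\s+(.+?)\s*$", line)
        if heading:
            current = heading.group(1)
            bodies[current] = []
        elif line.strip() == "---":
            current = None
        elif current is not None:
            bodies[current].append(line)
    return {name: "\n".join(lines).strip() for name, lines in bodies.items()}


def completeness(text: str) -> tuple[int, list[str]]:
    """Return (sections present, names of missing sections)."""
    bodies = section_bodies(text)
    missing = [
        name for name in REQUIRED_SECTIONS
        # Absent, empty, or placeholder-only sections all count as missing.
        if not bodies.get(name) or bodies[name].startswith("[SECTION MISSING")
    ]
    return len(REQUIRED_SECTIONS) - len(missing), missing


# Abbreviated stand-in for the incomplete GDD fixture.
GDD = """# GDD: Light Manipulation System
## Overview
The light manipulation system allows players to redirect beams of light.
## Player Fantasy
The player should feel like a puzzle archaeologist.
## Detailed Rules
- A beam must reach a receptor to activate a mechanism
## Formulas
[SECTION MISSING]
## Edge Cases
[SECTION MISSING]
## Dependencies
- Player Inventory System (not yet designed)
## Tuning Knobs
[SECTION MISSING]
## Acceptance Criteria
[SECTION MISSING]
---
*Status: Draft*
"""

present, missing = completeness(GDD)
print(f"Completeness: {present}/8 sections present")
print("Missing:", ", ".join(missing))
```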
---
### Case 3: Partial Path — 7/8 sections, minor inconsistency
**Fixture:**
- GDD has all sections except Formulas
- The described behavior mentions numeric values but no formulas are defined
- Acceptance Criteria exist but are vague ("feels good" rather than measurable)
**Input:** `/design-review design/gdd/[document].md`
**Expected behavior:**
1. Skill identifies missing Formulas section
2. Skill flags vague acceptance criteria as an implementability issue
3. Skill outputs NEEDS REVISION verdict (not APPROVED, not MAJOR REVISION NEEDED)
4. Skill provides specific remediation notes for each issue
**Assertions:**
- [ ] Verdict is NEEDS REVISION (not APPROVED, not MAJOR REVISION NEEDED) for 7/8 with issues
- [ ] Output identifies the missing Formulas section specifically
- [ ] Output flags the vague acceptance criteria as an implementability gap
- [ ] Each flagged issue has a specific, actionable remediation note
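
The verdict thresholds exercised by Cases 1 to 3 can be summarized in a few lines. The spec only fixes three points: 8/8 and consistent gives APPROVED, three or more missing sections gives MAJOR REVISION NEEDED, and 7/8 with issues gives NEEDS REVISION. Everything in between is an assumption of this sketch, as is the function name.

```python
def review_verdict(missing_sections: int, issues: int) -> str:
    """Map a completeness count and number of flagged issues to a verdict."""
    if missing_sections >= 3:
        return "MAJOR REVISION NEEDED"
    if missing_sections == 0 and issues == 0:
        return "APPROVED"
    # Assumption: 1-2 missing sections, or a complete document with
    # flagged issues, both fall into the middle verdict.
    return "NEEDS REVISION"


for missing, issues in [(0, 0), (4, 0), (1, 1)]:
    print(missing, issues, review_verdict(missing, issues))
```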
---
### Case 4: Edge Case — File not found
**Fixture:**
- The path provided does not exist in the project
**Input:** `/design-review design/gdd/nonexistent.md`
**Expected behavior:**
1. Skill attempts to read the file
2. File not found
3. Skill outputs an error message naming the missing file
4. Skill suggests checking the path or listing files in `design/gdd/`
5. Skill does NOT produce a verdict
**Assertions:**
- [ ] Skill outputs a clear error when the file is not found
- [ ] Skill does NOT output APPROVED, NEEDS REVISION, or MAJOR REVISION NEEDED when file is missing
- [ ] Skill suggests a corrective action (check path, list available GDDs)
---
## Protocol Compliance
- [ ] Does NOT use Write or Edit tools (read-only skill)
- [ ] Presents complete findings before any verdict
- [ ] Does not ask for approval before producing output (no writes to approve)
- [ ] Ends with recommended next step (e.g., fix issues and re-run, or proceed to `/map-systems`)
---
## Coverage Notes
- Cross-system consistency checking (Case 3 in the skill's own phase list) is
not directly tested here because it requires multiple GDD files to compare;
this is covered by the `/review-all-gdds` spec instead.
- The skill's `context: fork` behavior (running as a subagent) is not tested
at the spec level — this is a runtime behavior verified manually.
- Performance and edge cases involving very large GDD files are not in scope.

tests/skills/gate-check.md

@@ -0,0 +1,144 @@
# Skill Test Spec: /gate-check
## Skill Summary
`/gate-check` validates whether the project is ready to advance to the next
development phase. It checks for required artifacts, runs quality checks, asks
the user about unverifiable items, and produces a PASS/CONCERNS/FAIL verdict.
On PASS with user confirmation, it writes the new stage name to
`production/stage.txt`. It governs all 6 phase transitions and is the most
critical gate-keeping skill in the pipeline.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings (numbered Phase N or ## sections)
- [ ] Contains verdict keywords: PASS, CONCERNS, FAIL
- [ ] Contains "May I write" collaborative protocol language
- [ ] Has a next-step handoff at the end (Follow-Up Actions section)
---
## Test Cases
### Case 1: Happy Path — All Concept artifacts present, advancing to Systems Design
**Fixture:**
- `design/gdd/game-concept.md` exists, has content including all required sections
- `design/gdd/game-pillars.md` exists (or pillars defined within concept doc)
- No systems index yet (which is correct for this stage)
**Input:** `/gate-check systems-design`
**Expected behavior:**
1. Skill reads `design/gdd/game-concept.md` and verifies it has content
2. Skill checks for game pillars (in concept or separate file)
3. Skill checks quality items (core loop described, target audience identified)
4. Skill outputs structured checklist with all items marked
5. Skill presents PASS/CONCERNS/FAIL verdict
6. If PASS: skill asks "May I update `production/stage.txt` to 'Systems Design'?"
**Assertions:**
- [ ] Skill uses Glob or Read to verify `design/gdd/game-concept.md` exists before marking it checked
- [ ] Output includes a "Required Artifacts" section with check status per item
- [ ] Output includes a "Quality Checks" section with check status per item
- [ ] Output includes a "Verdict" line with one of PASS / CONCERNS / FAIL
- [ ] Skill asks about unverifiable quality items (e.g., "Has this been reviewed?") rather than assuming PASS
- [ ] Skill asks "May I write" before updating `production/stage.txt`
- [ ] Skill does NOT write `production/stage.txt` without explicit user confirmation
---
### Case 2: Failure Path — Missing required artifacts for Concept → Systems Design
**Fixture:**
- `design/gdd/game-concept.md` does NOT exist
- No game pillars document exists
- `design/gdd/` directory is empty or absent
**Input:** `/gate-check systems-design`
**Expected behavior:**
1. Skill attempts to read `design/gdd/game-concept.md` — file not found
2. Skill marks required artifact as missing (not present)
3. Skill outputs FAIL verdict
4. Skill lists blocker: "No game concept document found"
5. Skill suggests remediation: run `/brainstorm` to create one
**Assertions:**
- [ ] Verdict is FAIL (not PASS or CONCERNS) when required artifacts are missing
- [ ] Output explicitly names `design/gdd/game-concept.md` as missing
- [ ] Output includes a "Blockers" section with at least 1 item
- [ ] Output recommends `/brainstorm` as the remediation action
- [ ] Skill does NOT write `production/stage.txt` when verdict is FAIL
---
### Case 3: No Argument — Auto-detect current stage
**Fixture:**
- `production/stage.txt` contains `Concept`
- `design/gdd/game-concept.md` exists with content
- No systems index yet
**Input:** `/gate-check` (no argument)
**Expected behavior:**
1. Skill reads `production/stage.txt` to determine current stage
2. Skill determines the next gate is Concept → Systems Design
3. Skill proceeds with the Systems Design gate checks
4. Output clearly states which transition is being validated
**Assertions:**
- [ ] Skill reads `production/stage.txt` (or uses project-stage-detect heuristics) to determine current stage
- [ ] Output header names both current and target phases (e.g., "Gate Check: Concept → Systems Design")
- [ ] Skill does not ask the user which gate to check if current stage is determinable
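
A minimal sketch of the auto-detection step, assuming `production/stage.txt` holds a single stage name and that the caller supplies the ordered stage list. Only the first two stage names below come from this spec; the function name is hypothetical.

```python
from typing import Optional


def next_gate(current_stage: str, stage_order: list[str]) -> Optional[tuple[str, str]]:
    """Return (current, target) for the next transition, or None at the final stage."""
    names = [stage.casefold() for stage in stage_order]
    # Raises ValueError for an unknown stage, so the caller can ask the user instead.
    index = names.index(current_stage.strip().casefold())
    if index + 1 >= len(stage_order):
        return None
    return stage_order[index], stage_order[index + 1]


# Only these two stages are taken from the spec; a real list would
# contain every stage the pipeline defines.
STAGES = ["Concept", "Systems Design"]

transition = next_gate("Concept\n", STAGES)  # trailing newline as read from a file
if transition:
    print(f"Gate Check: {transition[0]} → {transition[1]}")
```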
---
### Case 4: Edge Case — Manual check items flagged correctly
**Fixture:**
- All required artifacts for Concept → Systems Design are present
- No playtest or review record exists (can't auto-verify quality checks)
**Input:** `/gate-check systems-design`
**Expected behavior:**
1. Skill verifies all artifact files exist
2. Skill encounters quality check: "Game concept reviewed (not MAJOR REVISION NEEDED)"
3. Since no review record exists, skill marks item as MANUAL CHECK NEEDED
4. Skill asks the user: "Has the game concept been reviewed for design quality?"
5. Skill waits for user input before finalizing verdict
**Assertions:**
- [ ] Items that cannot be auto-verified are marked `[?] MANUAL CHECK NEEDED` rather than assumed PASS
- [ ] Skill uses a question to the user for at least one unverifiable quality item
- [ ] Skill does not mark unverifiable items as PASS by default
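
One way to model the rule that unverifiable items are never assumed to pass is a three-state check item, sketched below. The `[?] MANUAL CHECK NEEDED` marker comes from this spec; the other markers, the class names, the CONCERNS rule, and the choice to report FAIL before waiting on manual items are assumptions of the sketch.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    PASS = "[x]"                        # assumed marker
    MISSING = "[ ]"                     # assumed marker
    MANUAL = "[?] MANUAL CHECK NEEDED"  # marker named in the spec


@dataclass
class CheckItem:
    label: str
    required: bool  # True for required artifacts, False for quality checks
    status: Status


def gate_verdict(items: list[CheckItem]) -> Optional[str]:
    """Return PASS / CONCERNS / FAIL, or None while manual items await an answer."""
    if any(item.required and item.status is Status.MISSING for item in items):
        return "FAIL"  # a missing required artifact blocks regardless of other items
    if any(item.status is Status.MANUAL for item in items):
        return None  # ask the user before finalizing the verdict
    if any(item.status is Status.MISSING for item in items):
        return "CONCERNS"  # assumption: a failed quality check is a non-blocking gap
    return "PASS"


items = [
    CheckItem("design/gdd/game-concept.md exists", True, Status.PASS),
    CheckItem("Game concept reviewed", False, Status.MANUAL),
]
print(gate_verdict(items))       # no verdict yet: one item needs a manual answer
items[1].status = Status.PASS    # the user confirms the review happened
print(gate_verdict(items))
```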
---
## Protocol Compliance
- [ ] Uses "May I write" before updating `production/stage.txt`
- [ ] Presents the full checklist report before asking for write approval
- [ ] Ends with a "Follow-Up Actions" section listing next steps per verdict
- [ ] Never advances the stage without explicit user confirmation
- [ ] Never creates `production/stage.txt` without asking when the file does not exist
---
## Coverage Notes
- The Production → Polish and Polish → Release gates are not covered here
because they require complex multi-artifact setups (sprint plans, playtest
data, QA sign-off); these are deferred to dedicated follow-up specs.
- The "CONCERNS" verdict path (minor gaps, not blocking) is not explicitly
tested here; it falls between Case 1 and Case 2 and follows the same pattern.
- The Vertical Slice validation block (Pre-Production → Production gate) is not
covered because it requires a playable build context that cannot be expressed
as a document fixture.

tests/skills/story-done.md

@@ -0,0 +1,165 @@
# Skill Test Spec: /story-done
## Skill Summary
`/story-done` closes the loop between design and implementation. Run at the
end of implementing a story, it reads the story file and verifies each
acceptance criterion against the implementation. It checks for GDD and ADR
deviations, prompts a code review, updates the story status to `Complete`,
logs any tech debt, and surfaces the next ready story from the sprint. It
produces a COMPLETE / COMPLETE WITH NOTES / BLOCKED verdict and writes to
the story file and optionally to `docs/tech-debt-register.md`.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥5 phase headings (complex skill warranting `context: fork` if applicable)
- [ ] Contains verdict keywords: COMPLETE, COMPLETE WITH NOTES, BLOCKED
- [ ] Contains "May I write" collaborative protocol language (writes to story file and tech-debt register)
- [ ] Has a next-step handoff (surfaces next story from sprint)
---
## Test Cases
### Case 1: Happy Path — All acceptance criteria met, no deviations
**Fixture:**
- Story file at `production/epics/core/story-light-pickup.md` with:
  - 3 acceptance criteria, all implemented as described
  - `TR-ID: TR-light-001` referencing a GDD requirement
  - `ADR: docs/architecture/adr-003-inventory.md` (Accepted)
  - `Status: In Progress`
- Implementation files listed in story exist in `src/`
- GDD requirement text at TR-light-001 matches how the feature was implemented
- ADR guidance was followed (no deviations)
**Input:** `/story-done production/epics/core/story-light-pickup.md`
**Expected behavior:**
1. Skill reads the story file and extracts all key fields
2. Skill reads the GDD requirement fresh from `tr-registry.yaml` (not from story's quoted text)
3. Skill reads the referenced ADR to understand implementation constraints
4. Skill evaluates each acceptance criterion (auto where possible, manual prompt where not)
5. Skill checks for GDD requirement deviations
6. Skill checks for ADR guideline deviations
7. Skill prompts user: "Please provide the code review outcome for this story"
8. Skill presents COMPLETE verdict
9. Skill asks "May I update story Status to Complete and add Completion Notes?"
10. If yes: skill updates the story file
11. Skill surfaces the next `Ready for Dev` story from the sprint
**Assertions:**
- [ ] Skill reads `docs/architecture/tr-registry.yaml` for TR-ID requirement text (not just story)
- [ ] Skill reads the referenced ADR file (not just the story reference)
- [ ] Each acceptance criterion is listed with VERIFIED / DEFERRED / FAILED status
- [ ] Skill prompts the user for code review outcome (does not skip this step)
- [ ] Verdict is COMPLETE when all criteria are verified and no deviations exist
- [ ] Skill asks "May I write" before updating the story file
- [ ] Skill does NOT auto-update story status without user confirmation
- [ ] After completion, skill surfaces the next ready story from `production/sprints/`
---
### Case 2: Deferred Path — Acceptance criterion cannot be auto-verified
**Fixture:**
- Story file has an acceptance criterion: "Player sees correct animation on pickup"
- No automated test for this criterion exists
- Manual verification has not been performed
- All other criteria are met
**Input:** `/story-done production/epics/core/story-light-pickup.md`
**Expected behavior:**
1. Skill processes all acceptance criteria
2. Reaches the animation criterion — cannot auto-verify
3. Skill asks the user: "Acceptance criterion 'Player sees correct animation on
pickup' cannot be auto-verified. Has this been manually tested?"
4. If user says No: criterion is marked DEFERRED, verdict becomes COMPLETE WITH NOTES
5. Skill records the deferred criterion in completion notes
6. Asks "May I write updated story with deferred criterion noted?"
**Assertions:**
- [ ] Skill asks the user about unverifiable criteria rather than assuming PASS
- [ ] Deferred criteria result in COMPLETE WITH NOTES (not COMPLETE or BLOCKED)
- [ ] The deferred criterion is explicitly named in the completion notes
- [ ] Skill still asks "May I write" before updating the story file
---
### Case 3: Blocked Path — GDD deviation detected
**Fixture:**
- Story TR-ID points to requirement: "Player can carry max 3 light sources"
- Implementation in `src/` uses a variable `MAX_CARRIED_LIGHTS = 5`
- This is a deliberate deviation from the GDD
**Input:** `/story-done production/epics/core/story-light-pickup.md`
**Expected behavior:**
1. Skill reads the GDD requirement text (max 3)
2. Skill detects discrepancy between requirement and implementation value (5)
3. Skill flags this as a GDD deviation and asks the user to classify it:
   - INTENTIONAL: document the deviation and reason
   - ERROR: implementation must be fixed before story can be marked Complete
   - OUT OF SCOPE: requirement changed and GDD needs updating
4. If INTENTIONAL: skill records deviation in completion notes, verdict is COMPLETE WITH NOTES
5. If ERROR: verdict is BLOCKED until implementation is corrected
**Assertions:**
- [ ] Skill detects the mismatch between GDD requirement and implementation value
- [ ] Skill asks the user to classify the deviation (does not assume either way)
- [ ] INTENTIONAL deviation → COMPLETE WITH NOTES (not BLOCKED)
- [ ] ERROR deviation → BLOCKED verdict until fixed
- [ ] Detected deviations are recorded in completion notes or tech debt register
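
The verdict rules spread across Cases 1 to 3 can be condensed into one function. This is a sketch under stated assumptions: the spec gives the outcomes for all-verified, DEFERRED, INTENTIONAL, and ERROR, while the handling of a FAILED criterion and of an OUT OF SCOPE deviation is guessed here, and the function name is hypothetical.

```python
def story_verdict(criteria: list[str], deviations: list[str]) -> str:
    """criteria: VERIFIED / DEFERRED / FAILED; deviations: INTENTIONAL / ERROR / OUT OF SCOPE."""
    if "ERROR" in deviations:
        return "BLOCKED"  # implementation must be corrected first
    if "FAILED" in criteria:
        return "BLOCKED"  # assumption: a failed criterion blocks completion
    if "DEFERRED" in criteria or deviations:
        # Assumption: OUT OF SCOPE is recorded as a note, like INTENTIONAL.
        return "COMPLETE WITH NOTES"
    return "COMPLETE"


print(story_verdict(["VERIFIED", "VERIFIED", "VERIFIED"], []))
print(story_verdict(["VERIFIED", "DEFERRED"], []))
print(story_verdict(["VERIFIED"], ["INTENTIONAL"]))
print(story_verdict(["VERIFIED"], ["ERROR"]))
```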
---
### Case 4: Edge Case — No argument, auto-detect current story
**Fixture:**
- `production/session-state/active.md` contains a reference to
`production/epics/core/story-oxygen-drain.md` as the active story
- That story file exists with `Status: In Progress`
**Input:** `/story-done` (no argument)
**Expected behavior:**
1. Skill reads `production/session-state/active.md`
2. Skill finds the active story reference
3. Skill reads that story file and proceeds normally
4. Output confirms which story was auto-detected
**Assertions:**
- [ ] Skill reads `production/session-state/active.md` when no argument is given
- [ ] Skill identifies and confirms the auto-detected story before proceeding
- [ ] If no story is found in session state, skill asks the user to provide a path
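
A sketch of the fallback order this case describes: use the argument if one was given, otherwise look for a story path in the session state, otherwise return nothing so the caller can ask the user. The layout of `production/session-state/active.md` is not specified here, so the sketch simply searches for the first `production/epics/...md` path; the function name and sample text are hypothetical.

```python
import re
from typing import Optional

# Assumption: the active story appears somewhere in the file as a plain path.
STORY_PATH = re.compile(r"production/epics/\S+?\.md")


def resolve_story(argument: Optional[str], session_state: str) -> Optional[str]:
    """Return the story path to process, or None if the user must supply one."""
    if argument:
        return argument
    match = STORY_PATH.search(session_state)
    return match.group(0) if match else None


STATE = "Active story: `production/epics/core/story-oxygen-drain.md` (In Progress)"
print(resolve_story(None, STATE))
print(resolve_story(None, "nothing active"))
```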
---
## Protocol Compliance
- [ ] Uses "May I write" before updating the story file
- [ ] Uses "May I write" before adding entries to `docs/tech-debt-register.md`
- [ ] Presents complete findings (criteria check, deviation check) before asking approval
- [ ] Ends by surfacing the next ready story from the sprint plan
- [ ] Does not mark a story Complete if any criterion is FAILED or any deviation is classified as ERROR
- [ ] Does not skip the code review prompt
---
## Coverage Notes
- The full 8-phase flow of the skill is exercised across Cases 1-3; not all
edge cases within each phase are covered.
- Tech debt logging (deferred items written to `docs/tech-debt-register.md`)
is mentioned in Case 2 but not the primary assertion focus; dedicated
coverage deferred.
- The `sprint-status.yaml` update (Phase 7 in the skill) is implied by Case 1
but not the primary assertion; assumed to follow the same "May I write" pattern.
- Stories with multiple TR-IDs or multiple ADRs are not explicitly tested.


@@ -0,0 +1,153 @@
# Skill Test Spec: /story-readiness
## Skill Summary
`/story-readiness` validates that a story file is ready for a developer to
pick up and implement. It checks four dimensions: Design (embedded GDD
requirements), Architecture (ADR references and status), Scope (clear
boundaries and DoD), and Definition of Done (testable criteria). It produces
a READY / NEEDS WORK / BLOCKED verdict. It is a read-only skill and runs
before any developer picks up a story.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings or numbered check sections
- [ ] Contains verdict keywords: READY, NEEDS WORK, BLOCKED
- [ ] Does NOT require "May I write" language (read-only skill)
- [ ] Has a next-step handoff (what to do after verdict)
---
## Test Cases
### Case 1: Happy Path — Fully ready story
**Fixture:**
- Story file exists at `production/epics/core/story-light-pickup.md`
- Story contains:
  - `TR-ID: TR-light-001` (GDD requirement reference)
  - `ADR: docs/architecture/adr-003-inventory.md`
- Referenced ADR exists and has status `Accepted`
- Referenced TR-ID exists in `docs/architecture/tr-registry.yaml`
- Story has `## Acceptance Criteria` with ≥3 testable items
- Story has `## Definition of Done` section
- Story has `Status: Ready for Dev`
- Manifest version in story header matches current `docs/architecture/control-manifest.md`
**Input:** `/story-readiness production/epics/core/story-light-pickup.md`
**Expected behavior:**
1. Skill reads the story file
2. Skill reads the referenced ADR — verifies status is `Accepted`
3. Skill reads `docs/architecture/tr-registry.yaml` — verifies TR-ID exists
4. Skill reads `docs/architecture/control-manifest.md` — verifies manifest version matches
5. Skill evaluates all 4 dimensions (Design, Architecture, Scope, DoD)
6. Skill outputs READY verdict with all checks passing
**Assertions:**
- [ ] Skill reads the referenced ADR file (not just the story)
- [ ] Skill verifies ADR status is `Accepted` (not `Proposed`)
- [ ] Skill reads `tr-registry.yaml` to verify TR-ID exists
- [ ] Output includes check results for all 4 dimensions
- [ ] Verdict is READY when all checks pass
- [ ] Skill does not write any files
---
### Case 2: Blocked Path — Referenced ADR is Proposed (not Accepted)
**Fixture:**
- Story file exists at `production/epics/core/story-light-system.md` with `ADR: docs/architecture/adr-005-light-system.md`
- `adr-005-light-system.md` exists but has `Status: Proposed`
- All other story content is otherwise complete
**Input:** `/story-readiness production/epics/core/story-light-system.md`
**Expected behavior:**
1. Skill reads the story
2. Skill reads `adr-005-light-system.md` — finds `Status: Proposed`
3. Skill flags this as a BLOCKING issue (cannot implement against unaccepted ADR)
4. Skill outputs BLOCKED verdict
5. Skill recommends: accept or reject the ADR before picking up the story
**Assertions:**
- [ ] Verdict is BLOCKED (not NEEDS WORK or READY) when ADR is Proposed
- [ ] Output explicitly names the Proposed ADR as the blocker
- [ ] Output recommends resolving ADR status before proceeding
- [ ] Skill never outputs READY while the ADR is Proposed, even if every other check passes
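
The ADR gate in this case can be illustrated with a short Python sketch. It assumes the ADR carries a line beginning with `Status:` as in the fixture. The helper names are hypothetical, and treating a missing status line as blocking is an assumption that the spec does not state.

```python
import re
from typing import Optional


def read_adr_status(adr_text: str) -> Optional[str]:
    """Extract the first word after a line-leading 'Status:' marker, if any."""
    match = re.search(r"^Status:\s*(\S+)", adr_text, flags=re.MULTILINE)
    return match.group(1) if match else None


def adr_blocker(adr_path: str, adr_text: str) -> Optional[str]:
    """Return a blocker message unless the ADR status is exactly 'Accepted'."""
    status = read_adr_status(adr_text)
    if status == "Accepted":
        return None
    shown = status if status is not None else "missing"  # assumption: no status also blocks
    return f"BLOCKED: {adr_path} has status {shown}; resolve the ADR before picking up the story"
```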
---
### Case 3: Needs Work — Missing Acceptance Criteria
**Fixture:**
- Story file exists at `production/epics/core/story-oxygen-drain.md` but has no `## Acceptance Criteria` section
- ADR reference exists and is `Accepted`
- TR-ID exists in registry
- Manifest version matches
**Input:** `/story-readiness production/epics/core/story-oxygen-drain.md`
**Expected behavior:**
1. Skill reads the story
2. Skill finds no Acceptance Criteria section
3. Skill flags this as a NEEDS WORK issue (story is incomplete, not blocked)
4. Skill outputs NEEDS WORK verdict
5. Skill names the missing section and suggests adding measurable criteria
**Assertions:**
- [ ] Verdict is NEEDS WORK (not BLOCKED or READY) when Acceptance Criteria section is absent
- [ ] Output identifies the missing Acceptance Criteria section specifically
- [ ] Output suggests adding testable/measurable criteria
- [ ] Skill distinguishes NEEDS WORK (fixable without external dependencies) from BLOCKED (requires outside action)
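
One way to picture the three verdict levels is as a worst-finding-wins aggregation. The sketch below is illustrative only: the `Severity` enum and `overall_verdict` function are hypothetical, and the rule that BLOCKED outranks NEEDS WORK when both occur is an assumption rather than something this spec asserts.

```python
from enum import IntEnum


class Severity(IntEnum):
    PASS = 0
    NEEDS_WORK = 1  # fixable inside the story itself
    BLOCKED = 2     # requires action outside the story (e.g. an ADR decision)


VERDICT_LABELS = {
    Severity.PASS: "READY",
    Severity.NEEDS_WORK: "NEEDS WORK",
    Severity.BLOCKED: "BLOCKED",
}


def overall_verdict(findings: list[tuple[str, Severity]]) -> str:
    """Map (check name, severity) findings to a verdict; the worst severity wins."""
    worst = max((severity for _, severity in findings), default=Severity.PASS)
    return VERDICT_LABELS[worst]
```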
---
### Case 4: Edge Case — Stale manifest version
**Fixture:**
- Story file `production/epics/core/story-mirror-rotation.md` has `Manifest Version: 2026-01-15` in its header
- `docs/architecture/control-manifest.md` has `Manifest Version: 2026-03-10`
- Versions do not match (story was created before manifest was updated)
**Input:** `/story-readiness production/epics/core/story-mirror-rotation.md`
**Expected behavior:**
1. Skill reads the story and extracts manifest version `2026-01-15`
2. Skill reads control manifest header and extracts current version `2026-03-10`
3. Skill detects version mismatch
4. Skill flags this as an ADVISORY issue (not blocking, but worth noting)
5. Verdict is NEEDS WORK with manifest staleness noted
**Assertions:**
- [ ] Skill reads `docs/architecture/control-manifest.md` to get current version
- [ ] Skill compares story's embedded manifest version against current manifest version
- [ ] Stale manifest version results in NEEDS WORK (not BLOCKED, not READY)
- [ ] Output explains that the story's embedded guidance may be outdated
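
The version comparison in this case can be sketched as follows. The sketch assumes both files contain a line of the form `Manifest Version: YYYY-MM-DD`, as in the fixture, and it compares for equality only, since the spec speaks of versions matching rather than of ordering. The function names, and the message returned when a version cannot be read, are hypothetical.

```python
import re
from typing import Optional

MANIFEST_RE = re.compile(r"^Manifest Version:\s*(\d{4}-\d{2}-\d{2})\s*$", re.MULTILINE)


def manifest_version(text: str) -> Optional[str]:
    """Return the first 'Manifest Version' date string found, or None."""
    match = MANIFEST_RE.search(text)
    return match.group(1) if match else None


def manifest_finding(story_text: str, manifest_text: str) -> Optional[str]:
    """Return an advisory note when the story's version differs from the current one."""
    story_v = manifest_version(story_text)
    current_v = manifest_version(manifest_text)
    if story_v is None or current_v is None:
        return "manifest version could not be read"  # assumption: surfaced as a finding
    if story_v != current_v:
        return (f"story was written against manifest {story_v}, current is {current_v}; "
                "embedded guidance may be outdated")
    return None
```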
---
## Protocol Compliance
- [ ] Does NOT use Write or Edit tools (read-only skill)
- [ ] Presents complete check results before verdict
- [ ] Does not ask for approval (no file writes)
- [ ] Ends with recommended next step (fix issues or proceed to implementation)
- [ ] Distinguishes three verdict levels clearly (READY vs NEEDS WORK vs BLOCKED)
---
## Coverage Notes
- The case where a TR-ID is missing from the registry entirely is not explicitly
tested here; it is assumed to follow the same NEEDS WORK pattern as Case 3.
- The "no argument" path (skill auto-detecting the current story) is not
tested because it depends on `production/session-state/active.md` content,
which is hard to fixture reliably.
- Stories with multiple ADR references are not tested; behavior is assumed to
be additive (all ADRs must be Accepted for READY verdict).