diff --git a/.claude/docs/templates/skill-test-spec.md b/.claude/docs/templates/skill-test-spec.md new file mode 100644 index 0000000..c39f703 --- /dev/null +++ b/.claude/docs/templates/skill-test-spec.md @@ -0,0 +1,96 @@ +# Skill Test Spec: /[skill-name] + +## Skill Summary + +[One paragraph: what this skill does, when to use it, what it produces. Include +the primary output artifact, the verdict format it uses, and which pipeline stage +it belongs to.] + +--- + +## Static Assertions (Structural) + +Verified automatically by `/skill-test static` — no fixture needed. + +- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools` +- [ ] Has ≥2 phase headings (## Phase N or numbered ## sections) +- [ ] Contains verdict keywords: [list the ones expected, e.g., PASS, FAIL, CONCERNS] +- [ ] Contains "May I write" collaborative protocol language (if skill writes files) +- [ ] Has a next-step handoff at the end + +--- + +## Test Cases + +### Case 1: Happy Path — [short description] + +**Fixture:** [Describe the assumed project state. Which files exist? What do they +contain? E.g., "game-concept.md exists with all 8 required sections complete. +systems-index.md exists. All MVP GDDs are present and individually reviewed."] + +**Input:** `/[skill-name] [args]` + +**Expected behavior:** +1. [Phase 1 action — what the skill should read or check] +2. [Phase 2 action — what the skill should evaluate] +3. [Phase N action — what the skill should output] + +**Assertions:** +- [ ] Skill reads [specific file] before producing output +- [ ] Output includes verdict keyword [PASS/FAIL/etc.] +- [ ] Output lists [specific content] from the fixture +- [ ] Skill asks for approval before writing any file + +--- + +### Case 2: Failure Path — [short description, e.g., "Missing required artifact"] + +**Fixture:** [Describe the failure state. E.g., "game-concept.md is missing. 
+No files exist in design/gdd/."] + +**Input:** `/[skill-name] [args]` + +**Expected behavior:** +1. [Phase 1: skill detects missing file] +2. [Phase 2: skill surfaces the gap rather than assuming OK] +3. [Output: FAIL or BLOCKED verdict with specific blocker named] + +**Assertions:** +- [ ] Skill does NOT output PASS when the fixture is incomplete +- [ ] Skill names the specific missing artifact +- [ ] Skill suggests a remediation action (e.g., "Run /[other-skill]") +- [ ] Skill does not create files to fill in the gap without asking + +--- + +### Case 3: Edge Case — [short description, e.g., "No argument provided"] + +**Fixture:** [State of project files for this case] + +**Input:** `/[skill-name]` (no argument) + +**Expected behavior:** +1. [What the skill should do when invoked without arguments] + +**Assertions:** +- [ ] [assertion] + +--- + +## Protocol Compliance + +- [ ] Uses "May I write" before all file writes +- [ ] Presents findings or report before asking for write approval +- [ ] Ends with a recommended next step or follow-up skill +- [ ] Never auto-creates files without explicit user approval +- [ ] Does not skip phases or jump straight to a verdict without checking + +--- + +## Coverage Notes + +[Document what is intentionally NOT tested in this spec and why. Examples: +- "Case 3 (all-mode) is not covered because it runs too many checks to evaluate + in a single spec — test each sub-mode individually." +- "The database integration path is not covered as it requires a live environment." +- "Edge cases involving corrupted YAML files are deferred to a future spec."] diff --git a/.claude/hooks/validate-skill-change.sh b/.claude/hooks/validate-skill-change.sh new file mode 100644 index 0000000..23d657d --- /dev/null +++ b/.claude/hooks/validate-skill-change.sh @@ -0,0 +1,39 @@ +#!/bin/bash +# Claude Code PostToolUse hook: Advises running skill-test after skill file changes +# Fires when any file inside .claude/skills/ is written or edited. 
+# +# Exit behavior: +# exit 0 = advisory only (non-blocking) +# +# Input schema (PostToolUse for Write|Edit): +# { "tool_name": "Write", "tool_input": { "file_path": "...", "content": "..." } } + +INPUT=$(cat) + +# Parse file path -- use jq if available, fall back to grep +if command -v jq >/dev/null 2>&1; then + FILE_PATH=$(echo "$INPUT" | jq -r '.tool_input.file_path // empty') +else + FILE_PATH=$(echo "$INPUT" | grep -oE '"file_path"[[:space:]]*:[[:space:]]*"[^"]*"' | sed 's/"file_path"[[:space:]]*:[[:space:]]*"//;s/"$//') +fi + +# Normalize path separators (Windows backslash to forward slash) +FILE_PATH=$(echo "$FILE_PATH" | sed 's|\\|/|g') + +# Only act on files inside .claude/skills/ +if ! echo "$FILE_PATH" | grep -qE '(^|/)\.claude/skills/'; then + exit 0 +fi + +# Extract skill name from path (.claude/skills/[skill-name]/SKILL.md) +SKILL_NAME=$(echo "$FILE_PATH" | grep -oE '\.claude/skills/[^/]+' | sed 's|\.claude/skills/||') + +if [ -z "$SKILL_NAME" ]; then + exit 0 +fi + +echo "=== Skill Modified: $SKILL_NAME ===" >&2 +echo "Run /skill-test static $SKILL_NAME to validate structural compliance." >&2 +echo "====================================" >&2 + +exit 0 diff --git a/.claude/settings.json b/.claude/settings.json index 4c6688f..68bdfa4 100644 --- a/.claude/settings.json +++ b/.claude/settings.json @@ -74,6 +74,11 @@ "type": "command", "command": "bash .claude/hooks/validate-assets.sh", "timeout": 10 + }, + { + "type": "command", + "command": "bash .claude/hooks/validate-skill-change.sh", + "timeout": 5 } ] } diff --git a/.claude/skills/skill-test/SKILL.md b/.claude/skills/skill-test/SKILL.md new file mode 100644 index 0000000..939e16e --- /dev/null +++ b/.claude/skills/skill-test/SKILL.md @@ -0,0 +1,290 @@ +--- +name: skill-test +description: "Validate skill files for structural compliance and behavioral correctness. Three modes: static (linter), spec (behavioral), audit (coverage report)." 
+argument-hint: "static [skill-name | all] | spec [skill-name] | audit" +user-invocable: true +allowed-tools: Read, Glob, Grep, Write +context: fork +--- + +# Skill Test + +Validates `.claude/skills/*/SKILL.md` files for structural compliance and +behavioral correctness. No external dependencies — runs entirely within the +existing skill/hook/template architecture. + +**Three modes:** + +| Mode | Command | Purpose | Token Cost | +|------|---------|---------|------------| +| `static` | `/skill-test static [name\|all]` | Structural linter — 7 compliance checks per skill | Low (~1k/skill) | +| `spec` | `/skill-test spec [name]` | Behavioral verifier — evaluates assertions in test spec | Medium (~5k/skill) | +| `audit` | `/skill-test audit` | Coverage report — which skills have specs, last test dates | Low (~2k total) | + +--- + +## Phase 1: Parse Arguments + +Determine mode from the first argument: + +- `static [name]` → run 7 structural checks on one skill +- `static all` → run 7 structural checks on all skills (Glob `.claude/skills/*/SKILL.md`) +- `spec [name]` → read skill + test spec, evaluate assertions +- `audit` (or no argument) → read catalog, list all skills, show coverage + +If the argument is unrecognized, output usage and stop. + +--- + +## Phase 2A: Static Mode — Structural Linter + +For each skill being tested, read its `SKILL.md` fully and run all 7 checks: + +### Check 1 — Required Frontmatter Fields +The file must contain all of these in the YAML frontmatter block: +- `name:` +- `description:` +- `argument-hint:` +- `user-invocable:` +- `allowed-tools:` + +**FAIL** if any are absent. + +### Check 2 — Multiple Phases +The skill must have ≥2 phase headings. Look for patterns like: +- `## Phase N` or `## Phase N:` +- `## N.` (numbered top-level sections) +- At least 2 distinct `##` headings if phases aren't explicitly numbered + +**FAIL** if fewer than 2 phase-like headings are found. 
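Reviewer sketch (not part of the diff): Checks 1 and 2 above are mechanical enough to approximate in shell. The skill itself performs them through Claude's Read/Grep tools, so the function names below (`frontmatter`, `check_frontmatter`, `count_phase_headings`) are hypothetical, and the sketch assumes the frontmatter is delimited by bare `---` lines and that phases use the `## Phase N` or `## N.` heading forms.

```bash
#!/bin/bash
# Hypothetical helpers approximating Checks 1 and 2; not part of the skill.

# Print the YAML frontmatter: the lines between the first pair of '---' lines.
frontmatter() {
  awk 'NR == 1 && $0 != "---" { exit }
       NR > 1 && $0 == "---"  { exit }
       NR > 1                 { print }' "$1"
}

# Check 1: return 0 if all five required fields are present, 1 otherwise.
check_frontmatter() {
  local field missing=0
  for field in name description argument-hint user-invocable allowed-tools; do
    if ! frontmatter "$1" | grep -qE "^${field}:"; then
      echo "missing: ${field}" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Check 2: count headings of the form "## Phase N..." or "## N."
count_phase_headings() {
  grep -cE '^## (Phase [0-9]+|[0-9]+\.)' "$1"
}
```

The sketch ignores the third pattern in Check 2 (any two distinct `##` headings), which needs judgment about whether a heading is phase-like.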
+ +### Check 3 — Verdict Keywords +The skill must contain at least one of: `PASS`, `FAIL`, `CONCERNS`, `APPROVED`, +`BLOCKED`, `COMPLETE`, `READY`, `COMPLIANT`, `NON-COMPLIANT` + +**FAIL** if none are present. + +### Check 4 — Collaborative Protocol Language +The skill must contain ask-before-write language. Look for: +- `"May I write"` (canonical form) +- `"before writing"` or `"approval"` near file-write instructions +- `"ask"` + `"write"` in close proximity (within same section) + +**WARN** if absent (some read-only skills legitimately skip this). +**FAIL** if `allowed-tools` includes `Write` or `Edit` but no ask-before-write language is found. + +### Check 5 — Next-Step Handoff +The skill must end with a recommended next action or follow-up path. Look for: +- A final section mentioning another skill (e.g., `/story-done`, `/gate-check`) +- "Recommended next" or "next step" phrasing +- A "Follow-Up" or "After this" section + +**WARN** if absent. + +### Check 6 — Fork Context Complexity +If frontmatter contains `context: fork`, the skill should have ≥5 phase headings +(`##` level or numbered Phase N headers). Fork context is for complex multi-phase +skills; simple skills should not use it. + +**WARN** if `context: fork` is set but fewer than 5 phases found. + +### Check 7 — Argument Hint Plausibility +`argument-hint` must be non-empty. If the skill body mentions multiple modes +(e.g., "Mode A | Mode B"), the hint should reflect them. Cross-reference the +hint against the first phase's "Parse Arguments" section. + +**WARN** if hint is `""` or if documented modes don't match hint. 
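Reviewer sketch (not part of the diff): Check 4 above has two outcomes when ask-before-write language is absent, and which one applies depends on `allowed-tools`. A minimal approximation, assuming only the canonical phrase and the "before writing" variant are searched for (the proximity heuristics are omitted), could look like this. The function name `check_collab_protocol` is hypothetical.

```bash
#!/bin/bash
# Hypothetical approximation of Check 4; prints PASS, WARN, or FAIL.
check_collab_protocol() {
  local file="$1"
  if grep -qiE 'May I write|before writing' "$file"; then
    echo PASS   # ask-before-write language is present
  elif grep -E '^allowed-tools:' "$file" | grep -qwE 'Write|Edit'; then
    echo FAIL   # the skill may write files but never asks first
  else
    echo WARN   # read-only skill; absence is tolerated
  fi
}
```

Evaluating the FAIL condition before falling back to WARN keeps the stricter outcome from being masked for skills that can write files.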
+ +--- + +### Static Mode Output Format + +For a single skill: +``` +=== Skill Static Check: /[name] === + +Check 1 — Frontmatter Fields: PASS +Check 2 — Multiple Phases: PASS (7 phases found) +Check 3 — Verdict Keywords: PASS (PASS, FAIL, CONCERNS) +Check 4 — Collaborative Protocol: PASS ("May I write" found) +Check 5 — Next-Step Handoff: WARN (no follow-up section found) +Check 6 — Fork Context Complexity: PASS (8 phases, context: fork set) +Check 7 — Argument Hint: PASS + +Verdict: WARNINGS (1 warning, 0 failures) +Recommended: Add a "Follow-Up Actions" section at the end of the skill. +``` + +For `static all`, produce a summary table then list any non-compliant skills: +``` +=== Skill Static Check: All 52 Skills === + +Skill | Result | Issues +-----------------------|--------------|------- +gate-check | COMPLIANT | +design-review | COMPLIANT | +story-readiness | WARNINGS | Check 5: no handoff +... + +Summary: 48 COMPLIANT, 3 WARNINGS, 1 NON-COMPLIANT +Aggregate Verdict: N WARNINGS / N FAILURES +``` + +--- + +## Phase 2B: Spec Mode — Behavioral Verifier + +### Step 1 — Locate Files + +Find skill at `.claude/skills/[name]/SKILL.md`. +Find spec at `tests/skills/[name].md`. + +If either is missing: +- Missing skill: "Skill '[name]' not found in `.claude/skills/`." +- Missing spec: "No test spec found for '[name]'. Run `/skill-test audit` to see + coverage gaps, or create a spec using the template at + `.claude/docs/templates/skill-test-spec.md`." + +### Step 2 — Read Both Files + +Read the skill file and test spec file completely. + +### Step 3 — Evaluate Assertions + +For each **Test Case** in the spec: + +1. Read the **Fixture** description (assumed state of project files) +2. Read the **Expected behavior** steps +3. Read each **Assertion** checkbox + +For each assertion, evaluate whether the skill's written instructions, if +followed correctly given the fixture state, would satisfy it. This is a +Claude-evaluated reasoning check, not code execution. 
+ +Mark each assertion: +- **PASS** — skill instructions clearly satisfy this assertion +- **PARTIAL** — skill instructions partially address it, but with ambiguity +- **FAIL** — skill instructions would NOT satisfy this assertion given the fixture + +For **Protocol Compliance** assertions (always present): +- Check whether the skill requires "May I write" before file writes +- Check whether the skill presents findings before requesting approval +- Check whether the skill ends with a recommended next step +- Check whether the skill avoids auto-creating files without approval + +### Step 4 — Build Report + +``` +=== Skill Spec Test: /[name] === +Date: [date] +Spec: tests/skills/[name].md + +Case 1: [Happy Path — name] + Fixture: [summary] + Assertions: + [PASS] [assertion text] + [FAIL] [assertion text] + Reason: The skill's Phase 3 says "..." but the fixture state means "..." + Case Verdict: FAIL + +Case 2: [Edge Case — name] + ... + Case Verdict: PASS + +Protocol Compliance: + [PASS] Uses "May I write" before file writes + [PASS] Presents findings before asking approval + [WARN] No explicit next-step handoff at end + +Overall Verdict: FAIL (1 case failed, 1 warning) +``` + +### Step 5 — Offer to Write Results + +"May I write these results to `tests/results/skill-test-spec-[name]-[date].md` +and update `tests/skills/catalog.yaml`?" + +If yes: +- Write results file to `tests/results/` +- Update the skill's entry in `tests/skills/catalog.yaml`: + - `last_spec: [date]` + - `last_spec_result: PASS|PARTIAL|FAIL` + +--- + +## Phase 2C: Audit Mode — Coverage Report + +### Step 1 — Read Catalog + +Read `tests/skills/catalog.yaml`. If missing, note that catalog doesn't exist +yet (first-run state). + +### Step 2 — Enumerate All Skills + +Glob `.claude/skills/*/SKILL.md` to get the complete list of skills. +Extract skill name from each path (directory name). 
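Reviewer sketch (not part of the diff): the enumeration in Step 2 above amounts to globbing for `SKILL.md` files and taking each parent directory name. The skill does this with the Glob tool; the shell equivalent below is only an illustration, and `skill_name_from_path` and `list_skills` are hypothetical names.

```bash
#!/bin/bash
# Hypothetical helpers mirroring audit Step 2; not part of the skill.

# The skill name is the directory that contains SKILL.md.
skill_name_from_path() {
  basename "$(dirname "$1")"
}

# Print one skill name per line for every <root>/<name>/SKILL.md.
list_skills() {
  local root="${1:-.claude/skills}" path
  for path in "$root"/*/SKILL.md; do
    [ -e "$path" ] || continue   # the glob matched nothing
    skill_name_from_path "$path"
  done
}
```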
+ +### Step 3 — Build Coverage Table + +For each skill: +- Check if a spec file exists at `tests/skills/[name].md` +- Look up `last_static`, `last_static_result`, `last_spec`, `last_spec_result` + from catalog (or mark as "never" if not in catalog) +- Assign priority: + - `critical` — gate-check, design-review, story-readiness, story-done, review-all-gdds, architecture-review + - `high` — create-epics-stories, create-control-manifest, propagate-design-change, architecture-decision, map-systems, design-system + - `medium` — team-* skills, sprint-plan, sprint-status + - `low` — all others + +### Step 4 — Output Report + +``` +=== Skill Test Coverage Audit === +Date: [date] +Total skills: 52 +Specs written: 4 (7.7%) +Never tested (static): 52 + +Coverage Table: +Skill | Has Spec | Last Static | Static Result | Last Spec | Spec Result | Priority +-----------------------|----------|------------------|---------------|------------------|-------------|---------- +gate-check | YES | never | — | never | — | critical +design-review | YES | never | — | never | — | critical +story-readiness | YES | never | — | never | — | critical +story-done | YES | never | — | never | — | critical +architecture-review | NO | never | — | never | — | critical +review-all-gdds | NO | never | — | never | — | critical +... + +Top 5 Priority Gaps (no spec, critical/high priority): +1. /architecture-review — critical, no spec +2. /review-all-gdds — critical, no spec +3. /create-epics-stories — high, no spec +4. /propagate-design-change — high, no spec +5. /create-control-manifest — high, no spec + +Coverage: 4/52 specs (7.7%) +``` + +No file writes in audit mode. + +Offer: "Would you like to run `/skill-test static all` to check structural +compliance across all skills? Or `/skill-test spec [name]` to run a specific +behavioral test?" 
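Reviewer sketch (not part of the diff): the coverage figure in the audit report above is the number of skills with a spec file divided by the total skill count, shown to one decimal place. The helpers below are hypothetical (`count_specs`, `coverage_line`) and assume specs live at `<specs_root>/<name>.md`, as the audit steps describe.

```bash
#!/bin/bash
# Hypothetical helpers for the audit coverage line; not part of the skill.

# Count skills under skills_root that have a matching spec in specs_root.
count_specs() {
  local skills_root="$1" specs_root="$2" dir n=0
  for dir in "$skills_root"/*/; do
    [ -f "${dir}SKILL.md" ] || continue
    if [ -f "$specs_root/$(basename "$dir").md" ]; then
      n=$((n + 1))
    fi
  done
  echo "$n"
}

# Format the coverage summary; guards against an empty skill list.
coverage_line() {
  awk -v s="$1" -v t="$2" 'BEGIN {
    if (t + 0 == 0) { print "Coverage: 0/0 specs (n/a)"; exit }
    printf "Coverage: %d/%d specs (%.1f%%)\n", s, t, 100 * s / t
  }'
}
```

With the sample numbers from the report, `coverage_line 4 52` prints `Coverage: 4/52 specs (7.7%)`.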
+ +--- + +## Phase 3: Recommended Next Steps + +After any mode completes, offer contextual follow-up: + +- After `static [name]`: "Run `/skill-test spec [name]` to validate behavioral + correctness if a test spec exists." +- After `static all` with failures: "Address NON-COMPLIANT skills first. Run + `/skill-test static [name]` individually for detailed remediation guidance." +- After `spec [name]` PASS: "Update `tests/skills/catalog.yaml` to record this + pass date. Consider running `/skill-test audit` to find the next spec gap." +- After `spec [name]` FAIL: "Review the failing assertions and update the skill + or the test spec to resolve the mismatch." +- After `audit`: "Start with the critical-priority gaps. Use the spec template + at `.claude/docs/templates/skill-test-spec.md` to create new specs." diff --git a/tests/skills/_fixtures/incomplete-gdd.md b/tests/skills/_fixtures/incomplete-gdd.md new file mode 100644 index 0000000..4764657 --- /dev/null +++ b/tests/skills/_fixtures/incomplete-gdd.md @@ -0,0 +1,51 @@ +# GDD: Light Manipulation System + +## Overview + +The light manipulation system allows players to interact with bioluminescent +organisms and ancient light conduits to redirect beams of light. Light beams +illuminate dark areas, power ancient mechanisms, and reveal hidden surfaces. + +## Player Fantasy + +The player should feel like a puzzle archaeologist — discovering the logic of +an alien but internally consistent technology. The "aha" moment when a complex +light path clicks into place should feel earned and satisfying. 
+ +## Detailed Rules + +- Players can pick up portable light sources (max 3 carried at once) +- Stationary conduits redirect beams at fixed angles (45°/90°/135°/180°) +- Light beams are blocked by solid terrain and most objects +- Living bioluminescent organisms pulse light on a 3-second cycle +- Ancient mirrors rotate freely and redirect any light beam that touches them +- A beam must reach a receptor to activate a mechanism + +## Formulas + +[SECTION MISSING — not yet authored] + +## Edge Cases + +[SECTION MISSING — not yet authored] + +## Dependencies + +- **Oxygen System**: Light sources consume no oxygen but picking them up takes + time (opportunity cost with oxygen drain) +- **Cave Navigation**: Illuminated paths reveal branching routes not visible + in darkness +- Player Inventory System (not yet designed) + +## Tuning Knobs + +[SECTION MISSING — not yet authored] + +## Acceptance Criteria + +[SECTION MISSING — not yet authored] + +--- + +*Status: Draft — 4/8 required sections populated* +*Last updated: 2026-03-13* diff --git a/tests/skills/_fixtures/minimal-game-concept.md b/tests/skills/_fixtures/minimal-game-concept.md new file mode 100644 index 0000000..ea86346 --- /dev/null +++ b/tests/skills/_fixtures/minimal-game-concept.md @@ -0,0 +1,62 @@ +# Game Concept: Echoes of the Deep + +## Overview + +Echoes of the Deep is a single-player atmospheric puzzle-platformer set in +a bioluminescent underwater cave network. Players control a deep-sea diver +exploring ancient ruins while managing oxygen supplies and manipulating light +sources to reveal hidden paths and solve environmental puzzles. + +## Player Fantasy + +The player should feel like a lone explorer uncovering a lost civilization, +experiencing wonder at beautiful environments, and the satisfying "aha" moment +when a clever puzzle clicks into place. The oxygen mechanic creates gentle +pressure without punishing failure harshly. + +## Core Loop + +1. 
**Explore** — navigate branching cave sections using light and movement +2. **Discover** — find oxygen caches, light sources, and ancient mechanisms +3. **Solve** — manipulate light and environment to unlock new areas +4. **Progress** — unlock deeper cave sections with escalating complexity + +## Game Pillars + +1. **Wonder** — every area should contain something visually or mechanically surprising +2. **Accessibility** — the game should be completable without frustration; oxygen + manages pacing, not punishment +3. **Environmental Storytelling** — the ruins tell a story without text exposition + +## Target Audience + +Casual-to-midcore players who enjoy relaxed exploration games (Subnautica, +Journey, ABZÛ) and puzzle games that reward observation over reflexes. +Target age: 16+. Target sessions: 30–90 minutes. + +## Unique Selling Points + +- Bioluminescent light manipulation as the core puzzle mechanic +- No enemies — tension comes from environment and resource management +- Procedurally decorated (handcrafted levels, procedural detail pass) + +## Technical Scope + +- **Engine**: Godot 4.6 +- **Platform**: PC (Steam), with console ports post-launch +- **Team size**: Solo developer +- **Target completion**: 12-month development cycle +- **Scope**: 4–6 hours main story, 8–12 hours completionist + +## Art Direction + +Darkly atmospheric with vibrant bioluminescence providing the primary color +palette. Deep blues, purples, and blacks punctuated by greens, teals, and +ambers from living organisms and ancient technology. + +## Fun Hypothesis + +Players will feel rewarded by the combination of visual beauty and the +satisfying moment of discovering how light manipulation solves each puzzle. +The oxygen system will create just enough pressure to make exploration feel +meaningful without making death feel punishing. 
diff --git a/tests/skills/catalog.yaml b/tests/skills/catalog.yaml new file mode 100644 index 0000000..7c82b4e --- /dev/null +++ b/tests/skills/catalog.yaml @@ -0,0 +1,438 @@ +version: 1 +last_updated: "" +skills: + # Critical — gate skills that control phase transitions + - name: gate-check + spec: tests/skills/gate-check.md + last_static: "" + last_static_result: "" + last_spec: "" + last_spec_result: "" + priority: critical + + - name: design-review + spec: tests/skills/design-review.md + last_static: "" + last_static_result: "" + last_spec: "" + last_spec_result: "" + priority: critical + + - name: story-readiness + spec: tests/skills/story-readiness.md + last_static: "" + last_static_result: "" + last_spec: "" + last_spec_result: "" + priority: critical + + - name: story-done + spec: tests/skills/story-done.md + last_static: "" + last_static_result: "" + last_spec: "" + last_spec_result: "" + priority: critical + + - name: review-all-gdds + spec: "" + last_static: "" + last_static_result: "" + last_spec: "" + last_spec_result: "" + priority: critical + + - name: architecture-review + spec: "" + last_static: "" + last_static_result: "" + last_spec: "" + last_spec_result: "" + priority: critical + + # High — pipeline-critical skills + - name: create-epics-stories + spec: "" + last_static: "" + last_static_result: "" + last_spec: "" + last_spec_result: "" + priority: high + + - name: create-control-manifest + spec: "" + last_static: "" + last_static_result: "" + last_spec: "" + last_spec_result: "" + priority: high + + - name: propagate-design-change + spec: "" + last_static: "" + last_static_result: "" + last_spec: "" + last_spec_result: "" + priority: high + + - name: architecture-decision + spec: "" + last_static: "" + last_static_result: "" + last_spec: "" + last_spec_result: "" + priority: high + + - name: map-systems + spec: "" + last_static: "" + last_static_result: "" + last_spec: "" + last_spec_result: "" + priority: high + + - name: design-system + 
spec: "" + last_static: "" + last_static_result: "" + last_spec: "" + last_spec_result: "" + priority: high + + # Medium — team and sprint management skills + - name: sprint-plan + spec: "" + last_static: "" + last_static_result: "" + last_spec: "" + last_spec_result: "" + priority: medium + + - name: sprint-status + spec: "" + last_static: "" + last_static_result: "" + last_spec: "" + last_spec_result: "" + priority: medium + + - name: team-ui + spec: "" + last_static: "" + last_static_result: "" + last_spec: "" + last_spec_result: "" + priority: medium + + - name: team-combat + spec: "" + last_static: "" + last_static_result: "" + last_spec: "" + last_spec_result: "" + priority: medium + + - name: team-narrative + spec: "" + last_static: "" + last_static_result: "" + last_spec: "" + last_spec_result: "" + priority: medium + + - name: team-audio + spec: "" + last_static: "" + last_static_result: "" + last_spec: "" + last_spec_result: "" + priority: medium + + - name: team-level + spec: "" + last_static: "" + last_static_result: "" + last_spec: "" + last_spec_result: "" + priority: medium + + - name: team-polish + spec: "" + last_static: "" + last_static_result: "" + last_spec: "" + last_spec_result: "" + priority: medium + + - name: team-release + spec: "" + last_static: "" + last_static_result: "" + last_spec: "" + last_spec_result: "" + priority: medium + + - name: team-live-ops + spec: "" + last_static: "" + last_static_result: "" + last_spec: "" + last_spec_result: "" + priority: medium + + # Low — analysis, reporting, utility skills + - name: skill-test + spec: "" + last_static: "" + last_static_result: "" + last_spec: "" + last_spec_result: "" + priority: low + + - name: start + spec: "" + last_static: "" + last_static_result: "" + last_spec: "" + last_spec_result: "" + priority: low + + - name: help + spec: "" + last_static: "" + last_static_result: "" + last_spec: "" + last_spec_result: "" + priority: low + + - name: brainstorm + spec: "" + last_static: 
"" + last_static_result: "" + last_spec: "" + last_spec_result: "" + priority: low + + - name: project-stage-detect + spec: "" + last_static: "" + last_static_result: "" + last_spec: "" + last_spec_result: "" + priority: low + + - name: setup-engine + spec: "" + last_static: "" + last_static_result: "" + last_spec: "" + last_spec_result: "" + priority: low + + - name: quick-design + spec: "" + last_static: "" + last_static_result: "" + last_spec: "" + last_spec_result: "" + priority: low + + - name: ux-design + spec: "" + last_static: "" + last_static_result: "" + last_spec: "" + last_spec_result: "" + priority: low + + - name: ux-review + spec: "" + last_static: "" + last_static_result: "" + last_spec: "" + last_spec_result: "" + priority: low + + - name: code-review + spec: "" + last_static: "" + last_static_result: "" + last_spec: "" + last_spec_result: "" + priority: low + + - name: balance-check + spec: "" + last_static: "" + last_static_result: "" + last_spec: "" + last_spec_result: "" + priority: low + + - name: asset-audit + spec: "" + last_static: "" + last_static_result: "" + last_spec: "" + last_spec_result: "" + priority: low + + - name: reverse-document + spec: "" + last_static: "" + last_static_result: "" + last_spec: "" + last_spec_result: "" + priority: low + + - name: create-architecture + spec: "" + last_static: "" + last_static_result: "" + last_spec: "" + last_spec_result: "" + priority: low + + - name: content-audit + spec: "" + last_static: "" + last_static_result: "" + last_spec: "" + last_spec_result: "" + priority: low + + - name: bug-report + spec: "" + last_static: "" + last_static_result: "" + last_spec: "" + last_spec_result: "" + priority: low + + - name: hotfix + spec: "" + last_static: "" + last_static_result: "" + last_spec: "" + last_spec_result: "" + priority: low + + - name: prototype + spec: "" + last_static: "" + last_static_result: "" + last_spec: "" + last_spec_result: "" + priority: low + + - name: playtest-report + spec: "" 
+ last_static: "" + last_static_result: "" + last_spec: "" + last_spec_result: "" + priority: low + + - name: perf-profile + spec: "" + last_static: "" + last_static_result: "" + last_spec: "" + last_spec_result: "" + priority: low + + - name: tech-debt + spec: "" + last_static: "" + last_static_result: "" + last_spec: "" + last_spec_result: "" + priority: low + + - name: scope-check + spec: "" + last_static: "" + last_static_result: "" + last_spec: "" + last_spec_result: "" + priority: low + + - name: estimate + spec: "" + last_static: "" + last_static_result: "" + last_spec: "" + last_spec_result: "" + priority: low + + - name: milestone-review + spec: "" + last_static: "" + last_static_result: "" + last_spec: "" + last_spec_result: "" + priority: low + + - name: retrospective + spec: "" + last_static: "" + last_static_result: "" + last_spec: "" + last_spec_result: "" + priority: low + + - name: changelog + spec: "" + last_static: "" + last_static_result: "" + last_spec: "" + last_spec_result: "" + priority: low + + - name: patch-notes + spec: "" + last_static: "" + last_static_result: "" + last_spec: "" + last_spec_result: "" + priority: low + + - name: onboard + spec: "" + last_static: "" + last_static_result: "" + last_spec: "" + last_spec_result: "" + priority: low + + - name: localize + spec: "" + last_static: "" + last_static_result: "" + last_spec: "" + last_spec_result: "" + priority: low + + - name: launch-checklist + spec: "" + last_static: "" + last_static_result: "" + last_spec: "" + last_spec_result: "" + priority: low + + - name: release-checklist + spec: "" + last_static: "" + last_static_result: "" + last_spec: "" + last_spec_result: "" + priority: low + + - name: adopt + spec: "" + last_static: "" + last_static_result: "" + last_spec: "" + last_spec_result: "" + priority: low diff --git a/tests/skills/design-review.md b/tests/skills/design-review.md new file mode 100644 index 0000000..ef6af82 --- /dev/null +++ b/tests/skills/design-review.md @@ 
-0,0 +1,144 @@ +# Skill Test Spec: /design-review + +## Skill Summary + +`/design-review` reads a game design document (GDD) and evaluates it against +the project's 8-section design standard (Overview, Player Fantasy, Detailed +Rules, Formulas, Edge Cases, Dependencies, Tuning Knobs, Acceptance Criteria). +It checks for internal consistency, implementability, and cross-system +conflicts. It produces a verdict of APPROVED, NEEDS REVISION, or MAJOR +REVISION NEEDED. It is a read-only skill (no file writes) and runs as a +`context: fork` subagent. + +--- + +## Static Assertions (Structural) + +Verified automatically by `/skill-test static` — no fixture needed. + +- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools` +- [ ] Has ≥2 phase headings or numbered steps +- [ ] Contains verdict keywords: APPROVED, NEEDS REVISION, MAJOR REVISION NEEDED +- [ ] Does NOT require "May I write" language (read-only skill — `allowed-tools` excludes Write/Edit) +- [ ] Output format is documented (review template shown in skill body) + +--- + +## Test Cases + +### Case 1: Happy Path — Complete GDD, all 8 sections present + +**Fixture:** +- `design/gdd/light-manipulation.md` exists (use `_fixtures/minimal-game-concept.md` + as a stand-in — represents a complete document with all required content) +- All 8 required sections are populated with substantive content +- Formulas section contains at least one formula with defined variables +- Acceptance Criteria section contains at least 3 testable criteria + +**Input:** `/design-review design/gdd/light-manipulation.md` + +**Expected behavior:** +1. Skill reads the target document in full +2. Skill reads CLAUDE.md for project context and standards +3. Skill evaluates all 8 required sections (present/absent check) +4. Skill checks internal consistency (formulas match described behavior) +5. Skill checks implementability (rules are precise enough to code) +6. 
Skill outputs structured review with section-by-section status +7. Skill outputs APPROVED verdict + +**Assertions:** +- [ ] Skill reads the target file before producing any output +- [ ] Output includes a "Completeness" section showing X/8 sections present +- [ ] Output includes an "Internal Consistency" section +- [ ] Output includes an "Implementability" section +- [ ] Output ends with a verdict line: APPROVED / NEEDS REVISION / MAJOR REVISION NEEDED +- [ ] APPROVED verdict is given when all 8 sections are present and consistent + +--- + +### Case 2: Failure Path — Incomplete GDD (4/8 sections) + +**Fixture:** +- `design/gdd/light-manipulation.md` exists using content from + `tests/skills/_fixtures/incomplete-gdd.md` (4 of 8 sections populated; + Formulas, Edge Cases, Tuning Knobs, Acceptance Criteria are missing) + +**Input:** `/design-review design/gdd/light-manipulation.md` + +**Expected behavior:** +1. Skill reads the document +2. Skill identifies 4 missing sections +3. Skill outputs "Completeness: 4/8 sections present" +4. Skill lists specifically which 4 sections are missing +5. Skill outputs MAJOR REVISION NEEDED verdict (not APPROVED or NEEDS REVISION) + +**Assertions:** +- [ ] Output shows "4/8" in the completeness section (not a higher number) +- [ ] Output explicitly names each missing section (Formulas, Edge Cases, Tuning Knobs, Acceptance Criteria) +- [ ] Verdict is MAJOR REVISION NEEDED (not APPROVED or NEEDS REVISION) when ≥3 sections are missing +- [ ] Output does not suggest the document is implementation-ready +- [ ] Skill does not write any files (read-only enforcement) + +--- + +### Case 3: Partial Path — 7/8 sections, minor inconsistency + +**Fixture:** +- GDD has all sections except Formulas +- The described behavior mentions numeric values but no formulas are defined +- Acceptance Criteria exist but are vague ("feels good" rather than measurable) + +**Input:** `/design-review design/gdd/[document].md` + +**Expected behavior:** +1. 
Skill identifies missing Formulas section +2. Skill flags vague acceptance criteria as an implementability issue +3. Skill outputs NEEDS REVISION verdict (not APPROVED, not MAJOR REVISION NEEDED) +4. Skill provides specific remediation notes for each issue + +**Assertions:** +- [ ] Verdict is NEEDS REVISION (not APPROVED, not MAJOR REVISION NEEDED) for 7/8 with issues +- [ ] Output identifies the missing Formulas section specifically +- [ ] Output flags the vague acceptance criteria as an implementability gap +- [ ] Each flagged issue has a specific, actionable remediation note + +--- + +### Case 4: Edge Case — File not found + +**Fixture:** +- The path provided does not exist in the project + +**Input:** `/design-review design/gdd/nonexistent.md` + +**Expected behavior:** +1. Skill attempts to read the file +2. File not found +3. Skill outputs an error message naming the missing file +4. Skill suggests checking the path or listing files in `design/gdd/` +5. Skill does NOT produce a verdict + +**Assertions:** +- [ ] Skill outputs a clear error when the file is not found +- [ ] Skill does NOT output APPROVED, NEEDS REVISION, or MAJOR REVISION NEEDED when file is missing +- [ ] Skill suggests a corrective action (check path, list available GDDs) + +--- + +## Protocol Compliance + +- [ ] Does NOT use Write or Edit tools (read-only skill) +- [ ] Presents complete findings before any verdict +- [ ] Does not ask for approval before producing output (no writes to approve) +- [ ] Ends with recommended next step (e.g., fix issues and re-run, or proceed to `/map-systems`) + +--- + +## Coverage Notes + +- Cross-system consistency checking (Case 3 in the skill's own phase list) is + not directly tested here because it requires multiple GDD files to compare; + this is covered by the `/review-all-gdds` spec instead. +- The skill's `context: fork` behavior (running as a subagent) is not tested + at the spec level — this is a runtime behavior verified manually. 
+- Performance and edge cases involving very large GDD files are not in scope. diff --git a/tests/skills/gate-check.md b/tests/skills/gate-check.md new file mode 100644 index 0000000..08c64bf --- /dev/null +++ b/tests/skills/gate-check.md @@ -0,0 +1,144 @@ +# Skill Test Spec: /gate-check + +## Skill Summary + +`/gate-check` validates whether the project is ready to advance to the next +development phase. It checks for required artifacts, runs quality checks, asks +the user about unverifiable items, and produces a PASS/CONCERNS/FAIL verdict. +On PASS with user confirmation, it writes the new stage name to +`production/stage.txt`. It governs all 6 phase transitions and is the most +critical gate-keeping skill in the pipeline. + +--- + +## Static Assertions (Structural) + +Verified automatically by `/skill-test static` — no fixture needed. + +- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools` +- [ ] Has ≥2 phase headings (numbered Phase N or ## sections) +- [ ] Contains verdict keywords: PASS, CONCERNS, FAIL +- [ ] Contains "May I write" collaborative protocol language +- [ ] Has a next-step handoff at the end (Follow-Up Actions section) + +--- + +## Test Cases + +### Case 1: Happy Path — All Concept artifacts present, advancing to Systems Design + +**Fixture:** +- `design/gdd/game-concept.md` exists, has content including all required sections +- `design/gdd/game-pillars.md` exists (or pillars defined within concept doc) +- No systems index yet (which is correct for this stage) + +**Input:** `/gate-check systems-design` + +**Expected behavior:** +1. Skill reads `design/gdd/game-concept.md` and verifies it has content +2. Skill checks for game pillars (in concept or separate file) +3. Skill checks quality items (core loop described, target audience identified) +4. Skill outputs structured checklist with all items marked +5. Skill presents PASS/CONCERNS/FAIL verdict +6. 
If PASS: skill asks "May I update `production/stage.txt` to 'Systems Design'?" + +**Assertions:** +- [ ] Skill uses Glob or Read to verify `design/gdd/game-concept.md` exists before marking it checked +- [ ] Output includes a "Required Artifacts" section with check status per item +- [ ] Output includes a "Quality Checks" section with check status per item +- [ ] Output includes a "Verdict" line with one of PASS / CONCERNS / FAIL +- [ ] Skill asks about unverifiable quality items (e.g., "Has this been reviewed?") rather than assuming PASS +- [ ] Skill asks "May I write" before updating `production/stage.txt` +- [ ] Skill does NOT write `production/stage.txt` without explicit user confirmation + +--- + +### Case 2: Failure Path — Missing required artifacts for Concept → Systems Design + +**Fixture:** +- `design/gdd/game-concept.md` does NOT exist +- No game pillars document exists +- `design/gdd/` directory is empty or absent + +**Input:** `/gate-check systems-design` + +**Expected behavior:** +1. Skill attempts to read `design/gdd/game-concept.md` — file not found +2. Skill marks required artifact as missing (not present) +3. Skill outputs FAIL verdict +4. Skill lists blocker: "No game concept document found" +5. Skill suggests remediation: run `/brainstorm` to create one + +**Assertions:** +- [ ] Verdict is FAIL (not PASS or CONCERNS) when required artifacts are missing +- [ ] Output explicitly names `design/gdd/game-concept.md` as missing +- [ ] Output includes a "Blockers" section with at least 1 item +- [ ] Output recommends `/brainstorm` as the remediation action +- [ ] Skill does NOT write `production/stage.txt` when verdict is FAIL + +--- + +### Case 3: No Argument — Auto-detect current stage + +**Fixture:** +- `production/stage.txt` contains `Concept` +- `design/gdd/game-concept.md` exists with content +- No systems index yet + +**Input:** `/gate-check` (no argument) + +**Expected behavior:** +1. 
Skill reads `production/stage.txt` to determine current stage +2. Skill determines the next gate is Concept → Systems Design +3. Skill proceeds with the Systems Design gate checks +4. Output clearly states which transition is being validated + +**Assertions:** +- [ ] Skill reads `production/stage.txt` (or uses project-stage-detect heuristics) to determine current stage +- [ ] Output header names both current and target phases (e.g., "Gate Check: Concept → Systems Design") +- [ ] Skill does not ask the user which gate to check if current stage is determinable + +--- + +### Case 4: Edge Case — Manual check items flagged correctly + +**Fixture:** +- All required artifacts for Concept → Systems Design are present +- No playtest or review record exists (can't auto-verify quality checks) + +**Input:** `/gate-check systems-design` + +**Expected behavior:** +1. Skill verifies all artifact files exist +2. Skill encounters quality check: "Game concept reviewed (not MAJOR REVISION NEEDED)" +3. Since no review record exists, skill marks item as MANUAL CHECK NEEDED +4. Skill asks the user: "Has the game concept been reviewed for design quality?" +5. Skill waits for user input before finalizing verdict + +**Assertions:** +- [ ] Items that cannot be auto-verified are marked `[?] 
MANUAL CHECK NEEDED` rather than assumed PASS +- [ ] Skill asks the user a question about at least one unverifiable quality item +- [ ] Skill does not mark unverifiable items as PASS by default + +--- + +## Protocol Compliance + +- [ ] Uses "May I write" before updating `production/stage.txt` +- [ ] Presents the full checklist report before asking for write approval +- [ ] Ends with a "Follow-Up Actions" section listing next steps per verdict +- [ ] Never advances the stage without explicit user confirmation +- [ ] Never creates a missing `production/stage.txt` without asking first + +--- + +## Coverage Notes + +- The Production → Polish and Polish → Release gates are not covered here + because they require complex multi-artifact setups (sprint plans, playtest + data, QA sign-off); these are deferred to dedicated follow-up specs. +- The "CONCERNS" verdict path (minor gaps, not blocking) is not explicitly + tested here; it falls between the Case 1 happy path and the Case 2 failure path and follows the same pattern. +- The Vertical Slice validation block (Pre-Production → Production gate) is not + covered because it requires a playable build context that cannot be expressed + as a document fixture. diff --git a/tests/skills/story-done.md b/tests/skills/story-done.md new file mode 100644 index 0000000..2a072bf --- /dev/null +++ b/tests/skills/story-done.md @@ -0,0 +1,165 @@ +# Skill Test Spec: /story-done + +## Skill Summary + +`/story-done` closes the loop between design and implementation. Run at the +end of implementing a story, it reads the story file and verifies each +acceptance criterion against the implementation. It checks for GDD and ADR +deviations, prompts a code review, updates the story status to `Complete`, +logs any tech debt, and surfaces the next ready story from the sprint. It +produces a COMPLETE / COMPLETE WITH NOTES / BLOCKED verdict and writes to +the story file and optionally to `docs/tech-debt-register.md`. 
+ +--- + +## Static Assertions (Structural) + +Verified automatically by `/skill-test static` — no fixture needed. + +- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools` +- [ ] Has ≥5 phase headings (complex skill warranting `context: fork` if applicable) +- [ ] Contains verdict keywords: COMPLETE, BLOCKED +- [ ] Contains "May I write" collaborative protocol language (writes to story file and tech-debt register) +- [ ] Has a next-step handoff (surfaces next story from sprint) + +--- + +## Test Cases + +### Case 1: Happy Path — All acceptance criteria met, no deviations + +**Fixture:** +- Story file at `production/epics/core/story-light-pickup.md` with: + - 3 acceptance criteria, all implemented as described + - `TR-ID: TR-light-001` referencing a GDD requirement + - `ADR: docs/architecture/adr-003-inventory.md` (Accepted) + - `Status: In Progress` +- Implementation files listed in story exist in `src/` +- GDD requirement text at TR-light-001 matches how the feature was implemented +- ADR guidance was followed (no deviations) + +**Input:** `/story-done production/epics/core/story-light-pickup.md` + +**Expected behavior:** +1. Skill reads the story file and extracts all key fields +2. Skill reads the GDD requirement fresh from `tr-registry.yaml` (not from story's quoted text) +3. Skill reads the referenced ADR to understand implementation constraints +4. Skill evaluates each acceptance criterion (auto where possible, manual prompt where not) +5. Skill checks for GDD requirement deviations +6. Skill checks for ADR guideline deviations +7. Skill prompts user: "Please provide the code review outcome for this story" +8. Skill presents COMPLETE verdict +9. Skill asks "May I update story Status to Complete and add Completion Notes?" +10. If yes: skill updates the story file +11. 
Skill surfaces the next `Ready for Dev` story from the sprint + +**Assertions:** +- [ ] Skill reads `docs/architecture/tr-registry.yaml` for TR-ID requirement text (not just the story's quoted text) +- [ ] Skill reads the referenced ADR file (not just the story reference) +- [ ] Each acceptance criterion is listed with VERIFIED / DEFERRED / FAILED status +- [ ] Skill prompts the user for code review outcome (does not skip this step) +- [ ] Verdict is COMPLETE when all criteria are verified and no deviations exist +- [ ] Skill asks "May I write" before updating the story file +- [ ] Skill does NOT auto-update story status without user confirmation +- [ ] After completion, skill surfaces the next ready story from `production/sprints/` + +--- + +### Case 2: Deferred Path — Acceptance criterion cannot be verified + +**Fixture:** +- Story file has an acceptance criterion: "Player sees correct animation on pickup" +- No automated test for this criterion exists +- Manual verification has not been performed +- All other criteria are met + +**Input:** `/story-done production/epics/core/story-light-pickup.md` + +**Expected behavior:** +1. Skill processes all acceptance criteria +2. Skill reaches the animation criterion and cannot auto-verify it +3. Skill asks the user: "Acceptance criterion 'Player sees correct animation on + pickup' cannot be auto-verified. Has this been manually tested?" +4. If user says No: criterion is marked DEFERRED, verdict becomes COMPLETE WITH NOTES +5. Skill records the deferred criterion in completion notes +6. Skill asks "May I write the updated story with the deferred criterion noted?" 
+ +**Assertions:** +- [ ] Skill asks the user about unverifiable criteria rather than assuming PASS +- [ ] Deferred criteria result in COMPLETE WITH NOTES (not COMPLETE or BLOCKED) +- [ ] The deferred criterion is explicitly named in the completion notes +- [ ] Skill still asks "May I write" before updating the story file + +--- + +### Case 3: Blocked Path — GDD deviation detected + +**Fixture:** +- Story TR-ID points to requirement: "Player can carry max 3 light sources" +- Implementation in `src/` uses a variable `MAX_CARRIED_LIGHTS = 5` +- This is a deliberate deviation from the GDD + +**Input:** `/story-done production/epics/core/story-light-pickup.md` + +**Expected behavior:** +1. Skill reads the GDD requirement text (max 3) +2. Skill detects discrepancy between requirement and implementation value (5) +3. Skill flags this as a GDD deviation and asks the user to classify it: + - INTENTIONAL: document the deviation and reason + - ERROR: implementation must be fixed before story can be marked Complete + - OUT OF SCOPE: requirement changed and GDD needs updating +4. If INTENTIONAL: skill records deviation in completion notes, verdict is COMPLETE WITH NOTES +5. If ERROR: verdict is BLOCKED until implementation is corrected + +**Assertions:** +- [ ] Skill detects the mismatch between GDD requirement and implementation value +- [ ] Skill asks the user to classify the deviation (not auto-assumes either way) +- [ ] INTENTIONAL deviation → COMPLETE WITH NOTES (not BLOCKED) +- [ ] ERROR deviation → BLOCKED verdict until fixed +- [ ] Detected deviations are recorded in completion notes or tech debt register + +--- + +### Case 4: Edge Case — No argument, auto-detect current story + +**Fixture:** +- `production/session-state/active.md` contains a reference to + `production/epics/core/story-oxygen-drain.md` as the active story +- That story file exists with `Status: In Progress` + +**Input:** `/story-done` (no argument) + +**Expected behavior:** +1. 
Skill reads `production/session-state/active.md` +2. Skill finds the active story reference +3. Skill reads that story file and proceeds normally +4. Output confirms which story was auto-detected + +**Assertions:** +- [ ] Skill reads `production/session-state/active.md` when no argument is given +- [ ] Skill identifies and confirms the auto-detected story before proceeding +- [ ] If no story is found in session state, skill asks the user to provide a path + +--- + +## Protocol Compliance + +- [ ] Uses "May I write" before updating the story file +- [ ] Uses "May I write" before adding entries to `docs/tech-debt-register.md` +- [ ] Presents complete findings (criteria check, deviation check) before asking for approval +- [ ] Ends by surfacing the next ready story from the sprint plan +- [ ] Does not mark a story Complete if any criterion is FAILED or any deviation is classified ERROR +- [ ] Does not skip the code review prompt + +--- + +## Coverage Notes + +- The full 8-phase flow of the skill is exercised across Cases 1-3; not all + edge cases within each phase are covered. +- Tech debt logging (deferred items written to `docs/tech-debt-register.md`) + is mentioned in Case 2 but is not the primary assertion focus; dedicated + coverage is deferred. +- The `sprint-status.yaml` update (Phase 7 in the skill) is implied by Case 1 + but is not the primary assertion; it is assumed to follow the same "May I write" pattern. +- Stories with multiple TR-IDs or multiple ADRs are not explicitly tested. diff --git a/tests/skills/story-readiness.md b/tests/skills/story-readiness.md new file mode 100644 index 0000000..946fe0d --- /dev/null +++ b/tests/skills/story-readiness.md @@ -0,0 +1,153 @@ +# Skill Test Spec: /story-readiness + +## Skill Summary + +`/story-readiness` validates that a story file is ready for a developer to +pick up and implement. 
It checks four dimensions: Design (embedded GDD +requirements), Architecture (ADR references and status), Scope (clear +boundaries and DoD), and Definition of Done (testable criteria). It produces +a READY / NEEDS WORK / BLOCKED verdict. It is a read-only skill and runs +before any developer picks up a story. + +--- + +## Static Assertions (Structural) + +Verified automatically by `/skill-test static` — no fixture needed. + +- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools` +- [ ] Has ≥2 phase headings or numbered check sections +- [ ] Contains verdict keywords: READY, NEEDS WORK, BLOCKED +- [ ] Does NOT require "May I write" language (read-only skill) +- [ ] Has a next-step handoff (what to do after verdict) + +--- + +## Test Cases + +### Case 1: Happy Path — Fully ready story + +**Fixture:** +- Story file exists at `production/epics/core/story-light-pickup.md` +- Story contains: + - `TR-ID: TR-light-001` (GDD requirement reference) + - `ADR: docs/architecture/adr-003-inventory.md` + - Referenced ADR exists and has status `Accepted` + - Referenced TR-ID exists in `docs/architecture/tr-registry.yaml` + - Story has `## Acceptance Criteria` with ≥3 testable items + - Story has `## Definition of Done` section + - Story has `Status: Ready for Dev` + - Manifest version in story header matches current `docs/architecture/control-manifest.md` + +**Input:** `/story-readiness production/epics/core/story-light-pickup.md` + +**Expected behavior:** +1. Skill reads the story file +2. Skill reads the referenced ADR — verifies status is `Accepted` +3. Skill reads `docs/architecture/tr-registry.yaml` — verifies TR-ID exists +4. Skill reads `docs/architecture/control-manifest.md` — verifies manifest version matches +5. Skill evaluates all 4 dimensions (Design, Architecture, Scope, DoD) +6. 
Skill outputs READY verdict with all checks passing + +**Assertions:** +- [ ] Skill reads the referenced ADR file (not just the story) +- [ ] Skill verifies ADR status is `Accepted` (not `Proposed`) +- [ ] Skill reads `tr-registry.yaml` to verify TR-ID exists +- [ ] Output includes check results for all 4 dimensions +- [ ] Verdict is READY when all checks pass +- [ ] Skill does not write any files + +--- + +### Case 2: Blocked Path — Referenced ADR is Proposed (not Accepted) + +**Fixture:** +- Story file exists with `ADR: docs/architecture/adr-005-light-system.md` +- `adr-005-light-system.md` exists but has `Status: Proposed` +- All other story content is otherwise complete + +**Input:** `/story-readiness production/epics/core/story-light-system.md` + +**Expected behavior:** +1. Skill reads the story +2. Skill reads `adr-005-light-system.md` — finds `Status: Proposed` +3. Skill flags this as a BLOCKING issue (cannot implement against unaccepted ADR) +4. Skill outputs BLOCKED verdict +5. Skill recommends: accept or reject the ADR before picking up the story + +**Assertions:** +- [ ] Verdict is BLOCKED (not NEEDS WORK or READY) when ADR is Proposed +- [ ] Output explicitly names the Proposed ADR as the blocker +- [ ] Output recommends resolving ADR status before proceeding +- [ ] Skill does not output READY regardless of other checks passing + +--- + +### Case 3: Needs Work — Missing Acceptance Criteria + +**Fixture:** +- Story file exists but has no `## Acceptance Criteria` section +- ADR reference exists and is `Accepted` +- TR-ID exists in registry +- Manifest version matches + +**Input:** `/story-readiness production/epics/core/story-oxygen-drain.md` + +**Expected behavior:** +1. Skill reads the story +2. Skill finds no Acceptance Criteria section +3. Skill flags this as a NEEDS WORK issue (story is incomplete, not blocked) +4. Skill outputs NEEDS WORK verdict +5. 
Skill names the missing section and suggests adding measurable criteria + +**Assertions:** +- [ ] Verdict is NEEDS WORK (not BLOCKED or READY) when Acceptance Criteria section is absent +- [ ] Output identifies the missing Acceptance Criteria section specifically +- [ ] Output suggests adding testable/measurable criteria +- [ ] Skill distinguishes NEEDS WORK (fixable without external dependencies) from BLOCKED (requires outside action) + +--- + +### Case 4: Edge Case — Stale manifest version + +**Fixture:** +- Story file has `Manifest Version: 2026-01-15` in its header +- `docs/architecture/control-manifest.md` has `Manifest Version: 2026-03-10` +- Versions do not match (story was created before manifest was updated) + +**Input:** `/story-readiness production/epics/core/story-mirror-rotation.md` + +**Expected behavior:** +1. Skill reads the story and extracts manifest version `2026-01-15` +2. Skill reads control manifest header and extracts current version `2026-03-10` +3. Skill detects version mismatch +4. Skill flags this as an ADVISORY issue (not blocking, but worth noting) +5. 
Verdict is NEEDS WORK with manifest staleness noted + +**Assertions:** +- [ ] Skill reads `docs/architecture/control-manifest.md` to get the current version +- [ ] Skill compares the story's embedded manifest version against the current manifest version +- [ ] Stale manifest version results in NEEDS WORK (not BLOCKED, not READY) +- [ ] Output explains that the story's embedded guidance may be outdated + +--- + +## Protocol Compliance + +- [ ] Does NOT use Write or Edit tools (read-only skill) +- [ ] Presents complete check results before verdict +- [ ] Does not ask for approval (no file writes) +- [ ] Ends with recommended next step (fix issues or proceed to implementation) +- [ ] Distinguishes three verdict levels clearly (READY vs NEEDS WORK vs BLOCKED) + +--- + +## Coverage Notes + +- The case where the TR-ID is missing from the registry entirely is not explicitly + tested here; it follows the same NEEDS WORK pattern as Case 3. +- The "no argument" path (skill auto-detecting the current story) is not + tested because it depends on `production/session-state/active.md` content, + which is hard to reproduce reliably in a fixture. +- Stories with multiple ADR references are not tested; behavior is assumed to + be additive (all ADRs must be Accepted for a READY verdict).
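The verdict rules that the `/story-readiness` cases above expect can be summarized in a short sketch. This is not the skill's implementation and is not part of the diff; the `StoryChecks` type, its field names, and the `readiness_verdict` function are hypothetical names introduced only for illustration. It assumes the additive multi-ADR behavior and the missing-TR-ID handling described in the Coverage Notes.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class StoryChecks:
    """Hypothetical summary of the skill's findings; field names are illustrative."""
    adr_statuses: List[str]         # status of every ADR the story references
    tr_id_in_registry: bool         # TR-ID was found in tr-registry.yaml
    has_acceptance_criteria: bool   # story has an Acceptance Criteria section
    story_manifest_version: str     # version embedded in the story header
    current_manifest_version: str   # version read from control-manifest.md


def readiness_verdict(checks: StoryChecks) -> Tuple[str, List[str]]:
    """Return (verdict, issues) following the rules the test cases describe."""
    blockers: List[str] = []
    fixable: List[str] = []

    # Case 2: an ADR that is not Accepted blocks the story.
    # Assumption from the Coverage Notes: every referenced ADR must be Accepted.
    for status in checks.adr_statuses:
        if status != "Accepted":
            blockers.append(f"ADR status is {status}, not Accepted")

    # Case 3 and the first coverage note: gaps the story author can fix directly.
    if not checks.has_acceptance_criteria:
        fixable.append("Acceptance Criteria section is missing")
    if not checks.tr_id_in_registry:
        fixable.append("TR-ID not found in registry")

    # Case 4: a stale manifest is advisory but still yields NEEDS WORK.
    if checks.story_manifest_version != checks.current_manifest_version:
        fixable.append("Story manifest version is stale")

    if blockers:
        return "BLOCKED", blockers + fixable
    if fixable:
        return "NEEDS WORK", fixable
    return "READY", []
```

BLOCKED takes precedence over NEEDS WORK in this sketch, which matches the Case 2 assertion that the skill never outputs READY or NEEDS WORK while a referenced ADR is still Proposed.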