- New skill: /skill-test (static | spec | audit modes)
  - static: 7-check structural linter per skill file
  - spec: Claude-evaluated behavioral assertions against test specs
  - audit: coverage report across all 52 skills with priority gaps
- New hook: validate-skill-change.sh, an advisory reminder to lint after skill edits
- New template: skill-test-spec.md, a standard structure for authoring test specs
- New: tests/skills/catalog.yaml, a machine-readable coverage index (52 skills)
- New: tests/skills/_fixtures/ with shared fixtures (complete concept, incomplete GDD)
- New: 4 seed test specs for critical gate skills (gate-check, design-review, story-readiness, story-done), 4 cases each
- Modified: settings.json, with validate-skill-change.sh added to the PostToolUse hook

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Skill Test Spec: /design-review
Skill Summary
/design-review reads a game design document (GDD) and evaluates it against
the project's 8-section design standard (Overview, Player Fantasy, Detailed
Rules, Formulas, Edge Cases, Dependencies, Tuning Knobs, Acceptance Criteria).
It checks for internal consistency, implementability, and cross-system
conflicts. It produces a verdict of APPROVED, NEEDS REVISION, or MAJOR
REVISION NEEDED. It is a read-only skill (no file writes) and runs as a
context: fork subagent.
Static Assertions (Structural)
Verified automatically by /skill-test static — no fixture needed.
- Has required frontmatter fields: name, description, argument-hint, user-invocable, allowed-tools
- Has ≥2 phase headings or numbered steps
- Contains verdict keywords: APPROVED, NEEDS REVISION, MAJOR REVISION NEEDED
- Does NOT require "May I write" language (read-only skill: allowed-tools excludes Write/Edit)
- Output format is documented (review template shown in skill body)
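As a rough illustration of what these structural checks amount to, here is a minimal Python sketch. It is not the actual /skill-test static implementation. It assumes the skill file opens with a frontmatter block delimited by `---` lines and that a "phase" is either a markdown heading containing the word "Phase" or a numbered step. The function name `static_check` and the result keys are hypothetical.

```python
import re

REQUIRED_FIELDS = ["name", "description", "argument-hint", "user-invocable", "allowed-tools"]
VERDICTS = ["APPROVED", "NEEDS REVISION", "MAJOR REVISION NEEDED"]


def static_check(skill_text: str) -> dict[str, bool]:
    """Run a subset of the structural checks on the text of one skill file."""
    # Assumption: frontmatter is a leading block delimited by '---' lines.
    match = re.match(r"\A---\n(.*?)\n---\n", skill_text, re.DOTALL)
    frontmatter = match.group(1) if match else ""
    body = skill_text[match.end():] if match else skill_text

    # Collect top-level "key: value" names without needing a YAML parser.
    keys = {
        line.split(":", 1)[0].strip()
        for line in frontmatter.splitlines()
        if ":" in line and not line.startswith((" ", "\t", "#"))
    }

    # Assumption: a phase is a heading containing "Phase" or a numbered step.
    phases = re.findall(r"^(?:#+\s.*\bPhase\b.*|\d+\.\s+\S.*)$", body, re.MULTILINE)

    # Value of the allowed-tools field, or "" when the field is absent.
    allowed = next(
        (line.split(":", 1)[1] for line in frontmatter.splitlines()
         if line.startswith("allowed-tools:")),
        "",
    )

    return {
        "frontmatter_fields": all(field in keys for field in REQUIRED_FIELDS),
        "two_or_more_phases": len(phases) >= 2,
        "verdict_keywords": all(verdict in body for verdict in VERDICTS),
        "read_only_tools": not re.search(r"\b(Write|Edit)\b", allowed),
    }
```

Each key maps to one assertion in the list above; the "output format is documented" check is left out because it needs a judgment about the skill body rather than a pattern match.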
Test Cases
Case 1: Happy Path — Complete GDD, all 8 sections present
Fixture:
- design/gdd/light-manipulation.md exists (use _fixtures/minimal-game-concept.md as a stand-in; it represents a complete document with all required content)
- All 8 required sections are populated with substantive content
- Formulas section contains at least one formula with defined variables
- Acceptance Criteria section contains at least 3 testable criteria
Input: /design-review design/gdd/light-manipulation.md
Expected behavior:
- Skill reads the target document in full
- Skill reads CLAUDE.md for project context and standards
- Skill evaluates all 8 required sections (present/absent check)
- Skill checks internal consistency (formulas match described behavior)
- Skill checks implementability (rules are precise enough to code)
- Skill outputs structured review with section-by-section status
- Skill outputs APPROVED verdict
Assertions:
- Skill reads the target file before producing any output
- Output includes a "Completeness" section showing X/8 sections present
- Output includes an "Internal Consistency" section
- Output includes an "Implementability" section
- Output ends with a verdict line: APPROVED / NEEDS REVISION / MAJOR REVISION NEEDED
- APPROVED verdict is given when all 8 sections are present and consistent
Case 2: Failure Path — Incomplete GDD (4/8 sections)
Fixture:
- design/gdd/light-manipulation.md exists, using content from tests/skills/_fixtures/incomplete-gdd.md (4 of 8 sections populated; Formulas, Edge Cases, Tuning Knobs, Acceptance Criteria are missing)
Input: /design-review design/gdd/light-manipulation.md
Expected behavior:
- Skill reads the document
- Skill identifies 4 missing sections
- Skill outputs "Completeness: 4/8 sections present"
- Skill lists specifically which 4 sections are missing
- Skill outputs MAJOR REVISION NEEDED verdict (not APPROVED or NEEDS REVISION)
Assertions:
- Output shows "4/8" in the completeness section (not a higher number)
- Output explicitly names each missing section (Formulas, Edge Cases, Tuning Knobs, Acceptance Criteria)
- Verdict is MAJOR REVISION NEEDED (not APPROVED or NEEDS REVISION) when ≥3 sections are missing
- Output does not suggest the document is implementation-ready
- Skill does not write any files (read-only enforcement)
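The completeness figure in this case can be reproduced mechanically, which helps when building or validating the fixture. The sketch below is a hypothetical helper, not part of the skill. It assumes each required section is a level-2 markdown heading (`## Name`) whose text is exactly the section name, and that a section counts as present only when it has non-empty content.

```python
import re

REQUIRED_SECTIONS = [
    "Overview", "Player Fantasy", "Detailed Rules", "Formulas",
    "Edge Cases", "Dependencies", "Tuning Knobs", "Acceptance Criteria",
]


def missing_sections(gdd_text: str) -> list[str]:
    """Return required sections that have no heading or an empty body."""
    # Assumption: sections are level-2 headings ("## Name").
    # re.split with one capture group yields
    # [preamble, title1, body1, title2, body2, ...].
    parts = re.split(r"^##[ \t]+(.+?)[ \t]*$", gdd_text, flags=re.MULTILINE)
    populated = {
        title.strip()
        for title, body in zip(parts[1::2], parts[2::2])
        if body.strip()
    }
    return [name for name in REQUIRED_SECTIONS if name not in populated]


def completeness_line(gdd_text: str) -> str:
    """Format the completeness summary in the form the assertions expect."""
    present = len(REQUIRED_SECTIONS) - len(missing_sections(gdd_text))
    return f"Completeness: {present}/{len(REQUIRED_SECTIONS)} sections present"
```

Run against the incomplete fixture described above, `missing_sections` should name the same four sections the assertions list, provided the fixture follows the assumed heading convention.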
Case 3: Partial Path — 7/8 sections, minor inconsistency
Fixture:
- GDD has all sections except Formulas
- The described behavior mentions numeric values but no formulas are defined
- Acceptance Criteria exist but are vague ("feels good" rather than measurable)
Input: /design-review design/gdd/[document].md
Expected behavior:
- Skill identifies missing Formulas section
- Skill flags vague acceptance criteria as an implementability issue
- Skill outputs NEEDS REVISION verdict (not APPROVED, not MAJOR REVISION NEEDED)
- Skill provides specific remediation notes for each issue
Assertions:
- Verdict is NEEDS REVISION (not APPROVED, not MAJOR REVISION NEEDED) for 7/8 with issues
- Output identifies the missing Formulas section specifically
- Output flags the vague acceptance criteria as an implementability gap
- Each flagged issue has a specific, actionable remediation note
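Cases 1 through 3 together imply a mapping from review findings to a verdict. The following sketch writes that mapping down as a hypothetical helper named `expected_verdict`. Only three points are pinned by this spec: 8/8 and consistent gives APPROVED, 3 or more missing sections gives MAJOR REVISION NEEDED, and 7/8 with issues gives NEEDS REVISION. Everything in between (for example, 2 missing sections, or 8/8 with an inconsistency) is an assumption made here for illustration.

```python
def expected_verdict(missing_count: int, open_issues: int) -> str:
    """Map review findings to the verdict a test case should expect.

    missing_count: number of the 8 required sections that are absent.
    open_issues:   consistency or implementability issues that were flagged.
    """
    # Stated in Case 2: three or more missing sections is a major revision.
    if missing_count >= 3:
        return "MAJOR REVISION NEEDED"
    # Case 3 covers one missing section plus issues; treating any other
    # shortfall the same way is an assumption, not something the spec states.
    if missing_count > 0 or open_issues > 0:
        return "NEEDS REVISION"
    # Stated in Case 1: all sections present and consistent.
    return "APPROVED"
```

For the three fixtures in this spec, that gives `expected_verdict(0, 0)` for Case 1, `expected_verdict(4, 0)` for Case 2, and `expected_verdict(1, 1)` for Case 3.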
Case 4: Edge Case — File not found
Fixture:
- The path provided does not exist in the project
Input: /design-review design/gdd/nonexistent.md
Expected behavior:
- Skill attempts to read the file
- The read fails because the file does not exist
- Skill outputs an error message naming the missing file
- Skill suggests checking the path or listing files in design/gdd/
- Skill does NOT produce a verdict
Assertions:
- Skill outputs a clear error when the file is not found
- Skill does NOT output APPROVED, NEEDS REVISION, or MAJOR REVISION NEEDED when file is missing
- Skill suggests a corrective action (check path, list available GDDs)
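Two of these assertions are mechanical enough to check with plain string matching. The sketch below is a hypothetical helper, not part of /skill-test; it assumes the skill's output is available as a single string. Whether the suggested corrective action is adequate is left to the evaluator, since that is a judgment call.

```python
VERDICTS = ("APPROVED", "NEEDS REVISION", "MAJOR REVISION NEEDED")


def check_missing_file_output(output: str, path: str) -> list[str]:
    """Return a list of assertion failures for the file-not-found case."""
    failures = []
    # The error message should name the file that could not be read.
    if path not in output:
        failures.append("error does not name the missing file")
    # No verdict keyword may appear when there was nothing to review.
    leaked = [verdict for verdict in VERDICTS if verdict in output]
    if leaked:
        failures.append("verdict emitted for missing file: " + ", ".join(leaked))
    return failures
```

An empty list means both checks passed.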
Protocol Compliance
- Does NOT use Write or Edit tools (read-only skill)
- Presents complete findings before any verdict
- Does not ask for approval before producing output (no writes to approve)
- Ends with recommended next step (e.g., fix issues and re-run, or proceed to
/map-systems)
Coverage Notes
- Cross-system consistency checking (Case 3 in the skill's own phase list) is
not directly tested here because it requires multiple GDD files to compare;
this is covered by the /review-all-gdds spec instead.
- The skill's context: fork behavior (running as a subagent) is not tested at the spec level; this is a runtime behavior verified manually.
- Performance and edge cases involving very large GDD files are not in scope.