Add /skill-test suite: linter, behavioral specs, and coverage catalog for 52 skills

- New skill: /skill-test (static | spec | audit modes) - static: 7-check structural linter per skill file - spec: Claude-evaluated behavioral assertions against test specs - audit: coverage report across all 52 skills with priority gaps - New hook: validate-skill-change.sh — advisory reminder to lint after skill edits - New template: skill-test-spec.md — standard structure for authoring test specs - New: tests/skills/catalog.yaml — machine-readable coverage index (52 skills) - New: tests/skills/_fixtures/ — shared fixtures (complete concept, incomplete GDD) - New: 4 seed test specs for critical gate skills (gate-check, design-review, story-readiness, story-done) — 4 cases each - Modified: settings.json — validate-skill-change.sh added to PostToolUse hook Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-27 04:51:46 +00:00 · 2026-03-13 17:05:08 +11:00
parent cdb1aa83b7
commit af2b864796
11 changed files with 1587 additions and 0 deletions
--- a/.claude/docs/templates/skill-test-spec.md
+++ b/.claude/docs/templates/skill-test-spec.md
@@ -0,0 +1,96 @@
+# Skill Test Spec: /[skill-name]
+
+## Skill Summary
+
+[One paragraph: what this skill does, when to use it, what it produces. Include
+the primary output artifact, the verdict format it uses, and which pipeline stage
+it belongs to.]
+
+---
+
+## Static Assertions (Structural)
+
+Verified automatically by `/skill-test static` — no fixture needed.
+
+- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
+- [ ] Has ≥2 phase headings (## Phase N or numbered ## sections)
+- [ ] Contains verdict keywords: [list the ones expected, e.g., PASS, FAIL, CONCERNS]
+- [ ] Contains "May I write" collaborative protocol language (if skill writes files)
+- [ ] Has a next-step handoff at the end
+
+---
+
+## Test Cases
+
+### Case 1: Happy Path — [short description]
+
+**Fixture:** [Describe the assumed project state. Which files exist? What do they
+contain? E.g., "game-concept.md exists with all 8 required sections complete.
+systems-index.md exists. All MVP GDDs are present and individually reviewed."]
+
+**Input:** `/[skill-name] [args]`
+
+**Expected behavior:**
+1. [Phase 1 action — what the skill should read or check]
+2. [Phase 2 action — what the skill should evaluate]
+3. [Phase N action — what the skill should output]
+
+**Assertions:**
+- [ ] Skill reads [specific file] before producing output
+- [ ] Output includes verdict keyword [PASS/FAIL/etc.]
+- [ ] Output lists [specific content] from the fixture
+- [ ] Skill asks for approval before writing any file
+
+---
+
+### Case 2: Failure Path — [short description, e.g., "Missing required artifact"]
+
+**Fixture:** [Describe the failure state. E.g., "game-concept.md is missing.
+No files exist in design/gdd/."]
+
+**Input:** `/[skill-name] [args]`
+
+**Expected behavior:**
+1. [Phase 1: skill detects missing file]
+2. [Phase 2: skill surfaces the gap rather than assuming OK]
+3. [Output: FAIL or BLOCKED verdict with specific blocker named]
+
+**Assertions:**
+- [ ] Skill does NOT output PASS when the fixture is incomplete
+- [ ] Skill names the specific missing artifact
+- [ ] Skill suggests a remediation action (e.g., "Run /[other-skill]")
+- [ ] Skill does not create files to fill in the gap without asking
+
+---
+
+### Case 3: Edge Case — [short description, e.g., "No argument provided"]
+
+**Fixture:** [State of project files for this case]
+
+**Input:** `/[skill-name]` (no argument)
+
+**Expected behavior:**
+1. [What the skill should do when invoked without arguments]
+
+**Assertions:**
+- [ ] [assertion]
+
+---
+
+## Protocol Compliance
+
+- [ ] Uses "May I write" before all file writes
+- [ ] Presents findings or report before asking for write approval
+- [ ] Ends with a recommended next step or follow-up skill
+- [ ] Never auto-creates files without explicit user approval
+- [ ] Does not skip phases or jump straight to a verdict without checking
+
+---
+
+## Coverage Notes
+
+[Document what is intentionally NOT tested in this spec and why. Examples:
+- "Case 3 (all-mode) is not covered because it runs too many checks to evaluate
+  in a single spec — test each sub-mode individually."
+- "The database integration path is not covered as it requires a live environment."
+- "Edge cases involving corrupted YAML files are deferred to a future spec."]
--- a/.claude/hooks/validate-skill-change.sh
+++ b/.claude/hooks/validate-skill-change.sh
@@ -0,0 +1,39 @@
+#!/bin/bash
+# Claude Code PostToolUse hook: Advises running skill-test after skill file changes
+# Fires when any file inside .claude/skills/ is written or edited.
+#
+# Exit behavior:
+#   exit 0 = advisory only (non-blocking)
+#
+# Input schema (PostToolUse for Write|Edit):
+# { "tool_name": "Write", "tool_input": { "file_path": "...", "content": "..." } }
+
+INPUT=$(cat)
+
+# Parse file path -- use jq if available, fall back to grep
+if command -v jq >/dev/null 2>&1; then
+    FILE_PATH=$(echo "$INPUT" | jq -r '.tool_input.file_path // empty')
+else
+    FILE_PATH=$(echo "$INPUT" | grep -oE '"file_path"[[:space:]]*:[[:space:]]*"[^"]*"' | sed 's/"file_path"[[:space:]]*:[[:space:]]*"//;s/"$//')
+fi
+
+# Normalize path separators (Windows backslash to forward slash)
+FILE_PATH=$(echo "$FILE_PATH" | sed 's|\\|/|g')
+
+# Only act on files inside .claude/skills/
+if ! echo "$FILE_PATH" | grep -qE '(^|/)\.claude/skills/'; then
+    exit 0
+fi
+
+# Extract skill name from path (.claude/skills/[skill-name]/SKILL.md)
+SKILL_NAME=$(echo "$FILE_PATH" | grep -oE '\.claude/skills/[^/]+' | sed 's|\.claude/skills/||')
+
+if [ -z "$SKILL_NAME" ]; then
+    exit 0
+fi
+
+echo "=== Skill Modified: $SKILL_NAME ===" >&2
+echo "Run /skill-test static $SKILL_NAME to validate structural compliance." >&2
+echo "====================================" >&2
+
+exit 0
--- a/.claude/settings.json
+++ b/.claude/settings.json
@@ -74,6 +74,11 @@
            "type": "command",
            "command": "bash .claude/hooks/validate-assets.sh",
            "timeout": 10
+          },
+          {
+            "type": "command",
+            "command": "bash .claude/hooks/validate-skill-change.sh",
+            "timeout": 5
          }
        ]
      }
--- a/.claude/skills/skill-test/SKILL.md
+++ b/.claude/skills/skill-test/SKILL.md
@@ -0,0 +1,290 @@
+---
+name: skill-test
+description: "Validate skill files for structural compliance and behavioral correctness. Three modes: static (linter), spec (behavioral), audit (coverage report)."
+argument-hint: "static [skill-name | all] | spec [skill-name] | audit"
+user-invocable: true
+allowed-tools: Read, Glob, Grep, Write
+context: fork
+---
+
+# Skill Test
+
+Validates `.claude/skills/*/SKILL.md` files for structural compliance and
+behavioral correctness. No external dependencies — runs entirely within the
+existing skill/hook/template architecture.
+
+**Three modes:**
+
+| Mode | Command | Purpose | Token Cost |
+|------|---------|---------|------------|
+| `static` | `/skill-test static [name\|all]` | Structural linter — 7 compliance checks per skill | Low (~1k/skill) |
+| `spec` | `/skill-test spec [name]` | Behavioral verifier — evaluates assertions in test spec | Medium (~5k/skill) |
+| `audit` | `/skill-test audit` | Coverage report — which skills have specs, last test dates | Low (~2k total) |
+
+---
+
+## Phase 1: Parse Arguments
+
+Determine mode from the first argument:
+
+- `static [name]` → run 7 structural checks on one skill
+- `static all` → run 7 structural checks on all skills (Glob `.claude/skills/*/SKILL.md`)
+- `spec [name]` → read skill + test spec, evaluate assertions
+- `audit` (or no argument) → read catalog, list all skills, show coverage
+
+If argument is missing or unrecognized, output usage and stop.
+
+---
+
+## Phase 2A: Static Mode — Structural Linter
+
+For each skill being tested, read its `SKILL.md` fully and run all 7 checks:
+
+### Check 1 — Required Frontmatter Fields
+The file must contain all of these in the YAML frontmatter block:
+- `name:`
+- `description:`
+- `argument-hint:`
+- `user-invocable:`
+- `allowed-tools:`
+
+**FAIL** if any are absent.
+
+### Check 2 — Multiple Phases
+The skill must have ≥2 numbered phase headings. Look for patterns like:
+- `## Phase N` or `## Phase N:`
+- `## N.` (numbered top-level sections)
+- At least 2 distinct `##` headings if phases aren't explicitly numbered
+
+**FAIL** if fewer than 2 phase-like headings are found.
+
+### Check 3 — Verdict Keywords
+The skill must contain at least one of: `PASS`, `FAIL`, `CONCERNS`, `APPROVED`,
+`BLOCKED`, `COMPLETE`, `READY`, `COMPLIANT`, `NON-COMPLIANT`
+
+**FAIL** if none are present.
+
+### Check 4 — Collaborative Protocol Language
+The skill must contain ask-before-write language. Look for:
+- `"May I write"` (canonical form)
+- `"before writing"` or `"approval"` near file-write instructions
+- `"ask"` + `"write"` in close proximity (within same section)
+
+**WARN** if absent (some read-only skills legitimately skip this).
+**FAIL** if `allowed-tools` includes `Write` or `Edit` but no ask-before-write language is found.
+
+### Check 5 — Next-Step Handoff
+The skill must end with a recommended next action or follow-up path. Look for:
+- A final section mentioning another skill (e.g., `/story-done`, `/gate-check`)
+- "Recommended next" or "next step" phrasing
+- A "Follow-Up" or "After this" section
+
+**WARN** if absent.
+
+### Check 6 — Fork Context Complexity
+If frontmatter contains `context: fork`, the skill should have ≥5 phase headings
+(`##` level or numbered Phase N headers). Fork context is for complex multi-phase
+skills; simple skills should not use it.
+
+**WARN** if `context: fork` is set but fewer than 5 phases found.
+
+### Check 7 — Argument Hint Plausibility
+`argument-hint` must be non-empty. If the skill body mentions multiple modes
+(e.g., "Mode A | Mode B"), the hint should reflect them. Cross-reference the
+hint against the first phase's "Parse Arguments" section.
+
+**WARN** if hint is `""` or if documented modes don't match hint.
+
+---
+
+### Static Mode Output Format
+
+For a single skill:
+```
+=== Skill Static Check: /[name] ===
+
+Check 1 — Frontmatter Fields:    PASS
+Check 2 — Multiple Phases:       PASS (7 phases found)
+Check 3 — Verdict Keywords:      PASS (PASS, FAIL, CONCERNS)
+Check 4 — Collaborative Protocol: PASS ("May I write" found)
+Check 5 — Next-Step Handoff:     WARN (no follow-up section found)
+Check 6 — Fork Context Complexity: PASS (8 phases, context: fork set)
+Check 7 — Argument Hint:         PASS
+
+Verdict: WARNINGS (1 warning, 0 failures)
+Recommended: Add a "Follow-Up Actions" section at the end of the skill.
+```
+
+For `static all`, produce a summary table then list any non-compliant skills:
+```
+=== Skill Static Check: All 52 Skills ===
+
+Skill                  | Result       | Issues
+-----------------------|--------------|-------
+gate-check             | COMPLIANT    |
+design-review          | COMPLIANT    |
+story-readiness        | WARNINGS     | Check 5: no handoff
+...
+
+Summary: 48 COMPLIANT, 3 WARNINGS, 1 NON-COMPLIANT
+Aggregate Verdict: N WARNINGS / N FAILURES
+```
+
+---
+
+## Phase 2B: Spec Mode — Behavioral Verifier
+
+### Step 1 — Locate Files
+
+Find skill at `.claude/skills/[name]/SKILL.md`.
+Find spec at `tests/skills/[name].md`.
+
+If either is missing:
+- Missing skill: "Skill '[name]' not found in `.claude/skills/`."
+- Missing spec: "No test spec found for '[name]'. Run `/skill-test audit` to see
+  coverage gaps, or create a spec using the template at
+  `.claude/docs/templates/skill-test-spec.md`."
+
+### Step 2 — Read Both Files
+
+Read the skill file and test spec file completely.
+
+### Step 3 — Evaluate Assertions
+
+For each **Test Case** in the spec:
+
+1. Read the **Fixture** description (assumed state of project files)
+2. Read the **Expected behavior** steps
+3. Read each **Assertion** checkbox
+
+For each assertion, evaluate whether the skill's written instructions, if
+followed correctly given the fixture state, would satisfy it. This is a
+Claude-evaluated reasoning check, not code execution.
+
+Mark each assertion:
+- **PASS** — skill instructions clearly satisfy this assertion
+- **PARTIAL** — skill instructions partially address it, but with ambiguity
+- **FAIL** — skill instructions would NOT satisfy this assertion given the fixture
+
+For **Protocol Compliance** assertions (always present):
+- Check whether the skill requires "May I write" before file writes
+- Check whether the skill presents findings before requesting approval
+- Check whether the skill ends with a recommended next step
+- Check whether the skill avoids auto-creating files without approval
+
+### Step 4 — Build Report
+
+```
+=== Skill Spec Test: /[name] ===
+Date: [date]
+Spec: tests/skills/[name].md
+
+Case 1: [Happy Path — name]
+  Fixture: [summary]
+  Assertions:
+    [PASS] [assertion text]
+    [FAIL] [assertion text]
+       Reason: The skill's Phase 3 says "..." but the fixture state means "..."
+  Case Verdict: FAIL
+
+Case 2: [Edge Case — name]
+  ...
+  Case Verdict: PASS
+
+Protocol Compliance:
+  [PASS] Uses "May I write" before file writes
+  [PASS] Presents findings before asking approval
+  [WARN] No explicit next-step handoff at end
+
+Overall Verdict: FAIL (1 case failed, 1 warning)
+```
+
+### Step 5 — Offer to Write Results
+
+"May I write these results to `tests/results/skill-test-spec-[name]-[date].md`
+and update `tests/skills/catalog.yaml`?"
+
+If yes:
+- Write results file to `tests/results/`
+- Update the skill's entry in `tests/skills/catalog.yaml`:
+  - `last_spec: [date]`
+  - `last_spec_result: PASS|PARTIAL|FAIL`
+
+---
+
+## Phase 2C: Audit Mode — Coverage Report
+
+### Step 1 — Read Catalog
+
+Read `tests/skills/catalog.yaml`. If missing, note that catalog doesn't exist
+yet (first-run state).
+
+### Step 2 — Enumerate All Skills
+
+Glob `.claude/skills/*/SKILL.md` to get the complete list of skills.
+Extract skill name from each path (directory name).
+
+### Step 3 — Build Coverage Table
+
+For each skill:
+- Check if a spec file exists at `tests/skills/[name].md`
+- Look up `last_static`, `last_static_result`, `last_spec`, `last_spec_result`
+  from catalog (or mark as "never" if not in catalog)
+- Assign priority:
+  - `critical` — gate-check, design-review, story-readiness, story-done, review-all-gdds, architecture-review
+  - `high` — create-epics-stories, create-control-manifest, propagate-design-change, story-done
+  - `medium` — team-* skills, sprint-plan, sprint-status
+  - `low` — all others
+
+### Step 4 — Output Report
+
+```
+=== Skill Test Coverage Audit ===
+Date: [date]
+Total skills: 52
+Specs written: 4 (7.7%)
+Never tested (static): 48
+
+Coverage Table:
+Skill                  | Has Spec | Last Static      | Static Result | Last Spec        | Spec Result | Priority
+-----------------------|----------|------------------|---------------|------------------|-------------|----------
+gate-check             | YES      | never            | —             | never            | —           | critical
+design-review          | YES      | never            | —             | never            | —           | critical
+story-readiness        | YES      | never            | —             | never            | —           | critical
+story-done             | YES      | never            | —             | never            | —           | critical
+architecture-review    | NO       | never            | —             | never            | —           | critical
+review-all-gdds        | NO       | never            | —             | never            | —           | critical
+...
+
+Top 5 Priority Gaps (no spec, critical/high priority):
+1. /architecture-review — critical, no spec
+2. /review-all-gdds — critical, no spec
+3. /create-epics-stories — high, no spec
+4. /propagate-design-change — high, no spec
+5. /sprint-plan — medium, no spec
+
+Coverage: 4/52 specs (7.7%)
+```
+
+No file writes in audit mode.
+
+Offer: "Would you like to run `/skill-test static all` to check structural
+compliance across all skills? Or `/skill-test spec [name]` to run a specific
+behavioral test?"
+
+---
+
+## Phase 3: Recommended Next Steps
+
+After any mode completes, offer contextual follow-up:
+
+- After `static [name]`: "Run `/skill-test spec [name]` to validate behavioral
+  correctness if a test spec exists."
+- After `static all` with failures: "Address NON-COMPLIANT skills first. Run
+  `/skill-test static [name]` individually for detailed remediation guidance."
+- After `spec [name]` PASS: "Update `tests/skills/catalog.yaml` to record this
+  pass date. Consider running `/skill-test audit` to find the next spec gap."
+- After `spec [name]` FAIL: "Review the failing assertions and update the skill
+  or the test spec to resolve the mismatch."
+- After `audit`: "Start with the critical-priority gaps. Use the spec template
+  at `.claude/docs/templates/skill-test-spec.md` to create new specs."