Files
Claude-Code-Game-Studios/.claude/skills/skill-test/SKILL.md
Donchitos 9f731f3d86 Delete /create-epics-stories and update all references
Removes the deprecated skill entirely and updates every reference across:
- README.md, UPGRADING.md (skill count + file lists)
- docs/WORKFLOW-GUIDE.md (pipeline diagrams, step descriptions, command reference)
- docs/examples/skill-flow-diagrams.md, session-*.md
- docs/architecture/tr-registry.yaml
- tests/skills/catalog.yaml
- .claude/skills: adopt, content-audit, create-control-manifest,
  skill-test, story-readiness
- .claude/docs: skills-reference.md, quick-start.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-16 15:04:15 +11:00

10 KiB

name, description, argument-hint, user-invocable, allowed-tools, context
name description argument-hint user-invocable allowed-tools context
skill-test Validate skill files for structural compliance and behavioral correctness. Three modes: static (linter), spec (behavioral), audit (coverage report). static [skill-name | all] | spec [skill-name] | audit true Read, Glob, Grep, Write fork

Skill Test

Validates .claude/skills/*/SKILL.md files for structural compliance and behavioral correctness. No external dependencies — runs entirely within the existing skill/hook/template architecture.

Three modes:

Mode Command Purpose Token Cost
static /skill-test static [name|all] Structural linter — 7 compliance checks per skill Low (~1k/skill)
spec /skill-test spec [name] Behavioral verifier — evaluates assertions in test spec Medium (~5k/skill)
audit /skill-test audit Coverage report — which skills have specs, last test dates Low (~2k total)

Phase 1: Parse Arguments

Determine mode from the first argument:

  • static [name] → run 7 structural checks on one skill
  • static all → run 7 structural checks on all skills (Glob .claude/skills/*/SKILL.md)
  • spec [name] → read skill + test spec, evaluate assertions
  • audit (or no argument) → read catalog, list all skills, show coverage

If argument is missing or unrecognized, output usage and stop.


Phase 2A: Static Mode — Structural Linter

For each skill being tested, read its SKILL.md fully and run all 7 checks:

Check 1 — Required Frontmatter Fields

The file must contain all of these in the YAML frontmatter block:

  • name:
  • description:
  • argument-hint:
  • user-invocable:
  • allowed-tools:

FAIL if any are absent.

Check 2 — Multiple Phases

The skill must have ≥2 numbered phase headings. Look for patterns like:

  • ## Phase N or ## Phase N:
  • ## N. (numbered top-level sections)
  • At least 2 distinct ## headings if phases aren't explicitly numbered

FAIL if fewer than 2 phase-like headings are found.

Check 3 — Verdict Keywords

The skill must contain at least one of: PASS, FAIL, CONCERNS, APPROVED, BLOCKED, COMPLETE, READY, COMPLIANT, NON-COMPLIANT

FAIL if none are present.

Check 4 — Collaborative Protocol Language

The skill must contain ask-before-write language. Look for:

  • "May I write" (canonical form)
  • "before writing" or "approval" near file-write instructions
  • "ask" + "write" in close proximity (within same section)

WARN if absent (some read-only skills legitimately skip this). FAIL if allowed-tools includes Write or Edit but no ask-before-write language is found.

Check 5 — Next-Step Handoff

The skill must end with a recommended next action or follow-up path. Look for:

  • A final section mentioning another skill (e.g., /story-done, /gate-check)
  • "Recommended next" or "next step" phrasing
  • A "Follow-Up" or "After this" section

WARN if absent.

Check 6 — Fork Context Complexity

If frontmatter contains context: fork, the skill should have ≥5 phase headings (## level or numbered Phase N headers). Fork context is for complex multi-phase skills; simple skills should not use it.

WARN if context: fork is set but fewer than 5 phases found.

Check 7 — Argument Hint Plausibility

argument-hint must be non-empty. If the skill body mentions multiple modes (e.g., "Mode A | Mode B"), the hint should reflect them. Cross-reference the hint against the first phase's "Parse Arguments" section.

WARN if hint is "" or if documented modes don't match hint.


Static Mode Output Format

For a single skill:

=== Skill Static Check: /[name] ===

Check 1 — Frontmatter Fields:    PASS
Check 2 — Multiple Phases:       PASS (7 phases found)
Check 3 — Verdict Keywords:      PASS (PASS, FAIL, CONCERNS)
Check 4 — Collaborative Protocol: PASS ("May I write" found)
Check 5 — Next-Step Handoff:     WARN (no follow-up section found)
Check 6 — Fork Context Complexity: PASS (8 phases, context: fork set)
Check 7 — Argument Hint:         PASS

Verdict: WARNINGS (1 warning, 0 failures)
Recommended: Add a "Follow-Up Actions" section at the end of the skill.

For static all, produce a summary table then list any non-compliant skills:

=== Skill Static Check: All 52 Skills ===

Skill                  | Result       | Issues
-----------------------|--------------|-------
gate-check             | COMPLIANT    |
design-review          | COMPLIANT    |
story-readiness        | WARNINGS     | Check 5: no handoff
...

Summary: 48 COMPLIANT, 3 WARNINGS, 1 NON-COMPLIANT
Aggregate Verdict: N WARNINGS / N FAILURES

Phase 2B: Spec Mode — Behavioral Verifier

Step 1 — Locate Files

Find skill at .claude/skills/[name]/SKILL.md. Find spec at tests/skills/[name].md.

If either is missing:

  • Missing skill: "Skill '[name]' not found in .claude/skills/."
  • Missing spec: "No test spec found for '[name]'. Run /skill-test audit to see coverage gaps, or create a spec using the template at .claude/docs/templates/skill-test-spec.md."

Step 2 — Read Both Files

Read the skill file and test spec file completely.

Step 3 — Evaluate Assertions

For each Test Case in the spec:

  1. Read the Fixture description (assumed state of project files)
  2. Read the Expected behavior steps
  3. Read each Assertion checkbox

For each assertion, evaluate whether the skill's written instructions, if followed correctly given the fixture state, would satisfy it. This is a Claude-evaluated reasoning check, not code execution.

Mark each assertion:

  • PASS — skill instructions clearly satisfy this assertion
  • PARTIAL — skill instructions partially address it, but with ambiguity
  • FAIL — skill instructions would NOT satisfy this assertion given the fixture

For Protocol Compliance assertions (always present):

  • Check whether the skill requires "May I write" before file writes
  • Check whether the skill presents findings before requesting approval
  • Check whether the skill ends with a recommended next step
  • Check whether the skill avoids auto-creating files without approval

Step 4 — Build Report

=== Skill Spec Test: /[name] ===
Date: [date]
Spec: tests/skills/[name].md

Case 1: [Happy Path — name]
  Fixture: [summary]
  Assertions:
    [PASS] [assertion text]
    [FAIL] [assertion text]
       Reason: The skill's Phase 3 says "..." but the fixture state means "..."
  Case Verdict: FAIL

Case 2: [Edge Case — name]
  ...
  Case Verdict: PASS

Protocol Compliance:
  [PASS] Uses "May I write" before file writes
  [PASS] Presents findings before asking approval
  [WARN] No explicit next-step handoff at end

Overall Verdict: FAIL (1 case failed, 1 warning)

Step 5 — Offer to Write Results

"May I write these results to tests/results/skill-test-spec-[name]-[date].md and update tests/skills/catalog.yaml?"

If yes:

  • Write results file to tests/results/
  • Update the skill's entry in tests/skills/catalog.yaml:
    • last_spec: [date]
    • last_spec_result: PASS|PARTIAL|FAIL

Phase 2C: Audit Mode — Coverage Report

Step 1 — Read Catalog

Read tests/skills/catalog.yaml. If missing, note that catalog doesn't exist yet (first-run state).

Step 2 — Enumerate All Skills

Glob .claude/skills/*/SKILL.md to get the complete list of skills. Extract skill name from each path (directory name).

Step 3 — Build Coverage Table

For each skill:

  • Check if a spec file exists at tests/skills/[name].md
  • Look up last_static, last_static_result, last_spec, last_spec_result from catalog (or mark as "never" if not in catalog)
  • Assign priority:
    • critical — gate-check, design-review, story-readiness, story-done, review-all-gdds, architecture-review
    • high — create-epics, create-stories, dev-story, create-control-manifest, propagate-design-change, story-done
    • medium — team-* skills, sprint-plan, sprint-status
    • low — all others

Step 4 — Output Report

=== Skill Test Coverage Audit ===
Date: [date]
Total skills: 52
Specs written: 4 (7.7%)
Never tested (static): 48

Coverage Table:
Skill                  | Has Spec | Last Static      | Static Result | Last Spec        | Spec Result | Priority
-----------------------|----------|------------------|---------------|------------------|-------------|----------
gate-check             | YES      | never            | —             | never            | —           | critical
design-review          | YES      | never            | —             | never            | —           | critical
story-readiness        | YES      | never            | —             | never            | —           | critical
story-done             | YES      | never            | —             | never            | —           | critical
architecture-review    | NO       | never            | —             | never            | —           | critical
review-all-gdds        | NO       | never            | —             | never            | —           | critical
...

Top 5 Priority Gaps (no spec, critical/high priority):
1. /architecture-review — critical, no spec
2. /review-all-gdds — critical, no spec
3. /create-epics — high, no spec
4. /create-stories — high, no spec
5. /dev-story — high, no spec
4. /propagate-design-change — high, no spec
5. /sprint-plan — medium, no spec

Coverage: 4/52 specs (7.7%)

No file writes in audit mode.

Offer: "Would you like to run /skill-test static all to check structural compliance across all skills? Or /skill-test spec [name] to run a specific behavioral test?"


After any mode completes, offer contextual follow-up:

  • After static [name]: "Run /skill-test spec [name] to validate behavioral correctness if a test spec exists."
  • After static all with failures: "Address NON-COMPLIANT skills first. Run /skill-test static [name] individually for detailed remediation guidance."
  • After spec [name] PASS: "Update tests/skills/catalog.yaml to record this pass date. Consider running /skill-test audit to find the next spec gap."
  • After spec [name] FAIL: "Review the failing assertions and update the skill or the test spec to resolve the mismatch."
  • After audit: "Start with the critical-priority gaps. Use the spec template at .claude/docs/templates/skill-test-spec.md to create new specs."