Donchitos 984023ddac Release v1.0.0 — concept-prototype/vertical-slice split, workflow restructure, polish (#50)
2026-05-13 20:15:08 +10:00


---
name: skill-test
description: "Validate skill files for structural compliance and behavioral correctness. Four modes: static (linter), spec (behavioral), category (rubric), audit (coverage report)."
argument-hint: "static [skill-name | all] | spec [skill-name] | category [skill-name | all] | audit"
user-invocable: true
allowed-tools: Read, Glob, Grep, Write
model: sonnet
---

Skill Test

Validates .claude/skills/*/SKILL.md files for structural compliance and behavioral correctness. No external dependencies — runs entirely within the existing skill/hook/template architecture.

Four modes:

Mode     | Command                         | Purpose                                                             | Token Cost
---------|---------------------------------|---------------------------------------------------------------------|-----------------
static   | /skill-test static [name|all]   | Structural linter: 7 compliance checks per skill                    | Low (~1k/skill)
spec     | /skill-test spec [name]         | Behavioral verifier: evaluates assertions in test spec              | Medium (~5k/skill)
category | /skill-test category [name|all] | Category rubric: checks skill against its category-specific metrics | Low (~2k/skill)
audit    | /skill-test audit               | Coverage report: skills, agent specs, last test dates               | Low (~3k total)

Phase 1: Parse Arguments

Determine mode from the first argument:

  • static [name] → run 7 structural checks on one skill
  • static all → run 7 structural checks on all skills (Glob .claude/skills/*/SKILL.md)
  • spec [name] → read skill + test spec, evaluate assertions
  • category [name] → run category-specific rubric from CCGS Skill Testing Framework/quality-rubric.md
  • category all → run category rubric for every skill that has a category: in catalog
  • audit (or no argument) → read catalog, list all skills and agents, show coverage

If the argument is unrecognized, output usage and stop. (A missing argument runs audit, as listed above.)
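The skill is carried out by Claude reading these instructions, not by a script, but the dispatch above can be sketched in code for clarity. This is a minimal sketch, assuming (as the list states) that a bare /skill-test runs audit; the function name `parse_mode` and the rule that a mode given without its skill name counts as a usage error are illustrative assumptions, not part of the skill.

```python
from typing import Optional

USAGE = "Usage: /skill-test static [name|all] | spec [name] | category [name|all] | audit"

def parse_mode(args: list[str]) -> tuple[str, Optional[str]]:
    """Return (mode, target) for a /skill-test invocation (hypothetical helper)."""
    if not args:
        return ("audit", None)  # no argument falls back to audit
    mode = args[0]
    target = args[1] if len(args) > 1 else None
    if mode == "audit":
        return ("audit", None)
    if mode in ("static", "spec", "category") and target:
        return (mode, target)  # target is a skill name, or "all" where supported
    return ("usage", None)  # unrecognized input: caller prints USAGE and stops
```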


Phase 2A: Static Mode — Structural Linter

For each skill being tested, read its SKILL.md fully and run all 7 checks:

Check 1 — Required Frontmatter Fields

The file must contain all of these in the YAML frontmatter block:

  • name:
  • description:
  • argument-hint:
  • user-invocable:
  • allowed-tools:

FAIL if any are absent.
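As an illustration of Check 1, the sketch below collects the top-level keys between the opening and closing `---` lines and reports which required fields are missing. It assumes simple `key: value` frontmatter and deliberately avoids a full YAML parser; `frontmatter_keys` and `check_frontmatter` are hypothetical names.

```python
REQUIRED_FIELDS = ("name", "description", "argument-hint", "user-invocable", "allowed-tools")

def frontmatter_keys(text: str) -> set[str]:
    """Collect top-level keys from a leading '---' delimited frontmatter block."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return set()  # no frontmatter block at all
    keys: set[str] = set()
    for line in lines[1:]:
        if line.strip() == "---":
            break  # closing delimiter reached
        # Skip indented continuation lines and comments; keep "key: value" lines.
        if ":" in line and not line.startswith((" ", "\t", "#")):
            keys.add(line.split(":", 1)[0].strip())
    return keys

def check_frontmatter(text: str) -> tuple[str, list[str]]:
    """Return ("PASS" or "FAIL", list of missing required fields)."""
    present = frontmatter_keys(text)
    missing = [field for field in REQUIRED_FIELDS if field not in present]
    return ("FAIL" if missing else "PASS", missing)
```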

Check 2 — Multiple Phases

The skill must have ≥2 phase headings, whether or not they are explicitly numbered. Look for patterns like:

  • ## Phase N or ## Phase N:
  • ## N. (numbered top-level sections)
  • At least 2 distinct ## headings if phases aren't explicitly numbered

FAIL if fewer than 2 phase-like headings are found.
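One way to approximate Check 2 is with two regular expressions: one for explicitly numbered headings and a fallback that counts any `##` heading. This is a sketch under the assumption that phases are always `##`-level headings; the pattern details (for example, accepting a letter suffix as in "Phase 2A") are illustrative choices.

```python
import re

# "## Phase 1", "## Phase 2A:", or "## 3." style headings.
PHASE_RE = re.compile(r"^## (?:Phase \d+\w*\b|\d+\.)", re.MULTILINE)
# Any second-level heading, used when phases are not explicitly numbered.
H2_RE = re.compile(r"^## \S", re.MULTILINE)

def check_phases(body: str) -> tuple[str, int]:
    """Return ("PASS" or "FAIL", number of phase-like headings counted)."""
    numbered = len(PHASE_RE.findall(body))
    count = numbered if numbered >= 2 else len(H2_RE.findall(body))
    return ("PASS" if count >= 2 else "FAIL", count)
```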

Check 3 — Verdict Keywords

The skill must contain at least one of: PASS, FAIL, CONCERNS, APPROVED, BLOCKED, COMPLETE, READY, COMPLIANT, NON-COMPLIANT

FAIL if none are present.
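A keyword scan like Check 3 has one subtlety: NON-COMPLIANT contains COMPLIANT, and words such as "passed" contain a keyword in a different case. The sketch below assumes matching is case-sensitive and whole-word; both are assumptions, since the check itself does not say.

```python
import re

# Longest keyword first so NON-COMPLIANT is not reported as COMPLIANT.
VERDICT_KEYWORDS = (
    "NON-COMPLIANT", "COMPLIANT", "PASS", "FAIL", "CONCERNS",
    "APPROVED", "BLOCKED", "COMPLETE", "READY",
)
VERDICT_RE = re.compile(
    r"(?<![\w-])(?:" + "|".join(VERDICT_KEYWORDS) + r")(?![\w-])"
)

def check_verdicts(body: str) -> tuple[str, list[str]]:
    """Return ("PASS" or "FAIL", sorted list of distinct keywords found)."""
    found = sorted(set(VERDICT_RE.findall(body)))
    return ("PASS" if found else "FAIL", found)
```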

Check 4 — Collaborative Protocol Language

The skill must contain ask-before-write language. Look for:

  • "May I write" (canonical form)
  • "before writing" or "approval" near file-write instructions
  • "ask" + "write" in close proximity (within same section)

FAIL if allowed-tools includes Write or Edit but no ask-before-write language is found. WARN if the language is absent and the skill has no write tools (some read-only skills legitimately skip this).
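The three outcomes of Check 4 depend on two inputs: whether ask-before-write language is present, and whether the skill is allowed to write. A simplified sketch follows; it only searches for the listed phrases anywhere in the body and does not model the "near file-write instructions" or same-section proximity rules, which need a reader's judgment.

```python
import re

# Phrases taken from the list above; proximity rules are intentionally omitted.
ASK_PATTERNS = (r"May I write", r"before writing", r"\bapproval\b")

def check_collaborative(body: str, allowed_tools: list[str]) -> str:
    """Return "PASS", "WARN", or "FAIL" for the ask-before-write check."""
    has_ask = any(re.search(p, body, re.IGNORECASE) for p in ASK_PATTERNS)
    if has_ask:
        return "PASS"
    can_write = any(tool in ("Write", "Edit") for tool in allowed_tools)
    return "FAIL" if can_write else "WARN"
```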

Check 5 — Next-Step Handoff

The skill must end with a recommended next action or follow-up path. Look for:

  • A final section mentioning another skill (e.g., /story-done, /gate-check)
  • "Recommended next" or "next step" phrasing
  • A "Follow-Up" or "After this" section

WARN if absent.

Check 6 — Fork Context Complexity

If frontmatter contains context: fork, the skill should have ≥5 phase headings (## level or numbered Phase N headers). Fork context is for complex multi-phase skills; simple skills should not use it.

WARN if context: fork is set but fewer than 5 phases found.
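Check 6 is a conditional rule, which a short sketch makes explicit. It assumes the frontmatter has already been parsed into a dictionary of strings and counts only `##` headings; `check_fork_context` is a hypothetical name.

```python
import re

H2_RE = re.compile(r"^## \S", re.MULTILINE)

def check_fork_context(frontmatter: dict[str, str], body: str) -> str:
    """Return "PASS" or "WARN" for the fork-context complexity check."""
    if frontmatter.get("context") != "fork":
        return "PASS"  # the check only applies to skills that set context: fork
    phases = len(H2_RE.findall(body))
    return "PASS" if phases >= 5 else "WARN"
```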

Check 7 — Argument Hint Plausibility

argument-hint must be non-empty. If the skill body mentions multiple modes (e.g., "Mode A | Mode B"), the hint should reflect them. Cross-reference the hint against the first phase's "Parse Arguments" section.

WARN if hint is "" or if documented modes don't match hint.


Static Mode Output Format

For a single skill:

=== Skill Static Check: /[name] ===

Check 1 — Frontmatter Fields:    PASS
Check 2 — Multiple Phases:       PASS (7 phases found)
Check 3 — Verdict Keywords:      PASS (PASS, FAIL, CONCERNS)
Check 4 — Collaborative Protocol: PASS ("May I write" found)
Check 5 — Next-Step Handoff:     WARN (no follow-up section found)
Check 6 — Fork Context Complexity: PASS (8 phases, context: fork set)
Check 7 — Argument Hint:         PASS

Verdict: WARNINGS (1 warning, 0 failures)
Recommended: Add a "Follow-Up Actions" section at the end of the skill.

For static all, produce a summary table then list any non-compliant skills:

=== Skill Static Check: All 52 Skills ===

Skill                  | Result       | Issues
-----------------------|--------------|-------
gate-check             | COMPLIANT    |
design-review          | COMPLIANT    |
story-readiness        | WARNINGS     | Check 5: no handoff
...

Summary: 48 COMPLIANT, 3 WARNINGS, 1 NON-COMPLIANT
Aggregate Verdict: N WARNINGS / N FAILURES
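The per-skill result column and the summary line can be derived from the individual check outcomes. The mapping below (any FAIL gives NON-COMPLIANT, otherwise any WARN gives WARNINGS) is inferred from the sample output rather than stated outright, so treat it as an assumption.

```python
from collections import Counter

def skill_verdict(check_results: list[str]) -> str:
    """Collapse one skill's check outcomes into a single result."""
    if "FAIL" in check_results:
        return "NON-COMPLIANT"
    if "WARN" in check_results:
        return "WARNINGS"
    return "COMPLIANT"

def summarize(results: dict[str, list[str]]) -> str:
    """Build the summary line for a static-all run."""
    counts = Counter(skill_verdict(r) for r in results.values())
    return (
        f"Summary: {counts['COMPLIANT']} COMPLIANT, "
        f"{counts['WARNINGS']} WARNINGS, "
        f"{counts['NON-COMPLIANT']} NON-COMPLIANT"
    )
```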

Phase 2B: Spec Mode — Behavioral Verifier

Step 1 — Locate Files

Find skill at .claude/skills/[name]/SKILL.md. Look up the spec path from CCGS Skill Testing Framework/catalog.yaml — use the spec: field for the matching skill entry.

If either is missing:

  • Missing skill: "Skill '[name]' not found in .claude/skills/."
  • Missing spec path in catalog: "No spec path set for '[name]' in catalog.yaml."
  • Spec file not found at path: "Spec file missing at [path]. Run /skill-test audit to see coverage gaps."

Step 2 — Read Both Files

Read the skill file and test spec file completely.

Step 3 — Evaluate Assertions

For each Test Case in the spec:

  1. Read the Fixture description (assumed state of project files)
  2. Read the Expected behavior steps
  3. Read each Assertion checkbox

For each assertion, evaluate whether the skill's written instructions, if followed correctly given the fixture state, would satisfy it. This is a Claude-evaluated reasoning check, not code execution.

Mark each assertion:

  • PASS — skill instructions clearly satisfy this assertion
  • PARTIAL — skill instructions partially address it, but with ambiguity
  • FAIL — skill instructions would NOT satisfy this assertion given the fixture

For Protocol Compliance assertions (always present):

  • Check whether the skill requires "May I write" before file writes
  • Check whether the skill presents findings before requesting approval
  • Check whether the skill ends with a recommended next step
  • Check whether the skill avoids auto-creating files without approval

Step 4 — Build Report

=== Skill Spec Test: /[name] ===
Date: [date]
Spec: CCGS Skill Testing Framework/skills/[category]/[name].md

Case 1: [Happy Path — name]
  Fixture: [summary]
  Assertions:
    [PASS] [assertion text]
    [FAIL] [assertion text]
       Reason: The skill's Phase 3 says "..." but the fixture state means "..."
  Case Verdict: FAIL

Case 2: [Edge Case — name]
  ...
  Case Verdict: PASS

Protocol Compliance:
  [PASS] Uses "May I write" before file writes
  [PASS] Presents findings before asking approval
  [WARN] No explicit next-step handoff at end

Overall Verdict: FAIL (1 case failed, 1 warning)
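The report rolls assertion marks up into case verdicts and then into an overall verdict. The exact roll-up rule is not spelled out, so the sketch below assumes "worst mark wins": any FAIL yields FAIL, otherwise any PARTIAL yields PARTIAL, otherwise PASS. Protocol-compliance warnings are left out of this sketch.

```python
def roll_up(marks: list[str]) -> str:
    """Worst mark wins: FAIL over PARTIAL over PASS (assumed rule)."""
    if "FAIL" in marks:
        return "FAIL"
    if "PARTIAL" in marks:
        return "PARTIAL"
    return "PASS"

def overall_verdict(cases: dict[str, list[str]]) -> str:
    """Roll per-case assertion marks up into one overall verdict."""
    return roll_up([roll_up(marks) for marks in cases.values()])
```

For example, a spec with one case marked PASS and FAIL and another marked PASS would be reported as FAIL overall under this rule.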

Step 5 — Offer to Write Results

"May I write these results to CCGS Skill Testing Framework/results/skill-test-spec-[name]-[date].md and update CCGS Skill Testing Framework/catalog.yaml?"

If yes:

  • Write results file to CCGS Skill Testing Framework/results/
  • Update the skill's entry in CCGS Skill Testing Framework/catalog.yaml:
    • last_spec: [date]
    • last_spec_result: PASS|PARTIAL|FAIL

Phase 2D: Category Mode — Rubric Evaluation

Step 1 — Locate Skill and Category

Find skill at .claude/skills/[name]/SKILL.md. Look up category: field in CCGS Skill Testing Framework/catalog.yaml.

If either lookup fails:

  • Skill not found: "Skill '[name]' not found."
  • No category: field: "No category assigned for '[name]' in catalog.yaml. Add category: [category] to the skill entry first."

For category all: collect all skills with a category: field and process each. Skills with category: utility are evaluated against U1 (static checks pass) and U2 (gate mode correct if applicable) only; run the static mode checks to evaluate U1.

Step 2 — Read Rubric Section

Read CCGS Skill Testing Framework/quality-rubric.md. Extract the section matching the skill's category (e.g., ### gate, ### team).
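Extracting one category's section from the rubric amounts to slicing the Markdown between its `###` heading and the next heading. A minimal sketch, assuming each section heading is exactly `### <category>` on its own line; the real layout of quality-rubric.md may differ.

```python
import re
from typing import Optional

def extract_category_section(rubric_md: str, category: str) -> Optional[str]:
    """Return the text under '### <category>' up to the next heading, or None."""
    pattern = re.compile(
        rf"^### {re.escape(category)}[ \t]*\n(.*?)(?=^#{{1,3}} |\Z)",
        re.MULTILINE | re.DOTALL,
    )
    match = pattern.search(rubric_md)
    return match.group(1).strip() if match else None
```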

Step 3 — Read Skill

Read the skill's SKILL.md fully.

Step 4 — Evaluate Rubric Metrics

For each metric in the category's rubric table:

  1. Check whether the skill's written instructions clearly satisfy the criterion
  2. Mark PASS, FAIL, or WARN
  3. For FAIL/WARN, identify the exact gap in the skill text (quote the relevant section or note its absence)

Step 5 — Output Report

=== Skill Category Check: /[name] ([category]) ===

Metric G1 — Review mode read:      PASS
Metric G2 — Full mode directors:   FAIL
  Gap: Phase 3 spawns only CD-PHASE-GATE; TD-PHASE-GATE, PR-PHASE-GATE, AD-PHASE-GATE absent
Metric G3 — Lean mode: PHASE-GATE only: PASS
Metric G4 — Solo mode: no directors:    PASS
Metric G5 — No auto-advance:       PASS

Verdict: FAIL (1 failure, 0 warnings)
Fix: Add TD-PHASE-GATE, PR-PHASE-GATE, and AD-PHASE-GATE to the full-mode director
     panel in Phase 3.

Step 6 — Offer to Update Catalog

"May I update CCGS Skill Testing Framework/catalog.yaml to record this category check (last_category, last_category_result) for [name]?"


Phase 2C: Audit Mode — Coverage Report

Step 1 — Read Catalog

Read CCGS Skill Testing Framework/catalog.yaml. If missing, note that catalog doesn't exist yet (first-run state).

Step 2 — Enumerate All Skills and Agents

Glob .claude/skills/*/SKILL.md to get the complete list of skills. Extract skill name from each path (directory name).

Also read the agents: section from CCGS Skill Testing Framework/catalog.yaml to get the complete list of agents.
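The skill enumeration in this step maps each matching path to its parent directory name. In the skill this is done with the Glob tool; an equivalent sketch using Python's `pathlib` (with `list_skill_names` as a hypothetical helper) looks like this:

```python
from pathlib import Path

def list_skill_names(root: Path) -> list[str]:
    """Return sorted skill names, one per .claude/skills/<name>/SKILL.md file."""
    return sorted(
        path.parent.name for path in root.glob(".claude/skills/*/SKILL.md")
    )
```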

Step 3 — Build Skill Coverage Table

For each skill:

  • Check if a spec file exists (use the spec: path from catalog, or glob CCGS Skill Testing Framework/skills/*/[name].md)
  • Look up last_static, last_static_result, last_spec, last_spec_result, last_category, last_category_result, category from catalog (or mark as "never" / "—" if not in catalog)
  • Priority comes from catalog priority: field (critical/high/medium/low)

Step 3b — Build Agent Coverage Table

For each agent in catalog's agents: section:

  • Check if a spec file exists (use the spec: path from catalog, or glob CCGS Skill Testing Framework/agents/*/[name].md)
  • Look up last_spec, last_spec_result, category from catalog

Step 4 — Output Report

=== Skill Test Coverage Audit ===
Date: [date]

SKILLS (72 total)
Specs written: 72 (100%) | Never static tested: 72 | Never category tested: 72

Skill                  | Cat      | Has Spec | Last Static | S.Result | Last Cat | C.Result | Priority
-----------------------|----------|----------|-------------|----------|----------|----------|----------
gate-check             | gate     | YES      | never       | —        | never    | —        | critical
design-review          | review   | YES      | never       | —        | never    | —        | critical
...

AGENTS (49 total)
Agent specs written: 49 (100%)

Agent                  | Category   | Has Spec | Last Spec   | Result
-----------------------|------------|----------|-------------|--------
creative-director      | director   | YES      | never       | —
technical-director     | director   | YES      | never       | —
...

Top 5 Priority Gaps (skills with no spec, critical/high priority):
(none if all specs are written)

Skill coverage:  72/72 specs (100%)
Agent coverage:  49/49 specs (100%)

No file writes in audit mode.
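The coverage lines and the "Top 5 Priority Gaps" list in the report can be derived from the coverage table rows. The sketch below assumes each row is a dictionary with `name`, `has_spec`, and `priority` keys; that shape is illustrative and is not the catalog's actual schema.

```python
PRIORITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}

def coverage_line(label: str, with_spec: int, total: int) -> str:
    """Format one coverage line, guarding against an empty list."""
    pct = round(100 * with_spec / total) if total else 0
    return f"{label} coverage:  {with_spec}/{total} specs ({pct}%)"

def top_priority_gaps(skills: list[dict], limit: int = 5) -> list[str]:
    """Names of critical/high-priority skills with no spec, most urgent first."""
    gaps = [
        s for s in skills
        if not s["has_spec"] and s["priority"] in ("critical", "high")
    ]
    gaps.sort(key=lambda s: (PRIORITY_ORDER[s["priority"]], s["name"]))
    return [s["name"] for s in gaps[:limit]]
```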

Offer: "Would you like to run /skill-test static all to check structural compliance across all skills? /skill-test category all to run category rubric checks? Or /skill-test spec [name] to run a specific behavioral test?"


After any mode completes, offer contextual follow-up:

  • After static [name]: "Run /skill-test spec [name] to validate behavioral correctness if a test spec exists."
  • After static all with failures: "Address NON-COMPLIANT skills first. Run /skill-test static [name] individually for detailed remediation guidance."
  • After spec [name] PASS: "Update CCGS Skill Testing Framework/catalog.yaml to record this pass date. Consider running /skill-test audit to find the next spec gap."
  • After spec [name] FAIL: "Review the failing assertions and update the skill or the test spec to resolve the mismatch."
  • After audit: "Start with the critical-priority gaps. Use the spec template at CCGS Skill Testing Framework/templates/skill-test-spec.md to create new specs."