* Add /vertical-slice skill, prototype overhaul, and workflow integration - Add /vertical-slice skill for pre-production validation (Phase 4 gate) - Overhaul /prototype skill with two-mode design: concept prototype (Phase 1) vs vertical slice (Phase 4), with clearer differentiation and higher standards for VS - Update prototyper agent to own both prototype and vertical-slice workflows - Add prototype-report.md and vertical-slice-report.md output templates - Update WORKFLOW-GUIDE, quick-start, skills-reference, agent-coordination-map, and skill-flow-diagrams to fully integrate both skills into the 7-phase pipeline - Remove orphaned empty quick-prototype/ directory Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * sync v1 counts + polish Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Add entity inventory flow, relax vertical-slice gate, improve UX authoring prompts - /asset-spec: new Phase 0b entity & screen inventory when no argument and no existing inventory — reads GDDs/art-bible, proposes categorized list, writes design/assets/entity-inventory.md collaboratively - /asset-spec: entity/character target falls back to inline user description when no source doc exists, rather than failing - /gate-check: vertical slice changed from blocking to CONCERNS-only when absent; built-but-broken slice still fails; adds entity inventory as gate artifact - /ux-design: convert inline approval prompts to AskUserQuestion for structured option capture at key authoring decision points - workflow-catalog.yaml: entity-inventory step added to pre-production; UX spec min_count raised to 3; vertical-slice and prototype marked required: false with updated descriptions - .gitignore: exclude marrow/ eval tooling directory Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Add missing AskUserQuestion widgets to 7 skills Audit found 11 decision points across 7 skills where structured option prompts were missing — using plain text, auto-selection, or no gate at all. Skills patched: - create-epics: per-epic approval + producer CONCERNS verdict - sprint-plan: producer CONCERNS verdict with scope/timeline options - milestone-review: AT RISK / OFF TRACK producer verdicts require acknowledgement - retrospective: existing-retro handling converted from plain text [A]/[B] - quick-design: classification confirmation + draft approve/revise/redirect - tech-debt add mode: category (6 options) + effort (S/M/L/XL) structured capture - regression-suite: no-arg mode selection instead of silent auto-detect - hotfix: severity confirmation gate before workflow begins Also added AskUserQuestion to allowed-tools headers for retrospective, quick-design, tech-debt, regression-suite, and hotfix. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Prep v1 stable: fix WORKFLOW-GUIDE counts, stale agent names, and skill model fields - WORKFLOW-GUIDE.md: correct agent count (48→49), skill count (66/68→73), add 6 missing skills to Appendix B, fix Creative category count (2→4), replace 3 non-existent agent names with correct ue-*/unity-* specialists, add missing godot-csharp/gdextension specialists to hierarchy, fix production/stories/ paths → production/epics/ - coordination-rules.md: replace "not yet used" with opt-in env var note - quick-start.md: rename duplicate "Validate the concept" label → "Prototype the mechanic" - skill-flow-diagrams.md: remove duplicate legacy UX pipeline section - All 62 skills missing model: field now have explicit model: sonnet Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix: comprehensive skill audit — consistency, UX, and flow gaps Two-pass audit fixing ~35 bugs across 41 files. Pre-production flow: - Brainstorm next-steps split into Path A (design-first) and Path B (prototype-first) — eliminates "prototype after architecture" confusion - /architecture-review added to pre-production flow in brainstorm and create-architecture handoffs - gate-check traceability check corrected to requirements-traceability.md - dev-story TR registry error now points to /architecture-review (not /create-epics) - start now writes production/stage.txt on first onboarding AskUserQuestion gaps filled: - balance-check, code-review, hotfix, day-one-patch, consistency-check all gain closing widgets and/or missing allowed-tools declarations - hotfix git branch creation now requires user confirmation - sprint-plan review-mode setup moved to Phase 0 (before gates run) - team-combat gains architecture→implementation approval gate - design-review APPROVED path consolidated from 3 widgets to 1 multiSelect All 9 team-* skills: - Phase 0 review-mode resolution added (solo/lean/full now respected) - team-audio output path fixed (design/gdd/ → design/audio/) - team-level final doc compilation delegated to level-designer subagent - team-narrative localization-lead added to composition list - team-qa sprint path fixed (flat files, not directories) - team-release NO-GO override captures written justification - team-live-ops Cancel verdict now explicitly BLOCKED Other fixes: - Art bible path standardized to design/art/art-bible.md (3 wrong refs) - AD-PHASE-GATE added to lean-mode skip list in director-gates.md - design-system duplicate 5d heading fixed; skeleton decline path added; mandatory agent spawns now respect review mode - story-readiness acceptance criteria thresholds now type-aware - create-stories gains multi-ADR and no-ADR handling guidance - consistency-check creates docs/consistency-failures.md on first run - retrospective frontmatter bash injection replaced with explicit Bash call - smoke-check ls -t gains PowerShell fallback - Conventional Commits format documented in coding-standards.md - gate-check: ADR acceptance gate, QA plan check, chain-of-verification tool-action requirement all added Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix: expose --review flag in argument-hints for all team-* skills All 9 team-* skills already implement Phase 0 review-mode resolution internally (full/lean/solo), but none advertised [--review full|lean|solo] in their argument-hint. Users had no way to discover the per-run override. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * docs: add SECURITY.md with coordinated disclosure policy Defines scope, reporting process (GitHub private vulnerability reporting), contributor security guidelines for hooks/skills/agents, and 90-day coordinated disclosure timeline. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * docs: add CONTRIBUTING.md with framework contribution guidelines Covers what PRs are welcome, skill/hook/agent technical requirements, the collaborative principle, testing expectations, commit format, and platform compatibility requirements. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * docs: add v1.0.0-beta → v1.0 upgrade section to UPGRADING.md Documents the 17 commits since the beta tag: new /vertical-slice gate, entity inventory flow in /map-systems, AskUserQuestion widgets across 7 skills, --review flag exposure on team-* skills, bug fixes (#21, #36, #42, #43, #45), and the new CONTRIBUTING.md and SECURITY.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
13 KiB
name, description, argument-hint, user-invocable, allowed-tools, model
| name | description | argument-hint | user-invocable | allowed-tools | model |
|---|---|---|---|---|---|
| skill-test | Validate skill files for structural compliance and behavioral correctness. Three modes: static (linter), spec (behavioral), audit (coverage report). | static [skill-name | all] | spec [skill-name] | category [skill-name | all] | audit | true | Read, Glob, Grep, Write | sonnet |
Skill Test
Validates .claude/skills/*/SKILL.md files for structural compliance and
behavioral correctness. No external dependencies — runs entirely within the
existing skill/hook/template architecture.
Four modes:
| Mode | Command | Purpose | Token Cost |
|---|---|---|---|
static |
/skill-test static [name|all] |
Structural linter — 7 compliance checks per skill | Low (~1k/skill) |
spec |
/skill-test spec [name] |
Behavioral verifier — evaluates assertions in test spec | Medium (~5k/skill) |
category |
/skill-test category [name|all] |
Category rubric — checks skill against its category-specific metrics | Low (~2k/skill) |
audit |
/skill-test audit |
Coverage report — skills, agent specs, last test dates | Low (~3k total) |
Phase 1: Parse Arguments
Determine mode from the first argument:
static [name]→ run 7 structural checks on one skillstatic all→ run 7 structural checks on all skills (Glob.claude/skills/*/SKILL.md)spec [name]→ read skill + test spec, evaluate assertionscategory [name]→ run category-specific rubric fromCCGS Skill Testing Framework/quality-rubric.mdcategory all→ run category rubric for every skill that has acategory:in catalogaudit(or no argument) → read catalog, list all skills and agents, show coverage
If argument is missing or unrecognized, output usage and stop.
Phase 2A: Static Mode — Structural Linter
For each skill being tested, read its SKILL.md fully and run all 7 checks:
Check 1 — Required Frontmatter Fields
The file must contain all of these in the YAML frontmatter block:
name:description:argument-hint:user-invocable:allowed-tools:
FAIL if any are absent.
Check 2 — Multiple Phases
The skill must have ≥2 numbered phase headings. Look for patterns like:
## Phase Nor## Phase N:## N.(numbered top-level sections)- At least 2 distinct
##headings if phases aren't explicitly numbered
FAIL if fewer than 2 phase-like headings are found.
Check 3 — Verdict Keywords
The skill must contain at least one of: PASS, FAIL, CONCERNS, APPROVED,
BLOCKED, COMPLETE, READY, COMPLIANT, NON-COMPLIANT
FAIL if none are present.
Check 4 — Collaborative Protocol Language
The skill must contain ask-before-write language. Look for:
"May I write"(canonical form)"before writing"or"approval"near file-write instructions"ask"+"write"in close proximity (within same section)
WARN if absent (some read-only skills legitimately skip this).
FAIL if allowed-tools includes Write or Edit but no ask-before-write language is found.
Check 5 — Next-Step Handoff
The skill must end with a recommended next action or follow-up path. Look for:
- A final section mentioning another skill (e.g.,
/story-done,/gate-check) - "Recommended next" or "next step" phrasing
- A "Follow-Up" or "After this" section
WARN if absent.
Check 6 — Fork Context Complexity
If frontmatter contains context: fork, the skill should have ≥5 phase headings
(## level or numbered Phase N headers). Fork context is for complex multi-phase
skills; simple skills should not use it.
WARN if context: fork is set but fewer than 5 phases found.
Check 7 — Argument Hint Plausibility
argument-hint must be non-empty. If the skill body mentions multiple modes
(e.g., "Mode A | Mode B"), the hint should reflect them. Cross-reference the
hint against the first phase's "Parse Arguments" section.
WARN if hint is "" or if documented modes don't match hint.
Static Mode Output Format
For a single skill:
=== Skill Static Check: /[name] ===
Check 1 — Frontmatter Fields: PASS
Check 2 — Multiple Phases: PASS (7 phases found)
Check 3 — Verdict Keywords: PASS (PASS, FAIL, CONCERNS)
Check 4 — Collaborative Protocol: PASS ("May I write" found)
Check 5 — Next-Step Handoff: WARN (no follow-up section found)
Check 6 — Fork Context Complexity: PASS (8 phases, context: fork set)
Check 7 — Argument Hint: PASS
Verdict: WARNINGS (1 warning, 0 failures)
Recommended: Add a "Follow-Up Actions" section at the end of the skill.
For static all, produce a summary table then list any non-compliant skills:
=== Skill Static Check: All 52 Skills ===
Skill | Result | Issues
-----------------------|--------------|-------
gate-check | COMPLIANT |
design-review | COMPLIANT |
story-readiness | WARNINGS | Check 5: no handoff
...
Summary: 48 COMPLIANT, 3 WARNINGS, 1 NON-COMPLIANT
Aggregate Verdict: N WARNINGS / N FAILURES
Phase 2B: Spec Mode — Behavioral Verifier
Step 1 — Locate Files
Find skill at .claude/skills/[name]/SKILL.md.
Look up the spec path from CCGS Skill Testing Framework/catalog.yaml — use the
spec: field for the matching skill entry.
If either is missing:
- Missing skill: "Skill '[name]' not found in
.claude/skills/." - Missing spec path in catalog: "No spec path set for '[name]' in catalog.yaml."
- Spec file not found at path: "Spec file missing at [path]. Run
/skill-test auditto see coverage gaps."
Step 2 — Read Both Files
Read the skill file and test spec file completely.
Step 3 — Evaluate Assertions
For each Test Case in the spec:
- Read the Fixture description (assumed state of project files)
- Read the Expected behavior steps
- Read each Assertion checkbox
For each assertion, evaluate whether the skill's written instructions, if followed correctly given the fixture state, would satisfy it. This is a Claude-evaluated reasoning check, not code execution.
Mark each assertion:
- PASS — skill instructions clearly satisfy this assertion
- PARTIAL — skill instructions partially address it, but with ambiguity
- FAIL — skill instructions would NOT satisfy this assertion given the fixture
For Protocol Compliance assertions (always present):
- Check whether the skill requires "May I write" before file writes
- Check whether the skill presents findings before requesting approval
- Check whether the skill ends with a recommended next step
- Check whether the skill avoids auto-creating files without approval
Step 4 — Build Report
=== Skill Spec Test: /[name] ===
Date: [date]
Spec: CCGS Skill Testing Framework/skills/[category]/[name].md
Case 1: [Happy Path — name]
Fixture: [summary]
Assertions:
[PASS] [assertion text]
[FAIL] [assertion text]
Reason: The skill's Phase 3 says "..." but the fixture state means "..."
Case Verdict: FAIL
Case 2: [Edge Case — name]
...
Case Verdict: PASS
Protocol Compliance:
[PASS] Uses "May I write" before file writes
[PASS] Presents findings before asking approval
[WARN] No explicit next-step handoff at end
Overall Verdict: FAIL (1 case failed, 1 warning)
Step 5 — Offer to Write Results
"May I write these results to CCGS Skill Testing Framework/results/skill-test-spec-[name]-[date].md
and update CCGS Skill Testing Framework/catalog.yaml?"
If yes:
- Write results file to
CCGS Skill Testing Framework/results/ - Update the skill's entry in
CCGS Skill Testing Framework/catalog.yaml:last_spec: [date]last_spec_result: PASS|PARTIAL|FAIL
Phase 2D: Category Mode — Rubric Evaluation
Step 1 — Locate Skill and Category
Find skill at .claude/skills/[name]/SKILL.md.
Look up category: field in CCGS Skill Testing Framework/catalog.yaml.
If skill not found: "Skill '[name]' not found."
If no category: field: "No category assigned for '[name]' in catalog.yaml.
Add category: [name] to the skill entry first."
For category all: collect all skills with a category: field and process each.
category: utility skills are evaluated against U1 (static checks pass) and U2
(gate mode correct if applicable) only — skip to the static mode for U1.
Step 2 — Read Rubric Section
Read CCGS Skill Testing Framework/quality-rubric.md.
Extract the section matching the skill's category (e.g., ### gate, ### team).
Step 3 — Read Skill
Read the skill's SKILL.md fully.
Step 4 — Evaluate Rubric Metrics
For each metric in the category's rubric table:
- Check whether the skill's written instructions clearly satisfy the criterion
- Mark PASS, FAIL, or WARN
- For FAIL/WARN, identify the exact gap in the skill text (quote the relevant section or note its absence)
Step 5 — Output Report
=== Skill Category Check: /[name] ([category]) ===
Metric G1 — Review mode read: PASS
Metric G2 — Full mode directors: FAIL
Gap: Phase 3 spawns only CD-PHASE-GATE; TD-PHASE-GATE, PR-PHASE-GATE, AD-PHASE-GATE absent
Metric G3 — Lean mode: PHASE-GATE only: PASS
Metric G4 — Solo mode: no directors: PASS
Metric G5 — No auto-advance: PASS
Verdict: FAIL (1 failure, 0 warnings)
Fix: Add TD-PHASE-GATE, PR-PHASE-GATE, and AD-PHASE-GATE to the full-mode director
panel in Phase 3.
Step 6 — Offer to Update Catalog
"May I update CCGS Skill Testing Framework/catalog.yaml to record this category check
(last_category, last_category_result) for [name]?"
Phase 2C: Audit Mode — Coverage Report
Step 1 — Read Catalog
Read CCGS Skill Testing Framework/catalog.yaml. If missing, note that catalog doesn't exist
yet (first-run state).
Step 2 — Enumerate All Skills and Agents
Glob .claude/skills/*/SKILL.md to get the complete list of skills.
Extract skill name from each path (directory name).
Also read the agents: section from CCGS Skill Testing Framework/catalog.yaml to get the
complete list of agents.
Step 3 — Build Skill Coverage Table
For each skill:
- Check if a spec file exists (use the
spec:path from catalog, or globCCGS Skill Testing Framework/skills/*/[name].md) - Look up
last_static,last_static_result,last_spec,last_spec_result,last_category,last_category_result,categoryfrom catalog (or mark as "never" / "—" if not in catalog) - Priority comes from catalog
priority:field (critical/high/medium/low)
Step 3b — Build Agent Coverage Table
For each agent in catalog's agents: section:
- Check if a spec file exists (use the
spec:path from catalog, or globCCGS Skill Testing Framework/agents/*/[name].md) - Look up
last_spec,last_spec_result,categoryfrom catalog
Step 4 — Output Report
=== Skill Test Coverage Audit ===
Date: [date]
SKILLS (72 total)
Specs written: 72 (100%) | Never static tested: 72 | Never category tested: 72
Skill | Cat | Has Spec | Last Static | S.Result | Last Cat | C.Result | Priority
-----------------------|----------|----------|-------------|----------|----------|----------|----------
gate-check | gate | YES | never | — | never | — | critical
design-review | review | YES | never | — | never | — | critical
...
AGENTS (49 total)
Agent specs written: 49 (100%)
Agent | Category | Has Spec | Last Spec | Result
-----------------------|------------|----------|-------------|--------
creative-director | director | YES | never | —
technical-director | director | YES | never | —
...
Top 5 Priority Gaps (skills with no spec, critical/high priority):
(none if all specs are written)
Skill coverage: 72/72 specs (100%)
Agent coverage: 49/49 specs (100%)
No file writes in audit mode.
Offer: "Would you like to run /skill-test static all to check structural
compliance across all skills? /skill-test category all to run category rubric
checks? Or /skill-test spec [name] to run a specific behavioral test?"
Phase 3: Recommended Next Steps
After any mode completes, offer contextual follow-up:
- After
static [name]: "Run/skill-test spec [name]to validate behavioral correctness if a test spec exists." - After
static allwith failures: "Address NON-COMPLIANT skills first. Run/skill-test static [name]individually for detailed remediation guidance." - After
spec [name]PASS: "UpdateCCGS Skill Testing Framework/catalog.yamlto record this pass date. Consider running/skill-test auditto find the next spec gap." - After
spec [name]FAIL: "Review the failing assertions and update the skill or the test spec to resolve the mismatch." - After
audit: "Start with the critical-priority gaps. Use the spec template atCCGS Skill Testing Framework/templates/skill-test-spec.mdto create new specs."