mirror of https://github.com/Donchitos/Claude-Code-Game-Studios.git synced 2026-06-27 04:51:46 +00:00


Donchitos 984023ddac Release v1.0.0 — concept-prototype/vertical-slice split, workflow restructure, polish (#50 )


2026-05-13 20:15:08 +10:00

8.8 KiB


| Field | Value |
|-------|-------|
| name | test-evidence-review |
| description | Quality review of test files and manual evidence documents. Goes beyond existence checks: evaluates assertion coverage, edge case handling, naming conventions, and evidence completeness. Produces ADEQUATE/INCOMPLETE/MISSING verdict per story. Run before QA sign-off or on demand. |
| argument-hint | [story-path \| sprint \| system-name] |
| user-invocable | true |
| allowed-tools | Read, Glob, Grep, Write |
| model | sonnet |

# Test Evidence Review

/smoke-check verifies that test files exist and pass. This skill goes further — it reviews the quality of those tests and evidence documents. A test file that exists and passes may still leave critical behaviour uncovered. A manual evidence doc that exists may lack the sign-offs required for closure.

**Output:** a summary report in the conversation, plus an optional `production/qa/evidence-review-[date].md` file.

**When to run:**

- Before QA sign-off (`/team-qa` Phase 5)
- On any story whose test quality is in question
- During a milestone review, to audit the quality of Logic and Integration stories

## 1. Parse Arguments

Modes:

- `/test-evidence-review [story-path]`: review a single story's evidence
- `/test-evidence-review sprint`: review all stories in the current sprint
- `/test-evidence-review [system-name]`: review all stories in an epic/system
- No argument: ask which scope to review ("Single story", "Current sprint", or "A system")
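The scope selection can be sketched in Python. This is a minimal sketch, not the skill's actual logic. `resolve_scope` is a hypothetical helper. It assumes that a story path always ends in `.md` and that any other argument except `sprint` names a system.

```python
from typing import Optional


def resolve_scope(arg: Optional[str]) -> str:
    """Map the skill argument to a review scope (hypothetical helper).

    Assumption: story paths end in ".md"; any other non-empty argument
    except "sprint" is treated as a system name.
    """
    if arg is None or not arg.strip():
        return "ask"  # no argument: ask the user which scope to review
    arg = arg.strip()
    if arg == "sprint":
        return "sprint"
    if arg.endswith(".md"):
        return "story"
    return "system"
```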

## 2. Load Stories in Scope

Based on the argument:

- **Single story:** Read the story file directly. Extract the Story Type, the Test Evidence section, the story slug, and the system name.
- **Sprint:** Read the most recently modified file in `production/sprints/`. Extract the list of story file paths from the sprint plan, then read each story file.
- **System:** Glob `production/epics/[system-name]/story-*.md` and read each match.
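For the Sprint scope, there are two steps: pick the most recently modified sprint file, then pull the story paths out of it. They might look like the sketch below. The helper names are hypothetical. The sketch assumes the sprint plan writes each story as a path of the form `production/epics/<system>/story-<slug>.md`.

```python
import re
from pathlib import Path
from typing import Optional

# Assumption: stories appear in the sprint plan as repo-relative paths.
STORY_PATH = re.compile(r"production/epics/[\w-]+/story-[\w-]+\.md")


def latest_sprint_plan(sprints_dir: Path) -> Optional[Path]:
    """Return the most recently modified file in the sprints directory, or None."""
    files = [p for p in sprints_dir.glob("*") if p.is_file()]
    return max(files, key=lambda p: p.stat().st_mtime, default=None)


def story_paths_in_plan(plan_text: str) -> list[str]:
    """Extract story paths in order of first appearance, without duplicates."""
    return list(dict.fromkeys(STORY_PATH.findall(plan_text)))
```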

For each story, collect:

- `Type:` field (Logic / Integration / Visual/Feel / UI / Config/Data)
- `## Test Evidence` section: the stated expected test file path or evidence doc
- Story slug (from the file name)
- System name (from the directory path)
- Acceptance Criteria list (all checkbox items)
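Below is a rough sketch of extracting these fields from a story file. It makes several assumptions:

- the story has a `Type:` (or `**Type**:`) line;
- it has a `## Test Evidence` heading;
- acceptance criteria are `- [ ]` / `- [x]` checkbox items;
- the full file stem is the slug and the parent directory is the system.

`StoryInfo` and `parse_story` are hypothetical names.

```python
import re
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Optional

TYPE_RE = re.compile(r"^(?:\*\*)?Type(?:\*\*)?:[ \t]*(.+?)[ \t]*$", re.MULTILINE)
EVIDENCE_RE = re.compile(r"^## Test Evidence[^\n]*\n(.*?)(?=^## |\Z)", re.MULTILINE | re.DOTALL)
CRITERION_RE = re.compile(r"^[ \t]*[-*][ \t]+\[[ xX]\][ \t]+(.+?)[ \t]*$", re.MULTILINE)


@dataclass
class StoryInfo:
    slug: str
    system: str
    story_type: Optional[str]
    test_evidence: str
    criteria: list[str]


def parse_story(path: str, text: str) -> StoryInfo:
    p = PurePosixPath(path)
    type_match = TYPE_RE.search(text)
    evidence_match = EVIDENCE_RE.search(text)
    return StoryInfo(
        slug=p.stem,  # assumption: the whole file stem is the slug
        system=p.parent.name,  # e.g. production/epics/combat/... -> "combat"
        story_type=type_match.group(1) if type_match else None,
        test_evidence=evidence_match.group(1).strip() if evidence_match else "",
        criteria=CRITERION_RE.findall(text),  # every checkbox item in the file
    )
```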

## 3. Locate Evidence Files

For each story, find the evidence:

- **Logic stories:** Glob `tests/unit/[system]/[story-slug]_test.*`. If nothing is found, also Grep `tests/unit/[system]/` for files containing the story slug.
- **Integration stories:** Glob `tests/integration/[system]/[story-slug]_test.*`. Also check `production/session-logs/` for playtest records mentioning the story.
- **Visual/Feel and UI stories:** Glob `production/qa/evidence/[story-slug]-evidence.*`
- **Config/Data stories:** Glob `production/qa/smoke-*.md` (any smoke check report)

Note what was found (path) or not found (gap) for each story.
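The lookup rules can be expressed as a table of glob patterns plus the Logic fallback. This sketch makes a few simplifications:

- it uses Python's `pathlib` in place of the Glob and Grep tools;
- it treats the full file stem as the slug;
- it leaves out the Integration playtest-log check.

`find_evidence` is a hypothetical helper.

```python
from pathlib import Path

# Glob patterns per story type, following the lookup rules above.
EVIDENCE_GLOBS = {
    "Logic": ["tests/unit/{system}/{slug}_test.*"],
    "Integration": ["tests/integration/{system}/{slug}_test.*"],
    "Visual/Feel": ["production/qa/evidence/{slug}-evidence.*"],
    "UI": ["production/qa/evidence/{slug}-evidence.*"],
    "Config/Data": ["production/qa/smoke-*.md"],
}


def find_evidence(root: Path, story_type: str, system: str, slug: str) -> list[Path]:
    """Return evidence files for one story; an empty list means a gap."""
    found: list[Path] = []
    for pattern in EVIDENCE_GLOBS.get(story_type, []):
        found.extend(sorted(root.glob(pattern.format(system=system, slug=slug))))
    if not found and story_type == "Logic":
        # Fallback: any file under tests/unit/<system>/ whose text mentions the slug.
        unit_dir = root / "tests" / "unit" / system
        if unit_dir.is_dir():
            found = sorted(
                p for p in unit_dir.rglob("*")
                if p.is_file() and slug in p.read_text(errors="ignore")
            )
    return found
```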

## 4. Review Automated Test Quality (Logic / Integration)

For each test file found, read it and evaluate the following.

### Assertion coverage

Count the distinct assertions (lines containing `assert`, `expect`, `check`, `verify`, or engine-specific assertion patterns). A low assertion count is a warning sign: a test function that makes only one assertion may not cover the full range of expected behaviour.

Thresholds:

- 3+ assertions per test function → normal
- 1-2 assertions per test function → note as potentially thin
- 0 assertions (the test exists but asserts nothing) → flag as BLOCKING, because the test passes vacuously and proves nothing
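A language-agnostic assertion count is necessarily approximate. This sketch works as follows:

- it assumes GDScript/Python-style declarations (`func test_x(` or `def test_x(`);
- it counts any line mentioning one of the four keywords;
- it attributes each such line to the most recent test declaration.

Words such as `checkpoint` will produce false positives. The function names are hypothetical.

```python
import re

# Assumption: test functions are declared as "func test_x(" or "def test_x(".
TEST_FN = re.compile(r"^\s*(?:func|def)\s+(test\w*)\s*\(")
# Any line mentioning assert/expect/check/verify (case-insensitive) counts as one assertion.
ASSERTION = re.compile(r"\b(?:assert|expect|check|verify)\w*", re.IGNORECASE)


def assertions_per_function(source: str) -> dict[str, int]:
    counts: dict[str, int] = {}
    current = None
    for line in source.splitlines():
        match = TEST_FN.match(line)
        if match:
            current = match.group(1)
            counts[current] = 0
        elif current is not None and ASSERTION.search(line):
            counts[current] += 1
    return counts


def classify(count: int) -> str:
    """Apply the thresholds above to one function's assertion count."""
    if count == 0:
        return "BLOCKING"  # the test passes vacuously
    if count <= 2:
        return "potentially thin"
    return "normal"
```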

### Edge case coverage

For each acceptance criterion in the story that contains a number, a threshold, or a "when X happens" conditional, check whether a test function name or test body references that specific case.

Heuristics:

- Grep the test file for "zero", "max", "null", "empty", "min", "invalid", "boundary", and "edge"; the presence of any of them is a positive signal.
- If the story has a Formulas section with specific bounds, check whether the tests exercise the minimum and maximum values.
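The keyword heuristic and the bounds check might be sketched as below. Plain substring matching is an assumption that errs toward positive signals (`min` also matches `admin`). `bounds_exercised` is a hypothetical helper that looks for the formula's minimum and maximum among the numeric literals in the test file.

```python
import re

EDGE_KEYWORDS = ("zero", "max", "null", "empty", "min", "invalid", "boundary", "edge")


def edge_case_signals(source: str) -> list[str]:
    """Keywords from the heuristic list that appear anywhere in the test source."""
    lowered = source.lower()
    return [kw for kw in EDGE_KEYWORDS if kw in lowered]


def bounds_exercised(source: str, low: float, high: float) -> dict[str, bool]:
    """Whether the formula's minimum and maximum appear as numeric literals."""
    literals = {float(n) for n in re.findall(r"-?\d+(?:\.\d+)?", source)}
    return {"min": float(low) in literals, "max": float(high) in literals}
```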

### Naming quality

Test function names should describe the scenario and the expected result, following the pattern `test_[scenario]_[expected_outcome]`.

Flag generically named functions (`test_1`, `test_run`, `testBasic`) as naming issues, because they make failures harder to diagnose.
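One way to operationalise the naming rule is to require that a name following `test_[scenario]_[expected_outcome]` has at least two descriptive words after the `test` prefix. This is a hypothetical heuristic. It flags `test_1`, `test_run`, and `testBasic`, but it will not catch a vague two-word name.

```python
import re


def name_words(name: str) -> list[str]:
    """Split a snake_case or camelCase test name into words, minus the 'test' prefix."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name)  # testBasic -> test_Basic
    words = [w.lower() for w in spaced.split("_") if w]
    return words[1:] if words and words[0] == "test" else words


def is_generic_name(name: str) -> bool:
    """Heuristic: a scenario plus an expected outcome needs at least two non-numeric words."""
    words = name_words(name)
    return len(words) < 2 or all(w.isdigit() for w in words)
```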

### Formula traceability

For Logic stories whose GDD has a Formulas section, check that the test file contains at least one test whose name or comment references the formula name or a formula value. A test that exercises a formula without naming it is harder to maintain when the formula changes.
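Below is a minimal sketch of the name check. It assumes formula names are taken from the GDD's Formulas section and matched case- and separator-insensitively, so `Damage Formula`, `damage_formula`, and `DamageFormula` all count. The formula-value check is left out, and the helper names are hypothetical.

```python
import re


def _squash(text: str) -> str:
    # Lowercase and keep only letters and digits so separators and casing do not matter.
    return re.sub(r"[^a-z0-9]", "", text.lower())


def formula_referenced(test_source: str, formula_names: list[str]) -> dict[str, bool]:
    """Map each formula name to whether the test source mentions it in any spelling."""
    squashed = _squash(test_source)
    return {name: _squash(name) in squashed for name in formula_names}
```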

## 5. Review Manual Evidence Quality (Visual/Feel / UI)

For each evidence document found, read it and evaluate the following.

### Criterion linkage

The evidence doc should reference every acceptance criterion from the story. Check whether it contains each criterion (or a clear rephrasing of it); a missing criterion means that criterion was never verified.
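Deciding what counts as "a clear rephrasing" needs judgement. As a mechanical first pass, a word-overlap test can surface criteria that are obviously absent. The 0.6 threshold and the `missing_criteria` helper are assumptions for illustration.

```python
import re


def _words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def missing_criteria(criteria: list[str], evidence: str, threshold: float = 0.6) -> list[str]:
    """Criteria whose words mostly do not appear anywhere in the evidence doc.

    A criterion counts as linked when at least `threshold` of its words occur
    in the doc, a rough stand-in for "the criterion or a clear rephrasing".
    """
    evidence_words = _words(evidence)
    missing = []
    for criterion in criteria:
        words = _words(criterion)
        if words and len(words & evidence_words) / len(words) < threshold:
            missing.append(criterion)
    return missing
```

The "N/M criteria referenced" figure in the report is then `len(criteria) - len(missing)` out of `len(criteria)`.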

### Sign-off completeness

Check for the following sign-off lines (or equivalent fields):

- Developer sign-off
- Designer / art-lead sign-off (for Visual/Feel)
- QA lead sign-off

If any required sign-off is missing or blank, flag the story as INCOMPLETE: it cannot be fully closed without all required sign-offs.
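Below is a sketch of the sign-off check. It assumes each sign-off is a single `<role> sign-off: <name/date>` line. A value made only of underscores, dashes, brackets, or asterisks counts as blank. Following the list above, the Designer sign-off is required only for Visual/Feel stories. The role patterns and helper names are hypothetical.

```python
import re

# Roles required per story type, following the sign-off list above.
REQUIRED_SIGNOFFS = {
    "Visual/Feel": ["Developer", "Designer", "QA Lead"],
    "UI": ["Developer", "QA Lead"],
}
# Hypothetical patterns for how each role is written in the evidence doc.
ROLE_PATTERNS = {
    "Developer": r"developer",
    "Designer": r"(?:designer|art[- ]lead)",
    "QA Lead": r"qa[- ]lead",
}


def signoff_status(evidence: str, story_type: str) -> dict[str, bool]:
    """Map each required role to True if its sign-off line has a non-blank value."""
    status = {}
    for role in REQUIRED_SIGNOFFS.get(story_type, []):
        line_re = re.compile(
            rf"^[^\n]*{ROLE_PATTERNS[role]}[^\n:]*sign-?off[^\n:]*:([^\n]*)$",
            re.IGNORECASE | re.MULTILINE,
        )
        match = line_re.search(evidence)
        # Placeholder-only values such as "____" or "[ ]" count as blank.
        value = match.group(1).strip(" \t_-[]*") if match else ""
        status[role] = bool(value)
    return status
```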

### Screenshot / artefact completeness

For Visual/Feel stories: check whether screenshot file paths are referenced in the evidence doc. If referenced, Glob for them to confirm they exist.

For UI stories: check whether a walkthrough sequence (step-by-step interaction log) is present.
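Both artefact checks can be approximated mechanically. This sketch assumes screenshots are referenced by repo-relative paths ending in a common image extension. It also assumes a walkthrough is a sequence of at least two numbered (`1.`, `1)`) or `Step N` lines. All names are hypothetical.

```python
import re
from pathlib import Path

# Assumption: screenshots are referenced by repo-relative paths with an image extension.
IMAGE_REF = re.compile(r"[\w./-]+\.(?:png|jpe?g|gif|webp)\b", re.IGNORECASE)
# Assumption: walkthrough steps are numbered lines ("1." / "1)") or "Step N" lines.
STEP = re.compile(r"^\s*(?:\d+[.)]|step\s+\d+)", re.IGNORECASE | re.MULTILINE)


def screenshot_refs(evidence: str, root: Path) -> dict[str, bool]:
    """Map each referenced image path to whether the file exists under root."""
    refs = dict.fromkeys(IMAGE_REF.findall(evidence))
    return {ref: (root / ref).is_file() for ref in refs}


def has_walkthrough(evidence: str, min_steps: int = 2) -> bool:
    """True if the doc contains at least `min_steps` step-like lines."""
    return len(STEP.findall(evidence)) >= min_steps
```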

### Date coverage

The evidence doc should carry a date. If that date is earlier than the story's last major change (heuristic: compare it against the sprint start date in the sprint plan), flag it as POTENTIALLY STALE, since the evidence may not cover the final implementation.
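The staleness heuristic reduces to a date comparison. This sketch assumes the evidence doc records an ISO date in a `Date` field. It also assumes the sprint start date has already been read from the sprint plan. An undated doc is reported separately, because the doc should carry a date. The helper names are hypothetical.

```python
import re
from datetime import date
from typing import Optional

# Assumption: the doc has a field such as "**Date**: 2026-05-02" or "Date: 2026-05-02".
DATE_FIELD = re.compile(r"\bdate\b\W*(\d{4}-\d{2}-\d{2})", re.IGNORECASE)


def evidence_date(text: str) -> Optional[date]:
    match = DATE_FIELD.search(text)
    if not match:
        return None
    try:
        return date.fromisoformat(match.group(1))
    except ValueError:  # malformed date such as 2026-13-40
        return None


def freshness(text: str, sprint_start: date) -> str:
    dated = evidence_date(text)
    if dated is None:
        return "undated"
    return "POTENTIALLY STALE" if dated < sprint_start else "current"
```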

## 6. Build the Review Report

For each story, assign a verdict:

| Verdict | Meaning |
|---------|---------|
| ADEQUATE | Test/evidence exists, passes quality checks, all criteria covered |
| INCOMPLETE | Test/evidence exists but has quality gaps (thin assertions, missing sign-offs) |
| MISSING | No test or evidence found for a story type that requires it |

The overall sprint/system verdict is the worst story verdict present (MISSING is worse than INCOMPLETE, which is worse than ADEQUATE).
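Because the verdicts are ordered by severity, the overall verdict is simply the maximum by severity. Below is a minimal sketch; `SEVERITY` and `overall_verdict` are hypothetical names.

```python
from typing import Iterable, Optional

# Severity order from the verdict table: a higher number is worse.
SEVERITY = {"ADEQUATE": 0, "INCOMPLETE": 1, "MISSING": 2}


def overall_verdict(story_verdicts: Iterable[str]) -> Optional[str]:
    """Worst verdict among the stories in scope, or None if the scope is empty."""
    return max(story_verdicts, key=SEVERITY.__getitem__, default=None)
```

The report itself follows this structure: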

## Test Evidence Review

> **Date**: [date]
> **Scope**: [single story path | Sprint [N] | [system name]]
> **Stories reviewed**: [N]
> **Overall verdict**: ADEQUATE / INCOMPLETE / MISSING

---

### Story-by-Story Results

#### [Story Title] — [Type] — [ADEQUATE/INCOMPLETE/MISSING]

**Test/evidence path**: `[path]` (found) / (not found)

**Automated test quality** *(Logic/Integration only)*:
- Assertion coverage: [N per function on average] — [adequate / thin / none]
- Edge cases: [covered / partial / not found]
- Naming: [consistent / [N] generic names flagged]
- Formula traceability: [yes / no — formula names not referenced in tests]

**Manual evidence quality** *(Visual/Feel/UI only)*:
- Criterion linkage: [N/M criteria referenced]
- Sign-offs: [Developer ✓ | Designer ✗ | QA Lead ✗]
- Artefacts: [screenshots present / missing / N/A]
- Freshness: [dated [date] — current / potentially stale]

**Issues**:
- BLOCKING: [description] *(prevents story-done)*
- ADVISORY: [description] *(should fix before release)*

---

### Summary

| Story | Type | Verdict | Issues |
|-------|------|---------|--------|
| [title] | Logic | ADEQUATE | None |
| [title] | Integration | INCOMPLETE | Thin assertions (avg 1.2/function) |
| [title] | Visual/Feel | INCOMPLETE | QA lead sign-off missing |
| [title] | Logic | MISSING | No test file found |

**BLOCKING items** (must resolve before story can be closed): [N]
**ADVISORY items** (should address before release): [N]

## 7. Write Output (Optional)

Present the report in conversation.

Ask: "May I write this test evidence review to production/qa/evidence-review-[date].md?"

Writing the file is optional, since the report is useful on its own. Write it only if the user wants a persistent record.

After the report:

- **For BLOCKING items:** "These must be resolved before /story-done can mark the story Complete. Would you like to address any of them now?"
- **For thin assertions:** "Consider running /test-helpers [system] to see scaffolded assertion patterns for common cases."
- **For missing sign-offs:** "Manual sign-off is required from [role]. Share [evidence-path] with them to complete sign-off."

**Verdict:** COMPLETE (evidence review finished). Use CONCERNS instead if any BLOCKING items were found.

## Collaborative Protocol

- **Report quality issues; do not fix them.** This skill reads and evaluates. It does not modify test files or evidence documents.
- **ADEQUATE means adequate for shipping, not perfect.** Avoid nitpicking tests that work and are comprehensive enough to give confidence.
- **The BLOCKING vs. ADVISORY distinction matters.** Flag BLOCKING only when a gap leaves a story criterion genuinely unverified.
- **Ask before writing.** The report file is optional; always confirm before writing it.
