---
name: test-evidence-review
description: "Quality review of test files and manual evidence documents. Goes beyond existence checks: evaluates assertion coverage, edge case handling, naming conventions, and evidence completeness. Produces an ADEQUATE/INCOMPLETE/MISSING verdict per story. Run before QA sign-off or on demand."
argument-hint: "[story-path | sprint | system-name]"
user-invocable: true
allowed-tools: Read, Glob, Grep, Write
model: sonnet
---

# Test Evidence Review

`/smoke-check` verifies that test files **exist** and **pass**. This skill
goes further: it reviews the **quality** of those tests and evidence documents.
A test file that exists and passes may still leave critical behaviour uncovered.
A manual evidence doc that exists may lack the sign-offs required for closure.

**Output:** Summary report (in conversation) + optional `production/qa/evidence-review-[date].md`

**When to run:**
- Before QA sign-off (`/team-qa` Phase 5)
- On any story where test quality is in question
- As part of a milestone review, to audit the quality of Logic and Integration stories

---

## 1. Parse Arguments

**Modes:**
- `/test-evidence-review [story-path]`: review a single story's evidence
- `/test-evidence-review sprint`: review all stories in the current sprint
- `/test-evidence-review [system-name]`: review all stories in an epic/system
- No argument: ask which scope to review ("Single story", "Current sprint", or "A system")

---

## 2. Load Stories in Scope

Based on the argument:

**Single story**: Read the story file directly. Extract: Story Type, Test
Evidence section, story slug, system name.

**Sprint**: Read the most recently modified file in `production/sprints/`.
Extract the list of story file paths from the sprint plan. Read each story file.

**System**: Glob `production/epics/[system-name]/story-*.md`. Read each.

For each story, collect:
- `Type:` field (Logic / Integration / Visual/Feel / UI / Config/Data)
- `## Test Evidence` section: the expected test file path or evidence doc it states
- Story slug (from file name)
- System name (from directory path)
- Acceptance Criteria list (all checkbox items)

---

## 3. Locate Evidence Files

For each story, find the evidence:

**Logic stories**: Glob `tests/unit/[system]/[story-slug]_test.*`
- If not found, also try: Grep in `tests/unit/[system]/` for files
  containing the story slug

**Integration stories**: Glob `tests/integration/[system]/[story-slug]_test.*`
- Also check `production/session-logs/` for playtest records mentioning the story

**Visual/Feel and UI stories**: Glob `production/qa/evidence/[story-slug]-evidence.*`

**Config/Data stories**: Glob `production/qa/smoke-*.md` (any smoke check report)

Note what was found (path) or not found (gap) for each story.
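
The lookup rules above can be sketched as a small table of glob patterns. This is only an illustration under assumptions: `locate_evidence` and `EVIDENCE_PATTERNS` are hypothetical names, the project root is passed in explicitly, and the Integration check of `production/session-logs/` is left out for brevity.

```python
from pathlib import Path

# Glob patterns per story type, taken from the rules above.
EVIDENCE_PATTERNS = {
    "Logic": "tests/unit/{system}/{slug}_test.*",
    "Integration": "tests/integration/{system}/{slug}_test.*",
    "Visual/Feel": "production/qa/evidence/{slug}-evidence.*",
    "UI": "production/qa/evidence/{slug}-evidence.*",
    "Config/Data": "production/qa/smoke-*.md",
}


def locate_evidence(root: Path, story_type: str, system: str, slug: str) -> list:
    """Return evidence paths for one story; an empty list means a gap."""
    pattern = EVIDENCE_PATTERNS[story_type].format(system=system, slug=slug)
    matches = sorted(root.glob(pattern))
    unit_dir = root / "tests" / "unit" / system
    if not matches and story_type == "Logic" and unit_dir.is_dir():
        # Fallback for Logic stories: any file in the system's unit-test
        # directory whose contents mention the story slug.
        matches = sorted(
            p for p in unit_dir.rglob("*")
            if p.is_file() and slug in p.read_text(errors="ignore")
        )
    return matches
```

An empty result corresponds to a "not found (gap)" entry for that story.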

---

## 4. Review Automated Test Quality (Logic / Integration)

For each test file found, read it and evaluate:

### Assertion coverage

Count the number of distinct assertions (lines containing `assert`, `expect`,
`check`, `verify`, or engine-specific assertion patterns). A low assertion
count is a warning sign: a test function that makes only one assertion may
not cover the range of expected behaviour.

Thresholds:
- **3+ assertions per test function** → normal
- **1-2 assertions per test function** → note as potentially thin
- **0 assertions** (test exists but no asserts) → flag as BLOCKING, because the
  test passes vacuously and proves nothing

### Edge case coverage

For each acceptance criterion in the story that contains a number, threshold,
or "when X happens" conditional: check whether a test function name or
test body references that specific case.

Heuristics:
- Grep test file for "zero", "max", "null", "empty", "min", "invalid",
  "boundary", "edge"; presence of any is a positive signal
- If the story has a Formulas section with specific bounds: check whether
  tests exercise at minimum/maximum values

### Naming quality

Test function names should describe: the scenario + the expected result.
Pattern: `test_[scenario]_[expected_outcome]`

Flag functions named generically (`test_1`, `test_run`, `testBasic`) as
**naming issues**, because they make failures harder to diagnose.
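
One way to flag generic names, as a sketch. The word list in `GENERIC_WORDS` is a hypothetical starting point built from the examples above, not a list defined by this skill.

```python
import re

# Hypothetical list of suffixes that say nothing about scenario or outcome.
GENERIC_WORDS = {"run", "basic", "simple", "works", "main", "it"}


def is_generic_test_name(name: str) -> bool:
    """True for names like test_1, test_run, or testBasic."""
    match = re.fullmatch(r"test_?(.*)", name, flags=re.IGNORECASE)
    if not match:
        return False
    rest = match.group(1).lower()
    return rest == "" or rest.isdigit() or rest in GENERIC_WORDS


def generic_test_names(names: list) -> list:
    """Return the subset of names to report as naming issues."""
    return [n for n in names if is_generic_test_name(n)]
```

A name such as `test_dash_on_cooldown_is_rejected` passes, since it states both the scenario and the expected outcome.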

### Formula traceability

For Logic stories where the GDD has a Formulas section: check that the test
file contains at least one test whose name or comment references the formula
name or a formula value. A test that exercises a formula without mentioning
it by name is harder to maintain when the formula changes.

---

## 5. Review Manual Evidence Quality (Visual/Feel / UI)

For each evidence document found, read it and evaluate:

### Criterion linkage

The evidence doc should reference each acceptance criterion from the story.
Check: does the evidence doc contain each criterion (or a clear rephrasing)?
A criterion missing from the doc means that criterion was never verified.

### Sign-off completeness

Check for three sign-off lines (or equivalent fields):
- Developer sign-off
- Designer / art-lead sign-off (for Visual/Feel)
- QA lead sign-off

If any are missing or blank: flag as INCOMPLETE, because the story cannot be
fully closed without all required sign-offs.
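
A sketch of the sign-off check. It assumes each sign-off is a line of the form `<role> sign-off: <name or date>`, which is an illustrative format rather than one this skill prescribes, and it reads the list above as requiring the Designer sign-off only for Visual/Feel stories. Function names are hypothetical.

```python
import re
from typing import List


def required_roles(story_type: str) -> List[str]:
    """Roles whose sign-off is expected for this story type."""
    roles = ["Developer", "QA lead"]
    if story_type == "Visual/Feel":
        roles.insert(1, "Designer")
    return roles


def missing_signoffs(evidence_doc: str, story_type: str) -> List[str]:
    """Return roles whose sign-off line is absent or left blank."""
    missing = []
    for role in required_roles(story_type):
        pattern = rf"^.*\b{re.escape(role)}\b.*sign-off[ \t]*:[ \t]*(.*)$"
        match = re.search(pattern, evidence_doc, flags=re.IGNORECASE | re.MULTILINE)
        # Placeholder underscores count as blank.
        value = match.group(1).strip(" _\t") if match else ""
        if not value:
            missing.append(role)
    return missing
```

Any non-empty result maps to an INCOMPLETE verdict for the story.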

### Screenshot / artefact completeness

For Visual/Feel stories: check whether screenshot file paths are referenced
in the evidence doc. If referenced, Glob for them to confirm they exist.

For UI stories: check whether a walkthrough sequence (step-by-step interaction
log) is present.

### Date coverage

Evidence doc should have a date. If the date is earlier than the story's
last major change (heuristic: compare against sprint start date from the sprint
plan), flag as POTENTIALLY STALE, since the evidence may not cover the final
implementation.
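
The staleness heuristic, sketched under the assumption that the evidence doc carries an ISO `YYYY-MM-DD` date and that the first such date is the evidence date. The function names and the "undated" label are hypothetical.

```python
import re
from datetime import date
from typing import Optional


def evidence_date(evidence_doc: str) -> Optional[date]:
    """Return the first valid ISO date found in the doc, if any."""
    for text in re.findall(r"\b\d{4}-\d{2}-\d{2}\b", evidence_doc):
        try:
            return date.fromisoformat(text)
        except ValueError:
            continue  # e.g. 2025-13-40 looks like a date but is not one
    return None


def freshness(evidence_doc: str, sprint_start: date) -> str:
    """Compare the evidence date against the sprint start date."""
    found = evidence_date(evidence_doc)
    if found is None:
        return "undated"
    return "potentially stale" if found < sprint_start else "current"
```

Evidence dated on the sprint start date itself is treated as current in this sketch.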

---

## 6. Build the Review Report

For each story, assign a verdict:

| Verdict | Meaning |
|---------|---------|
| **ADEQUATE** | Test/evidence exists, passes quality checks, all criteria covered |
| **INCOMPLETE** | Test/evidence exists but has quality gaps (thin assertions, missing sign-offs) |
| **MISSING** | No test or evidence found for a story type that requires it |

The overall sprint/system verdict is the worst story verdict present.
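
The roll-up rule can be sketched as below. It assumes the severity order follows the table, with MISSING worse than INCOMPLETE and INCOMPLETE worse than ADEQUATE; `overall_verdict` is a hypothetical helper name.

```python
# Assumed severity order, from best to worst, following the table above.
SEVERITY = {"ADEQUATE": 0, "INCOMPLETE": 1, "MISSING": 2}


def overall_verdict(story_verdicts: list) -> str:
    """Return the worst verdict among the reviewed stories."""
    if not story_verdicts:
        raise ValueError("no stories were reviewed")
    return max(story_verdicts, key=lambda verdict: SEVERITY[verdict])
```

For example, a sprint with verdicts ADEQUATE, INCOMPLETE, and ADEQUATE rolls up to INCOMPLETE.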

```markdown
## Test Evidence Review

> **Date**: [date]
> **Scope**: [single story path | Sprint [N] | [system name]]
> **Stories reviewed**: [N]
> **Overall verdict**: ADEQUATE / INCOMPLETE / MISSING

---

### Story-by-Story Results

#### [Story Title] - [Type] - [ADEQUATE/INCOMPLETE/MISSING]

**Test/evidence path**: `[path]` (found) / (not found)

**Automated test quality** *(Logic/Integration only)*:
- Assertion coverage: [N per function on average] - [adequate / thin / none]
- Edge cases: [covered / partial / not found]
- Naming: [consistent / [N] generic names flagged]
- Formula traceability: [yes / no (formula names not referenced in tests)]

**Manual evidence quality** *(Visual/Feel and UI only)*:
- Criterion linkage: [N/M criteria referenced]
- Sign-offs: [Developer ✓ | Designer ✗ | QA Lead ✗]
- Artefacts: [screenshots present / missing / N/A]
- Freshness: [dated [date]: current / potentially stale]

**Issues**:
- BLOCKING: [description] *(prevents story-done)*
- ADVISORY: [description] *(should fix before release)*

---

### Summary

| Story | Type | Verdict | Issues |
|-------|------|---------|--------|
| [title] | Logic | ADEQUATE | None |
| [title] | Integration | INCOMPLETE | Thin assertions (avg 1.2/function) |
| [title] | Visual/Feel | INCOMPLETE | QA lead sign-off missing |
| [title] | Logic | MISSING | No test file found |

**BLOCKING items** (must resolve before story can be closed): [N]
**ADVISORY items** (should address before release): [N]
```

---

## 7. Write Output (Optional)

Present the report in conversation.

Ask: "May I write this test evidence review to
`production/qa/evidence-review-[date].md`?"

This is optional; the report is useful standalone. Write only if the user
wants a persistent record.

After the report:

- For BLOCKING items: "These must be resolved before `/story-done` can mark the
  story Complete. Would you like to address any of them now?"
- For thin assertions: "Consider running `/test-helpers [system]` to see
  scaffolded assertion patterns for common cases."
- For missing sign-offs: "Manual sign-off is required from [role]. Share
  `[evidence-path]` with them to complete sign-off."

Verdict: **COMPLETE** when the evidence review has finished. Use **CONCERNS** instead if BLOCKING items were found.

---

## Collaborative Protocol

- **Report quality issues, do not fix them**: this skill reads and evaluates;
  it does not modify test files or evidence documents
- **ADEQUATE means adequate for shipping, not perfect**: avoid nitpicking
  tests that are functioning and comprehensive enough to give confidence
- **BLOCKING vs. ADVISORY distinction is important**: only flag BLOCKING when
  the gap leaves a story criterion genuinely unverified
- **Ask before writing**: the report file is optional; always confirm before writing