Add v0.6.0: full skill/agent QA pass, 3 new agents tested, template cleanup

Skills fixed: sprint-status (stale escalation, threshold), retrospective
(existing file detection, missing data fallback), changelog (misc category,
task-ref count), patch-notes (BLOCKED on missing changelog, tone/template
paths), story-readiness (Phase 0 mode resolution, QL-STORY-READY gate),
art-bible, brainstorm, design-system, ux-design, dev-story, story-done,
create-architecture, create-control-manifest, map-systems, propagate-design-change,
quick-design, prototype, asset-spec.

Agents fixed: all 4 directors (gate verdict token format), engine-programmer,
ui-programmer, tools-programmer, technical-artist (engine version safety),
gameplay-programmer (ADR compliance), godot-gdextension-specialist (ABI warning),
systems-designer (escalation path to creative-director), accessibility-specialist
(model, tools, WCAG criterion format, findings template), live-ops-designer
(escalation paths, battle pass value language), qa-tester (model, test case
format, evidence routing, ambiguous criteria, regression scope).

Specs updated: smoke-check and adopt specs rewritten to match actual skill
behavior. catalog.yaml reset to blank template state. Removed
session-state marketing research file, removed session-state from gitignore.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
Donchitos
2026-04-07 17:28:46 +10:00
parent a73ff759c9
commit 3614e1dbfb
37 changed files with 925 additions and 287 deletions

View File

@@ -2,15 +2,18 @@
## Skill Summary
`/smoke-check` runs the critical path smoke test checklist for a build. It reads
the QA plan from `production/qa/` and checks each critical path item against the
acceptance criteria defined in the current sprint's stories. Items that can be
evaluated analytically are assessed; items that require runtime verification or
visual inspection are flagged as NEEDS MANUAL CHECK.
`/smoke-check` is the gate between implementation and QA hand-off. It detects the
test environment, runs the automated test suite (via Bash), scans test coverage
against sprint stories, and uses `AskUserQuestion` to batch-verify manual smoke
checks with the developer. It writes a report to `production/qa/smoke-[date].md`
after explicit user approval.
The skill produces no file writes — output is conversational. No director gates
apply. Verdicts: PASS (all critical items verified), FAIL (at least one critical
item fails), or NEEDS MANUAL CHECK (critical items exist that require human verification).
Verdicts: PASS (tests pass, all smoke checks pass, no missing test evidence),
PASS WITH WARNINGS (tests pass or NOT RUN, all critical checks pass, but advisory
gaps exist such as missing test coverage), or FAIL (any automated test failure or
any Batch 1/Batch 2 smoke check returns FAIL).
No director gates apply. The skill does NOT invoke any director agents.
---
@@ -20,150 +23,171 @@ Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: PASS, FAIL, NEEDS MANUAL CHECK
- [ ] Does NOT contain "May I write" language (skill is read-only)
- [ ] Has a next-step handoff (e.g., `/bug-report` on FAIL, `/release-checklist` on PASS)
- [ ] Contains verdict keywords: PASS, PASS WITH WARNINGS, FAIL
- [ ] Contains "May I write" collaborative protocol language before writing the report
- [ ] Has a next-step handoff (e.g., `/bug-report` on FAIL, QA hand-off guidance on PASS)
---
## Director Gate Checks
None. `/smoke-check` is a QA utility skill. No director gates apply.
None. `/smoke-check` is a pre-QA utility skill. No director gates apply.
---
## Test Cases
### Case 1: Happy Path — All critical path items verifiable, PASS
### Case 1: Happy Path — Automated tests pass, manual items confirmed, PASS
**Fixture:**
- `production/qa/qa-plan-sprint-005.md` exists with 4 critical path items
- All 4 items are logic or integration type (analytically assessable)
- Corresponding story ACs are defined and met per sprint stories
- `tests/` directory exists with a GDUnit4 runner script
- Engine detected as Godot from `technical-preferences.md`
- `production/qa/qa-plan-sprint-005.md` exists
- Automated test runner reports 12 tests, 12 passing, 0 failing
- Developer confirms all Batch 1 and Batch 2 smoke checks as PASS
- All sprint stories have matching test files (no MISSING coverage)
**Input:** `/smoke-check`
**Expected behavior:**
1. Skill reads the QA plan and identifies 4 critical path items
2. Skill evaluates each item against the story's acceptance criteria
3. All 4 items pass
4. Skill outputs a checklist: each item with a PASS marker
5. Verdict is PASS with summary: "4/4 critical path items verified"
1. Skill detects test directory and engine, notes QA plan found
2. Runs `godot --headless --script tests/gdunit4_runner.gd` via Bash
3. Parses output: 12/12 passing
4. Scans test coverage — all stories COVERED or EXPECTED
5. Uses `AskUserQuestion` for Batch 1 (core stability) and Batch 2 (sprint mechanics)
6. Developer selects PASS for all items
7. Report assembled: automated tests PASS, all smoke checks PASS, no MISSING coverage
8. Asks "May I write this smoke check report to `production/qa/smoke-[date].md`?"
9. Writes report after approval
10. Delivers verdict: PASS
**Assertions:**
- [ ] All 4 items appear in the checklist output
- [ ] Each item is marked PASS
- [ ] Automated test runner is invoked via Bash
- [ ] `AskUserQuestion` is used for manual smoke check batches
- [ ] "May I write" is asked before writing the report file
- [ ] Report is written to `production/qa/smoke-[date].md`
- [ ] Verdict is PASS
- [ ] No files are written
---
### Case 2: Failure Path — One critical item fails, FAIL verdict
### Case 2: Failure Path — Automated test fails, FAIL verdict
**Fixture:**
- QA plan has 3 critical path items
- Item 2 ("Player health does not go below 0") fails — story AC indicates
clamping logic was not implemented
- `tests/` directory exists, engine is Godot
- Automated test runner reports 10 tests run: 8 passing, 2 failing
- Failing tests: `test_health_clamp_at_zero`, `test_damage_calculation_negative`
- QA plan exists
**Input:** `/smoke-check`
**Expected behavior:**
1. Skill evaluates all 3 items
2. Item 1 and Item 3 pass; Item 2 fails
3. Skill outputs checklist with specific failure: "Item 2 FAIL — Health clamping not verified"
4. Verdict is FAIL
5. Skill suggests running `/bug-report` for the failing item
1. Skill runs automated tests via Bash
2. Parses output — 2 failures detected
3. Records failing test names
4. Proceeds through manual smoke check batches
5. Report shows automated tests as FAIL with failing test names listed
6. Asks to write report; writes after approval
7. Delivers FAIL verdict with message: "The smoke check failed. Do not hand off to
QA until these failures are resolved." Lists failing tests and suggests fixing
then re-running `/smoke-check`
**Assertions:**
- [ ] Verdict is FAIL (not PARTIAL or NEEDS MANUAL CHECK)
- [ ] Failing item is identified by name/description
- [ ] Passing items are also shown (not hidden)
- [ ] `/bug-report` is suggested for the failure
- [ ] Failing test names are listed in the report
- [ ] Verdict is FAIL
- [ ] Post-verdict message directs developer to fix failures before QA hand-off
- [ ] `/smoke-check` re-run is suggested after fixing
---
### Case 3: Visual Item Cannot Be Auto-Verified — NEEDS MANUAL CHECK
### Case 3: Manual Confirmation — AskUserQuestion used, PASS WITH WARNINGS
**Fixture:**
- QA plan has 3 items: 2 logic items (PASS) and 1 visual item
("Explosion VFX triggers correctly on enemy death" — ADVISORY, visual type)
- `tests/` directory exists, engine is Godot
- Automated test runner reports all tests passing (8/8)
- One Logic story has no matching test file (MISSING coverage)
- Developer confirms all Batch 1 and Batch 2 smoke checks as PASS
**Input:** `/smoke-check`
**Expected behavior:**
1. Skill evaluates the 2 logic items — both pass
2. Skill evaluates the visual item — cannot be verified analytically
3. Visual item is marked NEEDS MANUAL CHECK with a note: "Visual quality requires
human verification — see production/qa/evidence/"
4. Verdict is NEEDS MANUAL CHECK (not PASS, because human action is required)
5. Guidance on how to perform manual check is provided
1. Automated tests PASS
2. Coverage scan finds 1 MISSING entry for a Logic story
3. `AskUserQuestion` is used for Batch 1 and Batch 2 — developer confirms all PASS
4. Report shows: automated tests PASS, manual checks all PASS, 1 MISSING coverage entry
5. Verdict is PASS WITH WARNINGS — build ready for QA, but MISSING entry must be
resolved before `/story-done` closes the affected story
6. Asks to write report; writes after approval
**Assertions:**
- [ ] Verdict is NEEDS MANUAL CHECK (not PASS or FAIL)
- [ ] Visual item is marked with explicit NEEDS MANUAL CHECK tag
- [ ] Guidance for manual verification process is included
- [ ] Logic items are still shown as PASS
- [ ] `AskUserQuestion` is used for manual smoke check batches (not inline text prompts)
- [ ] MISSING test coverage entry appears in the report
- [ ] Verdict is PASS WITH WARNINGS (not PASS, not FAIL)
- [ ] Advisory note explains MISSING entry must be resolved before `/story-done`
- [ ] Report file is written to `production/qa/smoke-[date].md`
---
### Case 4: No Smoke Test Plan — Guidance to run /qa-plan
### Case 4: No Test Directory — Skill stops with guidance
**Fixture:**
- `production/qa/` directory exists but contains no QA plan file for the
current sprint
- Current sprint is sprint-006
- `tests/` directory does not exist
- Engine is configured as Godot
**Input:** `/smoke-check`
**Expected behavior:**
1. Skill looks for QA plan for the current sprint — not found
2. Skill outputs: "No smoke test plan found for sprint-006"
3. Skill suggests running `/qa-plan sprint-006` first
4. No checklist is produced
1. Phase 1 checks for `tests/` directory — not found
2. Skill outputs: "No test directory found at `tests/`. Run `/test-setup` to
scaffold the testing infrastructure, or create the directory manually if
tests live elsewhere."
3. Skill stops — no automated tests run, no manual smoke checks, no report written
**Assertions:**
- [ ] Error message names the missing sprint's plan
- [ ] `/qa-plan` is suggested with the correct sprint argument
- [ ] Skill does not produce a checklist with no plan
- [ ] Verdict is not PASS (error state, no checklist evaluated)
- [ ] Error message references the missing `tests/` directory
- [ ] `/test-setup` is suggested as the remediation step
- [ ] Skill stops after this message (no further phases run)
- [ ] No report file is written
---
### Case 5: Director Gate Check — No gate; smoke-check is a QA utility
### Case 5: Director Gate Check — No gate; smoke-check is a QA pre-check utility
**Fixture:**
- Valid QA plan with assessable items
- Valid test setup, automated tests pass, manual smoke checks confirmed
**Input:** `/smoke-check`
**Expected behavior:**
1. Skill runs the smoke check and produces a verdict
2. No director agents are spawned
3. No gate IDs appear in output
1. Skill runs all phases and produces a PASS or PASS WITH WARNINGS verdict
2. No director agents are spawned at any point
3. No gate IDs (CD-*, TD-*, AD-*, PR-*) appear in output
4. No `/gate-check` is invoked
**Assertions:**
- [ ] No director gate is invoked
- [ ] No write tool is called
- [ ] Verdict is PASS, FAIL, or NEEDS MANUAL CHECK — no gate verdict involved
- [ ] No gate skip messages appear
- [ ] Verdict is PASS, PASS WITH WARNINGS, or FAIL — no gate verdict involved
---
## Protocol Compliance
- [ ] Reads QA plan before evaluating any items
- [ ] Evaluates each item explicitly (no silent skips)
- [ ] Visual/feel items are always flagged NEEDS MANUAL CHECK (not auto-passed)
- [ ] FAIL verdict triggers on first critical failure (not advisory)
- [ ] Verdict is PASS, FAIL, or NEEDS MANUAL CHECK — no other verdicts
- [ ] Uses `AskUserQuestion` for all manual smoke check batches (Batch 1, Batch 2, Batch 3)
- [ ] Runs automated tests via Bash before asking any manual questions
- [ ] Asks "May I write" before creating the report file — never writes without approval
- [ ] Verdict vocabulary is strictly PASS / PASS WITH WARNINGS / FAIL — no other verdicts
- [ ] FAIL is triggered by automated test failures or Batch 1/Batch 2 FAIL responses
- [ ] PASS WITH WARNINGS is triggered when MISSING test coverage exists but no critical failures
- [ ] NOT RUN (engine binary unavailable) is recorded as a warning, not a FAIL
- [ ] Does not invoke director gates at any point
---
## Coverage Notes
- The case where the QA plan exists but has no critical path items (all items
are ADVISORY) is not tested; PASS would be returned with a note that no
critical items were checked.
- The distinction between BLOCKING and ADVISORY gate levels from coding-standards.md
is relied upon to determine which items can produce a FAIL.
- Build-specific failures (runtime crashes) that occur during manual testing are
outside the scope of this skill — use `/bug-report` for those.
- The `quick` argument (skips Phase 3 coverage scan and Batch 3) is not separately
fixture-tested; it follows the same pattern as Case 1 with a coverage-skip note in output.
- The `--platform` argument adds platform-specific AskUserQuestion batches and a
per-platform verdict table; not separately tested here.
- The case where the engine binary is not on PATH (NOT RUN) follows the PASS WITH
WARNINGS pattern and is covered by the protocol compliance assertions above.