Add v0.6.0: full skill/agent QA pass, 3 new agents tested, template cleanup

Skills fixed: sprint-status (stale escalation, threshold), retrospective (existing file detection, missing data fallback), changelog (misc category, task-ref count), patch-notes (BLOCKED on missing changelog, tone/template paths), story-readiness (Phase 0 mode resolution, QL-STORY-READY gate), art-bible, brainstorm, design-system, ux-design, dev-story, story-done, create-architecture, create-control-manifest, map-systems, propagate-design-change, quick-design, prototype, asset-spec. Agents fixed: all 4 directors (gate verdict token format), engine-programmer, ui-programmer, tools-programmer, technical-artist (engine version safety), gameplay-programmer (ADR compliance), godot-gdextension-specialist (ABI warning), systems-designer (escalation path to creative-director), accessibility-specialist (model, tools, WCAG criterion format, findings template), live-ops-designer (escalation paths, battle pass value language), qa-tester (model, test case format, evidence routing, ambiguous criteria, regression scope). Specs updated: smoke-check and adopt specs rewritten to match actual skill behavior. catalog.yaml reset to blank template state. Removed session-state marketing research file, removed session-state from gitignore. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-27 13:01:50 +00:00 · 2026-04-07 17:28:46 +10:00
parent a73ff759c9
commit 3614e1dbfb
37 changed files with 925 additions and 287 deletions
--- a/Framework/skills/utility/smoke-check.md
+++ b/Framework/skills/utility/smoke-check.md
@@ -2,15 +2,18 @@

 ## Skill Summary

-`/smoke-check` runs the critical path smoke test checklist for a build. It reads
-the QA plan from `production/qa/` and checks each critical path item against the
-acceptance criteria defined in the current sprint's stories. Items that can be
-evaluated analytically are assessed; items that require runtime verification or
-visual inspection are flagged as NEEDS MANUAL CHECK.
+`/smoke-check` is the gate between implementation and QA hand-off. It detects the
+test environment, runs the automated test suite (via Bash), scans test coverage
+against sprint stories, and uses `AskUserQuestion` to batch-verify manual smoke
+checks with the developer. It writes a report to `production/qa/smoke-[date].md`
+after explicit user approval.

-The skill produces no file writes — output is conversational. No director gates
-apply. Verdicts: PASS (all critical items verified), FAIL (at least one critical
-item fails), or NEEDS MANUAL CHECK (critical items exist that require human verification).
+Verdicts: PASS (tests pass, all smoke checks pass, no missing test evidence),
+PASS WITH WARNINGS (tests pass or NOT RUN, all critical checks pass, but advisory
+gaps exist such as missing test coverage), or FAIL (any automated test failure or
+any Batch 1/Batch 2 smoke check returns FAIL).
+
+No director gates apply. The skill does NOT invoke any director agents.

 ---

@@ -20,150 +23,171 @@ Verified automatically by `/skill-test static` — no fixture needed.

 - [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
 - [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: PASS, FAIL, NEEDS MANUAL CHECK
- [ ] Does NOT contain "May I write" language (skill is read-only)
- [ ] Has a next-step handoff (e.g., `/bug-report` on FAIL, `/release-checklist` on PASS)
+- [ ] Contains verdict keywords: PASS, PASS WITH WARNINGS, FAIL
+- [ ] Contains "May I write" collaborative protocol language before writing the report
+- [ ] Has a next-step handoff (e.g., `/bug-report` on FAIL, QA hand-off guidance on PASS)

 ---

 ## Director Gate Checks

-None. `/smoke-check` is a QA utility skill. No director gates apply.
+None. `/smoke-check` is a pre-QA utility skill. No director gates apply.

 ---

 ## Test Cases

-### Case 1: Happy Path — All critical path items verifiable, PASS
+### Case 1: Happy Path — Automated tests pass, manual items confirmed, PASS

 **Fixture:**
- `production/qa/qa-plan-sprint-005.md` exists with 4 critical path items
- All 4 items are logic or integration type (analytically assessable)
- Corresponding story ACs are defined and met per sprint stories
+- `tests/` directory exists with a GDUnit4 runner script
+- Engine detected as Godot from `technical-preferences.md`
+- `production/qa/qa-plan-sprint-005.md` exists
+- Automated test runner reports 12 tests, 12 passing, 0 failing
+- Developer confirms all Batch 1 and Batch 2 smoke checks as PASS
+- All sprint stories have matching test files (no MISSING coverage)

 **Input:** `/smoke-check`

 **Expected behavior:**
-1. Skill reads the QA plan and identifies 4 critical path items
-2. Skill evaluates each item against the story's acceptance criteria
-3. All 4 items pass
-4. Skill outputs a checklist: each item with a PASS marker
-5. Verdict is PASS with summary: "4/4 critical path items verified"
+1. Skill detects test directory and engine, notes QA plan found
+2. Runs `godot --headless --script tests/gdunit4_runner.gd` via Bash
+3. Parses output: 12/12 passing
+4. Scans test coverage — all stories COVERED or EXPECTED
+5. Uses `AskUserQuestion` for Batch 1 (core stability) and Batch 2 (sprint mechanics)
+6. Developer selects PASS for all items
+7. Report assembled: automated tests PASS, all smoke checks PASS, no MISSING coverage
+8. Asks "May I write this smoke check report to `production/qa/smoke-[date].md`?"
+9. Writes report after approval
+10. Delivers verdict: PASS

 **Assertions:**
- [ ] All 4 items appear in the checklist output
- [ ] Each item is marked PASS
+- [ ] Automated test runner is invoked via Bash
+- [ ] `AskUserQuestion` is used for manual smoke check batches
+- [ ] "May I write" is asked before writing the report file
+- [ ] Report is written to `production/qa/smoke-[date].md`
 - [ ] Verdict is PASS
- [ ] No files are written

 ---

-### Case 2: Failure Path — One critical item fails, FAIL verdict
+### Case 2: Failure Path — Automated test fails, FAIL verdict

 **Fixture:**
- QA plan has 3 critical path items
- Item 2 ("Player health does not go below 0") fails — story AC indicates
-  clamping logic was not implemented
+- `tests/` directory exists, engine is Godot
+- Automated test runner reports 10 tests run: 8 passing, 2 failing
+  - Failing tests: `test_health_clamp_at_zero`, `test_damage_calculation_negative`
+- QA plan exists

 **Input:** `/smoke-check`

 **Expected behavior:**
-1. Skill evaluates all 3 items
-2. Item 1 and Item 3 pass; Item 2 fails
-3. Skill outputs checklist with specific failure: "Item 2 FAIL — Health clamping not verified"
-4. Verdict is FAIL
-5. Skill suggests running `/bug-report` for the failing item
+1. Skill runs automated tests via Bash
+2. Parses output — 2 failures detected
+3. Records failing test names
+4. Proceeds through manual smoke check batches
+5. Report shows automated tests as FAIL with failing test names listed
+6. Asks to write report; writes after approval
+7. Delivers FAIL verdict with message: "The smoke check failed. Do not hand off to
+   QA until these failures are resolved." Lists failing tests and suggests fixing
+   then re-running `/smoke-check`

 **Assertions:**
- [ ] Verdict is FAIL (not PARTIAL or NEEDS MANUAL CHECK)
- [ ] Failing item is identified by name/description
- [ ] Passing items are also shown (not hidden)
- [ ] `/bug-report` is suggested for the failure
+- [ ] Failing test names are listed in the report
+- [ ] Verdict is FAIL
+- [ ] Post-verdict message directs developer to fix failures before QA hand-off
+- [ ] `/smoke-check` re-run is suggested after fixing

 ---

-### Case 3: Visual Item Cannot Be Auto-Verified — NEEDS MANUAL CHECK
+### Case 3: Manual Confirmation — AskUserQuestion used, PASS WITH WARNINGS

 **Fixture:**
- QA plan has 3 items: 2 logic items (PASS) and 1 visual item
-  ("Explosion VFX triggers correctly on enemy death" — ADVISORY, visual type)
+- `tests/` directory exists, engine is Godot
+- Automated test runner reports all tests passing (8/8)
+- One Logic story has no matching test file (MISSING coverage)
+- Developer confirms all Batch 1 and Batch 2 smoke checks as PASS

 **Input:** `/smoke-check`

 **Expected behavior:**
-1. Skill evaluates the 2 logic items — both pass
-2. Skill evaluates the visual item — cannot be verified analytically
-3. Visual item is marked NEEDS MANUAL CHECK with a note: "Visual quality requires
-   human verification — see production/qa/evidence/"
-4. Verdict is NEEDS MANUAL CHECK (not PASS, because human action is required)
-5. Guidance on how to perform manual check is provided
+1. Automated tests PASS
+2. Coverage scan finds 1 MISSING entry for a Logic story
+3. `AskUserQuestion` is used for Batch 1 and Batch 2 — developer confirms all PASS
+4. Report shows: automated tests PASS, manual checks all PASS, 1 MISSING coverage entry
+5. Verdict is PASS WITH WARNINGS — build ready for QA, but MISSING entry must be
+   resolved before `/story-done` closes the affected story
+6. Asks to write report; writes after approval

 **Assertions:**
- [ ] Verdict is NEEDS MANUAL CHECK (not PASS or FAIL)
- [ ] Visual item is marked with explicit NEEDS MANUAL CHECK tag
- [ ] Guidance for manual verification process is included
- [ ] Logic items are still shown as PASS
+- [ ] `AskUserQuestion` is used for manual smoke check batches (not inline text prompts)
+- [ ] MISSING test coverage entry appears in the report
+- [ ] Verdict is PASS WITH WARNINGS (not PASS, not FAIL)
+- [ ] Advisory note explains MISSING entry must be resolved before `/story-done`
+- [ ] Report file is written to `production/qa/smoke-[date].md`

 ---

-### Case 4: No Smoke Test Plan — Guidance to run /qa-plan
+### Case 4: No Test Directory — Skill stops with guidance

 **Fixture:**
- `production/qa/` directory exists but contains no QA plan file for the
-  current sprint
- Current sprint is sprint-006
+- `tests/` directory does not exist
+- Engine is configured as Godot

 **Input:** `/smoke-check`

 **Expected behavior:**
-1. Skill looks for QA plan for the current sprint — not found
-2. Skill outputs: "No smoke test plan found for sprint-006"
-3. Skill suggests running `/qa-plan sprint-006` first
-4. No checklist is produced
+1. Phase 1 checks for `tests/` directory — not found
+2. Skill outputs: "No test directory found at `tests/`. Run `/test-setup` to
+   scaffold the testing infrastructure, or create the directory manually if
+   tests live elsewhere."
+3. Skill stops — no automated tests run, no manual smoke checks, no report written

 **Assertions:**
- [ ] Error message names the missing sprint's plan
- [ ] `/qa-plan` is suggested with the correct sprint argument
- [ ] Skill does not produce a checklist with no plan
- [ ] Verdict is not PASS (error state, no checklist evaluated)
+- [ ] Error message references the missing `tests/` directory
+- [ ] `/test-setup` is suggested as the remediation step
+- [ ] Skill stops after this message (no further phases run)
+- [ ] No report file is written

 ---

-### Case 5: Director Gate Check — No gate; smoke-check is a QA utility
+### Case 5: Director Gate Check — No gate; smoke-check is a QA pre-check utility

 **Fixture:**
- Valid QA plan with assessable items
+- Valid test setup, automated tests pass, manual smoke checks confirmed

 **Input:** `/smoke-check`

 **Expected behavior:**
-1. Skill runs the smoke check and produces a verdict
-2. No director agents are spawned
-3. No gate IDs appear in output
+1. Skill runs all phases and produces a PASS or PASS WITH WARNINGS verdict
+2. No director agents are spawned at any point
+3. No gate IDs (CD-*, TD-*, AD-*, PR-*) appear in output
+4. No `/gate-check` is invoked

 **Assertions:**
 - [ ] No director gate is invoked
- [ ] No write tool is called
- [ ] Verdict is PASS, FAIL, or NEEDS MANUAL CHECK — no gate verdict involved
+- [ ] No gate skip messages appear
+- [ ] Verdict is PASS, PASS WITH WARNINGS, or FAIL — no gate verdict involved

 ---

 ## Protocol Compliance

- [ ] Reads QA plan before evaluating any items
- [ ] Evaluates each item explicitly (no silent skips)
- [ ] Visual/feel items are always flagged NEEDS MANUAL CHECK (not auto-passed)
- [ ] FAIL verdict triggers on first critical failure (not advisory)
- [ ] Verdict is PASS, FAIL, or NEEDS MANUAL CHECK — no other verdicts
+- [ ] Uses `AskUserQuestion` for all manual smoke check batches (Batch 1, Batch 2, Batch 3)
+- [ ] Runs automated tests via Bash before asking any manual questions
+- [ ] Asks "May I write" before creating the report file — never writes without approval
+- [ ] Verdict vocabulary is strictly PASS / PASS WITH WARNINGS / FAIL — no other verdicts
+- [ ] FAIL is triggered by automated test failures or Batch 1/Batch 2 FAIL responses
+- [ ] PASS WITH WARNINGS is triggered when MISSING test coverage exists but no critical failures
+- [ ] NOT RUN (engine binary unavailable) is recorded as a warning, not a FAIL
+- [ ] Does not invoke director gates at any point

 ---

 ## Coverage Notes

- The case where the QA plan exists but has no critical path items (all items
-  are ADVISORY) is not tested; PASS would be returned with a note that no
-  critical items were checked.
- The distinction between BLOCKING and ADVISORY gate levels from coding-standards.md
-  is relied upon to determine which items can produce a FAIL.
- Build-specific failures (runtime crashes) that occur during manual testing are
-  outside the scope of this skill — use `/bug-report` for those.
+- The `quick` argument (skips Phase 3 coverage scan and Batch 3) is not separately
+  fixture-tested; it follows the same pattern as Case 1 with a coverage-skip note in output.
+- The `--platform` argument adds platform-specific AskUserQuestion batches and a
+  per-platform verdict table; not separately tested here.
+- The case where the engine binary is not on PATH (NOT RUN) follows the PASS WITH
+  WARNINGS pattern and is covered by the protocol compliance assertions above.