Gap closure: feedback loops, traceability, and new /content-audit skill

- NEW /content-audit skill: GDD-specified content vs implemented content gap report with COMPLETE/IN PROGRESS/EARLY/NOT STARTED per system
- balance-check: Fix & Verify Cycle phase (fix → re-verify → propagate-design-change)
- perf-profile: Scope & Timeline Decision phase for M/L effort optimizations
- playtest-report: Action Routing phase categorizes findings → design/balance/bugs/polish
- review-all-gdds: Phase 4 Cross-System Scenario Walkthrough (multi-system sequences)
- story-done: Test-Criterion Traceability (each AC mapped to a test, BLOCKING if >50% untested)
- code-review: ADR Compliance Check (ARCHITECTURAL VIOLATION / ADR DRIFT / MINOR DEVIATION)
- setup-engine: upgrade subcommand (pre-upgrade API scan, migration plan, VERSION.md update)
- story-readiness: Asset References Check (verifies referenced asset paths exist)
- validate-assets.sh: invalid JSON now exits 1 (blocking); naming issues exit 0 (warning)
- workflow-catalog.yaml + sprint-plan: /scope-check wired into production phase

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-27 04:51:46 +00:00 · 2026-03-12 11:18:43 +11:00
parent 0bbf25ec31
commit 70fbf670fc
12 changed files with 609 additions and 20 deletions
--- a/.claude/docs/workflow-catalog.yaml
+++ b/.claude/docs/workflow-catalog.yaml
@@ -221,6 +221,16 @@ phases:
        repeatable: true
        description: "Verify all acceptance criteria, check GDD/ADR deviations, close the story"

+      - id: scope-check
+        name: "Scope Check"
+        command: /scope-check
+        required: false
+        repeatable: true
+        artifact:
+          glob: "production/sprints/sprint-*.md"
+          note: "Run when stories are added mid-sprint, or before sprint retrospectives"
+        description: "Detect scope creep by comparing current sprint scope to original epic scope. Run (a) when stories are added mid-sprint, or (b) before sprint retrospectives."
+
      - id: sprint-status
        name: "Sprint Status"
        command: /sprint-status
--- a/.claude/hooks/validate-assets.sh
+++ b/.claude/hooks/validate-assets.sh
@@ -1,7 +1,10 @@
 #!/bin/bash
 # Claude Code PostToolUse hook: Validates asset files after Write/Edit
 # Checks naming conventions for files in assets/ directory
-# Exit 0 = success (non-blocking, PostToolUse cannot block)
+#
+# Exit behavior:
+#   exit 0 = success or advisory warnings only (non-blocking)
+#   exit 1 = blocking error (build-breaking issues; currently invalid JSON in assets/data/)
 #
 # Input schema (PostToolUse for Write/Edit):
 # { "tool_name": "Write", "tool_input": { "file_path": "assets/data/foo.json", "content": "..." } }
@@ -24,14 +27,18 @@ if ! echo "$FILE_PATH" | grep -qE '(^|/)assets/'; then
 fi

 FILENAME=$(basename "$FILE_PATH")
-WARNINGS=""
+WARNINGS=""   # Style/convention issues -- exit 0 with advisory message
+ERRORS=""     # Build-breaking issues -- exit 1 to block the operation

-# Check naming convention (lowercase with underscores only) -- uses grep -E instead of grep -P
+# ADVISORY: Check naming convention (lowercase with underscores only)
+# Naming issues are style violations -- warn but do not block
+# Uses grep -E (POSIX) not grep -P (Perl) for Windows Git Bash compatibility
 if echo "$FILENAME" | grep -qE '[A-Z[:space:]-]'; then
-    WARNINGS="$WARNINGS\nNAMING: $FILE_PATH must be lowercase with underscores (got: $FILENAME)"
+    WARNINGS="$WARNINGS\n  NAMING: $FILE_PATH must be lowercase with underscores (got: $FILENAME)"
 fi

-# Check JSON validity for data files
+# BLOCKING: Check JSON validity for data files
+# Invalid JSON will break runtime loading -- this is a build-breaking error
 if echo "$FILE_PATH" | grep -qE '(^|/)assets/data/.*\.json$'; then
    if [ -f "$FILE_PATH" ]; then
        # Find a working Python command
@@ -45,14 +52,21 @@ if echo "$FILE_PATH" | grep -qE '(^|/)assets/data/.*\.json$'; then

        if [ -n "$PYTHON_CMD" ]; then
            if ! "$PYTHON_CMD" -m json.tool "$FILE_PATH" > /dev/null 2>&1; then
-                WARNINGS="$WARNINGS\nFORMAT: $FILE_PATH is not valid JSON"
+                ERRORS="$ERRORS\n  FORMAT: $FILE_PATH is not valid JSON — fix syntax errors before continuing"
            fi
        fi
    fi
 fi

+# Report warnings (advisory -- non-blocking)
 if [ -n "$WARNINGS" ]; then
-    echo -e "=== Asset Validation ===$WARNINGS\n========================" >&2
+    echo -e "=== Asset Validation: Warnings ===$WARNINGS\n==================================\n(Warnings are advisory. Fix before final commit.)" >&2
+fi
+
+# Report errors and block if any build-breaking issues found
+if [ -n "$ERRORS" ]; then
+    echo -e "=== Asset Validation: ERRORS (Blocking) ===$ERRORS\n===========================================\nFix these errors before proceeding." >&2
+    exit 1
 fi

 exit 0
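
Review note: the two-tier behavior this hook now implements (naming issues are advisory, invalid JSON is blocking) can be restated as a minimal Python sketch. The function name `validate_asset` and the choice to pass file content as a string are assumptions made for illustration; the actual hook reads the file from disk and runs `python -m json.tool` on it.

```python
import json
import re

# Mirrors the hook's grep -qE '[A-Z[:space:]-]' naming test.
NAMING_VIOLATION = re.compile(r"[A-Z\s-]")
# Mirrors the hook's '(^|/)assets/data/.*\.json$' path test.
JSON_DATA_PATH = re.compile(r"(^|/)assets/data/.*\.json$")


def validate_asset(file_path: str, content: str) -> tuple[int, list[str], list[str]]:
    """Return (exit_code, warnings, errors) for one asset file (hypothetical helper)."""
    warnings: list[str] = []
    errors: list[str] = []
    filename = file_path.rsplit("/", 1)[-1]

    # ADVISORY: naming problems are collected but never change the exit code.
    if NAMING_VIOLATION.search(filename):
        warnings.append(
            f"NAMING: {file_path} must be lowercase with underscores (got: {filename})"
        )

    # BLOCKING: only JSON files under assets/data/ are parsed.
    if JSON_DATA_PATH.search(file_path):
        try:
            json.loads(content)
        except json.JSONDecodeError:
            errors.append(f"FORMAT: {file_path} is not valid JSON")

    # Exit 1 only when at least one blocking error was recorded.
    return (1 if errors else 0), warnings, errors
```

Whether a non-zero exit from a PostToolUse hook actually blocks anything depends on the hook runner, so the exit-code convention is worth confirming against the runner's documentation before relying on it.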
--- a/.claude/skills/balance-check/SKILL.md
+++ b/.claude/skills/balance-check/SKILL.md
@@ -72,3 +72,22 @@ When this skill is invoked:
 ### Values That Need Attention
 [Specific values with suggested adjustments and rationale]
 ```
+
+6. **Fix & Verify Cycle**
+
+   After presenting the report, ask:
+
+   > "Would you like to fix any of these balance issues now?"
+
+   If yes:
+   - Ask which issue to address first (refer to the Recommendations table by priority row)
+   - Guide the user to update the relevant data file in `assets/data/` or formula in `design/balance/`
+   - After each fix, offer to re-run the relevant balance checks for that system to verify the fix did not introduce new outliers or degenerate interactions
+   - If the fix changes a tuning knob that is defined in a GDD or referenced by an ADR, remind the user:
+     > "This value is defined in a design document. Run `/propagate-design-change [path]` on the affected GDD to find downstream impacts before committing."
+
+   If no:
+   - Summarize the open issues and suggest saving the report to `design/balance/balance-check-[system]-[date].md` for later.
+
+   End with:
+   > "Re-run `/balance-check` after fixes to verify."
--- a/.claude/skills/code-review/SKILL.md
+++ b/.claude/skills/code-review/SKILL.md
@@ -14,10 +14,43 @@ When this skill is invoked:

 2. **Read the CLAUDE.md** for project coding standards.

-3. **Identify the system category** (engine, gameplay, AI, networking, UI, tools)
+3. **ADR Compliance Check**:
+
+   a. Search for ADR references in: the story file associated with this work (if
+      provided), any commit message context, and header comments in the files being
+      reviewed. Look for patterns like `ADR-NNN`, `ADR-[name]`, or
+      `docs/architecture/ADR-`.
+
+   b. If no ADR references are found, note:
+      > "No ADR references found — skipping ADR compliance check."
+      Then proceed to step 4.
+
+   c. For each referenced ADR: read `docs/architecture/ADR-NNN-*.md` and extract
+      the **Decision** and **Consequences** sections.
+
+   d. Check the implementation against each ADR:
+      - What pattern/approach was chosen in the Decision?
+      - Are there alternatives explicitly rejected in the ADR?
+      - Are there required guardrails or constraints in the Consequences?
+
+   e. Classify any deviation found:
+      - **ARCHITECTURAL VIOLATION** (BLOCKING): Implementation uses a pattern
+        explicitly rejected in the ADR (e.g., ADR rejected singletons for game
+        state, but the code uses a singleton).
+      - **ADR DRIFT** (WARNING): Implementation diverges meaningfully from the
+        chosen approach without using an explicitly forbidden pattern (e.g., ADR
+        chose event-based communication but code uses direct method calls).
+      - **MINOR DEVIATION** (INFO): Small difference from ADR guidance that does
+        not affect the overall architecture (e.g., slightly different naming from
+        the ADR's example code).
+
+   f. Include ADR compliance findings in the review output under
+      `### ADR Compliance` before the Standards Compliance section.
+
+4. **Identify the system category** (engine, gameplay, AI, networking, UI, tools)
   and apply category-specific standards.

-4. **Evaluate against coding standards**:
+5. **Evaluate against coding standards**:
   - [ ] Public methods and classes have doc comments
   - [ ] Cyclomatic complexity under 10 per method
   - [ ] No method exceeds 40 lines (excluding data declarations)
@@ -25,32 +58,35 @@ When this skill is invoked:
   - [ ] Configuration values loaded from data files
   - [ ] Systems expose interfaces (not concrete class dependencies)

-5. **Check architectural compliance**:
+6. **Check architectural compliance**:
   - [ ] Correct dependency direction (engine <- gameplay, not reverse)
   - [ ] No circular dependencies between modules
   - [ ] Proper layer separation (UI does not own game state)
   - [ ] Events/signals used for cross-system communication
   - [ ] Consistent with established patterns in the codebase

-6. **Check SOLID compliance**:
+7. **Check SOLID compliance**:
   - [ ] Single Responsibility: Each class has one reason to change
   - [ ] Open/Closed: Extendable without modification
   - [ ] Liskov Substitution: Subtypes substitutable for base types
   - [ ] Interface Segregation: No fat interfaces
   - [ ] Dependency Inversion: Depends on abstractions, not concretions

-7. **Check for common game development issues**:
+8. **Check for common game development issues**:
   - [ ] Frame-rate independence (delta time usage)
   - [ ] No allocations in hot paths (update loops)
   - [ ] Proper null/empty state handling
   - [ ] Thread safety where required
   - [ ] Resource cleanup (no leaks)

-8. **Output the review** in this format:
+9. **Output the review** in this format:

 ```
 ## Code Review: [File/System Name]

+### ADR Compliance: [NO ADRS FOUND / COMPLIANT / DRIFT / VIOLATION]
+[List each ADR checked, result, and any deviations with severity]
+
 ### Standards Compliance: [X/6 passing]
 [List failures with line references]

@@ -67,7 +103,7 @@ When this skill is invoked:
 [What is done well -- always include this section]

 ### Required Changes
-[Must-fix items before approval]
+[Must-fix items before approval — ARCHITECTURAL VIOLATIONs always appear here]

 ### Suggestions
 [Nice-to-have improvements]
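
Review note: the ADR Compliance Check added to code-review reduces to two mechanical steps, finding ADR references and rolling individual deviations up into the single header status. The sketch below is illustrative only. It handles the numeric `ADR-NNN` pattern but not `ADR-[name]`, and it assumes that a review with only MINOR DEVIATION findings still reports COMPLIANT, which the skill text does not state. The function names are hypothetical.

```python
import re

# Matches numeric references such as ADR-012, including inside file paths.
ADR_REF = re.compile(r"\bADR-\d+\b")


def find_adr_refs(text: str) -> list[str]:
    """Return the unique numeric ADR references found in text, sorted."""
    return sorted(set(ADR_REF.findall(text)))


def overall_status(adr_refs: list[str], severities: list[str]) -> str:
    """Roll per-deviation severities up into the report header status."""
    if not adr_refs:
        return "NO ADRS FOUND"
    if "ARCHITECTURAL VIOLATION" in severities:
        return "VIOLATION"  # blocking: always listed under Required Changes
    if "ADR DRIFT" in severities:
        return "DRIFT"
    # Assumption: MINOR DEVIATION alone does not change the header status.
    return "COMPLIANT"
```

The worst severity wins, so a single ARCHITECTURAL VIOLATION determines the header even when other ADRs are followed exactly.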
--- a/.claude/skills/content-audit/SKILL.md
+++ b/.claude/skills/content-audit/SKILL.md
@@ -0,0 +1,179 @@
+---
+name: content-audit
+description: "Audit GDD-specified content counts against implemented content. Identifies what's planned vs built."
+argument-hint: "[system-name|--summary]"
+user-invocable: true
+allowed-tools: Read, Glob, Grep, Write
+context: fork
+agent: producer
+---
+
+When this skill is invoked:
+
+Parse the argument:
+- No argument → full audit across all systems
+- `[system-name]` → audit that single system only
+- `--summary` → summary table only, no file write
+
+---
+
+## Phase 1 — Context Gathering
+
+1. **Read `design/gdd/systems-index.md`** for the full list of systems, their
+   categories, and MVP/priority tier.
+
+2. **Read all GDD files** in `design/gdd/` (or the single system GDD if a
+   system name was given).
+
+3. **For each GDD, extract explicit content counts or lists.** Look for patterns
+   like:
+   - "N enemies" / "enemy types:" / list of named enemies
+   - "N levels" / "N areas" / "N maps" / "N stages"
+   - "N items" / "N weapons" / "N equipment pieces"
+   - "N abilities" / "N skills" / "N spells"
+   - "N dialogue scenes" / "N conversations" / "N cutscenes"
+   - "N quests" / "N missions" / "N objectives"
+   - Any explicit enumerated list (bullet list of named content pieces)
+
+4. **Build a content inventory table** from the extracted data:
+
+   | System | Content Type | Specified Count/List | Source GDD |
+   |--------|-------------|---------------------|------------|
+
+   Note: If a GDD describes content qualitatively but gives no count, record
+   "Unspecified" and flag it — unspecified counts are a design gap worth noting.
+
+---
+
+## Phase 2 — Implementation Scan
+
+For each content type found in Phase 1, scan the relevant directories to count
+what has been implemented. Use Glob and Grep to locate files.
+
+**Levels / Areas / Maps:**
+- Glob `assets/**/*.tscn`, `assets/**/*.unity`, `assets/**/*.umap`
+- Glob `src/**/*.tscn`, `src/**/*.unity`
+- Look for scene files in subdirectories named `levels/`, `areas/`, `maps/`,
+  `worlds/`, `stages/`
+- Count unique files that appear to be level/scene definitions (not UI scenes)
+
+**Enemies / Characters / NPCs:**
+- Glob `assets/data/**/enemies/**`, `assets/data/**/characters/**`
+- Glob `src/**/enemies/**`, `src/**/characters/**`
+- Look for `.json`, `.tres`, `.asset`, `.yaml` data files defining entity stats
+- Look for scene/prefab files in character subdirectories
+
+**Items / Equipment / Loot:**
+- Glob `assets/data/**/items/**`, `assets/data/**/equipment/**`,
+  `assets/data/**/loot/**`
+- Look for `.json`, `.tres`, `.asset` data files
+
+**Abilities / Skills / Spells:**
+- Glob `assets/data/**/abilities/**`, `assets/data/**/skills/**`,
+  `assets/data/**/spells/**`
+- Look for `.json`, `.tres`, `.asset` data files
+
+**Dialogue / Conversations / Cutscenes:**
+- Glob `assets/**/*.dialogue`, `assets/**/*.csv`, `assets/**/*.ink`
+- Grep for dialogue data files in `assets/data/`
+
+**Quests / Missions:**
+- Glob `assets/data/**/quests/**`, `assets/data/**/missions/**`
+- Look for `.json`, `.yaml` definition files
+
+**Engine-specific notes (acknowledge in the report):**
+- Counts are approximations — the skill cannot perfectly parse every engine
+  format or distinguish editor-only files from shipped content
+- Scene files may include both gameplay content and system/UI scenes; the scan
+  counts all matches and notes this caveat
+
+---
+
+## Phase 3 — Gap Report
+
+Produce the gap table:
+
+```
+| System | Content Type | Specified | Found | Gap | Status |
+|--------|-------------|-----------|-------|-----|--------|
+```
+
+**Status categories:**
+- `COMPLETE` — Found ≥ Specified (100%+)
+- `IN PROGRESS` — Found is 50–99% of Specified
+- `EARLY` — Found is 1–49% of Specified
+- `NOT STARTED` — Found is 0
+
+**Priority flags:**
+Flag a system as `HIGH PRIORITY` in the report if its status is `NOT STARTED`
+or `EARLY` and at least one of the following holds:
+- The system is tagged MVP or Vertical Slice in the systems index
+- The systems index shows the system is blocking downstream systems
+
+**Summary line:**
+- Total content items specified (sum of all Specified column values)
+- Total content items found (sum of all Found column values)
+- Overall gap percentage: `(Specified - Found) / Specified * 100`
+
+---
+
+## Phase 4 — Output
+
+### Full audit and single-system modes
+
+Write the report to `docs/content-audit-[YYYY-MM-DD].md`:
+
+```markdown
+# Content Audit — [Date]
+
+## Summary
+- **Total specified**: [N] content items across [M] systems
+- **Total found**: [N]
+- **Gap**: [N] items ([X%] unimplemented)
+- **Scope**: [Full audit | System: name]
+
+> Note: Counts are approximations based on file scanning.
+> The audit cannot distinguish shipped content from editor/test assets.
+> Manual verification is recommended for any HIGH PRIORITY gaps.
+
+## Gap Table
+
+| System | Content Type | Specified | Found | Gap | Status |
+|--------|-------------|-----------|-------|-----|--------|
+
+## HIGH PRIORITY Gaps
+
+[List systems flagged HIGH PRIORITY with rationale]
+
+## Per-System Breakdown
+
+### [System Name]
+- **GDD**: `design/gdd/[file].md`
+- **Content types audited**: [list]
+- **Notes**: [any caveats about scan accuracy for this system]
+
+## Recommendation
+
+Focus implementation effort on:
+1. [Highest-gap HIGH PRIORITY system]
+2. [Second system]
+3. [Third system]
+
+## Unspecified Content Counts
+
+The following GDDs describe content without giving explicit counts.
+Consider adding counts to improve auditability:
+[List of GDDs and content types with "Unspecified"]
+```
+
+After writing the report, ask:
+
+> "Would you like to create backlog stories for any of the content gaps?"
+
+If yes: for each system the user selects, suggest a story title and point them
+to `/create-epics-stories` or `/quick-design` depending on the size of the gap.
+
+### --summary mode
+
+Print the Gap Table and Summary directly to conversation. Do not write a file.
+End with: "Run `/content-audit` without `--summary` to write the full report."
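
Review note: the Phase 3 status thresholds and gap percentage in content-audit leave a few edge cases open, and a small sketch makes them concrete. The function names are hypothetical. Three choices here are assumptions rather than anything the skill specifies: ratios that fall between the listed bands (for example 49.5%) go to the lower band, a non-positive Specified count yields a 0% gap instead of a division by zero, and the gap is clamped at zero when more content is found than specified.

```python
def classify(specified: int, found: int) -> str:
    """Map a Specified/Found pair to a Phase 3 status label."""
    if found == 0:
        return "NOT STARTED"
    if specified <= 0 or found >= specified:
        return "COMPLETE"
    pct = found / specified * 100
    # Assumption: anything below 50% (including fractions under 1%) is EARLY.
    return "IN PROGRESS" if pct >= 50 else "EARLY"


def gap_percentage(specified: int, found: int) -> float:
    """Compute (Specified - Found) / Specified * 100 with guarded edge cases."""
    if specified <= 0:
        return 0.0  # Assumption: nothing specified means no measurable gap.
    # Assumption: over-delivery is reported as a 0% gap, not a negative one.
    return max(specified - found, 0) / specified * 100
```

For example, a system specifying 20 enemies with 5 implemented classifies as `EARLY` with a 75% gap.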
--- a/.claude/skills/perf-profile/SKILL.md
+++ b/.claude/skills/perf-profile/SKILL.md
@@ -84,6 +84,27 @@ When this skill is invoked:

 5. **Output the report** with a summary: top 3 hotspots, estimated headroom vs budget, and recommended next action.

+6. **Scope & Timeline Decision** — activate this phase only if any hotspot has Fix Effort rated M or L.
+
+   Present a summary of the significant-effort items:
+
+   > "The following optimizations require significant effort: [list titles and effort ratings from the Hotspots table]"
+
+   For each M/L item, ask the user to choose one of:
+
+   - **A) Implement the optimization** (estimated effort: [M/L]; proceed with fix now or schedule it)
+   - **B) Reduce feature scope to avoid the bottleneck** (run `/scope-check [feature]` to analyze the trade-offs)
+   - **C) Accept the performance hit and defer to Polish phase** (log it as a known issue)
+   - **D) Escalate to technical-director for an architectural decision** (the bottleneck warrants an ADR)
+
+   For choice B, remind the user:
+   > "Run `/scope-check [feature]` to see what simplifications are available without sacrificing player experience."
+
+   For choice D, note:
+   > "A bottleneck requiring architectural change should become a new Architecture Decision Record. Run `/architecture-decision` to capture the decision and its trade-offs."
+
+   If any items are deferred to Polish (choice C), record them in the report under a `### Deferred to Polish` section so they are not lost.
+
 ### Rules
 - Never optimize without measuring first — gut feelings about performance are unreliable
 - Recommendations must include estimated impact — "make it faster" is not actionable
--- a/.claude/skills/playtest-report/SKILL.md
+++ b/.claude/skills/playtest-report/SKILL.md
@@ -75,3 +75,35 @@ When invoked with `new`, generate this template:
 When invoked with `analyze`, read the raw notes, cross-reference with existing
 design documents, and fill in the template above with structured findings.
 Flag any playtest observations that conflict with design intent.
+
+After generating or analyzing a report, run the **Action Routing** phase:
+
+**Action Routing**
+
+Categorize all findings from the report into the four buckets below (a single
+finding may appear in more than one bucket if appropriate):
+
+- **Design changes needed** — fun issues, player confusion, broken mechanics,
+  observations that conflict with the GDD's intended experience
+- **Balance adjustments** — numbers feel wrong, difficulty too spiked or too
+  flat, economy or progression feedback
+- **Bug reports** — clear implementation defects that are reproducible
+- **Polish items** — not blocking progress, but friction or feel issues noted
+  for later
+
+Present the categorized list, then provide the routing guidance for each
+non-empty bucket:
+
+- **Design changes:** "These findings suggest GDD revisions. Run
+  `/propagate-design-change [path]` on the affected design document to find
+  downstream impacts before making changes."
+- **Balance adjustments:** "Run `/balance-check [system]` to verify the full
+  balance picture before tuning individual values."
+- **Bugs:** "Use `/bug-report` to formally track these so they are not lost
+  between sessions."
+- **Polish items:** "No immediate action required. Consider adding these to the
+  polish backlog in `production/` when the team reaches that phase."
+
+Finally, ask:
+
+> "Which category would you like to act on first?"
--- a/.claude/skills/review-all-gdds/SKILL.md
+++ b/.claude/skills/review-all-gdds/SKILL.md
@@ -339,7 +339,90 @@ exploration.md: "You are a reckless adventurer — diving in without a plan"

 ---

-## Phase 4: Output the Review Report
+## Phase 4: Cross-System Scenario Walkthrough
+
+Walk through the game from the player's perspective to find problems that only
+appear at the interaction boundary between multiple systems — things static
+analysis of individual GDDs cannot surface.
+
+### 4a: Identify Key Multi-System Moments
+
+Scan all GDDs and identify the 3–5 most important player-facing moments where
+multiple systems activate simultaneously. Look specifically for:
+
+- **Combat + Economy overlap**: killing enemies that drop resources, spending
+  resources during combat, death/respawn interacting with economy state
+- **Progression + Difficulty overlap**: level-up triggering mid-fight, ability
+  unlocks changing combat viability, difficulty scaling at progression milestones
+- **Narrative + Gameplay overlap**: dialogue choices locking/unlocking mechanics,
+  story beats interrupting resource loops, quest completion triggering system
+  state changes
+- **3+ system chains**: any player action that triggers System A, which feeds
+  into System B, which triggers System C (these are highest-risk interaction paths)
+
+List each identified scenario with a one-line description before proceeding.
+
+### 4b: Walk Through Each Scenario
+
+For each scenario, step through the sequence explicitly:
+
+1. **Trigger** — what player action or game event starts this?
+2. **Activation order** — which systems activate, in what sequence?
+3. **Data flow** — what does each system output, and is that output a valid
+   input for the next system in the chain?
+4. **Player experience** — what does the player see, hear, or feel at each step?
+5. **Failure modes** — are there any of the following?
+   - **Race conditions**: two systems trying to modify the same state simultaneously
+   - **Feedback loops**: System A amplifies System B which re-amplifies System A
+     with no cap or dampener
+   - **Broken state transitions**: a system assumes a state that a previous
+     system may have changed (e.g., "player is alive" assumption after a combat
+     step that could have caused death)
+   - **Contradictory messaging**: player receives conflicting feedback from two
+     systems reacting to the same event (e.g., "success" sound + "failure" UI)
+   - **Compounding difficulty spikes**: two systems both scaling up at the same
+     progression point, multiplying the intended difficulty increase
+   - **Reward conflicts**: two systems both reacting to the same trigger with
+     rewards that together exceed the intended value (double-dipping)
+   - **Undefined behavior**: the GDDs don't specify what happens in this combined
+     state (neither system's rules cover it)
+
+```
+Example walkthrough:
+Scenario: Player kills elite enemy at level-up threshold during active quest
+
+Trigger: Player lands killing blow on elite enemy
+→ combat.md: awards kill XP (100 pts)
+→ progression.md: XP total crosses level threshold → triggers level-up
+  Output: new level, stat increases, ability unlock popup
+→ quest.md: kill-count criterion met → triggers quest completion event
+  Output: quest reward XP (500 pts), completion fanfare
+→ progression.md (again): quest XP added → triggers SECOND level-up in same frame
+  ⚠️  Data flow issue: quest.md awards XP without checking if a level-up
+  is already in progress. progression.md has no guard against concurrent
+  level-up events. Undefined behavior: does the player level up once or twice?
+  Does the ability popup fire twice? Does the second level use the updated or
+  pre-update stat baseline?
+```
+
+### 4c: Flag Scenario Issues
+
+For each problem found during the walkthrough, categorize severity:
+
+- **BLOCKER**: undefined behavior, broken state transition, or contradictory
+  player messaging — the experience is broken or incoherent in this scenario
+- **WARNING**: compounding spikes, feedback loops without caps, reward conflicts —
+  the experience works but produces unintended outcomes
+- **INFO**: minor ordering ambiguity or messaging overlap — worth noting but
+  unlikely to cause player-visible problems
+
+Add all findings to the output report under **"Cross-System Scenario Issues"**.
+Each finding must cite: the scenario name, the specific systems involved, the
+step where the issue occurs, and the nature of the failure mode.
+
+---
+
+## Phase 5: Output the Review Report

 ```
 ## Cross-GDD Review Report
@@ -373,6 +456,25 @@ Systems Covered: [list]

 ---

+### Cross-System Scenario Issues
+
+Scenarios walked: [N]
+[List scenario names]
+
+#### Blockers
+🔴 [Scenario name] — [Systems involved]
+[Step where failure occurs, nature of the failure mode, what must be resolved]
+
+#### Warnings
+⚠️  [Scenario name] — [Systems involved]
+[What the unintended outcome is, recommendation]
+
+#### Info
+ℹ️  [Scenario name] — [Systems involved]
+[Minor ordering ambiguity or note]
+
+---
+
 ### GDDs Flagged for Revision

 | GDD | Reason | Type | Priority |
@@ -395,7 +497,7 @@ FAIL: One or more blocking issues must be resolved before architecture begins.

 ---

-## Phase 5: Write Report and Flag GDDs
+## Phase 6: Write Report and Flag GDDs

 Ask: "May I write this review to `design/gdd/gdd-cross-review-[date].md`?"

@@ -410,7 +512,7 @@ Ask: "Should I update the systems index to mark these GDDs as needing revision?"

 ---

-## Phase 6: Handoff
+## Phase 7: Handoff

 After the report is written:

--- a/.claude/skills/setup-engine/SKILL.md
+++ b/.claude/skills/setup-engine/SKILL.md
@@ -1,7 +1,7 @@
 ---
 name: setup-engine
 description: "Configure the project's game engine and version. Pins the engine in CLAUDE.md, detects knowledge gaps, and populates engine reference docs via WebSearch when the version is beyond the LLM's training data."
-argument-hint: "[engine version] or no args for guided selection"
+argument-hint: "[engine version] | refresh | upgrade [old-version] [new-version] | no args for guided selection"
 user-invocable: true
 allowed-tools: Read, Glob, Grep, Write, Edit, WebSearch, WebFetch, Task
 ---
@@ -10,11 +10,13 @@ When this skill is invoked:

 ## 1. Parse Arguments

-Three modes:
+Five modes:

 - **Full spec**: `/setup-engine godot 4.6` — engine and version provided
 - **Engine only**: `/setup-engine unity` — engine provided, version will be looked up
 - **No args**: `/setup-engine` — fully guided mode (engine recommendation + version)
+- **Refresh**: `/setup-engine refresh` — update reference docs (see Section 10)
+- **Upgrade**: `/setup-engine upgrade [old-version] [new-version]` — migrate to a new engine version (see Section 11)

 ---

@@ -275,7 +277,112 @@ If invoked as `/setup-engine refresh`:

 ---

-## 11. Output Summary
+## 11. Upgrade Subcommand
+
+If invoked as `/setup-engine upgrade [old-version] [new-version]`:
+
+### Step 1 — Read Current Version State
+
+Read `docs/engine-reference/<engine>/VERSION.md` to confirm the current pinned
+version, risk level, and any migration note URLs already recorded. If
+`old-version` was not provided as an argument, use the pinned version from this
+file.
+
+### Step 2 — Fetch Migration Guide
+
+Use WebSearch and WebFetch to locate the official migration guide between
+`old-version` and `new-version`:
+
+- Search: `"[engine] [old-version] to [new-version] migration guide"`
+- Search: `"[engine] [new-version] breaking changes changelog"`
+- Fetch the migration guide URL from VERSION.md if one is already recorded,
+  or use the URL found via search.
+
+Extract: renamed APIs, removed APIs, changed defaults, behavior changes, and
+any "must migrate" items.
+
+### Step 3 — Pre-Upgrade Audit
+
+Scan `src/` for code that uses APIs known to be deprecated or changed in the
+target version:
+
+- Use Grep to search for deprecated API names extracted from the migration
+  guide (e.g., old function names, removed node types, changed property names)
+- List each file that matches, with the specific API reference found
+
+Present the audit results as a table:
+
+```
+Pre-Upgrade Audit: [engine] [old-version] → [new-version]
+==========================================================
+
+Files requiring changes:
+  File                              | Deprecated API Found       | Effort
+  --------------------------------- | -------------------------- | ------
+  src/gameplay/player_movement.gd   | old_api_name               | Low
+  src/ui/hud.gd                     | removed_node_type          | Medium
+
+Breaking changes to watch for:
+  - [change description from migration guide]
+  - [change description from migration guide]
+
+Recommended migration order (dependency-sorted):
+  1. [system/layer with fewest dependencies first]
+  2. [next system]
+  ...
+```
+
+If no deprecated APIs are found in `src/`, report: "No deprecated API usage
+found in src/ — upgrade may be low-risk."
+
+### Step 4 — Confirm Before Updating
+
+Ask the user before making any changes:
+
+> "Pre-upgrade audit complete. Found [N] files using deprecated APIs.
+> Proceed with upgrading VERSION.md to [new-version]?
+> (This will update the pinned version and add migration notes — it does NOT
+> change any source files. Source migration is done manually or via stories.)"
+
+Wait for explicit confirmation before continuing.
+
+### Step 5 — Update VERSION.md
+
+After confirmation:
+
+1. Update `docs/engine-reference/<engine>/VERSION.md`:
+   - `Engine Version` → `[new-version]`
+   - `Project Pinned` → today's date
+   - `Last Docs Verified` → today's date
+   - Re-evaluate and update the `Risk Level` and `Post-Cutoff Version Timeline`
+     table if the new version falls beyond the LLM knowledge cutoff
+   - Add a `## Migration Notes — [old-version] → [new-version]` section
+     containing: migration guide URL, key breaking changes, deprecated APIs
+     found in this project, and recommended migration order from the audit
+
+2. If `breaking-changes.md` or `deprecated-apis.md` exist in the engine
+   reference directory, append the new version's changes to those files.
+
+### Step 6 — Post-Upgrade Reminder
+
+After updating VERSION.md, output:
+
+```
+VERSION.md updated: [engine] [old-version] → [new-version]
+
+Next steps:
+1. Migrate deprecated API usages in the [N] files listed above
+2. Run /setup-engine refresh after upgrading the actual engine binary to
+   verify no new deprecations were missed
+3. Run /architecture-review — the engine upgrade may invalidate ADRs that
+   reference specific APIs or engine capabilities
+4. If any ADRs are invalidated, run /propagate-design-change to update
+   downstream stories
+```
+
+---
+
+## 12. Output Summary

 After setup is complete, output:

--- a/.claude/skills/sprint-plan/SKILL.md
+++ b/.claude/skills/sprint-plan/SKILL.md
@@ -142,6 +142,14 @@ Initialize each story from the sprint plan's task tables:
 For `update`: read the existing `sprint-status.yaml`, carry over statuses for
 stories that haven't changed, add new stories, remove dropped ones.

+### Scope Reminder
+
+After presenting the sprint plan, add:
+
+> **Scope check:** If this sprint includes stories added beyond the original epic scope, run `/scope-check [epic]` to detect scope creep before implementation begins.
+
+When reviewing stories during selection (step 3 above), note any stories that appear outside the original epic goals. If any are uncertain, flag them inline: "Are these stories within the original epic scope? If unsure, `/scope-check` can verify."
+
 ### Agent Consultation

 For comprehensive sprint planning, consider consulting:
--- a/.claude/skills/story-done/SKILL.md
+++ b/.claude/skills/story-done/SKILL.md
@@ -95,6 +95,44 @@ options: "Yes — passes", "No — fails", "Not tested yet"
 - Criteria that require a full game build to test (end-to-end gameplay scenarios)
 - Mark as: `DEFERRED — requires playtest session`

+### Test-Criterion Traceability
+
+After completing the pass/fail/deferred check above, map each acceptance
+criterion to the test that covers it:
+
+For each acceptance criterion in the story:
+
+1. Ask: is there a test — unit, integration, or confirmed manual playtest — that
+   directly verifies this criterion?
+   - **Unit test**: check `tests/unit/` for a test file or function name that
+     matches the criterion's subject (use `Glob` and `Grep`)
+   - **Integration test**: check `tests/integration/` similarly
+   - **Manual confirmation**: if the criterion was verified via `AskUserQuestion`
+     above with a "Yes — passes" answer, count that as a manual test
+
+2. Produce a traceability table:
+
+```
+| Criterion | Test | Status |
+|-----------|------|--------|
+| AC-1: [criterion text] | tests/unit/test_foo.gd::test_bar | COVERED |
+| AC-2: [criterion text] | Manual playtest confirmation | COVERED |
+| AC-3: [criterion text] | — | UNTESTED |
+```
+
+3. Apply these escalation rules:
+
+   - If **>50% of criteria are UNTESTED**: escalate to **BLOCKING** — test
+     coverage is insufficient to confirm the story is actually done. The verdict
+     in Phase 6 cannot be COMPLETE until coverage improves.
+   - If **some (≤50%) criteria are UNTESTED**: remain ADVISORY — does not block
+     completion, but must appear in Completion Notes.
+   - If **all criteria are COVERED**: no action needed beyond including the
+     table in the report.
+
+4. For any ADVISORY untested criteria, add to the Completion Notes in Phase 7:
+   `"Untested criteria: [AC-N list]. Recommend adding tests in a follow-up story."`
+
 ---

 ## Phase 4: Check for Deviations
@@ -175,6 +213,13 @@ Before updating any files, present the full report:
 - [ ] [Criterion 3] — FAILS: [reason]
 - [?] [Criterion 4] — DEFERRED: requires playtest

+### Test-Criterion Traceability
+| Criterion | Test | Status |
+|-----------|------|--------|
+| AC-1: [text] | [test file::test name] | COVERED |
+| AC-2: [text] | Manual confirmation | COVERED |
+| AC-3: [text] | — | UNTESTED |
+
 ### Deviations
 [NONE] OR:
 - BLOCKING: [description] — [GDD/ADR reference]
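
Review note: the escalation rule in the new Test-Criterion Traceability section of story-done is a strict majority test, which matters at exactly 50%. A minimal sketch, assuming each criterion has already been labeled `COVERED` or `UNTESTED`; the function name and the `OK` label for the fully covered case are hypothetical, as is treating a story with no criteria as `OK`.

```python
def traceability_verdict(statuses: list[str]) -> str:
    """Apply the >50% UNTESTED escalation rule to a list of criterion statuses."""
    untested = sum(1 for status in statuses if status == "UNTESTED")
    if untested == 0:
        return "OK"  # all criteria covered (or none listed: an assumption)
    # Strictly more than half untested escalates; exactly half stays advisory.
    if untested * 2 > len(statuses):
        return "BLOCKING"
    return "ADVISORY"
```

Comparing `untested * 2` against the total keeps the check in integer arithmetic, so a story with two criteria and one untested lands on ADVISORY rather than depending on floating-point rounding.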
--- a/.claude/skills/story-readiness/SKILL.md
+++ b/.claude/skills/story-readiness/SKILL.md
@@ -144,6 +144,22 @@ items pass or are explicitly marked N/A with a stated reason.
  story that depends on a DRAFT or missing story is BLOCKED, not just
  NEEDS WORK.

+### Asset References Check
+
+- [ ] **Referenced assets exist**: Scan the story text for asset path patterns
+  (paths containing `assets/`, or file extensions `.png`, `.jpg`, `.svg`,
+  `.wav`, `.ogg`, `.mp3`, `.glb`, `.gltf`, `.tres`, `.tscn`, `.res`).
+  - For each asset path found: use Glob to check whether the file exists.
+  - If any referenced asset does not exist: **NEEDS WORK** — note the missing
+    path(s). (The story references assets that have not been created yet.
+    Either remove the reference, create a placeholder, or mark it as an
+    explicit dependency on an asset creation story.)
+  - If all referenced assets exist: note "Referenced assets verified:
+    [count] found."
+  - If no asset paths are referenced in the story: note "No asset references
+    found in story — skipping asset check." This item auto-passes.
+  - This is an existence-only check. Do not validate file format or content.
+
 ### Definition of Done

 - [ ] **At least 3 testable acceptance criteria**: Fewer than 3 suggests