mirror of https://github.com/Donchitos/Claude-Code-Game-Studios.git synced 2026-06-27 04:51:46 +00:00

Donchitos 984023ddac Release v1.0.0 — concept-prototype/vertical-slice split, workflow restructure, polish (#50 )

2026-05-13 20:15:08 +10:00

| name | description | argument-hint | user-invocable | allowed-tools | model |
|------|-------------|---------------|----------------|---------------|-------|
| test-flakiness | Detect non-deterministic (flaky) tests by reading CI run logs or test result history. Aggregates pass rates per test, identifies intermittent failures, recommends quarantine or fix, and maintains a flaky test registry. Best run during Polish phase or after multiple CI runs. | [ci-log-path \| scan \| registry] | true | Read, Glob, Grep, Write, Edit, Bash | sonnet |

Test Flakiness Detection

A flaky test is one that sometimes passes and sometimes fails without any code change. Flaky tests are worse than no tests in some ways — they train the team to ignore red CI runs, masking genuine failures. This skill identifies them, explains likely causes, and recommends whether to quarantine or fix each one.

Output: Updated tests/regression-suite.md quarantine section + optional production/qa/flakiness-report-[date].md

When to run:

Polish phase (tests have had many runs; statistical signal is reliable)
When developers start dismissing CI failures as "probably flaky"
After /regression-suite identifies quarantined tests that need diagnosis

1. Parse Arguments

Modes:

/test-flakiness [ci-log-path] — analyse a specific CI run log file
/test-flakiness scan — scan all available CI logs in .github/ or standard log output directories
/test-flakiness registry: read the quarantine section of the existing tests/regression-suite.md and provide remediation guidance for already-known flaky tests
No argument — auto-detect: run scan if CI logs are accessible, else registry
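
The mode selection above can be sketched as a small resolver. This is only an illustration: `resolve_mode` and the `ci_logs_available` flag are hypothetical names, and the skill itself performs this step through its prompt instructions rather than through code.

```python
from pathlib import Path


def resolve_mode(argument, ci_logs_available):
    """Map the skill argument to (mode, path). Hypothetical helper for illustration."""
    if argument is None or argument.strip() == "":
        # No argument: run scan if CI logs are accessible, else fall back to registry.
        return ("scan", None) if ci_logs_available else ("registry", None)
    arg = argument.strip()
    if arg in ("scan", "registry"):
        return (arg, None)
    # Anything else is treated as a path to a specific CI run log file.
    return ("log", Path(arg))


print(resolve_mode(None, ci_logs_available=False))
print(resolve_mode("test-results/run1.xml", ci_logs_available=True))
```

Treating every unrecognised argument as a path keeps the resolver simple; a stricter version might check that the file exists before continuing.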

2. Locate CI Log Data

Option A — GitHub Actions (preferred)

Check for test result artifacts:

ls -t .github/ 2>/dev/null
ls -t test-results/ 2>/dev/null

For Godot projects: GdUnit4 outputs XML results compatible with JUnit format. Check test-results/ for .xml files.

For Unity projects: game-ci test runner outputs NUnit XML to test-results/ by default.

For Unreal projects: automation logs go to Saved/Logs/. Grep for Result: Success and Result: Fail patterns.

Option B — Local log files

If a path argument is provided, read that file directly.

Option C — No log data available

If no logs found:

"No CI log data found. To detect flaky tests, this skill needs test result history from multiple runs. Options:

Run the test suite at least 3 times and collect the output logs

Check CI pipeline output and save a log to test-results/

Run /test-flakiness registry to review tests already flagged as flaky in tests/regression-suite.md"

Stop and ask the user which option to pursue.

3. Parse Test Results

For each CI log or result file found, parse:

JUnit XML format (GdUnit4 / Unity):

Grep for <testcase name= to get test names
Grep for <failure or <error to identify failures
Parse classname and name attributes for full test identifiers
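
Where Grep alone is too coarse, the same extraction can be approximated with Python's standard `xml.etree.ElementTree` module. The sketch below assumes a conventional JUnit-style layout (`<testcase>` elements with optional `<failure>`, `<error>`, or `<skipped>` children); the sample XML and the `parse_junit_xml` name are made up for illustration, and real GdUnit4 or NUnit output may differ in detail.

```python
import xml.etree.ElementTree as ET


def parse_junit_xml(xml_text):
    """Return {test_id: 'PASS' | 'FAIL'} for one JUnit-style result file."""
    root = ET.fromstring(xml_text)
    results = {}
    # iter() finds <testcase> at any depth, so <testsuites> and <testsuite> roots both work.
    for case in root.iter("testcase"):
        if case.find("skipped") is not None:
            continue  # skipped tests carry no pass/fail signal
        test_id = f"{case.get('classname', '')}.{case.get('name', '')}".lstrip(".")
        failed = case.find("failure") is not None or case.find("error") is not None
        results[test_id] = "FAIL" if failed else "PASS"
    return results


# Illustrative sample, not real project output.
SAMPLE = """<testsuite name="combat">
  <testcase classname="CombatTest" name="test_damage"/>
  <testcase classname="CombatTest" name="test_crit">
    <failure message="expected 20 but was 10"/>
  </testcase>
  <testcase classname="CombatTest" name="test_wip"><skipped/></testcase>
</testsuite>"""

print(parse_junit_xml(SAMPLE))
```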

Plain text logs:

Grep for pass/fail patterns:
- Godot: PASSED / FAILED adjacent to test names
- Unreal: Result: Success / Result: Fail
- Unity: Test passed / Test failed

Build a table: test_id → [run1_result, run2_result, run3_result, ...]
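
A minimal sketch of that table, assuming each run has already been reduced to a `{test_id: result}` dictionary. `build_history` is a hypothetical helper. Recording a test that is absent from a run as `None` instead of a failure is a design choice of this sketch, not something the skill prescribes.

```python
def build_history(runs):
    """runs: list of {test_id: 'PASS' | 'FAIL'} dicts, oldest run first.

    Returns {test_id: [result_run1, result_run2, ...]}.
    """
    all_ids = sorted({test_id for run in runs for test_id in run})
    history = {test_id: [] for test_id in all_ids}
    for run in runs:
        for test_id in all_ids:
            # A test absent from a run is recorded as None, not as a failure.
            history[test_id].append(run.get(test_id))
    return history


runs = [
    {"CombatTest.test_crit": "PASS", "UiTest.test_menu": "PASS"},
    {"CombatTest.test_crit": "FAIL"},
    {"CombatTest.test_crit": "PASS", "UiTest.test_menu": "PASS"},
]
for test_id, results in build_history(runs).items():
    print(test_id, results)
```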

4. Identify Flaky Tests

A test is flaky if it appears in the result history with both PASS and FAIL outcomes across runs with no code changes between them.

Flakiness thresholds:

High flakiness: Fails in >25% of runs; quarantine immediately
Moderate flakiness: Fails in 5–25% of runs; investigate and fix soon
Low/suspected flakiness: Fails in 1–5% of runs; monitor, since this may be a genuinely rare failure and not flakiness
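
The thresholds can be applied mechanically once a result history exists. The sketch below uses hypothetical names (`fail_rate`, `classify`) and makes two assumptions the text does not settle: a rate of exactly 5% counts as moderate, and any mixed history below 5% (including below 1%) counts as low.

```python
def fail_rate(results):
    """Fraction of recorded runs that failed; None entries (test absent) are ignored."""
    recorded = [r for r in results if r is not None]
    if not recorded:
        return 0.0
    return recorded.count("FAIL") / len(recorded)


def classify(results):
    """Return 'consistent', 'low', 'moderate', or 'high' for one test's history."""
    recorded = [r for r in results if r is not None]
    # Flaky means both outcomes appear; an always-failing test is broken, not flaky.
    if "PASS" not in recorded or "FAIL" not in recorded:
        return "consistent"
    rate = fail_rate(results)
    if rate > 0.25:
        return "high"      # quarantine immediately
    if rate >= 0.05:
        return "moderate"  # investigate and fix soon (boundary choice is an assumption)
    return "low"           # monitor


print(classify(["PASS", "FAIL", "PASS", "FAIL"]))
print(classify(["PASS"] * 9 + ["FAIL"]))
print(classify(["FAIL", "FAIL", "FAIL"]))
```

The history alone cannot confirm that no code changed between runs, so that condition still has to be checked separately, for example by comparing commit hashes per run.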

For each flaky test, classify the likely cause:

Cause classification

| Cause | Symptoms | Fix direction |
|-------|----------|---------------|
| Timing / async | Fails after awaiting signals or timers; pass rate correlates with system load | Add explicit await/synchronisation; avoid time-based delays |
| Order dependency | Fails when run after specific other tests; passes in isolation | Add proper setup/teardown; ensure test isolation |
| Random seed | Fails intermittently with no pattern; involves RNG | Pass explicit seed; don't use `randf()` in tests |
| Resource leak | Fails more often later in a test run | Fix cleanup in teardown; check orphan nodes (Godot) or object disposal (Unity) |
| External state | Fails when a file, scene, or global exists from a prior test | Isolate test from file system; use in-memory mocks |
| Floating point | Fails on comparisons like `== 0.5` | Use epsilon comparison (`is_equal_approx`, `Assert.AreApproximatelyEqual`) |
| Scene/prefab load race | Fails when scenes are not yet ready | Await one frame after instantiation; use `await get_tree().process_frame` |

Use Grep to check the test file for timing calls, randf, global state access, or equality comparisons on floats to narrow down the cause.
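
As a rough sketch of that narrowing step, a few regular expressions can flag which causes are worth a closer look. The patterns and the `suggest_causes` name are illustrative assumptions, not an exhaustive or authoritative list. A match is only a hint and still needs a human or agent to read the test.

```python
import re

# Hypothetical patterns keyed by cause names from the classification table.
CAUSE_PATTERNS = {
    "Timing / async": re.compile(r"create_timer|await\s|Thread\.Sleep|WaitForSeconds"),
    "Random seed": re.compile(r"\brand[fi]?(_range)?\(|Random\.Range|randomize\("),
    "Floating point": re.compile(r"==\s*-?\d+\.\d+"),
    "External state": re.compile(r"FileAccess|File\.|user://"),
}


def suggest_causes(source):
    """Return the cause names whose pattern appears in the test source text."""
    return [cause for cause, pattern in CAUSE_PATTERNS.items() if pattern.search(source)]


# Illustrative GDScript-like test body.
test_source = (
    "func test_speed():\n"
    "\tvar v = randf() * 2.0\n"
    "\tassert_bool(v == 0.5).is_true()\n"
)
print(suggest_causes(test_source))
```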

5. Recommend Action

For each flaky test:

Quarantine (High flakiness):

"Quarantine this test immediately. Disable it in CI by adding @pytest.mark.skip / [Ignore] / GdUnitSkip annotation. Log it in tests/regression-suite.md quarantine section. The test is now opt-in only. Fix the root cause before removing quarantine."

Investigate and fix soon (Moderate):

"This test is intermittently unreliable. Root cause appears to be [cause]. Suggested fix: [specific fix based on cause classification]. Do not quarantine yet — fix the test directly."

Monitor (Low/suspected):

"This test shows suspected flakiness. Collect more run data before quarantining. Note it as 'suspected' in the regression suite."

6. Generate Reports

In-conversation summary

## Flakiness Detection Results

**Runs analysed**: [N]
**Tests tracked**: [N]

### Flaky Tests Found

| Test | System | Fail Rate | Likely Cause | Recommendation |
|------|--------|-----------|--------------|----------------|
| [test_name] | [system] | [N]% | Timing | Quarantine + fix async |
| [test_name] | [system] | [N]% | Float comparison | Fix: use epsilon compare |
| [test_name] | [system] | [N]% | Order dependency | Investigate teardown |

### Clean Tests (no flakiness detected)

[N] tests ran across [N] runs with consistent results — no flakiness detected.

### Data Limitations

[Note if fewer than 5 runs were available — fewer runs = less statistical confidence]

7. Update Regression Suite + Optional Report File

Ask: "May I update the quarantine section of tests/regression-suite.md with the flaky tests found?"

If yes: use Edit to append entries to the Quarantined Tests table. Never remove existing quarantine entries — only add new ones.
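
The append-only rule can be illustrated with a short sketch. The three-column row layout and the `append_quarantine_rows` helper are assumptions made for this example. The actual Quarantined Tests table in tests/regression-suite.md may use different columns, and the skill performs the change with the Edit tool, not with a script.

```python
def append_quarantine_rows(existing_rows, new_entries):
    """existing_rows: markdown table rows already in the quarantine section.
    new_entries: list of (test_id, fail_rate_percent, cause) tuples.

    Returns the rows with new tests appended; existing rows are never removed or edited.
    """
    # First cell of each existing row (header and separator cells are harmless extras).
    known = {row.split("|")[1].strip() for row in existing_rows if row.startswith("|")}
    merged = list(existing_rows)
    for test_id, rate, cause in new_entries:
        if test_id not in known:
            merged.append(f"| {test_id} | {rate}% | {cause} |")
            known.add(test_id)
    return merged


existing = [
    "| Test | Fail Rate | Cause |",
    "|------|-----------|-------|",
    "| CombatTest.test_crit | 30% | Timing |",
]
found = [
    ("CombatTest.test_crit", 40, "Timing"),        # already listed: left untouched
    ("UiTest.test_menu", 10, "Order dependency"),  # new: appended
]
print("\n".join(append_quarantine_rows(existing, found)))
```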

Ask (separately): "May I write a full flakiness report to production/qa/flakiness-report-[date].md?"

The full report includes per-test analysis with cause details and engine-specific fix snippets.

After writing:

For each quarantined test: "Add the engine-specific skip annotation to disable this test in CI. Re-enable after the root cause is fixed."
For fix-eligible tests: "The fix for [test] is straightforward — change the equality comparison on line [N] to use is_equal_approx."
Summary: "Once all quarantine annotations are applied, CI should run green. Schedule fix work for the [N] quarantined tests before the release gate."

Collaborative Protocol

Never delete test files — quarantine means annotate + list, not remove
Statistical confidence matters — with < 3 runs, flag findings as "suspected" not "confirmed"; ask if more run data is available
Fix is always the goal — quarantine is temporary; surface the fix direction even when recommending quarantine
Ask before writing — both the regression-suite update and the report file require explicit approval. On write: Verdict: COMPLETE — flakiness report written. On decline: Verdict: BLOCKED — user declined write.
Flakiness in CI is a team problem — surface the list and recommended actions clearly; do not just silently quarantine without the team knowing
