467 changes: 467 additions & 0 deletions docs/superpowers/plans/2026-06-09-producer-trace-validation.md

370 changes: 370 additions & 0 deletions docs/superpowers/plans/2026-06-09-unified-trace-viewer.md
@@ -0,0 +1,370 @@
# Unified Trace Viewer Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Merge the current online producer trace viewer and offline hotspot viewer into one unified viewer that defaults to all tasks, shows overview + stage stats + task list + task detail, and uses one shared payload model for live and offline modes.

**Architecture:** Move viewer analysis toward a single shared payload builder in `producer_trace_analysis.py`, then make both `producer_trace_viewer.py` and `producer_trace_hotspots.py` render the same page model. Keep live/offline differences in data loading only. Preserve `latest batch` as a client-visible filter option, but make `all tasks` the default semantic view.
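
A minimal sketch of the "differences in data loading only" idea, under stated assumptions: the function names `iter_events_from_shards`, `build_payload`, `offline_payload`, and `live_payload` are hypothetical, events are treated as plain dicts with a `trace_id` key, and shards are assumed to be `*.jsonl` files in one directory. The real `TraceEvent` schema and builder signature may differ.

```python
import json
from pathlib import Path
from typing import Iterable, Iterator


def iter_events_from_shards(trace_dir: Path) -> Iterator[dict]:
    """Offline loader (hypothetical): read every JSONL shard under trace_dir once."""
    for shard in sorted(trace_dir.glob("*.jsonl")):
        with shard.open(encoding="utf-8") as fh:
            for line in fh:
                if line.strip():
                    yield json.loads(line)


def build_payload(events: Iterable[dict], trace_source: str) -> dict:
    """Shared builder: the only place that turns events into a page model."""
    task_ids = {event["trace_id"] for event in events}
    return {
        "trace_source": trace_source,
        "default_scope": "all",
        "views": {"all": {"overview": {"total_tasks": len(task_ids)}}},
    }


def offline_payload(trace_dir: Path) -> dict:
    # Offline mode: load a static snapshot from disk.
    return build_payload(iter_events_from_shards(trace_dir), str(trace_dir))


def live_payload(buffered_events: list[dict], trace_source: str) -> dict:
    # Live mode: the server passes whatever events it has collected so far.
    return build_payload(buffered_events, trace_source)
```

Because both entrypoints call the same builder, the same events should produce the same page model in either mode.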

**Tech Stack:** Python dataclasses, existing `TraceEvent` JSONL shards, current producer trace analysis helpers, static HTML + inline JavaScript, `unittest`.

---

## File Structure

- `xtuner/tools/producer_trace_analysis.py`
  - Shared analysis layer for unified task rows, stage stats, per-task chart rows, and per-scope payloads.
- `xtuner/tools/producer_trace_viewer.py`
  - Live viewer server + offline snapshot entrypoint for the unified page.
- `xtuner/tools/producer_trace_hotspots.py`
  - Becomes a thin compatibility wrapper around the unified offline page builder.
- `xtuner/v1/rl/trace.py`
  - Change the `TraceConfig.viewer_scope` default to `"all"` so the unified viewer opens on all tasks (see Task 3).
- `tests/rl/test_trace.py`
  - Unified viewer payload tests, default scope tests, and compatibility tests.
- `docs/superpowers/specs/2026-06-09-trace-next-phase-working-notes.md`
  - Keep requirement decisions synchronized as implementation proceeds.

## Task 1: Add Unified Analysis Payload

**Files:**
- Modify: `xtuner/tools/producer_trace_analysis.py`
- Test: `tests/rl/test_trace.py`

- [ ] **Step 1: Add the failing tests for unified summary semantics**

Add tests that assert:

- overview uses all tasks by default
- stage summary exposes:
  - `running_tasks`
  - `visited_tasks`
  - `avg_s`
  - `p95_s`
  - `max_s`
- task detail data includes both text timeline events and graphical spans
- failed tasks are counted separately

Sketch:

```python
def test_unified_view_payload_reports_overview_stage_stats_and_task_detail(self):
    # `events` is a fixture list of trace events covering three tasks, one of them failed.
    payload = build_unified_trace_payload_from_events(events, trace_source="/tmp/trace")
    self.assertEqual(payload["default_scope"], "all")
    self.assertEqual(payload["views"]["all"]["overview"]["total_tasks"], 3)
    self.assertEqual(payload["views"]["all"]["overview"]["failed_tasks"], 1)
    stage = payload["views"]["all"]["stage_stats"][0]
    self.assertIn("running_tasks", stage)
    self.assertIn("visited_tasks", stage)
    detail = payload["views"]["all"]["task_details"]["gsm8k:1"]
    self.assertTrue(detail["timeline_events"])
    self.assertTrue(detail["timeline_spans"])
```
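
One way the stage summary fields listed above could be computed is sketched below. This is an assumption-laden illustration, not the planned implementation: `StageStats`, `nearest_rank_percentile`, and `summarize_stage` are hypothetical names, `visited_tasks` is assumed to mean "tasks that have entered the stage, finished or not", each finished task is assumed to contribute one duration, and `p95_s` uses the nearest-rank percentile definition.

```python
import math
from dataclasses import asdict, dataclass


@dataclass
class StageStats:
    stage: str
    running_tasks: int  # tasks currently inside this stage
    visited_tasks: int  # tasks that have entered this stage at least once (assumed meaning)
    avg_s: float
    p95_s: float
    max_s: float


def nearest_rank_percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]


def summarize_stage(stage: str, closed_durations_s: list[float], running_tasks: int) -> dict:
    """Build one stage_stats entry from finished span durations plus a running count."""
    if not closed_durations_s:
        # Nothing has finished this stage yet, so the timing fields are reported as zero.
        return asdict(StageStats(stage, running_tasks, running_tasks, 0.0, 0.0, 0.0))
    return asdict(
        StageStats(
            stage=stage,
            running_tasks=running_tasks,
            visited_tasks=len(closed_durations_s) + running_tasks,
            avg_s=sum(closed_durations_s) / len(closed_durations_s),
            p95_s=nearest_rank_percentile(closed_durations_s, 95),
            max_s=max(closed_durations_s),
        )
    )
```

Whether still-running spans should contribute their elapsed time to `avg_s` and `max_s` is a design decision this sketch leaves open; here only closed spans are timed.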

- [ ] **Step 2: Run the targeted test to confirm it fails**

Run:

```bash
python -m unittest tests.rl.test_trace.TraceStoreAndViewerTest.test_unified_view_payload_reports_overview_stage_stats_and_task_detail
```

Expected:

- FAIL because `build_unified_trace_payload_from_events` or equivalent fields do not exist yet.

- [ ] **Step 3: Add shared dataclasses / payload builders in `producer_trace_analysis.py`**

Implement shared analysis primitives instead of keeping viewer/hotspot summaries separate:

- enhanced task row
- per-stage stats
- per-task detail payload
- scope-aware top-level payload

The new payload should conceptually look like:

```python
{
    "default_scope": "all",
    "available_scopes": ["all", "latest-produce-batch"],
    "views": {
        "all": {
            "overview": {...},
            "stage_stats": [...],
            "task_rows": [...],
            "task_details": {...},
        },
        "latest-produce-batch": {...},
    },
}
```
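
The scope-aware shape above could be assembled roughly as follows. This is a sketch under assumptions: `build_view` and `build_scoped_payload` are hypothetical helpers, and each task row is assumed to carry a `batch_id` field that identifies its produce batch, which the plan does not specify.

```python
def build_view(rows: list[dict]) -> dict:
    """Build one per-scope view; stage stats and task details are omitted in this sketch."""
    return {
        "overview": {
            "total_tasks": len(rows),
            "failed_tasks": sum(1 for row in rows if row["status"] == "failed"),
        },
        "stage_stats": [],
        "task_rows": rows,
        "task_details": {},
    }


def build_scoped_payload(rows: list[dict]) -> dict:
    """Build the top-level payload with one view per scope."""
    # Assumption: the latest produce batch is the one with the highest batch_id.
    latest_batch = max((row["batch_id"] for row in rows), default=None)
    scoped_rows = {
        "all": rows,
        "latest-produce-batch": [row for row in rows if row["batch_id"] == latest_batch],
    }
    return {
        "default_scope": "all",
        "available_scopes": list(scoped_rows),
        "views": {scope: build_view(scope_rows) for scope, scope_rows in scoped_rows.items()},
    }
```

Precomputing every scope in one payload lets the page switch scopes client-side without another request, at the cost of a larger payload.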

- [ ] **Step 4: Re-run the targeted unified payload test**

Run:

```bash
python -m unittest tests.rl.test_trace.TraceStoreAndViewerTest.test_unified_view_payload_reports_overview_stage_stats_and_task_detail
```

Expected:

- PASS

## Task 2: Replace Separate Viewer/Hotspot Pages With One Unified Page

**Files:**
- Modify: `xtuner/tools/producer_trace_viewer.py`
- Modify: `xtuner/tools/producer_trace_hotspots.py`
- Test: `tests/rl/test_trace.py`

- [ ] **Step 1: Add failing tests for unified page structure**

Add assertions that rendered HTML contains the new sections and no longer contains removed sections:

```python
def test_unified_viewer_html_contains_new_sections(self):
    # `payload` is a unified payload fixture built as in Task 1.
    html = render_unified_trace_html(payload, live=False)
    self.assertIn("Total tasks", html)
    self.assertIn("Failed", html)
    self.assertIn("Stage", html)
    self.assertIn("Task Timeline", html)
    self.assertNotIn("Suspect Open Spans", html)
    self.assertNotIn("Latest Stage Distribution", html)
```

- [ ] **Step 2: Run the targeted HTML test to confirm it fails**

Run:

```bash
python -m unittest tests.rl.test_trace.TraceStoreAndViewerTest.test_unified_viewer_html_contains_new_sections
```

Expected:

- FAIL because the old HTML still renders the old layout.

- [ ] **Step 3: Implement the unified HTML / JS page in `producer_trace_viewer.py`**

Refactor page structure to:

- header
- overview cards
- scope toggle
- stage summary table
- task list with filters
- task detail:
  - text timeline
  - chart timeline below

The JS should:

- switch between `all` and `latest-produce-batch`
- filter task rows by:
  - state
  - current stage
  - search text
- render task detail for the selected row
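
The row-filter semantics the inline JavaScript should implement can be pinned down with a small Python reference, which is also convenient for unit tests. This is a sketch, not the planned code: `filter_task_rows` is a hypothetical name, `current_stage` is an assumed row key, and matching search text against `trace_id` only (case-insensitively) is an assumption.

```python
from typing import Optional


def filter_task_rows(
    rows: list[dict],
    state: Optional[str] = None,
    stage: Optional[str] = None,
    search: str = "",
) -> list[dict]:
    """Return the rows that match every active filter; inactive filters match everything."""
    needle = search.strip().lower()
    return [
        row
        for row in rows
        if (state is None or row["status"] == state)
        and (stage is None or row["current_stage"] == stage)
        and (not needle or needle in row["trace_id"].lower())
    ]
```

Filters combine with AND, so clearing a filter can only widen the visible task list.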

- [ ] **Step 4: Convert `producer_trace_hotspots.py` into a compatibility entrypoint**

Make the offline hotspots script reuse the unified offline page builder instead of maintaining a separate page model.

Compatibility behavior:

- existing CLI entry still works
- output HTML is the unified viewer page
- offline mode loads static payload only
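
A thin compatibility entrypoint might look like the sketch below. Everything named here is an assumption for illustration: `build_unified_offline_page` stands in for the real unified offline page builder, and the positional `trace_dir` argument and `--output` option are hypothetical, since the plan does not list the existing CLI flags.

```python
import argparse
from pathlib import Path
from typing import Optional, Sequence


def build_unified_offline_page(trace_dir: Path) -> str:
    """Hypothetical stand-in for the unified offline page builder."""
    return f"<html><body><h1>Unified Trace Viewer</h1><p>{trace_dir}</p></body></html>"


def main(argv: Optional[Sequence[str]] = None) -> int:
    parser = argparse.ArgumentParser(
        description="Compatibility entrypoint; renders the unified viewer page."
    )
    parser.add_argument("trace_dir", type=Path, help="directory holding trace shards")
    parser.add_argument("--output", type=Path, default=Path("trace_hotspots.html"))
    args = parser.parse_args(argv)

    # No page model of its own: delegate to the unified builder and write the result.
    args.output.write_text(build_unified_offline_page(args.trace_dir), encoding="utf-8")
    return 0
```

Keeping the wrapper free of rendering logic means the old command keeps working while only one page model has to be maintained.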

- [ ] **Step 5: Re-run the targeted HTML test**

Run:

```bash
python -m unittest tests.rl.test_trace.TraceStoreAndViewerTest.test_unified_viewer_html_contains_new_sections
```

Expected:

- PASS

## Task 3: Flip Default Viewer Semantics to All Tasks

**Files:**
- Modify: `xtuner/v1/rl/trace.py`
- Modify: `xtuner/tools/producer_trace_viewer.py`
- Modify: `xtuner/tools/producer_trace_hotspots.py`
- Test: `tests/rl/test_trace.py`

- [ ] **Step 1: Add failing tests for default scope**

Add tests that assert:

- `TraceConfig.viewer_scope` defaults to `"all"`
- CLI default scope for unified viewer is `"all"`
- live payload chooses `all` as `default_scope`

Sketch:

```python
def test_trace_config_defaults_viewer_scope_to_all(self):
    self.assertEqual(TraceConfig().viewer_scope, "all")
```

- [ ] **Step 2: Run the targeted default-scope tests**

Run:

```bash
python -m unittest tests.rl.test_trace.TraceStoreAndViewerTest.test_trace_config_defaults_viewer_scope_to_all
```

Expected:

- FAIL because current default is `latest-produce-batch`.

- [ ] **Step 3: Change default viewer scope to `all`**

Update:

- `TraceConfig.viewer_scope`
- CLI defaults for unified viewer/offline page entrypoints
- any tests or assumptions that still rely on `latest-produce-batch` as the default

- [ ] **Step 4: Keep `latest-produce-batch` as an optional filter**

Do not remove the capability. Keep it available in:

- payload `available_scopes`
- UI scope selector
- offline CLI option
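
On the CLI side, keeping both scopes selectable while defaulting to `all` can be expressed with `argparse` choices. The `--scope` flag name, the `SCOPES` constant, and `build_parser` are assumptions for illustration; the actual option name in the viewer scripts may differ.

```python
import argparse

# Both scopes stay valid; only the default changes.
SCOPES = ("all", "latest-produce-batch")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Unified trace viewer (sketch)")
    parser.add_argument(
        "--scope",
        choices=SCOPES,
        default="all",
        help="which tasks the page shows first (default: %(default)s)",
    )
    return parser
```

Using `choices` makes an unsupported scope fail at argument parsing instead of producing an empty page later.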

- [ ] **Step 5: Re-run the default-scope tests**

Run:

```bash
python -m unittest tests.rl.test_trace.TraceStoreAndViewerTest.test_trace_config_defaults_viewer_scope_to_all
```

Expected:

- PASS

## Task 4: Add Viewer Tests for Failed Tasks and Task Detail Behavior

**Files:**
- Modify: `tests/rl/test_trace.py`

- [ ] **Step 1: Add failing tests for failed-task accounting**

Add tests that assert:

- `failed_tasks` is counted in overview
- failed tasks appear in `task_rows`
- `error_msg` appears in task detail only

Sketch:

```python
def test_unified_viewer_counts_failed_tasks_and_keeps_error_msg_in_task_detail(self):
    # `events` is a fixture in which task "gsm8k:9" fails with the message asserted below.
    # Requires `import json` at the top of the test module.
    payload = build_unified_trace_payload_from_events(events, trace_source="/tmp/trace")
    overview = payload["views"]["all"]["overview"]
    self.assertEqual(overview["failed_tasks"], 1)
    row = next(row for row in payload["views"]["all"]["task_rows"] if row["trace_id"] == "gsm8k:9")
    self.assertEqual(row["status"], "failed")
    detail = payload["views"]["all"]["task_details"]["gsm8k:9"]
    self.assertIn("trace smoke judger failure", json.dumps(detail, ensure_ascii=False))
    self.assertNotIn("error_msg", row)
```

- [ ] **Step 2: Run the targeted failed-task test**

Run:

```bash
python -m unittest tests.rl.test_trace.TraceStoreAndViewerTest.test_unified_viewer_counts_failed_tasks_and_keeps_error_msg_in_task_detail
```

Expected:

- FAIL until failed-task handling and detail structure are correct.

- [ ] **Step 3: Finish the analysis/payload wiring for failed tasks**

Make sure:

- overview counts failed tasks
- task rows expose status and current stage
- task detail contains full event records including `error_msg`
- task rows do not duplicate the full `error_msg`
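
The row/detail split described above can be sketched as follows. This is illustrative only: `split_row_and_detail` is a hypothetical helper, events are treated as dicts with assumed `stage` and optional `error_msg` keys, a task is assumed to count as failed when any of its events carries an `error_msg`, and the non-failed status label `"ok"` is a placeholder.

```python
def split_row_and_detail(trace_id: str, events: list[dict]) -> tuple[dict, dict]:
    """Build a compact list row and a full detail record for one task."""
    failed = any(event.get("error_msg") for event in events)
    row = {
        "trace_id": trace_id,
        "status": "failed" if failed else "ok",
        "current_stage": events[-1]["stage"] if events else None,
        # Deliberately no error_msg here: the list stays small and scannable.
    }
    detail = {
        "trace_id": trace_id,
        # Full event records, including any error_msg, live only in the detail payload.
        "timeline_events": [dict(event) for event in events],
    }
    return row, detail
```

Keeping long error text out of `task_rows` bounds the size of the list payload, while the detail view still shows the full failure message.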

- [ ] **Step 4: Re-run the targeted failed-task test**

Run:

```bash
python -m unittest tests.rl.test_trace.TraceStoreAndViewerTest.test_unified_viewer_counts_failed_tasks_and_keeps_error_msg_in_task_detail
```

Expected:

- PASS

## Task 5: Full Verification

**Files:**
- Verify touched files only

- [ ] **Step 1: Run unified trace tests**

Run:

```bash
python -m unittest discover -s tests/rl -p test_trace.py
```

Expected:

- PASS

- [ ] **Step 2: Run compile checks**

Run:

```bash
python -m compileall -q xtuner/tools/producer_trace_analysis.py xtuner/tools/producer_trace_viewer.py xtuner/tools/producer_trace_hotspots.py xtuner/v1/rl/trace.py tests/rl/test_trace.py
```

Expected:

- PASS (exit status 0 and no output, since `-q` prints only errors)

- [ ] **Step 3: Run diff sanity**

Run:

```bash
git diff --check
```

Expected:

- no output, meaning no whitespace errors or leftover merge-conflict markers

- [ ] **Step 4: Optional live smoke after unit verification**

Run:

```bash
bash -x examples/v1/scripts/run_rl.sh examples/v1/config/testing/rl_trace_smoke_enabled.py lmdeploy "$MODEL_PATH" "$DATA_PATH" "$EVAL_DATA_PATH"
```

Expected:

- unified live viewer starts
- page defaults to all tasks
- scope selector can switch to latest batch
