Add the triage skills and strategies experiments by ralphbean · Pull Request #12 · fullsend-ai/experiments

ralphbean · 2026-04-29T18:03:19Z

These came from fullsend-ai/fullsend#170 and were used to form the basis of our real triage agent from fullsend-ai/fullsend#279.

waynesun09

Reviewed with a 10-agent review squad. Posting the top 5 most actionable findings inline: 2 are script bugs (one crashes on every run, one fails on macOS), 2 are data integrity issues affecting experiment results, and 1 is a JSON extraction bug that mishandles nested objects.

The $SCENARIO_NAME_ unbound variable (github-adapter.sh:76) and the grep -oP portability issue (github-adapter.sh:80) are the quickest wins. The data integrity findings in the README and judge.sh are worth addressing before drawing conclusions from the experiment results.

waynesun09 · 2026-05-19T22:41:14Z

+
+---
+_This issue was created by the triage-skill-comparison experiment._
+_Strategy: $STRATEGY_NAME | Scenario: $SCENARIO_NAME_" \


Bug — Unbound variable crash on every run

$SCENARIO_NAME_ (with a trailing underscore) is parsed by bash as a single variable name, because _ is a valid identifier character. That variable is never set, so under set -euo pipefail (line 3) this line aborts every run with SCENARIO_NAME_: unbound variable.

Suggested change

_Strategy: $STRATEGY_NAME | Scenario: $SCENARIO_NAME_" \

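_Strategy: $STRATEGY_NAME | Scenario: ${SCENARIO_NAME}_" \

Use ${SCENARIO_NAME}_ to explicitly delimit the variable name from the literal trailing underscore. Keep the closing quote and the trailing backslash so the multi-line --body string and the command's line continuation stay intact.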
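A minimal demonstration of the difference. The two values are examples only (borrowed from the result paths mentioned later in this review), and footer and unbraced are hypothetical variable names; the subshell lets the failing case run without aborting the demo.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Example values only; the real script sets these from its inputs.
STRATEGY_NAME="structured-triage"
SCENARIO_NAME="crash-on-save"

# Braces end the variable name before the literal trailing underscore.
footer="_Strategy: $STRATEGY_NAME | Scenario: ${SCENARIO_NAME}_"
echo "$footer"

# Without braces bash looks up SCENARIO_NAME_, which is unset. Under set -u
# that aborts the shell, so run it in a subshell to keep this demo alive.
if ( unbraced="Scenario: $SCENARIO_NAME_"; echo "$unbraced" ) 2>/dev/null; then
  echo "unexpected: unbraced expansion succeeded"
else
  echo "unbraced form fails under set -u"
fi
```

The braced form prints _Strategy: structured-triage | Scenario: crash-on-save_, while the unbraced form takes the else branch.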

waynesun09 · 2026-05-19T22:41:16Z

+  --label "$LABEL_TRIAGE" \
+  2>/dev/null)"
+
+ISSUE_NUMBER="$(echo "$ISSUE_URL" | grep -oP '\d+$')"


Bug — grep -oP is GNU-only, fails on macOS

grep -P (Perl regex) is not available on macOS's default BSD grep. This will fail with grep: invalid option -- P on any macOS contributor's machine.

Suggested change

ISSUE_NUMBER="$(echo "$ISSUE_URL" | grep -oP '\d+$')"

ISSUE_NUMBER="$(echo "$ISSUE_URL" | grep -oE '[0-9]+$')"

grep -oE with POSIX extended regex achieves the same result and works on both GNU and BSD grep.
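A sketch of the portable form, plus a grep-free alternative using bash parameter expansion. The URL and issue number are hypothetical, and the sketch assumes the command returns a plain https://github.com/<owner>/<repo>/issues/<n> URL. Because the original discards stderr with 2>/dev/null, it also seems worth rejecting an empty or non-numeric result explicitly.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical issue URL of the kind GitHub returns for a new issue.
ISSUE_URL="https://github.com/fullsend-ai/experiments/issues/123"

# POSIX extended regex: works with both GNU grep and macOS/BSD grep.
# With set -e and pipefail, a non-matching grep aborts the script here.
ISSUE_NUMBER="$(printf '%s\n' "$ISSUE_URL" | grep -oE '[0-9]+$')"

# Alternative without grep: drop everything up to and including the last "/".
ISSUE_NUMBER_PE="${ISSUE_URL##*/}"

# Fail loudly on an empty or non-numeric result instead of continuing.
case "$ISSUE_NUMBER_PE" in
  '' | *[!0-9]*)
    echo "could not parse an issue number from: $ISSUE_URL" >&2
    exit 1
    ;;
esac

echo "$ISSUE_NUMBER $ISSUE_NUMBER_PE"
```

Both forms yield 123 here; the parameter-expansion version avoids a subprocess but needs the explicit numeric check, since it returns whatever follows the last slash.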

waynesun09 · 2026-05-19T22:41:17Z

+| Rank | Strategy | Mean score | Reliability |
+|------|----------|-----------|-------------|
+| 1 (tie) | omo-prometheus | 4.38 | 98% |
+| 1 (tie) | omc-deep-interview | 4.38 | 97% |


Data integrity — Results table is incomplete and reliability numbers don't match trial data

Two issues with this rankings table:

Incomplete results presented as final rankings: the slow-search and wrong-search-results scenarios have no results at all, and silent-data-corruption has results for only 2 of the 5 strategies. These rankings are drawn from partial data and may change significantly once all scenarios are run.

Reliability percentages contradict trial data: The table shows values such as 98% and 97%, but every trial in the result files shows parse_failures: 0, which implies either 100% reliability or an undocumented calculation method.

Consider either marking this table as preliminary/partial, or holding it until all scenario × strategy combinations have results.
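One way to decide whether the table is ready is to count trials per scenario × strategy cell before publishing it. A minimal sketch, assuming results are laid out as <scenario>/<strategy>/trial-<n>/judge-assessment.json (inferred from the crash-on-save/structured-triage/trial-8/judge-assessment.json path cited in the next comment); check_matrix and its "--" argument convention are hypothetical.

```bash
#!/usr/bin/env bash
set -euo pipefail

# check_matrix ROOT SCENARIO... -- STRATEGY...   (hypothetical helper)
# Prints the trial count for every scenario x strategy cell and returns
# non-zero if any cell has no judge assessments at all.
check_matrix() {
  local root="$1"
  shift
  local scenarios=() strategies=()
  while [[ $# -gt 0 ]] && [[ "$1" != "--" ]]; do
    scenarios+=("$1")
    shift
  done
  shift # drop the "--" separator (required)
  strategies=("$@")

  local sc st f n empty=0
  for sc in "${scenarios[@]}"; do
    for st in "${strategies[@]}"; do
      n=0
      # Unmatched globs stay literal, so test each path before counting it.
      for f in "$root/$sc/$st"/trial-*/judge-assessment.json; do
        if [[ -f "$f" ]]; then n=$((n + 1)); fi
      done
      printf '%-24s %-20s %d\n' "$sc" "$st" "$n"
      if [[ "$n" -eq 0 ]]; then empty=$((empty + 1)); fi
    done
  done
  [[ "$empty" -eq 0 ]]
}

# Demo on a throwaway directory: one cell has a trial, the other is empty.
demo_root="$(mktemp -d)"
mkdir -p "$demo_root/crash-on-save/omo-prometheus/trial-1"
echo '{}' >"$demo_root/crash-on-save/omo-prometheus/trial-1/judge-assessment.json"
if check_matrix "$demo_root" crash-on-save slow-search -- omo-prometheus; then
  echo "complete"
else
  echo "incomplete: mark the table as preliminary"
fi
rm -rf "$demo_root"
```

A non-zero status from a check like this could gate regeneration of the README table, so partial runs are labeled rather than ranked.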

waynesun09 · 2026-05-19T22:41:18Z

+}
+
+echo "$JUDGE_JSON" | jq '.' > "$TRIAL_DIR/judge-assessment.json"
+SCORE="$(echo "$JUDGE_JSON" | jq -r '.weighted_total // 0')"


Data integrity — weighted_total values are unreliable

Two problems with trusting the LLM-provided weighted_total:

Arithmetic drift: Spot-checking ~33 of 120 judge assessment files shows 0.05–0.15 point discrepancies between the LLM's weighted_total and the sum you'd get from applying the stated weights to the individual scores. These small errors can change rankings.

Inconsistent nesting: At least one file (crash-on-save/structured-triage/trial-8/judge-assessment.json) has weighted_total nested inside .scores instead of at the top level, causing this jq expression to return 0 via the // 0 fallback — silently zeroing out the score.

Consider computing weighted_total deterministically from the component scores rather than trusting the LLM's arithmetic, and normalize the JSON structure before reading it.
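A minimal sketch of the deterministic approach with jq. It assumes, hypothetically, that component scores sit under .scores as plain numbers and that the rubric weights can be expressed as a JSON object; the criterion names and weights below are placeholders, not the experiment's real rubric. Because only the weighted criteria are read, a stray weighted_total inside .scores is ignored, and a missing component raises an error instead of falling back to 0.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Placeholder rubric: criterion names and weights are hypothetical.
WEIGHTS='{"accuracy": 0.5, "completeness": 0.25, "clarity": 0.25}'

# compute_score FILE  (hypothetical helper)
# Recomputes the weighted total from the component scores in FILE instead of
# trusting the judge's own arithmetic. Exits non-zero if .scores or any
# weighted component is missing, rather than silently producing 0.
compute_score() {
  jq -e --argjson w "$WEIGHTS" '
    (.scores // error("no .scores object")) as $s
    | [ $w | to_entries[]
        | ($s[.key] // error("missing score: \(.key)")) * .value ]
    | add
  ' "$1"
}

# Demo: weighted_total is nested under .scores, mirroring the file cited above.
tmp="$(mktemp)"
cat >"$tmp" <<'EOF'
{"scores": {"accuracy": 4, "completeness": 5, "clarity": 3, "weighted_total": 4.1}}
EOF
SCORE="$(compute_score "$tmp")"
echo "$SCORE"
rm -f "$tmp"
```

The demo recomputes 4 (0.5·4 + 0.25·5 + 0.25·3) regardless of the judge's reported 4.1 or where that field sits.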

waynesun09 · 2026-05-19T22:41:20Z

+  # Try first { ... } block
+  local braced
+  braced="$(echo "$raw" | awk '/{/{found=1} found{print} /}/{if(found) exit}')"
+  if [[ -n "$braced" ]] && echo "$braced" | jq . &>/dev/null; then
+    echo "$braced"; return 0
+  fi
+
+  echo "$raw"
+  return 1


Bug — extract_json truncates nested JSON objects

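The awk program starts printing at the first line containing { and exits after the first line containing }. It works line by line rather than tracking brace depth, so for pretty-printed JSON with nested objects (the expected output format for triage responses) it stops at the line where the first nested object closes. The truncated text then fails the jq check, so this branch never extracts such a response: extract_json falls back to echoing the raw output and returning 1, and any caller that ignores that status receives unparsed text.

For example, given:

{
  "priority": { "level": "high", "reason": "crash" },
  "component": "auth"
}

awk stops after the "priority" line and yields only the first two lines, which are not valid JSON, so "component" never reaches the caller. A single-line object such as { "priority": { "level": "high", "reason": "crash" }, "component": "auth" } happens to survive, because awk prints the whole line before exiting.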

Consider using a brace-depth counter in awk, or piping through jq to extract the first valid JSON object from the mixed output.
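A minimal sketch of the brace-depth approach in awk. It scans character by character, starts at the first {, skips braces inside JSON strings (honoring backslash escapes), and stops when the depth returns to zero. The helper name extract_first_object is hypothetical, and its result should still be validated with jq as extract_json already does, because a stray { in surrounding prose would be taken as the start of an object.

```bash
#!/usr/bin/env bash
set -euo pipefail

# extract_first_object  (hypothetical helper)
# Reads mixed text on stdin and prints the first balanced {...} block,
# tracking brace depth and ignoring braces inside JSON strings.
# Exits 1 if no complete object is found.
extract_first_object() {
  awk '
    {
      for (i = 1; i <= length($0); i++) {
        c = substr($0, i, 1)
        if (depth == 0 && c != "{") continue    # still before the object
        out = out c
        if (in_str) {                             # inside a JSON string
          if (esc) esc = 0
          else if (c == "\\") esc = 1
          else if (c == "\"") in_str = 0
          continue
        }
        if (c == "\"") in_str = 1
        else if (c == "{") depth++
        else if (c == "}") {
          depth--
          if (depth == 0) { print out; found = 1; exit }
        }
      }
      if (depth > 0) out = out "\n"               # keep line breaks inside the object
    }
    END { exit (found ? 0 : 1) }
  '
}

raw='Here is my assessment:
{
  "priority": { "level": "high", "reason": "crash" },
  "component": "auth"
}
Let me know if anything is unclear.'
json="$(printf '%s\n' "$raw" | extract_first_object)"
printf '%s\n' "$json" | jq -c .
```

Here jq -c prints {"priority":{"level":"high","reason":"crash"},"component":"auth"}, so "component" survives the nested object.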

ralphbean · 2026-06-16T20:19:37Z

Rather than fix this one up, I'm going to drop it. Focusing on other things.

fullsend-ai-retro · 2026-06-16T20:24:41Z

🤖 Finished Retro · ✅ Success · Started 8:24 PM UTC · Completed 8:32 PM UTC
Commit: f40693c

fullsend-ai-retro · 2026-06-16T20:32:35Z

Retro: PR #12 — Add the triage skills and strategies experiments

PR #12 was a human-authored PR by ralphbean adding 3,833 files (136K lines) of triage experiment data. It was opened April 29, received a thorough CHANGES_REQUESTED review from waynesun09 on May 19 (citing a "10-agent review squad"), and was closed without merge on June 16 when the author chose to drop it.

Workflow observations

Limited agent involvement: The review dispatch fired on PR creation, but the actual review was posted by a human 20 days later. No fix agent ran after the CHANGES_REQUESTED review. The workflow was primarily human-driven.
Retro value questionable: Running a retro on a human-authored, closed-without-merge PR with minimal automated agent interaction yields limited actionable signal.

Existing issue coverage

All potential improvement areas are already tracked by open issues in fullsend-ai/fullsend:

Skip retro on PRs with no/minimal agent involvement: #939, #1411, #1675
Large PR handling (3,000+ files, diff truncation): #1041, #2034, #2096, #2118
Dispatch on close events: #1870

No new proposals are warranted — existing issues adequately cover the improvement opportunities observed in this workflow.

Add the triage skills and strategies experiments

f40693c

ralphbean requested a review from a team as a code owner April 29, 2026 18:03

waynesun09 requested changes May 19, 2026

ralphbean closed this Jun 16, 2026

ralphbean wants to merge 1 commit into main from triage-skills-and-strategies
