feat: review agent applies contextual labels via issue-labels skill (#1706) by ralphbean · Pull Request #2196 · fullsend-ai/fullsend

ralphbean · 2026-06-11T20:28:36Z

Summary

Generalize the issue-labels skill to work for both issues and PRs (remove hardcoded control-label list, reword triage-specific language)
Add optional label_actions field to the review result schema (same shape as triage)
Add label_actions processing to post-review.sh — validates labels against a control-label guard, appends reason to review body, applies via GitHub labels API
Wire issue-labels into the review agent harness and definition
Update user-facing docs (review agent docs, skills table, triage docs alignment)

Test plan

Schema validation tests pass (8 cases: 4 valid, 4 invalid)
Post-review control-label guard tests pass (19 total: 9 existing + 10 new)
make lint passes
Manual: fullsend run review against appdumpster/test-repo#33 — agent produces label_actions with valid repo labels, schema validates, post-script validates and appends label reason to review body (3 runs, consistent results — see comment)

🤖 Generated with Claude Code

Generalize the issue-labels skill to work for both triage and review agents, then wire it into the review agent's harness, schema, agent definition, and post-script. Assisted-by: Claude claude-opus-4-6 <noreply@anthropic.com> Signed-off-by: Ralph Bean <rbean@redhat.com>

Six tasks covering skill generalization, schema extension, post-script label processing, harness/agent wiring, and user-facing documentation. Assisted-by: Claude claude-opus-4-6 <noreply@anthropic.com> Signed-off-by: Ralph Bean <rbean@redhat.com>

Remove hardcoded control-label exclusion list (post-scripts enforce this server-side) and reword triage-specific language to be agent-agnostic. Add note to skip issue-type check for PRs. Assisted-by: Claude claude-opus-4-6 <noreply@anthropic.com> Signed-off-by: Ralph Bean <rbean@redhat.com>

Same shape as triage-result.schema.json. The field is optional -- when omitted the post-script skips label processing. Assisted-by: Claude claude-opus-4-6 <noreply@anthropic.com> Signed-off-by: Ralph Bean <rbean@redhat.com>

Validate agent-recommended labels against a control-label guard list, check label existence, append reason to review body, and apply mutations via the GitHub labels API after posting. Mirrors the label_actions processing in post-triage.sh. Assisted-by: Claude claude-opus-4-6 <noreply@anthropic.com> Signed-off-by: Ralph Bean <rbean@redhat.com>

Add issue-labels to the harness skills list and agent definition. Document when and how to invoke the skill during review, and add label_actions to the pipeline mode output docs. Assisted-by: Claude claude-opus-4-6 <noreply@anthropic.com> Signed-off-by: Ralph Bean <rbean@redhat.com>

Add issue-labels skill section to review agent docs, update the built-in skills table, and align triage docs example with the generalized skill language. Assisted-by: Claude claude-opus-4-6 <noreply@anthropic.com> Signed-off-by: Ralph Bean <rbean@redhat.com>

github-actions · 2026-06-11T20:30:10Z

Site preview

Preview: https://ccafaa30-site.fullsend-ai.workers.dev

Commit: 7fdab7e668efef8a50024519c80971e275aaf491

fullsend-ai-review · 2026-06-11T20:31:21Z

🤖 Review · Started 8:31 PM UTC
Commit: 0ec7546 · View workflow run →

codecov · 2026-06-11T20:32:29Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

- Revert triage.md example wording to stay issue-specific (triage agent doesn't process PRs) - Add trap for LABEL_MODIFIED_RESULT temp file cleanup in post-review.sh - Add integration tests for label_actions processing in post-review-test.sh (10 cases covering: applied, control-label refused, nonexistent skipped, invalid chars refused, remove, multiple add, all-refused no body append, absent, request-changes) Assisted-by: Claude claude-opus-4-6 <noreply@anthropic.com> Signed-off-by: Ralph Bean <rbean@redhat.com>

fullsend-ai-review · 2026-06-11T20:48:08Z

🤖 Finished Review · ✅ Success · Started 8:48 PM UTC · Completed 9:01 PM UTC
Commit: 39e19c9 · View workflow run →

fullsend-ai-review · 2026-06-11T21:01:23Z

Review

Findings

Low

[incomplete-change] docs/agents/triage.md:72 — The implementation plan (Task 5, Step 4) explicitly calls for updating docs/agents/triage.md to generalize the example skill description, and includes it in the commit step. The actual diff does not modify triage.md. Either apply the planned changes or remove triage.md from the plan's commit step.
[edge-case] internal/scaffold/fullsend-repo/scripts/post-review.sh:168 — EXISTING_LABELS is populated via gh api ... || true. If the API call fails (rate limit, network error), label_exists() returns false for every label, causing all add actions to be silently skipped with a misleading "does not exist" warning. There is no indication that the label-list fetch itself failed.
[test-integrity] internal/scaffold/fullsend-repo/scripts/post-review-test.sh:104 — REVIEW_CONTROL_LABELS array and is_control_label() function are duplicated between the test file and post-review.sh. If a new control label is added to one but not the other, the unit tests will pass while the production code diverges. The integration tests (which run the real script) partially mitigate this.
[defense-in-depth] internal/scaffold/fullsend-repo/skills/issue-labels/SKILL.md:10 — Removing the hardcoded control-label list from the skill means the LLM agent has no awareness of which labels are forbidden. It will recommend control labels that get refused server-side, wasting tokens and API calls. The server-side guard is the correct primary enforcement point.

Info

[scope-alignment] docs/agents/review.md:9 — "may also apply" is ambiguous about whether contextual labeling is automatic or configurable. The internal agent definition is clearer ("invoke the issue-labels skill"); user-facing docs should match.
[intent-alignment] docs/superpowers/plans/2026-06-11-review-agent-contextual-labels.md — The docs/superpowers/ directory is used for implementation plans and design specs but this directory structure is not documented in CLAUDE.md, AGENTS.md, or README.md.

Previous run

Review

Findings

Medium

[GHA-workflow-command-injection] internal/scaffold/fullsend-repo/scripts/post-review.sh:167 — LA_LABEL is extracted via jq -r which decodes JSON escape sequences (\n becomes a real newline). When the regex rejects the label, the unvalidated value is interpolated into echo "::warning::Refused label '${LA_LABEL}'...". A crafted label containing a JSON-encoded newline followed by a GHA workflow command would inject that command into the GHA log stream. The same vector applies to LA_ACTION at lines 172 and 188, which is never regex-validated at all.
Remediation: Sanitize both LA_LABEL and LA_ACTION before any echo that emits GHA workflow commands. Strip newlines, carriage returns, and :: sequences before interpolation.
[error-handling] internal/scaffold/fullsend-repo/scripts/post-review.sh:217 — The new trap 'rm -f "${LABEL_MODIFIED_RESULT}"' EXIT overwrites the existing trap 'rm -f "${MODIFIED_RESULT}"' EXIT set at line 124. In bash, setting a new trap ... EXIT replaces the previous handler — it does not compose. When both the protected-path downgrade AND label_actions processing fire, the MODIFIED_RESULT temp file is never cleaned up.
Remediation: Compose the trap handlers so both temp files are cleaned up. Use a single trap that references both variables, or accumulate temp files in an array and clean them all in one trap.

Low

[incomplete-change] docs/agents/triage.md:72 — The implementation plan (Task 5, Step 4) explicitly calls for updating docs/agents/triage.md to generalize the example skill description, and includes it in the commit step. The actual diff does not modify triage.md. Either apply the planned changes or remove triage.md from the plan's commit step.
[edge-case] internal/scaffold/fullsend-repo/scripts/post-review.sh:163 — EXISTING_LABELS is populated via gh api ... || true. If the API call fails (rate limit, network error), label_exists() returns false for every label, causing all add actions to be silently skipped with a misleading "does not exist" warning. There is no indication that the label-list fetch itself failed.
[test-integrity] internal/scaffold/fullsend-repo/scripts/post-review-test.sh:104 — REVIEW_CONTROL_LABELS array and is_control_label() function are duplicated between the test file and post-review.sh. If a new control label is added to one but not the other, the unit tests will pass while the production code diverges. The integration tests (which run the real script) partially mitigate this.
[defense-in-depth] internal/scaffold/fullsend-repo/skills/issue-labels/SKILL.md:10 — Removing the hardcoded control-label list from the skill means the LLM agent has no awareness of which labels are forbidden. It will recommend control labels that get refused server-side, wasting tokens and API calls. The server-side guard is the correct primary enforcement point. See also: [Authorization/RBAC] info finding at this location.

Info

[scope-alignment] docs/agents/review.md:9 — "may also apply" is ambiguous about whether contextual labeling is automatic or configurable. The internal agent definition is clearer ("invoke the issue-labels skill"); user-facing docs should match.
[intent-alignment] docs/superpowers/plans/2026-06-11-review-agent-contextual-labels.md — The docs/superpowers/ directory is used for implementation plans and design specs but this directory structure is not documented in CLAUDE.md, AGENTS.md, or README.md.

Previous run

Review

Findings

Medium

[GHA-workflow-command-injection] internal/scaffold/fullsend-repo/scripts/post-review.sh:167 — LA_LABEL is extracted via jq -r which decodes JSON escape sequences (\n becomes a real newline). When the regex rejects the label, the unvalidated value is interpolated into echo "::warning::Refused label '${LA_LABEL}'...". A crafted label containing a JSON-encoded newline followed by a GHA workflow command would inject that command into the GHA log stream. The same vector applies to LA_ACTION at lines 172 and 188, which is never regex-validated at all.
Remediation: Sanitize both LA_LABEL and LA_ACTION before any echo that emits GHA workflow commands. Strip newlines, carriage returns, and :: sequences before interpolation.
[error-handling] internal/scaffold/fullsend-repo/scripts/post-review.sh:217 — The new trap 'rm -f "${LABEL_MODIFIED_RESULT}"' EXIT overwrites the existing trap 'rm -f "${MODIFIED_RESULT}"' EXIT set at line 124. In bash, setting a new trap ... EXIT replaces the previous handler — it does not compose. When both the protected-path downgrade AND label_actions processing fire, the MODIFIED_RESULT temp file is never cleaned up.
Remediation: Compose the trap handlers so both temp files are cleaned up. Use a single trap that references both variables, or accumulate temp files in an array and clean them all in one trap.

Low

[incomplete-change] docs/agents/triage.md:72 — The implementation plan (Task 5, Step 4) explicitly calls for updating docs/agents/triage.md to generalize the example skill description, and includes it in the commit step. The actual diff does not modify triage.md. Either apply the planned changes or remove triage.md from the plan's commit step.
[edge-case] internal/scaffold/fullsend-repo/scripts/post-review.sh:163 — EXISTING_LABELS is populated via gh api ... || true. If the API call fails (rate limit, network error), label_exists() returns false for every label, causing all add actions to be silently skipped with a misleading "does not exist" warning. There is no indication that the label-list fetch itself failed.
[test-integrity] internal/scaffold/fullsend-repo/scripts/post-review-test.sh:104 — REVIEW_CONTROL_LABELS array and is_control_label() function are duplicated between the test file and post-review.sh. If a new control label is added to one but not the other, the unit tests will pass while the production code diverges. The integration tests (which run the real script) partially mitigate this.
[defense-in-depth] internal/scaffold/fullsend-repo/skills/issue-labels/SKILL.md:10 — Removing the hardcoded control-label list from the skill means the LLM agent has no awareness of which labels are forbidden. It will recommend control labels that get refused server-side, wasting tokens and API calls. The server-side guard is the correct primary enforcement point. See also: [Authorization/RBAC] info finding at this location.

Info

[scope-alignment] docs/agents/review.md:9 — "may also apply" is ambiguous about whether contextual labeling is automatic or configurable. The internal agent definition is clearer ("invoke the issue-labels skill"); user-facing docs should match.
[intent-alignment] docs/superpowers/plans/2026-06-11-review-agent-contextual-labels.md — The docs/superpowers/ directory is used for implementation plans and design specs but this directory structure is not documented in CLAUDE.md, AGENTS.md, or README.md.

Previous run (2)

Review

Findings

Medium

[error-handling] internal/scaffold/fullsend-repo/scripts/post-review.sh:217 — The new trap 'rm -f "${LABEL_MODIFIED_RESULT}"' EXIT overwrites the existing trap 'rm -f "${MODIFIED_RESULT}"' EXIT set at line 124 for the protected-path downgrade. In bash, setting a new trap ... EXIT replaces the previous handler — it does not compose. When both the protected-path downgrade AND label_actions processing fire, the MODIFIED_RESULT temp file is never cleaned up.
Remediation: Combine both cleanups into a single trap or use a cleanup function that tracks all temp files.
[GHA-workflow-command-injection] internal/scaffold/fullsend-repo/scripts/post-review.sh:155 — LA_LABEL is extracted via jq -r which decodes JSON escape sequences (e.g., \n → literal newline). A crafted label name containing embedded newlines could split the echo "::warning::..." output across lines, potentially injecting GHA workflow commands (though the regex check provides partial mitigation and dangerous commands like ::set-env:: are disabled by GitHub). The same pattern affects LA_ACTION in the *) fallthrough case.
Remediation: Sanitize LA_LABEL and LA_ACTION immediately after jq -r extraction — strip newlines, carriage returns, and :: sequences before any use in echo statements.

Low

[incomplete-change] docs/agents/triage.md:72 — The implementation plan (Task 5, Step 4) explicitly calls out updating docs/agents/triage.md to align the example skill description with the generalized language, and the commit step includes it in git add. The actual diff does not modify the file. Either update triage.md or remove it from the plan's commit step.
[test-integrity] internal/scaffold/fullsend-repo/scripts/post-review-test.sh:104 — REVIEW_CONTROL_LABELS array and is_control_label() function are duplicated between the test file and post-review.sh. If a new control label is added to one but not the other, the test will pass while the production code diverges.
[edge-case] internal/scaffold/fullsend-repo/scripts/post-review.sh:163 — EXISTING_LABELS is populated via gh api ... || true. If the API call fails (rate limit, network), the || true swallows the error and label_exists() returns false for every label, causing all add actions to be skipped with a misleading "does not exist" warning. Consistent with triage post-script behavior.
[defense-in-depth] internal/scaffold/fullsend-repo/skills/issue-labels/SKILL.md:10 — Removing the hardcoded control-label list changes behavior: agents will now recommend control labels that get silently refused server-side, wasting tokens and adding log noise. The server-side guard is the correct primary control, but the in-prompt list was a useful efficiency layer. See also: [Authorization/RBAC] info finding at this location.
[control-label-guard] internal/scaffold/fullsend-repo/scripts/post-review.sh:135 — REVIEW_CONTROL_LABELS guards only review pipeline labels but not triage pipeline labels. Impact is minimal since triage control labels target issues not PRs, but the design choice to separate lists could be documented.
[doc-style] docs/agents/review.md:10 — Uses double-hyphen (--) while triage.md uses em-dash (—). Minor typographic inconsistency.
[consumer-compatibility] internal/scaffold/fullsend-repo/schemas/review-result.schema.json — External consumers of review-result JSON should be aware that label_actions may now be present. No action needed if all consumers are internal to this repo.

Info

[Authorization/RBAC] internal/scaffold/fullsend-repo/skills/issue-labels/SKILL.md — Removing the explicit control-label exclusion list from the skill instructions means the LLM agent has no awareness of which labels are forbidden, relying entirely on server-side filtering. This is a defense-in-depth reduction.
[scope-alignment] internal/scaffold/fullsend-repo/agents/review.md:111 — The new "Contextual labels" section says "After producing the review verdict" but doesn't specify when in the orchestrator workflow (which uses parallel sub-agents) the issue-labels skill should be invoked.
[scope-alignment] docs/agents/review.md:9 — "may also apply" is ambiguous about whether contextual labeling is automatic or configurable.
[intent-alignment] docs/superpowers/plans/2026-06-11-review-agent-contextual-labels.md — The docs/superpowers/plans/ directory is not referenced in CLAUDE.md or documented architecture.

ralphbean

LGTM. Pushed 39e19c9 with three fixes:

Reverted triage.md example wording to stay issue-specific
Added trap for the label temp file
Added integration tests for label_actions processing (10 cases)

The test helpers intentionally export variables inside subshells for isolation. Shellcheck flags these as accidental — disable the warnings. Assisted-by: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Ralph Bean <rbean@redhat.com>

fullsend-ai-review · 2026-06-11T21:16:04Z

🤖 Finished Review · ✅ Success · Started 9:16 PM UTC · Completed 9:29 PM UTC
Commit: 397115d · View workflow run →

ralphbean · 2026-06-12T01:24:50Z

Manual testing against `appdumpster/test-repo`

Ran fullsend run review three times against appdumpster/test-repo#33 (a deliberately buggy Python web server).

What works

Step	Run 1	Run 2	Run 3
Agent produces `label_actions`	`type/feature`, `needs-info`	`enhancement`	`enhancement`, `bug`
Schema validation passes with `label_actions`	✅	✅	✅
Post-script validates labels (character check, control-label guard, repo existence check)	✅	✅	✅
Label reason appended to review body	✅	n/a (fullsend not on PATH)	✅

All three runs chose real labels from the repo's label list. The issue-labels skill correctly ran gh label list, researched conventions via sub-agent, and emitted valid label_actions.

What doesn't work (unrelated to this PR)

fullsend post-review returns 422 when using a PAT — it submits PR reviews via the REST API which requires a GitHub App installation token. The post-script exits at line 255 before reaching the label-application calls at line 306. This is the expected failure mode for local testing with a PAT — it would work in CI with the actual GitHub App token.

The label validation and body-append logic (lines 157-212) runs successfully before that exit. The actual API calls at lines 306-318 are trivial gh api wrappers already covered by the unit tests in post-review-test.sh.

Review comment posted

The review comment on #33 shows the label reason appended at the bottom: **Labels:** PR adds a new feature (web server) and needs more information...

Sanitize LA_LABEL and LA_ACTION after jq -r extraction by stripping newlines, carriage returns, and GHA workflow command delimiters (::). This prevents command injection via crafted label names that embed GHA workflow commands after a JSON-decoded newline. Replace per-tempfile trap EXIT handlers with a CLEANUP_FILES array and a single composed trap. Bash traps don't compose — the second trap was silently replacing the first, leaking MODIFIED_RESULT when both protected-path downgrade and label_actions processing fired. Assisted-by: Claude claude-opus-4-6 <noreply@anthropic.com> Signed-off-by: Ralph Bean <rbean@redhat.com>

fullsend-ai-review · 2026-06-12T16:08:17Z

🤖 Finished Review · ✅ Success · Started 4:08 PM UTC · Completed 4:24 PM UTC
Commit: 7fdab7e · View workflow run →

fullsend-ai-review · 2026-06-12T16:23:56Z

+  LABEL_COUNT=$(jq '.label_actions.actions | length' "${RESULT_FILE}")
+
+  echo "Validating ${LABEL_COUNT} label action(s)..."
+


[low] edge-case

EXISTING_LABELS is populated via gh api ... || true. If the API call fails (rate limit, network error), label_exists() returns false for every label, causing all add actions to be silently skipped with a misleading does not exist warning. There is no indication that the label-list fetch itself failed.

fullsend-ai-review · 2026-06-12T16:23:56Z


+# ---------------------------------------------------------------------------
+# Control-label guard tests
+# ---------------------------------------------------------------------------


[low] test-integrity

REVIEW_CONTROL_LABELS array and is_control_label() function are duplicated between the test file and post-review.sh. If a new control label is added to one but not the other, the unit tests will pass while the production code diverges. The integration tests (which run the real script) partially mitigate this.

fullsend-ai-review · 2026-06-12T16:23:56Z

-Recommend contextual labels for the issue being triaged. These are labels that
-describe the issue's domain, area, priority, or other team-specific dimensions
-- NOT control labels used by the triage pipeline.
+Recommend contextual labels for the issue or pull request being processed.


[low] defense-in-depth

Removing the hardcoded control-label list from the skill means the LLM agent has no awareness of which labels are forbidden. It will recommend control labels that get refused server-side, wasting tokens and API calls. The server-side guard is the correct primary enforcement point.

ralphbean added 7 commits June 11, 2026 15:40

github-actions Bot deployed to site-preview June 11, 2026 20:30 View deployment

ralphbean commented Jun 11, 2026

View reviewed changes

Comment thread docs/agents/triage.md Outdated

github-actions Bot deployed to site-preview June 11, 2026 20:46 View deployment

fullsend-ai-review Bot added the requires-manual-review Review requires human judgment label Jun 11, 2026

ralphbean commented Jun 11, 2026

View reviewed changes

github-actions Bot deployed to site-preview June 11, 2026 21:14 View deployment

fullsend-ai-review Bot added requires-manual-review Review requires human judgment and removed requires-manual-review Review requires human judgment labels Jun 11, 2026

ralphbean mentioned this pull request Jun 12, 2026

feat: add simple web server with query handling appdumpster/test-repo#33

Closed

github-actions Bot deployed to site-preview June 12, 2026 16:04 View deployment

fullsend-ai-review Bot approved these changes Jun 12, 2026

View reviewed changes

fullsend-ai-review Bot removed the requires-manual-review Review requires human judgment label Jun 12, 2026

fullsend-ai-review Bot added the ready-for-merge All reviewers approved — ready to merge label Jun 12, 2026

		LABEL_COUNT=$(jq '.label_actions.actions \| length' "${RESULT_FILE}")

		echo "Validating ${LABEL_COUNT} label action(s)..."

Conversation

ralphbean commented Jun 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Uh oh!

github-actions Bot commented Jun 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Site preview

Uh oh!

fullsend-ai-review Bot commented Jun 11, 2026

Uh oh!

codecov Bot commented Jun 11, 2026

Codecov Report

Uh oh!

Uh oh!

fullsend-ai-review Bot commented Jun 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

fullsend-ai-review Bot commented Jun 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review

Findings

Low

Info

Review

Findings

Medium

Low

Info

Review

Findings

Medium

Low

Info

Review

Findings

Medium

Low

Info

Uh oh!

ralphbean left a comment

Choose a reason for hiding this comment

Uh oh!

fullsend-ai-review Bot commented Jun 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ralphbean commented Jun 12, 2026

Manual testing against appdumpster/test-repo

What works

What doesn't work (unrelated to this PR)

Review comment posted

Uh oh!

fullsend-ai-review Bot commented Jun 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

fullsend-ai-review Bot Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

fullsend-ai-review Bot Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

fullsend-ai-review Bot Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

ralphbean commented Jun 11, 2026 •

edited

Loading

github-actions Bot commented Jun 11, 2026 •

edited

Loading

fullsend-ai-review Bot commented Jun 11, 2026 •

edited

Loading

fullsend-ai-review Bot commented Jun 11, 2026 •

edited

Loading

fullsend-ai-review Bot commented Jun 11, 2026 •

edited

Loading

Manual testing against `appdumpster/test-repo`

fullsend-ai-review Bot commented Jun 12, 2026 •

edited

Loading