Skip to content

feat: review agent applies contextual labels via issue-labels skill (#1706)#2196

Open
ralphbean wants to merge 10 commits into
mainfrom
feat/1706-review-agent-labels
Open

feat: review agent applies contextual labels via issue-labels skill (#1706)#2196
ralphbean wants to merge 10 commits into
mainfrom
feat/1706-review-agent-labels

Conversation

@ralphbean

@ralphbean ralphbean commented Jun 11, 2026

Copy link
Copy Markdown
Member

Summary

  • Generalize the issue-labels skill to work for both issues and PRs (remove hardcoded control-label list, reword triage-specific language)
  • Add optional label_actions field to the review result schema (same shape as triage)
  • Add label_actions processing to post-review.sh — validates labels against a control-label guard, appends reason to review body, applies via GitHub labels API
  • Wire issue-labels into the review agent harness and definition
  • Update user-facing docs (review agent docs, skills table, triage docs alignment)

Closes #1706

Test plan

  • Schema validation tests pass (8 cases: 4 valid, 4 invalid)
  • Post-review control-label guard tests pass (19 total: 9 existing + 10 new)
  • make lint passes
  • Manual: fullsend run review against appdumpster/test-repo#33 — agent produces label_actions with valid repo labels, schema validates, post-script validates and appends label reason to review body (3 runs, consistent results — see comment)

🤖 Generated with Claude Code

Generalize the issue-labels skill to work for both triage and review
agents, then wire it into the review agent's harness, schema, agent
definition, and post-script.

Assisted-by: Claude claude-opus-4-6 <noreply@anthropic.com>
Signed-off-by: Ralph Bean <rbean@redhat.com>
Six tasks covering skill generalization, schema extension, post-script
label processing, harness/agent wiring, and user-facing documentation.

Assisted-by: Claude claude-opus-4-6 <noreply@anthropic.com>
Signed-off-by: Ralph Bean <rbean@redhat.com>
Remove hardcoded control-label exclusion list (post-scripts enforce
this server-side) and reword triage-specific language to be
agent-agnostic. Add note to skip issue-type check for PRs.

Assisted-by: Claude claude-opus-4-6 <noreply@anthropic.com>
Signed-off-by: Ralph Bean <rbean@redhat.com>
Same shape as triage-result.schema.json. The field is optional --
when omitted the post-script skips label processing.

Assisted-by: Claude claude-opus-4-6 <noreply@anthropic.com>
Signed-off-by: Ralph Bean <rbean@redhat.com>
Validate agent-recommended labels against a control-label guard list,
check label existence, append reason to review body, and apply
mutations via the GitHub labels API after posting.

Mirrors the label_actions processing in post-triage.sh.

Assisted-by: Claude claude-opus-4-6 <noreply@anthropic.com>
Signed-off-by: Ralph Bean <rbean@redhat.com>
Add issue-labels to the harness skills list and agent definition.
Document when and how to invoke the skill during review, and add
label_actions to the pipeline mode output docs.

Assisted-by: Claude claude-opus-4-6 <noreply@anthropic.com>
Signed-off-by: Ralph Bean <rbean@redhat.com>
Add issue-labels skill section to review agent docs, update the
built-in skills table, and align triage docs example with the
generalized skill language.

Assisted-by: Claude claude-opus-4-6 <noreply@anthropic.com>
Signed-off-by: Ralph Bean <rbean@redhat.com>
@github-actions

github-actions Bot commented Jun 11, 2026

Copy link
Copy Markdown

Site preview

Preview: https://ccafaa30-site.fullsend-ai.workers.dev

Commit: 7fdab7e668efef8a50024519c80971e275aaf491

@fullsend-ai-review

Copy link
Copy Markdown

🤖 Review · Started 8:31 PM UTC
Commit: 0ec7546 · View workflow run →

@codecov

codecov Bot commented Jun 11, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

Comment thread docs/agents/triage.md Outdated
- Revert triage.md example wording to stay issue-specific (triage
  agent doesn't process PRs)
- Add trap for LABEL_MODIFIED_RESULT temp file cleanup in post-review.sh
- Add integration tests for label_actions processing in
  post-review-test.sh (10 cases covering: applied, control-label
  refused, nonexistent skipped, invalid chars refused, remove,
  multiple add, all-refused no body append, absent, request-changes)

Assisted-by: Claude claude-opus-4-6 <noreply@anthropic.com>
Signed-off-by: Ralph Bean <rbean@redhat.com>
@fullsend-ai-review

fullsend-ai-review Bot commented Jun 11, 2026

Copy link
Copy Markdown

🤖 Finished Review · ✅ Success · Started 8:48 PM UTC · Completed 9:01 PM UTC
Commit: 39e19c9 · View workflow run →

@fullsend-ai-review

fullsend-ai-review Bot commented Jun 11, 2026

Copy link
Copy Markdown

Review

Findings

Low

  • [incomplete-change] docs/agents/triage.md:72 — The implementation plan (Task 5, Step 4) explicitly calls for updating docs/agents/triage.md to generalize the example skill description, and includes it in the commit step. The actual diff does not modify triage.md. Either apply the planned changes or remove triage.md from the plan's commit step.

  • [edge-case] internal/scaffold/fullsend-repo/scripts/post-review.sh:168EXISTING_LABELS is populated via gh api ... || true. If the API call fails (rate limit, network error), label_exists() returns false for every label, causing all add actions to be silently skipped with a misleading "does not exist" warning. There is no indication that the label-list fetch itself failed.

  • [test-integrity] internal/scaffold/fullsend-repo/scripts/post-review-test.sh:104REVIEW_CONTROL_LABELS array and is_control_label() function are duplicated between the test file and post-review.sh. If a new control label is added to one but not the other, the unit tests will pass while the production code diverges. The integration tests (which run the real script) partially mitigate this.

  • [defense-in-depth] internal/scaffold/fullsend-repo/skills/issue-labels/SKILL.md:10 — Removing the hardcoded control-label list from the skill means the LLM agent has no awareness of which labels are forbidden. It will recommend control labels that get refused server-side, wasting tokens and API calls. The server-side guard is the correct primary enforcement point.

Info

  • [scope-alignment] docs/agents/review.md:9 — "may also apply" is ambiguous about whether contextual labeling is automatic or configurable. The internal agent definition is clearer ("invoke the issue-labels skill"); user-facing docs should match.

  • [intent-alignment] docs/superpowers/plans/2026-06-11-review-agent-contextual-labels.md — The docs/superpowers/ directory is used for implementation plans and design specs but this directory structure is not documented in CLAUDE.md, AGENTS.md, or README.md.

Previous run

Review

Findings

Medium

  • [GHA-workflow-command-injection] internal/scaffold/fullsend-repo/scripts/post-review.sh:167LA_LABEL is extracted via jq -r which decodes JSON escape sequences (\n becomes a real newline). When the regex rejects the label, the unvalidated value is interpolated into echo "::warning::Refused label '${LA_LABEL}'...". A crafted label containing a JSON-encoded newline followed by a GHA workflow command would inject that command into the GHA log stream. The same vector applies to LA_ACTION at lines 172 and 188, which is never regex-validated at all.
    Remediation: Sanitize both LA_LABEL and LA_ACTION before any echo that emits GHA workflow commands. Strip newlines, carriage returns, and :: sequences before interpolation.

  • [error-handling] internal/scaffold/fullsend-repo/scripts/post-review.sh:217 — The new trap 'rm -f "${LABEL_MODIFIED_RESULT}"' EXIT overwrites the existing trap 'rm -f "${MODIFIED_RESULT}"' EXIT set at line 124. In bash, setting a new trap ... EXIT replaces the previous handler — it does not compose. When both the protected-path downgrade AND label_actions processing fire, the MODIFIED_RESULT temp file is never cleaned up.
    Remediation: Compose the trap handlers so both temp files are cleaned up. Use a single trap that references both variables, or accumulate temp files in an array and clean them all in one trap.

Low

  • [incomplete-change] docs/agents/triage.md:72 — The implementation plan (Task 5, Step 4) explicitly calls for updating docs/agents/triage.md to generalize the example skill description, and includes it in the commit step. The actual diff does not modify triage.md. Either apply the planned changes or remove triage.md from the plan's commit step.

  • [edge-case] internal/scaffold/fullsend-repo/scripts/post-review.sh:163EXISTING_LABELS is populated via gh api ... || true. If the API call fails (rate limit, network error), label_exists() returns false for every label, causing all add actions to be silently skipped with a misleading "does not exist" warning. There is no indication that the label-list fetch itself failed.

  • [test-integrity] internal/scaffold/fullsend-repo/scripts/post-review-test.sh:104REVIEW_CONTROL_LABELS array and is_control_label() function are duplicated between the test file and post-review.sh. If a new control label is added to one but not the other, the unit tests will pass while the production code diverges. The integration tests (which run the real script) partially mitigate this.

  • [defense-in-depth] internal/scaffold/fullsend-repo/skills/issue-labels/SKILL.md:10 — Removing the hardcoded control-label list from the skill means the LLM agent has no awareness of which labels are forbidden. It will recommend control labels that get refused server-side, wasting tokens and API calls. The server-side guard is the correct primary enforcement point. See also: [Authorization/RBAC] info finding at this location.

Info

  • [scope-alignment] docs/agents/review.md:9 — "may also apply" is ambiguous about whether contextual labeling is automatic or configurable. The internal agent definition is clearer ("invoke the issue-labels skill"); user-facing docs should match.

  • [intent-alignment] docs/superpowers/plans/2026-06-11-review-agent-contextual-labels.md — The docs/superpowers/ directory is used for implementation plans and design specs but this directory structure is not documented in CLAUDE.md, AGENTS.md, or README.md.

Previous run

Review

Findings

Medium

  • [GHA-workflow-command-injection] internal/scaffold/fullsend-repo/scripts/post-review.sh:167LA_LABEL is extracted via jq -r which decodes JSON escape sequences (\n becomes a real newline). When the regex rejects the label, the unvalidated value is interpolated into echo "::warning::Refused label '${LA_LABEL}'...". A crafted label containing a JSON-encoded newline followed by a GHA workflow command would inject that command into the GHA log stream. The same vector applies to LA_ACTION at lines 172 and 188, which is never regex-validated at all.
    Remediation: Sanitize both LA_LABEL and LA_ACTION before any echo that emits GHA workflow commands. Strip newlines, carriage returns, and :: sequences before interpolation.

  • [error-handling] internal/scaffold/fullsend-repo/scripts/post-review.sh:217 — The new trap 'rm -f "${LABEL_MODIFIED_RESULT}"' EXIT overwrites the existing trap 'rm -f "${MODIFIED_RESULT}"' EXIT set at line 124. In bash, setting a new trap ... EXIT replaces the previous handler — it does not compose. When both the protected-path downgrade AND label_actions processing fire, the MODIFIED_RESULT temp file is never cleaned up.
    Remediation: Compose the trap handlers so both temp files are cleaned up. Use a single trap that references both variables, or accumulate temp files in an array and clean them all in one trap.

Low

  • [incomplete-change] docs/agents/triage.md:72 — The implementation plan (Task 5, Step 4) explicitly calls for updating docs/agents/triage.md to generalize the example skill description, and includes it in the commit step. The actual diff does not modify triage.md. Either apply the planned changes or remove triage.md from the plan's commit step.

  • [edge-case] internal/scaffold/fullsend-repo/scripts/post-review.sh:163EXISTING_LABELS is populated via gh api ... || true. If the API call fails (rate limit, network error), label_exists() returns false for every label, causing all add actions to be silently skipped with a misleading "does not exist" warning. There is no indication that the label-list fetch itself failed.

  • [test-integrity] internal/scaffold/fullsend-repo/scripts/post-review-test.sh:104REVIEW_CONTROL_LABELS array and is_control_label() function are duplicated between the test file and post-review.sh. If a new control label is added to one but not the other, the unit tests will pass while the production code diverges. The integration tests (which run the real script) partially mitigate this.

  • [defense-in-depth] internal/scaffold/fullsend-repo/skills/issue-labels/SKILL.md:10 — Removing the hardcoded control-label list from the skill means the LLM agent has no awareness of which labels are forbidden. It will recommend control labels that get refused server-side, wasting tokens and API calls. The server-side guard is the correct primary enforcement point. See also: [Authorization/RBAC] info finding at this location.

Info

  • [scope-alignment] docs/agents/review.md:9 — "may also apply" is ambiguous about whether contextual labeling is automatic or configurable. The internal agent definition is clearer ("invoke the issue-labels skill"); user-facing docs should match.

  • [intent-alignment] docs/superpowers/plans/2026-06-11-review-agent-contextual-labels.md — The docs/superpowers/ directory is used for implementation plans and design specs but this directory structure is not documented in CLAUDE.md, AGENTS.md, or README.md.

Previous run (2)

Review

Findings

Medium

  • [error-handling] internal/scaffold/fullsend-repo/scripts/post-review.sh:217 — The new trap 'rm -f "${LABEL_MODIFIED_RESULT}"' EXIT overwrites the existing trap 'rm -f "${MODIFIED_RESULT}"' EXIT set at line 124 for the protected-path downgrade. In bash, setting a new trap ... EXIT replaces the previous handler — it does not compose. When both the protected-path downgrade AND label_actions processing fire, the MODIFIED_RESULT temp file is never cleaned up.
    Remediation: Combine both cleanups into a single trap or use a cleanup function that tracks all temp files.

  • [GHA-workflow-command-injection] internal/scaffold/fullsend-repo/scripts/post-review.sh:155LA_LABEL is extracted via jq -r which decodes JSON escape sequences (e.g., \n → literal newline). A crafted label name containing embedded newlines could split the echo "::warning::..." output across lines, potentially injecting GHA workflow commands (though the regex check provides partial mitigation and dangerous commands like ::set-env:: are disabled by GitHub). The same pattern affects LA_ACTION in the *) fallthrough case.
    Remediation: Sanitize LA_LABEL and LA_ACTION immediately after jq -r extraction — strip newlines, carriage returns, and :: sequences before any use in echo statements.

Low

  • [incomplete-change] docs/agents/triage.md:72 — The implementation plan (Task 5, Step 4) explicitly calls out updating docs/agents/triage.md to align the example skill description with the generalized language, and the commit step includes it in git add. The actual diff does not modify the file. Either update triage.md or remove it from the plan's commit step.

  • [test-integrity] internal/scaffold/fullsend-repo/scripts/post-review-test.sh:104REVIEW_CONTROL_LABELS array and is_control_label() function are duplicated between the test file and post-review.sh. If a new control label is added to one but not the other, the test will pass while the production code diverges.

  • [edge-case] internal/scaffold/fullsend-repo/scripts/post-review.sh:163EXISTING_LABELS is populated via gh api ... || true. If the API call fails (rate limit, network), the || true swallows the error and label_exists() returns false for every label, causing all add actions to be skipped with a misleading "does not exist" warning. Consistent with triage post-script behavior.

  • [defense-in-depth] internal/scaffold/fullsend-repo/skills/issue-labels/SKILL.md:10 — Removing the hardcoded control-label list changes behavior: agents will now recommend control labels that get silently refused server-side, wasting tokens and adding log noise. The server-side guard is the correct primary control, but the in-prompt list was a useful efficiency layer. See also: [Authorization/RBAC] info finding at this location.

  • [control-label-guard] internal/scaffold/fullsend-repo/scripts/post-review.sh:135REVIEW_CONTROL_LABELS guards only review pipeline labels but not triage pipeline labels. Impact is minimal since triage control labels target issues not PRs, but the design choice to separate lists could be documented.

  • [doc-style] docs/agents/review.md:10 — Uses double-hyphen (--) while triage.md uses em-dash (). Minor typographic inconsistency.

  • [consumer-compatibility] internal/scaffold/fullsend-repo/schemas/review-result.schema.json — External consumers of review-result JSON should be aware that label_actions may now be present. No action needed if all consumers are internal to this repo.

Info

  • [Authorization/RBAC] internal/scaffold/fullsend-repo/skills/issue-labels/SKILL.md — Removing the explicit control-label exclusion list from the skill instructions means the LLM agent has no awareness of which labels are forbidden, relying entirely on server-side filtering. This is a defense-in-depth reduction.

  • [scope-alignment] internal/scaffold/fullsend-repo/agents/review.md:111 — The new "Contextual labels" section says "After producing the review verdict" but doesn't specify when in the orchestrator workflow (which uses parallel sub-agents) the issue-labels skill should be invoked.

  • [scope-alignment] docs/agents/review.md:9 — "may also apply" is ambiguous about whether contextual labeling is automatic or configurable.

  • [intent-alignment] docs/superpowers/plans/2026-06-11-review-agent-contextual-labels.md — The docs/superpowers/plans/ directory is not referenced in CLAUDE.md or documented architecture.

@fullsend-ai-review fullsend-ai-review Bot added the requires-manual-review Review requires human judgment label Jun 11, 2026

@ralphbean ralphbean left a comment

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Pushed 39e19c9 with three fixes:

  • Reverted triage.md example wording to stay issue-specific
  • Added trap for the label temp file
  • Added integration tests for label_actions processing (10 cases)

The test helpers intentionally export variables inside subshells for
isolation. Shellcheck flags these as accidental — disable the warnings.

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Ralph Bean <rbean@redhat.com>
@fullsend-ai-review

fullsend-ai-review Bot commented Jun 11, 2026

Copy link
Copy Markdown

🤖 Finished Review · ✅ Success · Started 9:16 PM UTC · Completed 9:29 PM UTC
Commit: 397115d · View workflow run →

@fullsend-ai-review fullsend-ai-review Bot added requires-manual-review Review requires human judgment and removed requires-manual-review Review requires human judgment labels Jun 11, 2026
@ralphbean

Copy link
Copy Markdown
Member Author

Manual testing against appdumpster/test-repo

Ran fullsend run review three times against appdumpster/test-repo#33 (a deliberately buggy Python web server).

What works

Step Run 1 Run 2 Run 3
Agent produces label_actions type/feature, needs-info enhancement enhancement, bug
Schema validation passes with label_actions
Post-script validates labels (character check, control-label guard, repo existence check)
Label reason appended to review body n/a (fullsend not on PATH)

All three runs chose real labels from the repo's label list. The issue-labels skill correctly ran gh label list, researched conventions via sub-agent, and emitted valid label_actions.

What doesn't work (unrelated to this PR)

fullsend post-review returns 422 when using a PAT — it submits PR reviews via the REST API which requires a GitHub App installation token. The post-script exits at line 255 before reaching the label-application calls at line 306. This is the expected failure mode for local testing with a PAT — it would work in CI with the actual GitHub App token.

The label validation and body-append logic (lines 157-212) runs successfully before that exit. The actual API calls at lines 306-318 are trivial gh api wrappers already covered by the unit tests in post-review-test.sh.

Review comment posted

The review comment on #33 shows the label reason appended at the bottom: **Labels:** PR adds a new feature (web server) and needs more information...

Sanitize LA_LABEL and LA_ACTION after jq -r extraction by stripping
newlines, carriage returns, and GHA workflow command delimiters (::).
This prevents command injection via crafted label names that embed
GHA workflow commands after a JSON-decoded newline.

Replace per-tempfile trap EXIT handlers with a CLEANUP_FILES array
and a single composed trap. Bash traps don't compose — the second
trap was silently replacing the first, leaking MODIFIED_RESULT when
both protected-path downgrade and label_actions processing fired.

Assisted-by: Claude claude-opus-4-6 <noreply@anthropic.com>
Signed-off-by: Ralph Bean <rbean@redhat.com>
@fullsend-ai-review

fullsend-ai-review Bot commented Jun 12, 2026

Copy link
Copy Markdown

🤖 Finished Review · ✅ Success · Started 4:08 PM UTC · Completed 4:24 PM UTC
Commit: 7fdab7e · View workflow run →

LABEL_COUNT=$(jq '.label_actions.actions | length' "${RESULT_FILE}")

echo "Validating ${LABEL_COUNT} label action(s)..."

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[low] edge-case

EXISTING_LABELS is populated via gh api ... || true. If the API call fails (rate limit, network error), label_exists() returns false for every label, causing all add actions to be silently skipped with a misleading does not exist warning. There is no indication that the label-list fetch itself failed.


# ---------------------------------------------------------------------------
# Control-label guard tests
# ---------------------------------------------------------------------------

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[low] test-integrity

REVIEW_CONTROL_LABELS array and is_control_label() function are duplicated between the test file and post-review.sh. If a new control label is added to one but not the other, the unit tests will pass while the production code diverges. The integration tests (which run the real script) partially mitigate this.

Recommend contextual labels for the issue being triaged. These are labels that
describe the issue's domain, area, priority, or other team-specific dimensions
-- NOT control labels used by the triage pipeline.
Recommend contextual labels for the issue or pull request being processed.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[low] defense-in-depth

Removing the hardcoded control-label list from the skill means the LLM agent has no awareness of which labels are forbidden. It will recommend control labels that get refused server-side, wasting tokens and API calls. The server-side guard is the correct primary enforcement point.

@fullsend-ai-review fullsend-ai-review Bot removed the requires-manual-review Review requires human judgment label Jun 12, 2026
@fullsend-ai-review fullsend-ai-review Bot added the ready-for-merge All reviewers approved — ready to merge label Jun 12, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready-for-merge All reviewers approved — ready to merge

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Review agent: apply contextual labels to PRs using the issue-labels skill

1 participant