feat: review agent applies contextual labels via issue-labels skill (#1706)#2196
feat: review agent applies contextual labels via issue-labels skill (#1706)#2196ralphbean wants to merge 10 commits into
Conversation
Generalize the issue-labels skill to work for both triage and review agents, then wire it into the review agent's harness, schema, agent definition, and post-script. Assisted-by: Claude claude-opus-4-6 <noreply@anthropic.com> Signed-off-by: Ralph Bean <rbean@redhat.com>
Six tasks covering skill generalization, schema extension, post-script label processing, harness/agent wiring, and user-facing documentation. Assisted-by: Claude claude-opus-4-6 <noreply@anthropic.com> Signed-off-by: Ralph Bean <rbean@redhat.com>
Remove hardcoded control-label exclusion list (post-scripts enforce this server-side) and reword triage-specific language to be agent-agnostic. Add note to skip issue-type check for PRs. Assisted-by: Claude claude-opus-4-6 <noreply@anthropic.com> Signed-off-by: Ralph Bean <rbean@redhat.com>
Same shape as triage-result.schema.json. The field is optional -- when omitted the post-script skips label processing. Assisted-by: Claude claude-opus-4-6 <noreply@anthropic.com> Signed-off-by: Ralph Bean <rbean@redhat.com>
Validate agent-recommended labels against a control-label guard list, check label existence, append reason to review body, and apply mutations via the GitHub labels API after posting. Mirrors the label_actions processing in post-triage.sh. Assisted-by: Claude claude-opus-4-6 <noreply@anthropic.com> Signed-off-by: Ralph Bean <rbean@redhat.com>
Add issue-labels to the harness skills list and agent definition. Document when and how to invoke the skill during review, and add label_actions to the pipeline mode output docs. Assisted-by: Claude claude-opus-4-6 <noreply@anthropic.com> Signed-off-by: Ralph Bean <rbean@redhat.com>
Add issue-labels skill section to review agent docs, update the built-in skills table, and align triage docs example with the generalized skill language. Assisted-by: Claude claude-opus-4-6 <noreply@anthropic.com> Signed-off-by: Ralph Bean <rbean@redhat.com>
Site previewPreview: https://ccafaa30-site.fullsend-ai.workers.dev Commit: |
|
🤖 Review · Started 8:31 PM UTC |
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
- Revert triage.md example wording to stay issue-specific (triage agent doesn't process PRs) - Add trap for LABEL_MODIFIED_RESULT temp file cleanup in post-review.sh - Add integration tests for label_actions processing in post-review-test.sh (10 cases covering: applied, control-label refused, nonexistent skipped, invalid chars refused, remove, multiple add, all-refused no body append, absent, request-changes) Assisted-by: Claude claude-opus-4-6 <noreply@anthropic.com> Signed-off-by: Ralph Bean <rbean@redhat.com>
|
🤖 Finished Review · ✅ Success · Started 8:48 PM UTC · Completed 9:01 PM UTC |
ReviewFindingsLow
Info
Previous runReviewFindingsMedium
Low
Info
Previous runReviewFindingsMedium
Low
Info
Previous run (2)ReviewFindingsMedium
Low
Info
|
The test helpers intentionally export variables inside subshells for isolation. Shellcheck flags these as accidental — disable the warnings. Assisted-by: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Ralph Bean <rbean@redhat.com>
|
🤖 Finished Review · ✅ Success · Started 9:16 PM UTC · Completed 9:29 PM UTC |
Manual testing against
|
| Step | Run 1 | Run 2 | Run 3 |
|---|---|---|---|
Agent produces label_actions |
type/feature, needs-info |
enhancement |
enhancement, bug |
Schema validation passes with label_actions |
✅ | ✅ | ✅ |
| Post-script validates labels (character check, control-label guard, repo existence check) | ✅ | ✅ | ✅ |
| Label reason appended to review body | ✅ | n/a (fullsend not on PATH) | ✅ |
All three runs chose real labels from the repo's label list. The issue-labels skill correctly ran gh label list, researched conventions via sub-agent, and emitted valid label_actions.
What doesn't work (unrelated to this PR)
fullsend post-review returns 422 when using a PAT — it submits PR reviews via the REST API which requires a GitHub App installation token. The post-script exits at line 255 before reaching the label-application calls at line 306. This is the expected failure mode for local testing with a PAT — it would work in CI with the actual GitHub App token.
The label validation and body-append logic (lines 157-212) runs successfully before that exit. The actual API calls at lines 306-318 are trivial gh api wrappers already covered by the unit tests in post-review-test.sh.
Review comment posted
The review comment on #33 shows the label reason appended at the bottom: **Labels:** PR adds a new feature (web server) and needs more information...
Sanitize LA_LABEL and LA_ACTION after jq -r extraction by stripping newlines, carriage returns, and GHA workflow command delimiters (::). This prevents command injection via crafted label names that embed GHA workflow commands after a JSON-decoded newline. Replace per-tempfile trap EXIT handlers with a CLEANUP_FILES array and a single composed trap. Bash traps don't compose — the second trap was silently replacing the first, leaking MODIFIED_RESULT when both protected-path downgrade and label_actions processing fired. Assisted-by: Claude claude-opus-4-6 <noreply@anthropic.com> Signed-off-by: Ralph Bean <rbean@redhat.com>
|
🤖 Finished Review · ✅ Success · Started 4:08 PM UTC · Completed 4:24 PM UTC |
| LABEL_COUNT=$(jq '.label_actions.actions | length' "${RESULT_FILE}") | ||
|
|
||
| echo "Validating ${LABEL_COUNT} label action(s)..." | ||
|
|
There was a problem hiding this comment.
[low] edge-case
EXISTING_LABELS is populated via gh api ... || true. If the API call fails (rate limit, network error), label_exists() returns false for every label, causing all add actions to be silently skipped with a misleading does not exist warning. There is no indication that the label-list fetch itself failed.
|
|
||
| # --------------------------------------------------------------------------- | ||
| # Control-label guard tests | ||
| # --------------------------------------------------------------------------- |
There was a problem hiding this comment.
[low] test-integrity
REVIEW_CONTROL_LABELS array and is_control_label() function are duplicated between the test file and post-review.sh. If a new control label is added to one but not the other, the unit tests will pass while the production code diverges. The integration tests (which run the real script) partially mitigate this.
| Recommend contextual labels for the issue being triaged. These are labels that | ||
| describe the issue's domain, area, priority, or other team-specific dimensions | ||
| -- NOT control labels used by the triage pipeline. | ||
| Recommend contextual labels for the issue or pull request being processed. |
There was a problem hiding this comment.
[low] defense-in-depth
Removing the hardcoded control-label list from the skill means the LLM agent has no awareness of which labels are forbidden. It will recommend control labels that get refused server-side, wasting tokens and API calls. The server-side guard is the correct primary enforcement point.
Summary
issue-labelsskill to work for both issues and PRs (remove hardcoded control-label list, reword triage-specific language)label_actionsfield to the review result schema (same shape as triage)post-review.sh— validates labels against a control-label guard, appends reason to review body, applies via GitHub labels APIissue-labelsinto the review agent harness and definitionCloses #1706
Test plan
make lintpassesfullsend run reviewagainstappdumpster/test-repo#33— agent produceslabel_actionswith valid repo labels, schema validates, post-script validates and appends label reason to review body (3 runs, consistent results — see comment)🤖 Generated with Claude Code