feat(notifications): preview-deploy screenshot pipeline (provider-agnostic) #241
isadeks wants to merge 24 commits into
… yet) Lambda + AgentCore Browser plumbing for capturing screenshots of preview deployments. Provider-agnostic — listens for GitHub deployment_status events from any source (Vercel, Amplify Hosting, Netlify, GitHub Actions custom CD). This commit lands the handler / construct code only. Stack wiring follows in the next commit.
- New `GitHubScreenshotIntegration` construct (mirrors `LinearIntegration`): bundles the screenshot bucket, dedup table, signing-secret placeholder, receiver Lambda, processor Lambda, and the API Gateway route. cdk-nag suppressions added inline (HMAC auth instead of Cognito; AgentCore Browser sessions have no per-resource ARN; Secrets Manager rotation is owned by GitHub).
- Wired into `agent.ts` after the LinearIntegration block. Reuses the existing `githubTokenSecret` (the processor uses ABCA's main GitHub token to look up which PR a deploy SHA belongs to and post the screenshot comment, so no new credential is needed).
- Three new stack outputs: `GitHubWebhookUrl`, `GitHubWebhookSecretArn`, `ScreenshotBucketName`.
- Bumped agent.test.ts table count from 13 to 14 to account for the new dedup table.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ot bucket
cdk-nag's S2 fires on any bucket that has `blockPublicPolicy: false` even when the policy is intentionally permissive. Add the suppression with the same rationale as S1/S5: public reads are required by GitHub Markdown renderers and Linear `imageUploadFromUrl`, and the read grant is prefix-scoped to `screenshots/*`.
Caught when the first deploy attempt aborted at synth-time on the new GitHubScreenshotIntegration construct.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The first deploy attempt failed at CFN-execute time on the bucket policy:

s3:PutBucketPolicy ... because public policies are prevented by the BlockPublicPolicy setting in S3 Block Public Access.

Account-level Block Public Access is on for this AWS account, which overrides per-bucket BPA settings. Disabling it would change the security posture of the whole account, so route around the constraint with the AWS-recommended pattern: private S3 + CloudFront with Origin Access Control.

Changes:
- `ScreenshotBucket` is now `BLOCK_ALL` BPA, no public bucket policy. Adds a `cloudfront.Distribution` whose origin is the bucket via `S3BucketOrigin.withOriginAccessControl`. The distribution policy is scoped to the CloudFront service principal only, so account-level BPA accepts it.
- Processor reads `SCREENSHOT_PUBLIC_HOST` (the CloudFront domain) instead of building an S3 URL. PR comments now embed `https://<dist>.cloudfront.net/screenshots/...` URLs.
- New stack output `ScreenshotCloudFrontDomain`.
- Bucket-level S2/S5 suppressions removed (no longer applicable because the bucket is private). Distribution gets CFR1/CFR2/CFR3/CFR4/CFR7 suppressions with rationales.

Heads up on deploy time: CloudFront distributions take 5-15 min to provision on first create.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The CommonRuleSet was 403'ing GitHub deployment_status webhooks before the request reached our Lambda — the deployment payload contains absolute Vercel preview URLs in the body, which trips GenericRFI_BODY. Mirror the Linear webhook exemption: the GitHub webhook path is HMAC-verified in the Lambda, parsed as strict JSON, never interpolated into SQL/HTML, and rate-limited by the priority-3 rule. CRS still applies to every other route. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…loyment
GitHub's `deployment_status` webhook puts the deployed URL on the
*status* object, not the deployment itself. The deployment object is
immutable per (sha, environment); the status changes through the
deploy lifecycle (`pending` → `success`) and carries the URL only
once the deploy finishes.
Symptom: receiver kept short-circuiting `success` events from Vercel
with `{ok: true, skipped_no_url: true}` because we read the wrong
field. Verified by inspecting the webhook delivery payload via
`gh api .../deliveries/<id> --jq .request.payload.deployment_status` —
URL was there all along.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
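The field mix-up described in the commit above can be sketched as a small pure function. This is a minimal sketch, not the receiver's actual code: `decideScreenshot`, the `Decision` type, and the skip reasons are hypothetical names, and it assumes the preview URL is read from `deployment_status.environment_url` with the `Preview` environment as the default filter.

```typescript
// Minimal slice of GitHub's deployment_status webhook payload (only the fields used here).
interface DeploymentStatusEvent {
  deployment: { sha: string; environment: string };
  deployment_status: {
    state: string;
    environment?: string;
    environment_url?: string;
  };
}

// Hypothetical result type for this sketch.
type Decision =
  | { action: "capture"; sha: string; url: string }
  | { action: "skip"; reason: "state" | "environment" | "no_url" };

// Hypothetical helper: decides whether an event should trigger a screenshot.
export function decideScreenshot(
  event: DeploymentStatusEvent,
  targetEnvironment: string = "Preview",
): Decision {
  const status = event.deployment_status;
  if (status.state !== "success") return { action: "skip", reason: "state" };

  const environment = status.environment ?? event.deployment.environment;
  if (environment !== targetEnvironment) return { action: "skip", reason: "environment" };

  // The URL lives on the status object, not on the deployment object.
  const url = status.environment_url;
  if (!url) return { action: "skip", reason: "no_url" };

  return { action: "capture", sha: event.deployment.sha, url };
}
```

Reading the URL from the deployment object instead would always hit the `no_url` branch, which matches the `skipped_no_url` symptom in the commit message.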
…dshake
Node 24's global WebSocket (from undici) does NOT support arbitrary
HTTP headers on the upgrade request — passing them as the second arg
gets silently ignored. AgentCore Browser's WSS handshake requires
SigV4-signed Authorization + X-Amz-* headers, so the connection was
opening but then getting rejected, which surfaced as an empty
`error` event ("AgentCore Browser WebSocket error: ").
Switch to the `ws` package which natively supports `options.headers`.
Also add an `unexpected-response` handler so HTTP-level handshake
failures (403, 400) surface with status codes instead of empty errors.
Smoke verified locally — the ws-based path opens cleanly against
example.com and Vercel preview URLs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lambda runtime returned a 403 on the WSS upgrade despite well-formed SigV4 headers — `ws` rewrites the Host header during the upgrade GET, which invalidates the canonical-request signature we computed against the original Host. This works locally because Node's tooling on macOS keeps the original Host through the handshake, but the Lambda runtime's TLS stack normalizes differently. Switch to query-parameter SigV4 (presigned URL): SignatureV4.presign returns a wss://...?X-Amz-Algorithm=...&X-Amz-Signature=... URL where the auth lives in the URL itself, so any Host-header rewriting downstream doesn't break the signature. Smoke verified locally — presigned URL connects cleanly to AgentCore Browser and the screenshot pipeline runs end-to-end (6.3s, valid PNG, captures example.com correctly). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The minimal IAM I shipped earlier (`StartBrowserSession`, `StopBrowserSession`, `GetBrowserSession`, `UpdateBrowserStream`) wasn't enough — the WSS automation-stream connect requires an additional `ConnectBrowserAutomationStream`-flavored action that isn't in the public CLI command list. Lambda invocations were opening sessions cleanly but 403'ing on the WSS upgrade. Widen to `bedrock-agentcore:*` to unblock the e2e flow. Followup: scope back down to the specific connect action once it's documented or surfaced via CloudTrail decoded-message-on-deny. Smoke verified: PR #1 on isadeks/vercel-abca-linear now receives a screenshot comment within ~7s of the deployment_status webhook. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Extends the screenshot processor to find a Linear issue via the PR's title/body and post the same image comment there.

Approach (no GSI write-back needed):
- Regex-extract Linear identifier (e.g. `ABCA-42`) from PR title/body. These are present whether the agent put them there (`task_description` carries the identifier) or Linear's own GitHub integration auto-injected the back-reference on PR open.
- Scan `LinearWorkspaceRegistryTable` for `status=active` workspaces. Per-workspace, query Linear's `issueVcsBranchSearch` (which accepts the human-readable identifier) and accept the first exact-match hit.
- Post the markdown image comment via the existing `postIssueComment` helper from Phase 2.0b.

The Linear post is best-effort: if the registry table isn't wired, the identifier doesn't extract, or the lookup misses, the GitHub PR comment still lands. New env var `LINEAR_WORKSPACE_REGISTRY_TABLE_NAME` is optional on the processor; the construct only sets it when the prop is provided.

CDK: `GitHubScreenshotIntegrationProps` gains an optional `linearWorkspaceRegistryTable`. When provided, the processor's IAM grows: ReadData on the registry, GetSecretValue+PutSecretValue on `bgagent-linear-oauth-*`. `agent.ts` wires `linearIntegration.workspaceRegistryTable` into the screenshot construct.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
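The identifier-extraction step might look roughly like the following. This is a sketch under assumptions: the exact pattern used by the processor is not shown in the commit, so `LINEAR_ID` below (an uppercase team key, a hyphen, and an issue number) is a hypothetical stand-in. It deliberately uses a non-global regex so that no `lastIndex` state carries over between calls.

```typescript
// Hypothetical pattern: uppercase team key (2+ chars), hyphen, issue number, e.g. ABCA-42.
// No `g` flag, so the regex carries no lastIndex state between calls.
const LINEAR_ID = /\b([A-Z][A-Z0-9]+-\d+)\b/;

// Returns the first identifier found in the PR title, then the body, else null.
export function extractLinearIdentifier(title: string, body: string | null): string | null {
  for (const text of [title, body ?? ""]) {
    const match = LINEAR_ID.exec(text);
    if (match) return match[1] ?? null;
  }
  return null;
}
```

Checking the title before the body is a design choice of this sketch; the commit only says the identifier is extracted from the PR title/body.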
Some providers (Vercel, Netlify) post deployment_status faster than the agent can run `gh pr create`. Retry the GitHub PR-lookup with backoff so the screenshot finds the open PR rather than dropping the event when the timing is reversed.
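A retry-with-backoff wrapper for the PR lookup could be sketched as below. The names (`retryUntilFound`, `RetryOptions`), the attempt count, and the exponential schedule are assumptions for illustration; the commit does not state the actual backoff parameters.

```typescript
// Hypothetical options for this sketch; the real processor's values are not given in the commit.
interface RetryOptions {
  attempts?: number;
  baseDelayMs?: number;
  sleep?: (ms: number) => Promise<void>;
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Calls `lookup` until it returns a non-null value or the attempts run out.
export async function retryUntilFound<T>(
  lookup: () => Promise<T | null>,
  options: RetryOptions = {},
): Promise<T | null> {
  const { attempts = 5, baseDelayMs = 1000, sleep = defaultSleep } = options;
  for (let attempt = 0; attempt < attempts; attempt++) {
    const result = await lookup();
    if (result !== null) return result;
    // Exponential backoff between attempts (base, 2x base, 4x base, ...); no wait after the last one.
    if (attempt < attempts - 1) await sleep(baseDelayMs * 2 ** attempt);
  }
  return null;
}
```

Injecting `sleep` keeps the function testable without real timers; returning `null` after the final attempt lets the caller treat "no open PR yet" as a normal outcome rather than an error.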
…issue comment spam

Move the trigger-label check ahead of every user-facing comment path in the Linear webhook processor, and switch the default trigger label from 'bgagent' to 'abca'. An unlabeled issue is now a true no-op: no comment, no reaction, no createTaskCore, no DDB writes, regardless of whether the project is onboarded.

Why: workspace webhooks fire workspace-wide. A single un-onboarded team in the same Linear workspace produced 47 identical "❌ project isn't onboarded" comments on GRO-783 in 5 minutes because every Issue event (create/update/label-change) hit the not-onboarded gate before the label gate. With the gate order flipped, only issues that explicitly opt in via the trigger label can ever generate user-facing feedback.

Per-project label_filter override is still respected: the project mapping lookup now happens once, before the label gate, instead of after.

Tests: two new regression tests pin the spam scenario (unlabeled issue in a non-onboarded project, and unlabeled issue with no projectId) to zero side effects. Full CDK suite (89 suites / 1572 tests) passes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
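The gate ordering described in this commit can be illustrated with a pure routing function. Everything here is a hypothetical simplification: `routeIssueEvent`, the `IssueEvent` and `ProjectMapping` shapes, and the `Outcome` values are invented for the sketch and do not mirror the processor's real types.

```typescript
// Hypothetical, simplified shapes for illustration only.
interface IssueEvent {
  labels: string[];
  projectId?: string;
}
interface ProjectMapping {
  labelFilter?: string;
}
type Outcome = "noop" | "comment_not_onboarded" | "create_task";

const DEFAULT_LABEL_FILTER = "abca";

export function routeIssueEvent(
  event: IssueEvent,
  mappings: Map<string, ProjectMapping>,
): Outcome {
  // Look up the mapping once, first, so a per-project label_filter override still applies.
  const mapping = event.projectId ? mappings.get(event.projectId) : undefined;
  const label = mapping?.labelFilter ?? DEFAULT_LABEL_FILTER;

  // Label gate first: an unlabeled issue produces no side effects at all.
  if (!event.labels.includes(label)) return "noop";

  // Only opted-in issues can reach the user-facing "not onboarded" feedback.
  if (!mapping) return "comment_not_onboarded";
  return "create_task";
}
```

With the two checks in the opposite order, every unlabeled event in a non-onboarded project would return `comment_not_onboarded`, which is the spam scenario the commit describes.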
Adds the operator walkthrough for wiring up the AgentCore-Browser preview-deploy screenshot pipeline.
Mirrors the Linear webhook-info pattern so docs and onboarding don't
have to embed stack-specific URLs or copy-paste aws CLI invocations.
Two subcommands:
- `webhook-info` — read-only. Reads GitHubWebhookUrl + GitHubWebhookSecretArn
from the CFN stack outputs and prints values to paste into a GitHub
repo's webhook config (Settings → Webhooks → Add webhook). Includes
the event-type ('Deployment statuses') and content-type guidance
that operators consistently miss.
- `set-webhook-secret` — interactive PutSecretValue against the stack
output ARN. Replaces the cargo-cult `aws secretsmanager put-secret-
value` operators were copy-pasting from the screenshot setup notes.
Warns before overwriting an existing real secret (heuristic: a CDK-
seeded JSON placeholder starts with `{`; a real GitHub secret won't).
No CDK changes — both stack outputs were already there. Pure CLI add.
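The overwrite-warning heuristic for `set-webhook-secret` might be sketched like this. The function names are hypothetical, and the only rule taken from the commit message is that a CDK-seeded JSON placeholder starts with `{` while a real GitHub secret does not.

```typescript
// Heuristic from the commit message: a CDK-seeded JSON placeholder starts with `{`.
export function looksLikePlaceholder(currentValue: string): boolean {
  return currentValue.startsWith("{");
}

// Hypothetical helper: warn only when a non-empty, non-placeholder value would be replaced.
export function needsOverwriteWarning(currentValue: string | undefined): boolean {
  if (currentValue === undefined || currentValue.length === 0) return false;
  return !looksLikePlaceholder(currentValue);
}
```

Since this is only a heuristic, a real secret that happens to begin with `{` would be overwritten without a warning.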
The pipeline was always provider-agnostic: it listens for GitHub deployment_status events, which Vercel, AWS Amplify, Netlify, and any GitHub-Actions-driven CD pipeline all post. Code comments, inline strings, and the setup guide referenced Vercel as if it were the only supported path; this commit aligns the surfacing with what the code actually does.

Code:
- Linear comment body: "after the Vercel preview deploy finished" → "after the deploy finished" (the GitHub PR comment already said this; just the Linear path was inconsistent)
- Webhook receiver doc-comment + envelope interface comment: drop Vercel-only language; explain that the `environment` filter (`SCREENSHOT_TARGET_ENVIRONMENT` env var) is configurable per-provider, with a table of common values
- Processor PR-race comment: explain that the gap is also seen on Netlify/Amplify, not unique to Vercel
- AgentCore Browser comment: drop Vercel-specific phrasing on "what we don't try to be clever about"
- GitHubScreenshotIntegration construct prop docstring: explain the per-provider env-name conventions

Docs:
- Rename VERCEL_SETUP_GUIDE.md → DEPLOY_PREVIEW_SCREENSHOTS_GUIDE.md
- Lead with a "works with any provider that posts deployment_status" table (Vercel / Amplify / Netlify / GitHub Actions custom CD, with "out-of-the-box?" yes/no per provider)
- Keep Vercel as the worked example since it's what we smoke-tested, but add a "skip Steps 1-2" callout for non-Vercel providers
- New "Configuring for non-Vercel providers" section with the SCREENSHOT_TARGET_ENVIRONMENT override pointer
- Replace 4a/4b's CFN-output spelunking with `bgagent github webhook-info` + `bgagent github set-webhook-secret` (commands shipped in 1c1b618)
- Troubleshooting: mention that 401 "Invalid signature" is the set-webhook-secret-mismatch case
- Sync registration: register as DEPLOY_PREVIEW_SCREENSHOTS_GUIDE in sync-starlight.mjs route map + the explicit mirror call; added to astro.config.mjs sidebar after the PAK runbook

No CDK structural changes: the construct prop, env-var, and code behaviour were already provider-agnostic. Pure surfacing fix.
…eamble
Step 3 (repo onboarding + Linear project mapping) duplicated work
the Prerequisites section already establishes ('Linear OAuth installed
for at least one workspace'). If the user followed the Linear setup
guide, both are done. If they didn't, Step 4's smoke test fails fast
and the troubleshooting routes them back. Net: 30 lines of doc gone,
no information lost.
Renumbered Step 4 → 3 and Step 5 → 4 (and the 4a/b/c → 3a/b/c
sub-steps).
Also dropped the 'demo configuration optimizes for "look, it works"
rather than security posture' framing on the production-hardening
section. The list of followups stands on its own; the framing reads
as condescending toward someone reaching the bottom of the guide.
… state

Public docs that say 'followup' read as commitments to do that work. Reframe gaps as current limitations with neutral language:
- 'Production hardening (followups)' → 'Production hardening considerations'; bullets describe what to think about, not what ABCA promises to ship
- Netlify table row: 'followup to support pattern matching' → '⚠ workable today only by picking one specific PR's environment string; broader pattern matching isn't shipped'
- Vercel auth callout: 'tracked as a followup' → 'currently not implemented'
- Non-Vercel providers table: drop 'followup aws-samples#96 covers prefix routing' reference (issue numbers don't belong in user-facing docs)

Net: same information, no implicit roadmap commitments.
The screenshot pipeline only needs GitHub. Linear-side posting was
phrased as a hard requirement throughout the guide because the demo
flow happens to use Linear, but a non-Linear team gets a perfectly
useful integration: screenshots land on GitHub PRs, the Linear
lookup silently no-ops.
Reframings:
- Lead-in: 'on both the open GitHub PR AND the linked Linear issue'
→ 'on the open GitHub PR. If you also have Linear configured,
the same screenshot is posted to the linked Linear issue as a
bonus.' Plus a note on the gating (LinearWorkspaceRegistryTable
having active rows is what flips the Linear path on).
- 'How it works': step 4 (Linear post) marked optional with the
silent-skip behaviour spelled out
- Architecture comment: 'GitHub PR comment + Linear issue comment'
→ '... (+ Linear issue comment if linked)'
- Prerequisites: Linear OAuth marked optional with rationale
- Smoke test: rewritten as PR-driven by default ('open any PR on
the configured repo'), with Linear-driven path as a follow-on
paragraph ('If you also have Linear configured...')
- Troubleshooting: 'Linear is best-effort' → 'opt-in and best-
effort', explicit note that skipping is normal without Linear
GitHub PR comment now reads 'From [preview link](url)' and Linear comment reads '[Preview link](url)' instead of pasting the bare URL. Cleaner visual when the same comment is posted on both surfaces.
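The two comment formats could be produced by helpers along these lines. This is a sketch: the function names and the image alt text are hypothetical, and only the `From [preview link](url)` and `[Preview link](url)` phrasings and the `https://<dist>.cloudfront.net/screenshots/...` URL shape come from the commit messages.

```typescript
// Builds the public image URL from the CloudFront host (SCREENSHOT_PUBLIC_HOST) and the object key.
export function screenshotUrl(publicHost: string, objectKey: string): string {
  return `https://${publicHost}/${objectKey}`;
}

// GitHub PR comment: image followed by "From [preview link](url)". Alt text is an assumption.
export function githubCommentBody(imageUrl: string, previewUrl: string): string {
  return `![Preview screenshot](${imageUrl})\n\nFrom [preview link](${previewUrl})`;
}

// Linear comment: image followed by "[Preview link](url)". Alt text is an assumption.
export function linearCommentBody(imageUrl: string, previewUrl: string): string {
  return `![Preview screenshot](${imageUrl})\n\n[Preview link](${previewUrl})`;
}
```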
Closes the doc gaps from the screenshot feature followup list:
- USER_GUIDE.md: new 'Preview-deploy screenshots (optional)' subsection under Notifications, points at DEPLOY_PREVIEW_SCREENSHOTS_GUIDE.md.
- COST_MODEL.md: 'Optional: deploy-preview screenshots' table covering AgentCore Browser session, Lambda processor, S3, CloudFront line items (~$0.01 per screenshot, dominated by Browser session time).
- ROADMAP.md: marks the feature shipped under Notification plane with a one-line description of the trigger model and post-deploy latency.

Mirrors regenerated via docs/scripts/sync-starlight.mjs.
The 'Inviting teammates' section was missing the prerequisite that the teammate needs their own ABCA account (Cognito user + configured CLI) before they can redeem a Linear invite code. New flow walks through:

Admin: invite-user (Cognito) → invite-user <slug> (Linear)
Teammate: configure --from-bundle → login → linear link <code>

with cross-references to USER_GUIDE.md's 'Joining an existing deployment' for the Cognito-side details.

Also corrects the stale 'auto-links the person running the wizard' claim: setup now offers an inline picker (opt-in by admin), not an automatic mapping.
Last batch of stale 'Vercel' framing in CLI command output, missed in the original de-Vercel-ize sweep. Provider-agnostic now: webhook-info header reads 'preview-deploy screenshot pipeline', the closing note lists Vercel/Amplify/Netlify/GitHub Actions as example providers, and the smoke-test instruction says 'push to a PR-attached branch' rather than 'trigger a Vercel preview deploy'. No behaviour change; pure copy.
…mutation)
The local build's eslint --fix step rewrote a no-interpolation template literal to single quotes; CI's 'Fail build on mutation' guard caught that the mutation wasn't committed. Apply the fix.
krokoko
left a comment
Review — Preview-deploy screenshot pipeline
Thanks for this — it's a really well-built feature. The receiver/processor topology faithfully mirrors LinearIntegration, the verify module follows linear-verify.ts, the private-S3 + CloudFront-OAC design is the correct answer to account-level Block Public Access, and the SigV4-presigned-WSS handshake is nicely reasoned (the commit trail documenting why you landed there is genuinely helpful). The docs are thorough and the Starlight mirrors are regenerated and committed. A few things I'd like to see addressed before merge, then some non-blocking notes.
Requesting changes on
1. Tests on the new security boundary. POST /v1/github/webhook is authorizationType: NONE, so the in-Lambda HMAC is the entire authentication path — and it currently ships with no unit tests. The repo already has copy-pasteable templates for exactly this shape (slack-verify.test.ts, linear-webhook.test.ts). The load-bearing branches I'd want pinned:
- `verifyGitHubSignature`: the `timingSafeEqual` unequal-length throw guard, the `sha256=` prefix check, and the rotation re-fetch in `verifyGitHubRequest`.
- `github-webhook.ts` receiver: `state`/`environment` filter gates, and the dedup `ConditionalCheckFailedException` + rollback-`DeleteCommand` path.
- `extractLinearIdentifier`: pure function, trivial to test, and the `g`-flag `lastIndex` reset is the kind of thing that breaks silently on back-to-back calls.
The browser/CDP plumbing in agentcore-browser.ts is a fair followup (it's smoke-tested), but the HMAC + receiver routing + regex are ~2–3h with the existing templates. Worth noting the coverage gate added in b277cc6 may reject this as-is.
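For reference, the two signature branches called out above (prefix check and unequal-length guard) can be exercised against a function shaped roughly like this one. It is a minimal sketch of GitHub-style HMAC verification using Node's `crypto` module, not the PR's actual `verifyGitHubSignature`; the parameter order is an assumption, and the secret-rotation re-fetch is left out.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

const SIGNATURE_PREFIX = "sha256=";

// Sketch: verifies the value of GitHub's X-Hub-Signature-256 header against the raw request body.
export function verifyGitHubSignature(
  secret: string,
  rawBody: string,
  signatureHeader: string | undefined,
): boolean {
  // Prefix check: reject missing headers and other algorithms (e.g. "sha1=").
  if (!signatureHeader || !signatureHeader.startsWith(SIGNATURE_PREFIX)) return false;

  const expectedHex = createHmac("sha256", secret).update(rawBody, "utf8").digest("hex");
  const expected = Buffer.from(expectedHex, "utf8");
  const received = Buffer.from(signatureHeader.slice(SIGNATURE_PREFIX.length), "utf8");

  // timingSafeEqual throws on unequal lengths, so reject those before comparing.
  if (received.length !== expected.length) return false;
  return timingSafeEqual(received, expected);
}
```

A unit test can cover each branch with plain strings: a valid signature, a truncated one, a wrong prefix, a wrong secret, and a missing header.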
2. Unrelated default-label change (bgagent → abca). linear-webhook-processor.ts:34 flips DEFAULT_LABEL_FILTER, but cli/src/commands/linear.ts:50, the mapping-table doc, and LINEAR_SETUP_GUIDE.md:89 ("Default trigger label is bgagent") still say bgagent. Any onboarded project whose row lacks an explicit label_filter would silently switch trigger label on deploy. Could we either split this out of the screenshot PR, or align all four sites and add a migration note? (The label-gate reorder itself looks correct and the two new regression tests are great — it's just the default-value change + drift.)
3. Processor failures are invisible. WebhookProcessorFn is invoked InvocationType: 'Event' with no DLQ / onFailure / Errors alarm, and every failure path logs-and-returns. Since the receiver already 200'd, GitHub won't redeliver — so a systemic break (IAM, AgentCore quota, token rotation) stops 100% of screenshots with no signal anywhere. Could we add an SQS DLQ + a CloudWatch alarm on processor Errors?
Non-blocking suggestions
- WAF rationale vs. code. `/v1/github/webhook` is added to the priority-1 scope-down, which excludes only `SizeRestrictions_BODY`, so `GenericRFI_BODY` is still evaluated on the path, which doesn't match the commit message ("trips `GenericRFI_BODY`"). Since the smoke test passed, the real blocker may have been body-size rather than RFI. Worth reconciling the comment/commit with what's exempted (and the troubleshooting doc, which tells operators the path is RFI-exempt).
- Error/login-page screenshots. `Page.navigate` only checks `errorText`; an HTTP 4xx/5xx or auth wall navigates "successfully" and the 404/login PNG gets posted as if it were the app (the guide acknowledges this). Enabling the `Network` domain and skipping the post when the main-document status isn't 2xx would avoid posting confidently-wrong output.
- WebSocket leak on open-timeout (`agentcore-browser.ts:1469-1494`): the open promise rejects without `ws.close()`/`terminate()`, and the throw escapes before the `try`/`finally`, leaving a dangling socket per failed attempt. Wrapping the open in the same finally would tidy this up.
- Unguarded `res.json()` (`github-webhook-processor.ts:854`): a 2xx non-array/HTML body throws, and since `findPullRequestForShaWithRetry` has no try/catch, it crashes the (un-DLQ'd) processor. An `Array.isArray` guard returning null keeps the function's existing contract.
- SSRF note: `environment_url` flows from the payload straight into `Page.navigate`. It's HMAC-gated and runs in the managed AgentCore session (not the Lambda VPC), so blast radius is bounded, but a scheme/host allowlist before navigating would be a cheap hardening follow-up.
- `bedrock-agentcore:*` on `*` is broader than the project precedent (`task-orchestrator.ts` scopes specific actions). Acknowledged in the guide as a known gap; worth a tracked follow-up issue.
- Stale strings: a couple of CfnOutput descriptions still say "Vercel-preview", and the construct doc-comment still calls it a "Public-read screenshot S3 bucket" though it's now private + OAC.
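The scheme/host allowlist suggested in the SSRF note could be as small as the sketch below. The function name and the default suffix list are hypothetical; an operator would substitute the hostnames their own preview provider actually uses.

```typescript
// Hypothetical default allowlist: hostname suffixes of preview providers in use.
const ALLOWED_HOST_SUFFIXES = [".vercel.app", ".netlify.app", ".amplifyapp.com"];

// Returns true only for https URLs whose hostname ends with an allowed suffix.
export function isNavigableUrl(
  rawUrl: string,
  allowedSuffixes: string[] = ALLOWED_HOST_SUFFIXES,
): boolean {
  let url: URL;
  try {
    url = new URL(rawUrl);
  } catch {
    return false; // not a parseable absolute URL
  }
  if (url.protocol !== "https:") return false;
  const host = url.hostname.toLowerCase();
  // Suffixes carry a leading dot, so "evilvercel.app" does not match ".vercel.app".
  return allowedSuffixes.some((suffix) => host.endsWith(suffix));
}
```

Running this check on `environment_url` before the navigate call would reject link-local addresses, plain-http targets, and look-alike hosts without touching the rest of the processor.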
Really nice work overall — the bulk of this is the test coverage on the auth boundary and making failures observable; the rest is polish. Happy to pair on the test scaffolding if useful.
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #241 +/- ##
=======================================
Coverage ? 86.15%
=======================================
Files ? 169
Lines ? 40038
Branches ? 3931
=======================================
Hits ? 34495
Misses ? 5543
Partials ? 0
Closes #240.
Summary
- … `deployment_status: success` events from any provider (Vercel, Amplify Hosting, Netlify, GitHub Actions custom CD); captures the preview URL via AgentCore Browser; posts a markdown image comment on the open PR.
- … `bgagent github` CLI subcommand: `webhook-info` + `set-webhook-secret`.
Architecture
- … `/v1/github/webhook` for `GenericRFI_BODY` (deployment_status payloads embed absolute URLs).
Test plan
- … Vercel-specific code paths, only deployment_status filtering by configurable `SCREENSHOT_TARGET_ENVIRONMENT` (default `Preview`)
Out of scope (followups)
- … `bedrock-agentcore:*` for the processor; tighter action set is followup.