Skip to content

improvement(polling): fix correctness and efficiency across all polling handlers#4067

Open
waleedlatif1 wants to merge 7 commits intostagingfrom
fix/poll
Open

improvement(polling): fix correctness and efficiency across all polling handlers#4067
waleedlatif1 wants to merge 7 commits intostagingfrom
fix/poll

Conversation

@waleedlatif1
Copy link
Copy Markdown
Collaborator

Summary

  • Gmail: paginate History API with `do...while` loop, add `historyTypes=messageAdded` filter, throw on 403/429 instead of silently falling back, fetch fresh `historyId` from profile API when fallback search returns 0 results (breaks 404 retry loop)
  • Outlook: follow `@odata.nextLink` pagination so inboxes >25 new emails aren't silently truncated, use `fetchWithRetry` for all Graph API calls (handles 429 + Retry-After), align `$top` with hard cap, skip folder filter on partial well-known folder resolution failure instead of producing wrong results, remove `Content-Type` from GET requests (RFC 9110)
  • RSS: add conditional GET (`ETag`/`If-None-Match`, `Last-Modified`/`If-Modified-Since`) to eliminate redundant fetches, handle 304 per RFC 9111, raise GUID dedup cap 100→500, align GUID tracking fallback key with idempotency key for guid-less items
  • IMAP: reuse single `ImapFlow` connection across fetch and mark-as-read (was two connections per poll), track `UIDVALIDITY` per mailbox to detect mailbox recreation, advance UID inside fetch loop per received message (not before — prevents silent data loss on connection drops), fix `messageFlagsAdd` range type, remove cross-mailbox legacy UID fallback
  • Route + background task: dispatch polling via trigger.dev task with per-provider `concurrencyKey`; fall back to synchronous Redis-locked polling for self-hosted deployments

Type of Change

  • Bug fix
  • Improvement

Testing

Tested manually

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

…ng handlers

- Gmail: paginate history API, add historyTypes filter, differentiate 403/429,
  fetch fresh historyId on fallback to break 404 retry loop
- Outlook: follow @odata.nextLink pagination, use fetchWithRetry for all Graph
  calls, fix $top alignment, skip folder filter on partial resolution failure,
  remove Content-Type from GET requests
- RSS: add conditional GET (ETag/If-None-Match), raise GUID cap to 500, fix 304
  ETag capture per RFC 9111, align GUID tracking with idempotency fallback key
- IMAP: single connection reuse, UIDVALIDITY tracking per mailbox, advance UID
  only on successful fetch, fix messageFlagsAdd range type, remove cross-mailbox
  legacy UID fallback
- Dispatch polling via trigger.dev task with per-provider concurrency key;
  fall back to synchronous Redis-locked polling for self-hosted
@vercel
Copy link
Copy Markdown

vercel bot commented Apr 9, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
docs Skipped Skipped Apr 9, 2026 6:58am

Request Review

@cursor
Copy link
Copy Markdown

cursor bot commented Apr 9, 2026

PR Summary

Medium Risk
Touches production polling flows and external API interactions (pagination, state tracking, background dispatch), which could affect webhook delivery semantics if edge cases are missed, but changes are localized to polling paths with explicit fallbacks and caps.

Overview
Improves webhook polling correctness/resilience across providers and optionally offloads polling execution to Trigger.dev.

The /api/webhooks/poll/[provider] route now attempts to dispatch polling via a new Trigger.dev task (providerPolling) with per-provider concurrency, and falls back to the prior Redis-lock + synchronous pollProvider path if dispatch fails or the flag is disabled.

Polling handlers were tightened to reduce missed/duplicate work and handle large inboxes and cache semantics: Gmail now paginates the History API and treats 403/429 as retryable failures (with a profile-based historyId refresh on invalid history); IMAP reuses a single connection, tracks UIDVALIDITY per mailbox, and advances UIDs only after successfully fetched messages; Outlook follows @odata.nextLink pagination up to a hard cap and uses fetchWithRetry for Graph API calls; RSS adds conditional GET support (ETag/Last-Modified), increases GUID dedupe retention, and aligns GUID fallback generation/validation.

Reviewed by Cursor Bugbot for commit 18f5323. Configure here.

Copy link
Copy Markdown

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Autofix Details

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: RSS GUID fallback diverges between tracking and idempotency
    • Added the same truthiness guard (item.title && item.pubDate) to the idempotency key computation at line 295-298 to match the tracking and filtering logic.

Create PR

Or push these changes by commenting:

@cursor push 79c99ddceb
Preview (79c99ddceb)
diff --git a/apps/sim/lib/webhooks/polling/rss.ts b/apps/sim/lib/webhooks/polling/rss.ts
--- a/apps/sim/lib/webhooks/polling/rss.ts
+++ b/apps/sim/lib/webhooks/polling/rss.ts
@@ -292,7 +292,10 @@
 
   for (const item of items) {
     try {
-      const itemGuid = item.guid || item.link || `${item.title}-${item.pubDate}`
+      const itemGuid =
+        item.guid ||
+        item.link ||
+        (item.title && item.pubDate ? `${item.title}-${item.pubDate}` : '')
 
       await pollingIdempotency.executeWithIdempotency(
         'rss',

This Bugbot Autofix run was free. To enable autofix for future PRs, go to the Cursor dashboard.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Apr 9, 2026

Greptile Summary

This PR comprehensively fixes correctness and efficiency issues across all four polling handlers (Gmail, Outlook, IMAP, RSS) and modernises the dispatch layer by routing polling jobs through trigger.dev when available, with a Redis-locked synchronous fallback for self-hosted deployments.

Key improvements include: Gmail History API pagination with do…while and proper 403/429 escalation; Outlook @odata.nextLink pagination and fetchWithRetry; RSS conditional GET (ETag/If-None-Match) with 304 fast-path; and a major IMAP refactor that reuses a single ImapFlow connection, tracks UIDVALIDITY per mailbox, and fixes the messageFlagsAdd range type.

Confidence Score: 5/5

Safe to merge — all remaining findings are P2 style/architecture suggestions that do not affect runtime correctness.

The bug fixes (IMAP connection reuse, messageFlagsAdd range type, Gmail 403/429 escalation, Outlook pagination, RSS 304 handling) are all correct. The only flagged issue is a cross-domain import of fetchWithRetry from the knowledge module, which is a structural concern but does not cause any runtime defect.

apps/sim/lib/webhooks/polling/outlook.ts — cross-domain fetchWithRetry import worth addressing in a follow-up.

Vulnerabilities

No security concerns identified. IMAP host validation via validateDatabaseHost is retained; RSS feed URLs continue to go through validateUrlWithDNS + secureFetchWithPinnedIP; OAuth tokens are handled via the existing resolveOAuthCredential path; and no new secret-handling or injection surfaces are introduced.

Important Files Changed

Filename Overview
apps/sim/lib/webhooks/polling/outlook.ts Adds @odata.nextLink pagination, fetchWithRetry (imported from knowledge domain — cross-domain coupling), removes Content-Type from GETs, and skips folder filter on partial well-known folder resolution failure.
apps/sim/lib/webhooks/polling/imap.ts Major refactor: single ImapFlow connection shared between fetch and mark-as-read, UIDVALIDITY tracking per mailbox, UID advance inside fetch loop, fixed messageFlagsAdd range type from object to number.
apps/sim/lib/webhooks/polling/gmail.ts History API now uses do…while pagination with historyTypes=messageAdded filter, throws on 403/429, and fetches a fresh historyId from the profile API on zero-result fallback to break the 404 retry loop.
apps/sim/lib/webhooks/polling/rss.ts Adds conditional GET (ETag/If-None-Match, Last-Modified/If-Modified-Since), handles 304 correctly, raises GUID dedup cap from 100 to 500, and aligns fallback GUID key with idempotency key.
apps/sim/app/api/webhooks/poll/[provider]/route.ts Dispatches polling to trigger.dev when enabled (with per-provider concurrencyKey), gracefully falls back to Redis-locked synchronous polling when trigger.dev fails or is absent.
apps/sim/background/provider-polling.ts New trigger.dev task wrapping pollProvider; 5-minute maxDuration, single retry, concurrencyLimit=1 per provider via concurrencyKey.
apps/sim/app/api/schedules/execute/route.ts Trivial cleanup: converts dynamic import of getWorkflowById to a static top-level import.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Cron triggers GET /poll/provider] --> B{verifyCronAuth}
    B -- fail --> Z[401 Unauthorized]
    B -- pass --> C{isTriggerDevEnabled?}
    C -- yes --> D[providerPolling.trigger with per-provider concurrency]
    D -- success --> E[200 dispatched - async run begins]
    D -- error --> F[Fall through to sync path]
    C -- no --> F
    F --> G[acquireLock provider-polling-lock]
    G -- not acquired --> H[202 skipped - poll already running]
    G -- acquired --> I[pollProvider synchronously]
    I --> J[releaseLock]
    J --> K[200 completed]
    D -.-> L[Trigger.dev task runs pollProvider]
    L --> M[Gmail / Outlook / IMAP / RSS handler]
Loading

Reviews (6): Last reviewed commit: "fix(imap): preserve fresh UID after UIDV..." | Re-trigger Greptile

@waleedlatif1
Copy link
Copy Markdown
Collaborator Author

@greptile

@waleedlatif1
Copy link
Copy Markdown
Collaborator Author

@cursor review

Copy link
Copy Markdown

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

Reviewed by Cursor Bugbot for commit 7c7c95d. Configure here.

@waleedlatif1
Copy link
Copy Markdown
Collaborator Author

@greptile

@waleedlatif1
Copy link
Copy Markdown
Collaborator Author

@cursor review

@waleedlatif1
Copy link
Copy Markdown
Collaborator Author

@greptile

@waleedlatif1
Copy link
Copy Markdown
Collaborator Author

@cursor review

@waleedlatif1
Copy link
Copy Markdown
Collaborator Author

@greptile

@waleedlatif1
Copy link
Copy Markdown
Collaborator Author

@cursor review

@waleedlatif1
Copy link
Copy Markdown
Collaborator Author

@greptile

@waleedlatif1
Copy link
Copy Markdown
Collaborator Author

@cursor review

Copy link
Copy Markdown

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 18f5323. Configure here.

nextUrl =
allEmails.length < maxEmails ? (data['@odata.nextLink'] as string | undefined) : undefined

if (pageEmails.length === 0) break
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Outlook pagination never fetches beyond first page

Medium Severity

The $top parameter is set to maxEmails, so the first Graph API page already returns up to maxEmails items. The while loop condition allEmails.length < maxEmails then immediately evaluates false after the first page fills up, making the @odata.nextLink pagination dead code in that scenario. When Graph returns fewer items per page than $top (which is common), pagination works correctly. But when maxEmailsPerPoll is at or below Graph's default page size (typically ~10–50), the loop terminates after one page even if there are more matching emails, silently truncating results — the exact behavior the PR aims to fix.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 18f5323. Configure here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant