Multi-host: retries never increments — prefer-standby loops forever, all-hosts-down hangs queries #1174

Description

@kalvenschraut

Bug

The local retries counter in src/connection.js is read but never incremented; only options.shared.retries (the backoff counter) is. With two or more hosts, retries therefore stays 0 forever, so:

  1. target_session_attrs: 'prefer-standby' with the standby down: tryNext() (L796) never runs out of hosts, so it terminates every primary connection and retries the dead standby forever.
  2. All hosts down: error() (L382) always sees "another host to try" and swallows every connect error, so queries hang forever, even with connect_timeout set.
  3. Related (#988, "Issue with auto failover with CONNECT_TIMEOUT host"): connectTimedOut() calls errored() directly, bypassing failover, so a CONNECT_TIMEOUT rejects the query instead of trying the next host, whereas ECONNREFUSED does fail over.
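To make case 2 concrete, here is a deliberately simplified model. It is not the postgres.js source: shouldSwallowError and simulateAllHostsDown are hypothetical names, and the model assumes the "another host to try" check compares the local retries counter against the number of hosts. A step cap (maxSteps) stands in for "forever".

```javascript
// Hypothetical model of case 2 ("all hosts down"); NOT the postgres.js source.

// "Is there another host to try?" stays true while a host lies past index `retries`.
function shouldSwallowError(hosts, retries) {
  return retries + 1 < hosts.length
}

// Every host refuses. `incrementLocal` controls whether the local retries
// counter advances after a failed attempt (today it never does).
// `maxSteps` caps the loop so "forever" terminates in the model.
function simulateAllHostsDown(hosts, { incrementLocal, maxSteps = 100 }) {
  let retries = 0       // local counter read by the failover check
  let sharedRetries = 0 // backoff counter, the only one incremented today
  for (let attempt = 1; attempt <= maxSteps; attempt++) {
    sharedRetries++
    if (!shouldSwallowError(hosts, retries))
      return { surfaced: true, attempts: attempt, sharedRetries }
    if (incrementLocal) retries++
  }
  return { surfaced: false, attempts: maxSteps, sharedRetries }
}

const twoDeadHosts = ['localhost:5431', 'localhost:5430']
console.log(simulateAllHostsDown(twoDeadHosts, { incrementLocal: false }))
// surfaced: false; the error is swallowed on all 100 attempts
console.log(simulateAllHostsDown(twoDeadHosts, { incrementLocal: true }))
// surfaced: true after 2 attempts, one per host
```

Only the backoff counter grows in the first run, which is why the backoff keeps working while the failover decision never changes.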

Repro

With a primary on localhost:5432 and nothing listening on 5431 or 5430, both cases hang forever on 3.4.9:

import postgres from 'postgres'

// expected: connects to the primary
await postgres({ host: ['localhost', 'localhost'], port: [5431, 5432],
  target_session_attrs: 'prefer-standby', connect_timeout: 5 })`select 1`

// expected: rejects after both hosts fail
await postgres({ host: ['localhost', 'localhost'], port: [5431, 5430],
  connect_timeout: 5 })`select 1`
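When running these repros in a test suite it can help to turn the hang into a failing assertion. A minimal sketch using only standard Promise APIs (withDeadline is a hypothetical helper, not part of postgres.js): race the query against a timer.

```javascript
// Hypothetical test helper; not part of postgres.js.
// Rejects with a descriptive error if `promise` has not settled within `ms`.
function withDeadline(promise, ms) {
  let timer
  const deadline = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`no result after ${ms} ms`)), ms)
  })
  // Whichever settles first wins; the timer is cleared either way so it
  // does not keep the process alive.
  return Promise.race([promise, deadline]).finally(() => clearTimeout(timer))
}

// A promise that never settles stands in for the hanging query.
withDeadline(new Promise(() => {}), 50).catch(err => console.log(err.message))
// no result after 50 ms
```

Applied to the second case, wrap the query (for example withDeadline(sql`select 1`, 30000), where sql is the postgres(...) instance) and pick a deadline comfortably above connect_timeout times the number of hosts, so that only a genuine hang trips it.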

Proposed Fix

Reset retries at the start of each connection cycle, increment it on every failed attempt, and surface the error once every host has been tried. With prefer-standby, require a standby only on the first pass over the host list and accept any server on a second pass (as libpq does). Route connect timeouts through error() so they fail over like refused connections (fixes #988).
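The proposed control flow can be sketched as a pure function over a fake probe, which makes all three cases easy to unit-test. This is a hypothetical model, not a patch against src/connection.js: connectCycle, probe and the result strings 'refused', 'timeout', 'primary' and 'standby' are illustrative, and real connection attempts are asynchronous.

```javascript
// Hypothetical model of the proposed fix; NOT a patch against src/connection.js.
// probe(host) stands in for one connection attempt and returns
// 'refused', 'timeout', 'primary' or 'standby'.
function connectCycle(hosts, probe, targetSessionAttrs) {
  const preferStandby = targetSessionAttrs === 'prefer-standby'
  // prefer-standby walks the host list twice (like libpq), everything else once.
  const maxAttempts = hosts.length * (preferStandby ? 2 : 1)
  let lastError = null

  // retries is local to this cycle: it starts at 0 and grows by one per failed attempt.
  for (let retries = 0; retries < maxAttempts; retries++) {
    const host = hosts[retries % hosts.length]
    const firstPass = retries < hosts.length
    const result = probe(host)

    if (result === 'refused' || result === 'timeout') {
      // Timeouts take the same failover path as refused connections (#988).
      const code = result === 'timeout' ? 'CONNECT_TIMEOUT' : 'ECONNREFUSED'
      lastError = new Error(`${code} ${host}`)
      continue
    }
    // On the first pass prefer-standby rejects anything that is not a standby.
    if (preferStandby && firstPass && result !== 'standby')
      continue
    return { host, retries }
  }
  // Every host has been tried: surface the last error instead of hanging.
  throw lastError ?? new Error('no hosts configured')
}

// First repro case: standby on 5431 is down, primary on 5432 is up.
const probe = host => (host === 'localhost:5432' ? 'primary' : 'refused')
console.log(connectCycle(['localhost:5431', 'localhost:5432'], probe, 'prefer-standby'))
// { host: 'localhost:5432', retries: 3 }

// Second repro case: both hosts refuse, so the cycle throws.
try {
  connectCycle(['localhost:5431', 'localhost:5430'], () => 'refused')
} catch (err) {
  console.log(err.message) // ECONNREFUSED localhost:5430
}
```

A single cycle-local counter drives both decisions: "is there another host to try?" becomes retries < maxAttempts, and "are we still on the first pass?" becomes retries < hosts.length. Because it is kept separate from the shared backoff counter, neither check can get stuck at 0.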
