Skip to content

Report failed async connect() through getsockopt(SO_ERROR)#860

Open
steffen-heil-secforge wants to merge 1 commit into
PowerShell:latestw_allfrom
steffen-heil-secforge:fix/win32-connect-so-error-fallback
Open

Report failed async connect() through getsockopt(SO_ERROR)#860
steffen-heil-secforge wants to merge 1 commit into
PowerShell:latestw_allfrom
steffen-heil-secforge:fix/win32-connect-so-error-fallback

Conversation

@steffen-heil-secforge

Copy link
Copy Markdown

Tracking issue: PowerShell/Win32-OpenSSH#2441

Summary

A failed asynchronous connect() is not reported through getsockopt(SO_ERROR) on Windows, so ssh treats a refused connect as a success and stops at the first resolved address instead of trying the rest (e.g. the IPv6 → IPv4 fallback in ssh_connect_direct()). On a dual-stack host whose service answers only on IPv4, ssh ends with:

banner exchange: Connection to UNKNOWN port -1: Connection refused

while the identical invocation falls back to IPv4 and connects on POSIX.

Cause

ssh_connect_direct() (sshconnect.c) → timeout_connect() (misc.c) uses the POSIX idiom: non-blocking connect()poll(POLLOUT)getsockopt(SO_ERROR), where a non-zero SO_ERROR means "this address failed, try the next one".

On Windows the asynchronous ConnectEx() failure is delivered through the overlapped completion and captured by socketio_finish_connect() / socketio_is_io_available() into w32_io.write_details.error / read_details.error; the underlying socket's SO_ERROR is left at 0. socketio_getsockopt() forwarded SO_ERROR straight to the Winsock getsockopt() (→ 0), so timeout_connect() saw optval == 0, reported success, and the address loop never advanced. getpeername() on the unconnected socket then yields UNKNOWN / -1, and the first banner write returns the original WSAECONNREFUSED. The same getsockopt(SO_ERROR) idiom in channels.c (forwarded-channel connects) was affected identically.

Changes

  • contrib/win32/win32compat/socketio.c: in socketio_getsockopt(), for SOL_SOCKET / SO_ERROR return the captured async-connect error (write_details.error / read_details.error, mapped via errno_from_WSAError()) when one is set, instead of the raw Winsock SO_ERROR. Otherwise it falls through to the normal getsockopt().
  • regress/unittests/win32compat/socket_tests.c: regression test (socket_connect_error_tests) — a non-blocking connect() to a closed loopback port must report a non-zero SO_ERROR.

Testing

Built on Windows x64 (Release). End-to-end: ssh to a dual-stack host whose IPv6 endpoint refuses the port now logs connect to address … Connection refused and falls back to the IPv4 address (matching POSIX), instead of failing at the banner exchange with Connection to UNKNOWN port -1. Added the socket_connect_error_tests regression test covering the refused-connect → getsockopt(SO_ERROR) path.

A failed asynchronous ConnectEx() delivers its error via the overlapped
completion, which socketio_finish_connect()/socketio_is_io_available()
capture into the w32_io read/write details. The underlying socket's
SO_ERROR is left at 0, so socketio_getsockopt() - which forwarded SO_ERROR
straight to Winsock - reported success for a connect that actually failed.

POSIX code uses getsockopt(SO_ERROR) after a non-blocking connect() to
detect failure and try the next address. timeout_connect() and
ssh_connect_direct() therefore treated a refused connect as success and
stopped at the first address, never falling back (e.g. IPv6 -> IPv4 on a
dual-stack host). The connection then failed at the banner exchange with
"Connection to UNKNOWN port -1: Connection refused" (getpeername() on the
unconnected socket yields UNKNOWN/-1).

Surface the captured connect error for SO_ERROR so the standard non-blocking
connect idiom and multi-address fallback behave as on POSIX. Add a
win32compat regression test.
Copilot AI review requested due to automatic review settings June 25, 2026 19:04

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a Windows-specific mismatch in how failed asynchronous (overlapped) connect() results are surfaced via getsockopt(SO_ERROR), restoring the expected POSIX behavior relied on by OpenSSH’s non-blocking connect logic (including address fallback when the first resolved address fails).

Changes:

  • Update socketio_getsockopt() to return the captured async-connect failure (from the overlapped completion) when callers query SO_ERROR.
  • Add a Windows unit test to ensure a refused non-blocking connect reports a non-zero SO_ERROR.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
contrib/win32/win32compat/socketio.c Makes SO_ERROR reflect the captured async-connect failure instead of the underlying socket’s (incorrectly zero) value.
regress/unittests/win32compat/socket_tests.c Adds a regression test covering refused non-blocking connect → getsockopt(SO_ERROR) behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +737 to +740
time_val.tv_usec = 0;
retValue = select(connect_fd + 1, &read_set, &write_set, NULL, &time_val);
ASSERT_INT_NE(retValue, -1);
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants