Report failed async connect() through getsockopt(SO_ERROR)#860
Open
steffen-heil-secforge wants to merge 1 commit into
Open
Conversation
A failed asynchronous ConnectEx() delivers its error via the overlapped completion, which socketio_finish_connect()/socketio_is_io_available() capture into the w32_io read/write details. The underlying socket's SO_ERROR is left at 0, so socketio_getsockopt() - which forwarded SO_ERROR straight to Winsock - reported success for a connect that actually failed. POSIX code uses getsockopt(SO_ERROR) after a non-blocking connect() to detect failure and try the next address. timeout_connect() and ssh_connect_direct() therefore treated a refused connect as success and stopped at the first address, never falling back (e.g. IPv6 -> IPv4 on a dual-stack host). The connection then failed at the banner exchange with "Connection to UNKNOWN port -1: Connection refused" (getpeername() on the unconnected socket yields UNKNOWN/-1). Surface the captured connect error for SO_ERROR so the standard non-blocking connect idiom and multi-address fallback behave as on POSIX. Add a win32compat regression test.
There was a problem hiding this comment.
Pull request overview
This PR fixes a Windows-specific mismatch in how failed asynchronous (overlapped) connect() results are surfaced via getsockopt(SO_ERROR), restoring the expected POSIX behavior relied on by OpenSSH’s non-blocking connect logic (including address fallback when the first resolved address fails).
Changes:
- Update
socketio_getsockopt()to return the captured async-connect failure (from the overlapped completion) when callers querySO_ERROR. - Add a Windows unit test to ensure a refused non-blocking connect reports a non-zero
SO_ERROR.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| contrib/win32/win32compat/socketio.c | Makes SO_ERROR reflect the captured async-connect failure instead of the underlying socket’s (incorrectly zero) value. |
| regress/unittests/win32compat/socket_tests.c | Adds a regression test covering refused non-blocking connect → getsockopt(SO_ERROR) behavior. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
+737
to
+740
| time_val.tv_usec = 0; | ||
| retValue = select(connect_fd + 1, &read_set, &write_set, NULL, &time_val); | ||
| ASSERT_INT_NE(retValue, -1); | ||
| } |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Tracking issue: PowerShell/Win32-OpenSSH#2441
Summary
A failed asynchronous
connect()is not reported throughgetsockopt(SO_ERROR)on Windows, sosshtreats a refused connect as a success and stops at the first resolved address instead of trying the rest (e.g. the IPv6 → IPv4 fallback inssh_connect_direct()). On a dual-stack host whose service answers only on IPv4,sshends with:while the identical invocation falls back to IPv4 and connects on POSIX.
Cause
ssh_connect_direct()(sshconnect.c) →timeout_connect()(misc.c) uses the POSIX idiom: non-blockingconnect()→poll(POLLOUT)→getsockopt(SO_ERROR), where a non-zeroSO_ERRORmeans "this address failed, try the next one".On Windows the asynchronous
ConnectEx()failure is delivered through the overlapped completion and captured bysocketio_finish_connect()/socketio_is_io_available()intow32_io.write_details.error/read_details.error; the underlying socket'sSO_ERRORis left at 0.socketio_getsockopt()forwardedSO_ERRORstraight to the Winsockgetsockopt()(→ 0), sotimeout_connect()sawoptval == 0, reported success, and the address loop never advanced.getpeername()on the unconnected socket then yieldsUNKNOWN/-1, and the first banner write returns the originalWSAECONNREFUSED. The samegetsockopt(SO_ERROR)idiom inchannels.c(forwarded-channel connects) was affected identically.Changes
contrib/win32/win32compat/socketio.c: insocketio_getsockopt(), forSOL_SOCKET/SO_ERRORreturn the captured async-connect error (write_details.error/read_details.error, mapped viaerrno_from_WSAError()) when one is set, instead of the raw WinsockSO_ERROR. Otherwise it falls through to the normalgetsockopt().regress/unittests/win32compat/socket_tests.c: regression test (socket_connect_error_tests) — a non-blockingconnect()to a closed loopback port must report a non-zeroSO_ERROR.Testing
Built on Windows x64 (Release). End-to-end:
sshto a dual-stack host whose IPv6 endpoint refuses the port now logsconnect to address … Connection refusedand falls back to the IPv4 address (matching POSIX), instead of failing at the banner exchange withConnection to UNKNOWN port -1. Added thesocket_connect_error_testsregression test covering the refused-connect →getsockopt(SO_ERROR)path.