Make NUT OpenSSL handshake more reliable#3344
Open
jimklimov wants to merge 2 commits intonetworkupstools:masterfrom
Open
Make NUT OpenSSL handshake more reliable#3344jimklimov wants to merge 2 commits intonetworkupstools:masterfrom
jimklimov wants to merge 2 commits intonetworkupstools:masterfrom
Conversation
|
❌ Build nut 2.8.4.4308-master failed (commit b4104c9153 by @jimklimov) |
…ng sockets [networkupstools#3331] The TLS handshake functions `SSL_accept()` (server) and `SSL_connect()` (client) require a retry loop when the underlying socket is in non-blocking mode. When either function returns '-1' with an error code of `SSL_ERROR_WANT_READ` or `SSL_ERROR_WANT_WRITE`, it is signalling a non-fatal "not done yet" condition: the handshake needs more I/O turns to complete. The correct response is to wait for the socket fd to become ready in the indicated direction and then call the same function again with the same SSL object (this is explicitly documented in OpenSSL for every version since 0.9.x and the API contract has never changed). Previously both call sites used a single-shot switch/case pattern that treated these non-fatal WANT_READ/WANT_WRITE returns as fatal errors, tearing down the connection immediately. On Linux the loopback socket is fast enough that the handshake *nearly always* completes in the first call, masking the bug. On BSD (FreeBSD, OpenBSD, NetBSD), macOS, and illumos-based systems (OmniOS, OpenIndiana) the loopback socket behaves differently and WANT_READ/WANT_WRITE are returned regularly, causing CI failures on all of those platforms regardless of compiler, C standard dialect, or OpenSSL version used. The fix is identical for all supported OpenSSL versions (0.9.x / 1.0.x / 1.1.x / 3.x): the `SSL_ERROR_WANT_*` codes and the required retry semantics have been stable since the library's inception. No NSS changes are needed: NSS uses `SSL_ForceHandshake()` which is already blocking-by-design through NSPR. Fixes issues (re-)discovered and/or confirmed by work on PR networkupstools#3330 for issue networkupstools#1711 Tested-on: Linux (passes), macOS 12, FreeBSD 12, NetBSD 11, OpenBSD 6.5, OmniOS (amd64+i386), OpenIndiana (amd64+i386) - all compilers and C standard dialects in the CI matrix. Co-authored-by: Claude Sonnet 4.6 Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>
…s, work on actual code may be needed to rectify it after all [networkupstools#3331, networkupstools#1711]" This reverts commit af5da88 due to added support for WANT_READ/WANT_WRITE in OpenSSL handshake. Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>
|
✅ Build nut 2.8.4.4310-master completed (commit 2df679a6ef by @jimklimov) |
|
CI was failing (shellcheck). Attempt #1 to fix. Skipping: External CI (ci.networkupstools.org) requires external access to diagnose. Please check the shellcheck errors directly in the CI logs. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Following up from findings in PR #3330 to address issue #1711, this PR would hopefully make the handshake more reliable on all systems supported by NUT (CI farm at least). Apparently Linux mostly handles the SSL handshake quickly enough to succeed by the first time we ask for results, and other platforms may need a few more cycles. Tests with OpenSSL in that PR failed so frequently, they had to be neutered by default.
Prepared with AI (Claude Sonnet 4.6) as a test to see how it fares... quite well, despite not seeing failure logs from CI server and mostly relying on test case descriptors which convey compilers, OSes, etc. Had troubles preparing a diff-patch file though.
Findings seem to match what @clepple proposed but could not commit to coding the fix quickly - review welcome. I think this addresses a similar problem to that explored in https://github.com/networkupstools/nut/compare/ssl_accept_nbio_v285?expand=1 (variant of an older branch, rebased over recent NUT master code to pre-resolve merge conflicts), but at least that branch did not fully solve the issues the NIT test reproduced all too well across the board. I'd love a comment from @clepple on whether that fix may bring some other benefits (e.g. by setting socket options), or may be abandoned if this PR works?