Add QuerySpaceNormalizingMiddleware server middleware to handle clients using plus-encoding#194
Add QuerySpaceNormalizingMiddleware server middleware to handle clients using plus-encoding#194rbranche wants to merge 1 commit into
QuerySpaceNormalizingMiddleware server middleware to handle clients using plus-encoding#194Conversation
|
Discussed offline, and the team would like this to be a shared Middleware instead of built into the runtime itself. Working on that now. |
4892c75 to
c5f1d13
Compare
👍 Thanks @rbranche! For others following this thread, this approach is preferred because we don't typically add workarounds for things that aren't expressly part of the spec on the default codepath. This includes codegen preferences for off-piste doc idioms, and external clients and servers that send requests and responses that do not conform to the spec. We have the middleware as an escape hatch for these sorts of compatibility issues, which presents an opt-in approach for only those that need it. For middleware that are generalisable enough to have wide appeal, we've also been happy to add these to the runtime package itself (e.g. the |
simonjbeaumont
left a comment
There was a problem hiding this comment.
Thanks for the rework. This patch is looking in really good shape. A couple of minor notes.
| if let path = request.path, let queryStart = path.firstIndex(of: "?") { | ||
| let queryEnd = path.firstIndex(of: "#") ?? path.endIndex | ||
| let query = path[path.index(after: queryStart)..<queryEnd] | ||
| if query.contains("+") { | ||
| let normalizedQuery = query.replacingOccurrences(of: "+", with: "%20") | ||
| let prefix = path[..<path.index(after: queryStart)] | ||
| let suffix = path[queryEnd...] | ||
| request.path = String(prefix) + normalizedQuery + String(suffix) | ||
| } | ||
| } |
There was a problem hiding this comment.
If a malicious client sends a malformed request with the ? inside the fragment, e.g. /a#b?c+d, I think this would crash in the range operator.
Can we add a test to confirm?
We'll probably need to first find the start of the fragment, if any, to determine where the query starts and ends. Something like:
if let path = request.path {
let fragmentStart = path.firstIndex(of: "#") ?? path.endIndex
let pathAndQuery = path[..<fragmentStart]
if let queryStart = pathAndQuery.firstIndex(of: "?") {
let query = path[path.index(after: queryStart)..<fragmentStart]
if query.contains("+") {
// ... the stuff you did here
}
}
}| /// - Note: Only the query portion of the request path is modified. The path | ||
| /// itself and the request body are not touched. |
There was a problem hiding this comment.
Nit.
| /// - Note: Only the query portion of the request path is modified. The path | |
| /// itself and the request body are not touched. | |
| /// - Note: Only the query of the request is modified. The path, fragment, and | |
| /// body are not touched. |
| let normalizedQuery = query.replacingOccurrences(of: "+", with: "%20") | ||
| let prefix = path[..<path.index(after: queryStart)] | ||
| let suffix = path[queryEnd...] | ||
| request.path = String(prefix) + normalizedQuery + String(suffix) |
There was a problem hiding this comment.
Given this will be on a pretty hot path, we should probably reduce the allocations made by the creation of the intermediate strings by using reserveCapacity and append when building up the final string.
If we really wanted to eek out performance from this, we'd probably also want to drop down to String.UTF8View for the processing but, given this is opt-in, I don't mind us deferring that.
aefce2c to
bd2a2a6
Compare
Add an opt-in ServerMiddleware that rewrites unencoded "+" characters in the request's query string to "%20" before the request reaches the decoder. This lets servers interoperate with clients that encode query parameter spaces using the application/x-www-form-urlencoded convention (as used by Python requests, Java URLEncoder, Scala sttp, and JavaScript URLSearchParams) without changing the default RFC 3986 behavior of the generated server code. Only the query of the request is modified; the path, fragment, and body are untouched. Literal "+" characters that were correctly percent-encoded as "%2B" by the client are unaffected. A malformed input where "?" appears inside the fragment (e.g. "/a#b?c+d") is left untouched rather than crashing. The rewrite builds the new path with reserveCapacity and character-by-character append to avoid the intermediate string allocations of replacingOccurrences and concatenation. Also add an "Additional middlewares" section to the DocC documentation that lists ErrorHandlingMiddleware alongside the new middleware, so built-in middlewares are discoverable from the top-level docs. Move HTTPResponseConvertible into the existing "Errors" section, where it belongs.
bd2a2a6 to
510dba9
Compare
simonjbeaumont
left a comment
There was a problem hiding this comment.
Great addition—thank you for the contribution! 🙏
QuerySpaceNormalizingMiddleware server middleware to handle clients using plus-encoding
Motivation
The OpenAPI specification follows RFC 3986, which requires spaces in query strings to be percent-encoded as
%20. However, many widely-used HTTP client libraries — including Pythonrequests, JavaURLEncoder, Scalasttp, and JavaScriptURLSearchParams— encode query parameter spaces as+, following theapplication/x-www-form-urlencodedconvention. Without any handling for this, query parameters sent from such clients arrive at generated server handlers with literal+characters instead of spaces.Per maintainer feedback, rather than loosening the library's default decoder (which would impose the cost on every user), this behavior should be provided as opt-in middleware — following the same pattern as
ErrorHandlingMiddleware. Servers that only need to cater to spec-compliant clients don't pay the cost, and servers that want lenience can configure it in a single line.Modifications
QuerySpaceNormalizingMiddleware, an opt-inServerMiddlewarethat rewrites unencoded+characters in the request's query string to%20before the request reaches the decoder. Only the query portion of the request path is modified; the path, fragment, and request body are untouched. Literal+characters that were correctly percent-encoded as%2Bby the client are unaffected.### Additional middlewarestopic in the DocC documentation that listsErrorHandlingMiddlewareand the newQuerySpaceNormalizingMiddleware, so built-in middlewares are discoverable from the top-level docs.HTTPResponseConvertibleinto the### Errorstopic (it's a protocol used byErrorHandlingMiddleware).Result
Server authors who need to accept
+-encoded spaces can opt in with a single line:Servers that don't register the middleware continue to behave exactly as before (spec-compliant %20 only).
Test Plan