proposal: trim query_range response to user-requested time window #7288
venkatchinmay wants to merge 1 commit into cortexproject:master
Add docs/proposals/query-range-response-trimming.md proposing a new opt-in RangeTrimMiddleware (querier.trim-response-to-requested-range) that clips the final query_range response back to the exact [start, end] window requested by the user. This fixes data leakage outside the requested range caused by StepAlignMiddleware flooring start to the nearest step boundary, and by SplitByIntervalMiddleware boundary rounding.

Author: Chinmay Venkat
Signed-off-by: Your Name <you@example.com>
@venkatchinmay
yes @SungJin1212
friedrichg left a comment
Sorry for the late response. I appreciate the detail.
> ## Proposed Solution
>
> Add a `RangeTrimMiddleware` as the **outermost** middleware in the chain, controlled by a new opt-in configuration flag `querier.trim-response-to-requested-range`.
I'd prefer that we don't add any flag. If we are returning more samples than we should, this is a bugfix.
> When `querier.align-querier-with-step: true` is configured, the query frontend modifies the user's requested `start` time by flooring it to the nearest `step` boundary **before** passing the request to the results cache and downstream querier.
>
> This causes the response to contain data points **outside** the user's originally requested time range.
This looks like a bug to me. Can you create a test case that triggers this behavior?
If you can trigger the behavior and it is buggy, you don't need a proposal. You just need a unit test that triggers it, plus the fix.
The only open question is what happens if the fix increases CPU usage or reduces cache efficiency. Those things will matter in this case too.
Fixes #7289
What
Adds `docs/proposals/query-range-response-trimming.md`, a proposal for a new opt-in middleware that clips the `query_range` response to the exact `[start, end]` window requested by the user.
Problem
When `querier.align-querier-with-step: true` is enabled, `StepAlignMiddleware` floors the user's `start` time to the nearest step boundary before the rest of the middleware chain runs.
The original `start=09:01` is permanently lost: no downstream middleware can recover it to trim the response. The user receives data from `09:00` even though they asked for `09:01`.
The same drift can occur from `SplitByIntervalMiddleware`'s `nextIntervalBoundary` rounding.
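To make the flooring concrete, here is a minimal Go sketch of the arithmetic described above. It is not the Cortex implementation: the helper name `floorToStep`, the date, and the 5-minute step are assumptions chosen only so that a `09:01` start lands on `09:00`.

```go
package main

import (
	"fmt"
	"time"
)

// floorToStep is a hypothetical helper that rounds a millisecond timestamp
// down to the nearest multiple of stepMs, as step alignment is described above.
func floorToStep(startMs, stepMs int64) int64 {
	return (startMs / stepMs) * stepMs
}

func main() {
	// Assumed values: the user asks for data starting at 09:01 UTC, step 5m.
	requested := time.Date(2024, 1, 1, 9, 1, 0, 0, time.UTC)
	step := 5 * time.Minute

	aligned := floorToStep(requested.UnixMilli(), step.Milliseconds())

	// The aligned start is earlier than the requested one, so any sample
	// returned at the aligned timestamp falls outside the user's window.
	fmt.Println("requested start:", requested.Format("15:04"))
	fmt.Println("aligned start:  ", time.UnixMilli(aligned).UTC().Format("15:04"))
}
```

Once the request carries only the aligned value, later stages have nothing to compare against, which is why the original `start` has to be captured before alignment runs.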
Proposed Fix
A new `RangeTrimMiddleware` is added as the outermost middleware, enabled via the `querier.trim-response-to-requested-range` flag.
It captures the original `start`/`end` before any mutation, lets the full stack run internally (alignment, splitting, caching, sharding), then trims the final response using the existing `Extractor.Extract()` interface, which already handles per-sample and per-step-stats trimming correctly.
Key Points
Default `false`, no behaviour change for existing deployments
Related
pkg/querier/tripperware/queryrange/results_cache.go
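
The capture-then-trim flow described under Proposed Fix could be sketched roughly as follows. Everything here is a simplified stand-in: `Sample`, `Request`, `Handler`, `withRangeTrim`, `trimToRange`, and `stepAligned` are hypothetical names, not the Cortex tripperware types, and a plain timestamp filter takes the place of `Extractor.Extract()`. The sketch also assumes the requested window is inclusive at both ends.

```go
package main

import "fmt"

// Sample, Request and Handler are hypothetical, simplified stand-ins for the
// real tripperware types.
type Sample struct {
	TimestampMs int64
	Value       float64
}

type Request struct {
	StartMs, EndMs, StepMs int64
}

type Handler func(Request) []Sample

// trimToRange keeps only samples inside the inclusive [startMs, endMs] window.
func trimToRange(samples []Sample, startMs, endMs int64) []Sample {
	out := make([]Sample, 0, len(samples))
	for _, s := range samples {
		if s.TimestampMs >= startMs && s.TimestampMs <= endMs {
			out = append(out, s)
		}
	}
	return out
}

// withRangeTrim wraps next as the outermost layer: it records the original
// window before any inner handler can alter it, then trims the final response.
func withRangeTrim(next Handler) Handler {
	return func(r Request) []Sample {
		origStart, origEnd := r.StartMs, r.EndMs
		return trimToRange(next(r), origStart, origEnd)
	}
}

// stepAligned is a stand-in inner handler that floors start to the step
// boundary and returns one sample per step.
func stepAligned(r Request) []Sample {
	r.StartMs = (r.StartMs / r.StepMs) * r.StepMs
	var out []Sample
	for ts := r.StartMs; ts <= r.EndMs; ts += r.StepMs {
		out = append(out, Sample{TimestampMs: ts, Value: 1})
	}
	return out
}

func main() {
	req := Request{StartMs: 60000, EndMs: 900000, StepMs: 300000}
	fmt.Println("samples without trimming:", len(stepAligned(req)))
	fmt.Println("samples with trimming:   ", len(withRangeTrim(stepAligned)(req)))
}
```

Placing the wrapper outermost means the inner handlers can keep aligning and splitting for cache efficiency, while the extra work is a single pass over the final response.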