SOLR-18198 Add support for "missing" stats in rollup for streaming expressions#4286
SOLR-18198 Add support for "missing" stats in rollup for streaming expressions#4286KhushJain wants to merge 3 commits intoapache:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
Adds a new missing(field) metric to SolrJ streaming expressions and ensures it can be correctly rolled up when metrics are computed in parallel/tiered contexts.
Changes:
- Introduces
MissingMetric(new metric functionmissing) for counting null/missing values. - Registers the new metric in
Langand adds rollup support inParallelMetricsRollup. - Updates streaming rollup tests and adds a changelog entry.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| solr/solrj-streaming/src/test/org/apache/solr/client/solrj/io/stream/StreamingTest.java | Extends rollup/parallel rollup tests and fixtures to validate missing(b_f) behavior. |
| solr/solrj-streaming/src/java/org/apache/solr/client/solrj/io/stream/metrics/MissingMetric.java | Adds the new MissingMetric implementation and streaming-expression parsing/toExpression support. |
| solr/solrj-streaming/src/java/org/apache/solr/client/solrj/io/stream/ParallelMetricsRollup.java | Adds rollup aggregation logic for MissingMetric (sum across tiers). |
| solr/solrj-streaming/src/java/org/apache/solr/client/solrj/io/Lang.java | Registers missing as a streaming expression metric function. |
| changelog/unreleased/SOLR-18198-support-missing-stats-count-in-rollup.yml | Documents the feature addition in the changelog. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
|
||
| @Override | ||
| public void update(Tuple tuple) { | ||
| if (tuple.get(columnName) == null) { |
There was a problem hiding this comment.
"" is a valid value for a field. tuple.get(columnName) would only be null when the field doesn't exist. Missing counts the number of documents that doesn't have the field itself. The behavior is consistent with how stats component computes missing.
|
Can you update the docs as well??? |
Yes, updated the doc. |
https://issues.apache.org/jira/browse/SOLR-18198
Description
Rollupfunction to supportmissingstatistics in/streamhandler.Solution
Add a new
MissingMetricto the streaming expressions framework:MissingMetric.java: New metric class modeled afterCountMetric. Increments a counter whentuple.get(columnName) == null. Registered as function namemissing.Lang.java: Registermissingfunction name.ParallelMetricsRollup.java: Added SumMetric aggregation.Tests
Updated existing tests in
StreamingTest.java:b_fto only 5 of the 10 documents inhelloDocsUpdateRequest(ids 0, 4, 5, 6, 9), leaving the other 5 without it. This avoids changing existinga_fassertions.
testRollupStreamandtestParallelRollupStream: AddedMissingMetric("b_f")to the metrics array and asserted.Checklist
Please review the following and check all that apply:
mainbranch../gradlew check.