
feat(functions-nested): add array_filter higher-order function#21895

Open
ologlogn wants to merge 3 commits into apache:main from ologlogn:array-filter-lambda


ologlogn commented Apr 28, 2026

Which issue does this PR close?

Partially addresses #14509 (Add Native Support for List Functions) — implements list_filter / array_filter.

Rationale for this change

array_transform was recently added as the first HigherOrderUDF (#21679). array_filter is the natural companion: filter elements of an array using a boolean lambda predicate, analogous to Spark's filter and DuckDB's list_filter.
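As a rough illustration of the intended semantics, here is a row-oriented reference model in Rust. `array_filter_rows` is a hypothetical helper, not code from this PR. It assumes nullable `i32` elements and a predicate returning `Option<bool>` (with `None` standing in for SQL NULL), keeps an element only when the predicate yields `Some(true)`, and leaves a null list null, following the behavior this PR describes.

```rust
/// Hypothetical row-oriented model of array_filter semantics (not the PR's code).
/// Each row is a nullable list of nullable i32 elements.
fn array_filter_rows<F>(
    rows: &[Option<Vec<Option<i32>>>],
    pred: F,
) -> Vec<Option<Vec<Option<i32>>>>
where
    F: Fn(Option<i32>) -> Option<bool>,
{
    rows.iter()
        .map(|row| {
            // A null list stays null; the predicate is never called for it.
            row.as_ref().map(|elems| {
                elems
                    .iter()
                    .copied()
                    // Keep only Some(true); Some(false) and None (null) drop the element.
                    .filter(|&e| pred(e) == Some(true))
                    .collect::<Vec<Option<i32>>>()
            })
        })
        .collect()
}
```

For example, filtering the rows `[1, 2, 3, 4]` and `NULL` with `x -> x > 2` keeps `[3, 4]` for the first row and leaves the second row NULL.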

What changes are included in this PR?

  • New HigherOrderUDF ArrayFilter in datafusion/functions-nested/src/array_filter.rs
    • array_filter(array, x -> condition) with alias list_filter
    • Evaluates a boolean lambda per element; elements where the lambda returns true are kept
    • Null predicate results treated as false (element dropped), matching Spark semantics
    • Handles List and LargeList, sliced arrays, and null sublists correctly
    • Core logic: builds a per-element boolean selection mask, applies it to the child values with arrow::compute::filter, and recomputes each sublist's offsets from the number of elements it keeps
  • Shared HOF helpers extracted to macros_lambda.rs (value_lambda_pair, coerce_single_list_arg) to avoid duplication with array_transform
  • Shared test utilities in test_utils.rs (create_i32_list, eval_hof_on_i32_list, v())
  • Registered in all_default_higher_order_functions() and expr_fn
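The core logic listed above can be sketched over a simplified columnar layout. `ListModel` and `filter_list` are hypothetical stand-ins, not DataFusion or Arrow APIs: plain `Vec`s play the role of the offsets buffer, the child values, and the list validity bitmap, and a per-element closure stands in for the lambda evaluation. The sketch evaluates the predicate only on values reachable from non-null sublists (so a sliced array whose first offset is not 0 works), builds the selection mask, derives new offsets from per-sublist counts of kept elements, and finally applies the mask to the values, which is the step `arrow::compute::filter` would perform on a real array.

```rust
/// Hypothetical stand-in for an Arrow ListArray with i32 offsets.
#[derive(Debug, PartialEq)]
struct ListModel {
    /// offsets[i]..offsets[i + 1] indexes sublist i in `values`;
    /// in a sliced array the first offset need not be 0.
    offsets: Vec<i32>,
    /// Flat child values; None is a null element.
    values: Vec<Option<i32>>,
    /// false marks a null sublist.
    validity: Vec<bool>,
}

/// Sketch of array_filter: selection mask, then recomputed offsets.
fn filter_list<F>(list: &ListModel, pred: F) -> ListModel
where
    F: Fn(Option<i32>) -> Option<bool>,
{
    let n = list.validity.len();
    let base = list.offsets[0] as usize;
    let end = list.offsets[n] as usize;

    // 1. Selection mask over the reachable range only. Values under null
    //    sublists are never passed to the predicate; a null result is false.
    let mut mask = vec![false; end - base];
    for i in 0..n {
        if !list.validity[i] {
            continue;
        }
        let start = list.offsets[i] as usize;
        let stop = list.offsets[i + 1] as usize;
        for pos in start..stop {
            mask[pos - base] = pred(list.values[pos]).unwrap_or(false);
        }
    }

    // 2. New offsets: running total of kept elements per sublist.
    let mut offsets = Vec::with_capacity(n + 1);
    offsets.push(0i32);
    let mut kept = 0i32;
    for i in 0..n {
        let s = list.offsets[i] as usize - base;
        let e = list.offsets[i + 1] as usize - base;
        kept += mask[s..e].iter().filter(|m| **m).count() as i32;
        offsets.push(kept);
    }

    // 3. Apply the mask to the reachable values (what arrow::compute::filter does).
    let values = list.values[base..end]
        .iter()
        .zip(&mask)
        .filter(|(_, m)| **m)
        .map(|(v, _)| *v)
        .collect();

    ListModel { offsets, values, validity: list.validity.clone() }
}
```

Deriving the offsets from the mask keeps the work linear in the number of reachable values, and a null sublist ends up with an empty range (two equal consecutive offsets) while its validity bit is carried over unchanged.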

Are these changes tested?

Yes, unit tests cover:

  • Basic filtering (x -> x > 2)
  • Multiple sublists with different per-sublist filter counts
  • Sliced list arrays (unreachable values not evaluated)
  • Null sublists (predicate not evaluated on null sublist values; null preserved in output)
  • All-filtered-out (empty output sublist)
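The "not evaluated" cases above can be stated precisely as a set of flat value positions. The helper below is hypothetical (not part of the PR): it lists the positions a predicate may legitimately see for a possibly sliced offsets buffer, so a test could compare it against the positions recorded by a counting predicate. Positions before the first offset, after the last one, or under a null sublist are excluded.

```rust
/// Hypothetical helper: flat value positions reachable from non-null sublists.
/// `offsets` has one more entry than `validity`; a sliced array may start past 0.
fn reachable_positions(offsets: &[i32], validity: &[bool]) -> Vec<usize> {
    let mut out = Vec::new();
    for (i, &valid) in validity.iter().enumerate() {
        // Null sublists contribute nothing, even if their range is non-empty.
        if valid {
            let start = offsets[i] as usize;
            let end = offsets[i + 1] as usize;
            out.extend(start..end);
        }
    }
    out
}
```

For a slice with offsets `[2, 4, 6, 7]` where the middle sublist is null, only positions 2, 3 and 6 are reachable.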

Are there any user-facing changes?

Yes — array_filter(array, lambda) and its alias list_filter(array, lambda) are now available as SQL functions.

github-actions bot added the functions label (Changes to functions implementation) Apr 28, 2026
ologlogn force-pushed the array-filter-lambda branch from 6ff8773 to 07e4548 April 28, 2026 16:39
ologlogn (Author)

Hi @gabotechs, could you please trigger CI? Thanks!

Implements array_filter(array, x -> condition) as a HigherOrderUDF,
with alias list_filter. Evaluates a boolean lambda per element and
reconstructs the list with recomputed offsets.
ologlogn force-pushed the array-filter-lambda branch from 07e4548 to 44715ac April 28, 2026 18:24
github-actions bot added the sqllogictest label (SQL Logic Tests (.slt)) Apr 28, 2026
github-actions bot added the documentation label (Improvements or additions to documentation) Apr 28, 2026