Skip to content

bench: add key-only payload sort benchmarks#21837

Open
gratus00 wants to merge 1 commit intoapache:mainfrom
gratus00:add-sort-key-payload-benchmarks
Open

bench: add key-only payload sort benchmarks#21837
gratus00 wants to merge 1 commit intoapache:mainfrom
gratus00:add-sort-key-payload-benchmarks

Conversation

@gratus00
Copy link
Copy Markdown

Which issue does this PR close?

Partially addresses #21543.

Rationale for this change

#21543 points out that the existing sort benchmark does not cover an important ExternalSorter shape: sorting by a cheap key while carrying non-key payload columns through the sort

The existing tuple/string/dictionary cases sort by every column via make_sort_exprs(schema). That is useful coverage, but it does not model queries like ORDER BY key over rows that also contain UTF-8 or dictionary payload columns. This PR adds benchmark coverage for that case, following the discussion in #21543 and the benchmark/use-case direction from #21688

I used Codex to help draft parts of this benchmark change and the PR description. I reviewed and adjusted the resulting code locally before opening the PR

What changes are included in this PR?

  • Change the sort benchmark batch size from 1024 to 8192, matching DataFusion's default target batch size more closely
  • Add key-only sort benchmark cases:
    • i64 key utf8 payload
    • i64 key dictionary payload
  • Add selected-column sort helpers for those new benchmark cases, so they sort only by key while carrying payload columns
  • Keep the existing benchmark cases sorting all columns as before

The new cases are run across the existing four benchmark shapes:

  • merge sorted
  • sort merge
  • sort
  • sort partitioned

Are these changes tested?

Focused checks run locally:

cargo fmt --all --check
cargo check -p datafusion --bench sort
cargo clippy -p datafusion --bench sort --all-features -- -D warnings

Are there any user-facing changes?

No. This is a benchmark-only change

@github-actions github-actions Bot added the core Core DataFusion crate label Apr 25, 2026
@gratus00
Copy link
Copy Markdown
Author

@mbutrovich would really appreciate your review on this, I think there might be room for improvement here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Core DataFusion crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant