GH-3464 Improve DeltaByteArrayWriter.writeBytes to avoid unnecessary allocation and scalar prefix comparison #3465

Open
arouel wants to merge 2 commits into apache:master from arouel:dba-write-bytes

arouel commented Apr 6, 2026

Rationale for this change

DeltaByteArrayWriter.writeBytes() is on the hot path for DELTA_BYTE_ARRAY encoding and had avoidable overhead:

  1. Per-value allocation from getBytes().
    getBytes() always creates a new array. For prefix comparison this copy is unnecessary.

  2. Scalar prefix scan.
    The byte-by-byte loop that finds the common prefix is replaced with Arrays.mismatch(...), which the JVM can compile to an optimized intrinsic.
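
To make the second point concrete, here is a minimal sketch of the two ways to compute a common prefix length on plain byte arrays. The class and method names are hypothetical and are not taken from the PR. Arrays.mismatch returns -1 when the compared ranges are identical, so the sketch maps that result to the full compared length; how the PR itself handles that case is not shown in this description.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration class, not part of parquet-java.
final class PrefixLengthSketch {

    // Scalar baseline: compare one byte at a time.
    static int scalarPrefixLength(byte[] previous, byte[] current) {
        int length = Math.min(previous.length, current.length);
        int i = 0;
        while (i < length && previous[i] == current[i]) {
            i++;
        }
        return i;
    }

    // Same result via Arrays.mismatch (Java 9+), bounded to the shorter array.
    static int mismatchPrefixLength(byte[] previous, byte[] current) {
        int length = Math.min(previous.length, current.length);
        int i = Arrays.mismatch(previous, 0, length, current, 0, length);
        // -1 means the two ranges are equal, so the whole range is a shared prefix.
        return i < 0 ? length : i;
    }

    public static void main(String[] args) {
        byte[] a = "parquet-java".getBytes(StandardCharsets.UTF_8);
        byte[] b = "parquet-mr".getBytes(StandardCharsets.UTF_8);
        System.out.println(scalarPrefixLength(a, b));   // 8 ("parquet-")
        System.out.println(mismatchPrefixLength(a, b)); // 8
    }
}
```

Both methods return the same index for any pair of inputs; the difference lies only in how the comparison is executed.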

In profiling (custom JFR benchmark on a large merge workload), this method was a top allocation hotspot before the change.

What changes are included in this PR?

In DeltaByteArrayWriter.writeBytes():

  • v.getBytes() -> v.getBytesUnsafe() for read-only prefix comparison.
  • Manual prefix loop -> Arrays.mismatch(previous, 0, length, vb, 0, length).
  • previous = vb -> previous = v.isBackingBytesReused() ? v.getBytes() : vb
    (defensive copy only when the backing bytes may be reused by the caller).
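
The reason for the conditional copy can be sketched as follows. This is an illustration under stated assumptions, not the PR's code: `Value` is a hypothetical stand-in that mimics only the three methods named above (it is not Parquet's Binary class), and `write` returns the prefix length instead of encoding anything.

```java
import java.util.Arrays;

// Hypothetical illustration class, not part of parquet-java.
final class ReusedBackingSketch {

    // Hypothetical stand-in for a binary value; only mimics the three methods named in the PR.
    static final class Value {
        private final byte[] backing;
        private final boolean reused;

        Value(byte[] backing, boolean reused) {
            this.backing = backing;
            this.reused = reused;
        }

        byte[] getBytes() { return Arrays.copyOf(backing, backing.length); } // always copies
        byte[] getBytesUnsafe() { return backing; }                          // no copy
        boolean isBackingBytesReused() { return reused; }
    }

    private byte[] previous = new byte[0];

    // Returns the length of the prefix shared with the previously written value.
    int write(Value v) {
        byte[] vb = v.getBytesUnsafe(); // read-only use, so no copy is needed here
        int length = Math.min(previous.length, vb.length);
        int i = Arrays.mismatch(previous, 0, length, vb, 0, length);
        int prefix = i < 0 ? length : i;
        // Retain a private copy only if the caller may overwrite the backing array.
        previous = v.isBackingBytesReused() ? v.getBytes() : vb;
        return prefix;
    }

    public static void main(String[] args) {
        ReusedBackingSketch writer = new ReusedBackingSketch();
        byte[] buffer = {1, 2, 3};
        System.out.println(writer.write(new Value(buffer, true))); // 0, nothing written before
        buffer[2] = 9;                                             // caller reuses the buffer
        System.out.println(writer.write(new Value(buffer, true))); // 2, compared against the copy
    }
}
```

If the value were flagged as not reused while the caller still mutated the buffer, `previous` would alias the new contents and the second call would report a full-length prefix of 3, which is the kind of error the defensive copy guards against.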

This preserves semantics while removing unnecessary allocations in the common case.

Benchmark signal (custom, directional)

On a custom JFR-profiled merge workload (180M rows, 4 binary columns), this change significantly reduced allocations and lowered CPU attributed to this path.
(Results are workload/JDK dependent and provided as directional evidence.)

Are these changes tested?

Yes.

  • Existing coverage: TestDeltaByteArray round-trip tests (including skip/skipN/reset paths).
  • Added regression test: testReusedBackingArrayRegression in TestDeltaByteArray to verify correctness when the same mutable backing array is reused across writes.

Are there any user-facing changes?

No API or format changes. This is a transparent performance optimization; encoded data remains compatible/interchangeable.

Closes #3464
