[datafusion-spark] Support 2-argument ceil(value, scale)#21710
[datafusion-spark] Support 2-argument ceil(value, scale)#21710diegoQuinas wants to merge 6 commits intoapache:mainfrom
Conversation
There was a problem hiding this comment.
@diegoQuinas
Thanks for the work here.
There are a few inconsistencies between the declared return types and the actual execution paths that could lead to planner/runtime mismatches. I’ve called those out below. Once those are addressed and we have a couple of type-contract tests in place, this should be in a much safer spot.
- Fix type contract: 2-arg ceil now preserves input type for any scale (incl. scale=0 on floats); decimal 2-arg surfaces NotImplemented - Implement integer ceil at negative scale (ceiling toward +inf, distinct from HALF_UP round); positive scale on integer is a no-op - Extract shared get_scale helper into math/scale.rs and reuse from round - Simplify Signature using TypeSignature::Numeric + Coercible([Numeric, Int32]) instead of hand-enumerated type lists - Move Rust unit tests into the SLT, including type-contract assertions via arrow_typeof for every 2-arg overload
There was a problem hiding this comment.
@diegoQuinas
Nice follow-up overall. This addresses all the original comments and the refactor looks clean. The type handling is much more consistent now and the SLT coverage is solid. I left a few small suggestions, mostly around the decimal 2-arg behavior and some minor readability and edge-case clarifications.
| match &arg_types[0] { | ||
| // 2-arg decimal is not yet supported; report input type so the planner | ||
| // does not reject the call before we surface a proper error at execution. | ||
| DataType::Decimal128(p, s) if has_scale => Ok(DataType::Decimal128(*p, *s)), |
There was a problem hiding this comment.
Good progress here, but there is still a mismatch between planning and execution for the 2-arg decimal case.
Right now return_type allows Decimal128(p, s), so the planner accepts the query. Then execution always throws not_impl_err. That means users only see the failure at runtime, which can be confusing.
It would be cleaner to fail early in return_type instead, so the planner rejects it with a clear message. Something like:
DataType::Decimal128(_, _) if has_scale => {
exec_err!("2-argument ceil is not yet supported for decimal inputs")
}
This keeps the schema contract honest and improves the developer experience.
| } | ||
| (value * factor).ceil() / factor | ||
| } else { | ||
| let factor = T::from(10.0f64.powi(-scale)).unwrap_or_else(T::infinity); |
There was a problem hiding this comment.
This edge case caught my eye. When scale is very negative and the factor overflows, we silently return 0.0.
I see this matches the integer path, but for floats it is not obvious why that is correct. It might help to add a short comment explaining the intent here, especially that extremely coarse rounding collapses everything to zero and that this is consistent with the integer behavior.
A tiny note would save future readers some head scratching.
| impl_integer_array_ceil!(input, UInt32Type, scale) | ||
| } | ||
| DataType::UInt64 if has_scale => { | ||
| let array = input.as_primitive::<UInt64Type>(); |
There was a problem hiding this comment.
The error construction here works, but it is a bit hard to read.
The (exec_err!(...) as Result<(), _>).unwrap_err() pattern feels a little roundabout. You could simplify this by mapping directly to a DataFusionError, similar to how other paths handle it.
Something like:
let v = i64::try_from(x).map_err(|_| {
datafusion_common::exec_datafusion_err!(
"ceil: UInt64 value {x} exceeds i64::MAX and cannot be processed"
)
})?;
Same behavior, but easier on the eyes.
| /// <https://github.com/apache/datafusion/issues/21560> | ||
| /// Two-argument form `ceil(expr, scale)` returns the same type as `expr` for | ||
| /// integer and floating-point inputs (regardless of `scale`). Decimal inputs | ||
| /// are not yet supported in the 2-arg form (see TODO in execution path). |
There was a problem hiding this comment.
I'd prefer not having this detail of decimals not being supported on the 2-arg path in the docstring; it should just be a TODO item for clear visibility
|
|
||
| match &arg_types[0] { | ||
| // 2-arg decimal is not yet supported; report input type so the planner | ||
| // does not reject the call before we surface a proper error at execution. |
There was a problem hiding this comment.
We should error at planning time instead of execution time if it is not supported
| // Decimal128 with positive scale (1-arg only). | ||
| ScalarValue::Decimal128(_, _, _) if has_scale => { | ||
| return not_impl_err!( | ||
| "2-argument ceil is not yet supported for decimal inputs" |
There was a problem hiding this comment.
If we aren't supporting this yet we should keep the original issue open instead of marking this PR as closing it
| } | ||
| } | ||
|
|
||
| fn out_of_range_err<T: std::fmt::Display>( |
There was a problem hiding this comment.
We can use exec_datafusion_err like so
| if scale >= 0 { | ||
| let factor = T::from(10.0f64.powi(scale)).unwrap_or_else(T::infinity); | ||
| if factor.is_infinite() { | ||
| return value; |
There was a problem hiding this comment.
Does this match with Spark behaviour?
Which issue does this PR close?
Closes #21560
Rationale for this change
The Spark
ceilfunction supports an optionalscaleparameter that controlsthe decimal position to round up to. This was not yet implemented in
datafusion-spark.
What changes are included in this PR?
Signatureto accept 1 or 2 arguments, following the same pattern asSparkRoundreturn_type: floats preserve their type when a scale is provided (instead of returningInt64); scale=0 preserves the original behaviorget_scale()helper to extract the optional scale argument, returningNonefor NULL scale (which produces a NULL result)ceil_float()helper for ceiling floats at arbitrary decimal positionsspark_ceil_scalarandspark_ceil_arrayto apply the scaleAre these changes tested?
Yes, unit tests covering:
Are there any user-facing changes?
Yes —
ceil(expr, scale)is now supported in addition to the existingceil(expr).