-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Fix array_repeat handling of null count values
#20102
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| OffsetBuffer::new(offsets.into()), | ||
| repeated_values, | ||
| None, | ||
| Some(NullBuffer::new(nulls.finish())), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can copy the nulls buffer from the count array instead of using a builder
| Some(NullBuffer::new(nulls.finish())), | |
| count_array.nulls().cloned(), |
| let total_repeated_values: usize = count_array | ||
| .values() | ||
| .iter() | ||
| .map(|&c| if c > 0 { c as usize } else { 0 }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: technically the spec allows null slots to have non-0 values, so this could overestimate
| for row_idx in 0..count_array.len() { | ||
| let (count, count_is_valid) = get_count_with_validity(count_array, row_idx); | ||
| outer_nulls.append(count_is_valid); | ||
| let list_is_valid = !list_array.is_null(row_idx); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| let list_is_valid = !list_array.is_null(row_idx); | |
| let list_is_valid = list_array.is_valid(row_idx); |
| ), | ||
| Arc::new(inner_list), | ||
| None, | ||
| Some(NullBuffer::new(outer_nulls.finish())), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same note here about reusing input null buffer
Which issue does this PR close?
array_repeatfunction does not handle null values in count array #20075.Rationale for this change
The previous implementation of
array_repeatrelied on Arrow defaults when handling null and negative count values. As a result, null counts were implicitly treated as zero and returned empty arrays, which is a correctness issue.This PR makes the handling of these edge cases explicit and aligns the function with SQL null semantics.
What changes are included in this PR?
Int64Are these changes tested?
Yes, SLTs added and pass.
Are there any user-facing changes?
Yes. When the count value is null,
array_repeatnow returns a null array instead of an empty array.