-
Notifications
You must be signed in to change notification settings - Fork 384
feat: Support TimestampNs and TimestampTzNs` in bucket transform
#1150
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
liurenjie1024
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jonathanc-n for this pr, LGTM!
kevinjqliu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
| .downcast_ref::<arrow_array::TimestampMicrosecondArray>() | ||
| .unwrap() | ||
| .unary(|v| self.bucket_timestamp(v)), | ||
| DataType::Time64(TimeUnit::Nanosecond) => input |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in pyiceberg, TimestampNanoType in converted to micros precision before bucketing. This is to ensure that TimestampType and TimestampNanoType are transformed to the same value
I think we should add a test to ensure this here too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, It wasn't performing this conversion. The test is added
|
@Fokko FYI new transform function we can use for apache/iceberg-python#1833 |
54d001b
kevinjqliu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
| .as_any() | ||
| .downcast_ref::<arrow_array::Time64NanosecondArray>() | ||
| .unwrap() | ||
| .unary(|v| self.bucket_time(v / 1000)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there was an issue with rounding when performing division for the hour transform (#1146)
but since we're dividing by 1000, i dont think the same issue applies here
| .transform_literal(&Datum::timestamp_nanos(ns_value)) | ||
| .unwrap() | ||
| .unwrap(), | ||
| Datum::int(79) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm, since we're using the same time (1510871468000000) here
should the bucket value for bucket(100) transform be the same for this test and the test above (test_timestamptz_literal)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I applied the conversion to the transform_literal as well
ZENOTME
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks @jonathanc-n
kevinjqliu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM thanks for adding the additional checks
liurenjie1024
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jonathanc-n for fixing this, and @kevinjqliu for the review
Which issue does this PR close?
What changes are included in this PR?
Add bucket transforms for
TimestampNsand TimestampTzNs`Are these changes tested?
Unit tests