feat!: add exp.Trunc for numeric truncation#6923
Conversation
Add exp.Trunc expression class for numeric truncation, filling a gap alongside existing Round, Floor, and Ceil functions (in AST since 2021). Implementation: - New exp.Trunc class with _sql_names = ["TRUNC", "TRUNCATE"] - Oracle: Type-annotation-aware builder (DATE/TIMESTAMP/DATETIME -> DateTrunc, numeric -> Trunc) - Exasol: Updated to use exp.Trunc instead of Anonymous; fixed to handle DATETIME type and single-arg date TRUNC (defaults to 'DD' like Oracle) - MySQL: Generates TRUNCATE(...); TRUNC alias normalizes to TRUNCATE - T-SQL: Generates ROUND(x, n, 1) - Other dialects (PostgreSQL, Snowflake, DuckDB, BigQuery): Auto-supported via _sql_names - Date-only dialects (Hive, Spark) unchanged Tests: - Oracle: exp.Trunc identity and type assertion (vs DateTrunc), single-arg TRUNC - MySQL: TRUNCATE identity and TRUNC alias normalization - T-SQL: ROUND(x, n, 1) generation from other dialects' TRUNC - Exasol: exp.Trunc type assertion, DATETIME handling, single-arg date TRUNC Signed-off-by: Dori Polotsky <doripo@riverpool.ai>
SQLite's TRUNC() only accepts one argument. This change: - Simplifies TRUNC(x, 0) to TRUNC(x) for SQLite - Warns when non-zero decimals are used (unsupported)
VaggelisD
left a comment
There was a problem hiding this comment.
Hey @doripo, thanks for the contribution!
- Can we increase test coverage e.g in dialects like Spark to ensure that
TRUNC(foo)remainsexp.DateTrunc? - Snowflake's TRUNC is also overloaded, so we'd need to approach it similar to Oracle & Exasol. See this comment
- SQLite: Use @unsupported_args("decimals") for best-effort transpilation,
move trunc_sql inside Generator for auto-discovery, update tests
- Spark/Hive: Add tests to verify TRUNC remains DateTrunc/TimestampTrunc
Signed-off-by: Dori Polotsky <doripo@riverpool.ai>
- Add build_trunc to dialect.py for Oracle/Exasol/Snowflake - Uses type annotation to distinguish date vs numeric truncation - Returns Anonymous if type cannot be determined - Add Snowflake support for overloaded TRUNC/TRUNCATE - Add comprehensive Snowflake TRUNC tests Signed-off-by: Dori Polotsky <doripo@riverpool.ai>
Add validate_all tests for: - Oracle: Date truncation (DAY, WEEK, MONTH, QUARTER, YEAR), Timestamp truncation (HOUR, MINUTE, SECOND, DAY, MONTH, YEAR) - Exasol: Date/time truncation with Exasol-specific units (YYYY, MM, DD, HH, MI, SS, WW, Q) - Cross-dialect: Oracle, Snowflake, Postgres, BigQuery, DuckDB, TSQL, Spark Signed-off-by: Dori Polotsky <doripo@riverpool.ai>
VaggelisD
left a comment
There was a problem hiding this comment.
Thank you for the quick & accurate iterations! Leaving a few more comments your way, should be close to merging soon (also, appreciate the a-b sorting of the imports as a cherry on top 😆)
add .assert_is(exp.Trunc) as chained function call Co-authored-by: Vaggelis Danias <daniasevangelos@gmail.com>
- Use TEMPORAL_TYPES instead of DATE/TIMESTAMP/DATETIME to correctly handle all temporal types (e.g. TIMESTAMPTZ, TIMESTAMPLTZ) - Group build_trunc conditions by return type (DateTrunc/Trunc/Anonymous) - Chain .assert_is() to validate_identity calls - Merge redundant validate_all tests - Add fallback tests for Anonymous case - Remove duplicate Spark TRUNC test Signed-off-by: Dori Polotsky <doripo@riverpool.ai>
- test_trunc_type_inference: tests build_trunc discrimination logic (shared across Oracle, Exasol, Snowflake) - 5 cases covering temporal+?, ?+string, numeric+?, ?+int, ?+? fallback - test_trunc: Oracle-specific identity and transpilation tests Signed-off-by: Dori Polotsky <doripo@riverpool.ai>
|
@VaggelisD modified according to your suggestions, thanks, please also let me know if the trailing comment in the main PR description ("Out of scope / potential follow-up:") is appropriate or better remove it from the commit message. |
|
Thank you!
It is fine for future reference, thanks for documenting it; The PR has already grown so it's great if we can now limit its scope to what's already there |
VaggelisD
left a comment
There was a problem hiding this comment.
Hey @doripo, I believe I have one final comment and a few nits left, really appreciate your cooperation. If anything else comes up after that we'll take it to the finish line by merging this PR and applying the fixes afterwards
I agree, we can definitely increase the test coverage by adding more TRUNC specific tests in oracle, snowflake & exasol files. We can add it as part of a separate PR since this is a big PR. Something like this for Oracle: Similarly we can do the same for snowflake (https://docs.snowflake.com/en/sql-reference/functions-date-time#label-supported-date-time-parts) Exasol: |
- Snowflake: date_trunc_requires_part=False (single-arg is numeric) - Oracle: default_date_trunc_unit="DD" (single-arg temporal uses DD) - Update tests for new behavior - Use validate_identity where SQL round-trips Signed-off-by: Dori Polotsky <doripo@riverpool.ai>
- Use "truncation" (concept) instead of "TRUNC" (function name) - Consistent patterns: "Numeric truncation identity", "Date truncation with typed column and unit", "Cross-dialect numeric truncation transpilation" Signed-off-by: Dori Polotsky <doripo@riverpool.ai>
Signed-off-by: Dori Polotsky <doripo@riverpool.ai>
|
The PR grew to more dialects so I attempted to aggregate their docs for our reference during reviewing:
|
VaggelisD
left a comment
There was a problem hiding this comment.
Final comments from me too, thank you!
| # Fallback to Anonymous (Exasol requires unit for date truncation) | ||
| self.validate_identity("TRUNC(CAST(x AS DATE))").assert_is(exp.Anonymous) |
There was a problem hiding this comment.
According to Exasol docs unit is optional so we could infer that this is date truncation here, right?
There was a problem hiding this comment.
I looked into Exasol's TRUNC default behavior. I can't say that the docs are 100% explicit, but:
- TRUNC/TRUNCATE docs say format is optional and "if a format is not specified, the value is truncated to days"
- DATE_TRUNC is described as "PostgreSQL compatible"
- The ROUND function docs reference "days from noon" as the rounding boundary, implying day-level granularity
This suggests DD (day truncation to midnight) matches Oracle's default, though not stated verbatim. Are you comfortable adding default_date_trunc_unit="DD" for Exasol, or would you prefer keeping the Anonymous fallback until we have firmer confirmation?
I have the fix ready (4 changed lines) just wanted to double check.
(continuing 77830a3#r2759813244)
There was a problem hiding this comment.
Related: opened new issue #6959 for DD / DAY unit transpilation
|
@VaggelisD that's a very good reference! I believe that these minor corrections to your table apply:
Based on my understanding. the current gaps are therefore quite small (about 20 lines of code, not including the tests):
I can address them either now or in a follow-up PR, as they are incremental improvements. |
|
Let's get any minor fixes in and then improve the parsing / transpilation in a follow-up to avoid increasing the scope more. |
|
Or, for Spark/Hive probably better to just cast to BIGINT with unsupported_args("decimals") - the 'smart' workaround is somewhat weird.. If I do that now, then all the code fixes are really minor - and just a few more tests. @georgesittas does that seem acceptable at this stage of the PR? |
|
Let's do it in a follow-up PR @doripo, we can apply any last one-liners here. |
fivetran-amrutabhimsenayachit
left a comment
There was a problem hiding this comment.
Thank you!
Style fixups Co-authored-by: Vaggelis Danias <daniasevangelos@gmail.com>
Add exp.Trunc expression class for numeric truncation, filling a gap alongside existing Round, Floor, and Ceil functions (in AST since 2021).
Implementation:
Tests:
Out of scope / potential follow-up: