[EPIC] Optimize performance for slow expressions #2986

@andygrove

Description

What is the problem the feature request solves?

The following expressions are slower with Comet enabled than with Spark alone, according to the benchmarks in #2984.

This epic is for tracking progress on optimizing these. Separate issues should be created and linked to from this table. Some issues already exist (look for issues tagged with the performance label).

Also, I'd like to point out that this table was generated by AI and contains some duplicate entries.
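Because the table may contain duplicate entries, a quick check before filing per-expression issues could look like the sketch below. It is only an illustration: the `find_duplicates` helper and the tuple layout are assumptions, and the last sample row is a hypothetical repeat added to demonstrate the check, not a row taken from the benchmarks.

```python
from collections import Counter

# Each row: (benchmark file, expression, Spark time in ms, Comet relative factor).
rows = [
    ("CometStringExpressionBenchmark", "trim", 435.0, 0.4),
    ("CometStringExpressionBenchmark", "ltrim", 434.0, 0.4),
    ("CometStringExpressionBenchmark", "rtrim", 436.0, 0.4),
    # Hypothetical repeated entry, included only to exercise the check.
    ("CometStringExpressionBenchmark", "trim", 435.0, 0.4),
]


def find_duplicates(rows):
    """Return sorted (benchmark, expression) keys that occur more than once."""
    counts = Counter((benchmark, expression) for benchmark, expression, *_ in rows)
    return sorted(key for key, n in counts.items() if n > 1)


for benchmark, expression in find_duplicates(rows):
    print(f"duplicate entry: {benchmark} / {expression}")
```

Keying on the benchmark file and expression name (rather than the whole row) also catches repeats whose timings differ slightly between runs.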

| Benchmark File | Expression | Spark Time (ms) | Comet Relative | Slowdown |
| --- | --- | --- | --- | --- |
| CometDatetimeExpressionBenchmark | Timestamp Truncate - SECOND | 30.0 | 0.4X | 60.0% |
| CometDatetimeExpressionBenchmark | Timestamp Truncate (Dictionary) - SECOND | 25.0 | 0.4X | 60.0% |
| CometStringExpressionBenchmark | octet_length | 373.0 | 0.4X | 60.0% |
| CometStringExpressionBenchmark | trim | 435.0 | 0.4X | 60.0% |
| CometStringExpressionBenchmark | ltrim | 434.0 | 0.4X | 60.0% |
| CometStringExpressionBenchmark | rtrim | 436.0 | 0.4X | 60.0% |
| CometStringExpressionBenchmark | repeat | 720.0 | 0.4X | 60.0% |
| CometArrayExpressionBenchmark | array_remove | 12.0 | 0.5X | 50.0% |
| CometArrayExpressionBenchmark | array_compact | 13.0 | 0.5X | 50.0% |
| CometStringExpressionBenchmark | concat | 595.0 | 0.5X | 50.0% |
| CometStringExpressionBenchmark | startswith | 396.0 | 0.5X | 50.0% |
| CometDatetimeExpressionBenchmark | Timestamp Truncate - YEAR | 134.0 | 0.6X | 40.0% |
| CometDatetimeExpressionBenchmark | Timestamp Truncate - YYYY | 127.0 | 0.6X | 40.0% |
| CometDatetimeExpressionBenchmark | Timestamp Truncate - YY | 130.0 | 0.6X | 40.0% |
| CometDatetimeExpressionBenchmark | Timestamp Truncate - HOUR | 82.0 | 0.6X | 40.0% |
| CometDatetimeExpressionBenchmark | Timestamp Truncate (Dictionary) - YEAR | 126.0 | 0.6X | 40.0% |
| CometDatetimeExpressionBenchmark | Timestamp Truncate (Dictionary) - DD | 97.0 | 0.6X | 40.0% |
| CometDatetimeExpressionBenchmark | Timestamp Truncate (Dictionary) - WEEK | 107.0 | 0.6X | 40.0% |
| CometHashExpressionBenchmark | sha2_512 | 34.0 | 0.6X | 40.0% |
| CometStringExpressionBenchmark | ascii | 405.0 | 0.6X | 40.0% |
| CometStringExpressionBenchmark | bit_length | 451.0 | 0.6X | 40.0% |
| CometStringExpressionBenchmark | concat_ws | 702.0 | 0.6X | 40.0% |
| CometStringExpressionBenchmark | instr | 3805.0 | 0.6X | 40.0% |
| CometStringExpressionBenchmark | endswith | 414.0 | 0.6X | 40.0% |
| CometBitwiseExpressionBenchmark | shift_left | 10.0 | 0.7X | 30.0% |
| CometDatetimeExpressionBenchmark | Timestamp Truncate - MON | 124.0 | 0.7X | 30.0% |
| CometDatetimeExpressionBenchmark | Timestamp Truncate - MONTH | 126.0 | 0.7X | 30.0% |
| CometDatetimeExpressionBenchmark | Timestamp Truncate - MM | 126.0 | 0.7X | 30.0% |
| CometDatetimeExpressionBenchmark | Timestamp Truncate - DAY | 115.0 | 0.7X | 30.0% |
| CometDatetimeExpressionBenchmark | Timestamp Truncate - DD | 117.0 | 0.7X | 30.0% |
| CometDatetimeExpressionBenchmark | Timestamp Truncate - WEEK | 133.0 | 0.7X | 30.0% |
| CometDatetimeExpressionBenchmark | Timestamp Truncate (Dictionary) - MONTH | 120.0 | 0.7X | 30.0% |
| CometDatetimeExpressionBenchmark | Timestamp Truncate (Dictionary) - MM | 120.0 | 0.7X | 30.0% |
| CometHashExpressionBenchmark | sha2_384 | 34.0 | 0.7X | 30.0% |
| CometMathExpressionBenchmark | hex_int | 11.0 | 0.7X | 30.0% |
| CometArrayExpressionBenchmark | array_max | 13.0 | 0.8X | 20.0% |
| CometArrayExpressionBenchmark | array_min | 12.0 | 0.8X | 20.0% |
| CometBitwiseExpressionBenchmark | bitwise_or | 12.0 | 0.8X | 20.0% |
| CometBitwiseExpressionBenchmark | bitwise_xor | 11.0 | 0.8X | 20.0% |
| CometBitwiseExpressionBenchmark | bitwise_not | 10.0 | 0.8X | 20.0% |
| CometBitwiseExpressionBenchmark | shift_right | 10.0 | 0.8X | 20.0% |
| CometBitwiseExpressionBenchmark | bit_count | 10.0 | 0.8X | 20.0% |
| CometCastStringToNumericBenchmark | CAST String to BYTE | 59.0 | 0.8X | 20.0% |
| CometCastStringToNumericBenchmark | CAST String to SHORT | 59.0 | 0.8X | 20.0% |
| CometCastStringToNumericBenchmark | CAST String to INT | 56.0 | 0.8X | 20.0% |
| CometCastStringToNumericBenchmark | CAST String to LONG | 59.0 | 0.8X | 20.0% |
| CometComparisonExpressionBenchmark | greater_than | 11.0 | 0.8X | 20.0% |
| CometComparisonExpressionBenchmark | is_null | 10.0 | 0.8X | 20.0% |
| CometComparisonExpressionBenchmark | is_nan_float | 10.0 | 0.8X | 20.0% |
| CometConditionalExpressionBenchmark | Case When Expr | 41.0 | 0.8X | 20.0% |
| CometDatetimeExpressionBenchmark | Date Truncate - YEAR | 34.0 | 0.8X | 20.0% |
| CometDatetimeExpressionBenchmark | Date Truncate - YYYY | 31.0 | 0.8X | 20.0% |
| CometDatetimeExpressionBenchmark | Date Truncate - YY | 30.0 | 0.8X | 20.0% |
| CometDatetimeExpressionBenchmark | Date Truncate - MON | 27.0 | 0.8X | 20.0% |
| CometDatetimeExpressionBenchmark | Date Truncate - MONTH | 26.0 | 0.8X | 20.0% |
| CometDatetimeExpressionBenchmark | Date Truncate - MM | 27.0 | 0.8X | 20.0% |
| CometDatetimeExpressionBenchmark | Date Truncate (Dictionary) - YEAR | 29.0 | 0.8X | 20.0% |
| CometDatetimeExpressionBenchmark | Date Truncate (Dictionary) - YYYY | 29.0 | 0.8X | 20.0% |
| CometDatetimeExpressionBenchmark | Date Truncate (Dictionary) - YY | 29.0 | 0.8X | 20.0% |
| CometDatetimeExpressionBenchmark | Date Truncate (Dictionary) - MON | 26.0 | 0.8X | 20.0% |
| CometDatetimeExpressionBenchmark | Date Truncate (Dictionary) - MM | 26.0 | 0.8X | 20.0% |
| CometDatetimeExpressionBenchmark | Timestamp Truncate - QUARTER | 174.0 | 0.8X | 20.0% |
| CometDatetimeExpressionBenchmark | Timestamp Truncate (Dictionary) - HOUR | 102.0 | 0.8X | 20.0% |
| CometDatetimeExpressionBenchmark | Timestamp Truncate (Dictionary) - QUARTER | 170.0 | 0.8X | 20.0% |
| CometDatetimeExpressionBenchmark | Timestamp Extract - year | 61.0 | 0.8X | 20.0% |
| CometDatetimeExpressionBenchmark | Timestamp Extract - month | 61.0 | 0.8X | 20.0% |
| CometDatetimeExpressionBenchmark | Timestamp Extract - day_of_month | 62.0 | 0.8X | 20.0% |
| CometHashExpressionBenchmark | sha2_224 | 28.0 | 0.8X | 20.0% |
| CometHashExpressionBenchmark | sha2_256 | 29.0 | 0.8X | 20.0% |
| CometMathExpressionBenchmark | floor | 10.0 | 0.8X | 20.0% |
| CometMathExpressionBenchmark | hex_long | 11.0 | 0.8X | 20.0% |
| CometMathExpressionBenchmark | unhex | 13.0 | 0.8X | 20.0% |
| CometMathExpressionBenchmark | unary_minus | 10.0 | 0.8X | 20.0% |
| CometPredicateExpressionBenchmark | in Expr | 42.0 | 0.8X | 20.0% |
| CometStringExpressionBenchmark | chr | 27.0 | 0.8X | 20.0% |
| CometStringExpressionBenchmark | space | 28.0 | 0.8X | 20.0% |
| CometStringExpressionBenchmark | translate | 28908.0 | 0.8X | 20.0% |
| CometArrayExpressionBenchmark | array_contains | 15.0 | 0.9X | 10.0% |
| CometArrayExpressionBenchmark | array_distinct | 14.0 | 0.9X | 10.0% |
| CometArrayExpressionBenchmark | array_append | 12.0 | 0.9X | 10.0% |
| CometArrayExpressionBenchmark | arrays_overlap | 12.0 | 0.9X | 10.0% |
| CometArrayExpressionBenchmark | array_insert | 11.0 | 0.9X | 10.0% |
| CometArrayExpressionBenchmark | array_join | 13.0 | 0.9X | 10.0% |
| CometBitwiseExpressionBenchmark | shift_right_unsigned | 10.0 | 0.9X | 10.0% |
| CometComparisonExpressionBenchmark | not_equal_to | 13.0 | 0.9X | 10.0% |
| CometComparisonExpressionBenchmark | less_than | 12.0 | 0.9X | 10.0% |
| CometComparisonExpressionBenchmark | less_than_or_equal | 11.0 | 0.9X | 10.0% |
| CometComparisonExpressionBenchmark | greater_than_or_equal | 11.0 | 0.9X | 10.0% |
| CometComparisonExpressionBenchmark | equal_null_safe | 10.0 | 0.9X | 10.0% |
| CometComparisonExpressionBenchmark | is_not_null | 10.0 | 0.9X | 10.0% |
| CometComparisonExpressionBenchmark | and | 11.0 | 0.9X | 10.0% |
| CometComparisonExpressionBenchmark | or | 11.0 | 0.9X | 10.0% |
| CometComparisonExpressionBenchmark | not | 10.0 | 0.9X | 10.0% |
| CometComparisonExpressionBenchmark | in_list | 10.0 | 0.9X | 10.0% |
| CometComparisonExpressionBenchmark | not_in_list | 11.0 | 0.9X | 10.0% |
| CometConditionalExpressionBenchmark | If Expr | 38.0 | 0.9X | 10.0% |
| CometDatetimeExpressionBenchmark | Date Truncate (Dictionary) - MONTH | 26.0 | 0.9X | 10.0% |
| CometDatetimeExpressionBenchmark | Timestamp Truncate (Dictionary) - YYYY | 189.0 | 0.9X | 10.0% |
| CometDatetimeExpressionBenchmark | Timestamp Truncate (Dictionary) - MON | 174.0 | 0.9X | 10.0% |
| CometDatetimeExpressionBenchmark | Timestamp Truncate (Dictionary) - DAY | 139.0 | 0.9X | 10.0% |
| CometDatetimeExpressionBenchmark | Date Extract - year | 24.0 | 0.9X | 10.0% |
| CometDatetimeExpressionBenchmark | Date Extract - month | 25.0 | 0.9X | 10.0% |
| CometDatetimeExpressionBenchmark | Date Extract - day_of_month | 25.0 | 0.9X | 10.0% |
| CometDatetimeExpressionBenchmark | Date Extract (Dictionary) - year | 24.0 | 0.9X | 10.0% |
| CometDatetimeExpressionBenchmark | Date Extract (Dictionary) - month | 25.0 | 0.9X | 10.0% |
| CometDatetimeExpressionBenchmark | Date Extract (Dictionary) - day_of_month | 24.0 | 0.9X | 10.0% |
| CometDatetimeExpressionBenchmark | Timestamp Extract - hour | 51.0 | 0.9X | 10.0% |
| CometDatetimeExpressionBenchmark | Timestamp Extract - minute | 52.0 | 0.9X | 10.0% |
| CometDatetimeExpressionBenchmark | Timestamp Extract - second | 51.0 | 0.9X | 10.0% |
| CometDatetimeExpressionBenchmark | Date Arithmetic - date_add | 24.0 | 0.9X | 10.0% |
| CometHashExpressionBenchmark | xxhash64_multi | 15.0 | 0.9X | 10.0% |
| CometHashExpressionBenchmark | murmur3_hash_single | 13.0 | 0.9X | 10.0% |
| CometHashExpressionBenchmark | murmur3_hash_multi | 14.0 | 0.9X | 10.0% |
| CometMathExpressionBenchmark | ceil | 11.0 | 0.9X | 10.0% |
| CometMathExpressionBenchmark | round | 19.0 | 0.9X | 10.0% |
| CometMathExpressionBenchmark | atan2 | 11.0 | 0.9X | 10.0% |
| CometMathExpressionBenchmark | log | 11.0 | 0.9X | 10.0% |
| CometMathExpressionBenchmark | log10 | 11.0 | 0.9X | 10.0% |
| CometStringExpressionBenchmark | initCap | 4560.0 | 0.9X | 10.0% |
| CometStringExpressionBenchmark | rlike | 3396.0 | 0.9X | 10.0% |
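In every row of the table, the Slowdown value equals one minus the Comet Relative factor, expressed as a percentage (for example, 0.4X gives 60.0%). The sketch below reproduces that arithmetic and also estimates the extra time Comet spends, which may help with prioritization. It assumes the relative factor means Spark time divided by Comet time; the issue does not state this explicitly, and the function names are illustrative. Because the factors are rounded to one decimal place, the estimated times are approximate.

```python
def slowdown_percent(relative: float) -> float:
    """Slowdown implied by a relative factor, e.g. 0.4 -> 60.0."""
    return round((1.0 - relative) * 100.0, 1)


def estimated_comet_ms(spark_ms: float, relative: float) -> float:
    """Estimated Comet time, ASSUMING relative = spark_ms / comet_ms."""
    return spark_ms / relative


# (expression, Spark time in ms, Comet relative factor), taken from the table.
rows = [
    ("octet_length", 373.0, 0.4),
    ("instr", 3805.0, 0.6),
    ("translate", 28908.0, 0.8),
]

# Rank by estimated extra milliseconds rather than by percentage.
ranked = sorted(
    rows,
    key=lambda r: estimated_comet_ms(r[1], r[2]) - r[1],
    reverse=True,
)

for expression, spark_ms, relative in ranked:
    extra_ms = estimated_comet_ms(spark_ms, relative) - spark_ms
    print(f"{expression}: {slowdown_percent(relative)}% slower, "
          f"about {extra_ms:.1f} ms extra")
```

Under that assumption, an expression with a modest percentage slowdown but a long baseline (such as `translate`) can cost more absolute time than one with a larger percentage slowdown and a short baseline.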

Describe the potential solution

No response

Additional context

No response
