[EPIC] Optimize performance for slow expressions #2986

@andygrove

What is the problem the feature request solves?

The following expressions are slower with Comet enabled, according to the benchmarks in #2984:

This epic is for tracking progress on optimizing these. Separate issues should be created and linked to from this table. Some issues already exist (look for issues tagged with the performance label).

Note that these tables were generated by AI. They contain some duplicate entries and may also contain errors.

Strings

| Benchmark File | Expression | Spark Time (ms) | Comet Relative | Slowdown |
| --- | --- | --- | --- | --- |
| CometStringExpressionBenchmark | octet_length | 373.0 | 0.4X | 60.0% |
| CometStringExpressionBenchmark | trim | 435.0 | 0.4X | 60.0% |
| CometStringExpressionBenchmark | ltrim | 434.0 | 0.4X | 60.0% |
| CometStringExpressionBenchmark | rtrim | 436.0 | 0.4X | 60.0% |
| CometStringExpressionBenchmark | repeat | 720.0 | 0.4X | 60.0% |
| CometStringExpressionBenchmark | concat | 595.0 | 0.5X | 50.0% |
| CometStringExpressionBenchmark | startswith | 396.0 | 0.5X | 50.0% |
| CometStringExpressionBenchmark | ascii | 405.0 | 0.6X | 40.0% |
| CometStringExpressionBenchmark | bit_length | 451.0 | 0.6X | 40.0% |
| CometStringExpressionBenchmark | concat_ws | 702.0 | 0.6X | 40.0% |
| CometStringExpressionBenchmark | instr | 3805.0 | 0.6X | 40.0% |
| CometStringExpressionBenchmark | endswith | 414.0 | 0.6X | 40.0% |
| CometStringExpressionBenchmark | chr | 27.0 | 0.8X | 20.0% |
| CometStringExpressionBenchmark | space | 28.0 | 0.8X | 20.0% |
| CometStringExpressionBenchmark | translate | 28908.0 | 0.8X | 20.0% |
| CometStringExpressionBenchmark | initCap | 4560.0 | 0.9X | 10.0% |
| CometStringExpressionBenchmark | rlike | 3396.0 | 0.9X | 10.0% |

Date/Timestamp

| Benchmark File | Expression | Spark Time (ms) | Comet Relative | Slowdown |
| --- | --- | --- | --- | --- |
| CometDatetimeExpressionBenchmark | Timestamp Truncate | 134.0 | 0.6X | 40.0% |
| CometDatetimeExpressionBenchmark | Date Truncate | 34.0 | 0.8X | 20.0% |
| CometDatetimeExpressionBenchmark | Timestamp Extract - year | 61.0 | 0.8X | 20.0% |
| CometDatetimeExpressionBenchmark | Date Extract - year | 24.0 | 0.9X | 10.0% |
| CometDatetimeExpressionBenchmark | Date Arithmetic - date_add | 24.0 | 0.9X | 10.0% |

Arrays

| Benchmark File | Expression | Spark Time (ms) | Comet Relative | Slowdown |
| --- | --- | --- | --- | --- |
| CometArrayExpressionBenchmark | array_remove | 12.0 | 0.5X | 50.0% |
| CometArrayExpressionBenchmark | array_compact | 13.0 | 0.5X | 50.0% |
| CometArrayExpressionBenchmark | array_max | 13.0 | 0.8X | 20.0% |
| CometArrayExpressionBenchmark | array_min | 12.0 | 0.8X | 20.0% |
| CometArrayExpressionBenchmark | array_contains | 15.0 | 0.9X | 10.0% |
| CometArrayExpressionBenchmark | array_distinct | 14.0 | 0.9X | 10.0% |
| CometArrayExpressionBenchmark | array_append | 12.0 | 0.9X | 10.0% |
| CometArrayExpressionBenchmark | arrays_overlap | 12.0 | 0.9X | 10.0% |
| CometArrayExpressionBenchmark | array_insert | 11.0 | 0.9X | 10.0% |
| CometArrayExpressionBenchmark | array_join | 13.0 | 0.9X | 10.0% |

Hash

| Benchmark File | Expression | Spark Time (ms) | Comet Relative | Slowdown |
| --- | --- | --- | --- | --- |
| CometHashExpressionBenchmark | xxhash64_multi | 15.0 | 0.9X | 10.0% |
| CometHashExpressionBenchmark | murmur3_hash_single | 13.0 | 0.9X | 10.0% |
| CometHashExpressionBenchmark | murmur3_hash_multi | 14.0 | 0.9X | 10.0% |
| CometHashExpressionBenchmark | sha2_224 | 28.0 | 0.8X | 20.0% |
| CometHashExpressionBenchmark | sha2_256 | 29.0 | 0.8X | 20.0% |
| CometHashExpressionBenchmark | sha2_512 | 34.0 | 0.6X | 40.0% |
| CometHashExpressionBenchmark | sha2_384 | 34.0 | 0.7X | 30.0% |

Bitwise

| Benchmark File | Expression | Spark Time (ms) | Comet Relative | Slowdown |
| --- | --- | --- | --- | --- |
| CometBitwiseExpressionBenchmark | shift_right_unsigned | 10.0 | 0.9X | 10.0% |
| CometBitwiseExpressionBenchmark | shift_left | 10.0 | 0.7X | 30.0% |
| CometBitwiseExpressionBenchmark | bitwise_or | 12.0 | 0.8X | 20.0% |
| CometBitwiseExpressionBenchmark | bitwise_xor | 11.0 | 0.8X | 20.0% |
| CometBitwiseExpressionBenchmark | bitwise_not | 10.0 | 0.8X | 20.0% |
| CometBitwiseExpressionBenchmark | shift_right | 10.0 | 0.8X | 20.0% |
| CometBitwiseExpressionBenchmark | bit_count | 10.0 | 0.8X | 20.0% |

Cast

| Benchmark File | Expression | Spark Time (ms) | Comet Relative | Slowdown |
| --- | --- | --- | --- | --- |
| CometCastStringToNumericBenchmark | CAST String to BYTE | 59.0 | 0.8X | 20.0% |
| CometCastStringToNumericBenchmark | CAST String to SHORT | 59.0 | 0.8X | 20.0% |
| CometCastStringToNumericBenchmark | CAST String to INT | 56.0 | 0.8X | 20.0% |
| CometCastStringToNumericBenchmark | CAST String to LONG | 59.0 | 0.8X | 20.0% |

Comparison

| Benchmark File | Expression | Spark Time (ms) | Comet Relative | Slowdown |
| --- | --- | --- | --- | --- |
| CometComparisonExpressionBenchmark | greater_than | 11.0 | 0.8X | 20.0% |
| CometComparisonExpressionBenchmark | is_null | 10.0 | 0.8X | 20.0% |
| CometComparisonExpressionBenchmark | is_nan_float | 10.0 | 0.8X | 20.0% |
| CometComparisonExpressionBenchmark | not_equal_to | 13.0 | 0.9X | 10.0% |
| CometComparisonExpressionBenchmark | less_than | 12.0 | 0.9X | 10.0% |
| CometComparisonExpressionBenchmark | less_than_or_equal | 11.0 | 0.9X | 10.0% |
| CometComparisonExpressionBenchmark | greater_than_or_equal | 11.0 | 0.9X | 10.0% |
| CometComparisonExpressionBenchmark | equal_null_safe | 10.0 | 0.9X | 10.0% |
| CometComparisonExpressionBenchmark | is_not_null | 10.0 | 0.9X | 10.0% |
| CometComparisonExpressionBenchmark | and | 11.0 | 0.9X | 10.0% |
| CometComparisonExpressionBenchmark | or | 11.0 | 0.9X | 10.0% |
| CometComparisonExpressionBenchmark | not | 10.0 | 0.9X | 10.0% |
| CometComparisonExpressionBenchmark | in_list | 10.0 | 0.9X | 10.0% |
| CometComparisonExpressionBenchmark | not_in_list | 11.0 | 0.9X | 10.0% |

Math

| Benchmark File | Expression | Spark Time (ms) | Comet Relative | Slowdown |
| --- | --- | --- | --- | --- |
| CometMathExpressionBenchmark | hex_int | 11.0 | 0.7X | 30.0% |
| CometMathExpressionBenchmark | floor | 10.0 | 0.8X | 20.0% |
| CometMathExpressionBenchmark | hex_long | 11.0 | 0.8X | 20.0% |
| CometMathExpressionBenchmark | unhex | 13.0 | 0.8X | 20.0% |
| CometMathExpressionBenchmark | unary_minus | 10.0 | 0.8X | 20.0% |
| CometMathExpressionBenchmark | ceil | 11.0 | 0.9X | 10.0% |
| CometMathExpressionBenchmark | round | 19.0 | 0.9X | 10.0% |
| CometMathExpressionBenchmark | atan2 | 11.0 | 0.9X | 10.0% |
| CometMathExpressionBenchmark | log | 11.0 | 0.9X | 10.0% |
| CometMathExpressionBenchmark | log10 | 11.0 | 0.9X | 10.0% |

Others

| Benchmark File | Expression | Spark Time (ms) | Comet Relative | Slowdown |
| --- | --- | --- | --- | --- |
| CometConditionalExpressionBenchmark | Case When Expr | 41.0 | 0.8X | 20.0% |
| CometPredicateExpressionBenchmark | in Expr | 42.0 | 0.8X | 20.0% |
| CometConditionalExpressionBenchmark | If Expr | 38.0 | 0.9X | 10.0% |

Describe the potential solution

No response

Additional context

No response
