Skip to content

Conversation

@andygrove
Copy link
Member

@andygrove andygrove commented Jan 6, 2026

Which issue does this PR close?

N/A

Rationale for this change

This follows on from #3017 and makes some additional optimizations.

See comment #3048 (comment) for criterion benchmark results.

What changes are included in this PR?

  • use trim_ascii instead of custom trim logic
  • use two loops in do_parse_string_to_int_legacy and remove mutable parse_sign_and_digits
  • stop matching on eval mode for every row (do it once per batch instead)

How are these changes tested?

Existing tests.


macro_rules! cast_utf8_to_int {
($array:expr, $eval_mode:expr, $array_type:ty, $cast_method:ident) => {{
($array:expr, $array_type:ty, $parse_fn:expr) => {{
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pass in the parsing function as a closure rather than passing in the eval mode and then matching on that for each row.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thats super nice

Comment on lines -2246 to -2250
match eval_mode {
EvalMode::Legacy => do_parse_string_to_int_legacy(str, min_value),
EvalMode::Ansi => do_parse_string_to_int_ansi(str, type_name, min_value),
EvalMode::Try => do_parse_string_to_int_try(str, min_value),
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was happening once per row

@andygrove
Copy link
Member Author

Current performance compared to main branch:

cast_string_to_int/legacy/i8
                        time:   [5.9209 µs 5.9392 µs 5.9592 µs]
                        change: [−11.695% −11.370% −11.025%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high mild
cast_string_to_int/legacy/i16
                        time:   [6.0520 µs 6.0623 µs 6.0752 µs]
                        change: [−19.180% −16.832% −14.698%] (p = 0.00 < 0.05)
                        Performance has improved.
cast_string_to_int/legacy/i32
                        time:   [10.351 µs 10.465 µs 10.588 µs]
                        change: [−8.5234% −7.6387% −6.8160%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  11 (11.00%) low mild
cast_string_to_int/legacy/i64
                        time:   [9.8147 µs 9.8431 µs 9.8750 µs]
                        change: [−20.510% −20.227% −19.963%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

cast_string_to_int/ansi/i8
                        time:   [5.9662 µs 5.9738 µs 5.9833 µs]
                        change: [+3.5123% +3.8189% +4.1332%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
  5 (5.00%) high mild
  11 (11.00%) high severe
cast_string_to_int/ansi/i16
                        time:   [6.0278 µs 6.0633 µs 6.0965 µs]
                        change: [+4.7100% +5.2768% +5.8408%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 18 outliers among 100 measurements (18.00%)
  17 (17.00%) high mild
  1 (1.00%) high severe
cast_string_to_int/ansi/i32
                        time:   [11.561 µs 12.055 µs 12.488 µs]
                        change: [−4.3485% +0.7326% +5.7014%] (p = 0.77 > 0.05)
                        No change in performance detected.
cast_string_to_int/ansi/i64
                        time:   [8.9904 µs 9.0207 µs 9.0561 µs]
                        change: [−12.821% −12.447% −12.056%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 21 outliers among 100 measurements (21.00%)
  6 (6.00%) high mild
  15 (15.00%) high severe

cast_string_to_int/try/i8
                        time:   [5.3065 µs 5.3077 µs 5.3088 µs]
                        change: [−12.228% −11.963% −11.694%] (p = 0.00 < 0.05)
                        Performance has improved.
cast_string_to_int/try/i16
                        time:   [5.3181 µs 5.3339 µs 5.3510 µs]
                        change: [−14.996% −12.856% −11.115%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
  9 (9.00%) high mild
  5 (5.00%) high severe
cast_string_to_int/try/i32
                        time:   [8.9089 µs 8.9433 µs 8.9790 µs]
                        change: [−23.521% −21.359% −19.212%] (p = 0.00 < 0.05)
                        Performance has improved.
cast_string_to_int/try/i64
                        time:   [8.9110 µs 8.9753 µs 9.0582 µs]
                        change: [−33.677% −30.213% −26.785%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) high mild
  6 (6.00%) high severe

cast_string_to_int/legacy_decimals/i32
                        time:   [10.346 µs 10.361 µs 10.383 µs]
                        change: [−16.404% −15.946% −15.512%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 16 outliers among 100 measurements (16.00%)
  6 (6.00%) high mild
  10 (10.00%) high severe
cast_string_to_int/legacy_decimals/i64
                        time:   [10.409 µs 10.434 µs 10.467 µs]
                        change: [−18.687% −17.862% −17.208%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 18 outliers among 100 measurements (18.00%)
  7 (7.00%) high mild
  11 (11.00%) high severe

@codecov-commenter
Copy link

codecov-commenter commented Jan 6, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 59.50%. Comparing base (f09f8af) to head (1eb4b39).
⚠️ Report is 832 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #3048      +/-   ##
============================================
+ Coverage     56.12%   59.50%   +3.37%     
- Complexity      976     1381     +405     
============================================
  Files           119      167      +48     
  Lines         11743    15522    +3779     
  Branches       2251     2575     +324     
============================================
+ Hits           6591     9236    +2645     
- Misses         4012     4989     +977     
- Partials       1140     1297     +157     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@andygrove andygrove changed the title minor: small improvements in cast from string to int [WIP] perf: Additional optimizations for cast from string to int [WIP] Jan 6, 2026
@andygrove andygrove changed the title perf: Additional optimizations for cast from string to int [WIP] perf: Additional optimizations for cast from string to int Jan 6, 2026
@andygrove andygrove marked this pull request as ready for review January 7, 2026 16:19
.map(|v| v as i16))
fn parse_string_to_i8_legacy(str: &str) -> SparkResult<Option<i8>> {
match do_parse_string_to_int_legacy::<i32>(str, i32::MIN)? {
None => Ok(None),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
None => Ok(None),

?

Copy link
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @andygrove it is similar to #2600 (comment) suggestion and thanks for addressing those issues

@andygrove
Copy link
Member Author

Thanks @andygrove it is similar to #2600 (comment) suggestion and thanks for addressing those issues

I'll take a look next.

@andygrove andygrove merged commit c7aad67 into apache:main Jan 8, 2026
119 checks passed
@andygrove andygrove deleted the cast-to-int-perf branch January 8, 2026 18:34
@coderfender
Copy link
Contributor

thank you @andygrove

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants