⚡️ Speed up method Algorithms.fibonacci by 151% #1415
Closed
codeflash-ai[bot] wants to merge 1 commit into omni-java from
The optimized code achieves a **151% speedup (13.4 ms → 5.35 ms)** through three key micro-optimizations in the tight loop of the fast doubling Fibonacci algorithm:

**1. Mask-based iteration eliminates variable shifts:** The original code uses `for (int i = highestBit; i >= 0; i--)` with `(n >>> i)` performing a variable-width right shift on every iteration. The optimized version replaces this with a pre-computed mask (`mask = 1 << highestBit`) that shifts right by a fixed amount (`mask >>= 1`). This change:
- Eliminates the variable `i` and its decrement operation
- Replaces expensive variable-width shifts (`n >>> i`) with a simpler fixed-width shift of the mask
- Changes the bit test from `(n >>> i) & 1` to `n & mask`, reducing operations from shift+AND to just AND

**2. Addition replaces left shift:** Changing `b << 1` to `b + b` is slightly faster on many processors because addition can execute in parallel with other ALU operations, whereas shifts may compete for the same execution units. While seemingly trivial, in a tight loop executing many iterations for large Fibonacci numbers, this compounds into measurable savings.

**3. Eliminates repeated bit indexing:** The original code computes `(n >>> i) & 1` on each iteration, requiring both a shift of `n` and a mask operation. The optimized version tests `n & mask` once per iteration, where the mask is pre-positioned, eliminating the need to shift `n` repeatedly.

**Why this matters:** For large Fibonacci indices, the loop executes O(log n) times (up to 31 iterations for 32-bit integers). These micro-optimizations (removing variable shifts, simplifying arithmetic, and reducing per-iteration operations) compound across iterations. The 2.5x runtime improvement demonstrates that even in algorithmically optimal O(log n) code, careful attention to low-level operations in hot loops can yield substantial performance gains.
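The two loop shapes described above can be sketched side by side. This is a hypothetical reconstruction: the actual source of `Algorithms.fibonacci` is not shown in this conversation, so the class name `FibLoopSketch`, the method names, the `long` return type, and the handling of `n <= 0` are all assumptions made for illustration.

```java
// Hypothetical sketch; not the PR's actual code. Names and signatures are assumed.
final class FibLoopSketch {

    // Index-based loop: shifts n by a variable amount i on every iteration.
    static long fibIndexLoop(int n) {
        if (n <= 0) return 0L;                 // assumed handling of non-positive input
        long a = 0L, b = 1L;                   // invariant: a = F(k), b = F(k+1)
        int highestBit = 31 - Integer.numberOfLeadingZeros(n);
        for (int i = highestBit; i >= 0; i--) {
            long c = a * ((b << 1) - a);       // F(2k)   = F(k) * (2*F(k+1) - F(k))
            long d = a * a + b * b;            // F(2k+1) = F(k)^2 + F(k+1)^2
            if (((n >>> i) & 1) == 0) {
                a = c; b = d;                  // k -> 2k
            } else {
                a = d; b = c + d;              // k -> 2k + 1
            }
        }
        return a;
    }

    // Mask-based loop: a single-bit mask walks from the highest set bit down to bit 0.
    static long fibMaskLoop(int n) {
        if (n <= 0) return 0L;
        long a = 0L, b = 1L;
        int highestBit = 31 - Integer.numberOfLeadingZeros(n);
        // n > 0 means highestBit <= 30, so mask starts positive and >>= never sign-extends.
        for (int mask = 1 << highestBit; mask != 0; mask >>= 1) {
            long c = a * ((b + b) - a);        // b + b in place of b << 1
            long d = a * a + b * b;
            if ((n & mask) == 0) {
                a = c; b = d;
            } else {
                a = d; b = c + d;
            }
        }
        return a;
    }

    public static void main(String[] args) {
        System.out.println(fibMaskLoop(10));   // prints 55
        for (int n = 0; n <= 2000; n++) {
            if (fibIndexLoop(n) != fibMaskLoop(n)) {
                throw new AssertionError("mismatch at n = " + n);
            }
        }
        System.out.println("index loop and mask loop agree for n in [0, 2000]");
    }
}
```

Both methods wrap on `long` overflow in the same way, so they agree even for indices whose Fibonacci value no longer fits in 64 bits. Whether the mask form is actually faster depends on the JIT and the hardware; the sketch only shows that the two forms compute the same values.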
The optimizations preserve exact mathematical behavior, overflow semantics, and API compatibility while purely improving execution efficiency.
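The claim about preserved overflow semantics rests on two identities in Java's two's-complement arithmetic: `b << 1` and `b + b` produce the same wrapped `long`, and `(n >>> i) & 1` is nonzero exactly when `n & (1 << i)` is nonzero. A minimal check, with the class and method names (`EquivalenceSketch`, `doublingMatches`, `bitTestMatches`) invented here for illustration:

```java
// Hypothetical check; these helpers are illustrative and not part of the PR.
final class EquivalenceSketch {

    // b << 1 and b + b both yield 2*b modulo 2^64, so they wrap identically.
    static boolean doublingMatches(long b) {
        return (b << 1) == (b + b);
    }

    // Testing bit i by shifting n down, or by ANDing with a pre-positioned mask.
    static boolean bitTestMatches(int n, int i) {
        int mask = 1 << i;                      // negative when i == 31
        return (((n >>> i) & 1) != 0) == ((n & mask) != 0);
    }

    public static void main(String[] args) {
        long[] longs = {0L, 1L, -1L, 1L << 62, Long.MAX_VALUE, Long.MIN_VALUE};
        for (long b : longs) {
            if (!doublingMatches(b)) throw new AssertionError("doubling differs for " + b);
        }
        int[] ints = {0, 1, 10, 0x55555555, Integer.MAX_VALUE, Integer.MIN_VALUE, -1};
        for (int n : ints) {
            for (int i = 0; i < 32; i++) {
                if (!bitTestMatches(n, i)) throw new AssertionError("bit test differs: " + n + ", " + i);
            }
        }
        System.out.println("all equivalence checks passed");
    }
}
```

One caveat: the mask test has to compare against zero (`!= 0`), not use `> 0`, because `1 << 31` is negative. Likewise, a mask that started at bit 31 would need `>>>=` rather than `>>=` to avoid sign extension. Neither case arises for a non-negative `int` index, whose highest set bit is at most 30.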
📄 151% (1.51x) speedup for `Algorithms.fibonacci` in `code_to_optimize/java/src/main/java/com/example/Algorithms.java`
⏱️ Runtime: 13.4 milliseconds → 5.35 milliseconds (best of 5 runs)
✅ Correctness verification report:
To edit these changes, `git checkout codeflash/optimize-Algorithms.fibonacci-mlbg2v4t` and push.