-
-
Notifications
You must be signed in to change notification settings - Fork 1.7k
factor:performance tuning #9261
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
GNU testsuite comparison: |
|
GNU testsuite comparison: |
1 similar comment
|
GNU testsuite comparison: |
|
could you please run hyperfine with the three programs? gnu, without the patch and with the patch |
|
GNU testsuite comparison: |
Implementation DetailsGMP 6.3.0 and GNU coreutils 9.5 were built and installed from source Created factor_numbers_u128_repeat.txt (60 lines) as benchmark input, containing 6 composite numbers ranging from 64 to 128 bits repeated 10 times. Confirmed factorization completion across all 3 implementations and reran Hyperfine.
To reduce variance, we adjusted to 3 warm-ups + 12 measurements, but the Rust version still shows relatively high dispersion due to its randomized algorithm. For greater stability, consider running at times of low system load or using CPU pinning. For factor_numbers.txt (max ~260 bits), both the GNU version and the patched version achieved complete factorization. The old implementation returned |
src/uu/factor/src/factor.rs
Outdated
| return true; | ||
| } | ||
| // even check: candidate % 2 == 0 | ||
| if (candidate & BigUint::from_u32(1).unwrap()).is_zero() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe create a function is_even
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
src/uu/factor/src/factor.rs
Outdated
| let mut odd_component = candidate - &one; | ||
| let mut power_of_two = 0u32; | ||
| // while odd_component is even | ||
| while (&odd_component & BigUint::from_u32(1).unwrap()).is_zero() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
esp as it is done here too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
src/uu/factor/src/factor.rs
Outdated
| // Use a deterministic LCG to generate parameter sequences. | ||
| fn lcg_next(x: &mut u128) { | ||
| *x = x | ||
| .wrapping_mul(6364136223846793005) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please move this magic number into a variable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
src/uu/factor/src/factor.rs
Outdated
| fn lcg_next(x: &mut u128) { | ||
| *x = x | ||
| .wrapping_mul(6364136223846793005) | ||
| .wrapping_add(1442695040888963407); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| // Search parameters: choose bounds based on bit length. | ||
| // Avoid overly large limits; when exhausted, treat as failure to find a factor. | ||
| let max_tries: u64 = 16; | ||
| let max_iter: u64 = (bits * bits).clamp(10_000, 200_000); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why these values ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also, could this overflow ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're setting this number for now as we fine-tune and determine the value.
Since the maximum number of times is set, it will stop.
src/uu/factor/src/factor.rs
Outdated
| let max_tries: u64 = 16; | ||
| let max_iter: u64 = (bits * bits).clamp(10_000, 200_000); | ||
|
|
||
| let mut seed: u128 = 0x9e3779b97f4a7c15; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please, add comment explain what it is
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
src/uu/factor/src/factor.rs
Outdated
|
|
||
| while current_gcd == one && iter < max_iter { | ||
| // Brent variant: use batched gcd. | ||
| let mut inner_iter = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please rename this variable for something more meaningful
like
batch_iter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
src/uu/factor/src/factor.rs
Outdated
|
|
||
| // If n is small enough, use num_prime's factorize128 for speed. | ||
| if n.bits() <= 128 { | ||
| if let Ok(x128) = n.to_string().parse::<u128>() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe investigate using a BigUint function directly here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
CodSpeed Performance ReportMerging #9261 will improve performance by 19.21%Comparing Summary
Benchmarks breakdown
Footnotes
|
|
GNU testsuite comparison: |
09d5c51 to
fe34cf2
Compare
|
GNU testsuite comparison: |
|
GNU testsuite comparison: |
|
Any idea why codspeed does not detect it? |
If I were to consider it, I would create test cases with large integers. |
|
GNU testsuite comparison: |
1 similar comment
|
GNU testsuite comparison: |
- Add num-integer dependency to support enhanced numeric operations. - Refactor factorization logic to avoid redundant parsing and optimize u64/u128 paths. - Improve handling of non-positive and invalid inputs to align with GNU factor behavior. - Enhance large BigUint factoring with additional algorithms and clearer limitations.
0dfc79b to
aea625a
Compare
- Integrate jemalloc allocator in factor benchmark suite for better memory profiling - Add jemalloc-ctl and jemallocator dependencies with OS-specific dev-dependencies - Implement logging of allocated and resident memory stats before benchmark runs - Update CI workflow to show output for uu_factor benchmarks without suppressing it - Enables precise memory usage tracking on Linux, macOS, and FreeBSD during benchmarking
Add technical terms for memory allocation libraries to the cspell dictionary to prevent false positives in spellchecking.
|
GNU testsuite comparison: |
Performance improvement for large numbers
fix this issue
https://bugs.launchpad.net/ubuntu/+source/rust-coreutils/+bug/2131212