Conversation

@mattsu2020
Contributor

@mattsu2020 mattsu2020 commented Nov 13, 2025

Performance improvement for large numbers

Fixes this issue:
https://bugs.launchpad.net/ubuntu/+source/rust-coreutils/+bug/2131212

@github-actions

GNU testsuite comparison:

Skip an intermittent issue tests/misc/tee (fails in this run but passes in the 'main' branch)
Skip an intermittent issue tests/tail/overlay-headers (fails in this run but passes in the 'main' branch)

@github-actions

GNU testsuite comparison:

Skipping an intermittent issue tests/tail/overlay-headers (passes in this run but fails in the 'main' branch)

@sylvestre
Contributor

could you please run hyperfine with the three programs? gnu, without the patch and with the patch
and share the full results here? thanks :)

@github-actions

GNU testsuite comparison:

Skip an intermittent issue tests/tail/overlay-headers (fails in this run but passes in the 'main' branch)

@mattsu2020
Contributor Author

> could you please run hyperfine with the three programs? gnu, without the patch and with the patch and share the full results here? thanks :)

Implementation Details

GMP 6.3.0 and GNU coreutils 9.5 were built and installed from source.

Created factor_numbers_u128_repeat.txt (60 lines) as the benchmark input: 6 composite numbers between 64 and 128 bits, each repeated 10 times. Confirmed that all 3 implementations factor every number completely, then reran hyperfine.
Both Rust binaries used the release profile (target/profiling/factor).
Hyperfine execution results
Command (run once per binary, with <factor binary> standing for each path listed in the results below): hyperfine --warmup 3 --runs 12 "<factor binary> < factor_numbers_u128_repeat.txt"

| Implementation | Average time (s) | Standard deviation (s) | Minimum – Maximum (s) |
|---|---|---|---|
| GNU coreutils 9.5 (local-gnu/bin/factor) | 6.718 | 0.106 | 6.594 – 7.020 |
| Old implementation (prev_worktree/target/profiling/factor) | 6.125 | 1.942 | 2.648 – 8.508 |
| After patch application (target/profiling/factor) | 6.993 | 1.585 | 4.299 – 9.457 |

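To reduce variance, we used 3 warm-up runs and 12 measured runs, but the Rust versions still show relatively high dispersion due to the randomized algorithm. With standard deviations of 1.6–1.9 s, the gap between the old and patched means (6.125 s vs 6.993 s) is within run-to-run noise. For more stable numbers, the benchmark should be rerun under low system load or with CPU pinning.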
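
As a rough check on how much weight the difference in means can carry, the figures from the results can be plugged into a Welch-style t statistic. This is only a sketch: `welch_t` is a hypothetical helper that is not part of the PR, and it assumes the 12 runs per binary are independent.

```rust
/// Welch-style t statistic for two samples of equal size `runs`.
/// All inputs are in seconds; the result is dimensionless.
fn welch_t(mean_a: f64, sd_a: f64, mean_b: f64, sd_b: f64, runs: f64) -> f64 {
    // Standard error of the difference between the two means.
    let standard_error = (sd_a * sd_a / runs + sd_b * sd_b / runs).sqrt();
    (mean_b - mean_a) / standard_error
}
```

For the old and patched rows, `welch_t(6.125, 1.942, 6.993, 1.585, 12.0)` comes out at roughly 1.2, well short of the value of about 2 usually needed before calling a difference significant.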
Behavior with inputs exceeding 128 bits

For factor_numbers.txt (numbers up to ~260 bits), both the GNU version and the patched version achieved complete factorization. The old implementation printed "factor: Factorization incomplete. Remainders exist." and exited with code 1. This confirms the improved support for large integers.
factor_numbers_u128_repeat.txt

return true;
}
// even check: candidate % 2 == 0
if (candidate & BigUint::from_u32(1).unwrap()).is_zero() {
Contributor

maybe create a function is_even

Contributor Author

Done

let mut odd_component = candidate - &one;
let mut power_of_two = 0u32;
// while odd_component is even
while (&odd_component & BigUint::from_u32(1).unwrap()).is_zero() {
Contributor

especially as it is done here too

Contributor Author

Done
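
A minimal sketch of what the requested helper and the loop that uses it could look like. It uses `u128` as a stand-in for `BigUint` so that it compiles without external crates; the function names are illustrative and not taken from the PR. For `BigUint` itself, the `is_even` method of the `num_integer::Integer` trait would presumably serve the same purpose.

```rust
/// Returns true when the lowest bit of `n` is clear.
fn is_even(n: u128) -> bool {
    n & 1 == 0
}

/// Writes `candidate - 1` as `odd_component * 2^power_of_two`,
/// the decomposition a Miller-Rabin test starts from.
/// Only meaningful for odd candidates >= 3.
fn decompose(candidate: u128) -> (u128, u32) {
    assert!(candidate >= 3 && !is_even(candidate));
    let mut odd_component = candidate - 1;
    let mut power_of_two = 0u32;
    // candidate - 1 is non-zero here, so this loop terminates.
    while is_even(odd_component) {
        odd_component >>= 1;
        power_of_two += 1;
    }
    (odd_component, power_of_two)
}
```

With this, both call sites in the hunks above reduce to `is_even(...)`, and `decompose(13)` splits 12 into 3 * 2^2.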

// Use a deterministic LCG to generate parameter sequences.
fn lcg_next(x: &mut u128) {
*x = x
.wrapping_mul(6364136223846793005)
Contributor

please move this magic number into a variable

Contributor Author

Done

fn lcg_next(x: &mut u128) {
*x = x
.wrapping_mul(6364136223846793005)
.wrapping_add(1442695040888963407);
Contributor

same

Contributor Author

Done
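
One way the two literals could be named, as a sketch. The constant names are hypothetical and not from the PR; the values are the multiplier and increment of Knuth's MMIX linear congruential generator, which is defined modulo 2^64, whereas the wrapping arithmetic here works modulo 2^128.

```rust
/// Multiplier of Knuth's MMIX LCG.
const LCG_MULTIPLIER: u128 = 6364136223846793005;
/// Increment of Knuth's MMIX LCG.
const LCG_INCREMENT: u128 = 1442695040888963407;

/// Advances the generator state in place: x <- x * a + c (mod 2^128).
fn lcg_next(x: &mut u128) {
    *x = x.wrapping_mul(LCG_MULTIPLIER).wrapping_add(LCG_INCREMENT);
}
```

Because the sequence depends only on the starting state, two runs from the same seed produce the same parameters, which keeps factorization attempts reproducible.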

// Search parameters: choose bounds based on bit length.
// Avoid overly large limits; when exhausted, treat as failure to find a factor.
let max_tries: u64 = 16;
let max_iter: u64 = (bits * bits).clamp(10_000, 200_000);
Contributor

why these values ?

Contributor

also, could this overflow ?

Contributor Author

These values are provisional; we are still tuning them.
Because a maximum number of iterations is set, the search always stops.
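
On the overflow question: with `bits: u64`, `bits * bits` can only overflow when `bits` reaches 2^32, that is, for an input of more than four billion bits, so it is not reachable in practice. If the code should be robust regardless, a saturating multiply removes the case entirely. The sketch below uses a hypothetical `max_iterations` helper and keeps the provisional bounds from the hunk above.

```rust
/// Iteration budget derived from the bit length of the number to factor.
/// saturating_mul pins the product at u64::MAX instead of overflowing,
/// and clamp then brings it back into the tuned range.
fn max_iterations(bits: u64) -> u64 {
    bits.saturating_mul(bits).clamp(10_000, 200_000)
}
```

A 64-bit input gets the lower bound (64 * 64 = 4,096 is below 10,000), a 256-bit input gets 65,536, and anything from about 448 bits upward hits the 200,000 cap.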

let max_tries: u64 = 16;
let max_iter: u64 = (bits * bits).clamp(10_000, 200_000);

let mut seed: u128 = 0x9e3779b97f4a7c15;
Contributor

please add a comment explaining what it is

Contributor Author

Done
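
For reference, a sketch of how the seed could be documented. The constant and helper names are hypothetical; the value itself is 2^64 divided by the golden ratio, truncated to an integer, a constant commonly used to spread bits in hashing and in generators such as SplitMix64.

```rust
/// floor(2^64 / phi), where phi is the golden ratio.
/// Used only as an arbitrary, well-mixed starting state.
const GOLDEN_RATIO_SEED: u128 = 0x9e37_79b9_7f4a_7c15;

/// Returns the seed as a fraction of 2^64, which should be close to 1/phi.
fn seed_fraction() -> f64 {
    GOLDEN_RATIO_SEED as f64 / 2f64.powi(64)
}
```

`seed_fraction()` evaluates to approximately 0.618, which is a quick way to confirm where the literal comes from.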


while current_gcd == one && iter < max_iter {
// Brent variant: use batched gcd.
let mut inner_iter = 0;
Contributor

please rename this variable to something more meaningful, like batch_iter

Contributor Author

Done
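
To illustrate what "Brent variant with batched gcd" refers to, here is a self-contained sketch on `u64` rather than `BigUint`. It is not the PR's code: the function names, the `BATCH` size, and the choice of x^2 + c as the iteration function are assumptions made for the example.

```rust
fn gcd(mut a: u64, mut b: u64) -> u64 {
    while b != 0 {
        let t = a % b;
        a = b;
        b = t;
    }
    a
}

fn mul_mod(a: u64, b: u64, n: u64) -> u64 {
    ((a as u128 * b as u128) % n as u128) as u64
}

/// One step of the pseudo-random map f(x) = x^2 + c (mod n).
fn step(x: u64, c: u64, n: u64) -> u64 {
    ((mul_mod(x, x, n) as u128 + c as u128) % n as u128) as u64
}

/// Pollard's rho with Brent's cycle detection. Differences are multiplied
/// together and a gcd is taken once per batch instead of once per step.
/// Returns a non-trivial factor of `n`, or None if none was found.
fn pollard_rho_brent(n: u64, x0: u64, c: u64, max_iter: u64) -> Option<u64> {
    const BATCH: u64 = 64; // hypothetical batch size
    let mut y = x0 % n;
    let mut x = y;
    let mut batch_start = y;
    let (mut g, mut q, mut r, mut iter) = (1, 1, 1u64, 0);
    while g == 1 && iter < max_iter {
        x = y;
        for _ in 0..r {
            y = step(y, c, n);
        }
        let mut batch_iter = 0;
        while batch_iter < r && g == 1 {
            batch_start = y;
            let batch_len = BATCH.min(r - batch_iter);
            for _ in 0..batch_len {
                y = step(y, c, n);
                q = mul_mod(q, x.abs_diff(y), n);
            }
            g = gcd(q, n);
            batch_iter += batch_len;
            iter += batch_len;
        }
        r *= 2;
    }
    if g == n {
        // The batch overshot: replay it one step at a time.
        g = 1;
        let mut replay = batch_start;
        while g == 1 {
            replay = step(replay, c, n);
            g = gcd(x.abs_diff(replay), n);
        }
    }
    if g == 1 || g == n { None } else { Some(g) }
}
```

Calling `pollard_rho_brent(8051, 2, 1, 10_000)` yields a factor of 8051 = 83 * 97. When the function returns None, a caller would typically retry with a different `c` or `x0`, which is the role the `max_tries` bound plays in the hunks above.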


// If n is small enough, use num_prime's factorize128 for speed.
if n.bits() <= 128 {
if let Ok(x128) = n.to_string().parse::<u128>() {
Contributor

maybe investigate using a BigUint function directly here

Contributor Author

Done
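
The string round-trip in the hunk above can be avoided by converting from the number's digits directly. With num-bigint this would presumably be a call such as `u128::try_from(&n)` or `ToPrimitive::to_u128`; which one the PR ended up using is not shown here. The std-only sketch below shows the underlying idea on little-endian 64-bit digits, the layout that `BigUint::to_u64_digits` returns. `digits_to_u128` is a hypothetical name.

```rust
/// Converts little-endian 64-bit digits into a u128.
/// Returns None when the value needs more than 128 bits.
fn digits_to_u128(digits: &[u64]) -> Option<u128> {
    // Any non-zero digit beyond the second one does not fit.
    if digits.iter().skip(2).any(|&d| d != 0) {
        return None;
    }
    let low = digits.first().copied().unwrap_or(0) as u128;
    let high = digits.get(1).copied().unwrap_or(0) as u128;
    Some((high << 64) | low)
}
```

Compared with `to_string().parse()`, this involves no allocation and no decimal conversion, and the `None` case doubles as the "larger than 128 bits" check.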

@codspeed-hq

codspeed-hq bot commented Nov 16, 2025

CodSpeed Performance Report

Merging #9261 will improve performance by 19.21%

Comparing mattsu2020:factor_fix (c0333f3) with main (502f3b1)

Summary

⚡ 1 improvement
✅ 126 untouched
⏩ 6 skipped [1]

Benchmarks breakdown

| Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|
| factor_multiple_u64s[2] | 212.4 ms | 178.2 ms | +19.21% |

Footnotes

  1. 6 benchmarks were skipped, so the baseline results were used instead.

@github-actions

GNU testsuite comparison:

Skipping an intermittent issue tests/tail/overlay-headers (passes in this run but fails in the 'main' branch)

@github-actions

GNU testsuite comparison:

Skipping an intermittent issue tests/misc/tee (passes in this run but fails in the 'main' branch)

@github-actions

github-actions bot commented Dec 1, 2025

GNU testsuite comparison:

Skip an intermittent issue tests/tail/overlay-headers (fails in this run but passes in the 'main' branch)

@sylvestre
Contributor

Any idea why codspeed does not detect it?

@mattsu2020
Contributor Author

> Any idea why codspeed does not detect it?

If I were to address that, I would add test cases with large integers.

@github-actions

github-actions bot commented Dec 7, 2025

GNU testsuite comparison:

Skip an intermittent issue tests/tail/overlay-headers (fails in this run but passes in the 'main' branch)

@github-actions

github-actions bot commented Dec 8, 2025

GNU testsuite comparison:

Skip an intermittent issue tests/tail/overlay-headers (fails in this run but passes in the 'main' branch)

- Add num-integer dependency to support enhanced numeric operations.
- Refactor factorization logic to avoid redundant parsing and optimize u64/u128 paths.
- Improve handling of non-positive and invalid inputs to align with GNU factor behavior.
- Enhance large BigUint factoring with additional algorithms and clearer limitations.
- Integrate jemalloc allocator in factor benchmark suite for better memory profiling.
- Add jemalloc-ctl and jemallocator dependencies with OS-specific dev-dependencies.
- Implement logging of allocated and resident memory stats before benchmark runs.
- Update CI workflow to show output for uu_factor benchmarks without suppressing it.
- Enable precise memory usage tracking on Linux, macOS, and FreeBSD during benchmarking.
Add technical terms for memory allocation libraries to the cspell dictionary to prevent false positives in spellchecking.
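
The refactoring item above about avoiding redundant parsing and optimizing the u64/u128 paths can be pictured as a single parse followed by a dispatch on size. The sketch below is an illustration under assumptions, not the PR's implementation: the `Parsed` enum and `classify` function are hypothetical, and the large case simply keeps the digit string where the real code would build a `BigUint`.

```rust
#[derive(Debug, PartialEq)]
enum Parsed {
    Small(u64),    // fits the fast u64 path
    Medium(u128),  // fits the u128 path
    Large(String), // needs arbitrary precision
    Invalid,       // empty, negative, or non-numeric input
}

/// Parses the input once and picks the narrowest representation.
fn classify(input: &str) -> Parsed {
    let digits = input.trim();
    if digits.is_empty() || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return Parsed::Invalid;
    }
    match digits.parse::<u128>() {
        // Narrow to u64 without parsing the string a second time.
        Ok(value) => match u64::try_from(value) {
            Ok(small) => Parsed::Small(small),
            Err(_) => Parsed::Medium(value),
        },
        // All characters are digits, so a failure here means overflow.
        Err(_) => Parsed::Large(digits.to_string()),
    }
}
```

For example, `classify("12")` takes the `Small` path, `classify("18446744073709551616")` (2^64) takes the `Medium` path, and a 39-digit value above `u128::MAX` falls through to `Large`.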
@github-actions

GNU testsuite comparison:

Congrats! The gnu test tests/tail/follow-name is no longer failing!
