-
-
Notifications
You must be signed in to change notification settings - Fork 14.3k
JumpThreading: compute place and value indices on-demand #142881
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
JumpThreading: compute place and value indices on-demand Perf experiment r? `@ghost`
This comment has been minimized.
This comment has been minimized.
|
☀️ Try build successful - checks-actions |
This comment has been minimized.
This comment has been minimized.
|
Finished benchmarking commit (1587bef): comparison URL. Overall result: ❌✅ regressions and improvements - please read the text belowBenchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @bors rollup=never Instruction countOur most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage)Results (primary 3.6%, secondary 2.0%)A less reliable metric. May be of interest, but not used to determine the overall result above.
CyclesThis benchmark run did not return any relevant results for this metric. Binary sizeResults (primary -0.1%, secondary 0.1%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Bootstrap: 689.082s -> 684.233s (-0.70%) |
This comment has been minimized.
This comment has been minimized.
|
Some changes occurred to MIR optimizations cc @rust-lang/wg-mir-opt |
|
Some changes occurred in coverage tests. cc @Zalathar |
|
r? compiler |
d08b44c to
f2018e5
Compare
|
☔ The latest upstream changes (presumably #144469) made this pull request unmergeable. Please resolve the merge conflicts. |
This comment has been minimized.
This comment has been minimized.
|
☔ The latest upstream changes (presumably #142821) made this pull request unmergeable. Please resolve the merge conflicts. |
|
This PR was rebased onto a different main commit. Here's a range-diff highlighting what actually changed. Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers. |
|
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
JumpThreading: compute place and value indices on-demand
This comment has been minimized.
This comment has been minimized.
|
Finished benchmarking commit (27d3ab5): comparison URL. Overall result: ❌✅ regressions and improvements - please read the text belowBenchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @bors rollup=never Instruction countOur most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage)Results (primary 0.5%, secondary -2.2%)A less reliable metric. May be of interest, but not used to determine the overall result above.
CyclesResults (primary -1.9%, secondary -4.6%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary sizeResults (primary -0.2%, secondary 0.0%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Bootstrap: 479.155s -> 476.389s (-0.58%) |
|
The perf results are quite compelling. Well done! @bors r+ |
|
☀️ Test successful - checks-actions |
What is this?This is an experimental post-merge analysis report that shows differences in test outcomes between the merged PR and its parent PR.Comparing 38ed770 (parent) -> c7aa99f (this PR) Test differencesShow 3 test diffs3 doctest diffs were found. These are ignored, as they are noisy. Test dashboardRun cargo run --manifest-path src/ci/citool/Cargo.toml -- \
test-dashboard c7aa99f36ccad2c127edec5be493cab0b152f436 --output-dir test-dashboardAnd then open Job duration changes
How to interpret the job duration changes?Job durations can vary a lot, based on the actual runner instance |
|
Finished benchmarking commit (c7aa99f): comparison URL. Overall result: ❌✅ regressions and improvements - please read the text belowOur benchmarks found a performance regression caused by this PR. Next Steps:
@rustbot label: +perf-regression Instruction countOur most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage)Results (primary -1.0%, secondary -3.8%)A less reliable metric. May be of interest, but not used to determine the overall result above.
CyclesResults (secondary 4.7%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary sizeResults (primary -0.3%, secondary -0.2%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Bootstrap: 486.937s -> 480.868s (-1.25%) |
|
Improvements in icounts heavily outweigh the one regression (which is also a proc-macro in an opt scenario: not a default cargo config) @rustbot label: +perf-regression-triaged |
Profiling JumpThreading reveals that a large part of the runtime happens constructing the place and value
Map. This is unfortunate, as jump-threading may end up not even doing anything.The cause for this large up-front cost is following:
Mapattempts to create aPlaceIndexfor each place that may hold a relevant value. This means all places that appear in MIR, but also all places whose value is accessed by a projection of a copy of a larger place.This PR refactors the creation of
Mapto happen on-demand: place and value indices are created when threading computation happens.The up-front mode is still relevant for DataflowConstProp, so is not touched.