Fix a +200% Windows performance regression caused by PR #4897. #8251
+11
−6
On Windows with the Visual Studio compiler, constructing a std::stringstream object is extremely slow: the constructor calls the std::locale constructor, which in turn takes some kind of process-global locale mutex lock (maybe to get the current system locale?).
Visual Studio profiler showed this hotspot as:
And Markus Stange's fantastic Samply profiler highlighted the size of the issue as
where 79% of total time in wasm-opt was taken by the std::stringstream constructor on Windows.
Live link to the above profile: https://share.firefox.dev/4a2wen8
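For anyone who wants to check the constructor cost on their own toolchain, a minimal timing sketch could look like the following. The helper name nanosecondsToConstruct is hypothetical and not part of Binaryen; it only measures repeated default construction, so treat the numbers as a rough indication rather than a profile.

```cpp
#include <chrono>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper (not from Binaryen): default-constructs T `iterations`
// times and returns the elapsed wall-clock time in nanoseconds.
template <typename T>
long long nanosecondsToConstruct(std::size_t iterations) {
  const auto start = std::chrono::steady_clock::now();
  for (std::size_t i = 0; i < iterations; ++i) {
    T object; // the constructor (and destructor) under test
    // Storing the address in a volatile pointer is an observable write,
    // which discourages the compiler from removing the object entirely.
    const void* volatile keep = &object;
    (void)keep;
  }
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start)
    .count();
}
```

Calling nanosecondsToConstruct<std::stringstream>(100000) and nanosecondsToConstruct<std::string>(100000) in the same build and comparing the two values gives a quick feel for the gap on a given platform.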
The slow behavior occurred with at least the following command lines:
and
The main issue is in the very hot function PassRunner::runPassOnFunction(), which increased wasm-opt link times from ~20 seconds to ~60 seconds after #4897 on Windows when optimizing a large 33MB .wasm file with wasm-opt -O2. The std::string constructor does not share this performance problem on Windows.
Searched through the codebase for other uses of std::stringstream that were "optional", and the function PassRunner::run() had a similar usage pattern, so fixed that too for consistency (even though that call site did not show up in profiles as hot).
After this change, the wasm-opt time returned from ~60s to ~20s.
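The general shape of this kind of fix can be sketched as below. This is an illustration under assumptions, not the actual Binaryen code: Body, snapshotEager, snapshotLazy, and the debug flag are hypothetical stand-ins. The idea is to keep only a cheap std::string on the hot path and construct the std::stringstream inside the branch that actually needs it.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for something printable, such as a function body.
struct Body {
  int value;
};

std::ostream& operator<<(std::ostream& os, const Body& body) {
  return os << "(body " << body.value << ")";
}

// Before: the stream is constructed on every call, even when it is unused.
std::string snapshotEager(const Body& body, bool debug) {
  std::stringstream before; // constructor cost is paid unconditionally
  if (debug) {
    before << body;
  }
  return before.str();
}

// After: the hot path only default-constructs a std::string; the stream is
// created only when the optional output is actually requested.
std::string snapshotLazy(const Body& body, bool debug) {
  std::string before;
  if (debug) {
    std::stringstream stream; // constructor cost is paid only when needed
    stream << body;
    before = stream.str();
  }
  return before;
}
```

Both functions return the same text for the same inputs; the only difference is whether the std::stringstream constructor runs when debug is false.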