Summary
Bulk runtimes are good. Understanding where a benchmark is spending most of its time is better.
I've looked at a few free, open-source tools:
cProfile (included with Python) + snakeviz, for getting hotspot profiles and visualizing the results
line_profiler, for getting detailed line-by-line profiles
viztracer, for getting trace and hotspot profiles
It'd be good to establish some practices that make our performance results easier to compare and understand; part of this is deciding which tools we'll add to our toolkit and how we'll use them.
TLDR
I recommend using viztracer for general profiling aimed at identifying hotspots and understanding execution dependencies in code. Trace profiles visualized with Perfetto are quite nice: Perfetto can generate hotspot profiles on the fly, and its default "timeline" view of the execution makes it easy to see call-stack relationships and concurrency while still giving a feel for where most of the time is spent.
Profilers
cProfile + snakeviz
cProfile is a nice deterministic profiler that is built into Python: it records every function call and return rather than periodically sampling the call stack. It does not require any changes to the code; to get a profile, simply run
```shell
python -m cProfile -o output.prof /path/to/program.py
```
The drawback is that a lot of unrelated work also gets captured in the profile, such as module imports and setup. To avoid this, you can wrap just the section of code you want to profile with cProfile calls that start and stop profiling, e.g.
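```python
import cProfile

import numpy as np
from parcels import AdvectionEE

# `pset` (the ParticleSet) and `verbose_progress` are assumed to be defined earlier in the script
pr = cProfile.Profile()
pr.enable()  # start profiling just before the region of interest
pset.execute(
    runtime=np.timedelta64(24, "h"),
    dt=np.timedelta64(60, "s"),
    pyfunc=AdvectionEE,
    verbose_progress=verbose_progress,
)
pr.disable()  # stop profiling right after it
# Write the profile to file
pr.dump_stats('pset-execute.prof')
```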
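Since Python 3.8, `cProfile.Profile` can also be used as a context manager, which guarantees the profiler is stopped even if the profiled call raises. A minimal sketch of the same pattern, where `simulate` is a hypothetical stand-in for `pset.execute(...)` and `simulate.prof` is an arbitrary file name:

```python
import cProfile


def simulate(n_steps: int) -> float:
    """Hypothetical stand-in for the expensive call being profiled (e.g. pset.execute)."""
    total = 0.0
    for i in range(n_steps):
        total += i * 0.5
    return total


# Entering the block calls pr.enable(); leaving it calls pr.disable(), even on exceptions
with cProfile.Profile() as pr:
    result = simulate(100_000)

# Write the profile to file, same as with the explicit enable()/disable() calls
pr.dump_stats("simulate.prof")
```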
The resulting hotspot profile can be visualized with snakeviz. See the example in the NERSC documentation.
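When a browser isn't handy (e.g. on a remote login node), the standard-library `pstats` module can read the same `.prof` files and print a text hotspot table. A minimal sketch; `workload` and `example.prof` are hypothetical names used only for illustration:

```python
import cProfile
import io
import pstats


def workload() -> int:
    """Hypothetical workload to profile."""
    return sum(x * x for x in range(200_000))


# Produce a .prof file using the same enable()/disable() approach as above
profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()
profiler.dump_stats("example.prof")

# Load the file and print the 10 entries with the largest cumulative time
buffer = io.StringIO()
stats = pstats.Stats("example.prof", stream=buffer)
stats.strip_dirs().sort_stats("cumulative").print_stats(10)
report = buffer.getvalue()
print(report)
```

Sorting by `"cumulative"` surfaces the calls whose subtrees take the most time; `"tottime"` instead ranks functions by time spent in their own bodies.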
My main issue with this approach is that some Python calls can run concurrently, and an aggregated hotspot profile does not show when calls overlap in time; additionally, deep call stacks can become quite confusing in snakeviz's icicle and "sunburst" views.
line_profiler
line_profiler is useful once you have narrowed down the regions of code you want to focus on, since it reports the wall time of each line. It adds some execution overhead and works best when targeted at specific functions. I suspect it will shine when optimizing hotspot kernels, but may be less useful for initial application profiling.
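A minimal sketch of the `line_profiler` Python API, assuming the package is installed (`pip install line_profiler`); `hotspot_kernel` is a hypothetical function standing in for a kernel already identified as a hotspot. The import is guarded so the snippet still runs (unprofiled) without the package. The command-line equivalent is to decorate the function with `@profile` and run `kernprof -l -v program.py`.

```python
import io

try:
    from line_profiler import LineProfiler
except ImportError:  # line_profiler is a third-party package
    LineProfiler = None


def hotspot_kernel(n: int) -> float:
    """Hypothetical kernel we already suspect is a hotspot."""
    total = 0.0
    for i in range(n):
        total += i ** 0.5  # each line gets its own hit count and timing
    return total


if LineProfiler is not None:
    lp = LineProfiler()
    profiled_kernel = lp(hotspot_kernel)  # wrap only the function of interest
    value = profiled_kernel(10_000)
    buffer = io.StringIO()
    lp.print_stats(stream=buffer)  # per-line hits, time per hit, and % time
    print(buffer.getvalue())
else:
    value = hotspot_kernel(10_000)
    print("line_profiler not installed; ran without profiling")
```

Only the wrapped function is timed line by line, which keeps the overhead confined to the region under study.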
viztracer
viztracer is a nice tracing profiler: it records every function entry and exit during code execution and builds a detailed trace profile from them. It requires no code instrumentation and can be used simply by running
```shell
viztracer /path/to/program.py
```
The resulting profiles can be viewed with vizviewer. Alternatively, viztracer has a VS Code extension that correlates lines of code with the graphical representation of the trace profile directly in the editor. Under the hood, vizviewer uses Perfetto. By selecting a time range in the trace view, you can quickly get a hotspot profile for just that window, which can really help us understand which kernels occupy the most wall time.
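As with cProfile, viztracer can be restricted to one region of code from within Python, which keeps imports and setup out of the trace. A minimal sketch using the `VizTracer` class as a context manager, assuming viztracer is installed; `setup`, `workload`, and `region-trace.json` are hypothetical names. The import is guarded so the snippet still runs without the package.

```python
try:
    from viztracer import VizTracer
except ImportError:  # viztracer is a third-party package (pip install viztracer)
    VizTracer = None

TRACE_FILE = "region-trace.json"  # hypothetical output file name


def setup() -> list:
    """Hypothetical setup work we do NOT want in the trace."""
    return list(range(50_000))


def workload(data: list) -> int:
    """Hypothetical region of interest."""
    return sum(x % 7 for x in data)


data = setup()  # runs before tracing starts, so it stays out of the trace

if VizTracer is not None:
    # Tracing starts on entering the block; the trace is saved to TRACE_FILE on exit
    with VizTracer(output_file=TRACE_FILE):
        total = workload(data)
    print(f"view with: vizviewer {TRACE_FILE}")
else:
    total = workload(data)
```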