[Discussion] Getting profiles of benchmark runs

## Summary
Bulk runtimes are good. Understanding where a benchmark is spending most of its time is better.
I've looked at a few free, open-source tools:

* `cProfile` (included with Python) + `snakeviz` for getting hotspot profiles and visualizing the results
* `line_profiler` for getting detailed line-by-line profiles
* [`viztracer`](https://github.com/gaogaotiantian/viztracer) - to get trace and hotspot profiles 

It'd be good to establish some practices we'd like to adopt to make comparisons of performance results a bit easier to understand; part of this is determining what tools we'll add to our toolkit and how we'll use them.

## TLDR
I recommend using `viztracer` for general profiling aimed at identifying hotspots and understanding execution dependencies in code. Trace profiles visualized with Perfetto are quite nice: Perfetto can generate hotspot profiles on the fly, and its default "timeline" view of the execution makes it easy to see call-stack relationships and concurrency while also getting a feel for where the most time is spent.

## Profilers

### cProfile + snakeviz
cProfile is a deterministic profiler that is built into Python: it records every function call rather than sampling the stack. It does not require any changes to the code being profiled; simply run the following to get a profile
```
python -m cProfile -o output.prof /path/to/program.py
```
The problem with this is that a lot of unrelated work gets captured in the profile, including module imports and other startup code. To get around this, you can wrap just the section of code you want to profile in `cProfile` calls that start and stop profiling, e.g.

```
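import cProfile

import numpy as np

# pset, AdvectionEE, and verbose_progress are defined earlier in the benchmark script
pr = cProfile.Profile()
pr.enable()  # start collecting profile data
pset.execute(
    runtime=np.timedelta64(24, "h"),
    dt=np.timedelta64(60, "s"),
    pyfunc=AdvectionEE,
    verbose_progress=verbose_progress,
)
pr.disable()  # stop collecting profile data
# Write the profile to file
pr.dump_stats("pset-execute.prof")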
```
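For a quick look at a profile without opening a viewer, the standard-library `pstats` module can print the top entries as text. The following is a minimal sketch, not part of the benchmark code: `busy_work` and `profile_call` are hypothetical names used only for illustration, and `busy_work` stands in for whatever region you actually want to profile.

```python
import cProfile
import io
import pstats


def busy_work(n):
    """Hypothetical stand-in for the code under test."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_call(func, *args, top=10):
    """Profile one call and return (result, text report of the top entries)."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args)
    finally:
        # Always stop profiling, even if func raises.
        profiler.disable()

    # Render the statistics into a string instead of printing to stdout.
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.strip_dirs().sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


if __name__ == "__main__":
    result, report = profile_call(busy_work, 100_000)
    print(report)
```

Sorting by cumulative time tends to surface the high-level calls that dominate a run; sorting by `"tottime"` instead highlights the functions whose own bodies are expensive.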

Visualization of the hotspot profile can be done with `snakeviz` (e.g. `snakeviz pset-execute.prof`). See the example in the [NERSC documentation](https://docs.nersc.gov/development/languages/python/profiling-debugging-python/#snakeviz).

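The main issue I have with this is that the aggregated hotspot view has no timeline, so when some Python calls run concurrently it is hard to see when they happen or how they overlap. Additionally, deep call stacks can become quite confusing in the "icicle" or "sunburst" viewers in snakeviz.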

### line_profiler
`line_profiler` is useful once you have narrowed down the regions of code you want to focus on and need the time spent on each line. It adds a fair amount of execution overhead, so it is best applied to specific regions of code rather than a whole program. I suspect it will shine when optimizing hotspot kernels, but it may not be beneficial for initial application profiling.
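As an illustration of that focused workflow, here is a minimal sketch using the `LineProfiler` class from the third-party `line_profiler` package. `hotspot_kernel` and `run_with_line_profile` are hypothetical names, not benchmark code, and the guarded import is only there so the sketch still runs (unprofiled) where the package is not installed.

```python
try:
    from line_profiler import LineProfiler  # third-party: pip install line_profiler
except ImportError:  # keep the sketch runnable without the package
    LineProfiler = None


def hotspot_kernel(values):
    """Hypothetical stand-in for a kernel already identified as a hotspot."""
    squared = [v * v for v in values]
    shifted = [s + 1.0 for s in squared]
    return sum(shifted)


def run_with_line_profile(func, *args):
    """Run func(*args); print per-line timings if line_profiler is available."""
    if LineProfiler is None:
        return func(*args)
    profiler = LineProfiler()
    wrapped = profiler(func)  # register func and get a profiled wrapper
    result = wrapped(*args)
    profiler.print_stats()    # per-line hit counts and timings
    return result


if __name__ == "__main__":
    print(run_with_line_profile(hotspot_kernel, [float(i) for i in range(100_000)]))
```

Only functions registered with the profiler are timed line by line, which is why it pays to pick the target with a hotspot or trace profile first.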

### viztracer
viztracer is a tracing profiler (not a sampling one) that collects detailed trace profiles during code execution by recording function entry and exit events. It requires no code instrumentation and can be used simply by doing
```
viztracer /path/to/program.py
```
The resulting profile can be viewed with `vizviewer` (by default the trace is written to `result.json`, so `vizviewer result.json`). Alternatively, `viztracer` has a [VS Code extension](https://marketplace.visualstudio.com/items?itemName=gaogaotiantian.viztracer-vscode) that correlates lines of code with the graphical representation of the trace profile directly in VS Code. Under the hood, `vizviewer` uses [Perfetto](https://perfetto.dev/). By selecting a region of time in the trace view, you can quickly get a hotspot profile for just that region, which can really help us understand which kernels are occupying the most wall-time.
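When only one region matters (similar to the `pr.enable()` / `pr.disable()` approach shown for cProfile), viztracer can also be used inline through its `VizTracer` context manager. The sketch below is an assumption-laden illustration rather than benchmark code: `advect` and `trace_region` are hypothetical names, `advect` is not a Parcels kernel, and the guarded import only keeps the example runnable where viztracer is not installed.

```python
import os
import tempfile

try:
    from viztracer import VizTracer  # third-party: pip install viztracer
except ImportError:  # keep the sketch runnable without the package
    VizTracer = None


def advect(positions, velocity, dt, steps):
    """Hypothetical stand-in for the region of interest."""
    for _ in range(steps):
        positions = [p + velocity * dt for p in positions]
    return positions


def trace_region(output_file, func, *args):
    """Run func(*args), writing a trace to output_file if viztracer is installed."""
    if VizTracer is None:
        return func(*args)
    # Tracing starts on entering the block; the trace is saved on exit.
    with VizTracer(output_file=output_file, verbose=0):
        return func(*args)


if __name__ == "__main__":
    out = os.path.join(tempfile.gettempdir(), "advect-trace.json")
    print(trace_region(out, advect, [0.0, 1.0], 0.5, 60.0, 24))
    # If a trace was written, it can be opened with: vizviewer <path to the file>
```

Restricting the trace to one region keeps the output file small and leaves import and setup noise out of the timeline.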



