Merged
2 changes: 2 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -1,3 +1,5 @@
submission.*
target/
scratch.md
*claude
*.zip
116 changes: 116 additions & 0 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 2 additions & 0 deletions Cargo.toml
@@ -19,6 +19,8 @@ dirs = "5.0"
serde_yaml = "0.9"
webbrowser = "0.8"
base64-url = "3.0.0"
base64 = "0.22"
chrono = "0.4"
urlencoding = "2.1.3"
bytes = "1.10.1"
futures-util = "0.3.31"
4 changes: 4 additions & 0 deletions README.md
@@ -5,6 +5,10 @@ A command-line interface tool for submitting solutions to the [Popcorn Discord B

Tested on Linux and macOS; it should also work on Windows.

## New: Nsight Compute Profiling

Profile your kernels with `--mode profile` to get detailed Nsight Compute (NCU) metrics. Profiling is currently available only for the NVFP4 Blackwell competition, because Modal, which we use for other competitions, does not support NCU. See [docs/profiling.md](docs/profiling.md) for details.

## Installation

### Option 1: Using pre-built binaries (Recommended)
65 changes: 65 additions & 0 deletions docs/profiling.md
@@ -0,0 +1,65 @@
# Nsight Compute Profiling

Profile your kernels directly from the CLI and get detailed Nsight Compute metrics. This is particularly useful for the NVIDIA NVFP4 Blackwell competition, where you need to optimize tensor core utilization.

**Note:** Profiling is currently available only for the NVFP4 Blackwell competition. Modal, which we use for other competitions, does not support Nsight Compute (NCU).

## Quick Start

```bash
popcorn-cli submit submission.py --leaderboard nvfp4_dual_gemm --gpu NVIDIA --mode profile --no-tui
```

## Expected Output

The profiler returns three key metric tables for each benchmark:

**GPU Throughput** - Overall utilization:
```
Metric Name       Metric Unit  Metric Value
----------------  -----------  ------------
Memory [%]        %            32.48
Compute (SM) [%]  %            13.23
```

**Pipe Utilization** - Which pipelines are active:
```
Metric Name           Metric Unit  Metric Value
--------------------  -----------  ------------
TC                    %            16.67
TMEM (Tensor Memory)  %            15.27
Tensor (FP)           %            12.58
ALU                   %            2.38
TMA                   %            0.29
```

**Warp State** - Where your warps are stalling:
```
Metric Name               Metric Unit  Metric Value
------------------------  -----------  ------------
Stall Long Scoreboard     inst         18.31
Stall Wait                inst         1.88
Stall Short Scoreboard    inst         1.23
Selected                  inst         1.00
Stall Barrier             inst         0.75
```
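
If you want to post-process these tables yourself (for example, to pick out the dominant stall reason across runs), the rows are simple enough to split on whitespace. The sketch below is not part of popcorn-cli: `MetricRow`, `parse_metric_table`, and `largest` are hypothetical names, and it assumes every data row ends with a single-token unit followed by a numeric value, as in the sample output above.

```rust
/// One row of a profiler metric table (hypothetical helper type).
#[derive(Debug, PartialEq)]
struct MetricRow {
    name: String,
    unit: String,
    value: f64,
}

/// Parse a whitespace-separated metric table.
/// Assumption: the last token of a data row is the value and the token
/// before it is the unit; everything earlier is the metric name.
/// Header and separator rows are skipped because their last token
/// does not parse as a number.
fn parse_metric_table(text: &str) -> Vec<MetricRow> {
    text.lines()
        .filter_map(|line| {
            let tokens: Vec<&str> = line.split_whitespace().collect();
            if tokens.len() < 3 {
                return None;
            }
            let value = tokens[tokens.len() - 1].parse::<f64>().ok()?;
            let unit = tokens[tokens.len() - 2].to_string();
            let name = tokens[..tokens.len() - 2].join(" ");
            Some(MetricRow { name, unit, value })
        })
        .collect()
}

/// Return the row with the largest value, if any.
fn largest(rows: &[MetricRow]) -> Option<&MetricRow> {
    rows.iter().max_by(|a, b| a.value.total_cmp(&b.value))
}

fn main() {
    let table = "\
Metric Name               Metric Unit  Metric Value
------------------------  -----------  ------------
Stall Long Scoreboard     inst         18.31
Stall Wait                inst         1.88
Stall Short Scoreboard    inst         1.23
Selected                  inst         1.00
Stall Barrier             inst         0.75";

    let rows = parse_metric_table(table);
    if let Some(top) = largest(&rows) {
        println!("{}: {} {}", top.name, top.value, top.unit);
    }
}
```

Splitting from the right keeps multi-word metric names such as `TMEM (Tensor Memory)` intact. For anything beyond quick triage, the `.ncu-rep` report described below is the more reliable source.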

## Trace Files

After profiling, a zip file is saved to your current directory:
```
profile_20260113_031052_run0.zip
```

This contains a `.ncu-rep` file (the full Nsight Compute report):
```
$ unzip -l profile_20260113_031052_run0.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
  2178383  01-13-2026 03:10   profile.ncu-rep
```
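
When you collect several profiles in one directory, it can help to sort them by timestamp and run index. The sketch below splits an archive name into its parts. It assumes the pattern `profile_<YYYYMMDD>_<HHMMSS>_run<N>.zip`, which is inferred from the single example above and is not a documented guarantee; `ProfileArchiveName` and `parse_archive_name` are hypothetical names.

```rust
/// Parts of a profile archive name (hypothetical helper type).
#[derive(Debug, PartialEq)]
struct ProfileArchiveName {
    date: String, // YYYYMMDD
    time: String, // HHMMSS
    run: u32,     // index after the `run` prefix
}

/// True if `s` is exactly `len` ASCII digits.
fn is_digits(s: &str, len: usize) -> bool {
    s.len() == len && s.bytes().all(|b| b.is_ascii_digit())
}

/// Parse names shaped like `profile_20260113_031052_run0.zip`.
/// Returns `None` for anything that does not match the assumed pattern.
fn parse_archive_name(file_name: &str) -> Option<ProfileArchiveName> {
    let stem = file_name.strip_suffix(".zip")?;
    let mut parts = stem.split('_');

    if parts.next()? != "profile" {
        return None;
    }
    let date = parts.next()?;
    let time = parts.next()?;
    let run = parts.next()?.strip_prefix("run")?.parse::<u32>().ok()?;

    // Reject trailing segments and malformed timestamps.
    if parts.next().is_some() || !is_digits(date, 8) || !is_digits(time, 6) {
        return None;
    }

    Some(ProfileArchiveName {
        date: date.to_string(),
        time: time.to_string(),
        run,
    })
}

fn main() {
    match parse_archive_name("profile_20260113_031052_run0.zip") {
        Some(parsed) => println!("{:?}", parsed),
        None => println!("name does not match the assumed pattern"),
    }
}
```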

After extracting the archive, you can open the report in the Nsight Compute GUI for detailed analysis:
```bash
ncu-ui profile.ncu-rep
```
2 changes: 1 addition & 1 deletion src/cmd/submit.rs
@@ -67,7 +67,7 @@ impl App {
),
SubmissionModeItem::new(
"Profile".to_string(),
"Profile is currently supported only via Discord. We'll add this feature to the CLI soon.".to_string(),
"Profile the solution using Nsight Compute (NVIDIA) or rocPROF (AMD). Downloads profiling data to current directory.".to_string(),
"profile".to_string(),
),
];