Use num_chiplets by dhernandez0 · Pull Request #2168 · ROCm/rocMLIR

dhernandez0 · 2025-12-11T13:55:40Z

Motivation

Use num_chiplets for XCD remapping and during tuning.

Please review ROCm/MITuna#1018 as well

Technical Details

Added num_chiplets to:

ConvGenerator
rocmlir-gen: add num_chiplets when generating a kernel
getNumChiplets() and getNumChipletsValue() to GetRockInfo.cpp
GridLayoutEmitter.cpp: revert hack and use num_chiplets
RockTuningImpl.cpp: add num_chiplets to tuning file
gridwise_gemm_accel_lowering.mlir: update MI308 XCD remapping test
python scripts: added num_chiplets while tuning

Test Plan

PR test and nightly.

Test Result

PR CI: https://ml-ci-internal.amd.com/job/MLIR/job/mlir/job/PR-2168/11/pipeline-overview/
nightly CI: https://ml-ci-internal.amd.com/job/MLIR/job/mlir/job/PR-2168/17/pipeline-overview/

Submission Checklist

Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

umangyadav

These changes seem more intrusive, as we need to keep track of num_chiplets in various places, including changing the problem config string.

Did we try using "native"?

umangyadav · 2025-12-11T14:15:58Z

    if (succeeded(maybeNumCU)) {
      gpuFunc->setAttr("num_cu", b.getI64IntegerAttr(maybeNumCU.value()));
    }
+    FailureOr<int64_t> maybeChiplets = rock::getNumChiplets(theFunc);


Why does it need the num_chiplets attribute when lowering to GPU?

You're right, this doesn't have to be here. Removing it.

umangyadav · 2025-12-11T14:20:28Z

    case rock::TransformType::Pad:
-    case rock::TransformType::Slice:
    case rock::TransformType::Embed:
+    case rock::TransformType::Slice:


This change seems unrelated to this PR.

Copilot

Pull request overview

This PR adds num_chiplets support throughout the ROCm MLIR codebase for XCD (chiplet) remapping and tuning. The changes enable proper chiplet awareness for multi-chiplet GPUs like MI300 series, replacing previous hardcoded heuristics with explicit chiplet count parameters.

Key Changes

Added num_chiplets parameter to Python tuning infrastructure and all configuration classes (Conv, GEMM, Attention, etc.)
Implemented getNumChiplets() and getNumChipletsValue() functions in C++ to retrieve chiplet information from operations
Updated GridLayoutEmitter to use explicit numChiplets parameter instead of hardcoded logic for XCD remapping

Reviewed changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated 2 comments.

File	Description
mlir/utils/performance/tuningRunner.py	Added num_chiplets to Options class and propagated to configuration parsing
mlir/utils/performance/reportUtils.py	Updated all test parameter lists to include numChiplets field
mlir/utils/performance/perfRunner.py	Added num_chiplets to all configuration classes and their methods; implemented get_num_chiplets() function
mlir/utils/performance/parameterSweeps.py	Added num_chiplets to Options and imported get_num_chiplets function
mlir/utils/performance/attentionSweeps.py	Imported and used get_num_chiplets for attention configuration
mlir/utils/performance/handleNewConfigs.py	Removed unused global variables (ARCH, CHIP, NUM_CU)
mlir/utils/performance/common/benchmarkUtils.cpp	Updated command-line parsing comments to include num_chiplets
mlir/tools/rocmlir-gen/rocmlir-gen.cpp	Added numChiplets command-line option and attribute handling for all kernel types
mlir/lib/Dialect/Rock/IR/GetRockInfo.cpp	Implemented getNumChiplets() and getNumChipletsValue() functions
mlir/lib/Dialect/Rock/Transforms/GridLayoutEmitter.cpp	Removed hardcoded getNumChiplets() function; now uses explicit parameter
mlir/lib/Dialect/Rock/Transforms/GridLayoutEmitter.h	Updated function signatures to accept numChiplets parameter
mlir/lib/Dialect/Rock/Transforms/GridwiseGemmToBlockwise.cpp	Updated all grid layout calls to pass num_chiplets value
mlir/lib/Dialect/Rock/Tuning/RockTuningImpl.cpp	Added numChiplets to tuning problem strings
mlir/lib/Dialect/Rock/Tuning/ConvContext.cpp	Added numChiplets to ConvolutionContext structure
mlir/lib/Dialect/Rock/Generator/ConvGenerator.cpp	Added num_chiplets support to ConvGenerator configuration
mlir/include/mlir/Dialect/Rock/Tuning/ConvContext.h	Added numChiplets field to ConvolutionContext
mlir/include/mlir/Dialect/Rock/IR/GetRockInfo.h	Added function declarations for getNumChiplets() and getNumChipletsValue()
mlir/include/mlir/Dialect/Rock/Generator/ConvGenerator.h	Added num_chiplets to Config struct and constructor
mlir/include/mlir-c/Dialect/MIGraphX.h	Updated comment to mention num_chiplets requirement
mlir/test/Dialect/Rock/gridwise_gemm_accel_lowering.mlir	Updated test attributes to include num_chiplets
mlir/test/CAPI/reduce_fusible.cpp	Updated test MLIR to include num_chiplets attribute

Copilot · 2025-12-11T15:08:06Z

-            f'--dilation_w={self.dilation_w}', f'--conv_stride_h={self.conv_stride_h}',
-            f'--conv_stride_w={self.conv_stride_w}', f'--padding_h={self.padding_h}',
-            f'--padding_w={self.padding_w}', f'--groupsize={self.group}', f'--gemmO={self.o}',
+            f'--num_cu={self.num_cu}', f'--num_chiplets={self.num_chiplets}'


Missing comma between f-strings. The string concatenation on lines 1195-1196 is missing a comma after the num_chiplets f-string, which will cause the two strings to be concatenated without a space, resulting in malformed command-line arguments like "--num_chiplets=8--fil_layout=...".

Suggested change

-            f'--num_cu={self.num_cu}', f'--num_chiplets={self.num_chiplets}'
+            f'--num_cu={self.num_cu}', f'--num_chiplets={self.num_chiplets}',

@dhernandez0
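
The missing-comma problem flagged above can be reproduced in isolation. The following is a minimal sketch, not the actual perfRunner.py code: the variable names and the `kcyx` layout value are made up for illustration, and only the three flags mentioned in the review comment are used.

```python
# Hypothetical stand-ins for the attributes used in perfRunner.py.
num_cu, num_chiplets, fil_layout = 304, 8, "kcyx"

# Without the trailing comma, Python implicitly concatenates the two
# adjacent string literals, so two arguments collapse into one.
broken = [
    f'--num_cu={num_cu}', f'--num_chiplets={num_chiplets}'
    f'--fil_layout={fil_layout}',
]

# With the comma, every flag stays a separate list element.
fixed = [
    f'--num_cu={num_cu}', f'--num_chiplets={num_chiplets}',
    f'--fil_layout={fil_layout}',
]

print(broken)  # the second element is "--num_chiplets=8--fil_layout=kcyx"
print(fixed)   # three separate arguments
```

Adjacent string literals (including f-strings) are joined at compile time, so this kind of bug produces no syntax error and only shows up when the malformed argument reaches the command-line parser.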

Copilot · 2025-12-11T15:08:07Z

+  AmdArchInfo archInfo = rock::lookupArchInfo(arch);
+  if (numChiplets.getValue().getSExtValue() > archInfo.maxNumXCC) {
+    return op->emitError() << "num_chiplets=" << numChiplets
+                           << " cannot be greather than arch maxNumXCC="


Spelling error: "greather" should be "greater".

Suggested change

-                           << " cannot be greather than arch maxNumXCC="
+                           << " cannot be greater than arch maxNumXCC="

dhernandez0 · 2025-12-11T15:15:39Z

These changes seem more intrusive, as we need to keep track of num_chiplets in various places, including changing the problem config string.

Did we try using "native"?

In tuningRunner.py we decide which num_cu and num_chiplets we are tuning for, so I think it's a good idea to save them. Otherwise, if the tuning file only records "native" or "gfx942", we can't tell afterwards: was gfx942 tuned in CPX mode? Was it MI308?
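
To illustrate the ambiguity described above, here is a small sketch. The `tuning_key` helper and its tab-separated field order are hypothetical and are not rocMLIR's actual tuning-file format; the (num_cu, num_chiplets) pairs are the ones that appear in `get_num_chiplets()` in this PR.

```python
def tuning_key(arch: str, num_cu: int, num_chiplets: int, problem: str) -> str:
    # Hypothetical layout: tab-separated fields, for illustration only.
    return "\t".join([arch, str(num_cu), str(num_chiplets), problem])

problem = "<problem config>"  # placeholder, deliberately left unspecified

# Same architecture string, two different hardware configurations.
key_304cu = tuning_key("gfx942", 304, 8, problem)
key_80cu = tuning_key("gfx942", 80, 4, problem)

# If only the architecture were recorded, both entries would collapse
# into a single key and the tuned configuration could not be recovered.
arch_only_keys = {"\t".join(["gfx942", problem]) for _ in (key_304cu, key_80cu)}

print(key_304cu != key_80cu)  # True: the entries stay distinguishable
print(len(arch_only_keys))    # 1: the arch-only keys collide
```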

justinrosner

As part of your testing have you run perfRunner to make sure that everything is working as intended?

justinrosner · 2025-12-12T14:34:33Z

-            f'--dilation_w={self.dilation_w}', f'--conv_stride_h={self.conv_stride_h}',
-            f'--conv_stride_w={self.conv_stride_w}', f'--padding_h={self.padding_h}',
-            f'--padding_w={self.padding_w}', f'--groupsize={self.group}', f'--gemmO={self.o}',
+            f'--num_cu={self.num_cu}', f'--num_chiplets={self.num_chiplets}'


@dhernandez0

justinrosner · 2025-12-12T14:38:02Z

+def get_num_chiplets(chip, num_cu):
+    # TODO: use AmdArchDb python bindings
+    if "gfx942" in chip and num_cu == 304:
+        return 8
+    if "gfx942" in chip and num_cu == 80:
+        return 4
+    if "gfx950" in chip:
+        return 8
+
+    return 1


I thought that we already have support for the AmdArchDb python bindings? Is there something else blocking us from making that switch?

This may not work for DPX, CPX modes

I don't think they are used anywhere yet, and I wasn't able to make them work. It appears they need ninja amd_arch_db, so we would have to change the Jenkinsfile etc. IMO this is out of scope for this PR; we want a separate PR to enable them (there are lots of TODOs regarding the Python bindings that need to be fixed anyway).

This may not work for DPX, CPX modes

When the Python bindings work, we'll be able to query "native" here to get the number of chiplets.
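
As a quick way to see the limitation raised in this thread, the heuristic from the diff above can be exercised directly. The function body is copied from the quoted hunk; the CU counts 38 and 256 in the calls are arbitrary values chosen for illustration, not figures taken from this PR.

```python
def get_num_chiplets(chip, num_cu):
    # Copied from the diff quoted in this thread.
    if "gfx942" in chip and num_cu == 304:
        return 8
    if "gfx942" in chip and num_cu == 80:
        return 4
    if "gfx950" in chip:
        return 8

    return 1

print(get_num_chiplets("gfx942", 304))  # 8
print(get_num_chiplets("gfx942", 80))   # 4
# Any other gfx942 CU count (38 is arbitrary here) hits the fallback.
print(get_num_chiplets("gfx942", 38))   # 1
# gfx950 returns 8 regardless of the CU count passed in.
print(get_num_chiplets("gfx950", 256))  # 8
```

Because the lookup keys on exact CU counts, a configuration that reports a different count falls through to 1, which is the concern about partitioned modes voiced above.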

pabloantoniom · 2025-12-12T15:31:55Z

+FailureOr<int64_t> mlir::rock::getNumChiplets(Operation *op) {
+  FailureOr<StringAttr> maybeArch = getArch(op);
+  if (failed(maybeArch)) {
+    return failure();


Can we add LLVM_DEBUG here?

Using rock::getArchValue() instead.

pabloantoniom · 2025-12-12T15:32:05Z

+  FailureOr<IntegerAttr> maybeNumChiplets =
+      getAttrFromOpOrParents<IntegerAttr>(op, "num_chiplets");
+  if (failed(maybeNumChiplets)) {
+    return failure();


pabloantoniom · 2025-12-12T15:34:00Z


  // Test whether the module is fusible
  const bool isFusible = mlirIsModuleFusible(moduleOp, perfStr);
- 


nit: Unwanted change

pabloantoniom · 2025-12-12T15:41:50Z

           : b.getI64IntegerAttr(
                 rock::lookupArchInfo(archAttr.getValue()).minNumCU));
+
+  IntegerAttr numChipletsAttr =


nit: It's not implemented for numCu, but I would check here that we are getting a valid value, e.g., if numChiplets.getNumOccurrences() > 0, throw an error if numChiplets < 0.

Similarly, I would also throw an error if numChiplets > numCU, or even if numChiplets >= numCU?

I'll add the numChiplets > 0 check to getNumChiplets(), where we already check that numChiplets does not exceed the arch's maxNumXCC.
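
The two range checks agreed on here can be sketched as follows. This is a Python illustration only: `validate_num_chiplets` is a hypothetical helper, not the C++ code in GetRockInfo.cpp, and the error wording is modeled on the message quoted earlier in this review.

```python
def validate_num_chiplets(num_chiplets: int, max_num_xcc: int) -> int:
    # Hypothetical helper mirroring the checks discussed in the thread.
    if num_chiplets <= 0:
        raise ValueError(f"num_chiplets={num_chiplets} must be greater than 0")
    if num_chiplets > max_num_xcc:
        raise ValueError(
            f"num_chiplets={num_chiplets} cannot be greater than "
            f"arch maxNumXCC={max_num_xcc}")
    return num_chiplets

print(validate_num_chiplets(4, 8))  # accepted: 4

for bad in (0, -1, 9):
    try:
        validate_num_chiplets(bad, 8)
    except ValueError as err:
        print(err)  # rejected with a descriptive message
```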

Copilot

Pull request overview

Copilot reviewed 28 out of 28 changed files in this pull request and generated 4 comments.

Copilot · 2025-12-15T09:31:58Z

                                                            // How to check out into specific directory, according to stackoverflow.
                                                            dir('MITuna') {
-                                                                git branch: "pf-tuna-rocmlir-3", poll: false, url: 'https://github.com/ROCm/MITuna.git'
+                                                                git branch: "num_chiplets", poll: false, url: 'https://github.com/ROCm/MITuna.git'


The MITuna branch reference has been changed from "pf-tuna-rocmlir-3" to "num_chiplets". This appears to be a temporary development branch. Consider changing this back to the main/stable branch before merging to production, or document why this specific branch is needed.

Suggested change

-                                                                git branch: "num_chiplets", poll: false, url: 'https://github.com/ROCm/MITuna.git'
+                                                                // Use the stable branch for MITuna. If you need to use a feature branch, document why.
+                                                                git branch: "pf-tuna-rocmlir-3", poll: false, url: 'https://github.com/ROCm/MITuna.git'

Copilot · 2025-12-15T09:31:58Z

                                                        buildProject('check-rocmlir-build-only ci-performance-scripts', '')
                                                        dir('MITuna') {
-                                                            git branch: "pf-tuna-rocmlir-3", poll: false, url: 'https://github.com/ROCm/MITuna.git'
+                                                            git branch: "num_chiplets", poll: false, url: 'https://github.com/ROCm/MITuna.git'


The MITuna branch reference has been changed from "pf-tuna-rocmlir-3" to "num_chiplets". This appears to be a temporary development branch. Consider changing this back to the main/stable branch before merging to production, or document why this specific branch is needed.

Suggested change

-                                                            git branch: "num_chiplets", poll: false, url: 'https://github.com/ROCm/MITuna.git'
+                                                            git branch: "pf-tuna-rocmlir-3", poll: false, url: 'https://github.com/ROCm/MITuna.git'

Copilot · 2025-12-15T09:31:59Z

    // It was empirically found that two chiplets as a group
    // computing a spatial mxn tile has better locality throughout.
-    int64_t numChipletsPerGroup = std::ceil(numChiplets / 2);
+    int64_t numChipletsPerGroup = std::ceil(info.numChiplets / 2);


Integer division issue: std::ceil(info.numChiplets / 2) performs integer division before calling ceil, which makes the ceil call ineffective. For example, with numChiplets=8, this computes ceil(8/2)=ceil(4)=4, which is correct. However, with numChiplets=7, it would compute ceil(7/2)=ceil(3)=3 instead of the expected 4. This should be std::ceil(info.numChiplets / 2.0) to ensure floating-point division before ceiling.

Suggested change

-    int64_t numChipletsPerGroup = std::ceil(info.numChiplets / 2);
+    int64_t numChipletsPerGroup = static_cast<int64_t>(std::ceil(info.numChiplets / 2.0));

Same here: assert that numChiplets is a multiple of 2 or equal to 1.
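
The truncation issue and the proposed sanity check can be demonstrated with a short sketch. It is written in Python rather than C++, so `//` is used to emulate C++ integer division (the two agree for non-negative operands); all function names are hypothetical.

```python
import math

def per_group_truncating(num_chiplets: int) -> int:
    # Emulates std::ceil(numChiplets / 2) with integer operands:
    # the division truncates before ceil ever runs.
    return math.ceil(num_chiplets // 2)

def per_group_ceiling(num_chiplets: int) -> int:
    # Integer-only ceiling division; no floating point needed.
    return (num_chiplets + 1) // 2

def check_num_chiplets(num_chiplets: int) -> None:
    # Sanity check proposed in the review: 1, or a multiple of 2.
    assert num_chiplets == 1 or num_chiplets % 2 == 0, num_chiplets

for n in (2, 4, 7, 8):
    print(n, per_group_truncating(n), per_group_ceiling(n))
# The two versions differ only for n = 7 (3 versus 4).
```

For even values the two computations agree, so once the assertion guarantees that numChiplets is 1 or even, and the grouping code only runs when numChiplets > 1, the original expression gives the intended result; the assertion makes that assumption explicit.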

Copilot · 2025-12-15T09:31:59Z

  if (numChiplets > 1) {
    // It was empirically found that two chiplets as a group
    // computing a spatial mxn tile has better locality throughout.


Integer division issue: In the subsequent line 133, std::ceil(numChiplets / 2) performs integer division before calling ceil, which makes the ceil call ineffective. For example, with numChiplets=8, this computes ceil(8/2)=ceil(4)=4, which is correct. However, with numChiplets=7, it would compute ceil(7/2)=ceil(3)=3 instead of the expected 4. Line 133 should use std::ceil(numChiplets / 2.0) to ensure floating-point division before ceiling.

This comment looks valid. @dhernandez0, perhaps just assert, as a sanity check, that numChiplets is always either an even number or equal to 1.

dhernandez0 · 2025-12-15T10:07:25Z

As part of your testing have you run perfRunner to make sure that everything is working as intended?

I've tested it manually; however, it's also part of CI (selected-*-configs).

pabloantoniom

LGTM. I would maybe just add a little context to the PR description to explain why we want this.

umangyadav

Run nightly and check if perf reports table is being generated correctly or not.
Run weekly tuning to make sure it is not broken with these changes.

umangyadav · 2025-12-15T16:39:44Z

                   "gfx906(60/64), gfx908(120)"),
    llvm::cl::value_desc("compute unit value"), llvm::cl::init(0));

+static llvm::cl::opt<int> numChiplets("num_chiplets",


Add a test with a non-default num_chiplets to check that it is applied correctly as the attribute.

umangyadav · 2025-12-15T16:42:42Z

  if (numChiplets > 1) {
    // It was empirically found that two chiplets as a group
    // computing a spatial mxn tile has better locality throughout.


This comment looks valid. @dhernandez0, perhaps just assert, as a sanity check, that numChiplets is always either an even number or equal to 1.

umangyadav · 2025-12-15T16:43:05Z

    // It was empirically found that two chiplets as a group
    // computing a spatial mxn tile has better locality throughout.
-    int64_t numChipletsPerGroup = std::ceil(numChiplets / 2);
+    int64_t numChipletsPerGroup = std::ceil(info.numChiplets / 2);


Same here: assert that numChiplets is a multiple of 2 or equal to 1.

umangyadav · 2025-12-15T16:47:16Z

+  // Number of chiplets
+  problemOS << numChiplets << tab;


Add some tests for the problem config changes.

dhernandez0 · 2025-12-16T08:40:30Z

check if perf reports table is being generated correctly or not.

I'm running nightly, but I don't understand why this needs to be checked manually. CI detects errors while running tuning automatically, doesn't it?

dhernandez0 self-assigned this Dec 11, 2025

dhernandez0 requested a review from causten as a code owner December 11, 2025 13:55

umangyadav reviewed Dec 11, 2025

dhernandez0 requested review from Copilot, justinrosner, mirza-halilcevic, pabloantoniom and umangyadav December 11, 2025 15:01

Copilot started reviewing on behalf of dhernandez0 December 11, 2025 15:03

dhernandez0 changed the title from "[DRAFT] Use num_chiplets" to "Use num_chiplets" Dec 11, 2025

Copilot AI reviewed Dec 11, 2025

dhernandez0 requested a review from dorde-antic December 12, 2025 10:21

dhernandez0 mentioned this pull request Dec 12, 2025

Add num_chiplets ROCm/MITuna#1018

Merged

1 task

dhernandez0 force-pushed the 2186-accept-number-of-chiplets-from-migraphx branch from a84633f to 3a8fd74 Compare December 12, 2025 14:24

justinrosner reviewed Dec 12, 2025

pabloantoniom reviewed Dec 12, 2025

dhernandez0 force-pushed the 2186-accept-number-of-chiplets-from-migraphx branch from dc778cb to c84e069 Compare December 15, 2025 08:54

dhernandez0 requested review from Copilot, justinrosner and pabloantoniom December 15, 2025 09:22

Copilot started reviewing on behalf of dhernandez0 December 15, 2025 09:24

Copilot AI reviewed Dec 15, 2025

justinrosner approved these changes Dec 15, 2025

pabloantoniom approved these changes Dec 15, 2025

umangyadav reviewed Dec 15, 2025

dhernandez0 added 3 commits January 7, 2026 14:21

Use num_chiplets information

01bd187

Temporary commit to test MITuna branch

cc8c296

Addressing PR comments

e4efec0

dhernandez0 force-pushed the 2186-accept-number-of-chiplets-from-migraphx branch from b09a96c to e4efec0 Compare January 7, 2026 13:21

dhernandez0 added 3 commits January 7, 2026 16:20

Merge branch 'develop' into 2186-accept-number-of-chiplets-from-migraphx

8801dfd

Merge branch 'develop' into 2186-accept-number-of-chiplets-from-migraphx

89b6508

Merge branch 'develop' into 2186-accept-number-of-chiplets-from-migraphx

cff2be8

dhernandez0 merged commit ca26d61 into develop Jan 12, 2026
15 checks passed

dhernandez0 deleted the 2186-accept-number-of-chiplets-from-migraphx branch January 12, 2026 10:15

dhernandez0 mentioned this pull request Jan 12, 2026

Update MITuna branch #2204

Merged

1 task
