Skip to content

Use num_chiplets#2168

Merged
dhernandez0 merged 6 commits intodevelopfrom
2186-accept-number-of-chiplets-from-migraphx
Jan 12, 2026
Merged

Use num_chiplets#2168
dhernandez0 merged 6 commits intodevelopfrom
2186-accept-number-of-chiplets-from-migraphx

Conversation

@dhernandez0
Copy link
Copy Markdown
Contributor

@dhernandez0 dhernandez0 commented Dec 11, 2025

Motivation

Use num_chiplets for XCD remapping and during tuning.

Please review ROCm/MITuna#1018 as well

Technical Details

Added num_chiplets to:

  • ConvGenerator
  • rocmlir-gen: add num_chiplets when generating a kernel
  • getNumChiplets() and getNumChipletsValue() to GetRockInfo.cpp
  • GridLayoutEmitter.cpp: revert hack and use num_chiplets
  • RockTuningImpl.cpp: add num_chiplets to tuning file
  • gridwise_gemm_accel_lowering.mlir: update MI308 XCD remapping test
  • python scripts: added num_chiplets while tuning

Test Plan

PR test and nightly.

Test Result

PR CI: https://ml-ci-internal.amd.com/job/MLIR/job/mlir/job/PR-2168/11/pipeline-overview/
nightly CI: https://ml-ci-internal.amd.com/job/MLIR/job/mlir/job/PR-2168/17/pipeline-overview/

Submission Checklist

@dhernandez0 dhernandez0 self-assigned this Dec 11, 2025
@dhernandez0 dhernandez0 requested a review from causten as a code owner December 11, 2025 13:55
Copy link
Copy Markdown
Member

@umangyadav umangyadav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes seem more intrusive as we need to keep track of num_chiplets and various places including changing problem config string.

Did we try using "native" ?

if (succeeded(maybeNumCU)) {
gpuFunc->setAttr("num_cu", b.getI64IntegerAttr(maybeNumCU.value()));
}
FailureOr<int64_t> maybeChiplets = rock::getNumChiplets(theFunc);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does it need num_chiplets attribute when lowering to GPU ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you're right, this doesn't have to be here. Removing it.

case rock::TransformType::Pad:
case rock::TransformType::Slice:
case rock::TransformType::Embed:
case rock::TransformType::Slice:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change seems unrelated to PR

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@dhernandez0 dhernandez0 changed the title [DRAFT] Use num_chiplets Use num_chiplets Dec 11, 2025
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds num_chiplets support throughout the ROCm MLIR codebase for XCD (chiplet) remapping and tuning. The changes enable proper chiplet awareness for multi-chiplet GPUs like MI300 series, replacing previous hardcoded heuristics with explicit chiplet count parameters.

Key Changes

  • Added num_chiplets parameter to Python tuning infrastructure and all configuration classes (Conv, GEMM, Attention, etc.)
  • Implemented getNumChiplets() and getNumChipletsValue() functions in C++ to retrieve chiplet information from operations
  • Updated GridLayoutEmitter to use explicit numChiplets parameter instead of hardcoded logic for XCD remapping

Reviewed changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
mlir/utils/performance/tuningRunner.py Added num_chiplets to Options class and propagated to configuration parsing
mlir/utils/performance/reportUtils.py Updated all test parameter lists to include numChiplets field
mlir/utils/performance/perfRunner.py Added num_chiplets to all configuration classes and their methods; implemented get_num_chiplets() function
mlir/utils/performance/parameterSweeps.py Added num_chiplets to Options and imported get_num_chiplets function
mlir/utils/performance/attentionSweeps.py Imported and used get_num_chiplets for attention configuration
mlir/utils/performance/handleNewConfigs.py Removed unused global variables (ARCH, CHIP, NUM_CU)
mlir/utils/performance/common/benchmarkUtils.cpp Updated command-line parsing comments to include num_chiplets
mlir/tools/rocmlir-gen/rocmlir-gen.cpp Added numChiplets command-line option and attribute handling for all kernel types
mlir/lib/Dialect/Rock/IR/GetRockInfo.cpp Implemented getNumChiplets() and getNumChipletsValue() functions
mlir/lib/Dialect/Rock/Transforms/GridLayoutEmitter.cpp Removed hardcoded getNumChiplets() function; now uses explicit parameter
mlir/lib/Dialect/Rock/Transforms/GridLayoutEmitter.h Updated function signatures to accept numChiplets parameter
mlir/lib/Dialect/Rock/Transforms/GridwiseGemmToBlockwise.cpp Updated all grid layout calls to pass num_chiplets value
mlir/lib/Dialect/Rock/Tuning/RockTuningImpl.cpp Added numChiplets to tuning problem strings
mlir/lib/Dialect/Rock/Tuning/ConvContext.cpp Added numChiplets to ConvolutionContext structure
mlir/lib/Dialect/Rock/Generator/ConvGenerator.cpp Added num_chiplets support to ConvGenerator configuration
mlir/include/mlir/Dialect/Rock/Tuning/ConvContext.h Added numChiplets field to ConvolutionContext
mlir/include/mlir/Dialect/Rock/IR/GetRockInfo.h Added function declarations for getNumChiplets() and getNumChipletsValue()
mlir/include/mlir/Dialect/Rock/Generator/ConvGenerator.h Added num_chiplets to Config struct and constructor
mlir/include/mlir-c/Dialect/MIGraphX.h Updated comment to mention num_chiplets requirement
mlir/test/Dialect/Rock/gridwise_gemm_accel_lowering.mlir Updated test attributes to include num_chiplets
mlir/test/CAPI/reduce_fusible.cpp Updated test MLIR to include num_chiplets attribute

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread mlir/utils/performance/perfRunner.py Outdated
f'--dilation_w={self.dilation_w}', f'--conv_stride_h={self.conv_stride_h}',
f'--conv_stride_w={self.conv_stride_w}', f'--padding_h={self.padding_h}',
f'--padding_w={self.padding_w}', f'--groupsize={self.group}', f'--gemmO={self.o}',
f'--num_cu={self.num_cu}', f'--num_chiplets={self.num_chiplets}'
Copy link

Copilot AI Dec 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing comma between f-strings. The string concatenation on lines 1195-1196 is missing a comma after the num_chiplets f-string, which will cause the two strings to be concatenated without a space, resulting in malformed command-line arguments like "--num_chiplets=8--fil_layout=...".

Suggested change
f'--num_cu={self.num_cu}', f'--num_chiplets={self.num_chiplets}'
f'--num_cu={self.num_cu}', f'--num_chiplets={self.num_chiplets}',

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AmdArchInfo archInfo = rock::lookupArchInfo(arch);
if (numChiplets.getValue().getSExtValue() > archInfo.maxNumXCC) {
return op->emitError() << "num_chiplets=" << numChiplets
<< " cannot be greather than arch maxNumXCC="
Copy link

Copilot AI Dec 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spelling error: "greather" should be "greater".

Suggested change
<< " cannot be greather than arch maxNumXCC="
<< " cannot be greater than arch maxNumXCC="

Copilot uses AI. Check for mistakes.
@dhernandez0
Copy link
Copy Markdown
Contributor Author

These changes seem more intrusive as we need to keep track of num_chiplets and various places including changing problem config string.

Did we try using "native" ?

So, in tuningRunner.py we decide what num_cu and num_chiplets we are tuning for. Then, I think it's a good idea to save it, otherwise if we save "native" or "gfx942" only in the tuning file: was gfx942 tuned in CPX mode? was it MI308?

@dhernandez0 dhernandez0 mentioned this pull request Dec 12, 2025
1 task
@dhernandez0 dhernandez0 force-pushed the 2186-accept-number-of-chiplets-from-migraphx branch from a84633f to 3a8fd74 Compare December 12, 2025 14:24
Copy link
Copy Markdown
Contributor

@justinrosner justinrosner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As part of your testing have you run perfRunner to make sure that everything is working as intended?

Comment thread mlir/utils/performance/perfRunner.py Outdated
f'--dilation_w={self.dilation_w}', f'--conv_stride_h={self.conv_stride_h}',
f'--conv_stride_w={self.conv_stride_w}', f'--padding_h={self.padding_h}',
f'--padding_w={self.padding_w}', f'--groupsize={self.group}', f'--gemmO={self.o}',
f'--num_cu={self.num_cu}', f'--num_chiplets={self.num_chiplets}'
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +2187 to +2196
def get_num_chiplets(chip, num_cu):
# TODO: use AmdArchDb python bindings
if "gfx942" in chip and num_cu == 304:
return 8
if "gfx942" in chip and num_cu == 80:
return 4
if "gfx950" in chip:
return 8

return 1
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought that we already have support for the AmdArchDb python bindings? Is there something else blocking us from making that switch?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may not work for DPX, CPX modes

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think they are used anywhere yet. I wasn't able to make them work. It appears they need ninja amd_arch_db so we will have to change Jenkinsfile etc. This is out of the scope of this PR IMO, we want a PR to enable using it (there are lots of TODOs that need to be fixed anyway regarding python bindings).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may not work for DPX, CPX modes

when python bindings work we'll be able to do query "native" here to get the number of chiplets.

FailureOr<int64_t> mlir::rock::getNumChiplets(Operation *op) {
FailureOr<StringAttr> maybeArch = getArch(op);
if (failed(maybeArch)) {
return failure();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add LLVM_DEBUG here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using rock::getArchValue() instead

FailureOr<IntegerAttr> maybeNumChiplets =
getAttrFromOpOrParents<IntegerAttr>(op, "num_chiplets");
if (failed(maybeNumChiplets)) {
return failure();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here


// Test whether the module is fusible
const bool isFusible = mlirIsModuleFusible(moduleOp, perfStr);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Unwanted change

: b.getI64IntegerAttr(
rock::lookupArchInfo(archAttr.getValue()).minNumCU));

IntegerAttr numChipletsAttr =
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: It's not implemented for numCu, but I would check here that we are getting a valid value, e.g., if numChiplets.getNumOccurrences() > 0, throw an error if numChiplets < 0.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, I would add also throw an error if numChiplets > numCU. Or even if numChiplets >= numCU?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add the numChiplet>0 check to getNumChiplets() where we already check numChiplets<arch.maxChiplets

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 28 out of 28 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

// How to check out into specific directory, according to stackoverflow.
dir('MITuna') {
git branch: "pf-tuna-rocmlir-3", poll: false, url: 'https://github.com/ROCm/MITuna.git'
git branch: "num_chiplets", poll: false, url: 'https://github.com/ROCm/MITuna.git'
Copy link

Copilot AI Dec 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The MITuna branch reference has been changed from "pf-tuna-rocmlir-3" to "num_chiplets". This appears to be a temporary development branch. Consider changing this back to the main/stable branch before merging to production, or document why this specific branch is needed.

Suggested change
git branch: "num_chiplets", poll: false, url: 'https://github.com/ROCm/MITuna.git'
// Use the stable branch for MITuna. If you need to use a feature branch, document why.
git branch: "pf-tuna-rocmlir-3", poll: false, url: 'https://github.com/ROCm/MITuna.git'

Copilot uses AI. Check for mistakes.
buildProject('check-rocmlir-build-only ci-performance-scripts', '')
dir('MITuna') {
git branch: "pf-tuna-rocmlir-3", poll: false, url: 'https://github.com/ROCm/MITuna.git'
git branch: "num_chiplets", poll: false, url: 'https://github.com/ROCm/MITuna.git'
Copy link

Copilot AI Dec 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The MITuna branch reference has been changed from "pf-tuna-rocmlir-3" to "num_chiplets". This appears to be a temporary development branch. Consider changing this back to the main/stable branch before merging to production, or document why this specific branch is needed.

Suggested change
git branch: "num_chiplets", poll: false, url: 'https://github.com/ROCm/MITuna.git'
git branch: "pf-tuna-rocmlir-3", poll: false, url: 'https://github.com/ROCm/MITuna.git'

Copilot uses AI. Check for mistakes.
// It was empirically found that two chiplets as a group
// computing a spatial mxn tile has better locality throughout.
int64_t numChipletsPerGroup = std::ceil(numChiplets / 2);
int64_t numChipletsPerGroup = std::ceil(info.numChiplets / 2);
Copy link

Copilot AI Dec 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Integer division issue: std::ceil(info.numChiplets / 2) performs integer division before calling ceil, which makes the ceil call ineffective. For example, with numChiplets=8, this computes ceil(8/2)=ceil(4)=4, which is correct. However, with numChiplets=7, it would compute ceil(7/2)=ceil(3)=3 instead of the expected 4. This should be std::ceil(info.numChiplets / 2.0) to ensure floating-point division before ceiling.

Suggested change
int64_t numChipletsPerGroup = std::ceil(info.numChiplets / 2);
int64_t numChipletsPerGroup = static_cast<int64_t>(std::ceil(info.numChiplets / 2.0));

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here. Assert numchiplets is multiple of 2 or =1.

Comment on lines 130 to 132
if (numChiplets > 1) {
// It was empirically found that two chiplets as a group
// computing a spatial mxn tile has better locality throughout.
Copy link

Copilot AI Dec 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Integer division issue: In the subsequent line 133, std::ceil(numChiplets / 2) performs integer division before calling ceil, which makes the ceil call ineffective. For example, with numChiplets=8, this computes ceil(8/2)=ceil(4)=4, which is correct. However, with numChiplets=7, it would compute ceil(7/2)=ceil(3)=3 instead of the expected 4. Line 133 should use std::ceil(numChiplets / 2.0) to ensure floating-point division before ceiling.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment looks valid. @dhernandez0 perhaps just assert that numChiplets is always an even number as a sanity check or it can also be just =1

@dhernandez0
Copy link
Copy Markdown
Contributor Author

As part of your testing have you run perfRunner to make sure that everything is working as intended?

I've tested it manually, however, it's part of CI (selected-*-configs)

Copy link
Copy Markdown
Contributor

@pabloantoniom pabloantoniom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I would just maybe add a little context in the PR description to explain why we want this

Copy link
Copy Markdown
Member

@umangyadav umangyadav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Run nightly and check if perf reports table is being generated correctly or not.
  • Run weekly tuning to make sure it is not broken with these changes.

"gfx906(60/64), gfx908(120)"),
llvm::cl::value_desc("compute unit value"), llvm::cl::init(0));

static llvm::cl::opt<int> numChiplets("num_chiplets",
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a test with non-default num_chiplets to see if it is being applied correctly as the attribute or not.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines 130 to 132
if (numChiplets > 1) {
// It was empirically found that two chiplets as a group
// computing a spatial mxn tile has better locality throughout.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment looks valid. @dhernandez0 perhaps just assert that numChiplets is always an even number as a sanity check or it can also be just =1

// It was empirically found that two chiplets as a group
// computing a spatial mxn tile has better locality throughout.
int64_t numChipletsPerGroup = std::ceil(numChiplets / 2);
int64_t numChipletsPerGroup = std::ceil(info.numChiplets / 2);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here. Assert numchiplets is multiple of 2 or =1.

Comment on lines +1243 to +1244
// Number of chiplets
problemOS << numChiplets << tab;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add some tests for problem config changes.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@dhernandez0
Copy link
Copy Markdown
Contributor Author

  • check if perf reports table is being generated correctly or not.

I'm running nightly, but I don't understand, why manually? CI detects errors while running tuning automatically, doesn't it?

@dhernandez0 dhernandez0 force-pushed the 2186-accept-number-of-chiplets-from-migraphx branch from b09a96c to e4efec0 Compare January 7, 2026 13:21
@dhernandez0 dhernandez0 merged commit ca26d61 into develop Jan 12, 2026
15 checks passed
@dhernandez0 dhernandez0 deleted the 2186-accept-number-of-chiplets-from-migraphx branch January 12, 2026 10:15
@dhernandez0 dhernandez0 mentioned this pull request Jan 12, 2026
1 task
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants