Use num_chiplets #2168
Conversation
umangyadav left a comment
These changes seem more intrusive, as we need to keep track of num_chiplets in various places, including changing the problem config string.
Did we try using "native"?
```cpp
if (succeeded(maybeNumCU)) {
  gpuFunc->setAttr("num_cu", b.getI64IntegerAttr(maybeNumCU.value()));
}
FailureOr<int64_t> maybeChiplets = rock::getNumChiplets(theFunc);
```
Why does it need the num_chiplets attribute when lowering to GPU?

You're right, this doesn't have to be here. Removing it.
```cpp
case rock::TransformType::Pad:
case rock::TransformType::Slice:
case rock::TransformType::Embed:
case rock::TransformType::Slice:
```

This change seems unrelated to the PR.
Pull request overview
This PR adds num_chiplets support throughout the ROCm MLIR codebase for XCD (chiplet) remapping and tuning. The changes enable proper chiplet awareness for multi-chiplet GPUs like MI300 series, replacing previous hardcoded heuristics with explicit chiplet count parameters.
Key Changes
- Added `num_chiplets` parameter to Python tuning infrastructure and all configuration classes (Conv, GEMM, Attention, etc.)
- Implemented `getNumChiplets()` and `getNumChipletsValue()` functions in C++ to retrieve chiplet information from operations
- Updated GridLayoutEmitter to use explicit `numChiplets` parameter instead of hardcoded logic for XCD remapping
Reviewed changes
Copilot reviewed 21 out of 21 changed files in this pull request and generated 2 comments.
Summary per file:
| File | Description |
|---|---|
| mlir/utils/performance/tuningRunner.py | Added num_chiplets to Options class and propagated to configuration parsing |
| mlir/utils/performance/reportUtils.py | Updated all test parameter lists to include numChiplets field |
| mlir/utils/performance/perfRunner.py | Added num_chiplets to all configuration classes and their methods; implemented get_num_chiplets() function |
| mlir/utils/performance/parameterSweeps.py | Added num_chiplets to Options and imported get_num_chiplets function |
| mlir/utils/performance/attentionSweeps.py | Imported and used get_num_chiplets for attention configuration |
| mlir/utils/performance/handleNewConfigs.py | Removed unused global variables (ARCH, CHIP, NUM_CU) |
| mlir/utils/performance/common/benchmarkUtils.cpp | Updated command-line parsing comments to include num_chiplets |
| mlir/tools/rocmlir-gen/rocmlir-gen.cpp | Added numChiplets command-line option and attribute handling for all kernel types |
| mlir/lib/Dialect/Rock/IR/GetRockInfo.cpp | Implemented getNumChiplets() and getNumChipletsValue() functions |
| mlir/lib/Dialect/Rock/Transforms/GridLayoutEmitter.cpp | Removed hardcoded getNumChiplets() function; now uses explicit parameter |
| mlir/lib/Dialect/Rock/Transforms/GridLayoutEmitter.h | Updated function signatures to accept numChiplets parameter |
| mlir/lib/Dialect/Rock/Transforms/GridwiseGemmToBlockwise.cpp | Updated all grid layout calls to pass num_chiplets value |
| mlir/lib/Dialect/Rock/Tuning/RockTuningImpl.cpp | Added numChiplets to tuning problem strings |
| mlir/lib/Dialect/Rock/Tuning/ConvContext.cpp | Added numChiplets to ConvolutionContext structure |
| mlir/lib/Dialect/Rock/Generator/ConvGenerator.cpp | Added num_chiplets support to ConvGenerator configuration |
| mlir/include/mlir/Dialect/Rock/Tuning/ConvContext.h | Added numChiplets field to ConvolutionContext |
| mlir/include/mlir/Dialect/Rock/IR/GetRockInfo.h | Added function declarations for getNumChiplets() and getNumChipletsValue() |
| mlir/include/mlir/Dialect/Rock/Generator/ConvGenerator.h | Added num_chiplets to Config struct and constructor |
| mlir/include/mlir-c/Dialect/MIGraphX.h | Updated comment to mention num_chiplets requirement |
| mlir/test/Dialect/Rock/gridwise_gemm_accel_lowering.mlir | Updated test attributes to include num_chiplets |
| mlir/test/CAPI/reduce_fusible.cpp | Updated test MLIR to include num_chiplets attribute |
```python
f'--dilation_w={self.dilation_w}', f'--conv_stride_h={self.conv_stride_h}',
f'--conv_stride_w={self.conv_stride_w}', f'--padding_h={self.padding_h}',
f'--padding_w={self.padding_w}', f'--groupsize={self.group}', f'--gemmO={self.o}',
f'--num_cu={self.num_cu}', f'--num_chiplets={self.num_chiplets}'
```

Missing comma between f-strings. The string concatenation on lines 1195-1196 is missing a comma after the num_chiplets f-string, which will cause the two strings to be concatenated with no separator, resulting in malformed command-line arguments like "--num_chiplets=8--fil_layout=...".

```suggestion
f'--num_cu={self.num_cu}', f'--num_chiplets={self.num_chiplets}',
```
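To illustrate the failure mode (a minimal standalone sketch, not the actual perfRunner code; the `fil_layout` value here is made up for the example): adjacent string literals in Python are implicitly concatenated, so a missing comma silently merges two arguments into one.

```python
num_chiplets = 8
fil_layout = "gkcyx"  # illustrative value, not from the PR

# Missing comma: the two f-strings are implicitly concatenated
# into a single malformed argument.
args_bad = [
    f'--num_chiplets={num_chiplets}'
    f'--fil_layout={fil_layout}',
]

# With the comma, each flag stays a separate argument.
args_good = [
    f'--num_chiplets={num_chiplets}',
    f'--fil_layout={fil_layout}',
]

print(args_bad)   # ['--num_chiplets=8--fil_layout=gkcyx']
print(args_good)  # ['--num_chiplets=8', '--fil_layout=gkcyx']
```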
```cpp
AmdArchInfo archInfo = rock::lookupArchInfo(arch);
if (numChiplets.getValue().getSExtValue() > archInfo.maxNumXCC) {
  return op->emitError() << "num_chiplets=" << numChiplets
                         << " cannot be greather than arch maxNumXCC="
```

Spelling error: "greather" should be "greater".

```suggestion
<< " cannot be greater than arch maxNumXCC="
```
So, in tuningRunner.py we decide what num_cu and num_chiplets we are tuning for. Then, I think it's a good idea to save them; otherwise, if we save only "native" or "gfx942" in the tuning file: was gfx942 tuned in CPX mode? Was it MI308?
Force-pushed from a84633f to 3a8fd74
justinrosner left a comment
As part of your testing, have you run perfRunner to make sure that everything is working as intended?
```python
def get_num_chiplets(chip, num_cu):
    # TODO: use AmdArchDb python bindings
    if "gfx942" in chip and num_cu == 304:
        return 8
    if "gfx942" in chip and num_cu == 80:
        return 4
    if "gfx950" in chip:
        return 8
    return 1
```
I thought that we already have support for the AmdArchDb python bindings? Is there something else blocking us from making that switch?
This may not work for DPX, CPX modes.
I don't think they are used anywhere yet. I wasn't able to make them work; it appears they need `ninja amd_arch_db`, so we would have to change the Jenkinsfile, etc. This is out of scope for this PR, IMO; we want a separate PR to enable using it (there are lots of TODOs regarding the Python bindings that need to be fixed anyway).
> This may not work for DPX, CPX modes

When the Python bindings work, we'll be able to query "native" here to get the number of chiplets.
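As a quick sanity check of the heuristic in the hunk above (the function body is reproduced verbatim from the diff), note that any chip/CU combination not in the table silently falls back to 1 — which is exactly the DPX/CPX concern raised here. The `38` CU count below is an illustrative stand-in for an unlisted partitioned configuration, not a value from the PR.

```python
def get_num_chiplets(chip, num_cu):
    # TODO: use AmdArchDb python bindings
    if "gfx942" in chip and num_cu == 304:
        return 8
    if "gfx942" in chip and num_cu == 80:
        return 4
    if "gfx950" in chip:
        return 8
    return 1

print(get_num_chiplets("gfx942", 304))  # 8
print(get_num_chiplets("gfx942", 80))   # 4
print(get_num_chiplets("gfx950", 256))  # 8 (any gfx950 CU count)
print(get_num_chiplets("gfx942", 38))   # 1 -- unlisted combination falls through
```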
```cpp
FailureOr<int64_t> mlir::rock::getNumChiplets(Operation *op) {
  FailureOr<StringAttr> maybeArch = getArch(op);
  if (failed(maybeArch)) {
    return failure();
```
Can we add LLVM_DEBUG here?
Using rock::getArchValue() instead.
```cpp
FailureOr<IntegerAttr> maybeNumChiplets =
    getAttrFromOpOrParents<IntegerAttr>(op, "num_chiplets");
if (failed(maybeNumChiplets)) {
  return failure();
```

```cpp
// Test whether the module is fusible
const bool isFusible = mlirIsModuleFusible(moduleOp, perfStr);
```
nit: Unwanted change
```cpp
    : b.getI64IntegerAttr(
          rock::lookupArchInfo(archAttr.getValue()).minNumCU));

IntegerAttr numChipletsAttr =
```
nit: It's not implemented for numCu, but I would check here that we are getting a valid value, e.g., if numChiplets.getNumOccurrences() > 0, throw an error if numChiplets < 0.
Similarly, I would also throw an error if numChiplets > numCU. Or even if numChiplets >= numCU?
I'll add the numChiplets > 0 check to getNumChiplets(), where we already check numChiplets < arch.maxChiplets.
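The checks being discussed in this thread can be folded together. The sketch below, in Python for brevity, is purely illustrative: the helper name and signature are hypothetical (the real check lives in the C++ getNumChiplets()), and it combines the numChiplets > 0 check, the existing maxNumXCC bound, and the proposed numChiplets <= numCU bound.

```python
def validate_num_chiplets(num_chiplets, max_num_xcc, num_cu):
    # Hypothetical helper mirroring the validation discussed in review;
    # not the actual rocMLIR API.
    if num_chiplets <= 0:
        raise ValueError(f"num_chiplets={num_chiplets} must be positive")
    if num_chiplets > max_num_xcc:
        raise ValueError(
            f"num_chiplets={num_chiplets} cannot be greater than "
            f"arch maxNumXCC={max_num_xcc}")
    if num_chiplets > num_cu:
        raise ValueError(
            f"num_chiplets={num_chiplets} cannot exceed num_cu={num_cu}")
    return num_chiplets

validate_num_chiplets(8, 8, 304)  # OK: an 8-XCD, 304-CU configuration
try:
    validate_num_chiplets(16, 8, 304)
except ValueError as e:
    print(e)  # rejected: exceeds maxNumXCC
```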
Force-pushed from dc778cb to c84e069
Pull request overview
Copilot reviewed 28 out of 28 changed files in this pull request and generated 4 comments.
```diff
 // How to check out into specific directory, according to stackoverflow.
 dir('MITuna') {
-git branch: "pf-tuna-rocmlir-3", poll: false, url: 'https://github.com/ROCm/MITuna.git'
+git branch: "num_chiplets", poll: false, url: 'https://github.com/ROCm/MITuna.git'
```

The MITuna branch reference has been changed from "pf-tuna-rocmlir-3" to "num_chiplets". This appears to be a temporary development branch. Consider changing this back to the main/stable branch before merging to production, or document why this specific branch is needed.

```suggestion
// Use the stable branch for MITuna. If you need to use a feature branch, document why.
git branch: "pf-tuna-rocmlir-3", poll: false, url: 'https://github.com/ROCm/MITuna.git'
```
```diff
 buildProject('check-rocmlir-build-only ci-performance-scripts', '')
 dir('MITuna') {
-git branch: "pf-tuna-rocmlir-3", poll: false, url: 'https://github.com/ROCm/MITuna.git'
+git branch: "num_chiplets", poll: false, url: 'https://github.com/ROCm/MITuna.git'
```

The MITuna branch reference has been changed from "pf-tuna-rocmlir-3" to "num_chiplets". This appears to be a temporary development branch. Consider changing this back to the main/stable branch before merging to production, or document why this specific branch is needed.

```suggestion
git branch: "pf-tuna-rocmlir-3", poll: false, url: 'https://github.com/ROCm/MITuna.git'
```
```diff
 // It was empirically found that two chiplets as a group
 // computing a spatial mxn tile has better locality throughout.
-int64_t numChipletsPerGroup = std::ceil(numChiplets / 2);
+int64_t numChipletsPerGroup = std::ceil(info.numChiplets / 2);
```

Integer division issue: std::ceil(info.numChiplets / 2) performs integer division before calling ceil, which makes the ceil call ineffective. For example, with numChiplets=8, this computes ceil(8/2)=ceil(4)=4, which is correct. However, with numChiplets=7, it would compute ceil(7/2)=ceil(3)=3 instead of the expected 4. This should be std::ceil(info.numChiplets / 2.0) to ensure floating-point division before the ceiling.

```suggestion
int64_t numChipletsPerGroup = static_cast<int64_t>(std::ceil(info.numChiplets / 2.0));
```
Same here. Assert numChiplets is a multiple of 2 or == 1.
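The pitfall Copilot flags can be reproduced outside C++. The minimal sketch below uses Python, where `//` floors just like C++'s int/int division, to show why the `2.0` divisor matters for odd counts (odd numChiplets should not occur per the review, but this is what the bug would do):

```python
import math

num_chiplets = 7  # hypothetical odd count, used only to expose the bug

# Integer division happens first, so ceil has nothing left to round up:
buggy = math.ceil(num_chiplets // 2)   # ceil(3) == 3
# Floating-point division preserves the .5 for ceil to round up:
fixed = math.ceil(num_chiplets / 2.0)  # ceil(3.5) == 4

print(buggy, fixed)  # 3 4

# For even counts (the expected case per the review), both forms agree:
assert math.ceil(8 // 2) == math.ceil(8 / 2.0) == 4
```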
```cpp
if (numChiplets > 1) {
  // It was empirically found that two chiplets as a group
  // computing a spatial mxn tile has better locality throughout.
```

Integer division issue: in the subsequent line 133, std::ceil(numChiplets / 2) performs integer division before calling ceil, which makes the ceil call ineffective. For example, with numChiplets=8, this computes ceil(8/2)=ceil(4)=4, which is correct. However, with numChiplets=7, it would compute ceil(7/2)=ceil(3)=3 instead of the expected 4. Line 133 should use std::ceil(numChiplets / 2.0) to ensure floating-point division before the ceiling.

This comment looks valid. @dhernandez0 perhaps just assert that numChiplets is always an even number as a sanity check, or that it is 1.
I've tested it manually; however, it's also part of CI (selected-*-configs).
pabloantoniom left a comment
LGTM. I would just maybe add a little context in the PR description to explain why we want this.
umangyadav left a comment
- Run nightly and check whether the perf reports table is being generated correctly.
- Run weekly tuning to make sure it is not broken by these changes.
```cpp
                "gfx906(60/64), gfx908(120)"),
    llvm::cl::value_desc("compute unit value"), llvm::cl::init(0));

static llvm::cl::opt<int> numChiplets("num_chiplets",
```

Add a test with a non-default num_chiplets to see whether it is being applied correctly as the attribute.
```cpp
// Number of chiplets
problemOS << numChiplets << tab;
```

Add some tests for the problem config changes.
I'm running nightly, but I don't understand: why manually? CI detects errors while running tuning automatically, doesn't it?
Force-pushed from b09a96c to e4efec0
Motivation
Use num_chiplets for XCD remapping and during tuning.
Please review ROCm/MITuna#1018 as well
Technical Details
Added num_chiplets to:
Test Plan
PR test and nightly.
Test Result
PR CI: https://ml-ci-internal.amd.com/job/MLIR/job/mlir/job/PR-2168/11/pipeline-overview/
nightly CI: https://ml-ci-internal.amd.com/job/MLIR/job/mlir/job/PR-2168/17/pipeline-overview/
Submission Checklist