Skip to content

[Metal] Fix top level argument buffer indexing when creating descriptors#998

Closed
kcloudy0717 wants to merge 4 commits intollvm:mainfrom
kcloudy0717:kcloudy0717/metal/TopLevelArgumentBuffer
Closed

[Metal] Fix top level argument buffer indexing when creating descriptors#998
kcloudy0717 wants to merge 4 commits intollvm:mainfrom
kcloudy0717:kcloudy0717/metal/TopLevelArgumentBuffer

Conversation

@kcloudy0717
Copy link
Copy Markdown
Contributor

I looked deeper into why some tests fails on Metal and I think found the reason why.

The Metal backend generates a automatic linear layout of the resources into the top-level argument buffer, but the actual heap index of the resources seems to be sorted by the resource type in the metal shader reflection. Given the following declarations:

StructuredBuffer<int16_t4> In: register(t0);
RWStructuredBuffer<int16_t4> Out1 : register(u1);
RWStructuredBuffer<int16_t4> Out2 : register(u2);
RWStructuredBuffer<int16_t4> Out3 : register(u3);
RWStructuredBuffer<int16_t4> Out4 : register(u4);

StructuredBuffer<uint16_t4> UIn: register(t5);
RWStructuredBuffer<uint16_t4> UOut1 : register(u6);
RWStructuredBuffer<uint16_t4> UOut2 : register(u7);
RWStructuredBuffer<uint16_t4> UOut3 : register(u8);
RWStructuredBuffer<uint16_t4> UOut4 : register(u9);

when compiled with -metal and -Fre options, we get the following in the metal json reflection:

{
  ...
  "TopLevelArgumentBuffer": [
    { "EltOffset": 0, "Name": "", "Size": 24, "Slot": 0, "Space": 0, "Type": "SRV" },
    { "EltOffset": 24, "Name": "", "Size": 24, "Slot": 5, "Space": 0, "Type": "SRV" },
    { "EltOffset": 48, "Name": "", "Size": 24, "Slot": 1, "Space": 0, "Type": "UAV" },
    { "EltOffset": 72, "Name": "", "Size": 24, "Slot": 2, "Space": 0, "Type": "UAV" },
    { "EltOffset": 96, "Name": "", "Size": 24, "Slot": 3, "Space": 0, "Type": "UAV" },
    { "EltOffset": 120, "Name": "", "Size": 24, "Slot": 4, "Space": 0, "Type": "UAV" },
    { "EltOffset": 144, "Name": "", "Size": 24, "Slot": 6, "Space": 0, "Type": "UAV" },
    { "EltOffset": 168, "Name": "", "Size": 24, "Slot": 7, "Space": 0, "Type": "UAV" },
    { "EltOffset": 192, "Name": "", "Size": 24, "Slot": 8, "Space": 0, "Type": "UAV" },
    { "EltOffset": 216, "Name": "", "Size": 24, "Slot": 9, "Space": 0, "Type": "UAV" }
  ],
  ...
}

From the reflection data we can see In and UIn SRV resource descriptors should be created at index 0 and 1, but this is not the case in the backend because it assumes heap index is 1:1 to the order they are declared in the DescriptorSets -> Resources field.

The PR fixes this by adding a hash map using Slot and Space values in the reflection as keys that maps to the actual index of the top level argument buffer when creating descriptors.

With the above changes, I get the following on my local machine:

Unexpectedly Passed Tests (16):
  OffloadTest-mtl :: Feature/ResourcesInStructs/res-in-struct-mix.test
  OffloadTest-mtl :: WaveOps/QuadReadAcrossX.32.test
  OffloadTest-mtl :: WaveOps/QuadReadAcrossX.int16.test
  OffloadTest-mtl :: WaveOps/WaveActiveMax.fp16.test
  OffloadTest-mtl :: WaveOps/WaveActiveMax.fp32.test
  OffloadTest-mtl :: WaveOps/WaveActiveMax.int16.test
  OffloadTest-mtl :: WaveOps/WaveActiveMax.int32.test
  OffloadTest-mtl :: WaveOps/WaveActiveSum.int16.test
  OffloadTest-mtl :: WaveOps/WaveActiveSum.int32.test
  OffloadTest-mtl :: WaveOps/WavePrefixSum.int16.test
  OffloadTest-mtl :: WaveOps/WaveReadLaneAt.16.test
  OffloadTest-mtl :: WaveOps/WaveReadLaneAt.32.test
  OffloadTest-mtl :: WaveOps/WaveReadLaneFirst.fp16.test
  OffloadTest-mtl :: WaveOps/WaveReadLaneFirst.fp32.test
  OffloadTest-mtl :: WaveOps/WaveReadLaneFirst.int16.test
  OffloadTest-mtl :: WaveOps/WaveReadLaneFirst.int32.test

@kcloudy0717
Copy link
Copy Markdown
Contributor Author

This should resolve: #642, #989, #393, #960, #351

@kcloudy0717
Copy link
Copy Markdown
Contributor Author

Pinging @bob80905, @farzonl, @llvm-beanz for review.

@kcloudy0717
Copy link
Copy Markdown
Contributor Author

Would it also be possible to integrate metal_irconverter in third-party so their reflection API can be used? Now there are two occasions (this PR, as well as reading tg_size) that we need to manually parse the json which could be prone to errors.

@kcloudy0717
Copy link
Copy Markdown
Contributor Author

Thinking about this more, I think we should consider using the explicit root signature binding model from the metal shader converter as that will give us most flexibility when configuring resources. Using the explicit model will make future tests such as unbound resource arrays or dynamic resource indexing via ResourceDescriptorHeap/SamplerDescriptorHeap easier.

Metal's explicit root signature binding model is exactly the same as D3D12's root signature model, the only caveat is that we need to convert DXIL -> AIR on the fly because setting the root signature is only available during shader conversion through either the API or command line. See https://developer.apple.com/metal/shader-converter/#binding-model for more info.

@damyanp
Copy link
Copy Markdown
Contributor

damyanp commented Mar 23, 2026

@kcloudy0717 - this PR is marked as draft. If you're ready for it to be reviewed, can you click the "Ready for review" button please?

@kcloudy0717 kcloudy0717 marked this pull request as ready for review March 23, 2026 18:03
@kcloudy0717 kcloudy0717 force-pushed the kcloudy0717/metal/TopLevelArgumentBuffer branch from 0784078 to 7fa395b Compare March 31, 2026 16:50
Comment thread lib/API/MTL/MTLDevice.cpp
Comment on lines +76 to +77
// From metal_irconverter.h which is not part of the third-party
enum IRResourceType {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we include metal_irconvert.h as a dependency in third-party?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be preferred. This task was originally for looking into mac issues, adding a whole 3rd party library probably should in a separate task?

Comment thread lib/API/MTL/MTLDevice.cpp
if (auto Err = createDescriptor(R, IS, HeapIndex++))
return Err;

if (P.isCompute()) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this only applicable to compute pipelines? I would imagine the same issue would pop up for graphics and ray tracing pipelines.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it's applicable to other stages too, but that was something I didn't look into because the reflection generated is separate. Ideally we use explicit root signature binding model when using the Metal Shader Converter as suggested in #998 (comment).

…s for different types

Tests that are failing have same register but they are separate resource types (SRV vs UAV), the TLABIndexMap didn't account for that which caused resources with same register & space to be created in the same location.
@kcloudy0717 kcloudy0717 force-pushed the kcloudy0717/metal/TopLevelArgumentBuffer branch from ae9446d to d807b29 Compare April 1, 2026 15:14
@bogner
Copy link
Copy Markdown
Contributor

bogner commented Apr 2, 2026

Would it also be possible to integrate metal_irconverter in third-party so their reflection API can be used? Now there are two occasions (this PR, as well as reading tg_size) that we need to manually parse the json which could be prone to errors.

Yes, we can and should integrate metal_irconverter

Thinking about this more, I think we should consider using the explicit root signature binding model from the metal shader converter as that will give us most flexibility when configuring resources. Using the explicit model will make future tests such as unbound resource arrays or dynamic resource indexing via ResourceDescriptorHeap/SamplerDescriptorHeap easier.

I agree here too. Explicit root signatures is a better mapping of how these tests are written than trying to remap the bindings like we're doing here, and I would prefer that approach. Are you willing to take that on? Should we abandon this PR in preference of using that approach, or does it make sense to go with this in the short term and follow up with that later?

@kcloudy0717
Copy link
Copy Markdown
Contributor Author

Would it also be possible to integrate metal_irconverter in third-party so their reflection API can be used? Now there are two occasions (this PR, as well as reading tg_size) that we need to manually parse the json which could be prone to errors.

Yes, we can and should integrate metal_irconverter

Thinking about this more, I think we should consider using the explicit root signature binding model from the metal shader converter as that will give us most flexibility when configuring resources. Using the explicit model will make future tests such as unbound resource arrays or dynamic resource indexing via ResourceDescriptorHeap/SamplerDescriptorHeap easier.

I agree here too. Explicit root signatures is a better mapping of how these tests are written than trying to remap the bindings like we're doing here, and I would prefer that approach. Are you willing to take that on? Should we abandon this PR in preference of using that approach, or does it make sense to go with this in the short term and follow up with that later?

Sure, I can take on implementing the explicit root signature approach. I think we can abandon this PR, this PR served it's purpose on figuring out why most of the metal tests are failing. I don't think it make sense for this PR to be merged in the short term since once we integrated the RS approach, the code would most likely be removed.

@kcloudy0717
Copy link
Copy Markdown
Contributor Author

Closing this PR in favor of implementing the explicit RS approach for Metal shader converter that I'll be working on.

@kcloudy0717 kcloudy0717 closed this Apr 2, 2026
@bogner
Copy link
Copy Markdown
Contributor

bogner commented Apr 2, 2026

I've gone ahead and reworded #351 to explain what's actually happening here and duplicated #642, #989, #393, and #960 to it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants