
Conversation


@mmakevic-amd mmakevic-amd commented Dec 24, 2025

Motivation

Bi-weekly sync from TensorFlow upstream

Disabled tests:

# @local_xla//xla/service/gpu:gpu_compiler_test_amdgpu_any
PersistedAutotuningTest.SingleOperationGetsAutotuned

Submission Checklist

thcmbs and others added 30 commits December 18, 2025 02:31
+ Allow the chain to start from <transpose, reshape, bitcast> instead of only reshape
+ Add a layout sensitive mode to the simplification
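
A hypothetical sketch of the relaxed chain-start check described above; the predicate name and structure are illustrative, not the actual pass code:

```cpp
#include "xla/hlo/ir/hlo_instruction.h"
#include "xla/hlo/ir/hlo_opcode.h"

// Hypothetical predicate: which instructions may now start the simplified chain.
bool CanStartChain(const xla::HloInstruction* instr) {
  switch (instr->opcode()) {
    case xla::HloOpcode::kTranspose:  // newly allowed
    case xla::HloOpcode::kBitcast:    // newly allowed
    case xla::HloOpcode::kReshape:    // previously the only allowed start
      return true;
    default:
      return false;
  }
}
```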

PiperOrigin-RevId: 846150097
Imported from GitHub PR openxla/xla#35479

Add clangd files and directories to .gitignore
Copybara import of the project:

--
2999b064c6b756dfc0355d863b863aff1bdea2fa by Eugene Zhulenev <ezv@amazon.com>:

Add clangd files and directories to .gitignore

Add clangd files and directories to .gitignore

Merging this change closes tensorflow#35479

PiperOrigin-RevId: 846156873
PiperOrigin-RevId: 846167560
…intExpression.

Helps with narrowing down which constraints are unsat. There can be many constraints (e.g. WGMMA in Mosaic), and while debugging it's not clear at a glance which one is violated.

As a follow up, we can also introduce names to each Constraint to make the identification even easier.

PiperOrigin-RevId: 846168559
PiperOrigin-RevId: 846171859
PiperOrigin-RevId: 846173555
…TF normalization in emitters

0) Fix a bug (?) in normalization util when normalized dim contains a single dimension
1) Perform normalization OTF for Transpose emitter selection
2) Use normalized shape for unrolling decision in kLoop emitter
3) Use normalized shape to detect slow transposes in triton fusion rewriter

PiperOrigin-RevId: 846191206
…t.cc

This change updates custom_call_test.cc to dynamically register custom call targets and FFI handlers using the runtime-determined platform name (CUDA or ROCM). This replaces the use of static registration macros, allowing the tests to run correctly across different GPU platforms and the reference interpreter.

This way we can avoid compile time branches like `#ifdef GOOGLE_CUDA` and similar.
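
A minimal sketch of the runtime-registration idea, assuming the test queries the platform name ("CUDA" or "ROCM") from the device it actually runs on; the callback and helper names below are illustrative, not the test's real symbols:

```cpp
#include <cstddef>
#include <string>
#include "xla/service/custom_call_target_registry.h"

// Illustrative custom-call target; the real tests register their own callbacks.
static void Callback_Noop(void* /*stream*/, void** /*buffers*/,
                          const char* /*opaque*/, size_t /*opaque_len*/) {}

// Register the target for whichever platform the test detected at runtime,
// instead of compiling two variants behind #ifdef GOOGLE_CUDA / TENSORFLOW_USE_ROCM.
void RegisterTestTargets(const std::string& platform_name) {
  xla::CustomCallTargetRegistry::Global()->Register(
      "Callback_Noop", reinterpret_cast<void*>(&Callback_Noop), platform_name);
}
```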

Also:

1. Converts usage of raw CUDA driver API functions to StreamExecutor functionality
2. Replaces some legacy CustomCalls by FFI
3. Converts the while test target to HloRunnerPjRt
4. Removes a test case from the Token tests with a nested type in the output type, since that's not supported by our PjRt implementation.

PiperOrigin-RevId: 846196106
The `fd.Size()` check doesn't work when the file descriptor is invalid and only
the path was given.
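
Illustrative only: the general shape of the fix, written against POSIX rather than the actual file-descriptor wrapper used here:

```cpp
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>
#include <cstdint>
#include <string>

// Returns the file size; falls back to opening the path when the descriptor is invalid.
int64_t FileSize(int fd, const std::string& path) {
  bool opened_here = false;
  if (fd < 0) {  // descriptor invalid: the size query must go through the path
    fd = open(path.c_str(), O_RDONLY);
    if (fd < 0) return -1;
    opened_here = true;
  }
  struct stat st;
  const int64_t size = (fstat(fd, &st) == 0) ? st.st_size : -1;
  if (opened_here) close(fd);
  return size;
}
```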

PiperOrigin-RevId: 846207406
PiperOrigin-RevId: 846213195
PiperOrigin-RevId: 846214738
PiperOrigin-RevId: 846217449
PiperOrigin-RevId: 846221752
The ROCm code path doesn't go through NcclCollectives anymore. Therefore these checks are obsolete.

PiperOrigin-RevId: 846226180
PiperOrigin-RevId: 846226345
PiperOrigin-RevId: 846231902
PiperOrigin-RevId: 846234559
This migrates `builder.create<Op>()` => `Op::create()`
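
A minimal sketch of the pattern, assuming a recent MLIR that generates the static `Op::create` overloads; `arith.addi` is just a stand-in op:

```cpp
#include "mlir/Dialect/Arith/IR/Arith.h"
#include "mlir/IR/Builders.h"

mlir::Value BuildAdd(mlir::OpBuilder& builder, mlir::Location loc,
                     mlir::Value lhs, mlir::Value rhs) {
  // Before: builder.create<mlir::arith::AddIOp>(loc, lhs, rhs);
  // After: the op's static create() takes the builder explicitly.
  return mlir::arith::AddIOp::create(builder, loc, lhs, rhs);
}
```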

PiperOrigin-RevId: 846246070
This change moves the definition of `AotCompilationResult` into a new header file `compiled_module.h` and renames the class to `CompiledModule`. `CompilationResult` would have been the preferred name, but it's already in-use elsewhere.

The original `AotCompilationResult` is kept as a deprecated alias.
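
A minimal sketch of the rename and the deprecated alias (class body elided; the real definition lives in `compiled_module.h` as described above):

```cpp
namespace xla {

class CompiledModule {
 public:
  virtual ~CompiledModule() = default;
  // ... serialization and accessors unchanged from AotCompilationResult ...
};

// Deprecated spelling kept so existing callers continue to compile.
using AotCompilationResult [[deprecated("Use CompiledModule instead.")]] =
    CompiledModule;

}  // namespace xla
```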

PiperOrigin-RevId: 846246415
…ests, rather than on the original dimensions.

These are simpler both to write and to think about.

No behavior changes are intended.

PiperOrigin-RevId: 846253300
PiperOrigin-RevId: 846257722
… its allocation later

Imported from GitHub PR openxla/xla#35510

📝 Summary of Changes
Initialize collectives pointer to nullptr

🎯 Justification

GPU runtime options are initialized in TF and transferred to XLA to execute thunks. Since the memory is not cleared, `collectives` points to uninitialized memory, resulting in a segfault during NCCL collective initialization and operation.
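
A simplified sketch of the fix; the type and field names are illustrative, not the exact XLA definitions:

```cpp
class CollectivesInterface;  // whatever collectives implementation XLA expects

struct GpuRuntimeOptions {
  // Default-initialize to nullptr so XLA can tell "no collectives provided yet"
  // apart from a garbage pointer when the struct is built in TF and handed over
  // without being zero-initialized.
  CollectivesInterface* collectives = nullptr;
};
```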

🚀 Kind of Contribution
🐛 Bug Fix

Copybara import of the project:

--
2bfc6fbddbf2f9a926dd504169c56be45d2f1a0a by Harsha HS <Harsha.HavanurShamsundara@amd.com>:

[ROCm] Initialize collectives to nullptr to force its allocation later

Merging this change closes tensorflow#35510

PiperOrigin-RevId: 846266642
This migrates `builder.create<Op>()` => `Op::create()`

PiperOrigin-RevId: 846268375
…utor_test.

The local_defines for CUDA/ROCM are not required for this test. Added explicit includes for headers used in gpu_executor_test.cc.

PiperOrigin-RevId: 846269233
Imported from GitHub PR openxla/xla#35482

Sometimes the JSON parsing of compile commands from Bazel is incorrect, and we end up passing them as

```
"-isystem path/to/includes"
```

to `clangd`, where these flags are parsed incorrectly.
Copybara import of the project:

--
adf291e21b098d79fa3be4065ee02fafdf5c660a by Eugene Zhulenev <ezhulenev@google.com>:

Correctly generate compile_commands.json

Merging this change closes tensorflow#35482

PiperOrigin-RevId: 846269357
Depending on the compiler, `testing::TempDir() + __FUNCTION__` may generate an invalid file name.
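
A hedged sketch of one way to sidestep this: build the file name from the GoogleTest test name rather than `__FUNCTION__`, whose expansion is compiler-dependent (on MSVC it includes the enclosing class and `::`). The header path is an assumption:

```cpp
#include <string>
#include <gtest/gtest.h>
#include "tsl/platform/path.h"  // tsl::io::JoinPath

std::string TempFileForCurrentTest() {
  const ::testing::TestInfo* info =
      ::testing::UnitTest::GetInstance()->current_test_info();
  return tsl::io::JoinPath(
      ::testing::TempDir(),
      std::string(info->test_suite_name()) + "_" + info->name());
}
```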

PiperOrigin-RevId: 846275995
…iguous send/recv buffers

Imported from GitHub PR openxla/xla#35463

With the latest NCCL we can use the `ncclAlltoall` API directly, without having to launch grouped send and recv operations.
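
For context, a sketch of the two strategies (no error checking; the direct all-to-all call is an assumption about the newer NCCL API, not verified against a header):

```cpp
#include <cstddef>
#include <cuda_runtime.h>
#include "nccl.h"

// Legacy path: one send and one recv per peer inside a group.
ncclResult_t AllToAllGrouped(const float* send, float* recv, size_t count,
                             int nranks, ncclComm_t comm, cudaStream_t stream) {
  ncclGroupStart();
  for (int peer = 0; peer < nranks; ++peer) {
    ncclSend(send + peer * count, count, ncclFloat, peer, comm, stream);
    ncclRecv(recv + peer * count, count, ncclFloat, peer, comm, stream);
  }
  return ncclGroupEnd();
}

// With a recent NCCL and contiguous per-peer buffers, the PR instead calls the
// dedicated entry point directly (assumed signature):
//   ncclAllToAll(send, recv, count, ncclFloat, comm, stream);
```
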
Copybara import of the project:

--
0630f4d48049b211442dcb1754e521a4b1f37f7b by Eugene Zhulenev <ezv@amazon.com>:

[xla:gpu] Support ncclAlltoall directly for contiguous send/recv buffers

Merging this change closes tensorflow#35463

PiperOrigin-RevId: 846277559
…is supported by libraries.

PiperOrigin-RevId: 846299624
ermilovmaxim and others added 28 commits December 23, 2025 14:34
PiperOrigin-RevId: 848297480
Modify Thunk's serialization

PiperOrigin-RevId: 848309350
PiperOrigin-RevId: 848310259
Modify Thunk's serialization

PiperOrigin-RevId: 848323137
…lectiveDeviceListBase in place of vector<vector<int>> and reduce cognitive complexity in `GetDefaultCollectiveOpsCreator`.

PiperOrigin-RevId: 848356290
PiperOrigin-RevId: 848375547
PiperOrigin-RevId: 848382953
PiperOrigin-RevId: 848387274
PiperOrigin-RevId: 848393091
PiperOrigin-RevId: 848423026
PiperOrigin-RevId: 848434764
PiperOrigin-RevId: 848441651
…stub.

The `xtile_compiler` target now acts as a selector, depending on either `xtile_compiler_impl` or `xtile_compiler_stub` based on whether CUDA or ROCm is configured. The full implementation is moved to the new `xtile_compiler_impl` target, while `xtile_compiler_stub` provides a minimal version for other configurations.

This has the advantage that build_cleaner can run on xtile_compiler_impl. (Doing that removed around 20 dependencies)

PiperOrigin-RevId: 848442213
PiperOrigin-RevId: 848455572
PiperOrigin-RevId: 848467225
PiperOrigin-RevId: 848475361
It has to become part of Compiler::CompilerOptions, but CompilerOptions should not depend on PJRT, so it is moved here.

PiperOrigin-RevId: 848523186
PiperOrigin-RevId: 848534440
@i-chaochen i-chaochen self-requested a review December 30, 2025 11:53
@i-chaochen
Collaborator

This test failed; it seems the backend config (h100_sxm) is incorrect:

@local_xla//xla/tools:xla_gpu_compile_lib_test_amdgpu_any                FAILED in 13.4s

[2025-12-30T13:08:37.673Z] [ RUN      ] XlaCompileLibTest.CompilesForGpuWithoutDevice
[2025-12-30T13:08:37.673Z] external/local_xla/xla/tools/xla_gpu_compile_lib_test.cc:80: Failure
[2025-12-30T13:08:37.673Z] Value of: (tsl::ReadTextProto(tsl::Env::Default(), target_config_path, &target_config))
[2025-12-30T13:08:37.673Z] Expected: is OK
[2025-12-30T13:08:37.673Z]   Actual: NOT_FOUND: /root/.cache/bazel/_bazel_root/f14ffb85b056b92f87114ec3419b920b/execroot/org_tensorflow/bazel-out/k8-opt/bin/external/local_xla/xla/tools/xla_gpu_compile_lib_test_amdgpu_any.runfiles/org_tensorflow/xla/backends/gpu/target_config/specs/h100_sxm.txtpb; No such file or directory (of type absl::lts_20250814::Status)
[2025-12-30T13:08:37.673Z] 
[2025-12-30T13:08:37.673Z] [  FAILED  ] XlaCompileLibTest.CompilesForGpuWithoutDevice (0 ms)

