
Conversation


@mmakevic-amd mmakevic-amd commented Dec 24, 2025

Motivation

Bi-weekly sync from TensorFlow upstream

Disabled tests:

# @local_xla//xla/service/gpu:gpu_compiler_test_amdgpu_any
PersistedAutotuningTest.SingleOperationGetsAutotuned

Submission Checklist

thcmbs and others added 30 commits December 18, 2025 02:31
+ Allow the chain to start from <transpose, reshape, bitcast> instead of only reshape
+ Add a layout sensitive mode to the simplification
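
A hypothetical sketch of the relaxed chain-start check described above; the predicate name and structure are illustrative, not the actual pass code:

```cpp
#include "xla/hlo/ir/hlo_instruction.h"
#include "xla/hlo/ir/hlo_opcode.h"

// Hypothetical predicate: which instructions may now start the simplified chain.
bool CanStartChain(const xla::HloInstruction* instr) {
  switch (instr->opcode()) {
    case xla::HloOpcode::kTranspose:  // newly allowed
    case xla::HloOpcode::kBitcast:    // newly allowed
    case xla::HloOpcode::kReshape:    // previously the only allowed start
      return true;
    default:
      return false;
  }
}
```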

PiperOrigin-RevId: 846150097
Imported from GitHub PR openxla/xla#35479

Add clangd files and directories to .gitignore
Copybara import of the project:

--
2999b064c6b756dfc0355d863b863aff1bdea2fa by Eugene Zhulenev <ezv@amazon.com>:

Add clangd files and directories to .gitignore

Add clangd files and directories to .gitignore

Merging this change closes tensorflow#35479

PiperOrigin-RevId: 846156873
PiperOrigin-RevId: 846167560
…intExpression.

Helps with narrowing down which constraints are unsat. There can be many constraints (e.g. WGMMA in Mosaic), and while debugging it's not clear at a glance which one is violated.

As a follow up, we can also introduce names to each Constraint to make the identification even easier.

PiperOrigin-RevId: 846168559
PiperOrigin-RevId: 846171859
PiperOrigin-RevId: 846173555
…TF normalization in emitters

0) Fix a bug (?) in normalization util when normalized dim contains a single dimension
1) Perform normalization OTF for Transpose emitter selection
2) Use normalized shape for unrolling decision in kLoop emitter
3) Use normalized shape to detect slow transposes in triton fusion rewriter

PiperOrigin-RevId: 846191206
…t.cc

This change updates custom_call_test.cc to dynamically register custom call targets and FFI handlers using the runtime-determined platform name (CUDA or ROCM). This replaces the use of static registration macros, allowing the tests to run correctly across different GPU platforms and the reference interpreter.

This way we can avoid compile time branches like `#ifdef GOOGLE_CUDA` and similar.
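
A minimal sketch of the runtime-registration idea, assuming the test queries the platform name ("CUDA" or "ROCM") from the device it actually runs on; the callback and helper names below are illustrative, not the test's real symbols:

```cpp
#include <cstddef>
#include <string>
#include "xla/service/custom_call_target_registry.h"

// Illustrative custom-call target; the real tests register their own callbacks.
static void Callback_Noop(void* /*stream*/, void** /*buffers*/,
                          const char* /*opaque*/, size_t /*opaque_len*/) {}

// Register the target for whichever platform the test detected at runtime,
// instead of compiling two variants behind #ifdef GOOGLE_CUDA / TENSORFLOW_USE_ROCM.
void RegisterTestTargets(const std::string& platform_name) {
  xla::CustomCallTargetRegistry::Global()->Register(
      "Callback_Noop", reinterpret_cast<void*>(&Callback_Noop), platform_name);
}
```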

Also:

1. Converts usage of raw CUDA driver API functions to StreamExecutor functionality
2. Replaces some legacy CustomCalls by FFI
3. Converts the while test target to HloRunnerPjRt
4. Removes a test case from the Token tests with a nested type in the output type, since that's not supported by our PjRt implementation.

PiperOrigin-RevId: 846196106
The `fd.Size()` check doesn't work when the file descriptor is invalid and only
the path was given.
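
Illustrative only: the general shape of the fix, written against POSIX rather than the actual file-descriptor wrapper used here:

```cpp
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>
#include <cstdint>
#include <string>

// Returns the file size; falls back to opening the path when the descriptor is invalid.
int64_t FileSize(int fd, const std::string& path) {
  bool opened_here = false;
  if (fd < 0) {  // descriptor invalid: the size query must go through the path
    fd = open(path.c_str(), O_RDONLY);
    if (fd < 0) return -1;
    opened_here = true;
  }
  struct stat st;
  const int64_t size = (fstat(fd, &st) == 0) ? st.st_size : -1;
  if (opened_here) close(fd);
  return size;
}
```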

PiperOrigin-RevId: 846207406
PiperOrigin-RevId: 846213195
PiperOrigin-RevId: 846214738
PiperOrigin-RevId: 846217449
PiperOrigin-RevId: 846221752
The ROCm code path doesn't go through NcclCollectives anymore. Therefore these checks are obsolete.

PiperOrigin-RevId: 846226180
PiperOrigin-RevId: 846226345
PiperOrigin-RevId: 846231902
PiperOrigin-RevId: 846234559
This migrates `builder.create<Op>()` => `Op::create()`
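
A minimal sketch of the pattern, assuming a recent MLIR that generates the static `Op::create` overloads; `arith.addi` is just a stand-in op:

```cpp
#include "mlir/Dialect/Arith/IR/Arith.h"
#include "mlir/IR/Builders.h"

mlir::Value BuildAdd(mlir::OpBuilder& builder, mlir::Location loc,
                     mlir::Value lhs, mlir::Value rhs) {
  // Before: builder.create<mlir::arith::AddIOp>(loc, lhs, rhs);
  // After: the op's static create() takes the builder explicitly.
  return mlir::arith::AddIOp::create(builder, loc, lhs, rhs);
}
```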

PiperOrigin-RevId: 846246070
This change moves the definition of `AotCompilationResult` into a new header file `compiled_module.h` and renames the class to `CompiledModule`. `CompilationResult` would have been the preferred name, but it's already in-use elsewhere.

The original `AotCompilationResult` is kept as a deprecated alias.
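
A minimal sketch of the rename and the deprecated alias (class body elided; the real definition lives in `compiled_module.h` as described above):

```cpp
namespace xla {

class CompiledModule {
 public:
  virtual ~CompiledModule() = default;
  // ... serialization and accessors unchanged from AotCompilationResult ...
};

// Deprecated spelling kept so existing callers continue to compile.
using AotCompilationResult [[deprecated("Use CompiledModule instead.")]] =
    CompiledModule;

}  // namespace xla
```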

PiperOrigin-RevId: 846246415
…ests, rather than on the original dimensions.

These are simpler both to write and to think about.

No behavior changes are intended.

PiperOrigin-RevId: 846253300
PiperOrigin-RevId: 846257722
… its allocation later

Imported from GitHub PR openxla/xla#35510

📝 Summary of Changes
Initialize collectives pointer to nullptr

🎯 Justification

GPU runtime options are initialized in TF and transferred to XLA to execute thunks. Since the memory is not cleared, `collectives` points to uninitialized memory, resulting in a segfault during NCCL collective initialization and operation.
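
A simplified sketch of the fix; the type and field names are illustrative, not the exact XLA definitions:

```cpp
class CollectivesInterface;  // whatever collectives implementation XLA expects

struct GpuRuntimeOptions {
  // Default-initialize to nullptr so XLA can tell "no collectives provided yet"
  // apart from a garbage pointer when the struct is built in TF and handed over
  // without being zero-initialized.
  CollectivesInterface* collectives = nullptr;
};
```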

🚀 Kind of Contribution
🐛 Bug Fix

Copybara import of the project:

--
2bfc6fbddbf2f9a926dd504169c56be45d2f1a0a by Harsha HS <Harsha.HavanurShamsundara@amd.com>:

[ROCm] Initialize collectives to nullptr to force its allocation later

Merging this change closes tensorflow#35510

PiperOrigin-RevId: 846266642
This migrates `builder.create<Op>()` => `Op::create()`

PiperOrigin-RevId: 846268375
…utor_test.

The local_defines for CUDA/ROCM are not required for this test. Added explicit includes for headers used in gpu_executor_test.cc.

PiperOrigin-RevId: 846269233
Imported from GitHub PR openxla/xla#35482

Sometimes the JSON parsing of compile commands from Bazel is incorrect, and we end up passing them as

```
"-isystem path/to/includes"
```

to `clangd`, where these flags are parsed incorrectly.
Copybara import of the project:

--
adf291e21b098d79fa3be4065ee02fafdf5c660a by Eugene Zhulenev <ezhulenev@google.com>:

Correctly generate compile_commands.json

Merging this change closes tensorflow#35482

PiperOrigin-RevId: 846269357
Depending on the compiler, `testing::TempDir() + __FUNCTION__` may generate an invalid file name.
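
A hedged sketch of one way to sidestep this: build the file name from the GoogleTest test name rather than `__FUNCTION__`, whose expansion is compiler-dependent (on MSVC it includes the enclosing class and `::`). The header path is an assumption:

```cpp
#include <string>
#include <gtest/gtest.h>
#include "tsl/platform/path.h"  // tsl::io::JoinPath

std::string TempFileForCurrentTest() {
  const ::testing::TestInfo* info =
      ::testing::UnitTest::GetInstance()->current_test_info();
  return tsl::io::JoinPath(
      ::testing::TempDir(),
      std::string(info->test_suite_name()) + "_" + info->name());
}
```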

PiperOrigin-RevId: 846275995
…iguous send/recv buffers

Imported from GitHub PR openxla/xla#35463

With the latest NCCL we can use the `ncclAlltoall` API directly, without having to launch grouped send and recv operations.
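
For context, a sketch of the two strategies (no error checking; the direct all-to-all call is an assumption about the newer NCCL API, not verified against a header):

```cpp
#include <cstddef>
#include <cuda_runtime.h>
#include "nccl.h"

// Legacy path: one send and one recv per peer inside a group.
ncclResult_t AllToAllGrouped(const float* send, float* recv, size_t count,
                             int nranks, ncclComm_t comm, cudaStream_t stream) {
  ncclGroupStart();
  for (int peer = 0; peer < nranks; ++peer) {
    ncclSend(send + peer * count, count, ncclFloat, peer, comm, stream);
    ncclRecv(recv + peer * count, count, ncclFloat, peer, comm, stream);
  }
  return ncclGroupEnd();
}

// With a recent NCCL and contiguous per-peer buffers, the PR instead calls the
// dedicated entry point directly (assumed signature):
//   ncclAllToAll(send, recv, count, ncclFloat, comm, stream);
```
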
Copybara import of the project:

--
0630f4d48049b211442dcb1754e521a4b1f37f7b by Eugene Zhulenev <ezv@amazon.com>:

[xla:gpu] Support ncclAlltoall directly for contiguous send/recv buffers

Merging this change closes tensorflow#35463

PiperOrigin-RevId: 846277559
…is supported by libraries.

PiperOrigin-RevId: 846299624
ermilovmaxim and others added 28 commits December 23, 2025 14:34
PiperOrigin-RevId: 848297480
Modify Thunk's serialization

PiperOrigin-RevId: 848309350
PiperOrigin-RevId: 848310259
Modify Thunk's serialization

PiperOrigin-RevId: 848323137
…lectiveDeviceListBase in place of vector<vector<int>> and reduce cognitive complexity in `GetDefaultCollectiveOpsCreator`.

PiperOrigin-RevId: 848356290
PiperOrigin-RevId: 848375547
PiperOrigin-RevId: 848382953
PiperOrigin-RevId: 848387274
PiperOrigin-RevId: 848393091
PiperOrigin-RevId: 848423026
PiperOrigin-RevId: 848434764
PiperOrigin-RevId: 848441651
…stub.

The `xtile_compiler` target now acts as a selector, depending on either `xtile_compiler_impl` or `xtile_compiler_stub` based on whether CUDA or ROCm is configured. The full implementation is moved to the new `xtile_compiler_impl` target, while `xtile_compiler_stub` provides a minimal version for other configurations.

This has the advantage that build_cleaner can run on xtile_compiler_impl. (Doing that removed around 20 dependencies)

PiperOrigin-RevId: 848442213
PiperOrigin-RevId: 848455572
PiperOrigin-RevId: 848467225
PiperOrigin-RevId: 848475361
It has to become part of Compiler::CompilerOptions, but CompilerOptions should not depend on PJRT, so it is moved here.

PiperOrigin-RevId: 848523186
PiperOrigin-RevId: 848534440
@i-chaochen i-chaochen self-requested a review December 30, 2025 11:53
@i-chaochen
Collaborator

This test failed; it seems the backend config (h100_sxm) is incorrect:

@local_xla//xla/tools:xla_gpu_compile_lib_test_amdgpu_any                FAILED in 13.4s

[2025-12-30T13:08:37.673Z] [ RUN      ] XlaCompileLibTest.CompilesForGpuWithoutDevice
[2025-12-30T13:08:37.673Z] external/local_xla/xla/tools/xla_gpu_compile_lib_test.cc:80: Failure
[2025-12-30T13:08:37.673Z] Value of: (tsl::ReadTextProto(tsl::Env::Default(), target_config_path, &target_config))
[2025-12-30T13:08:37.673Z] Expected: is OK
[2025-12-30T13:08:37.673Z]   Actual: NOT_FOUND: /root/.cache/bazel/_bazel_root/f14ffb85b056b92f87114ec3419b920b/execroot/org_tensorflow/bazel-out/k8-opt/bin/external/local_xla/xla/tools/xla_gpu_compile_lib_test_amdgpu_any.runfiles/org_tensorflow/xla/backends/gpu/target_config/specs/h100_sxm.txtpb; No such file or directory (of type absl::lts_20250814::Status)
[2025-12-30T13:08:37.673Z] 
[2025-12-30T13:08:37.673Z] [  FAILED  ] XlaCompileLibTest.CompilesForGpuWithoutDevice (0 ms)

