Guard float64 atomic add for CUDA with GOOGLE_CUDA macro#3179

Merged
i-chaochen merged 1 commit into r2.18-rocm-enhanced from r2.18-gaurd_atomics_with_cuda_macro on Mar 6, 2026
Conversation


@hsharsha hsharsha commented Mar 3, 2026

Motivation

Put CUDA-specific code under the GOOGLE_CUDA macro.
Solves the slowness reported in https://amd-hub.atlassian.net/browse/ROCM-3072

Submission Checklist

@hsharsha hsharsha force-pushed the r2.18-gaurd_atomics_with_cuda_macro branch from 9c6d8f0 to 34351a8 on March 4, 2026 11:00
@i-chaochen i-chaochen merged commit e52cdda into r2.18-rocm-enhanced Mar 6, 2026
5 of 7 checks passed