
Guard float64 atomic add for CUDA with GOOGLE_CUDA macro #3175

Open
hsharsha wants to merge 1 commit into develop-upstream from devel_upst_gaurd_atomics_with_cuda_macro
Conversation

@hsharsha commented Mar 3, 2026

Motivation

Put CUDA-specific code under the GOOGLE_CUDA macro.
Solves the slowness reported in https://amd-hub.atlassian.net/browse/ROCM-3072

Submission Checklist

@hsharsha force-pushed the devel_upst_gaurd_atomics_with_cuda_macro branch from 62972ea to 892fece on March 4, 2026 10:56

@i-chaochen (Collaborator) left a comment


Thanks! Can you upstream this as well? Although we know it might take a long while to be accepted in TF upstream, let's put up a PR at least.

return detail::GpuAtomicCasHelper(ptr,
[value](double a) { return a + value; });
}
#endif
Collaborator

I guess we need to add // CUDA_ARCH < 600 for this #endif

