@bjarthur bjarthur commented Apr 25, 2023

fixes #1.

not merged yet because benchmarks are slower by ~10%:

[Screenshot: benchmark comparison, 2023-04-25]

the huge regression in batched_dot can be partially fixed by specifying CUDABackend(prefer_blocks=true), but that is not vendor agnostic. see https://discourse.julialang.org/t/kernelabstractions-get-backend-keyword-arguments/97895
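For context, a minimal sketch of the trade-off (assuming KernelAbstractions.jl's `get_backend` and CUDA.jl's `CUDABackend` constructor; the array names are illustrative, not from this PR):

```julia
using KernelAbstractions
using CUDA  # only needed for the CUDA-specific variant

# vendor-agnostic: derive the backend from the array itself
x = CUDA.functional() ? CuArray(rand(Float32, 1024)) : rand(Float32, 1024)
backend = get_backend(x)  # CUDABackend() on NVIDIA hardware, CPU() otherwise

# CUDA-only alternative: prefer_blocks=true recovers much of the
# batched_dot performance, but ties the code to one vendor
# backend = CUDABackend(prefer_blocks=true)
```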

@bjarthur

second pass at KernelAbstractions (KA):

[Screenshot: benchmark comparison, 2024-06-11]

hard-coding the number of threads at 32 in the first (and only) dimension to maximize block utilization mostly alleviates the regression in batched_dot.

see JuliaGPU/KernelAbstractions.jl#479
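A sketch of what fixing the workgroup size looks like with the KernelAbstractions launch API (`bdot_kernel!` is a hypothetical stand-in, not the actual kernel in this PR):

```julia
using KernelAbstractions

# hypothetical elementwise kernel used only to illustrate the launch syntax
@kernel function bdot_kernel!(out, a, b)
    i = @index(Global)
    @inbounds out[i] = a[i] * b[i]
end

x = rand(Float32, 1024)
y = rand(Float32, 1024)
out = similar(x)

backend = get_backend(x)
# instantiate with the workgroup size hard-coded at 32 in the first
# (only) dimension, then launch over the full ndrange
bdot_kernel!(backend, 32)(out, x, y, ndrange=length(x))
KernelAbstractions.synchronize(backend)
```

Fixing the workgroup size at instantiation avoids the heuristic (auto) size selection, which is where the batched_dot regression appeared.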


Development

Successfully merging this pull request may close these issues.

refactor to be vendor agnostic