Feature: Implement cal_force_op for sincos parallel #347

jieli-matrix · 2025-05-30T07:13:48Z

This PR introduces specialized GPU operators to accelerate the sincos computations that bottleneck the force calculations. It targets the most computationally intensive loops in the cal_force_loc and cal_force_ew functions, where ModuleBase::libm::sincos has been identified as the primary CPU hotspot.

Done:

Operator interface design
CPU reference implementations
CUDA/HIP GPU kernels
Code Integration and Calling Interface
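
To illustrate what "operator interface" plus "CPU reference implementation" can look like, here is a minimal sketch. It is not the code from this PR: the names `Vec3`, `cpu_tag` and `sincos_force_op`, the argument list, and the exact formula are assumptions chosen for illustration. It only shows the general pattern of a device-tagged functor whose inner loop is dominated by sin/cos evaluations of a phase G·τ.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative only: every name and signature here is hypothetical,
// not taken from the PR or from the surrounding code base.
struct Vec3 { double x, y, z; };

struct cpu_tag {};  // hypothetical device tag; a GPU tag would get its own specialization

// Primary template: one specialization per device.
template <typename Device>
struct sincos_force_op;

// CPU reference. For every atom a it accumulates
//   force[a] += sum_g (re[g] * sin(phase) + im[g] * cos(phase)) * G_g,
// with phase = 2*pi * (G_g . tau_a). The formula is an assumed stand-in.
template <>
struct sincos_force_op<cpu_tag> {
    void operator()(const std::vector<Vec3>& gvec,
                    const std::vector<double>& re,
                    const std::vector<double>& im,
                    const std::vector<Vec3>& tau,
                    std::vector<Vec3>& force) const
    {
        assert(re.size() == gvec.size() && im.size() == gvec.size());
        assert(force.size() == tau.size());
        const double two_pi = 2.0 * std::acos(-1.0);
        for (std::size_t a = 0; a < tau.size(); ++a) {
            Vec3 acc{0.0, 0.0, 0.0};  // per-atom accumulator, written back once
            for (std::size_t g = 0; g < gvec.size(); ++g) {
                const double phase = two_pi * (gvec[g].x * tau[a].x
                                             + gvec[g].y * tau[a].y
                                             + gvec[g].z * tau[a].z);
                const double s = std::sin(phase);
                const double c = std::cos(phase);
                const double w = re[g] * s + im[g] * c;
                acc.x += w * gvec[g].x;
                acc.y += w * gvec[g].y;
                acc.z += w * gvec[g].z;
            }
            force[a].x += acc.x;
            force[a].y += acc.y;
            force[a].z += acc.z;
        }
    }
};
```

In this sketch each atom owns a private accumulator, so the outer loop could be parallelized without atomic updates. Whether the GPU kernels in this PR map work that way is not stated here.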

ToDos:

AtomicAdd Optimization
Translate code comments into English

optimization in davidson-subspace algorithm
- add k continuity initialization strategy in planewave basis
- implement heterogeneous computation branching between CPU and DCU
- implement optimized eigenvalue operations for GPU & DCU
- implement optimized preconditioner for GPU & DCU
- implement optimized normalization op for GPU & DCU

jieli-matrix added 4 commits April 24, 2025 11:10

implement gpu op for sincos loops

c605cae

add cpu kernel for cal_force_loc & cal_force_ew

fbfc91a

fix sincos op for gpu&cpu

8a6339c

jieli-matrix closed this May 30, 2025
