@jieli-matrix
Contributor

This PR introduces specialized GPU operators to accelerate the sincos computations that bottleneck the force calculations. The implementation targets the most computationally intensive loops in the cal_force_loc and cal_force_ew functions, where ModuleBase::libm::sincos has been identified as the primary CPU hotspot.

Done:

  • Operator interface design
  • CPU reference implementations
  • CUDA/HIP GPU kernels
  • Code Integration and Calling Interface
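
As a rough illustration of the kind of loop a CPU reference implementation has to cover, here is a minimal sketch of a sin/cos-weighted force accumulation over G-vectors. The function name `force_sincos_reference`, the `Vec3` alias, the argument layout, and the exact form of the per-G coefficient are all assumptions made for this example; they are not taken from the PR and do not reproduce the actual cal_force_loc or cal_force_ew code.

```cpp
#include <array>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using Vec3 = std::array<double, 3>;  // hypothetical 3-vector type for this sketch

// Hypothetical CPU reference. For every atom a and every G-vector:
//   phase = 2*pi * (G . tau_a)
//   F_a  += G * (sin(phase) * Re(c_G) + cos(phase) * Im(c_G))
// where c_G is an assumed, precomputed complex coefficient per G-vector.
std::vector<Vec3> force_sincos_reference(const std::vector<Vec3>& gvecs,
                                         const std::vector<std::complex<double>>& coeff,
                                         const std::vector<Vec3>& tau)
{
    const double two_pi = 2.0 * std::acos(-1.0);
    std::vector<Vec3> force(tau.size(), Vec3{0.0, 0.0, 0.0});

    for (std::size_t ia = 0; ia < tau.size(); ++ia) {
        for (std::size_t ig = 0; ig < gvecs.size(); ++ig) {
            const Vec3& g = gvecs[ig];
            const double phase =
                two_pi * (g[0] * tau[ia][0] + g[1] * tau[ia][1] + g[2] * tau[ia][2]);
            // This sin/cos pair is the hotspot that a fused sincos call
            // (on CPU or GPU) would replace.
            const double s = std::sin(phase);
            const double c = std::cos(phase);
            const double factor = s * coeff[ig].real() + c * coeff[ig].imag();
            for (int d = 0; d < 3; ++d) {
                force[ia][d] += g[d] * factor;
            }
        }
    }
    return force;
}
```

In a GPU port of a loop like this, each thread would typically handle one (atom, G-vector) pair or a slice of G-vectors, so the `+=` into `force` becomes a reduction. That is where an atomic add or a block-level reduction would come in.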

ToDos:

  • atomicAdd optimization
  • Write comments in English

optimization in davidson-subspace algorithm

- add k-continuity initialization strategy in plane-wave basis
- implement heterogeneous computation branching between CPU and DCU
- implement optimized eigenvalue operations for GPU & DCU
- implement optimized preconditioner for GPU & DCU
- implement optimized normalization op for GPU & DCU
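
For context, a normalization op in a Davidson-type solver usually rescales each trial or correction vector to unit 2-norm before it enters the subspace. The following is a minimal CPU sketch of that idea only. The name `normalize_vectors`, the contiguous storage layout, and the handling of zero vectors are assumptions for illustration, not the interface added by this commit.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference: `psi` stores `nvec` vectors of length `dim`
// back to back; each one is scaled in place to unit 2-norm.
void normalize_vectors(std::vector<std::complex<double>>& psi,
                       std::size_t dim, std::size_t nvec)
{
    for (std::size_t v = 0; v < nvec; ++v) {
        double norm2 = 0.0;
        for (std::size_t i = 0; i < dim; ++i) {
            norm2 += std::norm(psi[v * dim + i]);  // std::norm returns |z|^2
        }
        if (norm2 == 0.0) {
            continue;  // assumed policy: leave an all-zero vector untouched
        }
        const double inv = 1.0 / std::sqrt(norm2);
        for (std::size_t i = 0; i < dim; ++i) {
            psi[v * dim + i] *= inv;
        }
    }
}
```

On a GPU or DCU the squared-norm sum would normally be a parallel reduction per vector, followed by an element-wise scale.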