Optimize the rotm kernel with RVV intrinsic. #5038

tingboliao · 2024-12-31T02:51:42Z

Starting from the scalar implementation of rotm, we optimized the kernel using RVV 1.0 intrinsics.
We then added test cases for functional and performance verification on the K230 and K1.
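For readers unfamiliar with the routine, the following is a minimal sketch of what a scalar rotm computes, following the standard BLAS definition of the modified Givens rotation. It is not the code from this PR or from the OpenBLAS tree: the name `rotm_ref` is hypothetical, only the single-precision case is shown, and the sketch assumes positive strides.

```c
#include <stddef.h>

/*
 * Hypothetical reference helper (not the OpenBLAS kernel).
 * Applies the modified Givens rotation H to the pairs (x[i], y[i]):
 *   x[i] <- h11 * x[i] + h12 * y[i]
 *   y[i] <- h21 * x[i] + h22 * y[i]
 * param[0] is the flag; param[1..4] hold h11, h21, h12, h22.
 * Assumption: incx and incy are positive element strides.
 */
static void rotm_ref(size_t n, float *x, size_t incx,
                     float *y, size_t incy, const float *param)
{
    const float flag = param[0];
    float h11, h21, h12, h22;

    if (flag == -2.0f) {
        return;                       /* H is the identity: nothing to do */
    } else if (flag == -1.0f) {       /* full matrix taken from param */
        h11 = param[1]; h21 = param[2];
        h12 = param[3]; h22 = param[4];
    } else if (flag == 0.0f) {        /* unit diagonal */
        h11 = 1.0f;     h21 = param[2];
        h12 = param[3]; h22 = 1.0f;
    } else {                          /* flag == 1: fixed off-diagonal */
        h11 = param[1]; h21 = -1.0f;
        h12 = 1.0f;     h22 = param[4];
    }

    for (size_t i = 0; i < n; i++) {
        const float w = x[i * incx];  /* read both inputs before writing */
        const float z = y[i * incy];
        x[i * incx] = h11 * w + h12 * z;
        y[i * incy] = h21 * w + h22 * z;
    }
}
```

Each iteration is independent of the others and costs a few multiply-adds per element pair, which is the kind of loop that maps naturally onto vector loads, multiply-accumulates and stores.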

The performance data are shown below:

Parameter setting: OPENBLAS_LOOPS = 10000.

K230 [C908, vlen = 128]@1.6GHz:

| Cases | Scalar (MFlops) | Optimized RVV (MFlops) |
| --- | --- | --- |
| srotm.goto | 875.57 | 1536.78 |
| drotm.goto | 799.77 | 1408.70 |

K1 [C908, vlen = 256]@1.6GHz:

| Cases | Scalar (MFlops) | Optimized RVV (MFlops) |
| --- | --- | --- |
| srotm.goto | 880.02 | 1490.44 |
| drotm.goto | 811.13 | 1541.92 |

In the data above, higher values indicate better performance.
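To make the comparison easier to read, the relative gain can be derived from the figures above as the ratio of optimized to scalar throughput. The sketch below only restates the reported numbers; the names `rotm_case`, `rotm_cases` and `speedup` are hypothetical and introduced for illustration.

```c
#include <stddef.h>

/* Throughput figures copied from the results above, in MFlops. */
struct rotm_case {
    const char *name;
    double scalar_mflops;
    double rvv_mflops;
};

static const struct rotm_case rotm_cases[] = {
    { "K230 srotm.goto", 875.57, 1536.78 },
    { "K230 drotm.goto", 799.77, 1408.70 },
    { "K1 srotm.goto",   880.02, 1490.44 },
    { "K1 drotm.goto",   811.13, 1541.92 },
};

static const size_t rotm_case_count =
    sizeof rotm_cases / sizeof rotm_cases[0];

/* Hypothetical helper: ratio of optimized to scalar throughput. */
static double speedup(const struct rotm_case *c)
{
    return c->rvv_mflops / c->scalar_mflops;
}
```

Evaluating `speedup` over `rotm_cases` gives ratios of roughly 1.7x to 1.9x across the four cases.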

Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>

martin-frbg · 2024-12-31T22:40:28Z

Thanks - the numbers are very compelling, but I'm not entirely sure having that much architecture-specific code at the interface level is a good idea. At least I don't think we've done this before, and if every architecture ifdef'd their specific intrinsics implementation into it, the file would get unwieldy rather quickly. (Need some time to think about alternatives though - not sure if it's easy to add a kernel mapping for just riscv64 either...)

tingboliao · 2025-01-02T00:57:29Z

Thanks, we will consider alternatives and submit a new pull request (PR) later if possible.

tingbo.liao added 2 commits December 31, 2024 10:32

Optimize the rotm kernel with RVV intrinsic.

2afd741

Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>

Correct the usage conditions of the macro RISCV_SIMD.

c2271f2

Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>

Merge branch 'OpenMathLib:develop' into dev_rotm_1231

da8af30

tingboliao closed this Jan 2, 2025

tingboliao reopened this Jan 7, 2025

tingboliao closed this Jan 7, 2025

tingboliao mentioned this pull request Jan 7, 2025

Rearranged the rotm kernel to adapt to the architecture. #5053

Closed
