Skip to content

Conversation

@tingboliao
Copy link

Hi, martin-frbg:
In accordance with the suggestion put forward in PR #5038, we rearranged the rotm kernel to ensure its compatibility with the architecture. Additionally, we developed relevant test cases for conducting functional and performance verifications on K230 and K1 platforms.

The latest performance data are shown as below:
Parameter setting: OPENBLAS_LOOPS = 10000.

K230 [C908, vlen = 128]@1.6GHz:
| Cases | Scalar / MFlops | Optimized RVV / MFlops |
| srotm.goto | 872.52 | 1545.43 |
| drotm.goto | 797.53 | 1410.64 |

K1 [C908, vlen = 256]@1.6GHz:
| Cases | Scalar / MFlops | Optimized RVV / MFlops |
| srotm.goto | 896.42 | 1512.47 |
| drotm.goto | 819.14 | 1576.13 |

In the above data, the bigger value is, the better performance is.

Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>
@tingboliao tingboliao closed this Jan 7, 2025
@tingboliao tingboliao reopened this Jan 7, 2025
Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>
@tingboliao tingboliao closed this Jan 8, 2025
@martin-frbg
Copy link
Collaborator

Right, this should work (though I lacked the time to test it, and it is a fairly invasive change), thanks. If you want to pursue this, I'd prefer merging this after the 0.3.29 release ? And placing the plain non-RVV kernel in kernel/generic rather than kernel/riscv64 would probably be a better choice. (Though it is already quite chaotic with half of the "generic" kernels living in arm and/or having duplicates in mips or elsewhere - there has been some wild growth in the past decade)

@tingboliao
Copy link
Author

Right, this should work (though I lacked the time to test it, and it is a fairly invasive change), thanks. If you want to pursue this, I'd prefer merging this after the 0.3.29 release ? And placing the plain non-RVV kernel in kernel/generic rather than kernel/riscv64 would probably be a better choice. (Though it is already quite chaotic with half of the "generic" kernels living in arm and/or having duplicates in mips or elsewhere - there has been some wild growth in the past decade)

Hi, martin-frbg:
I found that when testing the submitted code on the system, there are many failures on other architectures such as ARM, MIPS, and so on. Just as you mentioned, this is a fairly invasive change. Therefore, I withdrew the submitted code and closed the related Pull Request (PR). I'll further modify the code architecture when possible to make it fit all architectures. Thanks.

@martin-frbg
Copy link
Collaborator

I see - offhand you'll need to add defines for SROTM_K/DROTM_K connecting them to gotoblas->srotm_k (or drotm_k) , and entries srotm_kTS, drotm_KTS in kernel/setparam-ref.c for DYNAMIC_ARCH builds to work with your changes. I cannot see right away why single-target builds would be broken, need to apply your patch and test locally for that when I have time

@tingboliao
Copy link
Author

I see - offhand you'll need to add defines for SROTM_K/DROTM_K connecting them to gotoblas->srotm_k (or drotm_k) , and entries srotm_kTS, drotm_KTS in kernel/setparam-ref.c for DYNAMIC_ARCH builds to work with your changes. I cannot see right away why single-target builds would be broken, need to apply your patch and test locally for that when I have time

Hi, @martin-frbg:
I've updated the latest code to the PR #5069.
If you have time, you can go to the PR #5069, apply the newest patch and conduct local tests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants