fix: correct batched matmul stride for bs=1 and integrate CTest into build pipeline #157

Merged
kilinchange merged 1 commit into
master from
fix/hot-fix-ctest
May 25, 2026

@chen2021673
Contributor

@chen2021673 chen2021673 commented May 21, 2026

Summary

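  • Fix the stride computation in matmul for a single batch: when bs == 1, set the stride to 0
    so that cuBLAS correctly broadcasts the single matrix, avoiding illegal memory accesses or wrong results. This affects three functions: forward, backward_input, and backward_other.
  • Integrate CTest into the build pipeline: add the configurable options RUN_CTEST / CTEST_CMD
    to run_models_and_profile.bash so that tests run automatically once the build completes; add the corresponding defaults to test_config.json.
  • Add the googletest submodule (third_party/googletest) to provide the test framework for CTest.
  • Documentation update: add the -DUSE_NCCL=ON flag to the cmake example.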
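
To illustrate why a stride of 0 gives broadcasting, here is a minimal CPU-side sketch of strided-batched matmul semantics. It is not the code from matmul.cu: the names StridedBatchedMatmul, BatchStride, and BroadcastExample are hypothetical, and the layout is row-major for readability, whereas cuBLAS itself is column-major. Batch i reads each operand at base + i * stride, so a stride of 0 makes every batch reuse the same matrix, while a non-zero stride on an operand that holds only one matrix would read past its end.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU reference for strided-batched C[i] = A[i] * B[i] (row-major, illustration only).
// Batch i reads A at a + i * stride_a and B at b + i * stride_b.
void StridedBatchedMatmul(const float *a, const float *b, float *c, int m, int n, int k,
                          std::ptrdiff_t stride_a, std::ptrdiff_t stride_b,
                          std::ptrdiff_t stride_c, int batch_count) {
    for (int batch = 0; batch < batch_count; ++batch) {
        const float *a_mat = a + batch * stride_a;
        const float *b_mat = b + batch * stride_b;
        float *c_mat = c + batch * stride_c;
        for (int i = 0; i < m; ++i) {
            for (int j = 0; j < n; ++j) {
                float sum = 0.0f;
                for (int p = 0; p < k; ++p) {
                    sum += a_mat[i * k + p] * b_mat[p * n + j];
                }
                c_mat[i * n + j] = sum;
            }
        }
    }
}

// Hypothetical helper: element stride for an operand holding `bs` matrices of rows x cols.
// With bs == 1 the stride is 0, so the single matrix is shared by every batch.
std::ptrdiff_t BatchStride(int bs, int rows, int cols) {
    return bs == 1 ? 0 : static_cast<std::ptrdiff_t>(rows) * cols;
}

// Three 2x2 matrices in A (values 1..12) multiplied by one shared 2x2 matrix B = 2 * I.
std::vector<float> BroadcastExample() {
    const int m = 2, n = 2, k = 2, batch_count = 3;
    std::vector<float> a(batch_count * m * k);
    for (std::size_t i = 0; i < a.size(); ++i) {
        a[i] = static_cast<float>(i + 1);
    }
    const std::vector<float> b = {2.0f, 0.0f, 0.0f, 2.0f}; // bs == 1: only one matrix exists
    std::vector<float> c(batch_count * m * n, 0.0f);
    StridedBatchedMatmul(a.data(), b.data(), c.data(), m, n, k,
                         BatchStride(batch_count, m, k), // 4: A advances per batch
                         BatchStride(1, k, n),           // 0: B is broadcast
                         static_cast<std::ptrdiff_t>(m) * n, batch_count);
    return c;
}
```

In this sketch every element of the result is twice the corresponding element of A, because all three batches read the same B. Passing k * n as the stride for B instead would make batches 1 and 2 read beyond the four floats that B actually holds, which is the class of error the fix describes.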

Changes

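| File | Change |
| --- | --- |
| infini_train/src/kernels/cuda/matmul.cu | Set stride to 0 when bs == 1 (forward/backward) |
| scripts/run_models_and_profile.bash | Add the RUN_CTEST switch and CTEST_CMD; run tests conditionally after the build |
| scripts/test_config.json | Add default settings for RUN_CTEST and CTEST_CMD |
| docs/test_usage_guide.md | Add -DUSE_NCCL=ON to the cmake command |
| third_party/googletest | Add the googletest submodule |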

Test

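All ctest cases pass, and the results are written to the corresponding files. The tests are run in two groups by label, non-CUDA and CUDA (ctest --output-on-failure -LE cuda -j$(nproc) && ctest --output-on-failure -L cuda -j1), so that the non-CUDA tests can run in parallel for speed. CUDA tests compete for resources when run in parallel, so they run serially for now; this is left for a later fix.
Non-CUDA tests:
image

CUDA tests:
image

Loss comparison:
image

Performance comparison:
image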

@chen2021673 chen2021673 changed the title fix: correct batched matmul strides for bs=1 and integrate CTest into… fix: correct batched matmul stride for bs=1 and integrate CTest into build pipeline May 21, 2026
@kilinchange
Collaborator

Please post screenshots of the passing tests.

@chen2021673 chen2021673 requested a review from kilinchange May 25, 2026 02:51
… build pipeline

Set stride to 0 when batch_size is 1 to enable proper broadcasting in cuBLAS,
and add configurable CTest execution after builds with googletest submodule.
@kilinchange kilinchange merged commit d2ee257 into master May 25, 2026
2 checks passed
@kilinchange kilinchange deleted the fix/hot-fix-ctest branch May 25, 2026 06:41