
fix cast from f32 to f16 #9

Open

Desjajja wants to merge 1 commit into InfiniTensor:main from Desjajja:fix-cast

Conversation

@Desjajja

The original `_f32_to_f16` (llaisys/src/utils/types.cpp) has no rounding logic when narrowing to fp16, so the rms_norm test case fails in fp16 (atol=1e-3).
This PR fixes that.
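For context, a float-to-half cast that simply truncates the low 13 mantissa bits always rounds toward zero, which is enough to push rms_norm outputs past a 1e-3 tolerance. Below is a minimal sketch of an f32→f16 conversion with IEEE round-to-nearest-even; this is not the PR's actual diff, and the function name `f32_to_f16` is chosen here for illustration:

```cpp
#include <cstdint>
#include <cstring>

// Sketch: convert an IEEE-754 binary32 to binary16 bits with
// round-to-nearest-even (not the repository's actual implementation).
uint16_t f32_to_f16(float value) {
    uint32_t bits;
    std::memcpy(&bits, &value, sizeof(bits));

    uint16_t sign = static_cast<uint16_t>((bits >> 16) & 0x8000u);
    int32_t  exp  = static_cast<int32_t>((bits >> 23) & 0xFF) - 127 + 15;
    uint32_t mant = bits & 0x007FFFFFu;

    if (exp >= 0x1F) {               // overflow, inf, or NaN
        // NaN must keep a nonzero mantissa; finite overflow becomes inf
        uint16_t m = (((bits >> 23) & 0xFF) == 0xFF && mant) ? 0x0200u : 0u;
        return sign | 0x7C00u | m;
    }
    if (exp <= 0) {                  // half subnormal or zero
        if (exp < -10) return sign;  // too small to represent: flush to zero
        mant |= 0x00800000u;         // restore the implicit leading 1
        uint32_t shift = static_cast<uint32_t>(14 - exp);
        uint32_t half_mant = mant >> shift;
        uint32_t round_bit = 1u << (shift - 1);
        // round up if the discarded part is > 1/2 ULP, or == 1/2 ULP and odd
        if ((mant & round_bit) && ((mant & (round_bit - 1)) || (half_mant & 1)))
            ++half_mant;
        return sign | static_cast<uint16_t>(half_mant);
    }
    // Normal case: keep the top 10 mantissa bits, then round to nearest even.
    uint16_t half = sign
                  | static_cast<uint16_t>(exp << 10)
                  | static_cast<uint16_t>(mant >> 13);
    if ((mant & 0x1000u) && ((mant & 0x0FFFu) || (half & 1)))
        ++half;                      // a mantissa carry correctly bumps the exponent
    return half;
}
```

Truncation (`mant >> 13` alone) and round-to-nearest differ by at most one ULP per cast, but rms_norm accumulates over a whole hidden dimension, so the one-sided bias is what shows up in the test.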

xsmccc added a commit to xsmccc/llaisys that referenced this pull request Apr 6, 2026
- §1: Add KV Cache INT8 (InfiniTensor#4) and CUDA Graph (InfiniTensor#5) to project intro (7→9 optimizations)
- §32: Rewrite optimization InfiniTensor#8 from 'failed CUDA Graph' to successful KV Cache INT8 (+55%)
- §32: Add optimization InfiniTensor#9 CUDA Graph static capture (+12.2%, 118→132 tok/s)
- §32: Update acceleration breakdown table (330× complete, FP32 4.4×)
- §24.5: Fix perf numbers (57.3→57.5, FP32 33.6→~30, add final 132 tok/s)
- §40: Update quantization Q&A with full pipeline data
- §43: Rewrite cudaGraph section with project-specific implementation details
- Clean up duplicate INT4 paragraph, fix title counts (七→九项, i.e. seven → nine)
