
[Metax] support FLASH_ATTN #7914

Open
Tryorish wants to merge 14 commits into PaddlePaddle:develop from Tryorish:migrate-from-rel2.5

[Metax] support FLASH_ATTN#7914
Tryorish wants to merge 14 commits into
PaddlePaddle:developfrom
Tryorish:migrate-from-rel2.5


@Tryorish
Contributor

Motivation

💡 If this PR is a Cherry Pick, the PR title needs to follow the format by adding the [Cherry-Pick] label at the very beginning and appending the original PR ID at the end. For example, [Cherry-Pick][CI] Add check trigger and logic(#5191)

Modifications

Usage or Command

Accuracy Tests

Checklist

  • Add at least a tag in the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code, run pre-commit before commit.
  • Add unit tests. Please write the reason in this PR if no unit tests.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.
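The title rules in this checklist are mechanical enough to check locally before pushing. The sketch below is a hypothetical helper, not the repository's actual Check PR Template job: it splits the leading [Tag] groups off a title, flags a title with no tag, and enforces the cherry-pick format from the 💡 note above ([Cherry-Pick] first, original PR ID at the end). Because new tags are allowed, tags outside the listed set are reported separately instead of being rejected.

```python
import re

# Tags copied from the checklist; unlisted tags are allowed, so they are only reported.
LISTED_TAGS = {
    "FDConfig", "APIServer", "Engine", "Scheduler", "PD Disaggregation", "Executor",
    "Graph Optimization", "Speculative Decoding", "RL", "Models", "Quantization",
    "Loader", "OP", "KVCache", "DataProcessor", "BugFix", "Docs", "CI",
    "Optimization", "Feature", "Benchmark", "Others", "XPU", "HPU", "GCU", "DCU",
    "Iluvatar", "Metax",
}
TAG = re.compile(r"\[([^\[\]]+)\]")
PR_ID_SUFFIX = re.compile(r"\(#\d+\)\s*$")


def leading_tags(title: str) -> tuple[list[str], str]:
    """Split '[A][B] rest' into (['A', 'B'], ' rest')."""
    tags, pos = [], 0
    while (m := TAG.match(title, pos)):
        tags.append(m.group(1))
        pos = m.end()
    return tags, title[pos:]


def title_problems(title: str) -> list[str]:
    """Return rule violations for a PR title (empty list means OK)."""
    tags, rest = leading_tags(title)
    problems = []
    if not tags:
        problems.append("title has no leading [Tag]")
    if "Cherry-Pick" in tags:
        if tags[0] != "Cherry-Pick":
            problems.append("[Cherry-Pick] must be the first tag")
        if not PR_ID_SUFFIX.search(rest):
            problems.append("cherry-pick title must end with the original PR ID, e.g. (#5191)")
    return problems


def unlisted_tags(title: str) -> list[str]:
    """Tags that are not in the checklist's list (allowed, but worth a second look)."""
    tags, _ = leading_tags(title)
    return [t for t in tags if t not in LISTED_TAGS and t != "Cherry-Pick"]


print(title_problems("[Cherry-Pick][CI] Add check trigger and logic(#5191)"))
```

The helper only checks shape; whether a tag is semantically sufficient (for example, whether [Metax] alone describes the change) still needs a human.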

xiaozude and others added 8 commits May 25, 2026 09:41
(cherry picked from commit 8130e7c5a77ba39fdb47cce4db586257a3cf10e0)

# Conflicts:
#	custom_ops/metax_ops/apply_rope_qkv.cu
#	custom_ops/metax_ops/maca_version.h
#	fastdeploy/spec_decode/mtp.py
#	fastdeploy/worker/input_batch.py
#	fastdeploy/worker/metax_model_runner.py
(cherry picked from commit 49a405b5ab0867d297c1a74643fdf83e3bb1bed5)
support cuda graph

(cherry picked from commit f78cbfbe0b69eac20bad4f5b1ed7aec25f12ce73)
(cherry picked from commit a0ca9aef03a1e7fa50a205c1737dcdf084f18685)
(cherry picked from commit e8bfe916642e78ff317a398317b846a7bd448772)

# Conflicts:
#	fastdeploy/envs.py
#	fastdeploy/worker/input_batch.py
(cherry picked from commit 0890acc6f4f94740c14e9788903ada9bbdaaf469)
(cherry picked from commit 712fd9c106109e54a7cba4e93ee90e8181d87a3d)
Copilot AI review requested due to automatic review settings May 25, 2026 08:09
@paddle-bot

paddle-bot Bot commented May 25, 2026

Thanks for your contribution!

@paddle-bot paddle-bot Bot added the contributor External developers label May 25, 2026
Contributor

Copilot AI left a comment


Pull request overview

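This PR migrates the Metax (MACA) platform from rel2.5: it adds or replaces the attention backends and related custom ops, and wires the new forward meta and input cache fields into the Worker/SpecDecode paths so that the new FlashAttention/Triton Attention compute path is supported.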

Changes:

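  • Adds/switches attention backends on the Metax platform (FlashAttention + Triton) and extends MetaxForwardMeta to support rotary_embs_bf16.
  • The Worker / MTP inference path adds rope_emb_bf16 and routing replay initialization, and links MTP reorder/insert with index_to_batch_id.
  • Extends and integrates several Metax custom ops (RoPE, KV cache writes, FlashAttention), and adjusts the custom-op compile and link arguments.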
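For readers unfamiliar with the RoPE variants these ops cover, here is a minimal pure-Python reference of the standard rotary position embedding for one head vector. It is a sketch of the textbook formulation, not the Metax kernel: it assumes the common frequency schedule base^(-2i/rot_dim) with base 10000, and all function names are hypothetical. NeoX style pairs element i with i + rot_dim/2, the interleaved (GPT-J) style pairs 2i with 2i+1, partial rotary leaves dimensions past rot_dim untouched, and a variable-length packed batch only needs a per-token position derived from each sequence's length and its KV cache offset.

```python
import math


def _angle(i: int, pos: int, rot_dim: int, base: float) -> float:
    # Frequency for pair i is base^(-2i/rot_dim), scaled by the token position.
    return pos * base ** (-2.0 * i / rot_dim)


def rope_neox(x, pos, rot_dim=None, base=10000.0):
    """NeoX (half-split) RoPE: rotates the pairs (x[i], x[i + rot_dim/2])."""
    rot_dim = len(x) if rot_dim is None else rot_dim
    assert rot_dim % 2 == 0 and rot_dim <= len(x)
    half = rot_dim // 2
    out = list(x)  # dims >= rot_dim pass through unchanged (partial rotary)
    for i in range(half):
        theta = _angle(i, pos, rot_dim, base)
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[i], x[i + half]
        out[i], out[i + half] = a * c - b * s, a * s + b * c
    return out


def rope_interleaved(x, pos, rot_dim=None, base=10000.0):
    """Interleaved (GPT-J) RoPE: rotates the pairs (x[2i], x[2i+1])."""
    rot_dim = len(x) if rot_dim is None else rot_dim
    assert rot_dim % 2 == 0 and rot_dim <= len(x)
    out = list(x)  # dims >= rot_dim pass through unchanged (partial rotary)
    for i in range(rot_dim // 2):
        theta = _angle(i, pos, rot_dim, base)
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i], out[2 * i + 1] = a * c - b * s, a * s + b * c
    return out


def varlen_positions(seq_lens, start_offsets=None):
    """Per-token positions for a packed variable-length batch.

    start_offsets models tokens already in the KV cache (e.g. during decode).
    """
    start_offsets = start_offsets or [0] * len(seq_lens)
    return [off + j for n, off in zip(seq_lens, start_offsets) for j in range(n)]


print(varlen_positions([3, 2]))  # [0, 1, 2, 0, 1]
```

Each pair is rotated by a 2x2 rotation, so the squared norm of the rotated part is preserved; that property makes a convenient sanity check when comparing a kernel against a reference like this one.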

Reviewed changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| `fastdeploy/worker/metax_worker.py` | Initializes the routing replay manager from the config during cache initialization |
| `fastdeploy/worker/metax_model_runner.py` | Switches to MetaxForwardMeta, adds rope_emb_bf16, and adjusts the MTP call arguments |
| `fastdeploy/worker/input_batch.py` | Disables some pin_memory usage under MACA; ProposerInputBatch gains pre_ids and a platform check |
| `fastdeploy/spec_decode/mtp.py` | Conditionally introduces MetaxForwardMeta and rope_emb_bf16 under MACA |
| `fastdeploy/spec_decode/mtp_cuda.py` | Under MACA, forward_meta uses MetaxForwardMeta and passes in rotary_embs_bf16 |
| `fastdeploy/platforms/maca.py` | Extends the selectable attention backends (FLASH/TRITON) and updates the message text |
| `fastdeploy/platforms/base.py` | Adds TRITON_ATTN to the _Backend enum |
| `fastdeploy/model_executor/layers/backends/metax/attention/triton_attn_metax_backend.py` | New Metax Triton attention backend (Python-side wrapper) |
| `fastdeploy/model_executor/layers/backends/metax/attention/triton_attn_kernels.py` | New Triton kernel: unified attention (prefill/decode) |
| `fastdeploy/model_executor/layers/backends/metax/attention/flash_attn_metax_backend.py` | New Metax FlashAttention backend (two PD modes: split and mix) |
| `fastdeploy/model_executor/layers/backends/metax/__init__.py` | Exports the new Metax Flash/Triton attention backends |
| `fastdeploy/model_executor/forward_meta.py` | Adds MetaxForwardMeta with the extra rotary_embs_bf16 field |
| `fastdeploy/envs.py` | Adds a Metax FA split switch and a KV cache lock switch |
| `custom_ops/setup_ops.py` | Adds the new Metax op source files and the library/header paths for linking |
| `custom_ops/metax_ops/write_cache_kv.cu` | New: op that writes K/V into the paged KV cache |
| `custom_ops/metax_ops/write_cache_kv_with_rope.cu` | New: cache-write op with RoPE (including a speculative-decoding branch) |
| `custom_ops/metax_ops/rotary_position_embedding.cu` | New: RoPE op supporting variable-length, Neox, and partial rotary |
| `custom_ops/metax_ops/flash_attention.cu` | New: varlen/kvcache forward ops backed by mcFlashAttn |
| `custom_ops/metax_ops/maca_version.h` | Deleted: header with MACA version macros |
| `custom_ops/metax_ops/fused_moe_gemm_kernels.h` | Removes the MACA_VERSION conditional branches and unifies the call argument types |
| `custom_ops/metax_ops/apply_rope_qkv.cu` | Deleted: old apply_rope_qkv implementation |
| `custom_ops/gpu_ops/gelu_tanh.cu` | Fixes the block thread-count calculation (so it cannot exceed 1024) |
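The gelu_tanh fix swaps std::max for std::min in the block-size computation. The sketch below illustrates why, assuming (hypothetically, since the kernel code is not shown here) that the block size is derived from the row width n_cols and must stay within the 1024-thread-per-block limit the fix guards against: max(n_cols, 1024) never drops below 1024 and overshoots the limit for wide rows, while min(n_cols, 1024) caps it.

```python
MAX_THREADS_PER_BLOCK = 1024  # the per-block limit the fix is meant to respect


def block_threads_buggy(n_cols: int) -> int:
    # Mirrors std::max(n_cols, 1024): never below 1024, above the limit for wide rows.
    return max(n_cols, MAX_THREADS_PER_BLOCK)


def block_threads_fixed(n_cols: int) -> int:
    # Mirrors std::min(n_cols, 1024): one thread per column, capped at the limit.
    return min(n_cols, MAX_THREADS_PER_BLOCK)


def elems_per_thread(n_cols: int, threads: int) -> int:
    # Ceiling division: with a capped block, each thread strides over several columns.
    return -(-n_cols // threads)


for n in (512, 4096):
    fixed = block_threads_fixed(n)
    print(n, block_threads_buggy(n), fixed, elems_per_thread(n, fixed))
```

With 4096 columns the buggy form requests a 4096-thread block, which exceeds the limit; the fixed form uses 1024 threads that each handle 4 elements, which presumes the kernel loops over the row with a stride.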
Comments suppressed due to low confidence (1)

custom_ops/metax_ops/flash_attention.cu:400

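  • Same as above: no error is actually thrown here, so on failure execution silently continues, which can lead to NaNs, out-of-bounds accesses, and other downstream problems. Suggest switching to PD_THROW to abort immediately and surface the error code.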
  if (status != MCFLASHATTN_STATUS_SUCCESS) {
    phi::errors::External("Error in McFlashAttn, error code is %d", status);
  }

Comment thread fastdeploy/model_executor/layers/backends/metax/attention/triton_attn_kernels.py Outdated
Comment thread custom_ops/metax_ops/flash_attention.cu
Comment thread custom_ops/gpu_ops/gelu_tanh.cu
Comment thread fastdeploy/worker/input_batch.py Outdated
Comment thread fastdeploy/platforms/maca.py
@PaddlePaddle-bot

PaddlePaddle-bot commented May 25, 2026

🤖 Paddle-CI-Agent | ci_status_monitor | 2026-05-28 11:03:14

CI report generated from the following code (updated every 30 minutes):


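1 Job Overview

2 Required jobs are still failing: 1 coverage-threshold failure and 1 that needs manual Approval. Resolve the Required failures before merging.

| Total runs (reruns) | Total jobs | ✅ Passed | ❌ Failed | ⏳ Running | ⏸️ Waiting | Skipped |
| --- | --- | --- | --- | --- | --- | --- |
| 42(0) | 42 | 36 | 6 | 0 | 0 | 0 |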

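2 Job Status Summary

Log column: failed jobs link directly to their logs; running jobs link to the Job page.

2.1 Required jobs: 8/10 passed

Required jobs block merging; address their failures first.

| Status | Job | Duration | Root cause | Suggested fix | Log | Rerun |
| --- | --- | --- | --- | --- | --- | --- |
| ❌ | Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage | 1h25m | PR issue: insufficient coverage of the new FLASH_ATTN branch | Add MACA FLASH_ATTN unit tests | Job | - |
| ❌ | Approval | 18s | Approval required | Approve manually | Job | - |
| ✅ | The other 8 required jobs passed | - | - | - | - | - |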

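2.2 Optional jobs: 28/32 passed

Optional jobs do not block merging; their failures are for reference only.

| Status | Job | Duration | Log | Rerun |
| --- | --- | --- | --- | --- |
| ❌ | Run iluvatar Tests / run_iluvatar_cases | 2m18s | Job | - |
| ❌ | Check PR Template | 29s | Job | - |
| ❌ | CI_HPU | 1h4m | Job | - |
| ❌ | Trigger Jenkins for PR | 12s | Job | - |
| ✅ | The other 28 optional jobs passed | - | - | - |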

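3 Failure Details (required only)

Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage: coverage threshold (confidence: high)

Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage

  • Status: ❌ Failed
  • Error type: coverage threshold
  • Confidence: high
  • Root cause summary: insufficient coverage of the new FLASH_ATTN branch
  • Analyzer: ci_analyze_unittest_fastdeploy

Failed test cases: none. The log shows TEST_EXIT_CODE: 0 and prints All tests passed; the failure occurred in the coverage check step.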

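Root cause details:
diff_coverage.json shows a total diff coverage of 71% for this change, below the 80% threshold; in particular, fastdeploy/platforms/maca.py is only 50% covered, with violations on L64-L65. The source shows that the PR adds/modifies the _Backend.FLASH_ATTN branch of MACAPlatform.get_attention_backend_cls(), which returns MetaxFlashAttentionBackend, but the existing tests/platforms/test_platforms.py::TestMACAPlatform only covers NATIVE/APPEND/INVALID and never exercises the new branch.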

Key log:

Coverage generation failed (exit code 9)
GPU Patch Coverage Details:
{"src_stats": {"fastdeploy/platforms/maca.py": {"percent_covered": 50.0,
"violation_lines": [64, 65], "covered_lines": [61, 63]},
"fastdeploy/model_executor/forward_meta.py": {"percent_covered": 100.0}},
"total_num_lines": 7, "total_num_violations": 2,
"total_percent_covered": 71}
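The figures in this log are consistent with a plain ratio of covered changed lines to all changed lines. The quick check below uses only the fields shown in the excerpt: maca.py has 2 covered and 2 violating lines (50%), and the diff as a whole has 7 lines with 2 violations, so 5/7 ≈ 71.4%, reported as 71% and below the 80% gate. The remaining 3 changed lines must therefore come from forward_meta.py, which is reported at 100%, so covering L64-L65 alone would bring the diff to 7/7.

```python
import json

# Fields copied from the key log above; forward_meta.py's line lists are not in the excerpt.
DIFF_COVERAGE = json.loads("""
{"src_stats": {"fastdeploy/platforms/maca.py": {"percent_covered": 50.0,
   "violation_lines": [64, 65], "covered_lines": [61, 63]}},
 "total_num_lines": 7, "total_num_violations": 2}
""")
THRESHOLD = 80.0  # diff-coverage gate stated in the CI report


def percent_covered(num_lines: int, num_violations: int) -> float:
    # Share of changed lines executed by the tests, in percent.
    return 100.0 * (num_lines - num_violations) / num_lines


maca_stats = DIFF_COVERAGE["src_stats"]["fastdeploy/platforms/maca.py"]
maca = percent_covered(
    len(maca_stats["covered_lines"]) + len(maca_stats["violation_lines"]),
    len(maca_stats["violation_lines"]),
)
total = percent_covered(DIFF_COVERAGE["total_num_lines"], DIFF_COVERAGE["total_num_violations"])
print(f"maca.py {maca:.1f}%, diff total {total:.1f}%, passes gate: {total >= THRESHOLD}")
```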

Suggested fix:

  1. Add a unit test for the FLASH_ATTN branch to TestMACAPlatform in tests/platforms/test_platforms.py, for example by adding this assertion near L94: self.assertIn("MetaxFlashAttentionBackend", MACAPlatform.get_attention_backend_cls(_Backend.FLASH_ATTN))
  2. After the fix, re-trigger Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage and confirm that the total coverage in diff_coverage.json is ≥ 80%.
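A self-contained sketch of what the suggested test could look like. Because fastdeploy is not importable here, _Backend and MACAPlatform below are hypothetical stand-ins that only mimic the dispatch described in this report, and the module path returned for MetaxFlashAttentionBackend is assumed. In the real test, import the actual classes instead (the file summary places _Backend in fastdeploy/platforms/base.py and MACAPlatform in fastdeploy/platforms/maca.py) and add the FLASH_ATTN case next to the existing NATIVE/APPEND/INVALID cases.

```python
import enum
import unittest


class _Backend(enum.Enum):
    # Hypothetical subset of the _Backend enum in fastdeploy/platforms/base.py.
    NATIVE_ATTN = enum.auto()
    FLASH_ATTN = enum.auto()


class MACAPlatform:
    """Hypothetical stand-in for fastdeploy.platforms.maca.MACAPlatform."""

    @classmethod
    def get_attention_backend_cls(cls, selected_backend):
        if selected_backend == _Backend.NATIVE_ATTN:
            return "fastdeploy.model_executor.layers.attention.PaddleNativeAttnBackend"
        if selected_backend == _Backend.FLASH_ATTN:
            # Assumed module path, for illustration only.
            return "fastdeploy.model_executor.layers.backends.metax.MetaxFlashAttentionBackend"
        raise ValueError(f"Invalid attention backend: {selected_backend}")


class TestMACAPlatform(unittest.TestCase):
    def test_native_attn(self):
        self.assertIn(
            "PaddleNativeAttnBackend",
            MACAPlatform.get_attention_backend_cls(_Backend.NATIVE_ATTN),
        )

    def test_flash_attn(self):
        # The assertion proposed in the CI report; it executes the new FLASH_ATTN branch.
        self.assertIn(
            "MetaxFlashAttentionBackend",
            MACAPlatform.get_attention_backend_cls(_Backend.FLASH_ATTN),
        )
```

Run it with python -m unittest on the file. Against the real MACAPlatform, the FLASH_ATTN case is what executes the uncovered lines flagged in the log.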

Fix summary: add MACA FLASH_ATTN unit tests

Related changes: fastdeploy/platforms/maca.py L63-L65; add tests near L94 of tests/platforms/test_platforms.py.
Link: View log

Approval: manual approval (confidence: high)

This job requires manual Approval; CI continues only after it is approved.

PaddlePaddle-bot

This comment was marked as outdated.

Copilot AI review requested due to automatic review settings May 26, 2026 02:52
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 19 out of 19 changed files in this pull request and generated 8 comments.

Comment thread fastdeploy/model_executor/forward_meta.py
Comment thread custom_ops/metax_ops/write_cache_kv_with_rope.cu
Comment thread custom_ops/metax_ops/flash_attention.cu
Comment thread custom_ops/metax_ops/flash_attention.cu
Comment thread fastdeploy/platforms/maca.py
Comment thread fastdeploy/platforms/maca.py
Copilot AI review requested due to automatic review settings May 26, 2026 06:22
@Tryorish Tryorish changed the title from [Metax] Migrate from rel2.5 to [Metax] support FLASH_ATTN May 26, 2026
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 16 out of 16 changed files in this pull request and generated 13 comments.

Comment thread custom_ops/metax_ops/flash_attention.cu
Comment thread custom_ops/metax_ops/flash_attention.cu
Comment thread custom_ops/metax_ops/flash_attention.cu
Comment thread custom_ops/setup_ops.py
Comment thread custom_ops/setup_ops.py
@codecov-commenter

codecov-commenter commented May 26, 2026

Codecov Report

❌ Patch coverage is 0.95238% with 208 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@91ca3d1).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| `...ckends/metax/attention/flash_attn_metax_backend.py` | 0.00% | 196 Missing ⚠️ |
| `fastdeploy/model_executor/forward_meta.py` | 25.00% | 3 Missing ⚠️ |
| `fastdeploy/platforms/maca.py` | 25.00% | 2 Missing and 1 partial ⚠️ |
| `fastdeploy/worker/metax_model_runner.py` | 0.00% | 3 Missing ⚠️ |
| `fastdeploy/worker/metax_worker.py` | 0.00% | 2 Missing ⚠️ |
| `...y/model_executor/layers/backends/metax/__init__.py` | 0.00% | 1 Missing ⚠️ |
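The headline 0.95238% follows from this table once partial lines are counted against coverage, i.e. patch coverage = hits / (hits + misses + partials). The per-file hit counts are not listed, so the sketch below infers them from the Patch % column (25% with 3 uncovered lines implies 1 hit); treat those hit counts as inferred, not reported.

```python
# (file, hits, misses, partials). Misses and partials are from the Codecov table;
# hits are inferred from the Patch % column, e.g. 25% with 3 uncovered lines -> 1 hit.
ROWS = [
    ("flash_attn_metax_backend.py", 0, 196, 0),
    ("forward_meta.py", 1, 3, 0),
    ("maca.py", 1, 2, 1),
    ("metax_model_runner.py", 0, 3, 0),
    ("metax_worker.py", 0, 2, 0),
    ("metax/__init__.py", 0, 1, 0),
]


def coverage_pct(hits: int, misses: int, partials: int) -> float:
    # A partial line counts against coverage, just like a miss.
    return 100.0 * hits / (hits + misses + partials)


hits = sum(r[1] for r in ROWS)
misses = sum(r[2] for r in ROWS)
partials = sum(r[3] for r in ROWS)
patch_pct = coverage_pct(hits, misses, partials)
print(f"{hits} of {hits + misses + partials} changed lines hit -> {patch_pct:.5f}% patch coverage")
```

That gives 2 hits out of 210 changed lines, with 208 lines (207 misses plus 1 partial) lacking coverage, matching the "208 lines missing coverage" in the report. Almost all of the gap is flash_attn_metax_backend.py.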
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             develop    #7914   +/-   ##
==========================================
  Coverage           ?   63.60%           
==========================================
  Files              ?      468           
  Lines              ?    65244           
  Branches           ?     9987           
==========================================
  Hits               ?    41496           
  Misses             ?    20945           
  Partials           ?     2803           
```
| Flag | Coverage Δ |
| --- | --- |
| GPU | 72.86% <25.00%> (?) |
| XPU | 7.04% <0.00%> (?) |

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.

PaddlePaddle-bot

This comment was marked as outdated.

Copilot AI review requested due to automatic review settings May 26, 2026 09:11
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 16 out of 16 changed files in this pull request and generated 7 comments.

Comment thread custom_ops/metax_ops/write_cache_kv_with_rope.cu
Comment thread custom_ops/metax_ops/write_cache_kv_with_rope.cu
Comment thread custom_ops/metax_ops/write_cache_kv_with_rope.cu
Comment on lines +180 to +182
if (status != MCFLASHATTN_STATUS_SUCCESS) {
phi::errors::External("Error in McFlashAttn, error code is %d", status);
}
Comment thread custom_ops/metax_ops/flash_attention.cu
Comment thread fastdeploy/platforms/maca.py
Comment thread fastdeploy/platforms/maca.py
PaddlePaddle-bot

This comment was marked as outdated.

PaddlePaddle-bot

This comment was marked as outdated.

PaddlePaddle-bot

This comment was marked as outdated.

PaddlePaddle-bot

This comment was marked as outdated.

Copilot AI review requested due to automatic review settings May 27, 2026 02:26
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 15 comments.

Comment thread custom_ops/gpu_ops/gelu_tanh.cu
Comment on lines +180 to +182
if (status != MCFLASHATTN_STATUS_SUCCESS) {
phi::errors::External("Error in McFlashAttn, error code is %d", status);
}
Comment on lines +398 to +400
if (status != MCFLASHATTN_STATUS_SUCCESS) {
phi::errors::External("Error in McFlashAttn, error code is %d", status);
}
Comment on lines +192 to +193
if num_requests < self.max_num_seqs:
self.block_tables_buffer[num_requests:] = self.block_tables_buffer[num_requests - 1]
return "fastdeploy.model_executor.layers.attention.PaddleNativeAttnBackend"
elif selected_backend == _Backend.APPEND_ATTN:
- logger.info("Using FLASH ATTN backend to instead of attend attention.")
+ logger.info("Using FLASH ATTN backend to instead of APPEND ATTN.")
Comment thread fastdeploy/platforms/maca.py
Comment thread custom_ops/setup_ops.py
extra_compile_args=metax_extra_compile_args,
library_dirs=[os.path.join(maca_path, "lib")],
- extra_link_args=["-lruntime_cu", "-lmctlassEx"],
+ extra_link_args=["-lruntime_cu", "-lmctlassEx", "-lmcFlashAttn"],
Comment thread custom_ops/setup_ops.py

@PaddlePaddle-bot PaddlePaddle-bot left a comment


🤖 Paddle-CI-Agent | pr_review | 2026-05-27 10:32:53

📋 Review Summary

PR overview: Adds Flash Attention support for Metax GPUs, replaces the old RoPE implementation, and fixes the gelu_tanh block size calculation.
Scope of changes: custom_ops/metax_ops/, fastdeploy/worker/, fastdeploy/model_executor/layers/backends/metax/
Affected-area tags: [Metax] [OP]

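Issues

| Severity | File | Summary |
| --- | --- | --- |
| 🔴 Bug | custom_ops/metax_ops/flash_attention.cu:182 | phi::errors::External(...) only constructs an error object and does not throw it, so when the mha call fails, execution silently continues |
| 🔴 Bug | custom_ops/metax_ops/flash_attention.cu | flash_attn_kvcache_forward has the same problem (nothing is thrown when mha_fwd_kvcache fails) |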

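📝 PR Conventions Check

The PR title lacks a functional tag ([Metax] alone is semantically incomplete; consider adding [Feature]), and every section of the PR description is still the empty template with no actual content.

Suggested title (ready to copy):

  • [Metax][Feature] Support Flash Attention for Metax GPU
Suggested PR description (click to expand, ready to copy)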
## Motivation
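Add Flash Attention support for Metax GPUs, replacing the original custom RoPE+Attention implementation with the McFlashAttn library to improve inference performance. Also fix the block size upper-bound calculation error in the gelu_tanh kernel (`std::max` → `std::min`), and remove obsolete MACA version-compatibility code (the minimum version requirement is raised to > 3.3.2.0).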

## Modifications
- `custom_ops/metax_ops/flash_attention.cu`: adds the Flash Attention op, supporting two modes: `flash_attn_varlen_forward` (variable-length prefill) and `flash_attn_kvcache_forward` (KV cache in the decode stage)
- `custom_ops/metax_ops/rotary_position_embedding.cu`: adds a RoPE position-encoding kernel (with GQA support, including neox/partial variants), replacing the old `apply_rope_qkv.cu`
- `custom_ops/metax_ops/apply_rope_qkv.cu`: deletes the old RoPE implementation
- `custom_ops/metax_ops/maca_version.h`: deletes the version-compatibility header
- `custom_ops/metax_ops/fused_moe_gemm_kernels.h`: removes the MACA-version conditional compilation branches
- `custom_ops/gpu_ops/gelu_tanh.cu`: fixes the block size calculation error (`std::max` → `std::min`)
- `fastdeploy/model_executor/layers/backends/metax/attention/flash_attn_metax_backend.py`: adds the Python-side Flash Attention backend wrapper
- `fastdeploy/worker/metax_model_runner.py`, `metax_worker.py`: adapt to the new attention backend
- `custom_ops/setup_ops.py`: updates the list of source files to compile

## Usage or Command
N/A

## Accuracy Tests
N/A

## Checklist

- [x] Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [ ] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. Please write the reason in this PR if no unit tests.
- [ ] Provide accuracy results.
- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.

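Overall Assessment

The new Metax Flash Attention feature is clearly structured overall, but the two phi::errors::External(...) calls have a serious error-handling defect and must be fixed before merging. The PR description also needs to be completed.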


if (status != MCFLASHATTN_STATUS_SUCCESS) {
phi::errors::External("Error in McFlashAttn, error code is %d", status);
}


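🔴 Bug: phi::errors::External(...) only constructs the error object and does not throw it, so when mha_varlen_fwd fails the program silently continues; the subsequent release_tensor calls run normally, but the output is invalid data.

The call to mha_fwd_kvcache in flash_attn_kvcache_forward has the same problem.

Suggested fix: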

if (status != MCFLASHATTN_STATUS_SUCCESS) {
    PADDLE_THROW(phi::errors::External(
        "McFlashAttn failed with error code %d", status));
}
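The bug is easy to reproduce in any language where constructing an error and throwing it are separate steps. The Python analogue below is only an illustration: ExternalError is a hypothetical stand-in for the object returned by phi::errors::External, and the value of MCFLASHATTN_STATUS_SUCCESS is assumed to be 0. The buggy version builds the error and drops it, so a failing status passes silently; the fixed version raises, which is what wrapping the error in PADDLE_THROW does in the C++ suggestion.

```python
MCFLASHATTN_STATUS_SUCCESS = 0  # assumed value; the real enum lives in the mcFlashAttn headers


class ExternalError(RuntimeError):
    """Hypothetical stand-in for the error object built by phi::errors::External."""


def check_status_buggy(status: int) -> None:
    if status != MCFLASHATTN_STATUS_SUCCESS:
        # Same shape as the reviewed code: the error is constructed, then discarded.
        ExternalError(f"Error in McFlashAttn, error code is {status}")


def check_status_fixed(status: int) -> None:
    if status != MCFLASHATTN_STATUS_SUCCESS:
        # Same shape as the suggested fix: the error is actually raised.
        raise ExternalError(f"McFlashAttn failed with error code {status}")
```

Calling check_status_buggy(7) returns normally, while check_status_fixed(7) raises with the error code in the message, so the caller never proceeds to use invalid output.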

@PaddlePaddle-bot

PaddlePaddle-bot commented May 28, 2026

🤖 Paddle-CI-Agent | ci_status_monitor | 2026-05-30 10:14:39

CI report generated from the following code (updated every 30 minutes):


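1 Job Overview

2 required jobs failed; resolve them before merging.

| Total runs (reruns) | Total jobs | ✅ Passed | ❌ Failed | ⏳ Running | ⏸️ Waiting | Skipped |
| --- | --- | --- | --- | --- | --- | --- |
| 42(0) | 42 | 36 | 6 | 0 | 0 | 0 |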

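2 Job Status Summary

2.1 Required jobs: 8/10 passed

Required jobs block merging; address their failures first.

| Status | Job | Duration | Root cause | Suggested fix | Log | Rerun |
| --- | --- | --- | --- | --- | --- | --- |
| ❌ | Approval | 18s | Approval required | Approve manually | Job | - |
| ❌ | run_tests_with_coverage | 1h25m | PR issue: diff coverage 71% < 80% (new FLASH_ATTN branch untested) | Add unit tests for the FLASH_ATTN branch | Job | - |
| ✅ | The other 8 required jobs passed | - | - | - | - | - |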

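2.2 Optional jobs: 28/32 passed

Optional jobs do not block merging; their failures are for reference only.

| Status | Job | Duration | Log | Rerun |
| --- | --- | --- | --- | --- |
| ❌ | Run iluvatar Tests / run_iluvatar_cases | 2m18s | Job | - |
| ❌ | Check PR Template | 29s | Job | - |
| ❌ | CI_HPU | 1h4m | Job | - |
| ❌ | Trigger Jenkins for PR | 12s | Job | - |
| ✅ | The other 28 optional jobs passed | - | - | - |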

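3 Failure Details (required only)

Approval: manual approval required (confidence: high)

This job requires manual Approval; CI continues only after it is approved.

run_tests_with_coverage: coverage threshold (confidence: high)

Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage

  • Status: ❌ Failed
  • Error type: coverage threshold
  • Confidence: high
  • Root cause summary: insufficient coverage of the new FLASH_ATTN branch
  • Analyzer: ci_analyze_unittest_fastdeploy

Failed test cases: none. The log shows TEST_EXIT_CODE: 0 and prints All tests passed; the failure occurred in the coverage check step.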

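Root cause details:
diff_coverage.json shows a total diff coverage of 71% for this change, below the 80% threshold; in particular, fastdeploy/platforms/maca.py is only 50% covered, with violations on L64-L65. The source shows that the PR adds/modifies the _Backend.FLASH_ATTN branch of MACAPlatform.get_attention_backend_cls(), which returns MetaxFlashAttentionBackend, but the existing tests/platforms/test_platforms.py::TestMACAPlatform only covers NATIVE/APPEND/INVALID and never exercises the new branch.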

Key log:

Coverage generation failed (exit code 9)
GPU Patch Coverage Details:
{"src_stats": {"fastdeploy/platforms/maca.py": {"percent_covered": 50.0,
"violation_lines": [64, 65], "covered_lines": [61, 63]},
"fastdeploy/model_executor/forward_meta.py": {"percent_covered": 100.0}},
"total_num_lines": 7, "total_num_violations": 2,
"total_percent_covered": 71}

Suggested fix:

  1. Add a unit test for the FLASH_ATTN branch to TestMACAPlatform in tests/platforms/test_platforms.py, for example by adding this assertion near L94: self.assertIn("MetaxFlashAttentionBackend", MACAPlatform.get_attention_backend_cls(_Backend.FLASH_ATTN))
  2. After the fix, re-trigger run_tests_with_coverage and confirm that the total coverage in diff_coverage.json is ≥ 80%.

Related changes: fastdeploy/platforms/maca.py L63-L65; add tests near L94 of tests/platforms/test_platforms.py.
Link: View log


Labels

contributor External developers


6 participants