Add EXAONE 4.0 model support for DeepSpeed inference v2 by notkisk · Pull Request #7456 · deepspeedai/DeepSpeed

notkisk · 2025-07-29T01:48:18Z

#7453
Implements comprehensive support for EXAONE 4.0 models (32B and 1.2B variants) in DeepSpeed's inference v2 framework.

Key features:

- Hybrid attention mechanism with a 3:1 ratio of sliding-window to full-attention layers (32B variant; the 1.2B variant uses full attention in every layer)
- QK-Reorder-Norm support for custom normalization ordering
- Conditional RoPE application (skipped for global attention layers)
- Grouped Query Attention (40 query heads, 8 key-value heads)
- Full compatibility with ZeRO optimization stages
- Parameter mapping between HuggingFace and DeepSpeed formats
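To make the per-layer attention layout concrete, here is a minimal, self-contained sketch. It assumes the layout can be described by a repeating pattern string (`"LLLG"`: three sliding-window layers, then one global layer) and uses the strings `"sliding_attention"` / `"full_attention"` as assumed values for entries of the `layer_types` config field. The helpers `build_layer_types`, `applies_rope`, and `kv_head_for_query_head` are hypothetical and not part of DeepSpeed or this PR; the 40/8 head counts come from the feature list above.

```python
from typing import List

# Assumption: "L" = local (sliding-window) layer, "G" = global (full-attention) layer.
# The pattern "LLLG" encodes the 3:1 sliding-window-to-full-attention ratio.
SLIDING = "sliding_attention"
FULL = "full_attention"


def build_layer_types(num_layers: int, pattern: str = "LLLG") -> List[str]:
    """Expand a repeating pattern into one attention type per layer (hypothetical helper)."""
    return [FULL if pattern[i % len(pattern)] == "G" else SLIDING for i in range(num_layers)]


def applies_rope(layer_type: str) -> bool:
    """RoPE is applied on sliding-window layers and skipped on global layers."""
    return layer_type == SLIDING


def kv_head_for_query_head(q_head: int, n_q_heads: int = 40, n_kv_heads: int = 8) -> int:
    """Grouped Query Attention: each run of consecutive query heads shares one KV head."""
    assert n_q_heads % n_kv_heads == 0, "query heads must split evenly into KV groups"
    group_size = n_q_heads // n_kv_heads  # 40 // 8 = 5 query heads per KV head
    return q_head // group_size


if __name__ == "__main__":
    types = build_layer_types(8)
    print(types)
    print([applies_rope(t) for t in types])
    print([kv_head_for_query_head(h) for h in range(10)])
```

For the 1.2B variant, which uses full attention everywhere, the same sketch would be called with `pattern="G"`, so `applies_rope` returns `False` for every layer under this sketch's assumptions.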

Implementation includes:

- ExaoneTransformerContainer and ExaoneNonTransformerContainer for parameter management
- ExaoneInferenceModel with layer type detection and hybrid attention logic
- ExaonePolicy for model instantiation and container orchestration
- Comprehensive unit test suite with 14 test cases
- Integration with existing DeepSpeed inference v2 architecture
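The split between transformer and non-transformer containers comes down to routing each checkpoint parameter by name. The sketch below is a simplified, hypothetical illustration of that routing, modeled on the `'model.layers'` prefix registered with `map.set_transformer_params` in the policy and on the non-transformer fallback visible in the traceback further down. `route_param` is not a DeepSpeed function, and the example parameter names assume standard HuggingFace naming.

```python
import re
from typing import Optional, Tuple

# Hypothetical sketch: parameters under the registered layer prefix go to a
# per-layer (transformer) container; everything else goes to the single
# non-transformer container (embedding, final norm, LM head).
LAYER_PREFIX = "model.layers"
_LAYER_RE = re.compile(r"^" + re.escape(LAYER_PREFIX) + r"\.(\d+)\.(.+)$")


def route_param(name: str) -> Tuple[str, Optional[int], str]:
    """Return (container_kind, layer_index, local_name) for a checkpoint key."""
    match = _LAYER_RE.match(name)
    if match:
        # 'model.layers.3.self_attn.q_proj.weight' -> layer 3, 'self_attn.q_proj.weight'
        return "transformer", int(match.group(1)), match.group(2)
    # Not part of the layer stack: handled by the non-transformer container.
    return "non_transformer", None, name


if __name__ == "__main__":
    for key in ("model.layers.3.self_attn.q_proj.weight", "model.embed_tokens.weight"):
        print(key, "->", route_param(key))
```

Keeping the layer index separate from the local name lets one container class be instantiated per layer while sharing a single name-to-attribute mapping.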

Validated with EXAONE-4.0-32B and EXAONE-4.0-1.2B models from HuggingFace.

notkisk · 2025-07-29T10:37:26Z

@hwchen2017 @tohtana @tjruwase @loadams Please take a look!

notkisk · 2025-07-29T15:58:05Z

@loadams

hwchen2017 · 2025-08-12T06:03:02Z

Hi @notkisk, I tried to test your code and got the following error:

```
[rank0]: Traceback (most recent call last):
[rank0]:   File "/home/deepspeed/hongwei/test.py", line 23, in <module>
[rank0]:     pipe = pipeline("LGAI-EXAONE/EXAONE-4.0-1.2B")
[rank0]:   File "/home/deepspeed/hongwei/hwenv/lib/python3.10/site-packages/mii/api.py", line 231, in pipeline
[rank0]:     inference_engine = load_model(model_config)
[rank0]:   File "/home/deepspeed/hongwei/hwenv/lib/python3.10/site-packages/mii/modeling/models.py", line 17, in load_model
[rank0]:     inference_engine = build_hf_engine(
[rank0]:   File "/home/deepspeed/hongwei/DeepSpeed/deepspeed/inference/v2/engine_factory.py", line 142, in build_hf_engine
[rank0]:     return InferenceEngineV2(policy, engine_config)
[rank0]:   File "/home/deepspeed/hongwei/DeepSpeed/deepspeed/inference/v2/engine_v2.py", line 83, in __init__
[rank0]:     self._model = self._policy.build_model(self._config, self._base_mp_group)
[rank0]:   File "/home/deepspeed/hongwei/DeepSpeed/deepspeed/inference/v2/model_implementations/inference_policy_base.py", line 157, in build_model
[rank0]:     self.populate_model_parameters()
[rank0]:   File "/home/deepspeed/hongwei/DeepSpeed/deepspeed/inference/v2/model_implementations/inference_policy_base.py", line 199, in populate_model_parameters
[rank0]:     container_map.map_param(name, parameter)
[rank0]:   File "/home/deepspeed/hongwei/DeepSpeed/deepspeed/inference/v2/model_implementations/inference_policy_base.py", line 78, in map_param
[rank0]:     self._non_transformer_params.set_dependency(name, parameter)
[rank0]:   File "/home/deepspeed/hongwei/DeepSpeed/deepspeed/inference/v2/model_implementations/layer_container_base.py", line 318, in set_dependency
[rank0]:     setattr(target_param, target_dependency_name, dep_value)
[rank0]:   File "/home/deepspeed/hongwei/DeepSpeed/deepspeed/inference/v2/model_implementations/parameter_base.py", line 39, in param_setter
[rank0]:     self.complete_component()
[rank0]:   File "/home/deepspeed/hongwei/DeepSpeed/deepspeed/inference/v2/model_implementations/parameter_base.py", line 164, in complete_component
[rank0]:     finalized_param = self.finalize()
[rank0]:   File "/home/deepspeed/hongwei/DeepSpeed/deepspeed/inference/v2/model_implementations/common_parameters/embedding_parameters.py", line 26, in finalize
[rank0]:     return self.inference_model.transform_embedding_param(self.params)
[rank0]:   File "/home/deepspeed/hongwei/hwenv/lib/python3.10/site-packages/transformers/configuration_utils.py", line 211, in __getattribute__
[rank0]:     return super().__getattribute__(key)
[rank0]: AttributeError: 'Exaone4Config' object has no attribute 'transform_embedding_param'
```

Can you show me how you verified the code? You could also contribute the test code to the DeepSpeed examples.

hwchen2017 · 2025-08-12T06:07:58Z

```diff
+        map.set_transformer_params(['model.layers'], transformer_containers)
+
+        # Create non-transformer container for embedding/output/norm parameters
+        map.set_non_transformer_params(ExaoneNonTransformerContainer(self._model_config))
```


It looks like the parameter is supposed to be `self.model`.
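A minimal, self-contained sketch of why this matters, using hypothetical stand-in classes rather than the real DeepSpeed ones. It mirrors the call in `embedding_parameters.py` from the traceback (`self.inference_model.transform_embedding_param(self.params)`): whatever object the container receives is treated as the inference model, so passing the HF config reproduces the `AttributeError`, while passing a model object that defines the method succeeds.

```python
# Hypothetical stand-ins (not DeepSpeed classes) reproducing the failure mode above.


class FakeHFConfig:
    """Plays the role of Exaone4Config: holds hyperparameters only."""
    vocab_size = 16


class FakeInferenceModel:
    """Plays the role of the inference model that `self.model` refers to."""

    def __init__(self, config):
        self.config = config

    def transform_embedding_param(self, param):
        # A real model would shard/convert the tensor; here we only tag it.
        return ("transformed", param)


class FakeEmbeddingParameter:
    def __init__(self, inference_model, params):
        self.inference_model = inference_model
        self.params = params

    def finalize(self):
        # Same call shape as embedding_parameters.py in the traceback.
        return self.inference_model.transform_embedding_param(self.params)


def try_finalize(owner):
    """Finalize an embedding parameter owned by `owner`, reporting AttributeError as text."""
    try:
        return FakeEmbeddingParameter(owner, "embed_tokens").finalize()
    except AttributeError as err:
        return f"AttributeError: {err}"


if __name__ == "__main__":
    config = FakeHFConfig()
    print(try_finalize(config))                      # fails like the reported error
    print(try_finalize(FakeInferenceModel(config)))  # succeeds
```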

- Added @pytest.mark.inference_v2 markers to all test methods in test_exaone.py
- This ensures the tests are included in CI workflow runs for inference v2
- Tests will now run automatically with the nv-a6000.yml workflow

Signed-off-by: notkisk <salahxd99@gmail.com>

## Summary

Add support for LG AI Research's EXAONE 4.0 model family in DeepSpeed Inference V2.

Closes #7453

## Changes

- New model implementation: `deepspeed/inference/v2/model_implementations/exaone4/`
  - `container.py`: Transformer and non-transformer parameter containers
  - `model.py`: Inference model with post-norm architecture and QK-Norm support
  - `policy.py`: Inference V2 policy
- Register EXAONE 4.0 in `engine_factory.py` and `__init__.py`

## Key architectural differences from Mistral/Llama

- **Post-norm**: RMSNorm is applied after attention/MLP outputs (not before), followed by residual addition
- **QK-Norm**: Per-head RMSNorm applied to Q and K projections after the QKV linear layer
- **Hybrid attention**: 32B model uses 3:1 sliding window/full attention ratio (via `layer_types` config)

## Supported models

- [EXAONE-4.0-1.2B](https://huggingface.co/LGAI-EXAONE/EXAONE-4.0-1.2B) (all full attention)
- [EXAONE-4.0-32B](https://huggingface.co/LGAI-EXAONE/EXAONE-4.0-32B) (hybrid sliding/full attention)

Requires `transformers >= 4.54.0`.

## Related

- Supersedes #7456 (draft, inactive for 6 months)

---------

Signed-off-by: Bias92 <pewpewplay315@gmail.com>
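The post-norm and QK-Norm differences listed above can be sketched in plain Python. This is a minimal illustration, not the DeepSpeed kernels: `rms_norm`, `per_head_rms_norm`, and `post_norm_block` are hypothetical helpers operating on flat lists, and the assumption that the QK-Norm weight has length `head_dim` and is shared across heads is labeled as such in the code.

```python
import math
from typing import Callable, List

Vector = List[float]


def rms_norm(x: Vector, weight: Vector, eps: float = 1e-5) -> Vector:
    """RMSNorm: x / sqrt(mean(x^2) + eps), scaled elementwise by weight."""
    scale = 1.0 / math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v * scale * w for v, w in zip(x, weight)]


def per_head_rms_norm(x: Vector, weight: Vector, num_heads: int) -> Vector:
    """QK-Norm sketch: normalize each head's slice of the Q (or K) projection separately.

    Assumption: one weight vector of length head_dim is shared by all heads.
    """
    head_dim = len(x) // num_heads
    assert len(weight) == head_dim, "weight must have length head_dim"
    out: Vector = []
    for h in range(num_heads):
        out.extend(rms_norm(x[h * head_dim:(h + 1) * head_dim], weight))
    return out


def post_norm_block(x: Vector, sublayer: Callable[[Vector], Vector], norm_weight: Vector) -> Vector:
    """Post-norm residual: x + norm(sublayer(x)).

    A pre-norm (Llama/Mistral-style) block would instead compute x + sublayer(norm(x)).
    """
    y = rms_norm(sublayer(x), norm_weight)
    return [a + b for a, b in zip(x, y)]
```

Under this sketch, a full decoder layer would apply `post_norm_block` twice: once with the attention sublayer (whose Q and K pass through `per_head_rms_norm`) and once with the MLP, each with its own norm weight.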


notkisk requested review from hwchen2017, loadams, tjruwase and tohtana as code owners July 29, 2025 01:48

notkisk force-pushed the feature/exaone-4.0-support branch from 0f7375e to 299d96a July 29, 2025 01:54

notkisk mentioned this pull request Jul 29, 2025

[REQUEST] Add support for EXAONE 4.0 models #7453 (Closed)

loadams reviewed Jul 29, 2025

Review comment on tests/unit/inference/v2/model_implementations/test_exaone.py

notkisk force-pushed the feature/exaone-4.0-support branch from d6d4e0e to 11792c2 July 29, 2025 15:55

notkisk requested a review from loadams July 30, 2025 11:44

notkisk force-pushed the feature/exaone-4.0-support branch 2 times, most recently from 6663bf8 to 0b346ec August 10, 2025 14:04

hwchen2017 reviewed Aug 12, 2025

notkisk added 2 commits August 12, 2025 13:57

Fix EXAONE 4.0 policy container mapping issue

ef075c5

notkisk force-pushed the feature/exaone-4.0-support branch from 0b346ec to f0fcaf5 August 12, 2025 14:00

notkisk marked this pull request as draft August 12, 2025 14:26

Bias92 mentioned this pull request Feb 14, 2026

Add EXAONE 4.0 model support for Inference V2 #7853 (Merged)


notkisk wants to merge 2 commits into deepspeedai:master from notkisk:feature/exaone-4.0-support

3 participants
