
Add EXAONE 4.0 model support for DeepSpeed inference v2 #7456

Draft
notkisk wants to merge 2 commits into deepspeedai:master from notkisk:feature/exaone-4.0-support

Conversation

@notkisk

@notkisk notkisk commented Jul 29, 2025

#7453
Implements comprehensive support for EXAONE 4.0 models (32B and 1.2B variants) in DeepSpeed's inference v2 framework.

Key features:

  • Hybrid attention mechanism with 3:1 sliding window to full attention ratio
  • QK-Reorder-Norm support for custom normalization ordering
  • Conditional RoPE application (skipped for global attention layers)
  • Grouped Query Attention (40 query heads, 8 key-value heads)
  • Full compatibility with ZeRO optimization stages
  • Parameter mapping between HuggingFace and DeepSpeed formats
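The Grouped Query Attention head layout mentioned above (40 query heads sharing 8 key-value heads) can be sketched as follows; the helper name is illustrative, not the PR's actual API:

```python
# Sketch of EXAONE 4.0's GQA head sharing: 40 query heads, 8 KV heads,
# so each KV head serves a contiguous group of 5 query heads.

def kv_head_for_query_head(q_head: int,
                           num_q_heads: int = 40,
                           num_kv_heads: int = 8) -> int:
    """Return the KV head index that the given query head attends with."""
    group_size = num_q_heads // num_kv_heads  # 40 // 8 = 5
    return q_head // group_size

# Query heads 0-4 read KV head 0, heads 5-9 read KV head 1, and so on.
```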

Implementation includes:

  • ExaoneTransformerContainer and ExaoneNonTransformerContainer for parameter management
  • ExaoneInferenceModel with layer type detection and hybrid attention logic
  • ExaonePolicy for model instantiation and container orchestration
  • Comprehensive unit test suite with 14 test cases
  • Integration with existing DeepSpeed inference v2 architecture

Validated with EXAONE-4.0-32B and EXAONE-4.0-1.2B models from HuggingFace.
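The 3:1 sliding-window to full-attention ratio and the conditional RoPE rule described above can be sketched like this (the 4-layer cycle, function names, and string labels are illustrative assumptions, not the PR's implementation):

```python
# Sketch of the hybrid attention layout: every 4th layer uses full
# (global) attention, the other three use sliding-window attention,
# giving the 3:1 ratio described in the PR.

def layer_types(num_layers: int, cycle: int = 4) -> list:
    """Return an attention-type label for each layer."""
    return ["full" if (i + 1) % cycle == 0 else "sliding"
            for i in range(num_layers)]

def uses_rope(layer_type: str) -> bool:
    # Per the feature list above, RoPE is skipped for global
    # (full-attention) layers and applied to sliding-window layers.
    return layer_type != "full"
```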

@notkisk notkisk force-pushed the feature/exaone-4.0-support branch from 0f7375e to 299d96a Compare July 29, 2025 01:54

notkisk commented Jul 29, 2025

@hwchen2017 @tohtana @tjruwase @loadams Please take a look!

Review comment thread on tests/unit/inference/v2/model_implementations/test_exaone.py
@notkisk notkisk force-pushed the feature/exaone-4.0-support branch from d6d4e0e to 11792c2 Compare July 29, 2025 15:55

notkisk commented Jul 29, 2025

@loadams

@notkisk notkisk requested a review from loadams July 30, 2025 11:44
@notkisk notkisk force-pushed the feature/exaone-4.0-support branch 2 times, most recently from 6663bf8 to 0b346ec Compare August 10, 2025 14:04

hwchen2017 commented Aug 12, 2025

Hi @notkisk, I tried to test your code and got the following error:

[rank0]: Traceback (most recent call last):
[rank0]:   File "/home/deepspeed/hongwei/test.py", line 23, in <module>
[rank0]:     pipe = pipeline("LGAI-EXAONE/EXAONE-4.0-1.2B")
[rank0]:   File "/home/deepspeed/hongwei/hwenv/lib/python3.10/site-packages/mii/api.py", line 231, in pipeline
[rank0]:     inference_engine = load_model(model_config)
[rank0]:   File "/home/deepspeed/hongwei/hwenv/lib/python3.10/site-packages/mii/modeling/models.py", line 17, in load_model
[rank0]:     inference_engine = build_hf_engine(
[rank0]:   File "/home/deepspeed/hongwei/DeepSpeed/deepspeed/inference/v2/engine_factory.py", line 142, in build_hf_engine
[rank0]:     return InferenceEngineV2(policy, engine_config)
[rank0]:   File "/home/deepspeed/hongwei/DeepSpeed/deepspeed/inference/v2/engine_v2.py", line 83, in __init__
[rank0]:     self._model = self._policy.build_model(self._config, self._base_mp_group)
[rank0]:   File "/home/deepspeed/hongwei/DeepSpeed/deepspeed/inference/v2/model_implementations/inference_policy_base.py", line 157, in build_model
[rank0]:     self.populate_model_parameters()
[rank0]:   File "/home/deepspeed/hongwei/DeepSpeed/deepspeed/inference/v2/model_implementations/inference_policy_base.py", line 199, in populate_model_parameters
[rank0]:     container_map.map_param(name, parameter)
[rank0]:   File "/home/deepspeed/hongwei/DeepSpeed/deepspeed/inference/v2/model_implementations/inference_policy_base.py", line 78, in map_param
[rank0]:     self._non_transformer_params.set_dependency(name, parameter)
[rank0]:   File "/home/deepspeed/hongwei/DeepSpeed/deepspeed/inference/v2/model_implementations/layer_container_base.py", line 318, in set_dependency
[rank0]:     setattr(target_param, target_dependency_name, dep_value)
[rank0]:   File "/home/deepspeed/hongwei/DeepSpeed/deepspeed/inference/v2/model_implementations/parameter_base.py", line 39, in param_setter
[rank0]:     self.complete_component()
[rank0]:   File "/home/deepspeed/hongwei/DeepSpeed/deepspeed/inference/v2/model_implementations/parameter_base.py", line 164, in complete_component
[rank0]:     finalized_param = self.finalize()
[rank0]:   File "/home/deepspeed/hongwei/DeepSpeed/deepspeed/inference/v2/model_implementations/common_parameters/embedding_parameters.py", line 26, in finalize
[rank0]:     return self.inference_model.transform_embedding_param(self.params)
[rank0]:   File "/home/deepspeed/hongwei/hwenv/lib/python3.10/site-packages/transformers/configuration_utils.py", line 211, in __getattribute__
[rank0]:     return super().__getattribute__(key)
[rank0]: AttributeError: 'Exaone4Config' object has no attribute 'transform_embedding_param'

Can you show how you verified the code? You could also contribute the test code to the DeepSpeed examples.

```python
map.set_transformer_params(['model.layers'], transformer_containers)

# Create non-transformer container for embedding/output/norm parameters
map.set_non_transformer_params(ExaoneNonTransformerContainer(self._model_config))
```

It looks like the parameter is supposed to be `self.model`.
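The traceback above is consistent with a config object being wired in where the inference model is expected, so the `transform_embedding_param` lookup fails. A minimal reproduction of that failure mode (all class names here are illustrative stand-ins, not the actual DeepSpeed or HuggingFace classes):

```python
class Exaone4ConfigLike:
    """Stand-in for the HF config object named in the traceback;
    it carries model hyperparameters but no transform hooks."""
    hidden_size = 2048

class InferenceModelLike:
    """Stand-in for the inference model, which does provide the hook."""
    def transform_embedding_param(self, param):
        return param

def finalize_embedding(inference_model, param):
    # Mirrors the call site in embedding_parameters.py: if the container
    # was handed the config instead of the model, this raises
    # AttributeError exactly as in the traceback.
    return inference_model.transform_embedding_param(param)

finalize_embedding(InferenceModelLike(), "weights")    # works
# finalize_embedding(Exaone4ConfigLike(), "weights")   # AttributeError
```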

- Added @pytest.mark.inference_v2 markers to all test methods in test_exaone.py
- This ensures the tests are included in CI workflow runs for inference v2
- Tests will now run automatically with the nv-a6000.yml workflow

Signed-off-by: notkisk <salahxd99@gmail.com>
@notkisk notkisk force-pushed the feature/exaone-4.0-support branch from 0b346ec to f0fcaf5 Compare August 12, 2025 14:00
@notkisk notkisk marked this pull request as draft August 12, 2025 14:26
tohtana pushed a commit that referenced this pull request Feb 17, 2026
## Summary
Add support for LG AI Research's EXAONE 4.0 model family in DeepSpeed
Inference V2.

Closes #7453

## Changes
- New model implementation: `deepspeed/inference/v2/model_implementations/exaone4/`
  - `container.py`: Transformer and non-transformer parameter containers
  - `model.py`: Inference model with post-norm architecture and QK-Norm support
  - `policy.py`: Inference V2 policy
- Register EXAONE 4.0 in `engine_factory.py` and `__init__.py`

## Key architectural differences from Mistral/Llama
- **Post-norm**: RMSNorm is applied after attention/MLP outputs (not before), followed by residual addition
- **QK-Norm**: Per-head RMSNorm applied to Q and K projections after the QKV linear layer
- **Hybrid attention**: 32B model uses 3:1 sliding window/full attention ratio (via `layer_types` config)
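The QK-Norm step described above can be sketched with NumPy; the shapes, epsilon, and function names are assumptions for illustration, not the PR's actual kernels:

```python
import numpy as np

def rms_norm(x: np.ndarray, weight: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """RMSNorm over the last dimension: x / rms(x) * weight."""
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms * weight

def qk_norm(q: np.ndarray, k: np.ndarray, head_dim: int,
            w_q: np.ndarray, w_k: np.ndarray):
    """Per-head RMSNorm applied to Q and K after the QKV projection.
    q: (tokens, num_q_heads * head_dim), k: (tokens, num_kv_heads * head_dim)."""
    q = q.reshape(q.shape[0], -1, head_dim)  # split into heads
    k = k.reshape(k.shape[0], -1, head_dim)
    q = rms_norm(q, w_q)                     # each head normalized independently
    k = rms_norm(k, w_k)
    return q.reshape(q.shape[0], -1), k.reshape(k.shape[0], -1)
```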

## Supported models
- [EXAONE-4.0-1.2B](https://huggingface.co/LGAI-EXAONE/EXAONE-4.0-1.2B) (all full attention)
- [EXAONE-4.0-32B](https://huggingface.co/LGAI-EXAONE/EXAONE-4.0-32B) (hybrid sliding/full attention)

Requires `transformers >= 4.54.0`.

## Related
- Supersedes #7456 (draft, inactive for 6 months)

---------

Signed-off-by: Bias92 <pewpewplay315@gmail.com>
nathon-lee pushed a commit to nathon-lee/DeepSpeed_woo that referenced this pull request Mar 7, 2026
nathon-lee pushed a commit to nathon-lee/DeepSpeed_woo that referenced this pull request Mar 28, 2026