
add-qwen3-omni-thinker #1208

Merged
hiworldwzj merged 21 commits into main from wzj_omni
Feb 6, 2026
Conversation

@hiworldwzj
Collaborator

No description provided.

@gemini-code-assist
Contributor

Summary of Changes

Hello @hiworldwzj, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly expands the LightLLM framework's capabilities by integrating the Qwen3OmniMOETpPartModel, a new multimodal model. It introduces a complete set of components for handling both audio and visual inputs, from feature extraction and processing to their integration within the model's transformer layers. The changes ensure that the new model is seamlessly supported, including its unique configuration, tokenizer, and specialized weight loading mechanisms.

Highlights

  • New Model Integration: Introduced comprehensive support for the Qwen3OmniMOETpPartModel, a new multimodal model, by adding its core architecture, inference structure, and specific layer weights.
  • Multimodal Processing Enhancements: Added dedicated modules for advanced audio feature extraction (WhisperFeatureExtractor) and sophisticated vision processing (e.g., Qwen3OmniMoeVisionTransformerPretrainedModel with patch embedding, merging, and attention blocks).
  • System Integration and Configuration: Updated the model registry, tokenizer instantiation logic, and configuration utilities to correctly recognize and utilize the new Qwen3Omni model and its associated thinker_config.
  • Core Logic Refinements: Adjusted the base layer weight verification for safer attribute access and modified the transformer layer inference to handle QKV projections specific to the new model's requirements.
  • Dependency Update: The transformers library dependency has been updated to version 4.57.1.


Changelog
  • lightllm/common/basemodel/layer_weights/base_layer_weight.py
    • Modified the verify_load method to safely access the layer_num_ attribute, preventing potential AttributeError when the attribute might not be present.
  • lightllm/models/__init__.py
    • Added import for Qwen3OmniMOETpPartModel to register the new model within the LightLLM framework.
  • lightllm/models/qwen2_vl/triton_kernel/get_mrope_position_ids.py
    • Commented out image_h and updated the h_pos calculation to use image_w, changing how image position IDs are derived.
  • lightllm/models/qwen3_omni_moe_thinker/audio_process.py
    • Added WhisperFeatureExtractor class, providing functionality for audio feature extraction, including mel filter bank application, spectrogram generation, and normalization.
  • lightllm/models/qwen3_omni_moe_thinker/infer_struct.py
    • Introduced Qwen3OmniMOEInferStateInfo class, inheriting from Qwen3VLInferStateInfo, to define the inference state for the new model.
  • lightllm/models/qwen3_omni_moe_thinker/layer_infer/transformer_layer_infer.py
    • Added Qwen3OmniMOETransformerLayerInfer class, extending Qwen3VLMOETransformerLayerInfer, and initialized head_dim_ and mrope_section from network configuration.
  • lightllm/models/qwen3_omni_moe_thinker/layer_weights/meta_weights/code2wav_causal_conv_net.py
    • Added Qwen3OmniMoeCausalConvNetWeight class to handle weights for causal 1D convolution operations, including weight creation, loading from Hugging Face, and native/CUDA forward passes.
  • lightllm/models/qwen3_omni_moe_thinker/layer_weights/meta_weights/code2wav_causal_trans_conv_net.py
    • Added Qwen3OmniMoeCode2wavCausalTransConvNetWeight class to manage weights for causal 1D transposed convolution operations.
  • lightllm/models/qwen3_omni_moe_thinker/layer_weights/meta_weights/code2wav_conv_ne_xt.py
    • Added Qwen3OmniMoeConvNeXtBlockWeight class for ConvNeXt block weights, encompassing normalization, linear transformations, and GELU activation.
  • lightllm/models/qwen3_omni_moe_thinker/layer_weights/meta_weights/talker_resize_mlp_weight.py
    • Added Qwen3OmniMoeTalkerResizeMLPWeight class to manage weights for MLP-based resizing operations.
  • lightllm/models/qwen3_omni_moe_thinker/layer_weights/pre_and_post_layer_weight.py
    • Added Qwen3OmniMOEThinkerPreAndPostLayerWeight class, extending Qwen2PreAndPostLayerWeight with specific weight names for the Qwen3Omni model's embedding, LM head, and final normalization layers.
  • lightllm/models/qwen3_omni_moe_thinker/layer_weights/transformers_layer_weight.py
    • Added Qwen3OmniMOEThinkerTransformerLayerWeight class, extending Qwen3MOETransformerLayerWeight with specific weight names for attention and MoE layers, and customized MoE initialization.
  • lightllm/models/qwen3_omni_moe_thinker/model.py
    • Introduced QWen3OmniTokenizer and Qwen3OmniMOETpPartModel classes, integrating the new model and its tokenizer, including audio token length calculation and multimodal encoding logic.
    • Modified _init_config to load configuration from thinker_config within the model's config.json.
  • lightllm/models/qwen3_omni_moe_thinker/qwen3_omni_audio.py
    • Added Qwen3OmniMoeAudioEncoderLayer, Qwen3OmniMoeAudioAttention, SinusoidsPositionEmbedding, and Qwen3OmniMoeAudioEncoder classes, defining the architecture for the audio encoder, including convolutional layers, positional embeddings, and attention mechanisms.
    • Implemented encode method for processing audio items and caching their embeddings.
  • lightllm/models/qwen3_omni_moe_thinker/qwen3_omni_visual.py
    • Added Qwen3OmniMoeVisionMLP, Qwen3OmniMoeVisionPatchEmbed, Qwen3OmniMoeVisionPatchMerger, Qwen3OmniMoeVisionBlock, and Qwen3OmniMoeVisionTransformerPretrainedModel classes, defining the architecture for the vision encoder, including patch embedding, positional embeddings, attention blocks, and deepstack feature handling.
    • Implemented encode method for processing image items and returning their embeddings and deepstack features.
  • lightllm/models/qwen3_vl_moe/layer_infer/transformer_layer_infer.py
    • Modified the _get_qkv method to split the qkv projection into q and cache_kv components, optimizing how query and key-value projections are handled.
  • lightllm/server/audioserver/model_infer/model_rpc.py
    • Imported Qwen3OmniMoeAudioEncoder and updated the exposed_init_model method to support the new audio encoder type, checking for thinker_config in the model configuration.
  • lightllm/server/embed_cache/embed_cache_client.py
    • Added copy_vision_to_cache method, specifically designed to handle vision embedding tensors, including a check for 3D tensors for deepstack processing.
  • lightllm/server/tokenizer.py
    • Imported QWen3OmniTokenizer and updated the get_tokenizer function to instantiate it when a thinker_config is detected in the model configuration.
  • lightllm/server/visualserver/model_infer/model_rpc.py
    • Imported Qwen3OmniMoeVisionTransformerPretrainedModel and updated the exposed_init_model method to support the new vision encoder type, checking for thinker_config.
    • Changed the call from self.cpu_embed_cache_client.copy_to_cache to self.cpu_embed_cache_client.copy_vision_to_cache for vision embeddings.
  • lightllm/utils/config_utils.py
    • Modified _get_config_llm_keyvalue to prioritize thinker_config when present for retrieving LLM configuration values.
    • Added special handling for eos_token_id for models with Qwen3OmniMoeForConditionalGeneration architecture, setting it to 151645.
    • Updated get_vocab_size to check for thinker_config when determining the vocabulary size.
  • lightllm/utils/embed_utils.py
    • Added Qwen3OmniMOETpPartModel to the list of models for which embed_cache_meta is calculated, ensuring proper cache management for the new model.
  • requirements.txt
    • Updated the transformers library version from 4.53.3 to 4.57.1.
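Among the changes listed above, the `_get_qkv` modification in `qwen3_vl_moe` splits a fused QKV projection into a query part and a cache-bound key/value part. A minimal numpy sketch of that kind of split (hypothetical shapes and names, not the actual LightLLM code):

```python
import numpy as np

# Hypothetical dimensions; the real model reads these from its config.
num_heads, num_kv_heads, head_dim = 8, 2, 64
q_size = num_heads * head_dim          # query width: 512
kv_size = 2 * num_kv_heads * head_dim  # K and V concatenated: 256

tokens = 4
qkv = np.random.randn(tokens, q_size + kv_size).astype(np.float32)

# Split the fused projection into q and cache_kv, analogous to how
# the PR separates query and key/value projections for caching.
q, cache_kv = qkv[:, :q_size], qkv[:, q_size:]

assert q.shape == (tokens, q_size)
assert cache_kv.shape == (tokens, kv_size)
```

The split is a pure view operation, so it adds no copy overhead before the key/value half is written into the cache.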
Activity
  • The pull request introduces a new multimodal model, Qwen3OmniMOE, by adding numerous new Python files that define its audio and vision processing components, inference structure, and layer weights.
  • Existing core files across the lightllm project, including model registry, tokenizer, and configuration utilities, have been modified to integrate and support this new model.
  • A dependency update for the transformers library is included, reflecting potential new features or compatibility requirements for the Qwen3OmniMOE model.

@gemini-code-assist (bot) left a comment

Code Review

This pull request introduces support for the qwen3-omni-thinker model, a significant undertaking that adds numerous files for the new model's architecture, including its audio and visual components. While the overall structure is sound, I've identified several areas for improvement. My review includes a critical bug fix in configuration handling, high-severity issues related to type correctness and device handling, and several medium-severity suggestions to enhance code clarity, remove dead code, and align with best practices. Addressing these points will improve the robustness and maintainability of the new model's integration.

I am having trouble creating individual review comments, so my feedback is consolidated below.

lightllm/utils/config_utils.py (28-29)

critical

There is a potential bug in the config retrieval logic. If thinker_config exists but does not contain the requested key, value will be overwritten to None, potentially discarding a valid value found in a higher-level config. The logic should be revised to prioritize thinker_config without incorrectly nullifying a previously found value.
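A hedged sketch of the fix the reviewer is asking for (the helper name and config shape below are assumptions for illustration; the real function is `_get_config_llm_keyvalue`): consult `thinker_config` only when it actually contains the key, and fall back to the top-level config otherwise, so a valid value is never overwritten with None.

```python
def get_llm_config_value(config: dict, key: str, default=None):
    # Prefer thinker_config, but only when it actually holds the key;
    # otherwise fall back to the top-level config instead of
    # clobbering a previously found value with None.
    thinker = config.get("thinker_config", {})
    if key in thinker:
        return thinker[key]
    return config.get(key, default)

# thinker_config wins for keys it defines; top-level values survive
# for keys it does not.
cfg = {"vocab_size": 151936, "thinker_config": {"hidden_size": 2048}}
assert get_llm_config_value(cfg, "hidden_size") == 2048
assert get_llm_config_value(cfg, "vocab_size") == 151936
```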

lightllm/models/qwen3_omni_moe_thinker/qwen3_omni_audio.py (85-87)

high

The method is type-hinted to return a torch.Tensor, but it returns a tuple (hidden_states,). To match the type hint and expected behavior, you should return hidden_states directly.

        return hidden_states

lightllm/models/qwen3_omni_moe_thinker/model.py (111)

high

The first argument to ValueError is a boolean expression image_cnt == image_id, which will result in an error message like (False, 'invalid image tag num: ...'). This is likely not the intended behavior. The error message should probably be just the formatted string.

                raise ValueError(f"invalid image tag num: {image_cnt} vs {image_id}!")

lightllm/models/qwen3_omni_moe_thinker/audio_process.py (179-184)

high

The device is hardcoded to "cuda" when converting the final tensors, but the method signature includes a device parameter that defaults to "cpu". This inconsistency can lead to unexpected behavior. The device parameter should be used to allow for flexibility in device placement.

        input_features = torch.from_numpy(np.asarray(padded_inputs["input_features"], dtype=np.float32)).to(
            device=device, dtype=torch.bfloat16
        )
        attention_mask = torch.from_numpy(np.asarray(padded_inputs["attention_mask"], dtype=np.float32)).to(
            device=device, dtype=torch.int32
        )

lightllm/models/qwen3_omni_moe_thinker/model.py (139)

high

Similar to the image tag check, the first argument to ValueError here is a boolean, which is likely not intended. The error message should probably be just the formatted string.

                raise ValueError(f"invalid audio tag num: {audio_cnt} vs {audio_id}!")

lightllm/models/qwen3_omni_moe_thinker/model.py (165)

medium

This print statement appears to be for debugging purposes. It should be removed or replaced with a proper logging call (e.g., logger.debug(...)) for production code.
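A small sketch of the suggested replacement (LightLLM has its own logger helper, but the stdlib pattern below illustrates the same idea; the function name and message are hypothetical):

```python
import logging

# Module-level logger, created once per module.
logger = logging.getLogger(__name__)

def encode_step(shape):
    # Instead of: print(f"audio embed shape: {shape}")
    # Lazy %-style formatting defers string building until the
    # DEBUG level is actually enabled.
    logger.debug("audio embed shape: %s", shape)
```

Unlike a bare print, this output can be silenced or redirected per deployment via the logging configuration.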

lightllm/models/qwen3_omni_moe_thinker/layer_weights/meta_weights/code2wav_conv_ne_xt.py (123)

medium

It's a best practice to avoid using names of built-in functions like input as variable names. This can cause confusion and potential bugs. Consider renaming it to something like residual here and on line 140.

        residual = hidden_states

lightllm/models/qwen3_omni_moe_thinker/layer_weights/meta_weights/code2wav_causal_trans_conv_net.py (34)

medium

The assignment to pad is redundant because the variable pad is not used afterward. This can be simplified.

        self.right_pad = self.left_pad

lightllm/models/qwen3_omni_moe_thinker/layer_infer/transformer_layer_infer.py (15)

medium

The return statement at the end of an __init__ method is redundant as it implicitly returns None. It can be removed for cleaner code.

lightllm/common/basemodel/layer_weights/base_layer_weight.py (36-40)

medium

The logic to safely access self.layer_num_ can be simplified by using getattr with a default value. This is more concise and idiomatic Python.

                layer_num = getattr(self, "layer_num_", None)
                assert attr.verify_load(), f"Loading {attr_name} of layers {layer_num} fails."

lightllm/models/qwen3_omni_moe_thinker/qwen3_omni_visual.py (369)

medium

This print statement seems to be for debugging. It can produce a large amount of output and should be removed or replaced with a proper logger call for production code.

lightllm/server/audioserver/model_infer/model_rpc.py (34)

medium

The model's data type is hardcoded to bfloat16, which ignores the data_type provided in kvargs. For consistency and flexibility, consider converting the model to the specified data type from the arguments.
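One way to honor the caller's choice is a small lookup from the `data_type` string in kvargs to a dtype, falling back to bfloat16 only when nothing is provided. A sketch under assumed conventions (the key name and accepted strings are assumptions; the real code would map to torch dtypes):

```python
# Hypothetical mapping from kvargs-style strings to dtype names;
# in the real code the values would be torch dtypes.
DTYPE_MAP = {
    "fp16": "float16",
    "float16": "float16",
    "bf16": "bfloat16",
    "bfloat16": "bfloat16",
    "fp32": "float32",
    "float32": "float32",
}

def resolve_dtype(kvargs: dict) -> str:
    # Fall back to bfloat16 only when no (or an unknown) data_type
    # is provided, instead of hardcoding it unconditionally.
    return DTYPE_MAP.get(kvargs.get("data_type", "bfloat16"), "bfloat16")

assert resolve_dtype({"data_type": "fp16"}) == "float16"
assert resolve_dtype({}) == "bfloat16"
```

The same helper would serve both the audio and visual `model_rpc` init paths flagged in this review.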

lightllm/server/visualserver/model_infer/model_rpc.py (88-92)

medium

The model's data type is hardcoded to bfloat16, which ignores the data_type from kvargs. It would be more robust to use the provided data_type to set the model's precision.

lightllm/models/qwen2_vl/triton_kernel/get_mrope_position_ids.py (31)

medium

This line is commented out and appears to be unused. To improve code clarity, it's best to remove such dead code.

lightllm/utils/config_utils.py (87-88)

medium

Using a bare except: is generally discouraged as it can hide unexpected errors. It's better to catch only the specific exceptions you anticipate, such as KeyError, IndexError, or AssertionError.

    except (KeyError, IndexError, AssertionError):
        pass

wangzaijun added 2 commits February 6, 2026 09:29
@hiworldwzj hiworldwzj merged commit 8af3048 into main Feb 6, 2026
1 check passed
@hiworldwzj hiworldwzj deleted the wzj_omni branch February 6, 2026 09:39