add-qwen3-omni-thinker #1202

Open
SangChengC wants to merge 7 commits into main from add-qwen3-omni-moe

Conversation

@SangChengC
Contributor

No description provided.

@gemini-code-assist
Contributor

Summary of Changes

Hello @SangChengC, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request extends the LightLLM framework with support for the Qwen3 Omni Thinker model. It adds the core model definition, dedicated components for processing audio and visual inputs, and updates to the tokenizer and inference pipeline so that the new multimodal model can be loaded, configured, and served.

Highlights

  • New Model Support: Introduced support for the Qwen3 Omni Thinker multimodal model, integrating it into the LightLLM framework.
  • Multimodal Capabilities: Added dedicated audio and visual processing components, including Qwen3OmniMoeAudioEncoder and Qwen3OmniMoeVisionTransformerPretrainedModel, to handle diverse input modalities.
  • Custom Tokenization and Inference: Implemented QWen3OmniTokenizer for specialized multimodal tokenization and Qwen3OmniMOEInferStateInfo for managing the model's unique inference state.
  • Configuration System Updates: Modified configuration utilities and server-side model initialization logic to correctly parse and load model parameters from a new thinker_config structure.


Changelog
  • lightllm/models/__init__.py
    • Added import for Qwen3OmniMOETpPartModel to register the new model.
  • lightllm/models/qwen2/layer_weights/pre_and_post_layer_weight.py
    • Modified Qwen2PreAndPostLayerWeight to initialize lm_head_weight_ using LMHeadWeight, referencing 'thinker.lm_head.weight'.
  • lightllm/models/qwen2_vl/infer_struct.py
    • Added use_image_h parameter to the get_mrope_position function call.
  • lightllm/models/qwen2_vl/triton_kernel/get_mrope_position_ids.py
    • Added use_image_h parameter to get_mrope_position_triton and included conditional logic to modify b_image_thwd.
  • lightllm/models/qwen3_omni_moe_thinker/audio_process.py
    • Added WhisperFeatureExtractor class for audio feature extraction, enabling audio modality processing.
  • lightllm/models/qwen3_omni_moe_thinker/infer_struct.py
    • Added Qwen3OmniMOEInferStateInfo class, inheriting from Qwen3VLInferStateInfo and setting use_image_h to False for specific inference handling.
  • lightllm/models/qwen3_omni_moe_thinker/layer_infer/transformer_layer_infer.py
    • Added Qwen3OmniMOETransformerLayerInfer class, inheriting from Qwen3VLMOETransformerLayerInfer and initializing mrope_section for multimodal Rotary Position Embeddings.
  • lightllm/models/qwen3_omni_moe_thinker/layer_weights/meta_weights/code2wav_causal_conv_net.py
    • Added Qwen3OmniMoeCausalConvNetWeight class for handling causal convolutional network weights.
  • lightllm/models/qwen3_omni_moe_thinker/layer_weights/meta_weights/code2wav_causal_trans_conv_net.py
    • Added Qwen3OmniMoeCode2wavCausalTransConvNetWeight class for handling causal transposed convolutional network weights.
  • lightllm/models/qwen3_omni_moe_thinker/layer_weights/meta_weights/code2wav_conv_ne_xt.py
    • Added Qwen3OmniMoeConvNeXtBlockWeight class for handling ConvNeXt block weights.
  • lightllm/models/qwen3_omni_moe_thinker/layer_weights/meta_weights/talker_resize_mlp_weight.py
    • Added Qwen3OmniMoeTalkerResizeMLPWeight class for handling MLP weights.
  • lightllm/models/qwen3_omni_moe_thinker/layer_weights/pre_and_post_layer_weight.py
    • Added Qwen3OmniMOEThinkerPreAndPostLayerWeight class, including a rename_weight_keys function to adjust weight prefixes during loading.
  • lightllm/models/qwen3_omni_moe_thinker/layer_weights/transformers_layer_weight.py
    • Added Qwen3OmniMOEThinkerTransformerLayerWeight class for transformer layer weights.
  • lightllm/models/qwen3_omni_moe_thinker/model.py
    • Added QWen3OmniTokenizer and Qwen3OmniMOETpPartModel classes, registering the new model and defining its multimodal tokenization logic, including audio token length calculation.
  • lightllm/models/qwen3_omni_moe_thinker/qwen3_omni_audio.py
    • Added Qwen3OmniMoeAudioEncoder and related components (Qwen3OmniMoeAudioEncoderLayer, Qwen3OmniMoeAudioAttention, SinusoidsPositionEmbedding) for detailed audio processing.
  • lightllm/models/qwen3_omni_moe_thinker/qwen3_omni_visual.py
    • Added Qwen3OmniMoeVisionTransformerPretrainedModel and related components (Qwen3OmniMoeVisionMLP, Qwen3OmniMoeVisionPatchEmbed, Qwen3OmniMoeVisionPatchMerger, Qwen3OmniMoeVisionBlock) for detailed visual processing.
  • lightllm/server/audioserver/model_infer/model_rpc.py
    • Modified exposed_init_model to support Qwen3OmniMoeAudioEncoder and adjusted config loading to check for thinker_config.
  • lightllm/server/tokenizer.py
    • Modified get_tokenizer to include QWen3OmniTokenizer when a thinker_config is present in the model configuration.
  • lightllm/server/visualserver/model_infer/model_rpc.py
    • Modified exposed_init_model to support Qwen3OmniMoeVisionTransformerPretrainedModel and adjusted config loading to check for thinker_config.
  • lightllm/utils/config_utils.py
    • Modified _get_config_llm_keyvalue, get_eos_token_ids, and get_vocab_size to retrieve configuration values from the nested thinker_config section (see the config sketch after this changelog).
    • Hardcoded the eos_token_id to [151645] (with a comment indicating potential future refinement).
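
As referenced above, a rough sketch of the nested configuration layout these utilities now read. Only keys quoted in the diff are shown; the vocab_size value is a placeholder, and a real config.json contains many more fields:

config_json = {
    "thinker_config": {
        "text_config": {
            "vocab_size": 152064,  # placeholder value
            "eos_token_id": 151645,
        }
    }
}

# The updated helpers look one level deeper when thinker_config is present:
text_config = config_json["thinker_config"]["text_config"]
print(text_config["eos_token_id"])  # 151645
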
Activity
  • The pull request introduces a new multimodal model, Qwen3 Omni Thinker, by adding numerous new files and modifying existing ones to integrate its audio and visual processing capabilities, tokenizer, and inference logic.
  • Configuration utilities and server-side model initialization have been updated to recognize and load this new model architecture, ensuring proper setup for multimodal inference.
  • Specific changes include new classes for audio feature extraction, visual patch embedding and merging, and specialized transformer layers, all tailored for the Qwen3 Omni Thinker model.

@gemini-code-assist (bot) left a comment


Code Review

This pull request adds support for the qwen3-omni-thinker model. The changes include new model files and modifications to existing server and utility components to handle the new model's configuration and architecture. While the overall direction is correct, I've identified several critical issues, including hardcoded values that can break other models, incorrect return types causing runtime errors, and buggy logic in utility functions. There are also some medium-severity issues, such as the use of eval and leftover debug print statements. Addressing these points will significantly improve the robustness and maintainability of the code.

Comment on lines +108 to +180
    ) -> Tuple[torch.Tensor, torch.Tensor]:

        is_batched_numpy = isinstance(raw_speech, np.ndarray) and len(raw_speech.shape) > 1
        if is_batched_numpy and len(raw_speech.shape) > 2:
            raise ValueError(f"Only mono-channel audio is supported for input to {self}")
        is_batched = is_batched_numpy or (
            isinstance(raw_speech, (list, tuple)) and (isinstance(raw_speech[0], (np.ndarray, tuple, list)))
        )

        if is_batched:
            raw_speech = [np.asarray([speech], dtype=np.float32).T for speech in raw_speech]
        elif not is_batched and not isinstance(raw_speech, np.ndarray):
            raw_speech = np.asarray(raw_speech, dtype=np.float32)
        elif isinstance(raw_speech, np.ndarray) and raw_speech.dtype is np.dtype(np.float64):
            raw_speech = raw_speech.astype(np.float32)

        # always return batch
        if not is_batched:
            raw_speech = [np.asarray([raw_speech]).T]

        batched_speech = BatchFeature({"input_features": raw_speech})

        # convert into correct format for padding

        padded_inputs = self.pad(
            batched_speech,
            padding=padding,
            max_length=max_length if max_length else self.n_samples,
            truncation=truncation,
            pad_to_multiple_of=pad_to_multiple_of,
            return_attention_mask=return_attention_mask or do_normalize,
        )

        # zero-mean and unit-variance normalization
        if do_normalize:
            padded_inputs["input_features"] = self.zero_mean_unit_var_norm(
                padded_inputs["input_features"],
                attention_mask=padded_inputs["attention_mask"],
                padding_value=self.padding_value,
            )
            padded_inputs["input_features"] = np.stack(padded_inputs["input_features"], axis=0)

        # make sure list is in array format
        input_features = padded_inputs.get("input_features").transpose(2, 0, 1)

        input_features = self._torch_extract_fbank_features(input_features[0], device)

        if isinstance(input_features[0], list):
            padded_inputs["input_features"] = [np.asarray(feature, dtype=np.float32) for feature in input_features]
        else:
            padded_inputs["input_features"] = input_features

        if return_attention_mask:
            # rescale from sample (48000) to feature (3000)
            rescaled_attention_mask = padded_inputs["attention_mask"][:, :: self.hop_length]

            # The STFT computation produces L//hop_length + 1 frames,
            # but we skip the last frame (see `_torch_extract_fbank_features`).
            # This means we need to trim the rescaled attention mask to match
            # the actual number of frames (L//hop_length) when the input length
            # is not perfectly divisible by the hop length.
            if padded_inputs["attention_mask"].shape[1] % self.hop_length != 0:
                rescaled_attention_mask = rescaled_attention_mask[:, :-1]
            padded_inputs["attention_mask"] = rescaled_attention_mask

        if return_token_timestamps is not None:
            padded_inputs["num_frames"] = [len(raw_speech_i) // self.hop_length for raw_speech_i in raw_speech]

        if return_tensors is not None:
            padded_inputs = padded_inputs.convert_to_tensors(return_tensors)

        return padded_inputs

critical

The _preprocess method is type-hinted to return a Tuple[torch.Tensor, torch.Tensor], but it currently returns a single BatchFeature object. This will cause a ValueError: too many values to unpack at the call site in qwen3_omni_audio.py, which expects two return values. The method should be updated to return the input features and their corresponding lengths as a tuple to match the type hint and the caller's expectation.

        if return_tensors is not None:
            padded_inputs = padded_inputs.convert_to_tensors(return_tensors)

        lengths = [len(raw_speech_i) // self.hop_length for raw_speech_i in raw_speech]
        if return_tensors == "pt":
            lengths = torch.tensor(lengths, device=padded_inputs["input_features"].device)
        elif return_tensors == "np":
            lengths = np.array(lengths)

        if return_token_timestamps is not None:
            padded_inputs["num_frames"] = lengths

        return padded_inputs["input_features"], lengths
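
For context, a self-contained illustration of the failure mode described above (simplified stand-ins, not the real classes):

from typing import Tuple

def _preprocess_buggy() -> Tuple[list, list]:
    # the real method builds a BatchFeature (a dict-like container) and returns it directly
    padded_inputs = {"input_features": [[0.1]], "attention_mask": [[1]], "num_frames": [1]}
    return padded_inputs

try:
    # a caller trusting the Tuple annotation unpacks two values
    features, lengths = _preprocess_buggy()
except ValueError as err:
    print(err)  # too many values to unpack (expected 2): iterating the dict yields its 3 keys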


def get_eos_token_ids(model_path: str) -> Optional[List[int]]:
    eos_token_id = _get_config_llm_keyvalue(model_path=model_path, key_name=["eos_token_id"])
    return [151645]  # TODO: revisit how to handle this later; change config.json directly?

critical

Hardcoding the return value [151645] for get_eos_token_ids is a critical issue. This will break any model that relies on this function to get its correct EOS token ID(s) from the configuration file. The hardcoded line should be removed to restore the original, correct logic of reading from the config.
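
A minimal sketch of the intended behaviour once the hardcoded value is removed: read eos_token_id from config.json and normalize it to a list. The fallback into thinker_config.text_config mirrors the quoted code; the exact structure of the real helper is an assumption:

import json
import os
from typing import List, Optional

def get_eos_token_ids_sketch(model_path: str) -> Optional[List[int]]:
    with open(os.path.join(model_path, "config.json")) as f:
        config_json = json.load(f)
    eos_token_id = config_json.get("eos_token_id")
    if eos_token_id is None and config_json.get("thinker_config") is not None:
        eos_token_id = config_json["thinker_config"].get("text_config", {}).get("eos_token_id")
    if eos_token_id is None:
        return None
    return [eos_token_id] if isinstance(eos_token_id, int) else list(eos_token_id)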

Comment on lines 8 to 17
        hidden_size = network_config["hidden_size"]
        vocab_size = network_config["vocab_size"]
        tie_word_embeddings = network_config.get("tie_word_embeddings", False)
        self.lm_head_weight_ = LMHeadWeight(
            dim=hidden_size,
            vocab_size=vocab_size,
            weight_name="thinker.lm_head.weight",
            data_type=self.data_type_,
            embedding_weight=self.wte_weight_ if tie_word_embeddings else None,
        )

high

The lm_head_weight_ is being overridden with a hardcoded weight name "thinker.lm_head.weight". This change is in the Qwen2PreAndPostLayerWeight class, which may be used by other qwen2 models. Hardcoding a model-specific weight name here can break other models that don't use this weight naming scheme. It's better to make this configurable or handle it within the specific model's weight class (Qwen3OmniMOEThinkerPreAndPostLayerWeight).
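
A simplified, self-contained sketch of the suggested structure. The real classes take more constructor arguments; the class names follow the PR, and everything else is illustrative:

class Qwen2PreAndPostLayerWeightSketch:
    # default weight name that plain qwen2 checkpoints use
    lm_head_weight_name = "lm_head.weight"

    def __init__(self):
        # the real code would build an LMHeadWeight with this name here
        self.lm_head_weight_name_used = self.lm_head_weight_name


class Qwen3OmniMOEThinkerPreAndPostLayerWeightSketch(Qwen2PreAndPostLayerWeightSketch):
    # the omni thinker checkpoint prefixes its weights with "thinker."
    lm_head_weight_name = "thinker.lm_head.weight"


assert Qwen2PreAndPostLayerWeightSketch().lm_head_weight_name_used == "lm_head.weight"
assert Qwen3OmniMOEThinkerPreAndPostLayerWeightSketch().lm_head_weight_name_used == "thinker.lm_head.weight"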

Comment on lines +28 to +29
if config_json.get("thinker_config") is not None:
value = config_json.get("thinker_config", {}).get("text_config").get(key)

high

The logic to handle thinker_config is flawed. It unconditionally overwrites the value variable if thinker_config exists, even if a value was already found from other keys. This will lead to incorrect behavior. This check should be part of the fallback chain, only executing if the value hasn't been found in other locations.
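
A self-contained sketch of the fallback order being asked for: thinker_config.text_config is consulted only when none of the regular keys produced a value. Key names follow the quoted code; the shape of the existing lookup is an assumption:

def get_llm_config_value_sketch(config_json: dict, key_names: list):
    value = None
    for key in key_names:
        value = config_json.get(key)
        if value is not None:
            break
    # last-resort fallback instead of an unconditional override
    if value is None and config_json.get("thinker_config") is not None:
        text_config = config_json["thinker_config"].get("text_config", {})
        for key in key_names:
            value = text_config.get(key)
            if value is not None:
                break
    return value

# a top-level value must win over the nested thinker_config one
cfg = {"vocab_size": 100, "thinker_config": {"text_config": {"vocab_size": 200}}}
assert get_llm_config_value_sketch(cfg, ["vocab_size"]) == 100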

mel_len = chunk_len // 160
dilation = 1
L_in = mel_len
for (padding, kernel_size, stride) in eval("[(1,3,1)] + [(1,3,2)] "):

medium

Using eval() on a string, even if it's a literal, is generally considered unsafe and can be a performance bottleneck. It's better to use the literal value directly.

Suggested change
- for (padding, kernel_size, stride) in eval("[(1,3,1)] + [(1,3,2)] "):
+ for (padding, kernel_size, stride) in [(1, 3, 1), (1, 3, 2)]:
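
For reference, the loop above appears to apply the standard Conv1d output-length formula to estimate how many audio frames survive the encoder's front-end convolutions. A hedged, self-contained rendering (the 48000-sample chunk and the helper name are illustrative; the parameter tuples are the ones from the quoted snippet):

def conv_out_len(l_in: int, padding: int, kernel_size: int, stride: int, dilation: int = 1) -> int:
    # L_out = floor((L_in + 2*padding - dilation*(kernel_size - 1) - 1) / stride) + 1
    return (l_in + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

chunk_len = 48000                 # e.g. a 3 s chunk at 16 kHz
l_in = chunk_len // 160           # 300 mel frames (hop length 160)
for padding, kernel_size, stride in [(1, 3, 1), (1, 3, 2)]:
    l_in = conv_out_len(l_in, padding, kernel_size, stride)
print(l_in)                       # 150: the stride-2 convolution halves the frame count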

all_config = json.load(json_file)
self.config = all_config["thinker_config"]["text_config"]
# rename keys
print(f"self.config is {self.config}")

medium

This print statement seems to be for debugging purposes. It should be removed from the production code.
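
If the configuration dump is still useful during bring-up, a debug-level log is a less intrusive alternative (sketch using the standard library logger; lightllm may already expose its own logging utility):

import logging

logger = logging.getLogger(__name__)

def log_text_config(text_config: dict) -> None:
    # emitted only when the logger is configured at DEBUG level
    logger.debug("thinker text_config: %s", text_config)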

raise ValueError(f"cannot read audio which type is {type(item)}!")

# padding to min audio len
MIN_AUDIO_LEN = 480

medium

The magic number 480 is used multiple times in this method. It's better to define it as a module-level constant to improve readability and maintainability. For example: MIN_AUDIO_LEN = 480 at the top of the file.
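
A sketch of the suggested refactor, assuming the padding is a simple right-pad with zeros (the constant and helper names are illustrative):

import numpy as np

MIN_AUDIO_LEN = 480  # minimum number of samples the audio pipeline accepts

def pad_to_min_audio_len(audio: np.ndarray) -> np.ndarray:
    # right-pad a 1-D waveform with zeros so it has at least MIN_AUDIO_LEN samples
    if audio.shape[0] < MIN_AUDIO_LEN:
        audio = np.pad(audio, (0, MIN_AUDIO_LEN - audio.shape[0]))
    return audio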

deepstack_feature_lists.append(deepstack_feature)

hidden_states = self.merger(hidden_states)
print(f"hidden_states is {hidden_states}, deepstack is {deepstack_feature_lists}")

medium

This print statement appears to be for debugging. It should be removed before merging.
