
Conversation

@yewentao256 (Member) commented on Dec 8, 2025

Purpose

Intended to fix #29999 (comment) and #29999 (comment).

However, after discussion with @ProExpertProg and @bnellnm, we found that requiring a must-pass parameter would mean updating everything, especially the tests, so we decided to use a moe_parallel_config instead. A rough sketch of the intended pattern follows.
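As a rough illustration of the intended direction (all names below are hypothetical placeholders, not the actual vLLM signatures), the idea is that the kernel accepts the MoE parallel configuration explicitly and only falls back to globally derived state when the caller omits it:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FusedMoEParallelConfig:
    """Illustrative stand-in for the MoE parallel configuration (TP/EP sizes)."""
    tp_size: int = 1
    ep_size: int = 1


def _parallel_config_from_global_state() -> FusedMoEParallelConfig:
    """Hypothetical fallback that derives the config from global vLLM state."""
    return FusedMoEParallelConfig()


class FusedMoEModularKernel:
    def __init__(self, moe_parallel_config: Optional[FusedMoEParallelConfig] = None):
        # Prefer the explicitly passed config; fall back to global state only so
        # existing tests that never build a full vLLM config keep working.
        self.moe_parallel_config = (
            moe_parallel_config
            if moe_parallel_config is not None
            else _parallel_config_from_global_state()
        )
```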

This PR could also fix an issue currently seen on main:

(Worker_TP2_EP2 pid=248418) WARNING 12-10 13:35:58 [vllm.py:1394] Current vLLM config is not set.
(Worker_TP2_EP2 pid=248418) INFO 12-10 13:35:58 [scheduler.py:228] Chunked prefill is enabled with max_num_batched_tokens=2048.
(Worker_TP4_EP4 pid=248420) WARNING 12-10 13:35:58 [vllm.py:1394] Current vLLM config is not set.
(Worker_TP4_EP4 pid=248420) INFO 12-10 13:35:58 [scheduler.py:228] Chunked prefill is enabled with max_num_batched_tokens=2048.
(Worker_TP3_EP3 pid=248419) WARNING 12-10 13:35:58 [vllm.py:1394] Current vLLM config is not set.
(Worker_TP3_EP3 pid=248419) INFO 12-10 13:35:58 [scheduler.py:228] Chunked prefill is enabled with max_num_batched_tokens=2048.
(Worker_TP5_EP5 pid=248421) WARNING 12-10 13:35:58 [vllm.py:1394] Current vLLM config is not set.
(Worker_TP5_EP5 pid=248421) INFO 12-10 13:35:58 [scheduler.py:228] Chunked prefill is enabled with max_num_batched_tokens=2048.
(Worker_TP0_EP0 pid=248416) WARNING 12-10 13:35:58 [vllm.py:1394] Current vLLM config is not set.
(Worker_TP0_EP0 pid=248416) INFO 12-10 13:35:58 [scheduler.py:228] Chunked prefill is enabled with max_num_batched_tokens=2048.
(Worker_TP6_EP6 pid=248422) WARNING 12-10 13:35:58 [vllm.py:1394] Current vLLM config is not set.
(Worker_TP6_EP6 pid=248422) INFO 12-10 13:35:58 [scheduler.py:228] Chunked prefill is enabled with max_num_batched_tokens=2

Signed-off-by: yewentao256 <zhyanwentao@126.com>
@gemini-code-assist (bot) left a comment


Code Review

This pull request adds a helpful comment to vllm/model_executor/layers/fused_moe/modular_kernel.py clarifying the logic for handling the parallel_config parameter in FusedMoEModularKernel. The comment explains that while explicit passing is preferred, a fallback to the current vLLM config exists for testing purposes. This improves code clarity and maintainability. The change is correct and I have no further suggestions.
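Continuing the hypothetical sketch above (again, these are illustrative names, not the exact vLLM API), the two call paths the comment describes would look roughly like this:

```python
# Preferred: production code passes the parallel config explicitly.
kernel = FusedMoEModularKernel(
    moe_parallel_config=FusedMoEParallelConfig(tp_size=2, ep_size=2)
)

# Fallback: a unit test may omit it and rely on the global-state default,
# which is the path that previously produced the
# "Current vLLM config is not set" warning shown above.
test_kernel = FusedMoEModularKernel()
```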

@yewentao256 yewentao256 marked this pull request as draft December 9, 2025 00:54
@yewentao256 yewentao256 changed the title [Small] Add comment for parallel_config in FusedMoEModularKernel [Feat] Refactor for parallel_config in FusedMoEModularKernel Dec 10, 2025
@mergify mergify bot added the nvidia label Dec 10, 2025
@yewentao256 yewentao256 marked this pull request as ready for review December 10, 2025 20:22
@chatgpt-codex-connector

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

@yewentao256 (Member, Author) commented

@bnellnm @ProExpertProg CC

@yewentao256 yewentao256 added the ready ONLY add when PR is ready to merge/full CI is needed label Dec 10, 2025
@yewentao256 yewentao256 requested a review from tjtanaa as a code owner December 10, 2025 22:03

Labels

nvidia, ready (ONLY add when PR is ready to merge/full CI is needed)
