
feat(iflow): add support for glm-5, minimax-m2.5, qwen3-32b, tstars2.0, iflow-rome-30ba3b#130

Open
MasuRii wants to merge 1 commit into Mirrowel:dev from MasuRii:feat/iflow-new-models

Conversation

Contributor

@MasuRii MasuRii commented Feb 15, 2026

Summary

Add support for five new iFlow models (glm-5, minimax-m2.5, qwen3-32b, tstars2.0, and iflow-rome-30ba3b) to maintain feature parity with the router-for-me/CLIProxyAPI repository. Each model is registered with the appropriate thinking, reasoning, and preservation configuration (sketched after the change list below).

Changes

  • Add glm-5, minimax-m2.5, qwen3-32b, tstars2.0, and iflow-rome-30ba3b to HARDCODED_MODELS
  • Add glm-5 to ENABLE_THINKING_MODELS and GLM_MODELS sets
  • Add qwen3-32b to ENABLE_THINKING_MODELS set
  • Add minimax-m2.5 to REASONING_SPLIT_MODELS set
  • Update REASONING_PRESERVATION_MODELS_PREFIXES with glm-5 and tstars prefixes
  • Sort HARDCODED_MODELS list alphabetically by model family for consistency
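For reference, a rough sketch of what the affected sets in iflow_provider.py could look like after this change (abbreviated and not necessarily in the file's exact order; existing entries other than those visible in the diff further down are illustrative):

# Sketch of the model capability sets in iflow_provider.py (abbreviated).
HARDCODED_MODELS = [
    "deepseek-v3.2-reasoner",
    "glm-4.7",
    "glm-5",
    "iflow-rome-30ba3b",
    "minimax-m2.5",
    "qwen3-max-preview",
    "qwen3-32b",
    "qwen3-235b-a22b-thinking-2507",
    "qwen3-vl-plus",
    "tstars2.0",
    # ... remaining models omitted
]

# Models that receive the enable_thinking flag in the request payload.
ENABLE_THINKING_MODELS = {
    "glm-5",
    "qwen3-32b",
    # ... existing thinking-enabled models
}

# GLM-family models additionally get clear_thinking handled GLM-style.
GLM_MODELS = {
    "glm-4.7",
    "glm-5",
}

# Models that use the reasoning_split boolean instead of enable_thinking.
REASONING_SPLIT_MODELS = {
    "minimax-m2.5",
    # ... existing minimax models
}

# Prefixes of models whose reasoning_content is preserved across turns.
REASONING_PRESERVATION_MODELS_PREFIXES = ("glm-4", "glm-5", "minimax-m2", "tstars")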

Files Changed

File | Type | Impact
src/rotator_library/providers/iflow_provider.py | Modified | Register 5 new models with correct thinking/reasoning configurations

Testing Instructions

  1. Verify each new model appears in the model list endpoint
  2. Test glm-5 — should send enable_thinking + clear_thinking=false (GLM-style); an example request is sketched after this list
  3. Test minimax-m2.5 — should use reasoning_split instead of enable_thinking
  4. Test qwen3-32b — should send enable_thinking
  5. Test tstars2.0 and iflow-rome-30ba3b — standard chat completions
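For example, the glm-5 check in step 2 could be exercised with a request like the following (a sketch; the local proxy URL, port, and key are placeholders, and the enable_thinking / clear_thinking injection is expected to happen inside the provider, not in the client request):

# Hypothetical smoke test against a locally running proxy (URL and key are placeholders).
import httpx

resp = httpx.post(
    "http://localhost:8000/v1/chat/completions",  # placeholder proxy address
    headers={"Authorization": "Bearer <proxy-key>"},  # placeholder key
    json={
        "model": "glm-5",
        "messages": [{"role": "user", "content": "Briefly explain what a mutex is."}],
        "stream": False,
    },
    timeout=60,
)
resp.raise_for_status()
msg = resp.json()["choices"][0]["message"]
# A thinking-enabled model is expected to return reasoning_content alongside content.
print(msg.get("reasoning_content"))
print(msg["content"])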

Tested right now:
[screenshot of test results attached]

Closes #129


Important

Add support for new iFlow models glm-5, minimax-m2.5, qwen3-32b, tstars2.0, and iflow-rome-30ba3b with appropriate configurations in iflow_provider.py.

  • Models:
    • Add glm-5, minimax-m2.5, qwen3-32b, tstars2.0, iflow-rome-30ba3b to HARDCODED_MODELS in iflow_provider.py.
    • Add glm-5 to ENABLE_THINKING_MODELS and GLM_MODELS.
    • Add qwen3-32b to ENABLE_THINKING_MODELS.
    • Add minimax-m2.5 to REASONING_SPLIT_MODELS.
    • Update REASONING_PRESERVATION_MODELS_PREFIXES with glm-5 and tstars prefixes.
  • Misc:
    • Sort HARDCODED_MODELS alphabetically by model family.

This description was created by Ellipsis for 87b894b.

…0, iflow-rome-30ba3b

Add five new models to the iFlow provider to maintain feature parity
with the competitor repository (router-for-me/CLIProxyAPI):

- glm-5: Added to HARDCODED_MODELS, ENABLE_THINKING_MODELS, and
  GLM_MODELS (thinking support with GLM-style clear_thinking handling)
- minimax-m2.5: Added to HARDCODED_MODELS and REASONING_SPLIT_MODELS
  (thinking support via reasoning_split boolean)
- qwen3-32b: Added to HARDCODED_MODELS and ENABLE_THINKING_MODELS
  (thinking support)
- tstars2.0: Added to HARDCODED_MODELS (multimodal assistant)
- iflow-rome-30ba3b: Added to HARDCODED_MODELS (iFlow Rome model)

Also update REASONING_PRESERVATION_MODELS_PREFIXES to include "glm-5"
and "tstars" prefixes, and alphabetically sort the HARDCODED_MODELS
list for maintainability.

Closes: Mirrowel#129
@MasuRii MasuRii requested a review from Mirrowel as a code owner February 15, 2026 07:37
@mirrobot-agent
Contributor

Starting my review of the iFlow provider updates. I'll be checking the configuration for the 5 new models (glm-5, minimax-m2.5, qwen3-32b, tstars2.0, and iflow-rome-30ba3b) to ensure they're correctly registered and integrated. 🔍

Contributor

@mirrobot-agent mirrobot-agent bot left a comment


Overall Assessment

Nice work on expanding the iFlow model support! Adding these 5 new models (glm-5, minimax-m2.5, qwen3-32b, tstars2.0, and iflow-rome-30ba3b) with their specific thinking and reasoning configurations ensures the provider stays up-to-date with the latest offerings. The internal grouping of models by family in HARDCODED_MODELS is also a great improvement for maintainability.

Architectural Feedback

The approach of using specific sets for different thinking/reasoning behaviors (ENABLE_THINKING_MODELS, GLM_MODELS, REASONING_SPLIT_MODELS) is clean and matches the established patterns in the codebase.

Key Suggestions

  • Reasoning Preservation: I've noted that qwen3-32b (along with some existing models like deepseek-v3.2) is missing from the REASONING_PRESERVATION_MODELS_PREFIXES list. If these models return reasoning_content, adding them to this list is necessary for coherent multi-turn conversations.
  • Prefix Simplification: Suggesting a move to more generic prefixes (e.g., glm- instead of version-specific ones) to reduce the need for future updates.

Nitpicks

  • Alphabetical Sorting: The deepseek family is currently slightly out of place in the family-sorted list.

Questions for the Author

  • Does iflow-rome-30ba3b also support any form of thinking or reasoning content, or is it strictly a standard completion model?
  • Are there specific reasons why deepseek and qwen models are excluded from the reasoning preservation cache?

This review was generated by an AI assistant.

"qwen3-max-preview",
"qwen3-235b-a22b-thinking-2507",
"qwen3-vl-plus",
"deepseek-v3.2-reasoner",

The deepseek family is currently placed after qwen. If the intention is to sort the list alphabetically by model family (as mentioned in the PR summary), the deepseek group (starting with 'D') should be moved to the top, before glm ('G').

"glm-4.7",
"glm-5",
"qwen3-max-preview",
"qwen3-32b",

Since qwen3-32b is configured as a thinking model, it will likely return reasoning_content. You might want to add a "qwen" prefix to REASONING_PRESERVATION_MODELS_PREFIXES (line 106) so that its reasoning is correctly cached and preserved in multi-turn conversations. The same applies to deepseek-v3.2 and other thinking-enabled models that aren't currently in that list.

# Models that benefit from reasoning_content preservation in message history
# (for multi-turn conversations)
-REASONING_PRESERVATION_MODELS_PREFIXES = ("glm-4", "minimax-m2")
+REASONING_PRESERVATION_MODELS_PREFIXES = ("glm-4", "glm-5", "minimax-m2", "tstars")

To make this list more robust, consider using broader prefixes where appropriate. For example, "glm-" would cover both glm-4 and glm-5 (and any future versions), while "minimax-" would cover all current and future minimax models.
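For illustration, a broader-prefix tuple and the kind of startswith check it would feed might look like this (a sketch; only the tuple itself appears in the diff, and the helper name is made up):

# Broader prefixes cover future model versions without further edits.
REASONING_PRESERVATION_MODELS_PREFIXES = ("glm-", "minimax-", "tstars", "qwen", "deepseek")

def preserves_reasoning(model_name: str) -> bool:
    # str.startswith accepts a tuple, so one call covers every prefix.
    return model_name.startswith(REASONING_PRESERVATION_MODELS_PREFIXES)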

Copilot AI review requested due to automatic review settings February 15, 2026 12:36

Copilot AI left a comment


Pull request overview

Adds support for additional iFlow model IDs and updates model-specific “thinking/reasoning” handling, with substantial additional refactoring in the iFlow provider around payload shaping and streaming error handling.

Changes:

  • Register 5 new iFlow models and update model capability sets (thinking, GLM handling, reasoning_split, reasoning preservation prefixes).
  • Expand iFlow provider request/stream handling (suffix-based thinking config parsing, image normalization + fallback logic, gzip handling, token usage estimation).
  • Harden embedding cost calculation to handle missing usage/cost metadata.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 8 comments.

File | Description
src/rotator_library/providers/iflow_provider.py | Adds new model IDs and significantly refactors request building + streaming handling (thinking config, image handling, gzip, token estimation).
src/rotator_library/client/executor.py | Makes embedding cost calculation resilient to usage=None and input_cost_per_token being None or 0.0.
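A None-tolerant version of that cost calculation might look roughly like this (a sketch; the function and field names are assumptions, not the actual executor.py code):

def embedding_cost(usage: dict | None, input_cost_per_token: float | None) -> float:
    """Compute embedding cost while tolerating usage=None and a None/0.0 per-token price."""
    prompt_tokens = (usage or {}).get("prompt_tokens") or 0
    return prompt_tokens * (input_cost_per_token or 0.0)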


Comment on lines 1538 to 1542
# Ensure max_tokens is at least 1024 as recommended by iflow2api logic
if "max_tokens" in payload:
    payload["max_tokens"] = max(payload["max_tokens"], 1024)
else:
    # Default to 4096 if not specified

Copilot AI Feb 15, 2026


This forces max_tokens to be at least 1024 and defaults it to 4096 when the caller didn’t specify it. That changes the meaning of requests (e.g., callers intentionally limiting max_tokens below 1024) and can increase latency/cost unexpectedly. Consider only applying a default when max_tokens is missing, and avoid overriding user-provided values (or gate the minimum behind a config flag).

Suggested change
-# Ensure max_tokens is at least 1024 as recommended by iflow2api logic
-if "max_tokens" in payload:
-    payload["max_tokens"] = max(payload["max_tokens"], 1024)
-else:
-    # Default to 4096 if not specified
+# Default max_tokens if not specified; do not override user-provided values
+if "max_tokens" not in payload:

Comment on lines 1230 to 1310
async def _generate_vision_summary(
    self,
    client: httpx.AsyncClient,
    api_key: str,
    api_base: str,
    body: Dict[str, Any],
    model: str,
) -> str:
    """
    Generate a text summary of images using a vision model.

    This is used for two-stage vision processing where GLM/MiniMax models
    don't support images directly, so we first use a vision model to
    describe the images, then pass that description to the text model.

    Args:
        client: HTTP client to use
        api_key: API key for iFlow
        api_base: Base URL for iFlow API
        body: Original request body with images
        model: Target model name (for context in prompt)

    Returns:
        Text summary of the images
    """
    vision_body = copy.deepcopy(body)
    vision_body["stream"] = False
    vision_body.pop("stream_options", None)
    vision_body.pop("tools", None)
    vision_body.pop("tool_choice", None)
    vision_body.pop("response_format", None)
    vision_body.pop("thinking", None)
    vision_body.pop("reasoning_effort", None)
    vision_body["max_tokens"] = 1024

    messages = vision_body.get("messages", [])
    if not isinstance(messages, list):
        return ""

    # Add instruction for vision model
    instruction = (
        f"You are an image analysis assistant. The current model is {model}, "
        "which does not process images. Please analyze the user-uploaded images "
        "and output a structured summary for the text model to continue reasoning. "
        "Must include: 1) Main subject and scene; 2) Text in the image; "
        "3) Key details relevant to the user's question. "
        "Do not say you cannot see the image. Be concise."
    )
    vision_body["messages"] = [
        {"role": "system", "content": instruction}
    ] + messages
    vision_body["model"] = FORCED_VISION_MODEL

    # Build headers
    headers = self._build_iflow_headers(api_key, stream=False)
    url = f"{api_base.rstrip('/')}/chat/completions"

    try:
        response = await client.post(
            url,
            headers=headers,
            json=vision_body,
            timeout=TimeoutConfig.non_streaming(),
        )
        response.raise_for_status()

        # Handle potential gzip response
        content = response.content
        if response.headers.get("content-encoding") == "gzip":
            content = gzip.decompress(content)

        result = json.loads(content)
        return extract_text_from_result(result)

    except (httpx.HTTPStatusError, httpx.RequestError, httpx.TimeoutException) as e:
        lib_logger.warning(f"[iFlow] Vision summary generation failed: {e}")
        return ""
    except (json.JSONDecodeError, gzip.BadGzipFile) as e:
        lib_logger.warning(f"[iFlow] Vision summary parsing failed: {e}")
        return ""


Copilot AI Feb 15, 2026


_generate_vision_summary() is currently unused (no call sites), and there’s also _build_two_stage_main_body() below. Keeping large unused async + HTTP logic in the provider increases maintenance burden and makes it harder to reason about the real behavior. Either wire these into the vision fallback flow (two-stage summary) or remove them until needed.

Comment on lines 1411 to 1436
async def _make_request_with_retry(
    self,
    client: httpx.AsyncClient,
    method: str,
    url: str,
    headers: Dict[str, str],
    json_body: Dict[str, Any],
    max_retries: int = MAX_RETRIES,
) -> httpx.Response:
    """
    Make an HTTP request with automatic retry for transient failures.

    Retry conditions:
    - 5xx server errors
    - 429 rate limits (with Retry-After header)
    - Network errors

    Does NOT retry:
    - 4xx client errors (except 429)
    - Authentication errors
    """
    if max_retries < 1:
        raise ValueError(f"max_retries must be at least 1, got {max_retries}")
    last_error: Optional[Exception] = None

    for attempt in range(max_retries):

Copilot AI Feb 15, 2026


_make_request_with_retry() is unused, and last_error is assigned but never read. If this retry helper isn’t going to be integrated, consider removing it to avoid dead code; otherwise, wire it into the request path and either use last_error (e.g., for a final raised exception with context) or drop it.

Comment on lines 30 to 35
from litellm.exceptions import RateLimitError, AuthenticationError
from pathlib import Path

from ..core.errors import StreamedAPIError
import uuid
from datetime import datetime

Copilot AI Feb 15, 2026


There are unused imports after these changes (AuthenticationError and datetime don’t appear to be referenced in this module). Removing unused imports will avoid lint warnings and keep the provider easier to scan.

Suggested change
-from litellm.exceptions import RateLimitError, AuthenticationError
-from pathlib import Path
-from ..core.errors import StreamedAPIError
-import uuid
-from datetime import datetime
+from litellm.exceptions import RateLimitError
+from pathlib import Path
+from ..core.errors import StreamedAPIError
+import uuid

Comment on lines 2038 to 2051
# If vision fallback, strip images from payload and add vision summary context
if vision_fallback:
    messages = payload.get("messages", [])
    if messages:
        sanitized = strip_images_from_messages(messages)
        bridge_msg = {
            "role": "system",
            "content": (
                "The original model does not support images. "
                "Please continue with the text content only."
            ),
        }
        payload["messages"] = [bridge_msg] + sanitized


Copilot AI Feb 15, 2026


The vision-fallback path switches effective_model to FORCED_VISION_MODEL but then strips images from payload["messages"]. This prevents the vision model from actually receiving the image inputs, so the fallback cannot succeed for image requests. Either keep the images when calling the forced vision model, or implement the intended two-stage flow (vision summary with images -> text model with images stripped + injected summary).
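One way the intended two-stage flow could be wired in, reusing the helpers defined earlier in the file, might look roughly like this (a sketch; the surrounding variable names are taken from the snippets above and may not match the actual call site):

if vision_fallback:
    # Stage 1: ask the forced vision model to describe the images.
    summary = await self._generate_vision_summary(
        client, api_key, api_base, body, model_name
    )
    # Stage 2: strip the images and hand the text model the summary instead.
    sanitized = strip_images_from_messages(payload.get("messages", []))
    bridge_msg = {
        "role": "system",
        "content": (
            "The original model does not support images. "
            "An image description from a vision model follows:\n" + (summary or "")
        ),
    }
    payload["messages"] = [bridge_msg] + sanitized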

Comment on lines 2026 to 2036
# Parse suffix from model name for thinking config
suffix_result = parse_suffix(model_name)
base_model = suffix_result.model_name

# Handle vision fallback
effective_model = FORCED_VISION_MODEL if vision_fallback else base_model

kwargs_with_model = {**kwargs, "model": effective_model}
payload = self._build_request_payload(
-    model_name, kwargs, **kwargs_with_stripped_model
+    effective_model, kwargs, **kwargs_with_model
)

Copilot AI Feb 15, 2026


Model-name suffix parsing for thinking config is effectively disabled here: parse_suffix(model_name) is used to compute base_model, and then effective_model is always set to base_model (unless vision fallback). That strips the suffix before _build_request_payload() / _apply_thinking_config() runs, so suffix-driven configs like glm-4.7(high) can never take effect. Pass the original model name (including suffix) into _build_request_payload() and let _apply_thinking_config() strip it, or delay suffix stripping until after config extraction.
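Concretely, the fix could be as small as keeping the suffixed name when building the payload (a sketch; it assumes _build_request_payload() / _apply_thinking_config() strip the suffix themselves before the payload is sent upstream):

# Keep the original (suffixed) name so suffix-driven thinking configs such as
# "glm-4.7(high)" can still be extracted downstream.
payload_model = FORCED_VISION_MODEL if vision_fallback else model_name
kwargs_with_model = {**kwargs, "model": payload_model}
payload = self._build_request_payload(payload_model, kwargs, **kwargs_with_model)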

file_logger.log_error(error_msg)
raise httpx.HTTPStatusError(
    f"HTTP {response.status_code}: {error_text}",
    request=response,

Copilot AI Feb 15, 2026


This httpx.HTTPStatusError is constructed with request=response, but request should be an httpx.Request (usually response.request). Passing the Response object can break downstream error handling/formatting that expects a Request. Use request=response.request here (consistent with the earlier construction above).

Suggested change
-    request=response,
+    request=response.request,

Comment on lines 912 to 925
effort = kwargs.get(IFLOW_REASONING_EFFORT)
if not effort:
    return None

effort_lower = str(effort).strip().lower()

if effort_lower == "none":
    return ThinkingConfig(mode=ThinkingMode.NONE, budget=0)

try:
    level = ThinkingLevel(effort_lower)
    return ThinkingConfig(mode=ThinkingMode.LEVEL, level=level)
except ValueError:
    return None

Copilot AI Feb 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_extract_openai_config_from_kwargs() treats falsy values as "no config" (if not effort: return None). This means reasoning_effort=0 (int) or strings like "0"/"false"/"disabled" won’t explicitly disable thinking anymore. If you want backward-compatible disable semantics (as the previous logic supported), handle these values explicitly instead of short-circuiting on truthiness.
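If backward-compatible disable semantics are wanted, the check could handle explicit "off" values before falling through to level parsing (a sketch using the types visible in the snippet above; the exact set of disable tokens is an assumption):

effort = kwargs.get(IFLOW_REASONING_EFFORT)
if effort is None:
    return None

effort_lower = str(effort).strip().lower()

# Explicit disable values (including int 0 via str()) turn thinking off.
if effort_lower in ("0", "none", "false", "disabled", "off"):
    return ThinkingConfig(mode=ThinkingMode.NONE, budget=0)

try:
    return ThinkingConfig(mode=ThinkingMode.LEVEL, level=ThinkingLevel(effort_lower))
except ValueError:
    return None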

@MasuRii MasuRii force-pushed the feat/iflow-new-models branch from 9817625 to 87b894b on February 15, 2026 13:15
