feat(iflow): add support for glm-5, minimax-m2.5, qwen3-32b, tstars2.0, iflow-rome-30ba3b #130
MasuRii wants to merge 1 commit into Mirrowel:dev from
Conversation
…0, iflow-rome-30ba3b

Add five new models to the iFlow provider to maintain feature parity with the competitor repository (router-for-me/CLIProxyAPI):

- glm-5: added to HARDCODED_MODELS, ENABLE_THINKING_MODELS, and GLM_MODELS (thinking support with GLM-style clear_thinking handling)
- minimax-m2.5: added to HARDCODED_MODELS and REASONING_SPLIT_MODELS (thinking support via reasoning_split boolean)
- qwen3-32b: added to HARDCODED_MODELS and ENABLE_THINKING_MODELS (thinking support)
- tstars2.0: added to HARDCODED_MODELS (multimodal assistant)
- iflow-rome-30ba3b: added to HARDCODED_MODELS (iFlow Rome model)

Also update REASONING_PRESERVATION_MODELS_PREFIXES to include the "glm-5" and "tstars" prefixes, and alphabetically sort the HARDCODED_MODELS list for maintainability.

Closes: Mirrowel#129
Starting my review of the iFlow provider updates. I'll be checking the configuration for the 5 new models (glm-5, minimax-m2.5, qwen3-32b, tstars2.0, and iflow-rome-30ba3b) to ensure they're correctly registered and integrated. 🔍
Overall Assessment
Nice work on expanding the iFlow model support! Adding these 5 new models (glm-5, minimax-m2.5, qwen3-32b, tstars2.0, and iflow-rome-30ba3b) with their specific thinking and reasoning configurations ensures the provider stays up-to-date with the latest offerings. The internal grouping of models by family in HARDCODED_MODELS is also a great improvement for maintainability.
Architectural Feedback
The approach of using specific sets for different thinking/reasoning behaviors (ENABLE_THINKING_MODELS, GLM_MODELS, REASONING_SPLIT_MODELS) is clean and matches the established patterns in the codebase.
Key Suggestions
- Reasoning Preservation: I've noted that `qwen3-32b` (and some existing models like `deepseek-v3.2`) are missing from the `REASONING_PRESERVATION_MODELS_PREFIXES` list. If these models return `reasoning_content`, adding them to this list is necessary for coherent multi-turn conversations.
- Prefix Simplification: Suggesting a move to more generic prefixes (e.g., `glm-` instead of version-specific ones) to reduce the need for future updates.
Nitpicks
- Alphabetical Sorting: The `deepseek` family is currently slightly out of place in the family-sorted list.
Questions for the Author
- Does `iflow-rome-30ba3b` also support any form of thinking or reasoning content, or is it strictly a standard completion model?
- Are there specific reasons why `deepseek` and `qwen` models are excluded from the reasoning preservation cache?
This review was generated by an AI assistant.
| "qwen3-max-preview", | ||
| "qwen3-235b-a22b-thinking-2507", | ||
| "qwen3-vl-plus", | ||
| "deepseek-v3.2-reasoner", |
The deepseek family is currently placed after qwen. If the intention is to sort the list alphabetically by model family (as mentioned in the PR summary), the deepseek group (starting with 'D') should be moved to the top, before glm ('G').
| "glm-4.7", | ||
| "glm-5", | ||
| "qwen3-max-preview", | ||
| "qwen3-32b", |
Since qwen3-32b is configured as a thinking model, it will likely return reasoning_content. You might want to add a "qwen" prefix to REASONING_PRESERVATION_MODELS_PREFIXES (line 106) so that its reasoning is correctly cached and preserved in multi-turn conversations. The same applies to deepseek-v3.2 and other thinking-enabled models that aren't currently in that list.
```diff
 # Models that benefit from reasoning_content preservation in message history
 # (for multi-turn conversations)
-REASONING_PRESERVATION_MODELS_PREFIXES = ("glm-4", "minimax-m2")
+REASONING_PRESERVATION_MODELS_PREFIXES = ("glm-4", "glm-5", "minimax-m2", "tstars")
```
To make this list more robust, consider using broader prefixes where appropriate. For example, "glm-" would cover both glm-4 and glm-5 (and any future versions), while "minimax-" would cover all current and future minimax models.
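For illustration, a minimal sketch of the broader-prefix variant; only the tuple name comes from this diff, and the helper function is hypothetical:

```python
# Hypothetical illustration -- only REASONING_PRESERVATION_MODELS_PREFIXES is a real name here.
REASONING_PRESERVATION_MODELS_PREFIXES = ("glm-", "minimax-", "tstars")

def preserves_reasoning(model_id: str) -> bool:
    """Return True if the model's reasoning_content should be kept in multi-turn history."""
    return model_id.startswith(REASONING_PRESERVATION_MODELS_PREFIXES)

assert preserves_reasoning("glm-5")          # covered by "glm-"
assert preserves_reasoning("minimax-m2.5")   # covered by "minimax-"
assert not preserves_reasoning("qwen3-32b")  # still excluded unless a "qwen" prefix is added
```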
Pull request overview
Adds support for additional iFlow model IDs and updates model-specific “thinking/reasoning” handling, with substantial additional refactoring in the iFlow provider around payload shaping and streaming error handling.
Changes:
- Register 5 new iFlow models and update model capability sets (thinking, GLM handling, reasoning_split, reasoning preservation prefixes).
- Expand iFlow provider request/stream handling (suffix-based thinking config parsing, image normalization + fallback logic, gzip handling, token usage estimation).
- Harden embedding cost calculation to handle missing usage/cost metadata.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| `src/rotator_library/providers/iflow_provider.py` | Adds new model IDs and significantly refactors request building + streaming handling (thinking config, image handling, gzip, token estimation). |
| `src/rotator_library/client/executor.py` | Makes embedding cost calculation resilient to `usage=None` and `input_cost_per_token` being `None` or `0.0` (see the sketch below). |
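As a rough illustration of the `executor.py` hardening described above; the function name and data shapes are assumptions, since the actual implementation isn't shown in this PR view:

```python
from typing import Any, Dict, Optional

def estimate_embedding_cost(
    usage: Optional[Dict[str, Any]],
    input_cost_per_token: Optional[float],
) -> float:
    """Illustrative sketch: fall back to 0.0 when usage or pricing metadata is missing."""
    if not usage or not input_cost_per_token:
        return 0.0
    prompt_tokens = usage.get("prompt_tokens") or 0
    return prompt_tokens * input_cost_per_token
```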
```python
# Ensure max_tokens is at least 1024 as recommended by iflow2api logic
if "max_tokens" in payload:
    payload["max_tokens"] = max(payload["max_tokens"], 1024)
else:
    # Default to 4096 if not specified
```
This forces max_tokens to be at least 1024 and defaults it to 4096 when the caller didn’t specify it. That changes the meaning of requests (e.g., callers intentionally limiting max_tokens below 1024) and can increase latency/cost unexpectedly. Consider only applying a default when max_tokens is missing, and avoid overriding user-provided values (or gate the minimum behind a config flag).
```diff
-# Ensure max_tokens is at least 1024 as recommended by iflow2api logic
-if "max_tokens" in payload:
-    payload["max_tokens"] = max(payload["max_tokens"], 1024)
-else:
-    # Default to 4096 if not specified
+# Default max_tokens if not specified; do not override user-provided values
+if "max_tokens" not in payload:
```
```python
    async def _generate_vision_summary(
        self,
        client: httpx.AsyncClient,
        api_key: str,
        api_base: str,
        body: Dict[str, Any],
        model: str,
    ) -> str:
        """
        Generate a text summary of images using a vision model.

        This is used for two-stage vision processing where GLM/MiniMax models
        don't support images directly, so we first use a vision model to
        describe the images, then pass that description to the text model.

        Args:
            client: HTTP client to use
            api_key: API key for iFlow
            api_base: Base URL for iFlow API
            body: Original request body with images
            model: Target model name (for context in prompt)

        Returns:
            Text summary of the images
        """
        vision_body = copy.deepcopy(body)
        vision_body["stream"] = False
        vision_body.pop("stream_options", None)
        vision_body.pop("tools", None)
        vision_body.pop("tool_choice", None)
        vision_body.pop("response_format", None)
        vision_body.pop("thinking", None)
        vision_body.pop("reasoning_effort", None)
        vision_body["max_tokens"] = 1024

        messages = vision_body.get("messages", [])
        if not isinstance(messages, list):
            return ""

        # Add instruction for vision model
        instruction = (
            f"You are an image analysis assistant. The current model is {model}, "
            "which does not process images. Please analyze the user-uploaded images "
            "and output a structured summary for the text model to continue reasoning. "
            "Must include: 1) Main subject and scene; 2) Text in the image; "
            "3) Key details relevant to the user's question. "
            "Do not say you cannot see the image. Be concise."
        )
        vision_body["messages"] = [
            {"role": "system", "content": instruction}
        ] + messages
        vision_body["model"] = FORCED_VISION_MODEL

        # Build headers
        headers = self._build_iflow_headers(api_key, stream=False)
        url = f"{api_base.rstrip('/')}/chat/completions"

        try:
            response = await client.post(
                url,
                headers=headers,
                json=vision_body,
                timeout=TimeoutConfig.non_streaming(),
            )
            response.raise_for_status()

            # Handle potential gzip response
            content = response.content
            if response.headers.get("content-encoding") == "gzip":
                content = gzip.decompress(content)

            result = json.loads(content)
            return extract_text_from_result(result)

        except (httpx.HTTPStatusError, httpx.RequestError, httpx.TimeoutException) as e:
            lib_logger.warning(f"[iFlow] Vision summary generation failed: {e}")
            return ""
        except (json.JSONDecodeError, gzip.BadGzipFile) as e:
            lib_logger.warning(f"[iFlow] Vision summary parsing failed: {e}")
            return ""
```
_generate_vision_summary() is currently unused (no call sites), and there’s also _build_two_stage_main_body() below. Keeping large unused async + HTTP logic in the provider increases maintenance burden and makes it harder to reason about the real behavior. Either wire these into the vision fallback flow (two-stage summary) or remove them until needed.
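If the two-stage flow were wired in, the call site could look roughly like the sketch below. It reuses `_generate_vision_summary` and `strip_images_from_messages` from this diff; the wrapper name and its exact signature are assumptions, not the PR's actual call path:

```python
async def _apply_two_stage_vision(self, client, api_key, api_base, body, model):
    """Sketch: summarize images with the vision model, then hand text-only messages to `model`."""
    summary = await self._generate_vision_summary(client, api_key, api_base, body, model)
    sanitized = strip_images_from_messages(body.get("messages", []))
    if summary:
        sanitized = [{
            "role": "system",
            "content": f"Image description provided by the vision model:\n{summary}",
        }] + sanitized
    return dict(body, messages=sanitized)
```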
```python
    async def _make_request_with_retry(
        self,
        client: httpx.AsyncClient,
        method: str,
        url: str,
        headers: Dict[str, str],
        json_body: Dict[str, Any],
        max_retries: int = MAX_RETRIES,
    ) -> httpx.Response:
        """
        Make an HTTP request with automatic retry for transient failures.

        Retry conditions:
        - 5xx server errors
        - 429 rate limits (with Retry-After header)
        - Network errors

        Does NOT retry:
        - 4xx client errors (except 429)
        - Authentication errors
        """
        if max_retries < 1:
            raise ValueError(f"max_retries must be at least 1, got {max_retries}")

        last_error: Optional[Exception] = None

        for attempt in range(max_retries):
```
_make_request_with_retry() is unused, and last_error is assigned but never read. If this retry helper isn’t going to be integrated, consider removing it to avoid dead code; otherwise, wire it into the request path and either use last_error (e.g., for a final raised exception with context) or drop it.
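A compact sketch of how `last_error` could be put to use if the helper is kept; the backoff and status handling here are illustrative, not the diff's exact logic:

```python
import asyncio
from typing import Any, Dict, Optional

import httpx

async def request_with_retry(client: httpx.AsyncClient, url: str,
                             json_body: Dict[str, Any], max_retries: int = 3) -> httpx.Response:
    """Retry transient failures; re-raise with context once attempts are exhausted."""
    last_error: Optional[Exception] = None
    for attempt in range(max_retries):
        try:
            response = await client.post(url, json=json_body)
            if response.status_code == 429 or response.status_code >= 500:
                response.raise_for_status()  # treat as retryable
            return response
        except (httpx.HTTPStatusError, httpx.RequestError) as exc:
            last_error = exc
            await asyncio.sleep(2 ** attempt)
    raise RuntimeError(f"request failed after {max_retries} attempts") from last_error
```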
```python
from litellm.exceptions import RateLimitError, AuthenticationError
from pathlib import Path

from ..core.errors import StreamedAPIError
import uuid
from datetime import datetime
```
There are unused imports after these changes (AuthenticationError and datetime don’t appear to be referenced in this module). Removing unused imports will avoid lint warnings and keep the provider easier to scan.
```diff
-from litellm.exceptions import RateLimitError, AuthenticationError
+from litellm.exceptions import RateLimitError
 from pathlib import Path

 from ..core.errors import StreamedAPIError
 import uuid
-from datetime import datetime
```
```python
        # If vision fallback, strip images from payload and add vision summary context
        if vision_fallback:
            messages = payload.get("messages", [])
            if messages:
                sanitized = strip_images_from_messages(messages)
                bridge_msg = {
                    "role": "system",
                    "content": (
                        "The original model does not support images. "
                        "Please continue with the text content only."
                    ),
                }
                payload["messages"] = [bridge_msg] + sanitized
```
The vision-fallback path switches effective_model to FORCED_VISION_MODEL but then strips images from payload["messages"]. This prevents the vision model from actually receiving the image inputs, so the fallback cannot succeed for image requests. Either keep the images when calling the forced vision model, or implement the intended two-stage flow (vision summary with images -> text model with images stripped + injected summary).
```diff
 # Parse suffix from model name for thinking config
 suffix_result = parse_suffix(model_name)
 base_model = suffix_result.model_name

 # Handle vision fallback
 effective_model = FORCED_VISION_MODEL if vision_fallback else base_model

 kwargs_with_model = {**kwargs, "model": effective_model}
 payload = self._build_request_payload(
-    model_name, kwargs, **kwargs_with_stripped_model
+    effective_model, kwargs, **kwargs_with_model
 )
```
Model-name suffix parsing for thinking config is effectively disabled here: parse_suffix(model_name) is used to compute base_model, and then effective_model is always set to base_model (unless vision fallback). That strips the suffix before _build_request_payload() / _apply_thinking_config() runs, so suffix-driven configs like glm-4.7(high) can never take effect. Pass the original model name (including suffix) into _build_request_payload() and let _apply_thinking_config() strip it, or delay suffix stripping until after config extraction.
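One way to keep the suffix visible to the thinking-config logic, sketched against the names in this diff (exact signatures and the split between the two methods are assumptions):

```python
# Sketch: pass the suffixed name through; strip it only after config extraction.
effective_model = FORCED_VISION_MODEL if vision_fallback else model_name  # keep "(high)" etc.
kwargs_with_model = {**kwargs, "model": effective_model}
payload = self._build_request_payload(effective_model, kwargs, **kwargs_with_model)

# ...later, inside _build_request_payload() / _apply_thinking_config():
suffix_result = parse_suffix(model_name)        # e.g. "glm-4.7(high)" -> base model + level
payload["model"] = suffix_result.model_name     # send the bare model id upstream
```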
```python
file_logger.log_error(error_msg)
raise httpx.HTTPStatusError(
    f"HTTP {response.status_code}: {error_text}",
    request=response,
```
This httpx.HTTPStatusError is constructed with request=response, but request should be an httpx.Request (usually response.request). Passing the Response object can break downstream error handling/formatting that expects a Request. Use request=response.request here (consistent with the earlier construction above).
```diff
-    request=response,
+    request=response.request,
```
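For reference, `httpx.HTTPStatusError` takes the message positionally and `request`/`response` as keyword arguments, so a minimal example of the corrected construction looks like this (the wrapper function is only for illustration):

```python
import httpx

def raise_http_error(response: httpx.Response, error_text: str) -> None:
    """Raise an HTTPStatusError built from the Response and its originating Request."""
    raise httpx.HTTPStatusError(
        f"HTTP {response.status_code}: {error_text}",
        request=response.request,
        response=response,
    )
```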
```python
        effort = kwargs.get(IFLOW_REASONING_EFFORT)
        if not effort:
            return None

        effort_lower = str(effort).strip().lower()

        if effort_lower == "none":
            return ThinkingConfig(mode=ThinkingMode.NONE, budget=0)

        try:
            level = ThinkingLevel(effort_lower)
            return ThinkingConfig(mode=ThinkingMode.LEVEL, level=level)
        except ValueError:
            return None
```
_extract_openai_config_from_kwargs() treats falsy values as "no config" (if not effort: return None). This means reasoning_effort=0 (int) or strings like "0"/"false"/"disabled" won’t explicitly disable thinking anymore. If you want backward-compatible disable semantics (as the previous logic supported), handle these values explicitly instead of short-circuiting on truthiness.
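A sketch of backward-compatible disable handling, reusing the names from this diff (`IFLOW_REASONING_EFFORT`, `ThinkingConfig`, `ThinkingMode`, `ThinkingLevel`); the exact set of disable tokens is an assumption:

```python
_DISABLE_TOKENS = {"0", "none", "false", "disabled", "off"}  # assumed disable values

def _extract_openai_config_from_kwargs(kwargs):
    effort = kwargs.get(IFLOW_REASONING_EFFORT)
    if effort is None:
        return None                                   # truly unset: no explicit config
    effort_lower = str(effort).strip().lower()        # also maps int 0 -> "0"
    if effort_lower in _DISABLE_TOKENS:
        return ThinkingConfig(mode=ThinkingMode.NONE, budget=0)
    try:
        return ThinkingConfig(mode=ThinkingMode.LEVEL, level=ThinkingLevel(effort_lower))
    except ValueError:
        return None
```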
9817625 to 87b894b
Summary
Add support for 5 new iFlow models to maintain feature parity: glm-5, minimax-m2.5, qwen3-32b, tstars2.0, and iflow-rome-30ba3b. Each model is registered with appropriate thinking, reasoning, and preservation configurations.
Changes
- Added `glm-5`, `minimax-m2.5`, `qwen3-32b`, `tstars2.0`, and `iflow-rome-30ba3b` to `HARDCODED_MODELS`
- Added `glm-5` to the `ENABLE_THINKING_MODELS` and `GLM_MODELS` sets
- Added `qwen3-32b` to the `ENABLE_THINKING_MODELS` set
- Added `minimax-m2.5` to the `REASONING_SPLIT_MODELS` set
- Updated `REASONING_PRESERVATION_MODELS_PREFIXES` with the `glm-5` and `tstars` prefixes
- Sorted the `HARDCODED_MODELS` list alphabetically by model family for consistency

Files Changed
- `src/rotator_library/providers/iflow_provider.py`

Testing Instructions
- `glm-5` — should send `enable_thinking` + `clear_thinking=false` (GLM-style)
- `minimax-m2.5` — should use `reasoning_split` instead of `enable_thinking`
- `qwen3-32b` — should send `enable_thinking`
- `tstars2.0` and `iflow-rome-30ba3b` — standard chat completions

A quick smoke-test sketch follows the screenshot below.

Tested right now:

Closes #129
Important
Add support for new iFlow models `glm-5`, `minimax-m2.5`, `qwen3-32b`, `tstars2.0`, and `iflow-rome-30ba3b` with appropriate configurations in `iflow_provider.py`.

- Added `glm-5`, `minimax-m2.5`, `qwen3-32b`, `tstars2.0`, and `iflow-rome-30ba3b` to `HARDCODED_MODELS` in `iflow_provider.py`.
- Added `glm-5` to `ENABLE_THINKING_MODELS` and `GLM_MODELS`.
- Added `qwen3-32b` to `ENABLE_THINKING_MODELS`.
- Added `minimax-m2.5` to `REASONING_SPLIT_MODELS`.
- Updated `REASONING_PRESERVATION_MODELS_PREFIXES` with the `glm-5` and `tstars` prefixes.
- Sorted `HARDCODED_MODELS` alphabetically by model family.

This description was created automatically for 87b894b and will update as commits are pushed.