feat(iflow): add support for glm-5, minimax-m2.5, qwen3-32b, tstars2.0, iflow-rome-30ba3b #130
MasuRii wants to merge 1 commit into Mirrowel:dev from
Conversation
…0, iflow-rome-30ba3b

Add five new models to the iFlow provider to maintain feature parity with the competitor repository (router-for-me/CLIProxyAPI):

- glm-5: added to HARDCODED_MODELS, ENABLE_THINKING_MODELS, and GLM_MODELS (thinking support with GLM-style clear_thinking handling)
- minimax-m2.5: added to HARDCODED_MODELS and REASONING_SPLIT_MODELS (thinking support via reasoning_split boolean)
- qwen3-32b: added to HARDCODED_MODELS and ENABLE_THINKING_MODELS (thinking support)
- tstars2.0: added to HARDCODED_MODELS (multimodal assistant)
- iflow-rome-30ba3b: added to HARDCODED_MODELS (iFlow Rome model)

Also update REASONING_PRESERVATION_MODELS_PREFIXES to include the "glm-5" and "tstars" prefixes, and alphabetically sort the HARDCODED_MODELS list for maintainability.

Closes: Mirrowel#129
Starting my review of the iFlow provider updates. I'll be checking the configuration for the 5 new models (glm-5, minimax-m2.5, qwen3-32b, tstars2.0, and iflow-rome-30ba3b) to ensure they're correctly registered and integrated. 🔍
Overall Assessment
Nice work on expanding the iFlow model support! Adding these 5 new models (glm-5, minimax-m2.5, qwen3-32b, tstars2.0, and iflow-rome-30ba3b) with their specific thinking and reasoning configurations ensures the provider stays up-to-date with the latest offerings. The internal grouping of models by family in HARDCODED_MODELS is also a great improvement for maintainability.
Architectural Feedback
The approach of using specific sets for different thinking/reasoning behaviors (ENABLE_THINKING_MODELS, GLM_MODELS, REASONING_SPLIT_MODELS) is clean and matches the established patterns in the codebase.
Key Suggestions
- Reasoning Preservation: I've noted that `qwen3-32b` (and some existing models like `deepseek-v3.2`) are missing from the `REASONING_PRESERVATION_MODELS_PREFIXES` list. If these models return `reasoning_content`, adding them to this list is necessary for coherent multi-turn conversations.
- Prefix Simplification: Suggesting a move to more generic prefixes (e.g., `glm-` instead of version-specific ones) to reduce the need for future updates.
Nitpicks
- Alphabetical Sorting: The `deepseek` family is currently slightly out of place in the family-sorted list.
Questions for the Author
- Does `iflow-rome-30ba3b` also support any form of thinking or reasoning content, or is it strictly a standard completion model?
- Are there specific reasons why `deepseek` and `qwen` models are excluded from the reasoning preservation cache?
This review was generated by an AI assistant.
| "qwen3-max-preview", | ||
| "qwen3-235b-a22b-thinking-2507", | ||
| "qwen3-vl-plus", | ||
| "deepseek-v3.2-reasoner", |
The deepseek family is currently placed after qwen. If the intention is to sort the list alphabetically by model family (as mentioned in the PR summary), the deepseek group (starting with 'D') should be moved to the top, before glm ('G').
| "glm-4.7", | ||
| "glm-5", | ||
| "qwen3-max-preview", | ||
| "qwen3-32b", |
Since qwen3-32b is configured as a thinking model, it will likely return reasoning_content. You might want to add a "qwen" prefix to REASONING_PRESERVATION_MODELS_PREFIXES (line 106) so that its reasoning is correctly cached and preserved in multi-turn conversations. The same applies to deepseek-v3.2 and other thinking-enabled models that aren't currently in that list.
```diff
 # Models that benefit from reasoning_content preservation in message history
 # (for multi-turn conversations)
-REASONING_PRESERVATION_MODELS_PREFIXES = ("glm-4", "minimax-m2")
+REASONING_PRESERVATION_MODELS_PREFIXES = ("glm-4", "glm-5", "minimax-m2", "tstars")
```
To make this list more robust, consider using broader prefixes where appropriate. For example, "glm-" would cover both glm-4 and glm-5 (and any future versions), while "minimax-" would cover all current and future minimax models.
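For illustration, a minimal sketch of the broader-prefix variant; only the tuple name comes from this diff, and the helper function is hypothetical:

```python
# Hypothetical illustration -- only REASONING_PRESERVATION_MODELS_PREFIXES is a real name here.
REASONING_PRESERVATION_MODELS_PREFIXES = ("glm-", "minimax-", "tstars")

def preserves_reasoning(model_id: str) -> bool:
    """Return True if the model's reasoning_content should be kept in multi-turn history."""
    return model_id.startswith(REASONING_PRESERVATION_MODELS_PREFIXES)

assert preserves_reasoning("glm-5")          # covered by "glm-"
assert preserves_reasoning("minimax-m2.5")   # covered by "minimax-"
assert not preserves_reasoning("qwen3-32b")  # still excluded unless a "qwen" prefix is added
```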
Pull request overview
Adds support for additional iFlow model IDs and updates model-specific “thinking/reasoning” handling, with substantial additional refactoring in the iFlow provider around payload shaping and streaming error handling.
Changes:
- Register 5 new iFlow models and update model capability sets (thinking, GLM handling, reasoning_split, reasoning preservation prefixes).
- Expand iFlow provider request/stream handling (suffix-based thinking config parsing, image normalization + fallback logic, gzip handling, token usage estimation).
- Harden embedding cost calculation to handle missing usage/cost metadata.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| `src/rotator_library/providers/iflow_provider.py` | Adds new model IDs and significantly refactors request building + streaming handling (thinking config, image handling, gzip, token estimation). |
| `src/rotator_library/client/executor.py` | Makes embedding cost calculation resilient to `usage=None` and `input_cost_per_token` being `None` or `0.0` (see the sketch below). |
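As a rough illustration of the `executor.py` hardening described above; the function name and data shapes are assumptions, since the actual implementation isn't shown in this PR view:

```python
from typing import Any, Dict, Optional

def estimate_embedding_cost(
    usage: Optional[Dict[str, Any]],
    input_cost_per_token: Optional[float],
) -> float:
    """Illustrative sketch: fall back to 0.0 when usage or pricing metadata is missing."""
    if not usage or not input_cost_per_token:
        return 0.0
    prompt_tokens = usage.get("prompt_tokens") or 0
    return prompt_tokens * input_cost_per_token
```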
```python
# Ensure max_tokens is at least 1024 as recommended by iflow2api logic
if "max_tokens" in payload:
    payload["max_tokens"] = max(payload["max_tokens"], 1024)
else:
    # Default to 4096 if not specified
```
This forces max_tokens to be at least 1024 and defaults it to 4096 when the caller didn’t specify it. That changes the meaning of requests (e.g., callers intentionally limiting max_tokens below 1024) and can increase latency/cost unexpectedly. Consider only applying a default when max_tokens is missing, and avoid overriding user-provided values (or gate the minimum behind a config flag).
```diff
-# Ensure max_tokens is at least 1024 as recommended by iflow2api logic
-if "max_tokens" in payload:
-    payload["max_tokens"] = max(payload["max_tokens"], 1024)
-else:
-    # Default to 4096 if not specified
+# Default max_tokens if not specified; do not override user-provided values
+if "max_tokens" not in payload:
```
```python
    async def _generate_vision_summary(
        self,
        client: httpx.AsyncClient,
        api_key: str,
        api_base: str,
        body: Dict[str, Any],
        model: str,
    ) -> str:
        """
        Generate a text summary of images using a vision model.

        This is used for two-stage vision processing where GLM/MiniMax models
        don't support images directly, so we first use a vision model to
        describe the images, then pass that description to the text model.

        Args:
            client: HTTP client to use
            api_key: API key for iFlow
            api_base: Base URL for iFlow API
            body: Original request body with images
            model: Target model name (for context in prompt)

        Returns:
            Text summary of the images
        """
        vision_body = copy.deepcopy(body)
        vision_body["stream"] = False
        vision_body.pop("stream_options", None)
        vision_body.pop("tools", None)
        vision_body.pop("tool_choice", None)
        vision_body.pop("response_format", None)
        vision_body.pop("thinking", None)
        vision_body.pop("reasoning_effort", None)
        vision_body["max_tokens"] = 1024

        messages = vision_body.get("messages", [])
        if not isinstance(messages, list):
            return ""

        # Add instruction for vision model
        instruction = (
            f"You are an image analysis assistant. The current model is {model}, "
            "which does not process images. Please analyze the user-uploaded images "
            "and output a structured summary for the text model to continue reasoning. "
            "Must include: 1) Main subject and scene; 2) Text in the image; "
            "3) Key details relevant to the user's question. "
            "Do not say you cannot see the image. Be concise."
        )
        vision_body["messages"] = [
            {"role": "system", "content": instruction}
        ] + messages
        vision_body["model"] = FORCED_VISION_MODEL

        # Build headers
        headers = self._build_iflow_headers(api_key, stream=False)
        url = f"{api_base.rstrip('/')}/chat/completions"

        try:
            response = await client.post(
                url,
                headers=headers,
                json=vision_body,
                timeout=TimeoutConfig.non_streaming(),
            )
            response.raise_for_status()

            # Handle potential gzip response
            content = response.content
            if response.headers.get("content-encoding") == "gzip":
                content = gzip.decompress(content)

            result = json.loads(content)
            return extract_text_from_result(result)

        except (httpx.HTTPStatusError, httpx.RequestError, httpx.TimeoutException) as e:
            lib_logger.warning(f"[iFlow] Vision summary generation failed: {e}")
            return ""
        except (json.JSONDecodeError, gzip.BadGzipFile) as e:
            lib_logger.warning(f"[iFlow] Vision summary parsing failed: {e}")
            return ""
```
_generate_vision_summary() is currently unused (no call sites), and there’s also _build_two_stage_main_body() below. Keeping large unused async + HTTP logic in the provider increases maintenance burden and makes it harder to reason about the real behavior. Either wire these into the vision fallback flow (two-stage summary) or remove them until needed.
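If the two-stage flow were wired in, the call site could look roughly like the sketch below. It reuses `_generate_vision_summary` and `strip_images_from_messages` from this diff; the wrapper name and its exact signature are assumptions, not the PR's actual call path:

```python
async def _apply_two_stage_vision(self, client, api_key, api_base, body, model):
    """Sketch: summarize images with the vision model, then hand text-only messages to `model`."""
    summary = await self._generate_vision_summary(client, api_key, api_base, body, model)
    sanitized = strip_images_from_messages(body.get("messages", []))
    if summary:
        sanitized = [{
            "role": "system",
            "content": f"Image description provided by the vision model:\n{summary}",
        }] + sanitized
    return dict(body, messages=sanitized)
```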
```python
    async def _make_request_with_retry(
        self,
        client: httpx.AsyncClient,
        method: str,
        url: str,
        headers: Dict[str, str],
        json_body: Dict[str, Any],
        max_retries: int = MAX_RETRIES,
    ) -> httpx.Response:
        """
        Make an HTTP request with automatic retry for transient failures.

        Retry conditions:
        - 5xx server errors
        - 429 rate limits (with Retry-After header)
        - Network errors

        Does NOT retry:
        - 4xx client errors (except 429)
        - Authentication errors
        """
        if max_retries < 1:
            raise ValueError(f"max_retries must be at least 1, got {max_retries}")

        last_error: Optional[Exception] = None

        for attempt in range(max_retries):
```
_make_request_with_retry() is unused, and last_error is assigned but never read. If this retry helper isn’t going to be integrated, consider removing it to avoid dead code; otherwise, wire it into the request path and either use last_error (e.g., for a final raised exception with context) or drop it.
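A compact sketch of how `last_error` could be put to use if the helper is kept; the backoff and status handling here are illustrative, not the diff's exact logic:

```python
import asyncio
from typing import Any, Dict, Optional

import httpx

async def request_with_retry(client: httpx.AsyncClient, url: str,
                             json_body: Dict[str, Any], max_retries: int = 3) -> httpx.Response:
    """Retry transient failures; re-raise with context once attempts are exhausted."""
    last_error: Optional[Exception] = None
    for attempt in range(max_retries):
        try:
            response = await client.post(url, json=json_body)
            if response.status_code == 429 or response.status_code >= 500:
                response.raise_for_status()  # treat as retryable
            return response
        except (httpx.HTTPStatusError, httpx.RequestError) as exc:
            last_error = exc
            await asyncio.sleep(2 ** attempt)
    raise RuntimeError(f"request failed after {max_retries} attempts") from last_error
```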
```python
from litellm.exceptions import RateLimitError, AuthenticationError
from pathlib import Path

from ..core.errors import StreamedAPIError
import uuid
from datetime import datetime
```
There are unused imports after these changes (AuthenticationError and datetime don’t appear to be referenced in this module). Removing unused imports will avoid lint warnings and keep the provider easier to scan.
```diff
-from litellm.exceptions import RateLimitError, AuthenticationError
+from litellm.exceptions import RateLimitError
 from pathlib import Path

 from ..core.errors import StreamedAPIError
 import uuid
-from datetime import datetime
```
```python
        # If vision fallback, strip images from payload and add vision summary context
        if vision_fallback:
            messages = payload.get("messages", [])
            if messages:
                sanitized = strip_images_from_messages(messages)
                bridge_msg = {
                    "role": "system",
                    "content": (
                        "The original model does not support images. "
                        "Please continue with the text content only."
                    ),
                }
                payload["messages"] = [bridge_msg] + sanitized
```
The vision-fallback path switches effective_model to FORCED_VISION_MODEL but then strips images from payload["messages"]. This prevents the vision model from actually receiving the image inputs, so the fallback cannot succeed for image requests. Either keep the images when calling the forced vision model, or implement the intended two-stage flow (vision summary with images -> text model with images stripped + injected summary).
```diff
 # Parse suffix from model name for thinking config
 suffix_result = parse_suffix(model_name)
 base_model = suffix_result.model_name

 # Handle vision fallback
 effective_model = FORCED_VISION_MODEL if vision_fallback else base_model

 kwargs_with_model = {**kwargs, "model": effective_model}
 payload = self._build_request_payload(
-    model_name, kwargs, **kwargs_with_stripped_model
+    effective_model, kwargs, **kwargs_with_model
 )
```
Model-name suffix parsing for thinking config is effectively disabled here: parse_suffix(model_name) is used to compute base_model, and then effective_model is always set to base_model (unless vision fallback). That strips the suffix before _build_request_payload() / _apply_thinking_config() runs, so suffix-driven configs like glm-4.7(high) can never take effect. Pass the original model name (including suffix) into _build_request_payload() and let _apply_thinking_config() strip it, or delay suffix stripping until after config extraction.
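One way to keep the suffix visible to the thinking-config logic, sketched against the names in this diff (exact signatures and the split between the two methods are assumptions):

```python
# Sketch: pass the suffixed name through; strip it only after config extraction.
effective_model = FORCED_VISION_MODEL if vision_fallback else model_name  # keep "(high)" etc.
kwargs_with_model = {**kwargs, "model": effective_model}
payload = self._build_request_payload(effective_model, kwargs, **kwargs_with_model)

# ...later, inside _build_request_payload() / _apply_thinking_config():
suffix_result = parse_suffix(model_name)        # e.g. "glm-4.7(high)" -> base model + level
payload["model"] = suffix_result.model_name     # send the bare model id upstream
```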
```python
file_logger.log_error(error_msg)
raise httpx.HTTPStatusError(
    f"HTTP {response.status_code}: {error_text}",
    request=response,
```
This httpx.HTTPStatusError is constructed with request=response, but request should be an httpx.Request (usually response.request). Passing the Response object can break downstream error handling/formatting that expects a Request. Use request=response.request here (consistent with the earlier construction above).
```diff
-    request=response,
+    request=response.request,
```
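For reference, `httpx.HTTPStatusError` takes the message positionally and `request`/`response` as keyword arguments, so a minimal example of the corrected construction looks like this (the wrapper function is only for illustration):

```python
import httpx

def raise_http_error(response: httpx.Response, error_text: str) -> None:
    """Raise an HTTPStatusError built from the Response and its originating Request."""
    raise httpx.HTTPStatusError(
        f"HTTP {response.status_code}: {error_text}",
        request=response.request,
        response=response,
    )
```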
```python
        effort = kwargs.get(IFLOW_REASONING_EFFORT)
        if not effort:
            return None

        effort_lower = str(effort).strip().lower()

        if effort_lower == "none":
            return ThinkingConfig(mode=ThinkingMode.NONE, budget=0)

        try:
            level = ThinkingLevel(effort_lower)
            return ThinkingConfig(mode=ThinkingMode.LEVEL, level=level)
        except ValueError:
            return None
```
_extract_openai_config_from_kwargs() treats falsy values as "no config" (if not effort: return None). This means reasoning_effort=0 (int) or strings like "0"/"false"/"disabled" won’t explicitly disable thinking anymore. If you want backward-compatible disable semantics (as the previous logic supported), handle these values explicitly instead of short-circuiting on truthiness.
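A sketch of backward-compatible disable handling, reusing the names from this diff (`IFLOW_REASONING_EFFORT`, `ThinkingConfig`, `ThinkingMode`, `ThinkingLevel`); the exact set of disable tokens is an assumption:

```python
_DISABLE_TOKENS = {"0", "none", "false", "disabled", "off"}  # assumed disable values

def _extract_openai_config_from_kwargs(kwargs):
    effort = kwargs.get(IFLOW_REASONING_EFFORT)
    if effort is None:
        return None                                   # truly unset: no explicit config
    effort_lower = str(effort).strip().lower()        # also maps int 0 -> "0"
    if effort_lower in _DISABLE_TOKENS:
        return ThinkingConfig(mode=ThinkingMode.NONE, budget=0)
    try:
        return ThinkingConfig(mode=ThinkingMode.LEVEL, level=ThinkingLevel(effort_lower))
    except ValueError:
        return None
```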
9817625 to 87b894b
Summary
Add support for 5 new iFlow models to maintain feature parity: glm-5, minimax-m2.5, qwen3-32b, tstars2.0, and iflow-rome-30ba3b. Each model is registered with appropriate thinking, reasoning, and preservation configurations.
Changes
- Added `glm-5`, `minimax-m2.5`, `qwen3-32b`, `tstars2.0`, and `iflow-rome-30ba3b` to `HARDCODED_MODELS`
- Added `glm-5` to the `ENABLE_THINKING_MODELS` and `GLM_MODELS` sets
- Added `qwen3-32b` to the `ENABLE_THINKING_MODELS` set
- Added `minimax-m2.5` to the `REASONING_SPLIT_MODELS` set
- Updated `REASONING_PRESERVATION_MODELS_PREFIXES` with the `glm-5` and `tstars` prefixes
- Sorted the `HARDCODED_MODELS` list alphabetically by model family for consistency

Files Changed
- `src/rotator_library/providers/iflow_provider.py`

Testing Instructions
- `glm-5` — should send `enable_thinking` + `clear_thinking=false` (GLM-style)
- `minimax-m2.5` — should use `reasoning_split` instead of `enable_thinking`
- `qwen3-32b` — should send `enable_thinking`
- `tstars2.0` and `iflow-rome-30ba3b` — standard chat completions

A quick smoke-test sketch follows the screenshot below.

Tested right now:

Closes #129
Important
Add support for new iFlow models `glm-5`, `minimax-m2.5`, `qwen3-32b`, `tstars2.0`, and `iflow-rome-30ba3b` with appropriate configurations in `iflow_provider.py`.

- Added `glm-5`, `minimax-m2.5`, `qwen3-32b`, `tstars2.0`, and `iflow-rome-30ba3b` to `HARDCODED_MODELS` in `iflow_provider.py`.
- Added `glm-5` to `ENABLE_THINKING_MODELS` and `GLM_MODELS`.
- Added `qwen3-32b` to `ENABLE_THINKING_MODELS`.
- Added `minimax-m2.5` to `REASONING_SPLIT_MODELS`.
- Updated `REASONING_PRESERVATION_MODELS_PREFIXES` with the `glm-5` and `tstars` prefixes.
- Sorted `HARDCODED_MODELS` alphabetically by model family.

This description was created automatically for 87b894b and will update as commits are pushed.