Gemini 3 Pro support and cross-model conversation compatibility #2158
base: main
Conversation
- Bump litellm dependency to >= 1.80.7 for Gemini thought signatures support
- Add Gemini 3 Pro thought_signature support for function calling
- Handle both LiteLLM provider_specific_fields and Gemini extra_content formats
- Clean up __thought__ suffix on tool call ids for Gemini models
- Attach provider_data to all non-Responses output items
- Store model, response_id and provider specific metadata
- Store Gemini thought_signature on function call items
- Use provider_data.model to decide what data is safe to send per provider
- Keep handoff transcripts stable by hiding provider_data in history output
Thanks for sending this PR! Overall, the design is clean and the code looks good to go. If anyone could try this branch out and share early feedback before releasing it, it would be greatly appreciated.

I am currently working on the 0.6.4 release. This one can be included in 0.7.0 or later, so please wait a moment!
markmcd left a comment
Hi 👋 - I'm from the Gemini team, just took a quick pass over the code to see how it works with our implementation of thought re-circulation and everything here LGTM. One FYI comment on parallel tool calls but no action is required.
continue

# Default to skip validator, overridden if valid thought signature exists
tool_call["provider_specific_fields"] = {
FYI - in the context of parallel tool-calls, this adds the dummy signature to every tool call returned. In the docs, we specify that a dummy signature is to be provided on the first tool call, however it is safe to apply on all of them so no need to change anything.
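For illustration, a rough sketch of the pattern under discussion, with a hypothetical `attach_thought_signatures` helper and a placeholder dummy value (the PR's actual code and the documented dummy signature value may differ):

```python
# Illustrative only: when sending a multi-tool-call turn back to Gemini, attach
# a signature to each tool call. A dummy value is the default and is replaced
# whenever a real thought_signature was captured for that call; per the comment
# above, applying the dummy to every parallel tool call is safe.
DUMMY_THOUGHT_SIGNATURE = "..."  # placeholder; see the Gemini docs for the documented value

def attach_thought_signatures(
    tool_calls: list[dict], signatures_by_call_id: dict[str, str]
) -> None:
    for tool_call in tool_calls:
        # Default to skip validator, overridden if a valid thought signature exists
        tool_call["provider_specific_fields"] = {
            "thought_signature": signatures_by_call_id.get(
                tool_call["id"], DUMMY_THOUGHT_SIGNATURE
            )
        }
```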
)
return cast(Union[Response, AsyncStream[ResponseStreamEvent]], response)

def _remove_openai_responses_api_incompatible_fields(self, list_input: list[Any]) -> list[Any]:
@ihower Mixing OpenAI Responses API items and non-OpenAI Chat Completions items in the same conversation history is not currently supported. We recommend using only Chat Completions for this use case, and in practice, most users do not mix providers.
My question is: does this workaround actually work in real-world usage? In particular, the lack of valid item IDs seems problematic. Beyond that, there are likely many unexpected patterns, so I am not convinced we should simply accept items after removing id and provider_data. Given that, I think it may make more sense to reject these items (either via Responses API rejection or input validation on the SDK side) rather than trying to support this kind of workaround. Thoughts?
I think supporting this is very useful in practice.
- Why this matters
Some real apps let users switch models inside one conversation. For example, Perplexity has a "pick a model" UX. In this SDK, handoffs also often mean switching to a different model. There have already been several issues asking to switch to non-OpenAI models (when `nest_handoff_history=False`):
- Gemini does not work with agents sdk #226
- invalid_request_error when using "chat_completions" with triage agent (gemini -> any other model) #237
- Handoff does not work with Claude 3.7 Sonnet #270
- Handoff fails with Invalid 'input[1].id': '__fake_id__' when using non-OpenAI models and OpenAI tracing #1485
This PR can address these needs together.
Also, frameworks like LangChain and PydanticAI treat cross-model history as a standard feature by defining an internal normalized message format.
- Does it work in real usage?
Yes, it works for the common case. I included live test code in the PR description, and I tested mixed histories across Gemini, Claude, and OpenAI. Both streaming and non-streaming paths work.
I agree this may not support all patterns, especially when mixing provider-specific hosted tools. But for the typical case with local function calls, it should behave well.
- Why removing `__fake_id__` and `provider_data` is enough
Because we do not define an internal standard format ahead of time (unlike LangChain or PydanticAI), the most compatible approach is: treat OpenAI Responses API items as the baseline, and only add extra fields when we pass through non-OpenAI providers.
- About `id`: `ResponseInputItemParam` does not require `id` as an input field. The only time we have a real, valid item id is when the item originally came from the Responses API. The SDK-injected `__fake_id__` is just a placeholder, so removing it before calling the Responses API is safe and makes the payload valid.
- About `provider_data`: `provider_data` exists only to store provider-specific data (i.e. from non-OpenAI Responses API providers). The Responses API does not understand it, so stripping it restores compatibility. If a user never uses non-Responses providers, items never get `provider_data` in the first place, so nothing changes.
So I'm aiming to support cross-model conversation compatibility while keeping changes to the raw item format minimal, to avoid impacting developers who only use the Responses API: keep the baseline format unchanged for Responses API-only users, and only do compatibility cleanup for histories that passed through non-OpenAI providers.
Thanks for the response. Your changes here should be okay both for many of the relatively simple cross-provider use cases and for OpenAI Responses API-only use cases. That said, this project is not committed to fully supporting cross-provider use cases. Having this extra logic is okay, but I'd like an additional code comment clarifying that this logic does not guarantee that every use case across various providers works without issues. Also, as a maintainer, I will continue encouraging users not to mix providers. I understand other frameworks may support this nicely, but our Responses API support won't pursue the same. For those use cases, we generally recommend using chat completions + litellm.
There is no rush at all, but I would like some clarification of the expectation for cross-provider support at the code level. Once it's added, we can merge the changes here.
Resolves:
Summary
This PR does two main things:
1. Adds Gemini 3 Pro `thought_signatures` support in function calling.
2. Enables cross-model conversations.

The goal is to make different providers interoperable: allowing them to safely share the same `to_input_list()` items, while each provider only receives the metadata it understands.

Examples
Besides unit tests, I performed live tests for all the following scenarios:
- LiteLLM + Gemini
- Gemini ChatCompletions (OpenAI-compatible endpoint)
- Cross-model conversations (same raw items handled by different models)
- Handoffs (with `nest_handoff_history` disabled)

1. Gemini 3 Pro function calling (`thought_signatures`)
Gemini 3 Pro now requires a `thought_signature` attached to function calls in the same turn.
Docs: https://ai.google.dev/gemini-api/docs/thought-signatures
This PR supports both integration paths (LiteLLM and the Gemini OpenAI-compatible ChatCompletions endpoint), in both non-streaming and streaming modes.
The conversation flow is: LiteLLM ↔ ChatCompletions ↔ our raw items.
LiteLLM layer
LiteLLM places Gemini's `thought_signature` inside `provider_specific_fields`. This PR handles the conversion between:
- LiteLLM's `provider_specific_fields["thought_signature"]`, and
- the Google ChatCompletions format `extra_content={"google": {"thought_signature": ...}}`.

ChatCompletions layer
This PR handles the conversion between:
- the Google ChatCompletions format `extra_content={"google": {"thought_signature": ...}}`, and
- our raw item's new internal field `provider_data["thought_signature"]`.

Cleaning up LiteLLM's `__thought__` suffix
LiteLLM adds a `__thought__` suffix to Gemini tool call ids (see BerriAI/litellm#16895). This suffix is not needed since we have `thought_signature`, and it causes `call_id` validation problems when the items are passed to other models. Therefore, this PR removes it.
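To make the mapping between these layers concrete, here is a rough sketch of the same Gemini tool call as each layer represents it (the values and surrounding message structure are illustrative):

```python
# LiteLLM layer: the signature arrives in provider_specific_fields, and the
# tool call id carries LiteLLM's __thought__ suffix.
litellm_tool_call = {
    "id": "call_123__thought__",
    "type": "function",
    "function": {"name": "get_weather", "arguments": '{"city": "Taipei"}'},
    "provider_specific_fields": {"thought_signature": "Cs4BAc..."},
}

# Google ChatCompletions (OpenAI-compatible endpoint) layer: the signature is
# nested under extra_content={"google": {...}} and the id has no suffix.
google_tool_call = {
    "id": "call_123",
    "type": "function",
    "function": {"name": "get_weather", "arguments": '{"city": "Taipei"}'},
    "extra_content": {"google": {"thought_signature": "Cs4BAc..."}},
}
```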
2. Enables cross-model conversations
To support cross-model conversations, this PR introduces a new `provider_data` field on raw response items. This field holds metadata that is not compatible with the OpenAI Responses API, allowing us to keep provider-specific details (model, response_id, thought signatures) attached to the items and to decide per provider what is safe to send.

For non-OpenAI Responses API models, we now store this kind of metadata on the raw item, roughly as sketched below.
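A rough sketch of the resulting shape, assuming a function call item; everything outside `provider_data` is the normal Responses API function call item, and the keys inside `provider_data` follow the change summary above (model, response_id, thought_signature):

```python
raw_function_call_item = {
    "type": "function_call",
    "call_id": "call_123",  # LiteLLM's __thought__ suffix already stripped
    "name": "get_weather",
    "arguments": '{"city": "Taipei"}',
    "provider_data": {
        "model": "gemini-3-pro-preview",   # which model produced this item
        "response_id": "resp_abc123",      # provider-specific response id
        "thought_signature": "Cs4BAc...",  # Gemini 3 Pro signature for this call
    },
}
```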
This design is similar to PydanticAI, which uses a comparable structure. The difference: PydanticAI stores metadata for all models, whereas this PR stores `provider_data` only for non-OpenAI providers.

With `provider_data` and the model name passed into the converters, agents can now safely switch models while reusing the same raw items from `to_input_list()`. It also works with handoffs when `nest_handoff_history=False`.
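For reference, a minimal usage sketch of the kind of model switch this enables; the Gemini model identifier and the prompts are illustrative:

```python
import asyncio

from agents import Agent, Runner
from agents.extensions.models.litellm_model import LitellmModel

async def main() -> None:
    # Turn 1: run with Gemini via LiteLLM.
    gemini_agent = Agent(
        name="Assistant",
        model=LitellmModel(model="gemini/gemini-3-pro-preview"),  # illustrative model id
    )
    first = await Runner.run(gemini_agent, "What's the weather in Taipei?")

    # Turn 2: reuse the same raw items with an OpenAI Responses API model.
    openai_agent = Agent(name="Assistant", model="gpt-4.1")
    history = first.to_input_list() + [{"role": "user", "content": "And in Tokyo?"}]
    second = await Runner.run(openai_agent, history)
    print(second.final_output)

if __name__ == "__main__":
    asyncio.run(main())
```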
Implementation Details

Because items in a conversation can come from different providers, and each provider has different requirements, this PR passes the target model name into several conversion helpers:
- `Converter.items_to_messages(..., model=...)`
- `LitellmConverter.convert_message_to_openai(..., model=...)`
- `ChatCmplStreamHandler.handle_stream(..., model=...)`
- `Converter.message_to_output_items(..., provider_data=...)`

This lets us branch on behavior for different providers in a controlled way and avoid regressions by handling provider-specific cases. This is especially important for reasoning models, where each provider handles encrypted tokens differently.
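As a simplified illustration of that branching (a hypothetical helper, not the PR's actual converter code), the target model name lets a converter decide whether a stored Gemini signature should be re-emitted:

```python
def function_call_item_to_tool_call(item: dict, model: str) -> dict:
    """Convert one stored function_call raw item into a ChatCompletions tool call."""
    tool_call = {
        "id": item["call_id"],
        "type": "function",
        "function": {"name": item["name"], "arguments": item["arguments"]},
    }
    provider_data = item.get("provider_data") or {}
    signature = provider_data.get("thought_signature")
    # Only re-emit the Gemini-specific signature when the target model is a Gemini model.
    if signature and "gemini" in model:
        tool_call["extra_content"] = {"google": {"thought_signature": signature}}
    return tool_call
```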
Libraries like PydanticAI and LangChain define their own internal standard formats to enable cross-model conversations. By contrast, LiteLLM has not fully abstracted away these differences: it focuses on making each model call work with provider-specific workarounds, without defining a normalized history format for cross-model conversations. Therefore, we need explicit model-aware handling at this layer to make cross-model conversations possible.
For example, when we store Claude's `thinking_blocks` signature inside our reasoning item's `encrypted_content` field, we also need to know that it came from a Claude model. Otherwise, we would send this Claude-only encrypted content to another provider, which cannot safely interpret it.

The guiding principle in this PR is to treat OpenAI Responses API items as the baseline format, and use `provider_data` to extend them with provider-specific metadata when needed.
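As one concrete (hypothetical) guard in the same spirit, using `provider_data.model` to decide whether Claude-specific encrypted content can be reused; this is illustrative, not the PR's actual code:

```python
def can_reuse_encrypted_content(item: dict, target_model: str) -> bool:
    # encrypted_content holding Claude thinking_blocks data is only meaningful
    # when the item came from a Claude model and is being sent back to one.
    source_model = (item.get("provider_data") or {}).get("model", "")
    return "claude" in source_model.lower() and "claude" in target_model.lower()
```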
For the OpenAI Responses API:
When sending items to the OpenAI Responses API, we must not send provider-specific metadata or fake ids. This PR adds `OpenAIResponsesModel._remove_openai_responses_api_incompatible_fields(...)`, which:
- removes `id` when it equals `FAKE_RESPONSES_ID`
- removes metadata recorded in `provider_data` (these are provider-specific)
- removes the `provider_data` field from all items

This keeps the payload clean and compatible with the Responses API, even if the items previously flowed through non-OpenAI providers.
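A minimal sketch of that cleanup, assuming dict-shaped items (the real method may differ in structure; `FAKE_RESPONSES_ID` is the SDK's `__fake_id__` placeholder):

```python
from typing import Any

FAKE_RESPONSES_ID = "__fake_id__"  # the SDK's placeholder item id

def remove_openai_responses_api_incompatible_fields(list_input: list[Any]) -> list[Any]:
    cleaned: list[Any] = []
    for item in list_input:
        if not isinstance(item, dict):
            cleaned.append(item)
            continue
        item = dict(item)  # copy so the caller's history is left untouched
        if item.get("id") == FAKE_RESPONSES_ID:
            item.pop("id")  # placeholder id, not a real Responses API item id
        item.pop("provider_data", None)  # provider-specific metadata, not valid Responses API input
        cleaned.append(item)
    return cleaned
```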
Design notes: reasoning items vs provider_data
This PR does not introduce a separate reasoning item (as the Claude `thinking_blocks` handling does) for Gemini function calls' `thought_signatures`. Instead, it stores the signatures in `provider_data` on the function call item.

The main reason: this design is again similar to PydanticAI's approach and also mirrors the underlying Gemini parts structure, where signatures are attached to the parts they describe instead of creating an extra reasoning item with no text.
I also looked at the Gemini API raw format; there are four raw part structures with `thought_signature`:
1. `functionCall: {...}` with `thought_signature: "xxx"` → handled in this PR: keep the thought_signature with the function call.
2. `text: "...."` with `thought_signature: "xxx"` → could attach to the output item (no extra reasoning item needed).
3. `text: ""` with `thought_signature: "xxx"` → (empty text) this is the case where a standalone reasoning item makes sense.
4. `text: "summary..."` with `thought: true` → (a thinking summary) another case where a standalone reasoning item makes sense.

This PR implements case (1), which is sufficient for Gemini's current function calling requirement.
Other cases can be added later if needed.
This PR should have no side effects on projects that only use the OpenAI Responses API, and I believe it establishes a better groundwork for handling various provider-specific cases.