
Conversation

@ihower (Contributor) commented Dec 7, 2025

Resolves:

Summary

This PR does two main things:

  1. Adds support for Gemini 3 Pro thought_signatures in function calling.
  2. Enables cross-model conversations (OpenAI ↔ Gemini, Gemini ↔ Claude, etc.) by storing provider-specific metadata on raw items and passing the target model name into the conversion helpers.

The goal is to make different providers interoperable: they can safely share the same to_input_list() items, while each provider only receives the metadata it understands.

Examples

Besides unit tests, I performed live tests for all the following scenarios:

1. Gemini 3 Pro function calling (thought_signatures)

Gemini 3 Pro now requires a thought_signature attached to function calls in the same turn.
Docs: https://ai.google.dev/gemini-api/docs/thought-signatures

This PR supports both integration paths, in both non-streaming and streaming modes:

  1. LiteLLM integration (requires upgrading LiteLLM to version 1.80.5 or later)
  2. Google’s Gemini OpenAI-compatible API endpoint

The conversation flow is: LiteLLM ↔ ChatCompletions ↔ our raw items.

LiteLLM layer

LiteLLM places Gemini’s thought_signature inside provider_specific_fields.

This PR handles the conversion between:

  • LiteLLM's provider_specific_fields["thought_signature"]
  • the Google ChatCompletions format extra_content={"google": {"thought_signature": ...}}
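
For illustration, here is a minimal sketch of the two directions of this mapping. The helper names and exact dict shapes are assumptions for this write-up, not the PR's actual code:

# Hypothetical helpers sketching the LiteLLM-layer mapping in both directions.

def litellm_to_google_extra_content(tool_call: dict) -> dict | None:
    """Lift LiteLLM's provider_specific_fields into the Google extra_content shape."""
    fields = tool_call.get("provider_specific_fields") or {}
    signature = fields.get("thought_signature")
    if signature is None:
        return None
    return {"google": {"thought_signature": signature}}

def google_extra_content_to_litellm_fields(extra_content: dict | None) -> dict:
    """Project the Google extra_content shape back into provider_specific_fields."""
    signature = ((extra_content or {}).get("google") or {}).get("thought_signature")
    return {"thought_signature": signature} if signature is not None else {}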

ChatCompletions layer

This PR handles the conversion between:

  • the Google ChatCompletions format extra_content={"google": {"thought_signature": ...}}
  • our raw item's new internal field provider_data["thought_signature"]
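
A rough sketch of this layer, again with hypothetical helper names; only the nesting of the fields matters here:

def extra_content_to_provider_data(tool_call: dict, model: str) -> dict:
    """Fold the Gemini thought signature into the raw item's provider_data."""
    signature = (
        (tool_call.get("extra_content") or {}).get("google", {}).get("thought_signature")
    )
    provider_data: dict = {"model": model}
    if signature is not None:
        provider_data["thought_signature"] = signature
    return provider_data

def provider_data_to_extra_content(provider_data: dict | None) -> dict | None:
    """Rebuild extra_content when sending the item back to a Gemini model."""
    signature = (provider_data or {}).get("thought_signature")
    if signature is None:
        return None
    return {"google": {"thought_signature": signature}}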

Cleaning up LiteLLM’s __thought__ suffix

LiteLLM adds a __thought__ suffix to Gemini tool call ids (see:
BerriAI/litellm#16895). This suffix is not needed since we already
have thought_signature, and it causes call_id validation problems when the items are passed to other models.

Therefore, this PR removes it.
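
Roughly, the cleanup looks like this (a sketch, assuming the marker appears as a literal __thought__ substring in the id):

THOUGHT_SUFFIX = "__thought__"

def strip_thought_suffix(call_id: str) -> str:
    """Keep only the original id portion before LiteLLM's __thought__ marker."""
    return call_id.split(THOUGHT_SUFFIX, 1)[0]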

2. Enables cross-model conversations

To support cross-model conversations, this PR introduces a new provider_data field
on raw response items. This field holds metadata not compatible with the OpenAI Responses API, allowing us to:

  • Identify which provider produced each item.
  • Decide which fields are safe to send to other providers.
  • Keep OpenAI Responses API payloads clean and compatible.

For non-OpenAI Responses API models, we now store this on the raw item:

provider_data = {
    "model": ...,
    "response_id": ...,  # Previously discarded when using non-Resposnes API; now preserved for inspection and debugging.
    # other provider-specific metadata, e.g. Gemini's "thought_signature": ...
}

This design is similar to PydanticAI, which uses a comparable structure. The difference: PydanticAI stores metadata for all models,
whereas this PR stores provider_data only for non-OpenAI providers.

With provider_data and the model name passed into the converters, agents can now safely switch models while reusing the same raw items from to_input_list(). This includes:

  • Gemini ↔ OpenAI
  • Claude ↔ OpenAI
  • Gemini ↔ Claude

It also works with handoffs when nest_handoff_history=False.
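
As a hedged usage sketch (model names and the LitellmModel import path follow the Agents SDK docs, but treat them as assumptions for your setup), switching models mid-conversation looks like this:

import asyncio

from agents import Agent, Runner
from agents.extensions.models.litellm_model import LitellmModel

async def main() -> None:
    gemini_agent = Agent(
        name="Gemini agent",
        model=LitellmModel(model="gemini/gemini-3-pro-preview"),
    )
    openai_agent = Agent(name="OpenAI agent", model="gpt-4.1")

    first = await Runner.run(gemini_agent, "What's the weather in Taipei?")
    # The Gemini turn's items now carry provider_data (thought_signature, etc.).
    follow_up = first.to_input_list() + [
        {"role": "user", "content": "Summarize the conversation so far."}
    ]
    # provider_data and fake ids are stripped before hitting the Responses API.
    second = await Runner.run(openai_agent, follow_up)
    print(second.final_output)

asyncio.run(main())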

Implementation Details

Because items in a conversation can come from different providers, and each provider has different requirements, this PR passes the target model name into several conversion helpers:

  • Converter.items_to_messages(..., model=...)
  • LitellmConverter.convert_message_to_openai(..., model=...)
  • ChatCmplStreamHandler.handle_stream(..., model=...)
  • Converter.message_to_output_items(..., provider_data=...)

This lets us branch on provider-specific behavior in a controlled way and avoid regressions. This is especially important for reasoning models, where each provider handles encrypted tokens differently.
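
For example (illustrative only; the helper and field names below are hypothetical), passing the target model lets a converter gate provider-specific metadata:

def is_gemini(model: str | None) -> bool:
    return bool(model) and "gemini" in model.lower()

def should_forward_thought_signature(provider_data: dict | None, target_model: str | None) -> bool:
    """Only forward a stored thought_signature when the target is a Gemini model."""
    if not provider_data or "thought_signature" not in provider_data:
        return False
    return is_gemini(target_model)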

Libraries like PydanticAI and LangChain define their own internal standard formats to enable cross-model conversations.

By contrast, LiteLLM has not fully abstracted away these differences. It focuses on making each model call work with provider-specific workarounds, without defining a normalized history format for cross-model conversations. Therefore, we need explicit model-aware handling at this layer to make cross-model conversations possible.

For example, when we store Claude's thinking_blocks signature inside our reasoning item's encrypted_content field, we also need to know that it came from a Claude model. Otherwise, we would send this Claude-only encrypted content to another provider, which cannot safely interpret it.

The guiding principle in this PR is to treat OpenAI Responses API items as the baseline format, and use provider_data to extend them with provider-specific metadata when needed.

For OpenAI Responses API:

When sending items to the OpenAI Responses API, we must not send provider-specific metadata or fake ids.
This PR adds: OpenAIResponsesModel._remove_openai_responses_api_incompatible_fields(...)

  • Quickly returns the input unchanged if no item has provider_data.
  • Otherwise, processes items:
    • Removes id when it equals FAKE_RESPONSES_ID.
    • Drops reasoning items that have provider_data (these are provider-specific).
    • Removes the provider_data field from all items.

This keeps the payload clean and compatible with the Responses API, even if the items previously flowed through non-OpenAI providers.
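
A simplified sketch of that cleanup, assuming dict-shaped items; the real method lives on OpenAIResponsesModel and may differ in detail:

from typing import Any

FAKE_RESPONSES_ID = "__fake_id__"  # the SDK's placeholder id

def remove_responses_incompatible_fields(list_input: list[Any]) -> list[Any]:
    # Fast path: nothing to clean if no item carries provider_data.
    if not any(isinstance(item, dict) and "provider_data" in item for item in list_input):
        return list_input

    cleaned: list[Any] = []
    for item in list_input:
        if not isinstance(item, dict):
            cleaned.append(item)
            continue
        # Reasoning items with provider_data are provider-specific: drop them.
        if item.get("type") == "reasoning" and "provider_data" in item:
            continue
        item = {k: v for k, v in item.items() if k != "provider_data"}
        # Strip the placeholder id so the Responses API does not reject it.
        if item.get("id") == FAKE_RESPONSES_ID:
            item.pop("id")
        cleaned.append(item)
    return cleaned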

Design notes: reasoning items vs provider_data

This PR does not introduce a separate reasoning item for Gemini function-call thought_signatures (as is done for Claude thinking_blocks). Instead, it stores the signatures in provider_data on the function call item.

The main reasons:

  • Gemini thought signatures are function-call-bound and have no summary text. They belong strictly to the function call. Turning them into a separate reasoning item would add complexity without any benefit.
  • Keeping the signature directly on the function call matches how Gemini emits its metadata and keeps the data model simple.

This design is again similar to PydanticAI’s approach and also mirrors the underlying Gemini parts structure: signatures are attached to the parts they describe instead of creating an extra reasoning item with no text.
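
For illustration, a function call raw item after a Gemini turn would look roughly like this (all values are placeholders):

function_call_item = {
    "type": "function_call",
    "call_id": "call_abc123",
    "name": "get_weather",
    "arguments": '{"city": "Taipei"}',
    "provider_data": {
        "model": "gemini-3-pro-preview",
        "response_id": "resp_xyz",
        "thought_signature": "CuUBAb...",  # opaque signature blob from Gemini
    },
}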

I also studied the Gemini API raw format; there are four raw part structures that can carry a thought_signature:

  1. functionCall: {...} with thought_signature: "xxx" → handled in this PR: keep the thought_signature with the function call.
  2. text: "...." with thought_signature: "xxx" → could attach to the output item (no extra reasoning item needed).
  3. text: "" with thought_signature: "xxx" → (empty text) this is the case where a standalone reasoning item makes sense.
  4. text: "summary..." with thought: true → (this is a thinking summary) this is another case where a standalone reasoning item makes sense.

This PR implements case (1), which is sufficient for Gemini’s current function calling requirement.
Other cases can be added later if needed.
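
For reference, the four shapes above look roughly like this (schematic only; the REST API uses camelCase field names such as thoughtSignature, and all values are illustrative):

parts = [
    # (1) function call with a signature: handled by this PR
    {"functionCall": {"name": "get_weather", "args": {"city": "Taipei"}},
     "thoughtSignature": "sig-1"},
    # (2) visible text with a signature: could attach to the output item
    {"text": "Here is the forecast...", "thoughtSignature": "sig-2"},
    # (3) empty text with a signature: a standalone reasoning item makes sense
    {"text": "", "thoughtSignature": "sig-3"},
    # (4) thinking summary: another standalone reasoning item candidate
    {"text": "I compared two forecast sources...", "thought": True},
]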


This PR should have no side effects on projects that only use the OpenAI Responses API, and I believe it lays better groundwork for handling various provider-specific cases.

- Bump litellm dependency to >= 1.80.7 for Gemini thought signatures support
- Add Gemini 3 Pro thought_signature support for function calling
  - Handle both LiteLLM provider_specific_fields and Gemini extra_content formats
  - Clean up __thought__ suffix on tool call ids for Gemini models
- Attach provider_data to all non-Responses output items
  - Store model, response_id and provider specific metadata
  - Store Gemini thought_signature on function call items
- Use provider_data.model to decide what data is safe to send per provider
- Keep handoff transcripts stable by hiding provider_data in history output
@seratch seratch added this to the 0.7.x milestone Dec 8, 2025
@seratch (Member) commented Dec 8, 2025

Thanks for sending this PR! Overall, the design is clean and the code looks good to go. If anyone could try this branch out and share early feedback before releasing it, it would be greatly appreciated.

@mbegur commented Dec 17, 2025

@ihower @seratch is this PR good to merge? Would love to get this in the next release of the package 🙏

@seratch (Member) commented Dec 17, 2025

I am currently working on 0.6.4 release. This one can be included in 0.7.0 or later, so please wait a moment!

@markmcd left a comment

Hi 👋 - I'm from the Gemini team, just took a quick pass over the code to see how it works with our implementation of thought re-circulation and everything here LGTM. One FYI comment on parallel tool calls but no action is required.

continue

# Default to skip validator, overridden if valid thought signature exists
tool_call["provider_specific_fields"] = {
@markmcd commented Dec 19, 2025

FYI - in the context of parallel tool calls, this adds the dummy signature to every tool call returned. In the docs, we specify that a dummy signature is to be provided on the first tool call; however, it is safe to apply it to all of them, so no need to change anything.

)
return cast(Union[Response, AsyncStream[ResponseStreamEvent]], response)

def _remove_openai_responses_api_incompatible_fields(self, list_input: list[Any]) -> list[Any]:
Member commented:

@ihower Mixing OpenAI Responses API items and non-OpenAI Chat Completions items in the same conversation history is not currently supported. We recommend using only Chat Completions for this use case, and in practice, most users do not mix providers.

My question is: does this workaround actually work in real-world usage? In particular, the lack of valid item IDs seems problematic. Beyond that, there are likely many unexpected patterns, so I am not convinced we should simply accept items after removing id and provider_data. Given that, I think it may make more sense to reject these items (either via Responses API rejection or input validation on the SDK side) rather than trying to support this kind of workaround. Thoughts?

@ihower (Contributor, Author) commented Dec 23, 2025

I think supporting this is very useful in practice.

  1. Why this matters
    Some real apps let users switch models inside one conversation. For example, Perplexity has a "pick a model" UX. In this SDK, handoffs also often mean switching to a different model. There have been several prior issues requesting switching to non-OpenAI models (when nest_handoff_history=False):

This PR can address these needs together.

Also, frameworks like LangChain and PydanticAI treat cross-model history as a standard feature by defining an internal normalized message format.

  2. Does it work in real usage?

    Yes, it works for the common case. I included live test code in the PR description, and I tested mixed histories across Gemini, Claude, and OpenAI. Both streaming and non-streaming paths work.

I agree this may not support all patterns, especially when mixing provider-specific hosted tools. But for the typical case with local function calls, it should behave well.

  3. Why removing __fake_id__ and provider_data is enough
    Because we do not define an internal standard format ahead of time (unlike LangChain or PydanticAI), the most compatible approach is: treat OpenAI Responses API items as the baseline, and only add extra fields when we pass through non-OpenAI providers.
  • About id: ResponseInputItemParam does not require id as an input field. The only time we have a real, valid item id is when the item originally came from the Responses API. The SDK-injected __fake_id__ is just a placeholder, so removing it before calling the Responses API is safe and makes the payload valid.

  • About provider_data: provider_data exists only to store provider-specific data (i.e. data from non-OpenAI Responses API providers). The Responses API does not understand it, so stripping it restores compatibility. If a user never uses non-Responses providers, items never get provider_data in the first place, so nothing changes.

So I’m aiming to support cross-model conversation compatibility while keeping changes to the raw item format minimal, to avoid impacting developers who only use the Responses API: keep the baseline format unchanged for Responses-API only users, and only do compatibility cleanup for histories that passed through non-OpenAI providers.

Member commented:

Thanks for the response. Your changes here should be okay for many relatively simple cross-provider use cases as well as OpenAI Responses API-only use cases. That said, this project is not committed to fully supporting cross-provider use cases. Having this extra logic is okay, but I'd like an additional code comment clarifying that this logic does not guarantee every use case across various providers works without issues. Also, as a maintainer, I will continue encouraging users not to mix those. I understand other frameworks may support this nicely, but our Responses API support won't pursue the same. For those use cases, we generally recommend using chat completions + litellm.

Member commented:

There is no rush at all, but I would like some clarification on the expectations for cross-provider support at the code level. Once it's added, we can merge the changes here.
