feat(responses)!: improve responses + conversations implementations #3810
Conversation
This PR updates the Conversation item related types and improves a couple of critical parts of the implementation:

- It creates a streaming output item for the final assistant message output by the model (as sketched below). Until now we only added content parts and included that message in the final response.
- It rewrites the conversation update code completely to account for items other than messages (tool calls, outputs, etc.).
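For context on the first change: the OpenAI Responses streaming protocol brackets each output item with `response.output_item.added` and `response.output_item.done` events around the content-part deltas. A minimal sketch of that shape, using plain dicts rather than this repo's actual event types (the function and its arguments are illustrative, not the PR's code):

```python
from collections.abc import AsyncIterator
from typing import Any


async def stream_final_message(
    message_id: str, text_chunks: AsyncIterator[str]
) -> AsyncIterator[dict[str, Any]]:
    """Wrap the assistant's final message in output-item events, not just content parts."""
    item: dict[str, Any] = {"type": "message", "role": "assistant", "id": message_id, "content": []}
    # Announce the output item before its content starts streaming.
    yield {"type": "response.output_item.added", "item": item}

    text = ""
    async for chunk in text_chunks:
        text += chunk
        yield {"type": "response.output_text.delta", "item_id": message_id, "delta": chunk}

    item["content"] = [{"type": "output_text", "text": text}]
    # Emit the completed item so consumers (and conversation storage) see the full message.
    yield {"type": "response.output_item.done", "item": item}
```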
(Two review threads on llama_stack/providers/inline/agents/meta_reference/responses/openai_responses.py were marked outdated and resolved.)
Force-pushed from af111e9 to 85f7ec0.
```python
    | OpenAIResponseOutputMessageMCPCall
    # Fallback to the generic message type as a last resort
    | OpenAIResponseMessage,
```
This is similar to the change I'm proposing at #3385, but there I'm directly adding OpenAIResponseOutput as a possible input, to always keep both lists in sync (based on a comment I received in that PR).

As the other PR is simpler/shorter, perhaps we can go with it and rebase this on top.
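A minimal sketch of that approach, with stand-in pydantic models instead of the real llama-stack definitions (the real unions have many more members):

```python
from typing import Annotated, Literal

from pydantic import BaseModel, Field


class OpenAIResponseMessage(BaseModel):  # stand-in for the real model
    type: Literal["message"] = "message"
    role: str
    content: str


class OpenAIResponseOutputMessageMCPCall(BaseModel):  # stand-in for the real model
    type: Literal["mcp_call"] = "mcp_call"
    name: str
    arguments: str


# The output union; in llama-stack this has many more members.
OpenAIResponseOutput = Annotated[
    OpenAIResponseOutputMessageMCPCall | OpenAIResponseMessage,
    Field(discriminator="type"),
]

# The #3385 idea: define input in terms of the output union, so any type added
# to the outputs is automatically accepted as input and the lists never drift.
OpenAIResponseInput = OpenAIResponseOutput
```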
@luis5tb It is actually far more complex; even my changes are only a patch. We need to do a thorough review of everything carefully. My changes here were not motivated by trying to override your PR but came from an independent motivation. It's unfortunate that your PR was left languishing for so long. I also still need to run tests.
@luis5tb in fact I went your route first but then looked at OpenAI's definitions. They are very subtly different! It may be that you are right, but we need an automated, thorough check of all the Responses type definitions -- there are way too many holes there right now.
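Such an automated check could diff field sets between the mirrored definitions and the upstream ones. A rough sketch with stand-in models (the real check would import both sets of actual types rather than define them inline):

```python
from pydantic import BaseModel


class TheirMessage(BaseModel):  # stand-in for an upstream OpenAI type
    id: str
    role: str
    content: str
    status: str


class OurMessage(BaseModel):  # stand-in for the local mirror of it
    id: str
    role: str
    content: str


def missing_fields(ours: type[BaseModel], theirs: type[BaseModel]) -> set[str]:
    """Report fields the upstream type declares that our mirror lacks."""
    return set(theirs.model_fields) - set(ours.model_fields)


assert missing_fields(OurMessage, TheirMessage) == {"status"}
```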
@github-actions run precommit

⏳ Running pre-commit hooks on PR #3810...
Did a pass and nothing caught my eye. Thanks!
✅ Pre-commit hooks completed successfully! 🔧 Changes have been committed and pushed to the PR branch.
```python
    | OpenAIResponseOutputMessageFunctionToolCall
    | OpenAIResponseOutputMessageFileSearchToolCall
    | OpenAIResponseOutputMessageWebSearchToolCall
    | OpenAIResponseOutputMessageFileSearchToolCall
```
nice, I was planning to follow up on these after the latest changes. thanks!
franciscojavierarceo left a comment:
🚀
@leseb wtf how did that pre-commit magic work, man! This is in my forked repo!
Force-pushed from 442399f to 57b3d14.
Still fixing some responses tests. They aren't broken by this change, honestly, but still (since these tests are not in CI yet)...
Note that CI will appear red on this PR because llamastack/llama-stack-client-python#281 needs to be landed concurrently.
```python
    messages = await convert_response_input_to_chat_messages(input)
else:
    # Use stored messages directly and convert only new input
    messages = stored_messages or []
```
can stored_messages actually be None here?
```python
    filter(lambda x: not isinstance(x, OpenAISystemMessageParam), orchestrator.final_messages)
)
if store:
    # TODO: we really should work off of output_items instead of "final_messages"
```
does this still apply?
followup on #3810

Signed-off-by: Sébastien Han <seb@redhat.com>
# What does this PR do?

Introduces two fixes to enhance the stability of the Responses API when dealing with tool-calling responses and structured outputs.

### Changes Made

1. It added OpenAIResponseOutputMessageMCPCall and ListTools to OpenAIResponseInput, but #3810 got merged and did the same in a different way. Still, this PR does it in a way that keeps OpenAIResponseOutput and the objects allowed in OpenAIResponseInput in sync.
2. Add protection in case self.ctx.response_format does not have a `type` attribute (see the sketch below).

BREAKING CHANGE: OpenAIResponseInput now uses the OpenAIResponseOutput union type. This is semantically equivalent: all previously accepted types are still supported via the OpenAIResponseOutput union. This improves type consistency and maintainability.
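The second fix amounts to a defensive attribute read. A minimal sketch, assuming a context object shaped like the orchestrator's `self.ctx` (the helper name is hypothetical):

```python
from typing import Any


def response_format_type(ctx: Any) -> str | None:
    """Safely read ctx.response_format.type; either attribute may be missing."""
    response_format = getattr(ctx, "response_format", None)
    return getattr(response_format, "type", None)
```

Structured-output handling can then be gated on, e.g., `response_format_type(self.ctx) == "json_schema"` instead of dereferencing the attribute chain directly.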
Test Plan
Used the test script from llamastack/llama-stack-client-python#281 for this