feat: #636 Add human-in-the-loop (HITL) support to the SDK #2230
base: main
Conversation
Co-authored-by: Michael James Schock <m@mjschock.com>
I've finished the basic pattern testing, but there may still be some uncovered cases. If anyone is interested in trying this feature early using this git branch, your feedback would be greatly appreciated.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you:
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
@codex review the whole changes
💡 Codex Review
Here are some automated review suggestions for this pull request.
Thanks for continuing the work @seratch! I've created this script to reproduce three error scenarios I'm seeing against the issue-636-hitl-2 branch. The third one tests cross-language (openai-agents-python to openai-agents-js) RunState compatibility, which was a constraint I had been working under, though there may not be a real use case for it. Here's the output I'm seeing:

OPENAI_API_KEY="sk-proj-XYZ" uv run issue-636-hitl-2/main.py
Error Reproduction Script for issue-636-hitl-2
================================================================================
================================================================================
TEST 1: Streaming 'Item already in conversation' error
================================================================================
Running first 4 inputs with streaming (session_id: conv_694c81cde284819786f65852689e40db039f57c89e3c3381)...
Processing input 1/4...
Saving 1 input items to session before model call (turn=1, sample types=['message'])
Saved 1 input items
Processing input 2/4...
Saving 3 input items to session before model call (turn=1, sample types=['message', 'message', 'message'])
✓ SUCCESS: Reproduced error during streaming: BadRequestError: Error code: 400 - {'error': {'message': 'Item already in conversation', 'type': 'invalid_request_error', 'param': 'items', 'code': 'item_already_in_conversation'}}
================================================================================
TEST 2: AssertionError for item count mismatch
================================================================================
Running all 8 inputs without streaming (session_id: conv_694c81d0e36c819390bb9e770465036b04250c7c8b59eec2)...
Final item count: 22
Expected: 20
✓ Test 2 AssertionError reproduced: Expected 20 items, got 22
This is the expected error for test 2.
================================================================================
TEST 3: ZodError when TypeScript loads Python state
================================================================================
Running first 4 inputs without streaming (session_id: conv_694c81ed3ecc8195a6df4addcd73ce870d8b95c8dafd79c0)...
Processing input 1/4...
Processing input 2/4...
Processing input 3/4...
Processing input 4/4...
Interruption occurred at input 4, saving state...
State saved to /home/mjschock/Projects/Timestep-AI/timestep/issue-636-hitl-2/test3_state_conv_694c81ed3ecc8195a6df4addcd73ce870d8b95c8dafd79c0.json
Calling TypeScript loader to load state (this should trigger ZodError naturally)...
Installing @openai/agents from npm...
✓ SUCCESS: Reproduced ZodError or validation error
Error output:
ZodError detected (this is the expected error):
[
{
"expected": "string",
"code": "invalid_type",
"path": [
"generatedItems",
1,
"rawItem",
"name"
],
"message": "Invalid input: expected string, received undefined"
}
]
================================================================================
Error Reproduction Summary
================================================================================
✓ 1. Streaming "Item already in conversation" error
✓ 2. AssertionError for item count mismatch
✓ 3. ZodError when TypeScript loads Python state
3/3 error scenarios replicated
================================================================================
@mjschock Thanks! The third test pattern is not supported; the session data format is not compatible between TS and Python.
This pull request resolves #636 by adding a Human-in-the-Loop (HITL) feature to the Python SDK, following a design similar to the TS SDK: https://openai.github.io/openai-agents-js/guides/human-in-the-loop/
Huge thanks to #2021, which served as the foundation for this PR.
Key changes include:
- RunState serialization and resume pipeline (RunResult.to_state / RunState.from_json), along with approval and rejection helpers, enabling Runner.run to pause for HITL decisions and resume interrupted runs.
- Restoring saved state into a RunContext while preserving usage and turn metadata.
- Keeping previous_response_id and conversation_id tracking in sync for server-backed sessions.
Examples
Handling interruptions
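A minimal sketch of handling an interruption and resuming, assuming a needs_approval flag on function_tool and approve/reject helpers on the restored RunState; RunResult.to_state and RunState.from_json come from the description above, while the other names mirror the TS SDK design and may differ in the final Python API:

```python
import asyncio
import json

from agents import Agent, Runner, RunState, function_tool


@function_tool(needs_approval=True)  # assumed flag name, mirroring the TS SDK
def delete_file(path: str) -> str:
    """A dangerous tool that should require human approval."""
    return f"deleted {path}"


agent = Agent(name="Assistant", tools=[delete_file])


async def main() -> None:
    result = await Runner.run(agent, "Please delete tmp.txt")

    if result.interruptions:
        # Persist the paused run so a human can review it later.
        state_json = result.to_state().to_json()  # to_json() is assumed

        # ... ask the user for a decision, then restore the state ...
        state = RunState.from_json(agent, json.loads(state_json))  # signature assumed
        for interruption in result.interruptions:
            state.approve(interruption)  # or state.reject(interruption)

        # Resume the interrupted run with the approved state as input.
        result = await Runner.run(agent, state)

    print(result.final_output)


asyncio.run(main())
```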
Streaming mode:
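A rough sketch of the same flow with streaming, assuming result.interruptions and the resume-from-state input behave as in the non-streaming case (Runner.run_streamed and stream_events are existing SDK APIs):

```python
from agents import Agent, Runner


async def run_with_streaming(agent: Agent, user_input: str) -> None:
    result = Runner.run_streamed(agent, user_input)
    async for event in result.stream_events():
        pass  # forward events to your UI as usual

    if result.interruptions:  # assumed to be populated once the stream ends
        state = result.to_state()
        for interruption in result.interruptions:
            state.approve(interruption)  # assumed helper, as above

        # Resume and keep streaming from where the run was interrupted.
        resumed = Runner.run_streamed(agent, state)
        async for event in resumed.stream_events():
            pass
```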
Enable HITL for function tools
Simplest example:
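As a sketch, assuming a boolean needs_approval parameter on function_tool (the name mirrors the TS SDK's needsApproval and may differ):

```python
from agents import function_tool


@function_tool(needs_approval=True)  # every call to this tool pauses for approval
def send_refund(order_id: str, amount: float) -> str:
    return f"refunded {amount:.2f} for order {order_id}"
```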
Passing function:
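And a sketch of the dynamic variant, assuming needs_approval also accepts an async predicate that receives the run context and the tool arguments (the exact signature is an assumption):

```python
from typing import Any

from agents import RunContextWrapper, function_tool


async def refund_needs_approval(ctx: RunContextWrapper[Any], params: dict) -> bool:
    # Only pause for a human when the refund is large.
    return params.get("amount", 0) > 100


@function_tool(needs_approval=refund_needs_approval)
def send_refund(order_id: str, amount: float) -> str:
    return f"refunded {amount:.2f} for order {order_id}"
```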
Enable HITL for hosted/local MCP server tools
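For hosted MCP tools, a sketch assuming the existing HostedMCPTool require_approval setting now feeds into the same interruption flow; the server label and URL are placeholders, and local MCP servers presumably expose an equivalent setting:

```python
from agents import Agent, HostedMCPTool

agent = Agent(
    name="Assistant",
    tools=[
        HostedMCPTool(
            tool_config={
                "type": "mcp",
                "server_label": "docs",
                "server_url": "https://example.com/mcp",
                # Assumed to surface approvals via result.interruptions
                # instead of (or in addition to) an on_approval_request callback.
                "require_approval": "always",
            }
        )
    ],
)
```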
Agents as tools
When you turn an agent into a tool for a different agent, you can enable HITL for the sub-agent run. The HITL settings for the sub-agent's tools are also merged into the parent run's result.interruptions, as in the sketch below.
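A sketch, reusing the assumed needs_approval flag from above together with the SDK's existing as_tool helper:

```python
from agents import Agent, Runner, function_tool


@function_tool(needs_approval=True)  # assumed flag, as above
def wire_transfer(account: str, amount: float) -> str:
    return f"sent {amount:.2f} to {account}"


banking_agent = Agent(name="Banking agent", tools=[wire_transfer])

orchestrator = Agent(
    name="Orchestrator",
    tools=[
        banking_agent.as_tool(
            tool_name="banking",
            tool_description="Handles banking requests",
        )
    ],
)

result = Runner.run_sync(orchestrator, "Wire $500 to account 1234")
# The sub-agent's pending approval is merged into the parent run's interruptions.
print(result.interruptions)
```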
Shell tools
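A sketch for shell tools, assuming the existing LocalShellTool gains a needs_approval option so each command pauses for review (the executor here is deliberately simplified):

```python
from agents import Agent, LocalShellTool


def run_command(request) -> str:
    # Execute the requested shell command here; sandbox it in a real app.
    return "ok"


agent = Agent(
    name="Shell assistant",
    # needs_approval is an assumed parameter; approvals would then appear
    # in result.interruptions like any other tool approval.
    tools=[LocalShellTool(executor=run_command, needs_approval=True)],
)
```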
Realtime agents
When an approval is required, your app receives "tool_approval_required" events. Your app can then display a confirmation popup (or similar UI) to the user before approving or rejecting the call; a sketch follows below.
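A sketch of the realtime flow, using the existing RealtimeAgent/RealtimeRunner APIs; the "tool_approval_required" event type comes from the description above, while the approval call on the session is a hypothetical helper:

```python
import asyncio

from agents import function_tool
from agents.realtime import RealtimeAgent, RealtimeRunner


@function_tool(needs_approval=True)  # assumed flag, as above
def unlock_door(door_id: str) -> str:
    return f"unlocked {door_id}"


agent = RealtimeAgent(name="Voice assistant", tools=[unlock_door])


async def main() -> None:
    runner = RealtimeRunner(agent)
    session = await runner.run()
    async with session:
        async for event in session:
            if event.type == "tool_approval_required":
                # Show a confirmation popup to the user, then approve or reject.
                session.approve(event.approval_item)  # hypothetical helper


asyncio.run(main())
```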