feat: #636 Add human-in-the-loop (HITL) support to the SDK #2230
base: main
Conversation
Co-authored-by: Michael James Schock <m@mjschock.com>
I've finished the basic pattern testing, but there may still be some uncovered cases. If anyone is interested in trying this feature early using this git branch, your feedback would be greatly appreciated.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you:
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
@codex review the whole changes
💡 Codex Review
Here are some automated review suggestions for this pull request.
Thanks for continuing the work @seratch! I've created this script to reproduce three error scenarios I'm seeing against the issue-636-hitl-2 branch. The third one tests cross-language (openai-agents-python to openai-agents-js) RunState compatibility, which was a constraint I had been working under, though there may not be a real use case for it. Here's the output I'm seeing:

OPENAI_API_KEY="sk-proj-XYZ" uv run issue-636-hitl-2/main.py
Error Reproduction Script for issue-636-hitl-2
================================================================================
================================================================================
TEST 1: Streaming 'Item already in conversation' error
================================================================================
Running first 4 inputs with streaming (session_id: conv_694c81cde284819786f65852689e40db039f57c89e3c3381)...
Processing input 1/4...
Saving 1 input items to session before model call (turn=1, sample types=['message'])
Saved 1 input items
Processing input 2/4...
Saving 3 input items to session before model call (turn=1, sample types=['message', 'message', 'message'])
✓ SUCCESS: Reproduced error during streaming: BadRequestError: Error code: 400 - {'error': {'message': 'Item already in conversation', 'type': 'invalid_request_error', 'param': 'items', 'code': 'item_already_in_conversation'}}
================================================================================
TEST 2: AssertionError for item count mismatch
================================================================================
Running all 8 inputs without streaming (session_id: conv_694c81d0e36c819390bb9e770465036b04250c7c8b59eec2)...
Final item count: 22
Expected: 20
✓ Test 2 AssertionError reproduced: Expected 20 items, got 22
This is the expected error for test 2.
================================================================================
TEST 3: ZodError when TypeScript loads Python state
================================================================================
Running first 4 inputs without streaming (session_id: conv_694c81ed3ecc8195a6df4addcd73ce870d8b95c8dafd79c0)...
Processing input 1/4...
Processing input 2/4...
Processing input 3/4...
Processing input 4/4...
Interruption occurred at input 4, saving state...
State saved to /home/mjschock/Projects/Timestep-AI/timestep/issue-636-hitl-2/test3_state_conv_694c81ed3ecc8195a6df4addcd73ce870d8b95c8dafd79c0.json
Calling TypeScript loader to load state (this should trigger ZodError naturally)...
Installing @openai/agents from npm...
✓ SUCCESS: Reproduced ZodError or validation error
Error output:
ZodError detected (this is the expected error):
[
{
"expected": "string",
"code": "invalid_type",
"path": [
"generatedItems",
1,
"rawItem",
"name"
],
"message": "Invalid input: expected string, received undefined"
}
]
================================================================================
Error Reproduction Summary
================================================================================
✓ 1. Streaming "Item already in conversation" error
✓ 2. AssertionError for item count mismatch
✓ 3. ZodError when TypeScript loads Python state
3/3 error scenarios replicated
================================================================================
@mjschock Thanks! The third test pattern is not supported; the session data format is not compatible between TS and Python.
This pull request resolves #636 by adding a Human-in-the-Loop (HITL) feature to the Python SDK, following a design similar to the TS SDK: https://openai.github.io/openai-agents-js/guides/human-in-the-loop/
Huge thanks to #2021, which served as the foundation for this PR.
Key changes include:
- RunState serialization and resume pipeline (RunResult.to_state / RunState.from_json), along with approval and rejection helpers, enabling Runner.run to pause for HITL decisions and resume interrupted runs.
- Restoring saved state into a RunContext while preserving usage and turn metadata.
- Keeping previous_response_id and conversation_id tracking in sync for server-backed sessions.
Examples
Handling interruptions
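A minimal sketch of handling an interruption and resuming, assuming a needs_approval flag on function_tool and approve/reject helpers on the restored RunState; RunResult.to_state and RunState.from_json come from the description above, while the other names mirror the TS SDK design and may differ in the final Python API:

```python
import asyncio
import json

from agents import Agent, Runner, RunState, function_tool


@function_tool(needs_approval=True)  # assumed flag name, mirroring the TS SDK
def delete_file(path: str) -> str:
    """A dangerous tool that should require human approval."""
    return f"deleted {path}"


agent = Agent(name="Assistant", tools=[delete_file])


async def main() -> None:
    result = await Runner.run(agent, "Please delete tmp.txt")

    if result.interruptions:
        # Persist the paused run so a human can review it later.
        state_json = result.to_state().to_json()  # to_json() is assumed

        # ... ask the user for a decision, then restore the state ...
        state = RunState.from_json(agent, json.loads(state_json))  # signature assumed
        for interruption in result.interruptions:
            state.approve(interruption)  # or state.reject(interruption)

        # Resume the interrupted run with the approved state as input.
        result = await Runner.run(agent, state)

    print(result.final_output)


asyncio.run(main())
```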
Streaming mode:
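A rough sketch of the same flow with streaming, assuming result.interruptions and the resume-from-state input behave as in the non-streaming case (Runner.run_streamed and stream_events are existing SDK APIs):

```python
from agents import Agent, Runner


async def run_with_streaming(agent: Agent, user_input: str) -> None:
    result = Runner.run_streamed(agent, user_input)
    async for event in result.stream_events():
        pass  # forward events to your UI as usual

    if result.interruptions:  # assumed to be populated once the stream ends
        state = result.to_state()
        for interruption in result.interruptions:
            state.approve(interruption)  # assumed helper, as above

        # Resume and keep streaming from where the run was interrupted.
        resumed = Runner.run_streamed(agent, state)
        async for event in resumed.stream_events():
            pass
```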
Enable HITL for function tools
Simplest example:
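As a sketch, assuming a boolean needs_approval parameter on function_tool (the name mirrors the TS SDK's needsApproval and may differ):

```python
from agents import function_tool


@function_tool(needs_approval=True)  # every call to this tool pauses for approval
def send_refund(order_id: str, amount: float) -> str:
    return f"refunded {amount:.2f} for order {order_id}"
```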
Passing function:
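And a sketch of the dynamic variant, assuming needs_approval also accepts an async predicate that receives the run context and the tool arguments (the exact signature is an assumption):

```python
from typing import Any

from agents import RunContextWrapper, function_tool


async def refund_needs_approval(ctx: RunContextWrapper[Any], params: dict) -> bool:
    # Only pause for a human when the refund is large.
    return params.get("amount", 0) > 100


@function_tool(needs_approval=refund_needs_approval)
def send_refund(order_id: str, amount: float) -> str:
    return f"refunded {amount:.2f} for order {order_id}"
```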
Enable HITL for hosted/local MCP server tools
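For hosted MCP tools, a sketch assuming the existing HostedMCPTool require_approval setting now feeds into the same interruption flow; the server label and URL are placeholders, and local MCP servers presumably expose an equivalent setting:

```python
from agents import Agent, HostedMCPTool

agent = Agent(
    name="Assistant",
    tools=[
        HostedMCPTool(
            tool_config={
                "type": "mcp",
                "server_label": "docs",
                "server_url": "https://example.com/mcp",
                # Assumed to surface approvals via result.interruptions
                # instead of (or in addition to) an on_approval_request callback.
                "require_approval": "always",
            }
        )
    ],
)
```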
Agents as tools
When you turn an agent into a tool for a different agent, you can enable HITL for the sub-agent run. The HITL settings for the sub-agent's tools are also merged into the parent run's result.interruptions, as in the sketch below.
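A sketch, reusing the assumed needs_approval flag from above together with the SDK's existing as_tool helper:

```python
from agents import Agent, Runner, function_tool


@function_tool(needs_approval=True)  # assumed flag, as above
def wire_transfer(account: str, amount: float) -> str:
    return f"sent {amount:.2f} to {account}"


banking_agent = Agent(name="Banking agent", tools=[wire_transfer])

orchestrator = Agent(
    name="Orchestrator",
    tools=[
        banking_agent.as_tool(
            tool_name="banking",
            tool_description="Handles banking requests",
        )
    ],
)

result = Runner.run_sync(orchestrator, "Wire $500 to account 1234")
# The sub-agent's pending approval is merged into the parent run's interruptions.
print(result.interruptions)
```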
Shell tools
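A sketch for shell tools, assuming the existing LocalShellTool gains a needs_approval option so each command pauses for review (the executor here is deliberately simplified):

```python
from agents import Agent, LocalShellTool


def run_command(request) -> str:
    # Execute the requested shell command here; sandbox it in a real app.
    return "ok"


agent = Agent(
    name="Shell assistant",
    # needs_approval is an assumed parameter; approvals would then appear
    # in result.interruptions like any other tool approval.
    tools=[LocalShellTool(executor=run_command, needs_approval=True)],
)
```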
Realtime agents
When an approval is required, your app receives "tool_approval_required" events. Your app can then display a confirmation popup (or similar UI) to the user before approving or rejecting the call; a sketch follows below.
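A sketch of the realtime flow, using the existing RealtimeAgent/RealtimeRunner APIs; the "tool_approval_required" event type comes from the description above, while the approval call on the session is a hypothetical helper:

```python
import asyncio

from agents import function_tool
from agents.realtime import RealtimeAgent, RealtimeRunner


@function_tool(needs_approval=True)  # assumed flag, as above
def unlock_door(door_id: str) -> str:
    return f"unlocked {door_id}"


agent = RealtimeAgent(name="Voice assistant", tools=[unlock_door])


async def main() -> None:
    runner = RealtimeRunner(agent)
    session = await runner.run()
    async with session:
        async for event in session:
            if event.type == "tool_approval_required":
                # Show a confirmation popup to the user, then approve or reject.
                session.approve(event.approval_item)  # hypothetical helper


asyncio.run(main())
```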