
fix: HITL session persistence, tool events, and breakpoint interaction#189

Merged
cristipufu merged 2 commits into main from fix/hitl-session-persistence
Feb 21, 2026

cristipufu (Member) commented Feb 21, 2026

Summary

Commit 1: Persist session after HITL resume and emit tool completed events

  • After HITL resume, checkpoint restore creates separate session copies per executor. Extract the most complete session (highest message count) and persist it to KV storage so the next turn has valid history.
  • Unwrap AgentExecutorResponse in _extract_tool_state_events and _extract_contents so that function_result content inside executor_completed event data is found.
  • Emit ToolCallEnd in close_message() for pending tool calls interrupted by HITL suspension (clears stale _pending_tool_calls state).
  • Track pending tool nodes (STARTED without COMPLETED) across stream iterations and synthesize COMPLETED events on HITL resume.
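The two resume-time fixes in Commit 1 can be sketched as shown below. The sketch assumes each restored session exposes a plain list of messages and each stream event is a (node_id, status) pair. `Session`, `NodeEvent`, `Status`, `pick_most_complete_session`, and `PendingToolTracker` are hypothetical names used only for illustration and are not the project's API. The KV persistence step is omitted.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


@dataclass
class Session:
    # Hypothetical stand-in for one executor's session copy after checkpoint restore.
    executor_id: str
    messages: list[str] = field(default_factory=list)


def pick_most_complete_session(sessions: list[Session]) -> Session | None:
    """Return the session with the most messages; the first one wins on ties."""
    if not sessions:
        return None
    return max(sessions, key=lambda s: len(s.messages))


class Status(Enum):
    STARTED = "started"
    COMPLETED = "completed"


@dataclass(frozen=True)
class NodeEvent:
    node_id: str
    status: Status


class PendingToolTracker:
    """Remembers tool nodes that emitted STARTED but not yet COMPLETED.

    The tracker outlives a single stream iteration, so nodes that started
    before a HITL suspension are still known when the stream resumes.
    """

    def __init__(self) -> None:
        # A dict is used as an insertion-ordered set of node IDs.
        self._pending: dict[str, None] = {}

    def observe(self, event: NodeEvent) -> None:
        if event.status is Status.STARTED:
            self._pending[event.node_id] = None
        elif event.status is Status.COMPLETED:
            self._pending.pop(event.node_id, None)

    def synthesize_completed(self) -> list[NodeEvent]:
        """On HITL resume, build a COMPLETED event for every still-pending node."""
        events = [NodeEvent(node_id, Status.COMPLETED) for node_id in self._pending]
        self._pending.clear()
        return events


if __name__ == "__main__":
    restored = [Session("triage", ["hi"]), Session("billing", ["hi", "refund?", "ok"])]
    best = pick_most_complete_session(restored)
    print(best.executor_id)  # billing
```

Selecting by message count is a heuristic. It assumes that the per-executor copies share a common history prefix and differ only in how far each executor got, so the longest copy contains the others.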

Commit 2: Prevent breakpoint+HITL infinite loop and duplicate events

  • Load breakpoint skip counts on HITL resume so the executor passes through without re-firing the breakpoint (prevents the infinite breakpoint -> HITL -> breakpoint loop on the same node).
  • Clear _last_checkpoint_id after loading breakpoint state to prevent the next fresh turn from being mistaken for a breakpoint resume.
  • Detect stale checkpoints from previous turns by capturing a baseline before workflow.run() and comparing after breakpoint fires.
  • Fall back to _resumed_from_checkpoint_id (instead of None) when no new checkpoint was created. Falling back to None replayed the workflow from scratch, which emitted handoff_to_billing_agent 4 times.
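The checkpoint bookkeeping in Commit 2 can be sketched as shown below. The sketch assumes that checkpoints are identified by string IDs and that the runner keeps the _last_checkpoint_id and _resumed_from_checkpoint_id fields named above. `BreakpointState`, `skip_counts`, `load_for_hitl_resume`, `should_fire`, and `choose_resume_checkpoint` are hypothetical names, and this is not the project's actual code.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class BreakpointState:
    # Hypothetical runner state; the underscore fields mirror names in the PR summary.
    _last_checkpoint_id: str | None = None
    _resumed_from_checkpoint_id: str | None = None
    # How many more times each node's breakpoint should be skipped (assumed shape).
    skip_counts: dict[str, int] = field(default_factory=dict)

    def load_for_hitl_resume(self, saved_skip_counts: dict[str, int]) -> None:
        """Restore skip counts so the paused node passes through, then clear
        _last_checkpoint_id so the next fresh turn is not treated as a
        breakpoint resume."""
        self.skip_counts = dict(saved_skip_counts)
        self._last_checkpoint_id = None

    def should_fire(self, node_id: str) -> bool:
        """Consume one skip for node_id if any remain; otherwise fire."""
        remaining = self.skip_counts.get(node_id, 0)
        if remaining > 0:
            self.skip_counts[node_id] = remaining - 1
            return False
        return True


def choose_resume_checkpoint(
    baseline_id: str | None,
    latest_id: str | None,
    resumed_from_id: str | None,
) -> str | None:
    """Pick the checkpoint to resume from after a breakpoint fires.

    baseline_id is the latest checkpoint ID captured before workflow.run().
    If latest_id is missing or equal to the baseline, no checkpoint was written
    during this turn and latest_id is stale, so fall back to the checkpoint this
    run resumed from rather than None (which would replay from scratch).
    """
    if latest_id is None or latest_id == baseline_id:
        return resumed_from_id
    return latest_id


if __name__ == "__main__":
    # A turn where the breakpoint fired but no new checkpoint was written.
    print(choose_resume_checkpoint("cp-7", "cp-7", "cp-6"))  # cp-6
```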

Test plan

  • test_no_duplicate_tool_calls_on_breakpoint_resume — breakpoints=* + HITL: exactly 1 handoff_to_billing_agent (was 4x before fix)
  • test_second_turn_after_breakpoint_and_hitl — multi-turn stale checkpoint detection still works
  • test_breakpoint_then_hitl_does_not_loop — breakpoint + HITL on same node completes without infinite loop
  • test_breakpoint_on_all_nodes_with_hitl — breakpoints=* + HITL completes successfully
  • test_tool_node_completed_after_hitl_resume — tool nodes emit both STARTED and COMPLETED across HITL
  • All 12 HITL E2E tests pass
  • All 15 streaming tests pass

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
cristipufu force-pushed the fix/hitl-session-persistence branch from 27a8882 to e686eaf, February 21, 2026 12:45
cristipufu changed the title "fix: preserve checkpoint on breakpoint resume to prevent duplicate events" to "fix: HITL session persistence, tool events, and breakpoint interaction", Feb 21, 2026
cristipufu force-pushed the fix/hitl-session-persistence branch from e686eaf to 6264039, February 21, 2026 12:48
cristipufu requested a review from Copilot, February 21, 2026 12:48
cristipufu self-assigned this, Feb 21, 2026
cristipufu force-pushed the fix/hitl-session-persistence branch from 6264039 to 6cb6f2b, February 21, 2026 12:50
cristipufu force-pushed the fix/hitl-session-persistence branch from 6cb6f2b to 83c36bf, February 21, 2026 12:52

This comment was marked as outdated.

cristipufu merged commit 152ffcd into main, Feb 21, 2026
46 checks passed
cristipufu deleted the fix/hitl-session-persistence branch, February 21, 2026 12:54