feat(testing): add advanced assertion utilities for agent test framework #976
base: main
🦋 Changeset detected. Latest commit: 0146768. The changes in this PR will be included in the next version bump. This PR includes changesets to release 17 packages.
📝 Walkthrough

This change introduces advanced test utilities for the LiveKit agents voice framework. New assertions enable range-scoped event validation, conditional event skipping, and LLM-based intent judgment on messages. A new `EventRangeAssert` class provides scoped assertion methods, while `RunAssert` gains `range()`, `skipNextEventIf()`, and convenience helpers. `MessageAssert` includes a `judge()` method for LLM-evaluated intent validation.
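The conditional-skip behavior described in the walkthrough can be modeled with a small standalone sketch. The `EventCursor` class below is hypothetical, written only to illustrate the semantics; it is not the actual `RunAssert` implementation:

```typescript
// Simplified model of skipNextEventIf() semantics: the cursor consumes the
// next event only when it matches the predicate and returns it; otherwise it
// returns undefined and does not advance.
type AgentEvent = { type: string; name?: string };

class EventCursor {
  private index = 0;
  constructor(private events: AgentEvent[]) {}

  skipNextEventIf(predicate: (e: AgentEvent) => boolean): AgentEvent | undefined {
    const next = this.events[this.index];
    if (next !== undefined && predicate(next)) {
      this.index += 1;
      return next; // matched: consume and return the event
    }
    return undefined; // no match: cursor stays in place
  }

  next(): AgentEvent | undefined {
    return this.events[this.index++];
  }
}

const cursor = new EventCursor([
  { type: 'function_call', name: 'lookup' },
  { type: 'message' },
]);

console.log(cursor.skipNextEventIf((e) => e.type === 'function_call')?.type); // 'function_call'
console.log(cursor.skipNextEventIf((e) => e.type === 'function_call')); // undefined
```

The key design point this models is that a failed skip is not an assertion failure: it returns `undefined` so the test can branch on optional events.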
Sequence Diagram

```mermaid
sequenceDiagram
    participant Test as Test Code
    participant MA as MessageAssert
    participant LLM as LLM (gpt-4o-mini)
    participant Tool as check_intent Tool
    Test->>MA: judge(llm, { intent: "..." })
    activate MA
    MA->>MA: Extract message content
    MA->>MA: Validate inputs
    MA->>LLM: Create message with check_intent tool
    activate LLM
    LLM->>Tool: Invoke check_intent(content, intent)
    activate Tool
    Tool-->>LLM: Tool result with validation
    deactivate Tool
    LLM-->>MA: Streaming tool call result
    deactivate LLM
    MA->>MA: Parse and verify result
    MA-->>Test: Return MessageAssert
    deactivate MA
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~30 minutes
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0146768e66
```ts
const { success, reason } = toolArgs;

if (!success) {
  this._raise(`Judgment failed: ${reason}`);
}
```
Validate judge tool result types before treating as pass

The `judge()` result uses `if (!success)` to decide pass/fail without verifying that `success` is actually a boolean. When the provider doesn't strictly enforce tool schemas (the default `strictToolSchema` is `false`), the model can return `"success": "false"` or another truthy non-boolean; in that case this code treats the judgment as successful and the test incorrectly passes. This can silently corrupt test outcomes for providers/models that serialize booleans as strings. Consider validating the parsed args against the Zod schema (or at least `typeof success === 'boolean'`) before using them.
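One way to apply the suggested guard is shown below. The `parseJudgeResult` helper is hypothetical (the review only suggests the check; it does not prescribe this shape), and it uses a plain `typeof` check rather than the Zod schema for self-containment:

```typescript
// Guard against providers that serialize booleans as strings when strict
// tool schemas are not enforced: only a literal boolean `success` counts.
function parseJudgeResult(raw: unknown): { success: boolean; reason?: string } {
  if (typeof raw !== 'object' || raw === null) {
    throw new Error(`Judgment returned non-object tool args: ${JSON.stringify(raw)}`);
  }
  const { success, reason } = raw as { success?: unknown; reason?: unknown };
  if (typeof success !== 'boolean') {
    // Rejects "false", "true", 1, etc., so a mis-typed result can never pass.
    throw new Error(`Judgment returned non-boolean success: ${JSON.stringify(success)}`);
  }
  return { success, reason: typeof reason === 'string' ? reason : undefined };
}

parseJudgeResult({ success: true, reason: 'intent matched' }); // ok
try {
  parseJudgeResult({ success: 'false', reason: 'intent mismatch' });
} catch {
  console.log('rejected non-boolean success'); // the string "false" is truthy, so it must throw
}
```

Validating with the same Zod schema used to declare the `check_intent` tool would give the same protection plus consistent error messages.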
- `skipNextEventIf()` method for conditional event skipping based on type/criteria
- `EventRangeAssert` class with `containsFunctionCall()`, `containsMessage()`, `containsFunctionCallOutput()`, and `containsAgentHandoff()` methods for searching within event ranges
- `range()` method on `RunAssert` for subset event searches with Python-like slice semantics
- `judge()` method on `MessageAssert` for LLM-based semantic evaluation of message intent
- `EventRangeAssert` exported from the testing module
- `@livekit/rtc-node` bumped to `^1.0.0-alpha.1` for `ParticipantKind.CONNECTOR` support

Test plan
- `pnpm vitest run examples/src/testing/run_result.test.ts`: all 28 tests pass
- `skipNextEventIf()` correctly skips matching events and returns `undefined` for non-matches
- `range()` and `contains*()` methods find events within specified ranges
- `judge()` passes/fails based on semantic intent evaluation
- `at()` and `range()`
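The Python-like slice semantics that `range()` advertises can be sketched with a small standalone helper. The `normalizeSlice`/`slice` names below are hypothetical illustrations of the convention, not the library's implementation: negative indices count from the end of the event list, and out-of-bounds indices are clamped rather than throwing.

```typescript
// Python-like slice normalization: negative indices wrap from the end,
// and both bounds are clamped into [0, length].
function normalizeSlice(length: number, start = 0, end = length): [number, number] {
  const clamp = (i: number) => Math.min(Math.max(i < 0 ? i + length : i, 0), length);
  return [clamp(start), clamp(end)];
}

function slice<T>(items: T[], start?: number, end?: number): T[] {
  const [s, e] = normalizeSlice(items.length, start, end);
  return items.slice(s, e);
}

const events = ['e0', 'e1', 'e2', 'e3', 'e4'];
console.log(slice(events, 1, -1)); // ['e1', 'e2', 'e3']
console.log(slice(events, -2));    // ['e3', 'e4']
```

Under this convention, `range(1, -1)` would scope assertions to everything except the first and last event, which is handy for ignoring greeting and farewell turns.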