
Conversation


@toubatbrian toubatbrian commented Jan 15, 2026

  • Add skipNextEventIf() method for conditional event skipping based on type/criteria
  • Add EventRangeAssert class with containsFunctionCall(), containsMessage(), containsFunctionCallOutput(), containsAgentHandoff() methods for searching within event ranges
  • Add range() method on RunAssert for subset event searches with Python-like slice semantics
  • Add judge() method on MessageAssert for LLM-based semantic evaluation of message intent (see the usage sketch after this list)
  • Export EventRangeAssert from testing module
  • Update @livekit/rtc-node to ^1.0.0-alpha.1 for ParticipantKind.CONNECTOR support
  • Add comprehensive integration tests covering all new assertion utilities
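
A hedged usage sketch of how these utilities might compose in a test. The option shapes, the `at(-1).isMessage()` chain, `run.expect`, and the plugin import path are assumptions for illustration, not the exact API:

```ts
import * as openai from '@livekit/agents-plugin-openai'; // plugin import path assumed

// `run` is assumed to be a completed RunResult from a prior agent turn;
// typed loosely here because the real assertion types live in
// agents/src/voice/testing/run_result.ts.
async function exampleAssertions(run: any) {
  const assert = run.expect;

  // Conditionally skip the next event when it matches the criteria;
  // a non-match returns undefined instead of failing the test.
  assert.skipNextEventIf({ type: 'message', role: 'assistant' });

  // Python-like slice semantics: search events 1..-1 (negative indices
  // count from the end) for a function call within that range.
  assert.range(1, -1).containsFunctionCall({ name: 'lookup_weather' });

  // LLM-based semantic judgment of the last message's intent, using the
  // full model name per the repo's coding guidelines.
  const llmInstance = new openai.LLM({ model: 'openai/gpt-4o-mini', temperature: 0 });
  await assert.at(-1).isMessage().judge(llmInstance, { intent: 'politely confirms the order' });
}
```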

Test plan

  • Run `pnpm vitest run examples/src/testing/run_result.test.ts`; all 28 tests pass
  • Verify skipNextEventIf() correctly skips matching events and returns undefined for non-matches
  • Verify range() and contains*() methods find events within specified ranges
  • Verify judge() passes/fails based on semantic intent evaluation
  • Verify negative indices work correctly with at() and range()

Summary by CodeRabbit

  • New Features
    • Added advanced test utilities for agent testing, including range-scoped event assertions for more flexible validation
    • Introduced LLM-powered intent evaluation for message assertions
    • Added conditional event skipping and range-based searching capabilities
    • Expanded test examples demonstrating multi-tool agent workflows and comprehensive assertion patterns



changeset-bot bot commented Jan 15, 2026

🦋 Changeset detected

Latest commit: 0146768

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 17 packages.

| Name | Type |
| --- | --- |
| @livekit/agents | Patch |
| @livekit/agents-plugin-anam | Patch |
| @livekit/agents-plugin-baseten | Patch |
| @livekit/agents-plugin-bey | Patch |
| @livekit/agents-plugin-cartesia | Patch |
| @livekit/agents-plugin-deepgram | Patch |
| @livekit/agents-plugin-elevenlabs | Patch |
| @livekit/agents-plugin-google | Patch |
| @livekit/agents-plugin-inworld | Patch |
| @livekit/agents-plugin-livekit | Patch |
| @livekit/agents-plugin-neuphonic | Patch |
| @livekit/agents-plugin-openai | Patch |
| @livekit/agents-plugin-resemble | Patch |
| @livekit/agents-plugin-rime | Patch |
| @livekit/agents-plugin-silero | Patch |
| @livekit/agents-plugins-test | Patch |
| @livekit/agents-plugin-xai | Patch |



coderabbitai bot commented Jan 16, 2026

📝 Walkthrough


This change introduces advanced test utilities for the LiveKit agents voice framework. New assertions enable range-scoped event validation, conditional event skipping, and LLM-based intent judgment on messages. A new EventRangeAssert class provides scoped assertion methods, while RunAssert gains range(), skipNextEventIf(), and convenience helpers. MessageAssert includes a judge() method for LLM-evaluated intent validation.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Changeset<br>`.changeset/sour-islands-cheer.md` | New changeset entry documenting the patch version for the advanced test utilities |
| Testing Framework Exports<br>`agents/src/voice/testing/index.ts` | Added EventRangeAssert re-export to the public API |
| Core Test Utilities<br>`agents/src/voice/testing/run_result.ts` | Introduced the EventRangeAssert class with range-scoped assertion methods; enhanced RunAssert with skipNextEventIf(), range(), and convenience contains*() methods; added a judge() method to MessageAssert for LLM-based intent evaluation; expanded imports for zod, ChatRole, LLM, and tool_context |
| Test Coverage<br>`examples/src/testing/run_result.test.ts` | Comprehensive test suite covering the new assertion capabilities: range-based queries, conditional skipping, negative indexing, LLM judgment, and a multi-tool restaurant-ordering agent workflow |

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Test as Test Code
    participant MA as MessageAssert
    participant LLM as LLM (gpt-4o-mini)
    participant Tool as check_intent Tool

    Test->>MA: judge(llm, { intent: "..." })
    activate MA

    MA->>MA: Extract message content
    MA->>MA: Validate inputs

    MA->>LLM: Create message with check_intent tool
    activate LLM

    LLM->>Tool: Invoke check_intent(content, intent)
    activate Tool

    Tool-->>LLM: Tool result with validation
    deactivate Tool

    LLM-->>MA: Streaming tool call result
    deactivate LLM

    MA->>MA: Parse and verify result
    MA-->>Test: Return MessageAssert
    deactivate MA
```

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~30 minutes

Poem

🐰 A range of assertions hops into view,
Skipping events with conditions so true,
The judge whispers to LLM with care,
Intent validation floats through the air,
Testing utilities multiply with flair! ✨

🚥 Pre-merge checks: ✅ 3 passed

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and concisely describes the main addition: advanced assertion utilities for the agent testing framework. |
| Description check | ✅ Passed | The PR description comprehensively covers all major changes with bullet points, includes a detailed test plan with verification results, and aligns well with the template requirements. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage; skipping the docstring coverage check. |



🧹 Recent nitpick comments
examples/src/testing/run_result.test.ts (1)

147-147: Use full model name for LLM testing.

As per coding guidelines, when testing inference LLM, always use full model names from agents/src/inference/models.ts (e.g., 'openai/gpt-4o-mini' instead of 'gpt-4o-mini').

Suggested fix
```diff
-    llmInstance = new openai.LLM({ model: 'gpt-4o-mini', temperature: 0 });
+    llmInstance = new openai.LLM({ model: 'openai/gpt-4o-mini', temperature: 0 });
```

📜 Recent review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between df523ab and 0146768.

📒 Files selected for processing (4)
  • .changeset/sour-islands-cheer.md
  • agents/src/voice/testing/index.ts
  • agents/src/voice/testing/run_result.ts
  • examples/src/testing/run_result.test.ts
🧰 Additional context used
📓 Path-based instructions (4)
**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (.cursor/rules/agent-core.mdc)

Add SPDX-FileCopyrightText and SPDX-License-Identifier headers to all newly added files with '// SPDX-FileCopyrightText: 2025 LiveKit, Inc.' and '// SPDX-License-Identifier: Apache-2.0'
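
For example, at the top of each newly added file:

```ts
// SPDX-FileCopyrightText: 2025 LiveKit, Inc.
// SPDX-License-Identifier: Apache-2.0
```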

Files:

  • examples/src/testing/run_result.test.ts
  • agents/src/voice/testing/run_result.ts
  • agents/src/voice/testing/index.ts
**/*.{ts,tsx}?(test|example|spec)

📄 CodeRabbit inference engine (.cursor/rules/agent-core.mdc)

When testing inference LLM, always use full model names from agents/src/inference/models.ts (e.g., 'openai/gpt-4o-mini' instead of 'gpt-4o-mini')

Files:

  • examples/src/testing/run_result.test.ts
  • agents/src/voice/testing/run_result.ts
  • agents/src/voice/testing/index.ts
**/*.{ts,tsx}?(test|example)

📄 CodeRabbit inference engine (.cursor/rules/agent-core.mdc)

Initialize logger before using any LLM functionality with initializeLogger({ pretty: true }) from '@livekit/agents'
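
For example (the import and call are taken verbatim from the guideline above):

```ts
import { initializeLogger } from '@livekit/agents';

// Must run before any LLM functionality is exercised.
initializeLogger({ pretty: true });
```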

Files:

  • examples/src/testing/run_result.test.ts
  • agents/src/voice/testing/run_result.ts
  • agents/src/voice/testing/index.ts
**/{examples,test}/**/*.test.ts

📄 CodeRabbit inference engine (.cursor/rules/agent-core.mdc)

Include both basic streaming and tool calling tests to verify full LLM functionality

Files:

  • examples/src/testing/run_result.test.ts
🧠 Learnings (6)
📓 Common learnings
Learnt from: CR
Repo: livekit/agents-js PR: 0
File: .cursor/rules/agent-core.mdc:0-0
Timestamp: 2026-01-16T14:33:39.551Z
Learning: Applies to **/{examples,test}/**/*.test.ts : Include both basic streaming and tool calling tests to verify full LLM functionality
📚 Learning: 2026-01-16T14:33:39.551Z
Learnt from: CR
Repo: livekit/agents-js PR: 0
File: .cursor/rules/agent-core.mdc:0-0
Timestamp: 2026-01-16T14:33:39.551Z
Learning: Applies to **/{examples,test}/**/*.test.ts : Include both basic streaming and tool calling tests to verify full LLM functionality

Applied to files:

  • examples/src/testing/run_result.test.ts
  • agents/src/voice/testing/run_result.ts
📚 Learning: 2026-01-16T14:33:39.551Z
Learnt from: CR
Repo: livekit/agents-js PR: 0
File: .cursor/rules/agent-core.mdc:0-0
Timestamp: 2026-01-16T14:33:39.551Z
Learning: Applies to **/*.{ts,tsx}?(test|example|spec) : When testing inference LLM, always use full model names from `agents/src/inference/models.ts` (e.g., 'openai/gpt-4o-mini' instead of 'gpt-4o-mini')

Applied to files:

  • examples/src/testing/run_result.test.ts
  • agents/src/voice/testing/run_result.ts
📚 Learning: 2026-01-16T14:33:39.551Z
Learnt from: CR
Repo: livekit/agents-js PR: 0
File: .cursor/rules/agent-core.mdc:0-0
Timestamp: 2026-01-16T14:33:39.551Z
Learning: Applies to examples/src/test_*.ts : For plugin component debugging (STT, TTS, LLM), create test example files prefixed with `test_` under the examples directory and run with `pnpm build && node ./examples/src/test_my_plugin.ts`

Applied to files:

  • examples/src/testing/run_result.test.ts
📚 Learning: 2026-01-16T14:33:39.551Z
Learnt from: CR
Repo: livekit/agents-js PR: 0
File: .cursor/rules/agent-core.mdc:0-0
Timestamp: 2026-01-16T14:33:39.551Z
Learning: Applies to **/*.{ts,tsx}?(test|example) : Initialize logger before using any LLM functionality with `initializeLogger({ pretty: true })` from '@livekit/agents'

Applied to files:

  • examples/src/testing/run_result.test.ts
  • agents/src/voice/testing/run_result.ts
📚 Learning: 2026-01-16T14:33:39.551Z
Learnt from: CR
Repo: livekit/agents-js PR: 0
File: .cursor/rules/agent-core.mdc:0-0
Timestamp: 2026-01-16T14:33:39.551Z
Learning: Use `pnpm build && pnpm dlx tsx ./examples/src/my_agent.ts dev|download-files --log-level=debug|info(default)` to run example agents from the examples directory

Applied to files:

  • examples/src/testing/run_result.test.ts
🧬 Code graph analysis (2)
examples/src/testing/run_result.test.ts (1)
agents/src/voice/testing/run_result.ts (2)
  • RunResult (47-225)
  • expect (76-91)
agents/src/voice/testing/run_result.ts (1)
agents/src/voice/testing/types.ts (5)
  • FunctionCallAssertOptions (94-97)
  • MessageAssertOptions (87-89)
  • FunctionCallOutputAssertOptions (102-105)
  • AgentHandoffAssertOptions (110-113)
  • RunEvent (50-54)
🔇 Additional comments (11)
.changeset/sour-islands-cheer.md (1)

1-5: LGTM!

Changeset is properly formatted with the correct package name and patch version bump appropriate for additive test utility features.

examples/src/testing/run_result.test.ts (2)

34-140: Well-designed test agent with diverse tools.

The TestAgent class provides excellent coverage for testing various assertion scenarios with a good mix of tools (weather, time, ordering system). The zod schemas are properly defined with appropriate descriptions.


156-467: Comprehensive test coverage for assertion utilities.

The tests thoroughly cover all new assertion methods including skipNextEventIf(), range(), contains*() methods, and judge() with both success and failure cases. Error messages are properly verified. Based on learnings, this includes both basic streaming and tool calling tests as required.

agents/src/voice/testing/run_result.ts (7)

30-32: LGTM!

The AgentConstructor type alias is appropriately defined with the eslint disable comment for the necessary any usage.


313-365: LGTM!

The skipNextEventIf implementation correctly handles all event types with proper cursor management. The try-catch approach elegantly reuses existing assertion logic while returning undefined for non-matches.
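
A minimal sketch of that try/catch pattern, with illustrative names; the real method dispatches to per-type assertions and only leaves the cursor advanced when the assertion succeeds:

```ts
// Delegate to an existing throwing assertion; a mismatch becomes
// `undefined` rather than a test failure.
function trySkip<T>(assertFn: () => T): T | undefined {
  try {
    return assertFn();
  } catch {
    return undefined; // swallow the assertion error for non-matching events
  }
}
```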


382-387: LGTM!

The range() method correctly implements Python-like slice semantics with proper defaults.
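
For reference, a sketch of what Python-like slice normalization typically looks like; it mirrors Array.prototype.slice and is an assumption about the actual defaults in range():

```ts
// Normalize (start, end) the way Array.prototype.slice does: negative
// indices count from the end, and out-of-bounds values are clamped.
function normalizeRange(length: number, start = 0, end = length): [number, number] {
  const norm = (i: number) => (i < 0 ? Math.max(length + i, 0) : Math.min(i, length));
  const s = norm(start);
  return [s, Math.max(norm(end), s)]; // empty range when end precedes start
}

// normalizeRange(10, 1, -1) -> [1, 9], matching events.slice(1, -1)
```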


603-709: LGTM!

The EventRangeAssert class provides a clean API for range-based event searches. The implementation correctly tracks original indices for accurate error reporting and reuses existing assertion logic via try-catch.


740-833: Well-implemented LLM-based judgment capability.

The judge() method provides a clean API for semantic evaluation. The tool-based approach ensures structured output. The error handling and verbose logging are appropriate.

One minor observation: the content extraction at line 749 filters for string content only. If message content contains other types (like tool results), they would be excluded, which seems intentional for judging natural language responses.
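
A sketch of that string-only extraction; the array-of-parts shape of the message content is an assumption about the chat message type:

```ts
// Keep only natural-language text; structured parts (e.g. tool results)
// are deliberately excluded before judging.
function extractText(content: unknown[]): string {
  return content.filter((part): part is string => typeof part === 'string').join('\n');
}
```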


151-157: LGTM!

Clean event creation without unnecessary type assertions.


803-816: Verify how the LLM SDK streams tool call arguments before relying on the try-catch pattern.

The current approach of parsing toolCall.args once per chunk and silently ignoring errors assumes either that the SDK provides complete JSON per chunk or buffers partial arguments. However, OpenAI's documented streaming behavior emits tool call arguments incrementally as partial JSON strings across multiple chunks that must be reassembled. If the LiveKit SDK follows this pattern and provides incremental args (not cumulative), only the final chunk with complete JSON would successfully parse, making earlier parse errors silent no-ops. Confirm:

  • Does the SDK provide cumulative args (each chunk contains complete JSON so far) or incremental args (only delta)?
  • If incremental, accumulate args across chunks by index/id before attempting JSON.parse() (see the sketch below).
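
A hedged sketch of that delta accumulation, assuming each streamed chunk may carry tool calls with an `index` and a partial-JSON `args` fragment (OpenAI-style); the field names are assumptions about the SDK's chunk shape:

```ts
// Buffer argument fragments per tool-call index across the whole stream.
async function collectToolArgs(
  stream: AsyncIterable<{ toolCalls?: { index: number; args: string }[] }>,
): Promise<Map<number, unknown>> {
  const buffers = new Map<number, string>();
  for await (const chunk of stream) {
    for (const call of chunk.toolCalls ?? []) {
      buffers.set(call.index, (buffers.get(call.index) ?? '') + call.args);
    }
  }
  // Parse only after the stream ends, when each buffer should be complete
  // JSON; a parse error here is a real failure, not a silent no-op.
  return new Map([...buffers].map(([i, raw]) => [i, JSON.parse(raw)]));
}
```
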
agents/src/voice/testing/index.ts (1)

23-33: LGTM!

The EventRangeAssert export is correctly added, making the new range-based assertion class available as part of the public API.




@toubatbrian toubatbrian requested a review from lukasIO January 16, 2026 18:01
@toubatbrian
Contributor Author

@codex


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0146768e66


Comment on lines +822 to +825
```ts
const { success, reason } = toolArgs;

if (!success) {
  this._raise(`Judgment failed: ${reason}`);
```


P2: Validate judge tool result types before treating as pass

The judge() result uses if (!success) to decide pass/fail without verifying that success is actually a boolean. When the provider doesn’t strictly enforce tool schemas (default strictToolSchema is false), the model can return "success": "false" or another truthy non-boolean; in that case this code treats the judgment as successful and the test incorrectly passes. This can silently corrupt test outcomes for providers/models that serialize booleans as strings. Consider validating the parsed args against the Zod schema (or at least typeof success === 'boolean') before using it.
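
A hedged sketch of the suggested validation, using a zod schema mirroring the check_intent tool's output; the field names come from the excerpt above, the schema itself is an assumption:

```ts
import { z } from 'zod';

const judgeResultSchema = z.object({
  success: z.boolean(), // rejects "false"-as-string and other truthy non-booleans
  reason: z.string().optional(),
});

// Returns typed args or throws with a descriptive message, so a malformed
// tool result can never be mistaken for a passing judgment.
function parseJudgeResult(toolArgs: unknown) {
  const parsed = judgeResultSchema.safeParse(toolArgs);
  if (!parsed.success) {
    throw new Error(`Judge tool returned malformed arguments: ${JSON.stringify(toolArgs)}`);
  }
  return parsed.data;
}
```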

