
Conversation


@toubatbrian toubatbrian commented Jan 15, 2026

  • Add skipNextEventIf() method for conditional event skipping based on type/criteria
  • Add EventRangeAssert class with containsFunctionCall(), containsMessage(), containsFunctionCallOutput(), containsAgentHandoff() methods for searching within event ranges
  • Add range() method on RunAssert for subset event searches with Python-like slice semantics
  • Add judge() method on MessageAssert for LLM-based semantic evaluation of message intent (see the usage sketch after this list)
  • Export EventRangeAssert from testing module
  • Update @livekit/rtc-node to ^1.0.0-alpha.1 for ParticipantKind.CONNECTOR support
  • Add comprehensive integration tests covering all new assertion utilities
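
A hedged usage sketch of how these utilities might compose in a test. The option shapes, the `at(-1).isMessage()` chain, `run.expect`, and the plugin import path are assumptions for illustration, not the exact API:

```ts
import * as openai from '@livekit/agents-plugin-openai'; // plugin import path assumed

// `run` is assumed to be a completed RunResult from a prior agent turn;
// typed loosely here because the real assertion types live in
// agents/src/voice/testing/run_result.ts.
async function exampleAssertions(run: any) {
  const assert = run.expect;

  // Conditionally skip the next event when it matches the criteria;
  // a non-match returns undefined instead of failing the test.
  assert.skipNextEventIf({ type: 'message', role: 'assistant' });

  // Python-like slice semantics: search events 1..-1 (negative indices
  // count from the end) for a function call within that range.
  assert.range(1, -1).containsFunctionCall({ name: 'lookup_weather' });

  // LLM-based semantic judgment of the last message's intent, using the
  // full model name per the repo's coding guidelines.
  const llmInstance = new openai.LLM({ model: 'openai/gpt-4o-mini', temperature: 0 });
  await assert.at(-1).isMessage().judge(llmInstance, { intent: 'politely confirms the order' });
}
```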

Test plan

  • Run `pnpm vitest run examples/src/testing/run_result.test.ts`; all 28 tests pass
  • Verify skipNextEventIf() correctly skips matching events and returns undefined for non-matches
  • Verify range() and contains*() methods find events within specified ranges
  • Verify judge() passes/fails based on semantic intent evaluation
  • Verify negative indices work correctly with at() and range()

Summary by CodeRabbit

  • New Features
    • Added advanced test utilities for agent testing, including range-scoped event assertions for more flexible validation
    • Introduced LLM-powered intent evaluation for message assertions
    • Added conditional event skipping and range-based searching capabilities
    • Expanded test examples demonstrating multi-tool agent workflows and comprehensive assertion patterns



changeset-bot bot commented Jan 15, 2026

🦋 Changeset detected

Latest commit: 0146768

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 17 packages.

| Name | Type |
| --- | --- |
| @livekit/agents | Patch |
| @livekit/agents-plugin-anam | Patch |
| @livekit/agents-plugin-baseten | Patch |
| @livekit/agents-plugin-bey | Patch |
| @livekit/agents-plugin-cartesia | Patch |
| @livekit/agents-plugin-deepgram | Patch |
| @livekit/agents-plugin-elevenlabs | Patch |
| @livekit/agents-plugin-google | Patch |
| @livekit/agents-plugin-inworld | Patch |
| @livekit/agents-plugin-livekit | Patch |
| @livekit/agents-plugin-neuphonic | Patch |
| @livekit/agents-plugin-openai | Patch |
| @livekit/agents-plugin-resemble | Patch |
| @livekit/agents-plugin-rime | Patch |
| @livekit/agents-plugin-silero | Patch |
| @livekit/agents-plugins-test | Patch |
| @livekit/agents-plugin-xai | Patch |



coderabbitai bot commented Jan 16, 2026

📝 Walkthrough


This change introduces advanced test utilities for the LiveKit agents voice framework. New assertions enable range-scoped event validation, conditional event skipping, and LLM-based intent judgment on messages. A new EventRangeAssert class provides scoped assertion methods, while RunAssert gains range(), skipNextEventIf(), and convenience helpers. MessageAssert includes a judge() method for LLM-evaluated intent validation.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Changeset<br>`.changeset/sour-islands-cheer.md` | New changeset entry documenting the patch version for the advanced test utilities |
| Testing Framework Exports<br>`agents/src/voice/testing/index.ts` | Added EventRangeAssert re-export to the public API |
| Core Test Utilities<br>`agents/src/voice/testing/run_result.ts` | Introduced the EventRangeAssert class with range-scoped assertion methods; enhanced RunAssert with skipNextEventIf(), range(), and convenience contains*() methods; added a judge() method to MessageAssert for LLM-based intent evaluation; expanded imports for zod, ChatRole, LLM, and tool_context |
| Test Coverage<br>`examples/src/testing/run_result.test.ts` | Comprehensive test suite covering the new assertion capabilities: range-based queries, conditional skipping, negative indexing, LLM judgment, and a multi-tool restaurant-ordering agent workflow |

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Test as Test Code
    participant MA as MessageAssert
    participant LLM as LLM (gpt-4o-mini)
    participant Tool as check_intent Tool

    Test->>MA: judge(llm, { intent: "..." })
    activate MA

    MA->>MA: Extract message content
    MA->>MA: Validate inputs

    MA->>LLM: Create message with check_intent tool
    activate LLM

    LLM->>Tool: Invoke check_intent(content, intent)
    activate Tool

    Tool-->>LLM: Tool result with validation
    deactivate Tool

    LLM-->>MA: Streaming tool call result
    deactivate LLM

    MA->>MA: Parse and verify result
    MA-->>Test: Return MessageAssert
    deactivate MA
```

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~30 minutes

Poem

🐰 A range of assertions hops into view,
Skipping events with conditions so true,
The judge whispers to LLM with care,
Intent validation floats through the air,
Testing utilities multiply with flair! ✨

🚥 Pre-merge checks: ✅ 3 passed

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and concisely describes the main addition: advanced assertion utilities for the agent testing framework. |
| Description check | ✅ Passed | The PR description comprehensively covers all major changes with bullet points, includes a detailed test plan with verification results, and aligns well with the template requirements. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage; skipping the docstring coverage check. |



🧹 Recent nitpick comments
examples/src/testing/run_result.test.ts (1)

147-147: Use full model name for LLM testing.

As per coding guidelines, when testing inference LLM, always use full model names from agents/src/inference/models.ts (e.g., 'openai/gpt-4o-mini' instead of 'gpt-4o-mini').

Suggested fix
```diff
-    llmInstance = new openai.LLM({ model: 'gpt-4o-mini', temperature: 0 });
+    llmInstance = new openai.LLM({ model: 'openai/gpt-4o-mini', temperature: 0 });
```

📜 Recent review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between df523ab and 0146768.

📒 Files selected for processing (4)
  • .changeset/sour-islands-cheer.md
  • agents/src/voice/testing/index.ts
  • agents/src/voice/testing/run_result.ts
  • examples/src/testing/run_result.test.ts
🧰 Additional context used
📓 Path-based instructions (4)
**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (.cursor/rules/agent-core.mdc)

Add SPDX-FileCopyrightText and SPDX-License-Identifier headers to all newly added files with '// SPDX-FileCopyrightText: 2025 LiveKit, Inc.' and '// SPDX-License-Identifier: Apache-2.0'
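
For example, at the top of each newly added file:

```ts
// SPDX-FileCopyrightText: 2025 LiveKit, Inc.
// SPDX-License-Identifier: Apache-2.0
```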

Files:

  • examples/src/testing/run_result.test.ts
  • agents/src/voice/testing/run_result.ts
  • agents/src/voice/testing/index.ts
**/*.{ts,tsx}?(test|example|spec)

📄 CodeRabbit inference engine (.cursor/rules/agent-core.mdc)

When testing inference LLM, always use full model names from agents/src/inference/models.ts (e.g., 'openai/gpt-4o-mini' instead of 'gpt-4o-mini')

Files:

  • examples/src/testing/run_result.test.ts
  • agents/src/voice/testing/run_result.ts
  • agents/src/voice/testing/index.ts
**/*.{ts,tsx}?(test|example)

📄 CodeRabbit inference engine (.cursor/rules/agent-core.mdc)

Initialize logger before using any LLM functionality with initializeLogger({ pretty: true }) from '@livekit/agents'
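
For example (the import and call are taken verbatim from the guideline above):

```ts
import { initializeLogger } from '@livekit/agents';

// Must run before any LLM functionality is exercised.
initializeLogger({ pretty: true });
```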

Files:

  • examples/src/testing/run_result.test.ts
  • agents/src/voice/testing/run_result.ts
  • agents/src/voice/testing/index.ts
**/{examples,test}/**/*.test.ts

📄 CodeRabbit inference engine (.cursor/rules/agent-core.mdc)

Include both basic streaming and tool calling tests to verify full LLM functionality

Files:

  • examples/src/testing/run_result.test.ts
🧠 Learnings (6)
📓 Common learnings
Learnt from: CR
Repo: livekit/agents-js PR: 0
File: .cursor/rules/agent-core.mdc:0-0
Timestamp: 2026-01-16T14:33:39.551Z
Learning: Applies to **/{examples,test}/**/*.test.ts : Include both basic streaming and tool calling tests to verify full LLM functionality
📚 Learning: 2026-01-16T14:33:39.551Z
Learnt from: CR
Repo: livekit/agents-js PR: 0
File: .cursor/rules/agent-core.mdc:0-0
Timestamp: 2026-01-16T14:33:39.551Z
Learning: Applies to **/{examples,test}/**/*.test.ts : Include both basic streaming and tool calling tests to verify full LLM functionality

Applied to files:

  • examples/src/testing/run_result.test.ts
  • agents/src/voice/testing/run_result.ts
📚 Learning: 2026-01-16T14:33:39.551Z
Learnt from: CR
Repo: livekit/agents-js PR: 0
File: .cursor/rules/agent-core.mdc:0-0
Timestamp: 2026-01-16T14:33:39.551Z
Learning: Applies to **/*.{ts,tsx}?(test|example|spec) : When testing inference LLM, always use full model names from `agents/src/inference/models.ts` (e.g., 'openai/gpt-4o-mini' instead of 'gpt-4o-mini')

Applied to files:

  • examples/src/testing/run_result.test.ts
  • agents/src/voice/testing/run_result.ts
📚 Learning: 2026-01-16T14:33:39.551Z
Learnt from: CR
Repo: livekit/agents-js PR: 0
File: .cursor/rules/agent-core.mdc:0-0
Timestamp: 2026-01-16T14:33:39.551Z
Learning: Applies to examples/src/test_*.ts : For plugin component debugging (STT, TTS, LLM), create test example files prefixed with `test_` under the examples directory and run with `pnpm build && node ./examples/src/test_my_plugin.ts`

Applied to files:

  • examples/src/testing/run_result.test.ts
📚 Learning: 2026-01-16T14:33:39.551Z
Learnt from: CR
Repo: livekit/agents-js PR: 0
File: .cursor/rules/agent-core.mdc:0-0
Timestamp: 2026-01-16T14:33:39.551Z
Learning: Applies to **/*.{ts,tsx}?(test|example) : Initialize logger before using any LLM functionality with `initializeLogger({ pretty: true })` from '@livekit/agents'

Applied to files:

  • examples/src/testing/run_result.test.ts
  • agents/src/voice/testing/run_result.ts
📚 Learning: 2026-01-16T14:33:39.551Z
Learnt from: CR
Repo: livekit/agents-js PR: 0
File: .cursor/rules/agent-core.mdc:0-0
Timestamp: 2026-01-16T14:33:39.551Z
Learning: Use `pnpm build && pnpm dlx tsx ./examples/src/my_agent.ts dev|download-files --log-level=debug|info(default)` to run example agents from the examples directory

Applied to files:

  • examples/src/testing/run_result.test.ts
🧬 Code graph analysis (2)
examples/src/testing/run_result.test.ts (1)
agents/src/voice/testing/run_result.ts (2)
  • RunResult (47-225)
  • expect (76-91)
agents/src/voice/testing/run_result.ts (1)
agents/src/voice/testing/types.ts (5)
  • FunctionCallAssertOptions (94-97)
  • MessageAssertOptions (87-89)
  • FunctionCallOutputAssertOptions (102-105)
  • AgentHandoffAssertOptions (110-113)
  • RunEvent (50-54)
🔇 Additional comments (11)
.changeset/sour-islands-cheer.md (1)

1-5: LGTM!

Changeset is properly formatted with the correct package name and patch version bump appropriate for additive test utility features.

examples/src/testing/run_result.test.ts (2)

34-140: Well-designed test agent with diverse tools.

The TestAgent class provides excellent coverage for testing various assertion scenarios with a good mix of tools (weather, time, ordering system). The zod schemas are properly defined with appropriate descriptions.


156-467: Comprehensive test coverage for assertion utilities.

The tests thoroughly cover all new assertion methods including skipNextEventIf(), range(), contains*() methods, and judge() with both success and failure cases. Error messages are properly verified. Based on learnings, this includes both basic streaming and tool calling tests as required.

agents/src/voice/testing/run_result.ts (7)

30-32: LGTM!

The AgentConstructor type alias is appropriately defined with the eslint disable comment for the necessary any usage.


313-365: LGTM!

The skipNextEventIf implementation correctly handles all event types with proper cursor management. The try-catch approach elegantly reuses existing assertion logic while returning undefined for non-matches.
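
A minimal sketch of that try/catch pattern, with illustrative names; the real method dispatches to per-type assertions and only leaves the cursor advanced when the assertion succeeds:

```ts
// Delegate to an existing throwing assertion; a mismatch becomes
// `undefined` rather than a test failure.
function trySkip<T>(assertFn: () => T): T | undefined {
  try {
    return assertFn();
  } catch {
    return undefined; // swallow the assertion error for non-matching events
  }
}
```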


382-387: LGTM!

The range() method correctly implements Python-like slice semantics with proper defaults.
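
For reference, a sketch of what Python-like slice normalization typically looks like; it mirrors Array.prototype.slice and is an assumption about the actual defaults in range():

```ts
// Normalize (start, end) the way Array.prototype.slice does: negative
// indices count from the end, and out-of-bounds values are clamped.
function normalizeRange(length: number, start = 0, end = length): [number, number] {
  const norm = (i: number) => (i < 0 ? Math.max(length + i, 0) : Math.min(i, length));
  const s = norm(start);
  return [s, Math.max(norm(end), s)]; // empty range when end precedes start
}

// normalizeRange(10, 1, -1) -> [1, 9], matching events.slice(1, -1)
```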


603-709: LGTM!

The EventRangeAssert class provides a clean API for range-based event searches. The implementation correctly tracks original indices for accurate error reporting and reuses existing assertion logic via try-catch.


740-833: Well-implemented LLM-based judgment capability.

The judge() method provides a clean API for semantic evaluation. The tool-based approach ensures structured output. The error handling and verbose logging are appropriate.

One minor observation: the content extraction at line 749 filters for string content only. If message content contains other types (like tool results), they would be excluded, which seems intentional for judging natural language responses.
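
A sketch of that string-only extraction; the array-of-parts shape of the message content is an assumption about the chat message type:

```ts
// Keep only natural-language text; structured parts (e.g. tool results)
// are deliberately excluded before judging.
function extractText(content: unknown[]): string {
  return content.filter((part): part is string => typeof part === 'string').join('\n');
}
```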


151-157: LGTM!

Clean event creation without unnecessary type assertions.


803-816: Verify how the LLM SDK streams tool call arguments before relying on the try-catch pattern.

The current approach of parsing toolCall.args once per chunk and silently ignoring errors assumes either that the SDK provides complete JSON per chunk or buffers partial arguments. However, OpenAI's documented streaming behavior emits tool call arguments incrementally as partial JSON strings across multiple chunks that must be reassembled. If the LiveKit SDK follows this pattern and provides incremental args (not cumulative), only the final chunk with complete JSON would successfully parse, making earlier parse errors silent no-ops. Confirm:

  • Does the SDK provide cumulative args (each chunk contains complete JSON so far) or incremental args (only delta)?
  • If incremental, accumulate args across chunks by index/id before attempting JSON.parse() (see the sketch below).
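
A hedged sketch of that delta accumulation, assuming each streamed chunk may carry tool calls with an `index` and a partial-JSON `args` fragment (OpenAI-style); the field names are assumptions about the SDK's chunk shape:

```ts
// Buffer argument fragments per tool-call index across the whole stream.
async function collectToolArgs(
  stream: AsyncIterable<{ toolCalls?: { index: number; args: string }[] }>,
): Promise<Map<number, unknown>> {
  const buffers = new Map<number, string>();
  for await (const chunk of stream) {
    for (const call of chunk.toolCalls ?? []) {
      buffers.set(call.index, (buffers.get(call.index) ?? '') + call.args);
    }
  }
  // Parse only after the stream ends, when each buffer should be complete
  // JSON; a parse error here is a real failure, not a silent no-op.
  return new Map([...buffers].map(([i, raw]) => [i, JSON.parse(raw)]));
}
```
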
agents/src/voice/testing/index.ts (1)

23-33: LGTM!

The EventRangeAssert export is correctly added, making the new range-based assertion class available as part of the public API.




@toubatbrian toubatbrian requested a review from lukasIO January 16, 2026 18:01
@toubatbrian
Contributor Author

@codex


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0146768e66


Comment on lines +822 to +825
```ts
const { success, reason } = toolArgs;

if (!success) {
  this._raise(`Judgment failed: ${reason}`);
```


P2: Validate judge tool result types before treating as pass

The judge() result uses if (!success) to decide pass/fail without verifying that success is actually a boolean. When the provider doesn’t strictly enforce tool schemas (default strictToolSchema is false), the model can return "success": "false" or another truthy non-boolean; in that case this code treats the judgment as successful and the test incorrectly passes. This can silently corrupt test outcomes for providers/models that serialize booleans as strings. Consider validating the parsed args against the Zod schema (or at least typeof success === 'boolean') before using it.
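
A hedged sketch of the suggested validation, using a zod schema mirroring the check_intent tool's output; the field names come from the excerpt above, the schema itself is an assumption:

```ts
import { z } from 'zod';

const judgeResultSchema = z.object({
  success: z.boolean(), // rejects "false"-as-string and other truthy non-booleans
  reason: z.string().optional(),
});

// Returns typed args or throws with a descriptive message, so a malformed
// tool result can never be mistaken for a passing judgment.
function parseJudgeResult(toolArgs: unknown) {
  const parsed = judgeResultSchema.safeParse(toolArgs);
  if (!parsed.success) {
    throw new Error(`Judge tool returned malformed arguments: ${JSON.stringify(toolArgs)}`);
  }
  return parsed.data;
}
```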

