fix(test): enable shuffle mode and fix test isolation bugs #6601

mldangelo · 2025-12-10T18:56:06Z

Summary

- Enable test shuffle by default in vitest config (sequence: { shuffle: true })
- Fix test isolation bugs across 9 test files that caused failures when tests ran in different orders
- Update test/AGENTS.md documentation with mock isolation best practices
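
For orientation, a minimal sketch of what the shuffle setting might look like in a Vitest config file. The PR's actual vitest.config.ts is not shown here, so everything other than `sequence: { shuffle: true }` is an assumption.

```typescript
// vitest.config.ts (illustrative sketch, not the repository's actual file)
import { defineConfig } from 'vitest/config';

export default defineConfig({
  test: {
    sequence: {
      // Run tests in random order so hidden order dependencies surface early.
      shuffle: true,
    },
  },
});
```

As the description notes, the setting can be overridden for a single run with `--sequence.shuffle=false`.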

Test isolation fixes

| File | Issue | Fix |
| --- | --- | --- |
| `load.test.ts` | `path.parse` and `mockDereference` mocks persisted | Added `mockReset()` in 4 describe blocks |
| `iterative.test.ts` | 4 hoisted mocks retained implementations | Added `mockReset()` for all hoisted mocks |
| `modelScan.test.ts` | Comprehensive mock pollution | Reset spawn, ModelAudit, HuggingFace mocks in each describe |
| `evaluator.test.ts` | `runExtensionHook` mock in 3 describe blocks | Added `mockReset()` + restore default implementation |
| `generate.test.ts` | `resolveConfigs` mock had no default | Added explicit mock setup in nested describe |
| `updates.test.ts` | `PROMPTFOO_DISABLE_UPDATE` env var pollution | Added `delete process.env.X` in beforeEach |
| `accounts.test.ts` | `readGlobalConfig` mock state leaked | Added explicit mock setup in test |
| `python.test.ts` | `path.resolve`/`path.extname` mocks missing | Added mock setup to 2 tests |
| `testCaseReader.test.ts` | Module cache pollution with xlsx | Call `resetModules()` before `doMock` |
| `watsonx.test.ts` | `WatsonXAI.newInstance` mock missing | Added mock setup in cached response test |

Key learnings documented in AGENTS.md

- vi.clearAllMocks() only clears call history, NOT mockImplementation(); use mockReset()
- mockResolvedValueOnce() queues survive clearAllMocks(); use mockReset() to clear them
- Environment variables are shared state; explicitly delete them in beforeEach
- Module cache can cause vi.importActual to return mocked modules; call resetModules() first
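
The first two learnings can be illustrated with a toy model. The `ToyMock` class below is a hypothetical stand-in written for this sketch, not Vitest's implementation; it only mirrors the behavior described above, where clearing wipes call history while resetting also drops queued values and the configured implementation.

```typescript
// Toy stand-in for a mock function; NOT Vitest's implementation.
class ToyMock<T> {
  calls: unknown[][] = [];
  private impl: (() => T) | undefined;
  private onceQueue: T[] = [];

  mockReturnValue(value: T): this {
    this.impl = () => value;
    return this;
  }

  mockReturnValueOnce(value: T): this {
    this.onceQueue.push(value);
    return this;
  }

  // Analogue of what clearAllMocks() does per mock: call history only.
  mockClear(): this {
    this.calls = [];
    return this;
  }

  // Analogue of mockReset(): history, queued values, and implementation.
  mockReset(): this {
    this.calls = [];
    this.onceQueue = [];
    this.impl = undefined;
    return this;
  }

  call(...args: unknown[]): T | undefined {
    this.calls.push(args);
    if (this.onceQueue.length > 0) {
      return this.onceQueue.shift();
    }
    return this.impl ? this.impl() : undefined;
  }
}

// "Test A" configures the mock and leaves a queued value behind.
const toy = new ToyMock<string>();
toy.mockReturnValue('from test A').mockReturnValueOnce('queued by test A');

// Clearing alone: history is empty, but both leftovers are still served.
toy.mockClear();
const afterClear = [toy.calls.length, toy.call(), toy.call()];
// afterClear is [0, 'queued by test A', 'from test A']

// Resetting: the next call sees a clean slate.
toy.mockReturnValueOnce('queued again');
toy.mockReset();
const afterReset = toy.call();
// afterReset is undefined
```

Under shuffle, "test B" may run right after "test A", which is why leftovers that survive a clear show up as order-dependent failures.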

Test plan

- Verified with 50+ unique random seeds (9403 tests each run)
- All 493 test files pass consistently regardless of execution order
- Tests still pass with shuffle disabled (--sequence.shuffle=false)
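
The PR does not show how the seed sweep was run. As a rough sketch under that caveat, a hypothetical helper (`buildSeedCommands`, invented for this example) could generate one command line per unique seed, using the `--sequence.shuffle` and `--sequence.seed` flags mentioned in this PR:

```typescript
// Hypothetical helper: builds one vitest invocation per unique random seed.
function buildSeedCommands(count: number, maxSeed = 1000000): string[] {
  if (count > maxSeed) {
    throw new Error('cannot draw more unique seeds than maxSeed');
  }
  const seen: Record<number, boolean> = {};
  const commands: string[] = [];
  while (commands.length < count) {
    const seed = Math.floor(Math.random() * maxSeed);
    if (seen[seed]) {
      continue; // skip duplicates so every run uses a distinct order
    }
    seen[seed] = true;
    commands.push(`npx vitest run --sequence.shuffle=true --sequence.seed=${seed}`);
  }
  return commands;
}

const seedCommands = buildSeedCommands(50);
```

Each string could then be handed to a shell or CI matrix; logging the seed alongside any failure makes the failing order reproducible.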

Add proper mock resets in beforeEach blocks to ensure tests are isolated and can run in any order with --sequence.shuffle=true.

Key changes:

- test/util/config/load.test.ts: Reset path.parse mock to actual implementation in beforeEach for combineConfigs, resolveConfigs, and resolveConfigs with external defaultTest blocks
- test/redteam/providers/iterative.test.ts: Add mockReset() for hoisted mocks (mockGetProvider, mockGetTargetResponse, mockCheckPenalizedPhrases, mockGetGraderById) since clearAllMocks only clears call history, not mockReturnValue implementations
- test/commands/modelScan.test.ts: Reset spawn, getModelAuditCurrentVersion, ModelAudit, and HuggingFace mocks in beforeEach for all describe blocks

The root cause was that vi.clearAllMocks() only clears call history but doesn't reset mockReturnValue/mockResolvedValue implementations. When tests set these values, they persist across tests unless explicitly reset with mockReset() or overridden with mockImplementation().

Fixes #2265

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Add sequence.shuffle: true to both vitest.config.ts and vitest.integration.config.ts to catch test isolation issues early.

This is the staff engineer approach:

- Single source of truth in config (not scattered across scripts)
- Applies to all test runs (local, CI, watch mode)
- Self-documenting with clear comments
- Override-able with --sequence.shuffle=false for debugging

Also updated test/AGENTS.md with:

- Documentation about shuffle being enabled by default
- Critical mock isolation guidance (vi.clearAllMocks vs mockReset)
- Override flags for debugging

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Reset runExtensionHook mock in defaultTest normalization tests
- Add resolveConfigs mock setup in doGenerateRedteam external defaultTest tests

Both issues were caused by tests relying on mock implementations set by previous tests, which vi.clearAllMocks() doesn't reset.

- Add runExtensionHook mock reset to main evaluator describe block
- Add fetchWithTimeout mock reset to checkForUpdates describe block
- Clear PROMPTFOO_DISABLE_UPDATE env var in checkForUpdates beforeEach

Environment variables set by tests in one describe block were leaking to tests in other describe blocks when shuffle was enabled.

- evaluator.test.ts: Add runExtensionHook mock reset to defaultTest merging describe block
- accounts.test.ts: Add readGlobalConfig mock setup in setUserEmail test
- python.test.ts: Add path.resolve/path.extname mocks to 2 tests that relied on earlier test state
- testCaseReader.test.ts: Fix xlsx module mock by calling resetModules before doMock

These fixes ensure tests pass consistently regardless of execution order when running with shuffle enabled (vitest --sequence.shuffle=true).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

use-tusk · 2025-12-10T18:56:17Z

⏩ No test execution environment matched (86c8d28)

| Commit | Status | Created (UTC) |
| --- | --- | --- |
| `aa45d45` | ⏩ No test execution environment matched | Dec 10, 2025 6:56PM |
| `258d2aa` | ⏩ No test execution environment matched | Dec 10, 2025 7:01PM |
| `6e10106` | ⏩ No test execution environment matched | Dec 10, 2025 7:11PM |
| `ae0898b` | ⏩ No test execution environment matched | Dec 10, 2025 7:38PM |
| `3f52009` | ⏩ No test execution environment matched | Dec 10, 2025 9:12PM |
| `704085f` | ⏩ No test execution environment matched | Dec 10, 2025 10:32PM |
| `2b3ee5f` | ⏩ No test execution environment matched | Dec 10, 2025 10:54PM |
| `86c8d28` | ⏩ No test execution environment matched | Dec 11, 2025 6:59AM |

promptfoo-scanner

👍 All Clear

I reviewed this PR for LLM security vulnerabilities. The changes focus entirely on test infrastructure improvements - enabling random test execution order and fixing mock isolation issues. No LLM-related code was modified.

_Minimum severity threshold for this scan: 🟡 Medium_

…ation

coderabbitai · 2025-12-10T19:05:14Z

📝 Walkthrough

This PR implements systematic test isolation improvements across the codebase by enabling random test execution and establishing comprehensive mock reset patterns. Changes include enabling test sequence randomization in Vitest configuration files (vitest.config.ts, vitest.integration.config.ts), adding explicit mock resets in beforeEach hooks across multiple test files (assertions, commands, evaluator, config, redteam, updates, and utilities), converting several beforeEach hooks to async for proper initialization, and documenting best practices for mock isolation in test documentation. The overall objective is preventing test state leakage when tests execute in random order.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Areas requiring extra attention:

- Mock reset consistency: Verify that mock reset patterns are correctly applied across all test files and that hoisted mocks receiving mockReset() calls (rather than clearAllMocks()) match what each test actually uses
- Async beforeEach conversions: Confirm that converting beforeEach hooks to async in modelScan.test.ts, load.test.ts, and others doesn't introduce timing issues or race conditions in test execution
- Module reset timing in testCaseReader.test.ts: The relocation of vi.resetModules() to occur before fs mock setup rather than after needs verification that it doesn't affect the test flow or module caching behavior
- Configuration propagation: Ensure sequence.shuffle = true in both vitest config files is complete and that there are no other test runner entry points that might bypass this setting
- Documentation accuracy: Verify that AGENTS.md guidance on mock isolation and disabling randomization flags (--sequence.shuffle=false, --sequence.seed=12345) matches actual Vitest capabilities and usage patterns

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'fix(test): enable shuffle mode and fix test isolation bugs' directly summarizes the main changes: enabling shuffle mode and fixing test isolation bugs across multiple test files. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Description check | ✅ Passed | The pull request description is directly related to the changeset, providing a clear summary of test shuffle enablement, isolation bug fixes, and documentation updates with specific file references and detailed test plan results. |

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (10)

test/updates.test.ts (1)

72-79: Stronger per-suite isolation for fetchWithTimeout and PROMPTFOO_DISABLE_UPDATE

Resetting fetchWithTimeout and clearing PROMPTFOO_DISABLE_UPDATE in this beforeEach makes the checkForUpdates suite deterministic and order‑independent, even when other suites queue mockResolvedValueOnce calls or tweak that env var. The extra mockReset on top of the global one is redundant but harmless and keeps the intent local to this block. Based on learnings, this aligns with the isolation guidance in test/AGENTS.md.
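
A minimal sketch of the env-var side of this pattern. The PR itself only deletes the variable in beforeEach; the `clearEnvKeys` helper and its restore function are hypothetical additions for illustration, and a plain object stands in for `process.env` so the sketch runs without a test runner.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical helper: deletes the given keys and returns a restore function.
function clearEnvKeys(env: Env, keys: string[]): () => void {
  const saved: Env = {};
  keys.forEach((key) => {
    saved[key] = env[key];
    delete env[key];
  });
  return () => {
    keys.forEach((key) => {
      const value = saved[key];
      if (value === undefined) {
        delete env[key];
      } else {
        env[key] = value;
      }
    });
  };
}

// Demo with a plain object standing in for process.env.
const fakeEnv: Env = { PROMPTFOO_DISABLE_UPDATE: 'true', HOME: '/home/dev' };
const restoreEnv = clearEnvKeys(fakeEnv, ['PROMPTFOO_DISABLE_UPDATE']);
const duringTest = fakeEnv['PROMPTFOO_DISABLE_UPDATE']; // undefined while cleared
restoreEnv();
const afterRestore = fakeEnv['PROMPTFOO_DISABLE_UPDATE']; // 'true' again
```

In a real suite the same call could be made with `process.env` inside beforeEach, with the returned function invoked in afterEach.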

test/redteam/commands/generate.test.ts (1)

1497-1526: Resetting resolveConfigs per-suite avoids hoisted mock leakage

Re‑initializing configModule.resolveConfigs in this beforeEach is the right way to stop implementations from other redteam suites leaking into the “external defaultTest” tests, especially with hoisted module mocks and shuffled ordering. The neutral default config you return is minimal but sufficient for these scenarios.

test/evaluator.test.ts (3)

338-347: Resetting runExtensionHook in the main evaluator suite prevents hook leaks

Adding mockReset() followed by the default identity implementation in this beforeEach stops per‑test overrides of runExtensionHook (e.g., in the sessionId tests later in the suite) from leaking across tests. Combined with vi.clearAllMocks(), this gives the evaluator tests a predictable starting hook state under shuffled execution.

4034-4040: Same runExtensionHook reset pattern correctly applied to defaultTest-merging tests

Using the same runExtensionHook reset in the evaluator defaultTest merging suite ensures those tests are not affected by hook behavior from the main evaluator block or vice versa. This is consistent with the isolation goal of the PR and keeps the extension‑related assertions here trustworthy regardless of run order.

4369-4374: Hook normalization is especially important for extension-focused tests

For the defaultTest normalization for extensions suite, normalizing runExtensionHook before each test is critical, since these tests explicitly assert on how extensions manipulate defaultTest. Guaranteeing a clean, array-backed runExtensionHook mock per test avoids very subtle flakiness when other suites modify the same mock. This change aligns nicely with the new AGENTS guidance on extension hooks and defaultTest setup.

test/util/config/load.test.ts (2)

199-217: Path/glob setup in combineConfigs beforeEach improves isolation

The async beforeEach that clears/restores mocks, fixes process.cwd, and rewires globSync + path.parse back to the real implementation via vi.importActual('path') prevents pollution from other suites that also mock these APIs, which is important now that tests run in random order.

If you find yourself tweaking this in more places, consider a small helper (e.g. resetPathAndGlobMocks()) to DRY up the pattern. As per coding guidelines on test independence.

1360-1371: resolveConfigs beforeEach correctly resets process/mocking state

Resetting all mocks, re-spying process.cwd, and restoring path.parse from vi.importActual('path') ensures resolveConfigs tests don't inherit cwd/path/glob state from other describes. This is a good fit for shuffle-enabled runs and for the CLI-exit tests that rely on a clean process spy per test.

Same helper you might use for combineConfigs could also cover this to keep the reset logic in one place.

test/util/testCaseReader.test.ts (1)

505-512: Module reset + vi.importActual usage is correct, but comment is stale

Moving vi.resetModules() before the mocks and using:

```ts
const actualFs = await vi.importActual<typeof import('fs')>('fs');
vi.doMock('fs', () => ({
  ...actualFs,
  existsSync: vi.fn().mockReturnValue(true),
}));
```

is the right way to get the real fs into a fresh module graph for this test. The remaining comment about “use require to get actual fs since vi.importActual may return mocked version” no longer matches the implementation and can be confusing.

```diff
-    // Mock fs module - use require to get actual fs since vi.importActual may return mocked version
-    const actualFs = await vi.importActual<typeof import('fs')>('fs');
+    // Mock fs module using the real Node fs implementation
+    const actualFs = await vi.importActual<typeof import('fs')>('fs');
```

You might also want to align the earlier XLSX test’s vi.doMock('fs', () => ({ ...vi.importActual('fs'), ... })) with this safer pattern.

As per coding guidelines about avoiding test pollution via module caches.
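
To see why the awaited form is safer, note that vi.importActual returns a Promise, and spreading an un-awaited Promise copies none of the module's members. The sketch below shows this in plain TypeScript; `loadActualFs` and `FakeFs` are hypothetical stand-ins for the async loader and the fs module.

```typescript
interface FakeFs {
  existsSync: (path: string) => boolean;
  readFileSync: (path: string) => string;
}

// Hypothetical stand-in for an async loader such as vi.importActual('fs').
async function loadActualFs(): Promise<FakeFs> {
  return {
    existsSync: () => false,
    readFileSync: () => 'real contents',
  };
}

// Pitfall: spreading the un-awaited Promise contributes no module members.
const brokenMock = { ...loadActualFs(), existsSync: () => true };
const brokenKeys = Object.keys(brokenMock);
// brokenKeys is ['existsSync']; readFileSync is missing

// Safer: await first, then spread and override.
async function buildFsMock(): Promise<FakeFs> {
  const actualFs = await loadActualFs();
  return { ...actualFs, existsSync: () => true };
}
```

With the awaited version the mock keeps `readFileSync` from the real module and only overrides `existsSync`.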
test/commands/modelScan.test.ts (1)

37-67: Shared beforeEach reset pattern correctly de-pollutes modelScan mocks

Across the various describe blocks you now:

- vi.clearAllMocks() per test,
- mockReset() the child_process.spawn mock,
- re-import and reset getModelAuditCurrentVersion to a known default,
- reset ModelAudit.findByRevision / ModelAudit.create to the default “no existing scan, fixed id” behavior, and
- reset HuggingFace helpers (isHuggingFaceModel, getHuggingFaceMetadata, parseHuggingFaceModel) to neutral values.

This removes hidden coupling between:

- CLI error-path tests,
- re-scan-on-version-change tests,
- installation detection (checkModelAuditInstalled), and
- temp-file / no-write behavior,

which is critical now that the test runner shuffles order. The process-exit spy setup/teardown per describe looks consistent with the move to process.exitCode.

Given the repetition, consider a small async resetModelScanTestState() helper shared by these beforeEach blocks to keep future changes to the default mock behavior in one place. As per coding guidelines on deterministic, order-independent tests.
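
One possible shape for such a shared helper, as a sketch only. `resetToDefaults` and the `Resettable` interface are hypothetical names; the interface is a minimal structural type that Vitest mocks should satisfy, and the recording stubs exist only so the example runs without a test runner.

```typescript
// Minimal structural type for anything that can be reset and given a default.
interface Resettable {
  mockReset(): unknown;
  mockReturnValue(value: unknown): unknown;
}

// Hypothetical helper: reset each mock, then restore its default return value.
function resetToDefaults(entries: Array<[Resettable, unknown]>): void {
  entries.forEach(([mock, defaultValue]) => {
    mock.mockReset();
    mock.mockReturnValue(defaultValue);
  });
}

// Recording stub used only to demonstrate the helper without a test runner.
function makeStub(label: string, log: string[]): Resettable {
  return {
    mockReset: () => {
      log.push(`${label}.mockReset`);
      return undefined;
    },
    mockReturnValue: (value: unknown) => {
      log.push(`${label}.mockReturnValue(${JSON.stringify(value)})`);
      return undefined;
    },
  };
}

const resetLog: string[] = [];
resetToDefaults([
  [makeStub('spawn', resetLog), null],
  [makeStub('isHuggingFaceModel', resetLog), false],
]);
```

Keeping the defaults in one list means a change to the neutral mock behavior touches a single place instead of every beforeEach block.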

Also applies to: 326-356, 546-572, 616-646, 865-895
test/AGENTS.md (1)
29-31: Shuffle + mock-isolation guidance matches implementation; consider heading tweak

The additions:

- documenting that tests run in random order by default (with --sequence.shuffle/--sequence.seed knobs), and
- the “Critical: Mock Isolation” section clarifying vi.clearAllMocks() vs mockReset() and showing a beforeEach pattern,

are exactly aligned with the changes in the test files (hoisted mocks + per-describe resets) and with the independence requirements in this repo.

To satisfy markdownlint (MD036) and improve structure, you could make the “Critical: Mock Isolation” label an actual heading instead of bold text:

```diff
-**Critical: Mock Isolation**
+### Critical: Mock Isolation
```

Based on learnings about documenting agent/test behavior in AGENTS.md.

Also applies to: 72-84

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 873241e and aa45d45.

📒 Files selected for processing (12)

test/AGENTS.md (2 hunks)
test/assertions/python.test.ts (2 hunks)
test/commands/modelScan.test.ts (5 hunks)
test/evaluator.test.ts (3 hunks)
test/globalConfig/accounts.test.ts (1 hunks)
test/redteam/commands/generate.test.ts (1 hunks)
test/redteam/providers/iterative.test.ts (1 hunks)
test/updates.test.ts (1 hunks)
test/util/config/load.test.ts (6 hunks)
test/util/testCaseReader.test.ts (1 hunks)
vitest.config.ts (1 hunks)
vitest.integration.config.ts (1 hunks)

🧰 Additional context used

📓 Path-based instructions (5)

**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (AGENTS.md)

**/*.{ts,tsx,js,jsx}: Follow consistent import order (Biome handles sorting)
Use consistent curly braces for all control statements
Prefer const over let; avoid var
Use object shorthand syntax whenever possible
Use async/await for asynchronous code

Files:

test/updates.test.ts
test/redteam/commands/generate.test.ts
test/globalConfig/accounts.test.ts
test/redteam/providers/iterative.test.ts
test/commands/modelScan.test.ts
test/util/testCaseReader.test.ts
vitest.config.ts
test/evaluator.test.ts
test/assertions/python.test.ts
vitest.integration.config.ts
test/util/config/load.test.ts

test/**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (AGENTS.md)

Use Vitest for all tests (both test/ and src/app/)

Files:

test/updates.test.ts
test/redteam/commands/generate.test.ts
test/globalConfig/accounts.test.ts
test/redteam/providers/iterative.test.ts
test/commands/modelScan.test.ts
test/util/testCaseReader.test.ts
test/evaluator.test.ts
test/assertions/python.test.ts
test/util/config/load.test.ts

test/**/*.test.{ts,tsx,js}

📄 CodeRabbit inference engine (AGENTS.md)

Backend tests in test/ should use Vitest with globals enabled (describe, it, expect available without imports)

Files:

test/updates.test.ts
test/redteam/commands/generate.test.ts
test/globalConfig/accounts.test.ts
test/redteam/providers/iterative.test.ts
test/commands/modelScan.test.ts
test/util/testCaseReader.test.ts
test/evaluator.test.ts
test/assertions/python.test.ts
test/util/config/load.test.ts

test/**/*.test.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (test/AGENTS.md)

test/**/*.test.{ts,tsx,js,jsx}: Never increase test timeouts - fix the slow test instead
Never use .only() or .skip() in committed code
Always clean up mocks in afterEach using vi.resetAllMocks()
Import test utilities explicitly from 'vitest': describe, it, expect, beforeEach, afterEach, vi
Use Vitest's mocking utilities (vi.mock, vi.fn, vi.spyOn) rather than other mocking libraries
Prefer shallow mocking over deep mocking when using Vitest
Mock external dependencies but not the code being tested
Reset mocks between tests to prevent test pollution
Ensure all tests are independent and can run in any order
Clean up test data and mocks after each test
Test failures should be deterministic
For database tests, use in-memory instances or proper test fixtures

Files:

test/updates.test.ts
test/redteam/commands/generate.test.ts
test/globalConfig/accounts.test.ts
test/redteam/providers/iterative.test.ts
test/commands/modelScan.test.ts
test/util/testCaseReader.test.ts
test/evaluator.test.ts
test/assertions/python.test.ts
test/util/config/load.test.ts

test/**/AGENTS.md

📄 CodeRabbit inference engine (test/CLAUDE.md)

Document all agent implementations and capabilities in AGENTS.md

Files:

test/AGENTS.md

🧠 Learnings (32)

📓 Common learnings

Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: test/AGENTS.md:0-0
Timestamp: 2025-12-10T02:05:13.021Z
Learning: Applies to test/**/*.test.{ts,tsx,js,jsx} : Ensure all tests are independent and can run in any order

Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: test/AGENTS.md:0-0
Timestamp: 2025-12-10T02:05:13.021Z
Learning: Applies to test/**/*.test.{ts,tsx,js,jsx} : Reset mocks between tests to prevent test pollution

Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: test/AGENTS.md:0-0
Timestamp: 2025-12-10T02:05:13.021Z
Learning: Applies to test/**/*.test.{ts,tsx,js,jsx} : Use Vitest's mocking utilities (`vi.mock`, `vi.fn`, `vi.spyOn`) rather than other mocking libraries

Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: test/AGENTS.md:0-0
Timestamp: 2025-12-10T02:05:13.021Z
Learning: Always run tests with `--randomize` flag to ensure test independence

Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: test/AGENTS.md:0-0
Timestamp: 2025-12-10T02:05:13.021Z
Learning: Applies to test/**/*.test.{ts,tsx,js,jsx} : Always clean up mocks in `afterEach` using `vi.resetAllMocks()`

Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-09T06:08:02.324Z
Learning: Applies to test/**/*.{ts,tsx,js,jsx} : Use Vitest for all tests (both `test/` and `src/app/`)

Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: src/app/AGENTS.md:0-0
Timestamp: 2025-12-09T06:08:48.482Z
Learning: Applies to src/app/**/*.test.{ts,tsx} : Use `vi.fn()` for mocks and `vi.mock()` for module mocking in Vitest test files

Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: test/AGENTS.md:0-0
Timestamp: 2025-12-10T02:05:13.021Z
Learning: Applies to test/**/*.test.{ts,tsx,js,jsx} : Test failures should be deterministic

Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: test/AGENTS.md:0-0
Timestamp: 2025-12-10T02:05:13.021Z
Learning: Applies to test/**/*.test.{ts,tsx,js,jsx} : Prefer shallow mocking over deep mocking when using Vitest

Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: src/redteam/AGENTS.md:0-0
Timestamp: 2025-12-09T06:09:14.828Z
Learning: Applies to src/redteam/test/redteam/**/*.ts : Add tests for new red team plugins in `test/redteam/` directory following the pattern in `src/redteam/plugins/pii.ts`

📚 Learning: 2025-12-10T02:05:13.021Z

Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: test/AGENTS.md:0-0
Timestamp: 2025-12-10T02:05:13.021Z
Learning: Applies to test/**/*.test.{ts,tsx,js,jsx} : Reset mocks between tests to prevent test pollution

Applied to files:

test/updates.test.ts
test/redteam/commands/generate.test.ts
test/globalConfig/accounts.test.ts
test/redteam/providers/iterative.test.ts
test/commands/modelScan.test.ts
test/util/testCaseReader.test.ts
test/AGENTS.md
test/evaluator.test.ts
test/assertions/python.test.ts
vitest.integration.config.ts
test/util/config/load.test.ts

📚 Learning: 2025-12-10T02:05:13.021Z

Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: test/AGENTS.md:0-0
Timestamp: 2025-12-10T02:05:13.021Z
Learning: Applies to test/**/*.test.{ts,tsx,js,jsx} : Clean up test data and mocks after each test

Applied to files:

test/updates.test.ts
test/redteam/commands/generate.test.ts
test/globalConfig/accounts.test.ts
test/redteam/providers/iterative.test.ts
test/commands/modelScan.test.ts
test/util/testCaseReader.test.ts
test/evaluator.test.ts
test/util/config/load.test.ts

📚 Learning: 2025-12-10T02:05:13.021Z

Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: test/AGENTS.md:0-0
Timestamp: 2025-12-10T02:05:13.021Z
Learning: Applies to test/**/*.test.{ts,tsx,js,jsx} : Always clean up mocks in `afterEach` using `vi.resetAllMocks()`

Applied to files:

test/updates.test.ts
test/redteam/commands/generate.test.ts
test/globalConfig/accounts.test.ts
test/redteam/providers/iterative.test.ts
test/commands/modelScan.test.ts
test/util/testCaseReader.test.ts
test/AGENTS.md
test/evaluator.test.ts
test/assertions/python.test.ts
test/util/config/load.test.ts

📚 Learning: 2025-12-10T02:05:13.021Z

Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: test/AGENTS.md:0-0
Timestamp: 2025-12-10T02:05:13.021Z
Learning: Applies to test/**/*.test.{ts,tsx,js,jsx} : Ensure all tests are independent and can run in any order

Applied to files:

test/updates.test.ts
test/redteam/commands/generate.test.ts
test/redteam/providers/iterative.test.ts
test/commands/modelScan.test.ts
test/util/testCaseReader.test.ts
test/AGENTS.md
vitest.config.ts
test/evaluator.test.ts
test/assertions/python.test.ts
vitest.integration.config.ts
test/util/config/load.test.ts

📚 Learning: 2025-12-10T02:05:13.020Z

Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: test/AGENTS.md:0-0
Timestamp: 2025-12-10T02:05:13.020Z
Learning: Applies to test/**/*.test.{ts,tsx,js,jsx} : Never increase test timeouts - fix the slow test instead

Applied to files:

test/updates.test.ts
test/AGENTS.md

📚 Learning: 2025-12-10T02:05:13.021Z

Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: test/AGENTS.md:0-0
Timestamp: 2025-12-10T02:05:13.021Z
Learning: Applies to test/**/*.test.{ts,tsx,js,jsx} : For database tests, use in-memory instances or proper test fixtures

Applied to files:

test/updates.test.ts
test/redteam/commands/generate.test.ts
test/globalConfig/accounts.test.ts
test/commands/modelScan.test.ts
test/util/testCaseReader.test.ts
test/AGENTS.md

📚 Learning: 2025-12-10T02:05:13.021Z

Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: test/AGENTS.md:0-0
Timestamp: 2025-12-10T02:05:13.021Z
Learning: Applies to test/**/*.test.{ts,tsx,js,jsx} : Mock external dependencies but not the code being tested

Applied to files:

test/updates.test.ts
test/redteam/commands/generate.test.ts
test/globalConfig/accounts.test.ts
test/redteam/providers/iterative.test.ts
test/commands/modelScan.test.ts
test/util/testCaseReader.test.ts
test/AGENTS.md
test/evaluator.test.ts
test/assertions/python.test.ts
test/util/config/load.test.ts

📚 Learning: 2025-12-09T06:09:06.028Z

Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: src/providers/AGENTS.md:0-0
Timestamp: 2025-12-09T06:09:06.028Z
Learning: Applies to src/providers/test/providers/**/*.ts : Test provider success AND error cases, including rate limits, timeouts, and invalid configs

Applied to files:

test/updates.test.ts
test/redteam/commands/generate.test.ts
test/globalConfig/accounts.test.ts
test/redteam/providers/iterative.test.ts
test/AGENTS.md
test/evaluator.test.ts
test/util/config/load.test.ts

📚 Learning: 2025-12-09T06:09:06.028Z

Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: src/providers/AGENTS.md:0-0
Timestamp: 2025-12-09T06:09:06.028Z
Learning: Applies to src/providers/test/providers/**/*.ts : Provider tests must NEVER make real API calls - mock all HTTP requests using `vi.mock`

Applied to files:

test/updates.test.ts
test/redteam/commands/generate.test.ts
test/globalConfig/accounts.test.ts
test/redteam/providers/iterative.test.ts
test/commands/modelScan.test.ts
test/util/testCaseReader.test.ts
test/AGENTS.md
test/evaluator.test.ts
test/assertions/python.test.ts
test/util/config/load.test.ts

📚 Learning: 2025-12-10T02:05:13.021Z

Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: test/AGENTS.md:0-0
Timestamp: 2025-12-10T02:05:13.021Z
Learning: Applies to test/**/*.test.{ts,tsx,js,jsx} : Test failures should be deterministic

Applied to files:

test/updates.test.ts
test/redteam/commands/generate.test.ts
test/AGENTS.md
vitest.config.ts
test/assertions/python.test.ts
vitest.integration.config.ts
test/util/config/load.test.ts

📚 Learning: 2025-12-10T02:05:13.021Z

Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: test/AGENTS.md:0-0
Timestamp: 2025-12-10T02:05:13.021Z
Learning: Applies to test/**/*.test.{ts,tsx,js,jsx} : Use Vitest's mocking utilities (`vi.mock`, `vi.fn`, `vi.spyOn`) rather than other mocking libraries

Applied to files:

test/updates.test.ts
test/redteam/commands/generate.test.ts
test/globalConfig/accounts.test.ts
test/redteam/providers/iterative.test.ts
test/commands/modelScan.test.ts
test/util/testCaseReader.test.ts
test/AGENTS.md
vitest.config.ts
test/evaluator.test.ts
test/assertions/python.test.ts
vitest.integration.config.ts
test/util/config/load.test.ts

📚 Learning: 2025-12-09T06:08:48.482Z

Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: src/app/AGENTS.md:0-0
Timestamp: 2025-12-09T06:08:48.482Z
Learning: Applies to src/app/**/*.test.{ts,tsx} : Use `vi.fn()` for mocks and `vi.mock()` for module mocking in Vitest test files

Applied to files:

test/updates.test.ts
test/redteam/commands/generate.test.ts
test/globalConfig/accounts.test.ts
test/redteam/providers/iterative.test.ts
test/commands/modelScan.test.ts
test/util/testCaseReader.test.ts
test/AGENTS.md
vitest.config.ts
test/evaluator.test.ts
test/assertions/python.test.ts
vitest.integration.config.ts
test/util/config/load.test.ts

📚 Learning: 2025-12-10T02:05:13.021Z

Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: test/AGENTS.md:0-0
Timestamp: 2025-12-10T02:05:13.021Z
Learning: Applies to test/**/*.test.{ts,tsx,js,jsx} : Import test utilities explicitly from 'vitest': `describe`, `it`, `expect`, `beforeEach`, `afterEach`, `vi`

Applied to files:

test/updates.test.ts
test/redteam/commands/generate.test.ts
test/redteam/providers/iterative.test.ts
test/commands/modelScan.test.ts
test/util/testCaseReader.test.ts
test/AGENTS.md
vitest.config.ts
test/evaluator.test.ts
test/assertions/python.test.ts
vitest.integration.config.ts
test/util/config/load.test.ts

📚 Learning: 2025-12-09T06:08:02.324Z

Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-09T06:08:02.324Z
Learning: Applies to test/**/*.test.{ts,tsx,js} : Backend tests in `test/` should use Vitest with globals enabled (`describe`, `it`, `expect` available without imports)

Applied to files:

test/updates.test.ts
vitest.config.ts
test/evaluator.test.ts
test/assertions/python.test.ts
vitest.integration.config.ts
test/util/config/load.test.ts

📚 Learning: 2025-12-09T06:09:14.828Z

Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: src/redteam/AGENTS.md:0-0
Timestamp: 2025-12-09T06:09:14.828Z
Learning: Applies to src/redteam/test/redteam/**/*.ts : Add tests for new red team plugins in `test/redteam/` directory following the pattern in `src/redteam/plugins/pii.ts`

Applied to files:

test/redteam/commands/generate.test.ts
test/redteam/providers/iterative.test.ts
vitest.config.ts
test/assertions/python.test.ts

📚 Learning: 2025-12-09T06:09:14.828Z

Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: src/redteam/AGENTS.md:0-0
Timestamp: 2025-12-09T06:09:14.828Z
Learning: Applies to src/redteam/plugins/*.ts : Generate targeted test cases for specific vulnerability types in plugin implementations

Applied to files:

test/redteam/commands/generate.test.ts

📚 Learning: 2025-12-09T06:09:14.828Z

Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: src/redteam/AGENTS.md:0-0
Timestamp: 2025-12-09T06:09:14.828Z
Learning: Applies to src/redteam/plugins/*.ts : Include assertions defining failure conditions in plugin test cases

Applied to files:

test/redteam/commands/generate.test.ts
test/redteam/providers/iterative.test.ts
test/assertions/python.test.ts

📚 Learning: 2025-12-10T02:05:13.021Z

Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: test/AGENTS.md:0-0
Timestamp: 2025-12-10T02:05:13.021Z
Learning: Applies to test/providers/**/*.test.{ts,tsx,js,jsx} : For provider testing, include test coverage for: success case, error cases (4xx, 5xx, rate limits), configuration validation, and token usage tracking

Applied to files:

test/redteam/providers/iterative.test.ts
test/AGENTS.md
test/evaluator.test.ts

📚 Learning: 2025-12-10T02:05:13.021Z

Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: test/AGENTS.md:0-0
Timestamp: 2025-12-10T02:05:13.021Z
Learning: Applies to test/**/*.test.{ts,tsx,js,jsx} : Prefer shallow mocking over deep mocking when using Vitest

Applied to files:

test/redteam/providers/iterative.test.ts
test/util/testCaseReader.test.ts
test/AGENTS.md
vitest.config.ts
test/evaluator.test.ts
test/assertions/python.test.ts
vitest.integration.config.ts
test/util/config/load.test.ts

📚 Learning: 2025-12-09T06:08:12.794Z

Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: drizzle/AGENTS.md:0-0
Timestamp: 2025-12-09T06:08:12.794Z
Learning: Applies to drizzle/test/**/*.{js,ts} : Use in-memory SQLite databases in test files to verify migrations work correctly without affecting production data

Applied to files:

test/commands/modelScan.test.ts

📚 Learning: 2025-12-09T06:08:55.096Z

Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: src/commands/AGENTS.md:0-0
Timestamp: 2025-12-09T06:08:55.096Z
Learning: Follow the standard CLI command structure demonstrated in `src/commands/eval.ts`: register with Commander, setup environment, track telemetry, use logger for output, and handle errors properly

Applied to files:

test/commands/modelScan.test.ts

📚 Learning: 2025-11-29T00:24:24.883Z

Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: test/CLAUDE.md:0-0
Timestamp: 2025-11-29T00:24:24.883Z
Learning: Applies to test/**/AGENTS.md : Document all agent implementations and capabilities in AGENTS.md

Applied to files:

test/AGENTS.md

📚 Learning: 2025-11-29T00:24:20.916Z

Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: src/server/CLAUDE.md:0-0
Timestamp: 2025-11-29T00:24:20.916Z
Learning: Applies to src/server/**/AGENTS.md : Maintain clear documentation of agent architecture and design decisions in AGENTS.md

Applied to files:

test/AGENTS.md

📚 Learning: 2025-11-29T00:24:07.021Z

Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: src/commands/CLAUDE.md:0-0
Timestamp: 2025-11-29T00:24:07.021Z
Learning: Applies to src/commands/**/*.md : Documentation about agents should be maintained in AGENTS.md

Applied to files:

test/AGENTS.md

📚 Learning: 2025-11-29T00:24:20.916Z

Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: src/server/CLAUDE.md:0-0
Timestamp: 2025-11-29T00:24:20.916Z
Learning: Applies to src/server/**/AGENTS.md : Document agent responsibilities, capabilities, and interactions in AGENTS.md

Applied to files:

test/AGENTS.md

📚 Learning: 2025-11-29T00:24:17.021Z

Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: src/redteam/CLAUDE.md:0-0
Timestamp: 2025-11-29T00:24:17.021Z
Learning: Applies to src/redteam/**/*agent*.{ts,tsx,js,jsx} : Maintain clear agent interface definitions and usage patterns

Applied to files:

test/AGENTS.md

📚 Learning: 2025-12-10T02:05:13.021Z

Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: test/AGENTS.md:0-0
Timestamp: 2025-12-10T02:05:13.021Z
Learning: Always run tests with `--randomize` flag to ensure test independence

Applied to files:

test/AGENTS.md
vitest.config.ts
vitest.integration.config.ts

📚 Learning: 2025-12-09T06:08:02.324Z

Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-09T06:08:02.324Z
Learning: Applies to test/**/*.{ts,tsx,js,jsx} : Use Vitest for all tests (both `test/` and `src/app/`)

Applied to files:

test/AGENTS.md
vitest.config.ts
vitest.integration.config.ts

📚 Learning: 2025-07-18T17:25:57.700Z

Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: .cursor/rules/gh-cli-workflow.mdc:0-0
Timestamp: 2025-07-18T17:25:57.700Z
Learning: Applies to **/*.{test,spec}.{js,ts,jsx,tsx} : Avoid disabling or skipping tests unless absolutely necessary and documented

Applied to files:

test/AGENTS.md

📚 Learning: 2025-12-09T06:08:02.324Z

Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-09T06:08:02.324Z
Learning: Applies to src/app/**/*.test.{ts,tsx} : Frontend tests in `src/app/` should use Vitest with explicit imports

Applied to files:

vitest.config.ts
test/assertions/python.test.ts
vitest.integration.config.ts
test/util/config/load.test.ts

📚 Learning: 2025-10-06T03:43:01.653Z

Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-06T03:43:01.653Z
Learning: Applies to test/**/*.{ts,tsx,js,jsx} : Follow Jest best practices using describe and it blocks in tests

Applied to files:

test/util/config/load.test.ts

🧬 Code graph analysis (4)

test/updates.test.ts (1)

src/util/fetch/index.ts (1)

fetchWithTimeout (141-173)

test/globalConfig/accounts.test.ts (1)

src/globalConfig/globalConfig.ts (1)

readGlobalConfig (21-39)

test/evaluator.test.ts (1)

src/evaluatorHelpers.ts (1)

runExtensionHook (542-595)

test/assertions/python.test.ts (3)

scripts/generate-blog-image.cjs (1)

path (3-3)

scripts/update-changelog-version.cjs (1)

path (11-11)

scripts/validate-changelog.cjs (1)

path (13-13)

🪛 markdownlint-cli2 (0.18.1)

test/AGENTS.md

72-72: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (19)

GitHub Check: Test on Node 24.x and ubuntu-latest
GitHub Check: webui tests
GitHub Check: Test on Node 24.x and windows-latest
GitHub Check: Build Docs
GitHub Check: Share Test
GitHub Check: Test on Node 22.x and ubuntu-latest
GitHub Check: Test on Node 20.x and ubuntu-latest
GitHub Check: Test on Node 20.x and macOS-latest
GitHub Check: Test on Node 20.x and windows-latest
GitHub Check: Test on Node 22.x and macOS-latest
GitHub Check: Test on Node 22.x and windows-latest
GitHub Check: Redteam (Staging API)
GitHub Check: Redteam (Production API)
GitHub Check: Generate Assets
GitHub Check: Build on Node 24.x
GitHub Check: Build on Node 22.x
GitHub Check: Build on Node 20.x
GitHub Check: security-scan
GitHub Check: Analyze (javascript-typescript)

🔇 Additional comments (7)

test/globalConfig/accounts.test.ts (1)

153-164: Explicitly mocking readGlobalConfig avoids cross-test leakage

Mocking readGlobalConfig to a minimal { id: 'test-id' } here ensures setUserEmail is exercised against a clean config and won’t accidentally inherit account state from other tests when order is randomized. This is a good, tight fix for test isolation.

test/assertions/python.test.ts (1)

379-447: Per-test path.resolve/extname mocks fix order dependency

Adding explicit path.resolve / path.extname mocks in these tests ensures file‑based Python assertions no longer depend on whatever values previous tests left on the shared path mock. This matches the isolation guidance for shared mocks and makes these cases safe under shuffled execution.

vitest.config.ts (1)

20-26: Enabling sequence.shuffle with clear guidance looks good

Turning on sequence.shuffle by default (with comments on how to disable for debugging) is aligned with the test AGENTS guidance about catching order‑coupled tests early. The placement under test config is appropriate and should work well with the existing forked worker setup.
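For reference, the option discussed above has roughly the following shape. This is a minimal sketch, not the repository's actual `vitest.config.ts`: the surrounding settings are omitted and the comments are illustrative.

```typescript
// vitest.config.ts (illustrative sketch, not the file from this PR)
import { defineConfig } from 'vitest/config';

export default defineConfig({
  test: {
    sequence: {
      // Run tests in random order so order-coupled tests surface early.
      // For a stable order while debugging, pass --sequence.shuffle=false.
      shuffle: true,
    },
  },
});
```

Vitest also accepts `--sequence.seed=<number>`, which may help when replaying one specific ordering.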

vitest.integration.config.ts (1)

20-26: Consistent randomized sequencing for integration tests

Mirroring sequence.shuffle: true into the integration config keeps unit and integration suites aligned on “random by default, override for debugging,” which is exactly what the updated testing guidance calls for. No issues with the surrounding fork/timeout settings.

test/redteam/providers/iterative.test.ts (1)

47-56: Hoisted mocks are now correctly reset for shuffle-safe isolation

Using mockReset() on the hoisted mocks in beforeEach (on top of vi.clearAllMocks()) ensures both call history and implementations are cleared between tests, which is exactly what you want with vi.hoisted + random test order. This prevents leaked mockReturnValue/mockResolvedValueOnce state across describes.

As per coding guidelines about resetting mocks between tests and the new mock isolation rules in test/AGENTS.md, this looks solid.
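The difference between clearing and resetting described above can be illustrated with a toy model. The `TinyMock` class below is hypothetical and is not Vitest's implementation; it only mirrors the three pieces of state involved (call history, persistent implementation, one-time queue) and assumes a mock created without an original implementation.

```typescript
// Toy model of a mock function. NOT Vitest's implementation: it only mirrors
// the three pieces of state that matter for test isolation.
class TinyMock<T> {
  calls: unknown[][] = []; // call history
  private impl: (() => T) | undefined = undefined; // persistent implementation
  private onceQueue: T[] = []; // values queued for one-time use

  mockReturnValue(value: T): this {
    this.impl = () => value;
    return this;
  }

  mockReturnValueOnce(value: T): this {
    this.onceQueue.push(value);
    return this;
  }

  // Models mockClear() / vi.clearAllMocks(): wipes call history only.
  mockClear(): this {
    this.calls = [];
    return this;
  }

  // Models mockReset(): wipes history, implementation, and the once-queue.
  mockReset(): this {
    this.mockClear();
    this.impl = undefined;
    this.onceQueue = [];
    return this;
  }

  call(...args: unknown[]): T | undefined {
    this.calls.push(args);
    if (this.onceQueue.length > 0) return this.onceQueue.shift();
    return this.impl ? this.impl() : undefined;
  }
}

// "Test A" configures the mock and leaves unconsumed state behind.
const getProvider = new TinyMock<string>();
getProvider.mockReturnValue('provider-a').mockReturnValueOnce('queued');

// Clearing alone lets both the queued value and the implementation leak.
getProvider.mockClear();
const firstAfterClear = getProvider.call();
const secondAfterClear = getProvider.call();

// Resetting removes everything, so the next test starts from a blank mock.
getProvider.mockReset();
const afterReset = getProvider.call();
```

In this model the two calls after `mockClear()` still see state from "Test A", while the call after `mockReset()` sees none, which is the behavior the `beforeEach` change relies on.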

test/util/config/load.test.ts (2)

1650-1661: readConfig beforeEach correctly resets $RefParser and path.parse

The new async beforeEach:

- clears/restores mocks,
- resets path.parse to the real implementation, and
- mockReset()s mockDereference and re-establishes a pass-through implementation.

This prevents queued mockResolvedValueOnce calls or custom path.parse implementations from other tests leaking into readConfig behavior.

This directly implements the mock-isolation guidance for hoisted mocks in test/AGENTS.md.

1851-1861: resolveConfigs with external defaultTest now has deterministic basePath, deref, and glob behavior

This beforeEach does three important things for isolation:

1. Sets cliState.basePath to a known value.
2. Resets mockDereference to a pass-through implementation.
3. Restores path.parse via vi.importActual('path').

The explicit vi.mocked(globSync).mockReturnValue(['config.json']); also removes any dependence on prior glob mocks. Together these make the external-defaultTest scenario stable under random test ordering.

If any other suites depend on cliState.basePath’s default value, double-check they explicitly set it in their own setup so this mutation can’t leak across files. Based on learnings about test independence.

Also applies to: 1881-1881
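One way to address the caution about `cliState.basePath` leaking is to snapshot the shared value and restore it afterwards. The sketch below uses a stand-in `cliState` object and a hypothetical `withBasePath` helper; neither is taken from the repository.

```typescript
// Stand-in for a shared module-level state object (hypothetical; the real
// cliState module is not reproduced here).
const cliState: { basePath: string | undefined } = { basePath: undefined };

// Runs `body` with a temporary basePath and restores the previous value
// afterwards, even if `body` throws.
function withBasePath<T>(basePath: string, body: () => T): T {
  const previous = cliState.basePath;
  cliState.basePath = basePath;
  try {
    return body();
  } finally {
    cliState.basePath = previous;
  }
}

// Inside the callback the override is visible; afterwards it is gone.
const seenInside = withBasePath('/mock/base/path', () => cliState.basePath);
const seenAfter = cliState.basePath;
```

In a test file the same snapshot and restore would more typically live in a `beforeEach`/`afterEach` pair; the wrapper form is used here only to keep the example self-contained.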

…ation

Replace hard-coded /tmp paths with path.join(os.tmpdir(), ...) to fix test failures on Windows where /tmp doesn't exist. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
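The portable-path pattern this commit describes can be sketched as follows. The `tempPath` helper and the file names are hypothetical and chosen only for illustration.

```typescript
import * as os from 'node:os';
import * as path from 'node:path';

// Hypothetical helper: builds a scratch path under the platform's temp
// directory instead of assuming that /tmp exists.
function tempPath(...segments: string[]): string {
  return path.join(os.tmpdir(), ...segments);
}

// Resolves under whatever os.tmpdir() reports on the current platform.
const scratchFile = tempPath('promptfoo-test', 'output.json');
```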

Add mockReset() calls for mockedCheckModelAuditInstalled and mockedSpawn in beforeEach to ensure test isolation when tests run in random order. vi.clearAllMocks() only clears call history, not mock implementations set via mockResolvedValue/mockReturnValue. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

JustinBeckwith

Never seen this! Very cool.

…ation

Add mockReset() and mockImplementation() for runExtensionHook in the 'Evaluator with external defaultTest' describe block's beforeEach hook. This ensures the mock is properly reset between tests when running with shuffle mode enabled, fixing the Windows Node 24 CI failure. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

mldangelo and others added 5 commits December 10, 2025 00:24

mldangelo requested review from a team, JustinBeckwith, MrFlounder, addelong, will-holley and yash2998chhabria as code owners December 10, 2025 18:56

promptfoo-scanner bot reviewed Dec 10, 2025


Merge remote-tracking branch 'origin/main' into fix/test-shuffle-isol…

258d2aa

…ation

coderabbitai bot reviewed Dec 10, 2025


mldangelo and others added 6 commits December 10, 2025 14:11

fix(test): add WatsonXAI mock to cached response test for isolation

6e10106

Merge remote-tracking branch 'origin/main' into fix/test-shuffle-isol…

ae0898b

…ation

Merge remote-tracking branch 'origin/main' into fix/test-shuffle-isol…

c5c4e45

…ation

Merge branch 'main' into fix/test-shuffle-isolation

704085f

JustinBeckwith approved these changes Dec 11, 2025


mldangelo and others added 2 commits December 11, 2025 01:54

Merge remote-tracking branch 'origin/main' into fix/test-shuffle-isol…

f8ddace

…ation

mldangelo merged commit 14a981f into main Dec 11, 2025
35 checks passed

mldangelo deleted the fix/test-shuffle-isolation branch December 11, 2025 07:29
