mldangelo (Member) commented Dec 10, 2025

Summary

  • Enable test shuffle by default in vitest config (sequence: { shuffle: true })
  • Fix test isolation bugs across 9 test files that caused failures when tests ran in different orders
  • Update test/AGENTS.md documentation with mock isolation best practices

Closes #2265

Test isolation fixes

| File | Issue | Fix |
| --- | --- | --- |
| load.test.ts | path.parse and mockDereference mocks persisted | Added mockReset() in 4 describe blocks |
| iterative.test.ts | 4 hoisted mocks retained implementations | Added mockReset() for all hoisted mocks |
| modelScan.test.ts | Comprehensive mock pollution | Reset spawn, ModelAudit, HuggingFace mocks in each describe |
| evaluator.test.ts | runExtensionHook mock in 3 describe blocks | Added mockReset() + restore default implementation |
| generate.test.ts | resolveConfigs mock had no default | Added explicit mock setup in nested describe |
| updates.test.ts | PROMPTFOO_DISABLE_UPDATE env var pollution | Added delete process.env.X in beforeEach |
| accounts.test.ts | readGlobalConfig mock state leaked | Added explicit mock setup in test |
| python.test.ts | path.resolve/path.extname mocks missing | Added mock setup to 2 tests |
| testCaseReader.test.ts | Module cache pollution with xlsx | Call resetModules() before doMock |
| watsonx.test.ts | WatsonXAI.newInstance mock missing | Added mock setup in cached response test |

Key learnings documented in AGENTS.md

  • vi.clearAllMocks() only clears call history, NOT mockImplementation() - use mockReset()
  • mockResolvedValueOnce() queues survive clearAllMocks() - use mockReset() to clear
  • Environment variables are shared state - explicitly delete in beforeEach
  • Module cache can cause vi.importActual to return mocked modules - call resetModules() first
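
The difference between clearing and resetting can be sketched with a toy mock. This is a simplified model under stated assumptions: `ToyMock` and its `invoke` method are hypothetical and are not Vitest's implementation. The class only mirrors the behaviour described in the learnings above, where clearing wipes call history and resetting also drops the implementation and any queued one-time values.

```typescript
// Toy model of a mock function. ToyMock is a hypothetical stand-in,
// not Vitest's implementation; it only mirrors the semantics described above.
class ToyMock<R> {
  calls: unknown[][] = [];
  private impl: (() => R) | undefined;
  private onceQueue: R[] = [];

  mockReturnValue(value: R): this {
    this.impl = () => value;
    return this;
  }

  mockReturnValueOnce(value: R): this {
    this.onceQueue.push(value);
    return this;
  }

  // Modelled on mockClear(): forget recorded calls, keep behaviour.
  mockClear(): void {
    this.calls = [];
  }

  // Modelled on mockReset(): forget calls, implementation, and queued values.
  mockReset(): void {
    this.mockClear();
    this.impl = undefined;
    this.onceQueue = [];
  }

  invoke(...args: unknown[]): R | undefined {
    this.calls.push(args);
    if (this.onceQueue.length > 0) {
      return this.onceQueue.shift();
    }
    return this.impl ? this.impl() : undefined;
  }
}

// "Test A" configures the mock and leaves a queued value behind.
const getProvider = new ToyMock<string>();
getProvider.mockReturnValue('provider-from-test-A');
getProvider.mockReturnValueOnce('queued-by-test-A');

// Clearing only wipes history: "test B" still sees test A's values.
getProvider.mockClear();
const afterClearFirst = getProvider.invoke();
const afterClearSecond = getProvider.invoke();

// Resetting removes the leftovers, so "test B" starts from a blank mock.
getProvider.mockReturnValueOnce('queued-again');
getProvider.mockReset();
const afterReset = getProvider.invoke();
const callsAfterReset = getProvider.calls.length;
```

In this model, both the queued value and the default return value outlive `mockClear()`, which is the kind of leak that only shows up once test order changes.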

Test plan

  • Verified with 50+ unique random seeds (9403 tests each run)
  • All 493 test files pass consistently regardless of execution order
  • Tests still pass with shuffle disabled (--sequence.shuffle=false)
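
Seed-based verification works because a given seed always produces the same order, so an order-dependent failure can be replayed. The sketch below illustrates that property with a small seeded Fisher-Yates shuffle. `seededRandom` and `shuffleWithSeed` are illustrative helpers, not Vitest's sequencer, and the generator is a basic Park-Miller LCG chosen only for brevity.

```typescript
// Illustrative seeded generator (Park-Miller LCG); not what Vitest uses.
function seededRandom(seed: number): () => number {
  const modulus = 2147483647;
  // Normalise the seed into the range [1, modulus - 1].
  let state = (Math.abs(Math.floor(seed)) % (modulus - 1)) + 1;
  return () => {
    state = (state * 48271) % modulus;
    return state / modulus; // always strictly between 0 and 1
  };
}

// Fisher-Yates shuffle driven by the seeded generator; the input is not mutated.
function shuffleWithSeed<T>(items: readonly T[], seed: number): T[] {
  const result = items.slice();
  const random = seededRandom(seed);
  for (let i = result.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    const tmp = result[i] as T;
    result[i] = result[j] as T;
    result[j] = tmp;
  }
  return result;
}

const testFiles = ['load', 'iterative', 'modelScan', 'evaluator', 'updates'];
const firstRun = shuffleWithSeed(testFiles, 12345);
const replay = shuffleWithSeed(testFiles, 12345);
```

Running with the same seed twice yields the same permutation, which is why recording the seed of a failing run is enough to reproduce its order.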

mldangelo and others added 5 commits December 10, 2025 00:24
Add proper mock resets in beforeEach blocks to ensure tests are
isolated and can run in any order with --sequence.shuffle=true.

Key changes:
- test/util/config/load.test.ts: Reset path.parse mock to actual
  implementation in beforeEach for combineConfigs, resolveConfigs,
  and resolveConfigs with external defaultTest blocks
- test/redteam/providers/iterative.test.ts: Add mockReset() for
  hoisted mocks (mockGetProvider, mockGetTargetResponse,
  mockCheckPenalizedPhrases, mockGetGraderById) since clearAllMocks
  only clears call history, not mockReturnValue implementations
- test/commands/modelScan.test.ts: Reset spawn, getModelAuditCurrentVersion,
  ModelAudit, and HuggingFace mocks in beforeEach for all describe blocks

The root cause was that vi.clearAllMocks() only clears call history
but doesn't reset mockReturnValue/mockResolvedValue implementations.
When tests set these values, they persist across tests unless explicitly
reset with mockReset() or mockImplementation().

Fixes #2265

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Add sequence.shuffle: true to both vitest.config.ts and
vitest.integration.config.ts to catch test isolation issues early.

This is the staff engineer approach:
- Single source of truth in config (not scattered across scripts)
- Applies to all test runs (local, CI, watch mode)
- Self-documenting with clear comments
- Override-able with --sequence.shuffle=false for debugging

Also updated test/AGENTS.md with:
- Documentation about shuffle being enabled by default
- Critical mock isolation guidance (vi.clearAllMocks vs mockReset)
- Override flags for debugging

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Reset runExtensionHook mock in defaultTest normalization tests
- Add resolveConfigs mock setup in doGenerateRedteam external defaultTest tests

Both issues were caused by tests relying on mock implementations set by
previous tests, which vi.clearAllMocks() doesn't reset.

- Add runExtensionHook mock reset to main evaluator describe block
- Add fetchWithTimeout mock reset to checkForUpdates describe block
- Clear PROMPTFOO_DISABLE_UPDATE env var in checkForUpdates beforeEach

Environment variables set by tests in one describe block were leaking
to tests in other describe blocks when shuffle was enabled.

- evaluator.test.ts: Add runExtensionHook mock reset to defaultTest merging describe block
- accounts.test.ts: Add readGlobalConfig mock setup in setUserEmail test
- python.test.ts: Add path.resolve/path.extname mocks to 2 tests that relied on earlier test state
- testCaseReader.test.ts: Fix xlsx module mock by calling resetModules before doMock

These fixes ensure tests pass consistently regardless of execution order when
running with shuffle enabled (vitest --sequence.shuffle=true).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
use-tusk bot commented Dec 10, 2025

⏩ No test execution environment matched (86c8d28)

Check history

| Commit | Status | Created (UTC) |
| --- | --- | --- |
| aa45d45 | ⏩ No test execution environment matched | Dec 10, 2025 6:56PM |
| 258d2aa | ⏩ No test execution environment matched | Dec 10, 2025 7:01PM |
| 6e10106 | ⏩ No test execution environment matched | Dec 10, 2025 7:11PM |
| ae0898b | ⏩ No test execution environment matched | Dec 10, 2025 7:38PM |
| 3f52009 | ⏩ No test execution environment matched | Dec 10, 2025 9:12PM |
| 704085f | ⏩ No test execution environment matched | Dec 10, 2025 10:32PM |
| 2b3ee5f | ⏩ No test execution environment matched | Dec 10, 2025 10:54PM |
| 86c8d28 | ⏩ No test execution environment matched | Dec 11, 2025 6:59AM |

promptfoo-scanner bot left a comment

👍 All Clear

I reviewed this PR for LLM security vulnerabilities. The changes focus entirely on test infrastructure improvements - enabling random test execution order and fixing mock isolation issues. No LLM-related code was modified.

Minimum severity threshold for this scan: 🟡 Medium

coderabbitai bot commented Dec 10, 2025

📝 Walkthrough

This PR implements systematic test isolation improvements across the codebase by enabling random test execution and establishing comprehensive mock reset patterns. Changes include enabling test sequence randomization in Vitest configuration files (vitest.config.ts, vitest.integration.config.ts), adding explicit mock resets in beforeEach hooks across multiple test files (assertions, commands, evaluator, config, redteam, updates, and utilities), converting several beforeEach hooks to async for proper initialization, and documenting best practices for mock isolation in test documentation. The overall objective is preventing test state leakage when tests execute in random order.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Areas requiring extra attention:

  • Mock reset consistency: Verify that mock reset patterns are correctly applied across all test files and that hoisted mocks receiving mockReset() calls (rather than clearAllMocks()) match what each test actually uses
  • Async beforeEach conversions: Confirm that converting beforeEach hooks to async in modelScan.test.ts, load.test.ts, and others doesn't introduce timing issues or race conditions in test execution
  • Module reset timing in testCaseReader.test.ts: The relocation of vi.resetModules() to occur before fs mock setup rather than after needs verification that it doesn't affect the test flow or module caching behavior
  • Configuration propagation: Ensure sequence.shuffle = true in both vitest config files is complete and that there are no other test runner entry points that might bypass this setting
  • Documentation accuracy: Verify that AGENTS.md guidance on mock isolation and disabling randomization flags (--sequence.shuffle=false, --sequence.seed=12345) matches actual Vitest capabilities and usage patterns

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'fix(test): enable shuffle mode and fix test isolation bugs' directly summarizes the main changes: enabling shuffle mode and fixing test isolation bugs across multiple test files. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Description check | ✅ Passed | The pull request description is directly related to the changeset, providing a clear summary of test shuffle enablement, isolation bug fixes, and documentation updates with specific file references and detailed test plan results. |
coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (10)
test/updates.test.ts (1)

72-79: Stronger per-suite isolation for fetchWithTimeout and PROMPTFOO_DISABLE_UPDATE

Resetting fetchWithTimeout and clearing PROMPTFOO_DISABLE_UPDATE in this beforeEach makes the checkForUpdates suite deterministic and order‑independent, even when other suites queue mockResolvedValueOnce calls or tweak that env var. The extra mockReset on top of the global one is redundant but harmless and keeps the intent local to this block. Based on learnings, this aligns with the isolation guidance in test/AGENTS.md.
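
The environment-variable half of this isolation can be sketched as a snapshot-and-restore helper. This is a minimal sketch under assumptions: `snapshotEnv` is a hypothetical helper, not code from the PR (which deletes the variable in beforeEach), and a plain object stands in for `process.env` so the example is self-contained.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical helper: remember the current values of `keys` and return a
// function that puts them back, deleting keys that were originally unset.
function snapshotEnv(env: Env, keys: readonly string[]): () => void {
  const saved: Env = {};
  for (const key of keys) {
    saved[key] = env[key];
  }
  return () => {
    for (const key of keys) {
      const value = saved[key];
      if (value === undefined) {
        delete env[key];
      } else {
        env[key] = value;
      }
    }
  };
}

const KEY = 'PROMPTFOO_DISABLE_UPDATE';

// Plain object standing in for process.env.
const fakeEnv: Env = { HOME: '/home/dev' };

// "beforeEach": take a snapshot and start from a known state.
const restoreEnv = snapshotEnv(fakeEnv, [KEY]);
delete fakeEnv[KEY];

// The test body mutates shared state.
fakeEnv[KEY] = 'true';
const seenDuringTest = fakeEnv[KEY];

// "afterEach": undo the mutation so the next test cannot observe it.
restoreEnv();
const leakedAfterTest = KEY in fakeEnv;
```

In a real suite the same helper could be called with `process.env` in beforeEach and its returned function in afterEach. Deleting the variable in beforeEach, as the PR does, is the simpler option when only one variable is involved.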

test/redteam/commands/generate.test.ts (1)

1497-1526: Resetting resolveConfigs per-suite avoids hoisted mock leakage

Re‑initializing configModule.resolveConfigs in this beforeEach is the right way to stop implementations from other redteam suites leaking into the “external defaultTest” tests, especially with hoisted module mocks and shuffled ordering. The neutral default config you return is minimal but sufficient for these scenarios.

test/evaluator.test.ts (3)

338-347: Resetting runExtensionHook in the main evaluator suite prevents hook leaks

Adding mockReset() followed by the default identity implementation in this beforeEach stops per‑test overrides of runExtensionHook (e.g., in the sessionId tests later in the suite) from leaking across tests. Combined with vi.clearAllMocks(), this gives the evaluator tests a predictable starting hook state under shuffled execution.


4034-4040: Same runExtensionHook reset pattern correctly applied to defaultTest-merging tests

Using the same runExtensionHook reset in the evaluator defaultTest merging suite ensures those tests are not affected by hook behavior from the main evaluator block or vice versa. This is consistent with the isolation goal of the PR and keeps the extension‑related assertions here trustworthy regardless of run order.


4369-4374: Hook normalization is especially important for extension-focused tests

For the defaultTest normalization for extensions suite, normalizing runExtensionHook before each test is critical, since these tests explicitly assert on how extensions manipulate defaultTest. Guaranteeing a clean, array-backed runExtensionHook mock per test avoids very subtle flakiness when other suites modify the same mock. This change aligns nicely with the new AGENTS guidance on extension hooks and defaultTest setup.

test/util/config/load.test.ts (2)

199-217: Path/glob setup in combineConfigs beforeEach improves isolation

The async beforeEach that clears/restores mocks, fixes process.cwd, and rewires globSync + path.parse back to the real implementation via vi.importActual('path') prevents pollution from other suites that also mock these APIs, which is important now that tests run in random order.

If you find yourself tweaking this in more places, consider a small helper (e.g. resetPathAndGlobMocks()) to DRY up the pattern. As per coding guidelines on test independence.


1360-1371: resolveConfigs beforeEach correctly resets process/mocking state

Resetting all mocks, re-spying process.cwd, and restoring path.parse from vi.importActual('path') ensures resolveConfigs tests don't inherit cwd/path/glob state from other describes. This is a good fit for shuffle-enabled runs and for the CLI-exit tests that rely on a clean process spy per test.

Same helper you might use for combineConfigs could also cover this to keep the reset logic in one place.

test/util/testCaseReader.test.ts (1)

505-512: Module reset + vi.importActual usage is correct, but comment is stale

Moving vi.resetModules() before the mocks and using:

const actualFs = await vi.importActual<typeof import('fs')>('fs');
vi.doMock('fs', () => ({
  ...actualFs,
  existsSync: vi.fn().mockReturnValue(true),
}));

is the right way to get the real fs into a fresh module graph for this test. The remaining comment about “use require to get actual fs since vi.importActual may return mocked version” no longer matches the implementation and can be confusing.

-    // Mock fs module - use require to get actual fs since vi.importActual may return mocked version
-    const actualFs = await vi.importActual<typeof import('fs')>('fs');
+    // Mock fs module using the real Node fs implementation
+    const actualFs = await vi.importActual<typeof import('fs')>('fs');

You might also want to align the earlier XLSX test’s vi.doMock('fs', () => ({ ...vi.importActual('fs'), ... })) with this safer pattern.

As per coding guidelines about avoiding test pollution via module caches.
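
The reason the awaited form is safer can be shown without Vitest. `vi.importActual` returns a Promise, and spreading an un-awaited Promise copies none of the module's exports. In the sketch below, `importActualStandIn`, `FsLike`, and `buildFsMock` are hypothetical names used only for illustration.

```typescript
type FsLike = {
  readFileSync: (path: string) => string;
  existsSync: (path: string) => boolean;
};

// Hypothetical stand-in for vi.importActual('fs'): like the real call, it is async.
async function importActualStandIn(): Promise<FsLike> {
  return {
    readFileSync: (path: string) => `contents of ${path}`,
    existsSync: () => false,
  };
}

// Spreading the un-awaited promise copies no module exports,
// because a Promise has no own enumerable properties.
const withoutAwait: Record<string, unknown> = {
  ...importActualStandIn(),
  existsSync: () => true,
};
const keysWithoutAwait = Object.keys(withoutAwait);

// Awaiting first spreads the real exports, then overrides one of them.
async function buildFsMock(): Promise<FsLike> {
  const actualFs = await importActualStandIn();
  return { ...actualFs, existsSync: () => true };
}
```

The un-awaited variant ends up with only the overridden `existsSync`, so any code that calls another fs function through the mock would fail. The awaited variant keeps the remaining exports intact.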

test/commands/modelScan.test.ts (1)

37-67: Shared beforeEach reset pattern correctly de-pollutes modelScan mocks

Across the various describe blocks you now:

  • vi.clearAllMocks() per test,
  • mockReset() the child_process.spawn mock,
  • re-import and reset getModelAuditCurrentVersion to a known default,
  • reset ModelAudit.findByRevision / ModelAudit.create to the default “no existing scan, fixed id” behavior, and
  • reset HuggingFace helpers (isHuggingFaceModel, getHuggingFaceMetadata, parseHuggingFaceModel) to neutral values.

This removes hidden coupling between:

  • CLI error-path tests,
  • re-scan-on-version-change tests,
  • installation detection (checkModelAuditInstalled), and
  • temp-file / no-write behavior,

which is critical now that the test runner shuffles order. The process-exit spy setup/teardown per describe looks consistent with the move to process.exitCode.

Given the repetition, consider a small async resetModelScanTestState() helper shared by these beforeEach blocks to keep future changes to the default mock behavior in one place. As per coding guidelines on deterministic, order-independent tests.

Also applies to: 326-356, 546-572, 616-646, 865-895

test/AGENTS.md (1)

29-31: Shuffle + mock-isolation guidance matches implementation; consider heading tweak

The additions:

  • documenting that tests run in random order by default (with --sequence.shuffle/--sequence.seed knobs), and
  • the “Critical: Mock Isolation” section clarifying vi.clearAllMocks() vs mockReset() and showing a beforeEach pattern,

are exactly aligned with the changes in the test files (hoisted mocks + per-describe resets) and with the independence requirements in this repo.

To satisfy markdownlint (MD036) and improve structure, you could make the “Critical: Mock Isolation” label an actual heading instead of bold text:

-**Critical: Mock Isolation**
+### Critical: Mock Isolation

Based on learnings about documenting agent/test behavior in AGENTS.md.

Also applies to: 72-84

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 873241e and aa45d45.

📒 Files selected for processing (12)
  • test/AGENTS.md (2 hunks)
  • test/assertions/python.test.ts (2 hunks)
  • test/commands/modelScan.test.ts (5 hunks)
  • test/evaluator.test.ts (3 hunks)
  • test/globalConfig/accounts.test.ts (1 hunks)
  • test/redteam/commands/generate.test.ts (1 hunks)
  • test/redteam/providers/iterative.test.ts (1 hunks)
  • test/updates.test.ts (1 hunks)
  • test/util/config/load.test.ts (6 hunks)
  • test/util/testCaseReader.test.ts (1 hunks)
  • vitest.config.ts (1 hunks)
  • vitest.integration.config.ts (1 hunks)
🧰 Additional context used
📓 Path-based instructions (5)
**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (AGENTS.md)

**/*.{ts,tsx,js,jsx}: Follow consistent import order (Biome handles sorting)
Use consistent curly braces for all control statements
Prefer const over let; avoid var
Use object shorthand syntax whenever possible
Use async/await for asynchronous code

Files:

  • test/updates.test.ts
  • test/redteam/commands/generate.test.ts
  • test/globalConfig/accounts.test.ts
  • test/redteam/providers/iterative.test.ts
  • test/commands/modelScan.test.ts
  • test/util/testCaseReader.test.ts
  • vitest.config.ts
  • test/evaluator.test.ts
  • test/assertions/python.test.ts
  • vitest.integration.config.ts
  • test/util/config/load.test.ts
test/**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (AGENTS.md)

Use Vitest for all tests (both test/ and src/app/)

Files:

  • test/updates.test.ts
  • test/redteam/commands/generate.test.ts
  • test/globalConfig/accounts.test.ts
  • test/redteam/providers/iterative.test.ts
  • test/commands/modelScan.test.ts
  • test/util/testCaseReader.test.ts
  • test/evaluator.test.ts
  • test/assertions/python.test.ts
  • test/util/config/load.test.ts
test/**/*.test.{ts,tsx,js}

📄 CodeRabbit inference engine (AGENTS.md)

Backend tests in test/ should use Vitest with globals enabled (describe, it, expect available without imports)

Files:

  • test/updates.test.ts
  • test/redteam/commands/generate.test.ts
  • test/globalConfig/accounts.test.ts
  • test/redteam/providers/iterative.test.ts
  • test/commands/modelScan.test.ts
  • test/util/testCaseReader.test.ts
  • test/evaluator.test.ts
  • test/assertions/python.test.ts
  • test/util/config/load.test.ts
test/**/*.test.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (test/AGENTS.md)

test/**/*.test.{ts,tsx,js,jsx}: Never increase test timeouts - fix the slow test instead
Never use .only() or .skip() in committed code
Always clean up mocks in afterEach using vi.resetAllMocks()
Import test utilities explicitly from 'vitest': describe, it, expect, beforeEach, afterEach, vi
Use Vitest's mocking utilities (vi.mock, vi.fn, vi.spyOn) rather than other mocking libraries
Prefer shallow mocking over deep mocking when using Vitest
Mock external dependencies but not the code being tested
Reset mocks between tests to prevent test pollution
Ensure all tests are independent and can run in any order
Clean up test data and mocks after each test
Test failures should be deterministic
For database tests, use in-memory instances or proper test fixtures

Files:

  • test/updates.test.ts
  • test/redteam/commands/generate.test.ts
  • test/globalConfig/accounts.test.ts
  • test/redteam/providers/iterative.test.ts
  • test/commands/modelScan.test.ts
  • test/util/testCaseReader.test.ts
  • test/evaluator.test.ts
  • test/assertions/python.test.ts
  • test/util/config/load.test.ts
test/**/AGENTS.md

📄 CodeRabbit inference engine (test/CLAUDE.md)

Document all agent implementations and capabilities in AGENTS.md

Files:

  • test/AGENTS.md
🧠 Learnings (32)
📓 Common learnings
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: test/AGENTS.md:0-0
Timestamp: 2025-12-10T02:05:13.021Z
Learning: Applies to test/**/*.test.{ts,tsx,js,jsx} : Ensure all tests are independent and can run in any order
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: test/AGENTS.md:0-0
Timestamp: 2025-12-10T02:05:13.021Z
Learning: Applies to test/**/*.test.{ts,tsx,js,jsx} : Reset mocks between tests to prevent test pollution
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: test/AGENTS.md:0-0
Timestamp: 2025-12-10T02:05:13.021Z
Learning: Applies to test/**/*.test.{ts,tsx,js,jsx} : Use Vitest's mocking utilities (`vi.mock`, `vi.fn`, `vi.spyOn`) rather than other mocking libraries
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: test/AGENTS.md:0-0
Timestamp: 2025-12-10T02:05:13.021Z
Learning: Always run tests with `--randomize` flag to ensure test independence
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: test/AGENTS.md:0-0
Timestamp: 2025-12-10T02:05:13.021Z
Learning: Applies to test/**/*.test.{ts,tsx,js,jsx} : Always clean up mocks in `afterEach` using `vi.resetAllMocks()`
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-09T06:08:02.324Z
Learning: Applies to test/**/*.{ts,tsx,js,jsx} : Use Vitest for all tests (both `test/` and `src/app/`)
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: src/app/AGENTS.md:0-0
Timestamp: 2025-12-09T06:08:48.482Z
Learning: Applies to src/app/**/*.test.{ts,tsx} : Use `vi.fn()` for mocks and `vi.mock()` for module mocking in Vitest test files
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: test/AGENTS.md:0-0
Timestamp: 2025-12-10T02:05:13.021Z
Learning: Applies to test/**/*.test.{ts,tsx,js,jsx} : Test failures should be deterministic
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: test/AGENTS.md:0-0
Timestamp: 2025-12-10T02:05:13.021Z
Learning: Applies to test/**/*.test.{ts,tsx,js,jsx} : Prefer shallow mocking over deep mocking when using Vitest
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: src/redteam/AGENTS.md:0-0
Timestamp: 2025-12-09T06:09:14.828Z
Learning: Applies to src/redteam/test/redteam/**/*.ts : Add tests for new red team plugins in `test/redteam/` directory following the pattern in `src/redteam/plugins/pii.ts`
📚 Learning: 2025-12-10T02:05:13.021Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: test/AGENTS.md:0-0
Timestamp: 2025-12-10T02:05:13.021Z
Learning: Applies to test/**/*.test.{ts,tsx,js,jsx} : Reset mocks between tests to prevent test pollution

Applied to files:

  • test/updates.test.ts
  • test/redteam/commands/generate.test.ts
  • test/globalConfig/accounts.test.ts
  • test/redteam/providers/iterative.test.ts
  • test/commands/modelScan.test.ts
  • test/util/testCaseReader.test.ts
  • test/AGENTS.md
  • test/evaluator.test.ts
  • test/assertions/python.test.ts
  • vitest.integration.config.ts
  • test/util/config/load.test.ts
📚 Learning: 2025-12-10T02:05:13.021Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: test/AGENTS.md:0-0
Timestamp: 2025-12-10T02:05:13.021Z
Learning: Applies to test/**/*.test.{ts,tsx,js,jsx} : Clean up test data and mocks after each test

Applied to files:

  • test/updates.test.ts
  • test/redteam/commands/generate.test.ts
  • test/globalConfig/accounts.test.ts
  • test/redteam/providers/iterative.test.ts
  • test/commands/modelScan.test.ts
  • test/util/testCaseReader.test.ts
  • test/evaluator.test.ts
  • test/util/config/load.test.ts
📚 Learning: 2025-12-10T02:05:13.021Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: test/AGENTS.md:0-0
Timestamp: 2025-12-10T02:05:13.021Z
Learning: Applies to test/**/*.test.{ts,tsx,js,jsx} : Always clean up mocks in `afterEach` using `vi.resetAllMocks()`

Applied to files:

  • test/updates.test.ts
  • test/redteam/commands/generate.test.ts
  • test/globalConfig/accounts.test.ts
  • test/redteam/providers/iterative.test.ts
  • test/commands/modelScan.test.ts
  • test/util/testCaseReader.test.ts
  • test/AGENTS.md
  • test/evaluator.test.ts
  • test/assertions/python.test.ts
  • test/util/config/load.test.ts
📚 Learning: 2025-12-10T02:05:13.021Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: test/AGENTS.md:0-0
Timestamp: 2025-12-10T02:05:13.021Z
Learning: Applies to test/**/*.test.{ts,tsx,js,jsx} : Ensure all tests are independent and can run in any order

Applied to files:

  • test/updates.test.ts
  • test/redteam/commands/generate.test.ts
  • test/redteam/providers/iterative.test.ts
  • test/commands/modelScan.test.ts
  • test/util/testCaseReader.test.ts
  • test/AGENTS.md
  • vitest.config.ts
  • test/evaluator.test.ts
  • test/assertions/python.test.ts
  • vitest.integration.config.ts
  • test/util/config/load.test.ts
📚 Learning: 2025-12-10T02:05:13.020Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: test/AGENTS.md:0-0
Timestamp: 2025-12-10T02:05:13.020Z
Learning: Applies to test/**/*.test.{ts,tsx,js,jsx} : Never increase test timeouts - fix the slow test instead

Applied to files:

  • test/updates.test.ts
  • test/AGENTS.md
📚 Learning: 2025-12-10T02:05:13.021Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: test/AGENTS.md:0-0
Timestamp: 2025-12-10T02:05:13.021Z
Learning: Applies to test/**/*.test.{ts,tsx,js,jsx} : For database tests, use in-memory instances or proper test fixtures

Applied to files:

  • test/updates.test.ts
  • test/redteam/commands/generate.test.ts
  • test/globalConfig/accounts.test.ts
  • test/commands/modelScan.test.ts
  • test/util/testCaseReader.test.ts
  • test/AGENTS.md
📚 Learning: 2025-12-10T02:05:13.021Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: test/AGENTS.md:0-0
Timestamp: 2025-12-10T02:05:13.021Z
Learning: Applies to test/**/*.test.{ts,tsx,js,jsx} : Mock external dependencies but not the code being tested

Applied to files:

  • test/updates.test.ts
  • test/redteam/commands/generate.test.ts
  • test/globalConfig/accounts.test.ts
  • test/redteam/providers/iterative.test.ts
  • test/commands/modelScan.test.ts
  • test/util/testCaseReader.test.ts
  • test/AGENTS.md
  • test/evaluator.test.ts
  • test/assertions/python.test.ts
  • test/util/config/load.test.ts
📚 Learning: 2025-12-09T06:09:06.028Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: src/providers/AGENTS.md:0-0
Timestamp: 2025-12-09T06:09:06.028Z
Learning: Applies to src/providers/test/providers/**/*.ts : Test provider success AND error cases, including rate limits, timeouts, and invalid configs

Applied to files:

  • test/updates.test.ts
  • test/redteam/commands/generate.test.ts
  • test/globalConfig/accounts.test.ts
  • test/redteam/providers/iterative.test.ts
  • test/AGENTS.md
  • test/evaluator.test.ts
  • test/util/config/load.test.ts
📚 Learning: 2025-12-09T06:09:06.028Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: src/providers/AGENTS.md:0-0
Timestamp: 2025-12-09T06:09:06.028Z
Learning: Applies to src/providers/test/providers/**/*.ts : Provider tests must NEVER make real API calls - mock all HTTP requests using `vi.mock`

Applied to files:

  • test/updates.test.ts
  • test/redteam/commands/generate.test.ts
  • test/globalConfig/accounts.test.ts
  • test/redteam/providers/iterative.test.ts
  • test/commands/modelScan.test.ts
  • test/util/testCaseReader.test.ts
  • test/AGENTS.md
  • test/evaluator.test.ts
  • test/assertions/python.test.ts
  • test/util/config/load.test.ts
📚 Learning: 2025-12-10T02:05:13.021Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: test/AGENTS.md:0-0
Timestamp: 2025-12-10T02:05:13.021Z
Learning: Applies to test/**/*.test.{ts,tsx,js,jsx} : Test failures should be deterministic

Applied to files:

  • test/updates.test.ts
  • test/redteam/commands/generate.test.ts
  • test/AGENTS.md
  • vitest.config.ts
  • test/assertions/python.test.ts
  • vitest.integration.config.ts
  • test/util/config/load.test.ts
📚 Learning: 2025-12-10T02:05:13.021Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: test/AGENTS.md:0-0
Timestamp: 2025-12-10T02:05:13.021Z
Learning: Applies to test/**/*.test.{ts,tsx,js,jsx} : Use Vitest's mocking utilities (`vi.mock`, `vi.fn`, `vi.spyOn`) rather than other mocking libraries

Applied to files:

  • test/updates.test.ts
  • test/redteam/commands/generate.test.ts
  • test/globalConfig/accounts.test.ts
  • test/redteam/providers/iterative.test.ts
  • test/commands/modelScan.test.ts
  • test/util/testCaseReader.test.ts
  • test/AGENTS.md
  • vitest.config.ts
  • test/evaluator.test.ts
  • test/assertions/python.test.ts
  • vitest.integration.config.ts
  • test/util/config/load.test.ts
📚 Learning: 2025-12-09T06:08:48.482Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: src/app/AGENTS.md:0-0
Timestamp: 2025-12-09T06:08:48.482Z
Learning: Applies to src/app/**/*.test.{ts,tsx} : Use `vi.fn()` for mocks and `vi.mock()` for module mocking in Vitest test files

Applied to files:

  • test/updates.test.ts
  • test/redteam/commands/generate.test.ts
  • test/globalConfig/accounts.test.ts
  • test/redteam/providers/iterative.test.ts
  • test/commands/modelScan.test.ts
  • test/util/testCaseReader.test.ts
  • test/AGENTS.md
  • vitest.config.ts
  • test/evaluator.test.ts
  • test/assertions/python.test.ts
  • vitest.integration.config.ts
  • test/util/config/load.test.ts
📚 Learning: 2025-12-10T02:05:13.021Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: test/AGENTS.md:0-0
Timestamp: 2025-12-10T02:05:13.021Z
Learning: Applies to test/**/*.test.{ts,tsx,js,jsx} : Import test utilities explicitly from 'vitest': `describe`, `it`, `expect`, `beforeEach`, `afterEach`, `vi`

Applied to files:

  • test/updates.test.ts
  • test/redteam/commands/generate.test.ts
  • test/redteam/providers/iterative.test.ts
  • test/commands/modelScan.test.ts
  • test/util/testCaseReader.test.ts
  • test/AGENTS.md
  • vitest.config.ts
  • test/evaluator.test.ts
  • test/assertions/python.test.ts
  • vitest.integration.config.ts
  • test/util/config/load.test.ts
📚 Learning: 2025-12-09T06:08:02.324Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-09T06:08:02.324Z
Learning: Applies to test/**/*.test.{ts,tsx,js} : Backend tests in `test/` should use Vitest with globals enabled (`describe`, `it`, `expect` available without imports)

Applied to files:

  • test/updates.test.ts
  • vitest.config.ts
  • test/evaluator.test.ts
  • test/assertions/python.test.ts
  • vitest.integration.config.ts
  • test/util/config/load.test.ts
📚 Learning: 2025-12-09T06:09:14.828Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: src/redteam/AGENTS.md:0-0
Timestamp: 2025-12-09T06:09:14.828Z
Learning: Applies to src/redteam/test/redteam/**/*.ts : Add tests for new red team plugins in `test/redteam/` directory following the pattern in `src/redteam/plugins/pii.ts`

Applied to files:

  • test/redteam/commands/generate.test.ts
  • test/redteam/providers/iterative.test.ts
  • vitest.config.ts
  • test/assertions/python.test.ts
📚 Learning: 2025-12-09T06:09:14.828Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: src/redteam/AGENTS.md:0-0
Timestamp: 2025-12-09T06:09:14.828Z
Learning: Applies to src/redteam/plugins/*.ts : Generate targeted test cases for specific vulnerability types in plugin implementations

Applied to files:

  • test/redteam/commands/generate.test.ts
📚 Learning: 2025-12-09T06:09:14.828Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: src/redteam/AGENTS.md:0-0
Timestamp: 2025-12-09T06:09:14.828Z
Learning: Applies to src/redteam/plugins/*.ts : Include assertions defining failure conditions in plugin test cases

Applied to files:

  • test/redteam/commands/generate.test.ts
  • test/redteam/providers/iterative.test.ts
  • test/assertions/python.test.ts
📚 Learning: 2025-12-10T02:05:13.021Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: test/AGENTS.md:0-0
Timestamp: 2025-12-10T02:05:13.021Z
Learning: Applies to test/providers/**/*.test.{ts,tsx,js,jsx} : For provider testing, include test coverage for: success case, error cases (4xx, 5xx, rate limits), configuration validation, and token usage tracking

Applied to files:

  • test/redteam/providers/iterative.test.ts
  • test/AGENTS.md
  • test/evaluator.test.ts
📚 Learning: 2025-12-10T02:05:13.021Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: test/AGENTS.md:0-0
Timestamp: 2025-12-10T02:05:13.021Z
Learning: Applies to test/**/*.test.{ts,tsx,js,jsx} : Prefer shallow mocking over deep mocking when using Vitest

Applied to files:

  • test/redteam/providers/iterative.test.ts
  • test/util/testCaseReader.test.ts
  • test/AGENTS.md
  • vitest.config.ts
  • test/evaluator.test.ts
  • test/assertions/python.test.ts
  • vitest.integration.config.ts
  • test/util/config/load.test.ts
📚 Learning: 2025-12-09T06:08:12.794Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: drizzle/AGENTS.md:0-0
Timestamp: 2025-12-09T06:08:12.794Z
Learning: Applies to drizzle/test/**/*.{js,ts} : Use in-memory SQLite databases in test files to verify migrations work correctly without affecting production data

Applied to files:

  • test/commands/modelScan.test.ts
📚 Learning: 2025-12-09T06:08:55.096Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: src/commands/AGENTS.md:0-0
Timestamp: 2025-12-09T06:08:55.096Z
Learning: Follow the standard CLI command structure demonstrated in `src/commands/eval.ts`: register with Commander, setup environment, track telemetry, use logger for output, and handle errors properly

Applied to files:

  • test/commands/modelScan.test.ts
📚 Learning: 2025-11-29T00:24:24.883Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: test/CLAUDE.md:0-0
Timestamp: 2025-11-29T00:24:24.883Z
Learning: Applies to test/**/AGENTS.md : Document all agent implementations and capabilities in AGENTS.md

Applied to files:

  • test/AGENTS.md
📚 Learning: 2025-11-29T00:24:20.916Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: src/server/CLAUDE.md:0-0
Timestamp: 2025-11-29T00:24:20.916Z
Learning: Applies to src/server/**/AGENTS.md : Maintain clear documentation of agent architecture and design decisions in AGENTS.md

Applied to files:

  • test/AGENTS.md
📚 Learning: 2025-11-29T00:24:07.021Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: src/commands/CLAUDE.md:0-0
Timestamp: 2025-11-29T00:24:07.021Z
Learning: Applies to src/commands/**/*.md : Documentation about agents should be maintained in AGENTS.md

Applied to files:

  • test/AGENTS.md
📚 Learning: 2025-11-29T00:24:20.916Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: src/server/CLAUDE.md:0-0
Timestamp: 2025-11-29T00:24:20.916Z
Learning: Applies to src/server/**/AGENTS.md : Document agent responsibilities, capabilities, and interactions in AGENTS.md

Applied to files:

  • test/AGENTS.md
📚 Learning: 2025-11-29T00:24:17.021Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: src/redteam/CLAUDE.md:0-0
Timestamp: 2025-11-29T00:24:17.021Z
Learning: Applies to src/redteam/**/*agent*.{ts,tsx,js,jsx} : Maintain clear agent interface definitions and usage patterns

Applied to files:

  • test/AGENTS.md
📚 Learning: 2025-12-10T02:05:13.021Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: test/AGENTS.md:0-0
Timestamp: 2025-12-10T02:05:13.021Z
Learning: Always run tests with `--randomize` flag to ensure test independence

Applied to files:

  • test/AGENTS.md
  • vitest.config.ts
  • vitest.integration.config.ts
📚 Learning: 2025-12-09T06:08:02.324Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-09T06:08:02.324Z
Learning: Applies to test/**/*.{ts,tsx,js,jsx} : Use Vitest for all tests (both `test/` and `src/app/`)

Applied to files:

  • test/AGENTS.md
  • vitest.config.ts
  • vitest.integration.config.ts
📚 Learning: 2025-07-18T17:25:57.700Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: .cursor/rules/gh-cli-workflow.mdc:0-0
Timestamp: 2025-07-18T17:25:57.700Z
Learning: Applies to **/*.{test,spec}.{js,ts,jsx,tsx} : Avoid disabling or skipping tests unless absolutely necessary and documented

Applied to files:

  • test/AGENTS.md
📚 Learning: 2025-12-09T06:08:02.324Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-09T06:08:02.324Z
Learning: Applies to src/app/**/*.test.{ts,tsx} : Frontend tests in `src/app/` should use Vitest with explicit imports

Applied to files:

  • vitest.config.ts
  • test/assertions/python.test.ts
  • vitest.integration.config.ts
  • test/util/config/load.test.ts
📚 Learning: 2025-10-06T03:43:01.653Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-06T03:43:01.653Z
Learning: Applies to test/**/*.{ts,tsx,js,jsx} : Follow Jest best practices using describe and it blocks in tests

Applied to files:

  • test/util/config/load.test.ts
🧬 Code graph analysis (4)
test/updates.test.ts (1)
src/util/fetch/index.ts (1)
  • fetchWithTimeout (141-173)
test/globalConfig/accounts.test.ts (1)
src/globalConfig/globalConfig.ts (1)
  • readGlobalConfig (21-39)
test/evaluator.test.ts (1)
src/evaluatorHelpers.ts (1)
  • runExtensionHook (542-595)
test/assertions/python.test.ts (3)
scripts/generate-blog-image.cjs (1)
  • path (3-3)
scripts/update-changelog-version.cjs (1)
  • path (11-11)
scripts/validate-changelog.cjs (1)
  • path (13-13)
🪛 markdownlint-cli2 (0.18.1)
test/AGENTS.md

72-72: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (19)
  • GitHub Check: Test on Node 24.x and ubuntu-latest
  • GitHub Check: webui tests
  • GitHub Check: Test on Node 24.x and windows-latest
  • GitHub Check: Build Docs
  • GitHub Check: Share Test
  • GitHub Check: Test on Node 22.x and ubuntu-latest
  • GitHub Check: Test on Node 20.x and ubuntu-latest
  • GitHub Check: Test on Node 20.x and macOS-latest
  • GitHub Check: Test on Node 20.x and windows-latest
  • GitHub Check: Test on Node 22.x and macOS-latest
  • GitHub Check: Test on Node 22.x and windows-latest
  • GitHub Check: Redteam (Staging API)
  • GitHub Check: Redteam (Production API)
  • GitHub Check: Generate Assets
  • GitHub Check: Build on Node 24.x
  • GitHub Check: Build on Node 22.x
  • GitHub Check: Build on Node 20.x
  • GitHub Check: security-scan
  • GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (7)
test/globalConfig/accounts.test.ts (1)

153-164: Explicitly mocking readGlobalConfig avoids cross-test leakage

Mocking readGlobalConfig to a minimal { id: 'test-id' } here ensures setUserEmail is exercised against a clean config and won’t accidentally inherit account state from other tests when order is randomized. This is a good, tight fix for test isolation.

test/assertions/python.test.ts (1)

379-447: Per-test path.resolve/extname mocks fix order dependency

Adding explicit path.resolve / path.extname mocks in these tests ensures file‑based Python assertions no longer depend on whatever values previous tests left on the shared path mock. This matches the isolation guidance for shared mocks and makes these cases safe under shuffled execution.

vitest.config.ts (1)

20-26: Enabling sequence.shuffle with clear guidance looks good

Turning on sequence.shuffle by default (with comments on how to disable it for debugging) aligns with the guidance in test/AGENTS.md about catching order‑coupled tests early. Placing it under the test config is appropriate and should work well with the existing forked worker setup.
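
For reference, a minimal sketch of the setting under discussion. Only `sequence.shuffle` and the `--sequence.shuffle=false` override come from this PR; the comment wording is illustrative, and the repository's other options (worker pool, timeouts) are omitted rather than guessed.

```typescript
import { defineConfig } from 'vitest/config';

export default defineConfig({
  test: {
    sequence: {
      // Randomize execution order so order-coupled tests fail early.
      // To debug with a stable order, run: vitest --sequence.shuffle=false
      shuffle: true,
    },
  },
});
```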

vitest.integration.config.ts (1)

20-26: Consistent randomized sequencing for integration tests

Mirroring sequence.shuffle: true into the integration config keeps unit and integration suites aligned on “random by default, override for debugging,” which is exactly what the updated testing guidance calls for. No issues with the surrounding fork/timeout settings.

test/redteam/providers/iterative.test.ts (1)

47-56: Hoisted mocks are now correctly reset for shuffle-safe isolation

Using mockReset() on the hoisted mocks in beforeEach (on top of vi.clearAllMocks()) ensures both call history and implementations are cleared between tests, which is exactly what you want with vi.hoisted + random test order. This prevents mockReturnValue/mockResolvedValueOnce state from leaking across describe blocks.

As per coding guidelines about resetting mocks between tests and the new mock isolation rules in test/AGENTS.md, this looks solid.
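
To make the clear-versus-reset distinction concrete, here is a toy model of the semantics described above. `ToyMock` is a hypothetical stand-in written for this illustration, not Vitest's implementation; it mirrors only the behavior this review relies on: clearing drops call history, while resetting also drops the configured implementation and any queued once-values.

```typescript
// Toy stand-in for a mock function. NOT Vitest's implementation.
class ToyMock<T> {
  calls: unknown[][] = [];
  private impl: (() => T) | undefined = undefined;
  private onceQueue: T[] = [];

  mockReturnValue(value: T): this {
    this.impl = () => value;
    return this;
  }

  mockReturnValueOnce(value: T): this {
    this.onceQueue.push(value);
    return this;
  }

  // Models clearAllMocks(): forgets call history only.
  mockClear(): this {
    this.calls = [];
    return this;
  }

  // Models mockReset(): forgets history, implementation, and queued values.
  mockReset(): this {
    this.mockClear();
    this.impl = undefined;
    this.onceQueue = [];
    return this;
  }

  call(...args: unknown[]): T | undefined {
    this.calls.push(args);
    if (this.onceQueue.length > 0) {
      return this.onceQueue.shift();
    }
    return this.impl ? this.impl() : undefined;
  }
}

// "Test A" configures the mock and leaves a queued value behind.
const mock = new ToyMock<string>();
mock.mockReturnValue('from test A').mockReturnValueOnce('queued by test A');

// Clearing alone: history is gone, but test A's behavior leaks into "test B".
mock.mockClear();
const callsAfterClear = mock.calls.length;
const afterClear = [mock.call(), mock.call()];

// Resetting: nothing configured by test A survives.
mock.mockReset();
const afterReset = mock.call();

console.log({ callsAfterClear, afterClear, afterReset });
```

In the toy model, the two calls after `mockClear()` still return test A's queued and default values, while the call after `mockReset()` returns `undefined`. That is the leak a shuffled run exposes when a later test assumes a fresh mock.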

test/util/config/load.test.ts (2)

1650-1661: readConfig beforeEach correctly resets $RefParser and path.parse

The new async beforeEach that:

  • clears/restores mocks,
  • resets path.parse to the real implementation, and
  • mockReset()s mockDereference and re-establishes a pass-through implementation,

prevents queued mockResolvedValueOnce calls or custom path.parse implementations from other tests leaking into readConfig behavior.

This directly implements the mock-isolation guidance for hoisted mocks in test/AGENTS.md.


1851-1861: resolveConfigs with external defaultTest now has deterministic basePath, deref, and glob behavior

This beforeEach does three important things for isolation:

  • Sets cliState.basePath to a known value.
  • Resets mockDereference to a pass-through implementation.
  • Restores path.parse via vi.importActual('path').

The explicit vi.mocked(globSync).mockReturnValue(['config.json']); also removes any dependence on prior glob mocks. Together these make the external-defaultTest scenario stable under random test ordering.

If any other suites depend on cliState.basePath’s default value, double-check they explicitly set it in their own setup so this mutation can’t leak across files. Based on learnings about test independence.

Also applies to: 1881-1881
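
The concern about shared state leaking can be sketched without any test framework. Everything below is hypothetical: `state` stands in for a shared object such as cliState, and `run` is a tiny illustrative runner, not Vitest. It shows why a test that assumes the default passes in one order, fails in the other, and passes in both once setup resets the shared value.

```typescript
// Hypothetical stand-in for shared module-level state such as cliState.
const state: { basePath: string | undefined } = { basePath: undefined };

type TestFn = () => void;

// "Test" that mutates shared state and never restores it.
const setsBasePath: TestFn = () => {
  state.basePath = '/mock/base';
};

// "Test" that silently assumes the default (unset) basePath.
const assumesDefault: TestFn = () => {
  if (state.basePath !== undefined) {
    throw new Error(`unexpected basePath: ${state.basePath}`);
  }
};

// Runs the tests in the given order and returns the names of those that threw.
function run(order: Array<[string, TestFn]>, beforeEach?: () => void): string[] {
  state.basePath = undefined; // start each run from a clean slate
  const failed: string[] = [];
  for (const [name, fn] of order) {
    if (beforeEach) beforeEach();
    try {
      fn();
    } catch {
      failed.push(name);
    }
  }
  return failed;
}

const a: [string, TestFn] = ['setsBasePath', setsBasePath];
const b: [string, TestFn] = ['assumesDefault', assumesDefault];

const luckyOrder = run([b, a]); // passes only because of ordering
const shuffledOrder = run([a, b]); // the leak surfaces
const withReset = run([a, b], () => {
  state.basePath = undefined; // explicit reset in setup
});

console.log({ luckyOrder, shuffledOrder, withReset });
```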

mldangelo and others added 6 commits December 10, 2025 14:11
Replace hard-coded /tmp paths with path.join(os.tmpdir(), ...)
to fix test failures on Windows where /tmp doesn't exist.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
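
A minimal sketch of the replacement described in the commit message above. The helper name `tempPath` and the path segments are hypothetical and chosen for illustration; only the `path.join(os.tmpdir(), ...)` pattern comes from the commit.

```typescript
import * as os from 'node:os';
import * as path from 'node:path';

// Hypothetical helper: builds a path under the platform's temp directory
// instead of assuming a POSIX-style /tmp exists.
function tempPath(...segments: string[]): string {
  return path.join(os.tmpdir(), ...segments);
}

// Illustrative segments only; not taken from the repository.
const fixture = tempPath('promptfoo-test', 'config.yaml');
console.log(fixture);
```
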
Add mockReset() calls for mockedCheckModelAuditInstalled and mockedSpawn
in beforeEach to ensure test isolation when tests run in random order.
vi.clearAllMocks() only clears call history, not mock implementations
set via mockResolvedValue/mockReturnValue.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

@JustinBeckwith JustinBeckwith left a comment

Never seen this! Very cool.

mldangelo and others added 2 commits December 11, 2025 01:54
Add mockReset() and mockImplementation() for runExtensionHook in the
'Evaluator with external defaultTest' describe block's beforeEach hook.

This ensures the mock is properly reset between tests when running
with shuffle mode enabled, fixing the Windows Node 24 CI failure.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@mldangelo mldangelo merged commit 14a981f into main Dec 11, 2025
35 checks passed
@mldangelo mldangelo deleted the fix/test-shuffle-isolation branch December 11, 2025 07:29

Development

Successfully merging this pull request may close these issues.

Unit tests are not structured to work when invoked via jest --randomize
