@SamAinsworth-NHS commented Dec 3, 2025

Description

This PR optimizes the Playwright E2E test retry mechanism to significantly reduce how long failing tests take to fail, while maintaining robustness for async operations.

Key Changes:

  • Replaced additive backoff (5s → 10s → 15s...) with capped exponential backoff (1s → 1.5s → 2.25s → 3.38s → 5s max, growing ×1.5 per retry)
  • Reduced total wait time for a failing test with 8 retries from ~180 seconds to ~28 seconds (~84% reduction)
  • Added fail-fast logic for 5xx server errors and 400 Bad Request (immediate failure instead of retrying)
  • Enhanced logging with attempt numbers, elapsed time, and the time taken to succeed
  • Added a configurable maximum wait time via the API_MAX_WAIT_MS environment variable
  • Removed the unused apiStepMs configuration parameter
  • Refactored the elapsed-time calculation into a reusable helper function
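
A minimal TypeScript sketch of the retry behaviour listed above, under stated assumptions: the growth factor of 1.5 is inferred from the 1s → 1.5s → 2.25s sequence, and the names `retryUntil`, `backoffDelayMs`, `isFatalStatus`, `elapsedMs` and the shape of the `attempt` callback are hypothetical, not the PR's actual code. It shows capped exponential delays, fail-fast on 400 and 5xx, and per-attempt logging with elapsed time.

```typescript
// Hypothetical sketch: all names here are illustrative, not the PR's code.

interface RetryOptions {
  initialWaitMs: number; // first wait, e.g. API_WAIT_MS (default 1000)
  maxWaitMs: number;     // cap on any single wait, e.g. API_MAX_WAIT_MS (default 5000)
  maxRetries: number;    // assumed retry budget, e.g. 8
  factor?: number;       // assumed growth factor; 1.5 gives 1s -> 1.5s -> 2.25s
}

interface AttemptResult<T> {
  status: number; // HTTP status of the polled request
  value?: T;      // set once the expected data has appeared
}

/** Capped exponential backoff: initialWaitMs * factor^retryIndex, never above maxWaitMs. */
function backoffDelayMs(retryIndex: number, opts: RetryOptions): number {
  const factor = opts.factor ?? 1.5;
  return Math.min(opts.initialWaitMs * Math.pow(factor, retryIndex), opts.maxWaitMs);
}

/** Reusable helper: milliseconds elapsed since startMs. */
function elapsedMs(startMs: number): number {
  return Date.now() - startMs;
}

/** 400 and 5xx will not fix themselves by waiting, so they are not retried. */
function isFatalStatus(status: number): boolean {
  return status === 400 || (status >= 500 && status <= 599);
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function retryUntil<T>(
  attempt: () => Promise<AttemptResult<T>>,
  opts: RetryOptions,
  log: (msg: string) => void = console.log,
): Promise<T> {
  const start = Date.now();
  // maxRetries retries means maxRetries + 1 attempts and maxRetries waits.
  for (let i = 0; i <= opts.maxRetries; i++) {
    const { status, value } = await attempt();
    if (isFatalStatus(status)) {
      throw new Error(`Non-retryable status ${status} on attempt ${i + 1} after ${elapsedMs(start)} ms`);
    }
    if (value !== undefined) {
      log(`Succeeded on attempt ${i + 1} after ${elapsedMs(start)} ms`);
      return value;
    }
    if (i === opts.maxRetries) break; // no point waiting after the last attempt
    const delay = backoffDelayMs(i, opts);
    log(`Attempt ${i + 1} got status ${status}; retrying in ${delay} ms (elapsed ${elapsedMs(start)} ms)`);
    await sleep(delay);
  }
  throw new Error(`Gave up after ${opts.maxRetries + 1} attempts (${elapsedMs(start)} ms)`);
}
```

In a Playwright test, `attempt` would wrap the API request and return its status plus the parsed data once it is present; that wiring is assumed here rather than taken from the PR.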

Configuration Changes:

  • API_WAIT_MS: Default changed from 5000ms to 1000ms (initial retry wait)
  • API_MAX_WAIT_MS: New configurable parameter, defaults to 5000ms (maximum wait per retry)
  • API_STEP_MS: Removed (no longer used with exponential backoff)
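
A sketch of how these settings might be read, assuming they are environment variables holding integer milliseconds. Only the variable names and the 1000 ms / 5000 ms defaults come from this PR; `loadRetryConfig`, `readPositiveInt`, the fallback for invalid values and the clamp on the cap are hypothetical.

```typescript
// Hypothetical sketch: function names and validation rules are assumptions.

interface RetryConfig {
  apiWaitMs: number;    // initial retry wait (API_WAIT_MS)
  apiMaxWaitMs: number; // cap on any single wait (API_MAX_WAIT_MS)
}

/** Parse a positive integer, falling back to a default when missing or invalid. */
function readPositiveInt(raw: string | undefined, fallback: number): number {
  const parsed = Number.parseInt(raw ?? "", 10);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : fallback;
}

function loadRetryConfig(env: Record<string, string | undefined>): RetryConfig {
  // API_STEP_MS is intentionally not read any more.
  const apiWaitMs = readPositiveInt(env["API_WAIT_MS"], 1000);
  const apiMaxWaitMs = readPositiveInt(env["API_MAX_WAIT_MS"], 5000);
  // Assumed guard: a cap below the initial wait would make the cap meaningless.
  return { apiWaitMs, apiMaxWaitMs: Math.max(apiMaxWaitMs, apiWaitMs) };
}
```

In Node this would be called as `loadRetryConfig(process.env)`; taking the environment as a parameter keeps the function easy to test.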

Context

The existing retry logic used additive backoff, causing test failures to take up to 3 minutes to complete. This significantly slowed down CI/CD pipelines when tests legitimately failed (e.g., due to 5xx errors or bad test data).

Problems Solved:

  1. Long wait times for genuine failures (server errors, bad requests)
  2. Lack of visibility into retry attempts and timing
  3. No differentiation between retryable and non-retryable errors
  4. Tests waiting unnecessarily long when data would never appear

Example Impact:

  • Before: Failed test with 8 retries = 5s + 10s + 15s + 20s + 25s + 30s + 35s + 40s = 180 seconds
  • After: Failed test with 8 retries = 1s + 1.5s + 2.25s + 3.38s + 5s + 5s + 5s + 5s = ~28 seconds
  • Additional savings: 5xx/400 errors now fail in ~1 second instead of 28 seconds
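
The wait totals above can be checked with a few lines of arithmetic. This assumes the additive scheme adds 5 s per retry and the exponential scheme multiplies by 1.5 (inferred from the listed values) with a 5 s cap; the function names are illustrative.

```typescript
// Illustrative check of the Example Impact figures for 8 retries.

/** Additive backoff: step, 2*step, 3*step, ... */
function additiveWaitsMs(retries: number, stepMs: number): number[] {
  return Array.from({ length: retries }, (_, i) => stepMs * (i + 1));
}

/** Capped exponential backoff: initial * factor^i, never above maxMs. */
function cappedExponentialWaitsMs(retries: number, initialMs: number, maxMs: number, factor = 1.5): number[] {
  return Array.from({ length: retries }, (_, i) => Math.min(initialMs * Math.pow(factor, i), maxMs));
}

const sum = (xs: number[]) => xs.reduce((a, b) => a + b, 0);

const before = sum(additiveWaitsMs(8, 5000));
const after = sum(cappedExponentialWaitsMs(8, 1000, 5000));
console.log(`before=${before / 1000}s after=${after / 1000}s saving=${(100 * (1 - after / before)).toFixed(1)}%`);
// prints "before=180s after=28.125s saving=84.4%"
```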

https://nhsd-jira.digital.nhs.uk/browse/DTOSS-10703

Type of changes

  • Refactoring (non-breaking change)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would change existing functionality)
  • Bug fix (non-breaking change which fixes an issue)

Checklist

  • I am familiar with the contributing guidelines
  • I have followed the code style of the project
  • I have added tests to cover my changes
  • I have updated the documentation accordingly
  • This PR is a result of pair or mob programming

Sensitive Information Declaration

To ensure the utmost confidentiality and protect your privacy and that of others, we kindly ask you NOT to include PII (Personal Identifiable Information) / PID (Personal Identifiable Data) or any other sensitive data in this PR (Pull Request) or the codebase changes. We will remove any PR that does contain sensitive information. We really appreciate your cooperation in this matter.

  • I confirm that neither PII/PID nor sensitive data are included in this PR and the codebase changes.

github-actions bot commented Dec 3, 2025

Unit Test Results

✔️ Tests 887 / 887 - passed in 67.8s
📝 Coverage 46.78%
📏 4528 / 9921 lines covered 🌿 1095 / 2100 branches covered

✏️ updated for commit a52c9e8

@SamAinsworth-NHS SamAinsworth-NHS marked this pull request as ready for review December 4, 2025 09:36
sonarqubecloud bot commented Dec 4, 2025
