@prjh2-nhs prjh2-nhs commented Dec 22, 2025

Description

1. Security-First Validation Pipeline

  • Pre-decode scanning: Detects encoded attack patterns.
  • Controlled decoding: Only decodes safe HTML entities.
  • Encoding safeguards: Rejects nested/double-encoded entities.
  • Post-decode validation: Applies strict regex for character patterns.
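The four stages above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the function name `sanitize_name`, the entity whitelist, and every regex below are assumptions chosen to mirror the bullet points.

```python
import re

# Hypothetical whitelist: the only entities this sketch will decode.
SAFE_ENTITIES = {"&amp;": "&", "&apos;": "'", "&#39;": "'", "&#x27;": "'"}

# Hypothetical pre-decode patterns for encoded angle brackets and quotes.
SUSPICIOUS_ENCODED = re.compile(
    r"&(lt|gt|quot|#0*60|#0*62|#x0*3c|#x0*3e);|%3c|%3e", re.IGNORECASE
)

# "&amp;" immediately followed by another entity body, e.g. "&amp;lt;".
DOUBLE_ENCODED = re.compile(r"&amp;#?\w+;", re.IGNORECASE)

# Characters listed as allowed in this description.
ALLOWED = re.compile(r"[A-Za-z0-9 \-'.,()&]+")


def sanitize_name(raw: str) -> str:
    """Decode whitelisted entities and validate the result (illustrative only)."""
    # 1. Pre-decode scanning: refuse encoded attack patterns outright.
    if SUSPICIOUS_ENCODED.search(raw):
        raise ValueError("encoded attack pattern")
    # 2. Encoding safeguard: refuse nested/double-encoded entities.
    if DOUBLE_ENCODED.search(raw):
        raise ValueError("nested encoding")
    # 3. Controlled decoding: replace only the whitelisted entities.
    decoded = raw
    for entity, char in SAFE_ENTITIES.items():
        decoded = decoded.replace(entity, char)
    # 4. Post-decode validation: the whole string must match the allowed set.
    if not ALLOWED.fullmatch(decoded):
        raise ValueError("disallowed characters")
    return decoded
```

Decoding only a fixed whitelist, rather than calling a general-purpose unescape function, means an unexpected entity survives as literal text and is then rejected by the final character check.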

2. GPPracticeValidator (service.py)

  • Extends ServiceValidator for GP-specific name rules.
  • Enforces:
    • Max length: 200 characters.
    • Ampersand spacing: must have spaces on both sides.
    • Whitespace normalization.
  • Provides detailed error diagnostics via character categorization.
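As a rough sketch of the GP-specific rules listed above, written as a standalone function because `ServiceValidator` and the real `GPPracticeValidator` are not shown in this description. The name `validate_gp_name`, the error messages, and the order of the checks are assumptions.

```python
import re

MAX_NAME_LENGTH = 200  # limit stated in the description

# An ampersand that is not preceded by a space, or not followed by one.
_BAD_AMPERSAND = re.compile(r"(?<! )&|&(?! )")


def normalize_whitespace(name: str) -> str:
    """Trim the ends and collapse internal whitespace runs to single spaces."""
    return " ".join(name.split())


def validate_gp_name(raw: str) -> str:
    """Apply the GP-specific rules and return the normalized name (illustrative)."""
    name = normalize_whitespace(raw)
    if not name:
        raise ValueError("name is empty")
    if len(name) > MAX_NAME_LENGTH:
        raise ValueError("name exceeds 200 characters")
    if _BAD_AMPERSAND.search(name):
        raise ValueError("ampersand must have a space on both sides")
    return name
```

Normalizing whitespace first means the ampersand rule only has to reason about single spaces, and a leading or trailing ampersand is rejected because one side has no space at all.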

3. Allowed Characters

  • Alphanumeric (a–z, A–Z, 0–9).
  • Spaces, hyphens, apostrophes.
  • Periods, commas, parentheses.
  • Ampersands (with spacing rules).
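The allowed set above maps naturally onto a single character class, and the same pattern can drive the character-categorization diagnostics mentioned for the validator. The sketch below is an assumption about how that could look; `ALLOWED_NAME` and `find_disallowed` are hypothetical names, and the PR's real diagnostics may report different details.

```python
import re
import unicodedata

# Letters, digits, space, hyphen, apostrophe, period, comma, parentheses, ampersand.
ALLOWED_NAME = re.compile(r"[A-Za-z0-9 \-'.,()&]+")


def is_allowed(name: str) -> bool:
    """True when every character of a non-empty name is in the allowed set."""
    return ALLOWED_NAME.fullmatch(name) is not None


def find_disallowed(name: str) -> dict[str, str]:
    """Map each distinct disallowed character to its Unicode general category."""
    return {
        ch: unicodedata.category(ch)
        for ch in name
        if not ALLOWED_NAME.fullmatch(ch)
    }
```

Reporting the Unicode category (for example `Sm` for a math symbol such as `<`) gives a reviewer a quick hint about why a name was rejected without echoing the whole raw value.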

4. Tests

  • Test cases covering:
    • Security (XSS, SQL injection, nested encoding).
    • Functional and edge cases.
    • Whitespace handling, length limits, empty values.
  • Parameterized tests for decoding and validation.

5. Hyphen-Splitting Rule

  • Split on " - " (space-hyphen-space) and keep only the first part.
  • Removes trailing suffixes (e.g., "Practice - Branch 1" → "Practice").
  • Examples:
    • "Abbey-Dale Medical Centre" is left unchanged, because its hyphen has no surrounding spaces.
    • "Lockside Medical Centre - T+G" becomes "Lockside Medical Centre".
  • Applied before validation for security and efficiency.
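A minimal sketch of the splitting rule, assuming it amounts to a single split on the literal separator " - "; the helper name `strip_branch_suffix` is hypothetical and not taken from the PR.

```python
def strip_branch_suffix(name: str) -> str:
    """Keep only the text before the first ' - ' (space-hyphen-space)."""
    return name.split(" - ", 1)[0].strip()


# Hyphens without surrounding spaces are part of the name and are preserved.
print(strip_branch_suffix("Abbey-Dale Medical Centre"))
print(strip_branch_suffix("Lockside Medical Centre - T+G"))
```

Running the split before validation also means characters that only appear in the discarded suffix, such as the `+` in "T+G", never reach the character checks.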

Context

Problems Solved:

  • XSS via encoded tags.
  • Encoding exploits (double/nested).
  • Injection attempts.

Data Quality Issues:

  • Inconsistent entity encoding.
  • Apostrophes encoded incorrectly.
  • Uncontrolled special characters.

Business Requirements:

  • Support apostrophes (e.g., O’Brien) and ampersands.
  • Reject suspicious characters.

Compliance:

  • Structured error codes.
  • Safe logging (no log injection).
  • Audit trail for sanitization.
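One way the structured error codes and safe logging could fit together is sketched below. Everything here is an assumption for illustration: the `NameErrorCode` values, the `safe_for_log` helper, and the 80-character cap do not come from the PR.

```python
import re
from enum import Enum


class NameErrorCode(str, Enum):
    """Hypothetical machine-readable codes for rejected names."""
    ENCODED_ATTACK = "GP_NAME_ENCODED_ATTACK"
    NESTED_ENCODING = "GP_NAME_NESTED_ENCODING"
    DISALLOWED_CHARS = "GP_NAME_DISALLOWED_CHARS"


_CONTROL_CHARS = re.compile(r"[\x00-\x1f\x7f]")


def safe_for_log(value: str, max_len: int = 80) -> str:
    """Escape control characters so a value cannot start a forged log line."""
    escaped = _CONTROL_CHARS.sub(lambda m: f"\\x{ord(m.group()):02x}", value)
    return escaped[:max_len]


def audit_record(code: NameErrorCode, raw_value: str) -> dict[str, str]:
    """Build a structured entry suitable for an audit trail."""
    return {"error_code": code.value, "value": safe_for_log(raw_value)}
```

Escaping carriage returns and line feeds, rather than dropping them, keeps the audit record faithful to what was received while still preventing the value from breaking across log lines.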

Impact:

  • Before →
    • FTRS-1964: Names migrated without validation, risking attacks and poor formatting.
    • FTRS-1961: Split on any hyphen (-), truncating names such as "Abbey-Dale Medical Centre" → "Abbey".
  • After →
    • FTRS-1964: Secure, compliant, and business-rule-aligned names with full audit trail.
    • FTRS-1961: Split only on " - " (space-hyphen-space), preserving hyphenated names.

Sensitive Information Declaration

To ensure the utmost confidentiality and protect your privacy and that of others, we kindly ask you NOT to include PII (Personal Identifiable Information) / PID (Personal Identifiable Data) or any other sensitive data in this PR (Pull Request) or the codebase changes. We will remove any PR that contains sensitive information. We really appreciate your cooperation in this matter.

  • I confirm that neither PII/PID nor sensitive data are included in this PR and the codebase changes.

@prjh2-nhs prjh2-nhs changed the title from "fix(data-migration): FTRS-1964 html encoded characters content copied" to "fix(data-migration): FTRS-1964 FTRS-1961 handle HTML-encoded chars and truncated GP names" Dec 23, 2025
@prjh2-nhs prjh2-nhs marked this pull request as ready for review December 23, 2025 11:56
@nhs-shruthi-gowda nhs-shruthi-gowda left a comment

Nice work, small comment.