
Fast Regex Fallback #60

@sidmohan0

Description

  • Story 1.1 – Design fallback spec

    • Enumerate MVP entities (email, phone, SSN, credit-card, IPv4/6, DOB, ZIP).
    • Define return schema (same Pydantic model spaCy uses).
  • Story 1.2 – Implement RegexAnnotator ← blocked-by 1.1

    • Write hardened regex patterns.
    • Create RegexAnnotator.annotate(text) wrapper.
    • Add unit tests for each entity and edge cases.
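
A minimal sketch of what the Story 1.2 wrapper might look like, covering only a few of the listed entities. The patterns here are illustrative rather than hardened, the label names are assumptions, and the plain `dict` return value is a stand-in for the shared Pydantic model, which this issue does not spell out.

```python
import re
from typing import Dict, List

# Illustrative patterns only; hardened versions would need far more edge-case coverage.
# Label names (EMAIL, SSN, ...) are assumptions, not taken from the issue.
PATTERNS: Dict[str, "re.Pattern[str]"] = {
    "EMAIL": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "IP_ADDRESS": re.compile(
        r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
    ),
    "ZIP": re.compile(r"\b\d{5}(?:-\d{4})?\b"),
}


class RegexAnnotator:
    """Hypothetical wrapper: runs every compiled pattern over the text."""

    def annotate(self, text: str) -> Dict[str, List[str]]:
        # One entry per label; labels with no match map to an empty list.
        return {
            label: [m.group(0) for m in pattern.finditer(text)]
            for label, pattern in PATTERNS.items()
        }
```

Compiling the patterns once at import time keeps the per-call cost down to the scans themselves, which matters for the latency target in Story 1.4.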
  • Story 1.3 – Integrate into TextService ← blocked-by 1.2

    • Add engine="regex" | "spacy" | "auto" flag.
    • Default "auto": regex first, fall back to spaCy when needed.
    • Preserve behaviour when users ask explicitly for spaCy.
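
The engine flag in Story 1.3 could be dispatched roughly as follows. This is a sketch under assumptions: the two annotators are injected as plain callables (hypothetical names `regex_annotate` and `spacy_annotate`), and "when needed" is interpreted as "the regex pass found nothing", which the issue does not actually define.

```python
from typing import Callable, Dict, List

Annotations = Dict[str, List[str]]
AnnotateFn = Callable[[str], Annotations]

VALID_ENGINES = ("regex", "spacy", "auto")


class TextService:
    """Hypothetical service that picks an annotation engine per the `engine` flag."""

    def __init__(
        self,
        regex_annotate: AnnotateFn,
        spacy_annotate: AnnotateFn,
        engine: str = "auto",
    ) -> None:
        if engine not in VALID_ENGINES:
            raise ValueError(f"engine must be one of {VALID_ENGINES}, got {engine!r}")
        self.engine = engine
        self._regex = regex_annotate
        self._spacy = spacy_annotate

    def annotate(self, text: str) -> Annotations:
        # Explicit choices bypass the fallback logic entirely.
        if self.engine == "regex":
            return self._regex(text)
        if self.engine == "spacy":
            return self._spacy(text)
        # "auto": try the cheap regex pass first.
        result = self._regex(text)
        if any(result.values()):
            return result
        # Assumed fallback rule: regex found nothing, so defer to spaCy.
        return self._spacy(text)
```

Passing the annotators in as callables keeps spaCy out of the import path unless a caller actually wires it in, so regex-only users would not pay its load cost.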
  • Story 1.4 – Performance guardrail ← blocked-by 1.3

    • Add pytest-benchmark test for 1 kB sample (< 20 µs target).
    • Fail CI if runtime > 110 % of saved baseline.

Projects

Status

Done
