This repository provides synthetic data collections for testing and validating Data Loss Prevention (DLP) and Data Security Posture Management (DSPM) solutions, as well as other use cases (see the wiki).
It includes sample datasets and scripts for generating new data across several categories, such as Personally Identifiable Information (PII), Human Resources (HR) data, Payment Card Industry (PCI) data, and Protected Health Information (PHI).
The provided Python scripts let you customize the generated data: you can choose the output file type (e.g., 'json', 'csv', 'excel', 'word', 'pdf', or 'txt'), the locale (e.g., en_US, fr_FR), the record count, and other parameters, so that test scenarios can be tailored to your specific use cases.
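
To illustrate how locale, record count, and file type can combine, here is a minimal sketch that uses only the Python standard library. It is not the repository's actual script: the names `NAME_POOLS`, `FIELDNAMES`, `generate_records`, and `serialize` are hypothetical, the name pools are tiny placeholders, and only 'json' and 'csv' output are covered.

```python
import csv
import io
import json
import random

# Hypothetical, deliberately tiny name pools keyed by locale.
# The real scripts may use a different data source entirely.
NAME_POOLS = {
    "en_US": (["Alice", "James", "Maria"], ["Smith", "Johnson", "Lee"]),
    "fr_FR": (["Camille", "Louis", "Manon"], ["Martin", "Bernard", "Dubois"]),
}

FIELDNAMES = ["id", "name", "email", "locale"]


def generate_records(locale="en_US", count=5, seed=None):
    """Return `count` synthetic PII-style records for the given locale."""
    if locale not in NAME_POOLS:
        raise ValueError(f"unsupported locale: {locale}")
    rng = random.Random(seed)  # a seed makes the output reproducible
    first_names, last_names = NAME_POOLS[locale]
    records = []
    for i in range(1, count + 1):
        first = rng.choice(first_names)
        last = rng.choice(last_names)
        records.append({
            "id": i,
            "name": f"{first} {last}",
            # example.com is reserved for documentation, so no real address is hit
            "email": f"{first}.{last}{i}@example.com".lower(),
            "locale": locale,
        })
    return records


def serialize(records, file_type="json"):
    """Render records as a string in the requested file type."""
    if file_type == "json":
        return json.dumps(records, ensure_ascii=False, indent=2)
    if file_type == "csv":
        buffer = io.StringIO()
        writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(records)
        return buffer.getvalue()
    raise ValueError(f"unsupported file type: {file_type}")


if __name__ == "__main__":
    # Three French-locale records, written out as CSV text.
    print(serialize(generate_records(locale="fr_FR", count=3, seed=1), "csv"))
```

Passing a fixed `seed` keeps the generated records identical between runs, which can help when comparing DLP or DSPM scan results over time.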
