feat: Add PII probing transforms and scoring #315
Merged
Key Changes:
Added:
- `dreadnode/transforms/pii_extraction.py`: 5 transforms
  - `repeat_word_divergence`: Trigger memorization (Carlini technique)
  - `continue_exact_text`: Force prefix completion
  - `complete_from_internet`: Probe memorized web content
  - `partial_pii_completion`: Adaptive extraction with hints
  - `public_figure_pii_probe`: Test public figure disclosure
- `dreadnode/scorers/pii_advanced.py`: 3 scorers + 2 helpers
  - `training_data_memorization`: Entropy/pattern detection
  - `credential_leakage`: 13 credential types (API keys, tokens)
  - `pii_disclosure_rate`: Binary scorer for eval aggregation
  - `wilson_score_interval`: Statistical confidence intervals
  - `calculate_disclosure_rate_with_ci`: Helper for 95% CI analysis
- `examples/airt/pii_extraction_attacks.ipynb`: Usage examples
- `tests/test_pii_extraction_transforms.py`: 21 transform tests
- `tests/test_pii_advanced_scorers.py`: 38 scorer tests

Changed:
- `dreadnode/transforms/__init__.py`: Export `pii_extraction` module
- `dreadnode/scorers/__init__.py`: Export new scorers and helpers

Generated Summary:
- `pii_advanced.py` module.
  - `training_data_memorization`: Detects verbatim memorized text from training data.
  - `credential_leakage`: Identifies potential leaked credentials, API keys, and tokens.
  - `pii_disclosure_rate`: Binary detection of PII for evaluation purposes.
  - `wilson_score_interval`: Calculates statistical confidence intervals for PII disclosure rates.
  - `calculate_disclosure_rate_with_ci`: Aggregates PII detection results to compute disclosure rates.
- `__init__.py` files to include new scorer functions and maintain module imports.
- `pii_extraction.py` with functions targeting specific PII extraction techniques: `repeat_word_divergence`, `continue_exact_text`, `complete_from_internet`, `partial_pii_completion`, and `public_figure_pii_probe`.

This summary was generated with ❤️ by rigging
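For reviewers unfamiliar with the statistic behind `wilson_score_interval` and `calculate_disclosure_rate_with_ci`, here is a minimal sketch of the standard Wilson score interval applied to binary disclosure results. The function names, signatures, and return shape below are hypothetical and are not taken from `dreadnode/scorers/pii_advanced.py`; the helpers in this PR may differ.

```python
import math
from typing import Iterable

# z-value for a two-sided 95% confidence level (standard normal quantile).
Z_95 = 1.959963984540054


def wilson_interval_sketch(
    successes: int, trials: int, z: float = Z_95
) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (illustrative only)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")

    p_hat = successes / trials
    z2 = z * z
    denom = 1.0 + z2 / trials
    center = (p_hat + z2 / (2.0 * trials)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1.0 - p_hat) / trials + z2 / (4.0 * trials * trials))
    ) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, center - half_width), min(1.0, center + half_width)


def disclosure_rate_with_ci_sketch(flags: Iterable[bool]) -> dict[str, float]:
    """Aggregate per-sample binary PII flags into a rate with a 95% interval.

    `flags` is assumed to hold one boolean per evaluated response,
    True when PII was disclosed. This input shape is an assumption.
    """
    results = list(flags)
    disclosed = sum(1 for flag in results if flag)
    lower, upper = wilson_interval_sketch(disclosed, len(results))
    return {
        "rate": disclosed / len(results),
        "ci_lower": lower,
        "ci_upper": upper,
    }


if __name__ == "__main__":
    # Ten hypothetical responses, five of which disclosed PII.
    print(disclosure_rate_with_ci_sketch([True, False] * 5))
```

Unlike the normal-approximation interval, the Wilson interval stays inside [0, 1] and remains informative when the observed rate is 0 or 1, which is common with small eval sets.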
Research References: