Commit 1152889

sidmohan0 and claude committed
feat(release): implement weekly release plan infrastructure
- Add automated release pipeline with bump2version configuration
- Create GitHub Actions workflow for weekly Friday releases
- Implement changelog automation script with commit categorization
- Update setup.py with lightweight core + optional extras structure
- Create new core.py with simple detect_pii() and anonymize_text() API
- Update CI to test both lightweight core and full feature installs
- Add release announcement and social media templates
- Create weekly metrics tracking script for performance monitoring

This implements the complete technical foundation for the 8-week weekly release strategy outlined in the release plan.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
1 parent a9eaffd commit 1152889

11 files changed: +913 -8 lines changed

.bumpversion.cfg

Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
+[bumpversion]
+current_version = 4.1.1
+commit = True
+tag = True
+tag_name = v{new_version}
+message = Bump version: {current_version} → {new_version}
+
+[bumpversion:file:datafog/__about__.py]
+search = __version__ = "{current_version}"
+replace = __version__ = "{new_version}"
+
+[bumpversion:file:setup.py]
+search = version="{current_version}"
+replace = version="{new_version}"
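The `search`/`replace` pairs above are templates that get filled in with the current and new version before a plain text substitution is applied to each listed file. The following is a minimal sketch of that idea, not bump2version's actual implementation; `bump` and `apply_template` are hypothetical helper names, and the sketch assumes a plain `MAJOR.MINOR.PATCH` version string.

```python
def bump(version: str, part: str) -> str:
    """Increment one part of a MAJOR.MINOR.PATCH string, resetting the lower parts."""
    major, minor, patch = (int(p) for p in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"Unknown part: {part!r}")


def apply_template(content: str, search: str, replace: str,
                   current: str, new: str) -> str:
    """Fill in both templates, then substitute the old text with the new text."""
    old_text = search.format(current_version=current, new_version=new)
    new_text = replace.format(current_version=current, new_version=new)
    if old_text not in content:
        # Fail loudly rather than silently leaving a file at the old version.
        raise ValueError(f"Pattern not found: {old_text!r}")
    return content.replace(old_text, new_text)


# Hypothetical file content, for illustration only.
about_py = '__version__ = "4.1.1"\n'
new_version = bump("4.1.1", "patch")
updated = apply_template(
    about_py,
    search='__version__ = "{current_version}"',
    replace='__version__ = "{new_version}"',
    current="4.1.1",
    new=new_version,
)
print(updated)  # __version__ = "4.1.2"
```

Anchoring the search text on the surrounding assignment (`__version__ = "..."`, `version="..."`) rather than on the bare version number reduces the chance of rewriting an unrelated `4.1.1` elsewhere in the file.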

.github/workflows/ci.yml

Lines changed: 29 additions & 4 deletions
@@ -20,7 +20,32 @@ jobs:
       - name: Run pre-commit
         run: pre-commit run --all-files --show-diff-on-failure
 
-  build:
+  test-core:
+    runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        python-version: ["3.10", "3.11", "3.12"]
+    steps:
+      - uses: actions/checkout@v4
+      - name: Set up Python ${{ matrix.python-version }}
+        uses: actions/setup-python@v5
+        with:
+          python-version: ${{ matrix.python-version }}
+          cache: "pip"
+
+      - name: Install core dependencies only
+        run: |
+          python -m pip install --upgrade pip
+          pip install -e .
+          pip install pytest pytest-cov
+
+      - name: Test core functionality
+        run: |
+          python -c "from datafog import detect_pii, anonymize_text; print('Core API works')"
+          python -c "from datafog import detect, process; print('Legacy API works')"
+          python -m pytest tests/test_regex_annotator.py -v
+
+  test-full:
     runs-on: ubuntu-latest
     strategy:
       matrix:
@@ -38,13 +63,13 @@ jobs:
           sudo apt-get update
           sudo apt-get install -y tesseract-ocr libtesseract-dev
 
-      - name: Install dependencies
+      - name: Install all dependencies
         run: |
           python -m pip install --upgrade pip
-          pip install -e ".[nlp,ocr]"
+          pip install -e ".[all]"
           pip install -r requirements-dev.txt
 
-      - name: Run tests
+      - name: Run full test suite
         run: |
           python -m pytest tests/ --cov=datafog --cov-report=xml --cov-report=term
 
Lines changed: 112 additions & 0 deletions
@@ -0,0 +1,112 @@
+name: Weekly Release
+
+on:
+  schedule:
+    # Every Friday at 2 PM UTC
+    - cron: "0 14 * * 5"
+  workflow_dispatch:
+    inputs:
+      release_type:
+        description: "Release type"
+        required: true
+        default: "patch"
+        type: choice
+        options:
+          - patch
+          - minor
+          - major
+
+jobs:
+  release:
+    runs-on: ubuntu-latest
+    if: github.ref == 'refs/heads/dev'
+
+    steps:
+      - uses: actions/checkout@v4
+        with:
+          fetch-depth: 0
+          token: ${{ secrets.GITHUB_TOKEN }}
+
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: "3.10"
+
+      - name: Install dependencies
+        run: |
+          python -m pip install --upgrade pip
+          pip install bump2version build twine
+          pip install -e .[all]
+
+      - name: Run full test suite
+        run: |
+          python -m pytest tests/ --cov=datafog
+          python -m pytest tests/benchmark_text_service.py
+
+      - name: Generate changelog
+        run: |
+          python scripts/generate_changelog.py
+
+      - name: Determine version bump
+        id: version
+        run: |
+          if [ "${{ github.event_name }}" == "workflow_dispatch" ]; then
+            echo "bump_type=${{ github.event.inputs.release_type }}" >> $GITHUB_OUTPUT
+          else
+            # Auto-determine based on commit messages
+            if git log --oneline $(git describe --tags --abbrev=0)..HEAD | grep -q "BREAKING"; then
+              echo "bump_type=major" >> $GITHUB_OUTPUT
+            elif git log --oneline $(git describe --tags --abbrev=0)..HEAD | grep -q "feat:"; then
+              echo "bump_type=minor" >> $GITHUB_OUTPUT
+            else
+              echo "bump_type=patch" >> $GITHUB_OUTPUT
+            fi
+          fi
+
+      - name: Bump version
+        run: |
+          git config --local user.email "action@github.com"
+          git config --local user.name "GitHub Action"
+          bump2version ${{ steps.version.outputs.bump_type }}
+          echo "NEW_VERSION=$(python -c 'from datafog import __version__; print(__version__)')" >> $GITHUB_ENV
+
+      - name: Build package
+        run: |
+          python -m build
+
+      - name: Check wheel size
+        run: |
+          WHEEL_SIZE=$(du -m dist/*.whl | cut -f1)
+          if [ "$WHEEL_SIZE" -ge 5 ]; then
+            echo "❌ Wheel size too large: ${WHEEL_SIZE}MB"
+            exit 1
+          fi
+          echo "✅ Wheel size OK: ${WHEEL_SIZE}MB"
+
+      - name: Publish to PyPI
+        env:
+          TWINE_USERNAME: __token__
+          TWINE_PASSWORD: ${{ secrets.PYPI_API_TOKEN }}
+        run: twine upload dist/*
+
+      - name: Create GitHub Release
+        env:
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+        run: |
+          gh release create v${{ env.NEW_VERSION }} \
+            --title "DataFog v${{ env.NEW_VERSION }}" \
+            --notes-file CHANGELOG_LATEST.md \
+            dist/*
+
+      - name: Push changes
+        run: |
+          git push origin dev --tags
+
+      - name: Notify Discord
+        if: env.DISCORD_WEBHOOK
+        env:
+          DISCORD_WEBHOOK: ${{ secrets.DISCORD_WEBHOOK }}
+        run: |
+          curl -X POST "$DISCORD_WEBHOOK" \
+            -H "Content-Type: application/json" \
+            -d "{\"content\": \"🚀 DataFog v${{ env.NEW_VERSION }} is live! Install with: \`pip install datafog==${{ env.NEW_VERSION }}\`\"}"
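The "Determine version bump" step in the workflow above picks a bump type by substring-matching the commit log since the last tag. The sketch below mirrors that shell logic in Python so it can be exercised without a repository. `determine_bump_type` follows the workflow's two `grep -q` checks. `determine_bump_type_scoped` is a hypothetical variant, not part of this commit, that also accepts scoped Conventional Commits prefixes such as `feat(release):`.

```python
import re
from typing import List


def determine_bump_type(log_lines: List[str]) -> str:
    """Mirror the workflow: 'BREAKING' -> major, 'feat:' -> minor, else patch."""
    if any("BREAKING" in line for line in log_lines):
        return "major"
    if any("feat:" in line for line in log_lines):
        return "minor"
    return "patch"


# Hypothetical alternative: also match scoped prefixes like "feat(scope):" or "feat!:".
_FEAT_PREFIX = re.compile(r"\bfeat(\([^)]*\))?!?:")


def determine_bump_type_scoped(log_lines: List[str]) -> str:
    """Same decision order, but the feature check tolerates an optional scope."""
    if any("BREAKING" in line for line in log_lines):
        return "major"
    if any(_FEAT_PREFIX.search(line) for line in log_lines):
        return "minor"
    return "patch"


# Lines shaped like `git log --oneline` output: abbreviated hash, then subject.
log = ["1152889 feat(release): implement weekly release plan infrastructure"]
print(determine_bump_type(log))         # patch
print(determine_bump_type_scoped(log))  # minor
```

Because the literal substring `feat:` does not occur in `feat(release):`, a commit titled like this one would produce a patch bump under the workflow's check as written. Whether that is intended is not stated in the commit.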

Claude.md

Lines changed: 1 addition & 1 deletion
@@ -27,7 +27,7 @@
 - **Graceful Degradation**: Smart imports with helpful error messages for missing extras
 - **Fair Benchmark Analysis**: Independent performance validation scripts
 
-### ✅ Critical Bug Fixes Resolved (December 2024)
+### ✅ Critical Bug Fixes Resolved (May 2025)
 - **CI/CD Stability**: Fixed GitHub Actions failures while preserving lean architecture
 - **Structured Output Bug**: Resolved multi-chunk text processing in TextService
 - **Test Suite Health**: Improved from 33% to 87% test success rate (156/180 passing)

datafog/__init__.py

Lines changed: 8 additions & 1 deletion
@@ -77,8 +77,11 @@ def _missing_dependency(*args, **kwargs):
         "SparkService", "datafog.services.spark_service", "distributed"
     )
 
+# Import core API functions
+from .core import anonymize_text, detect_pii, get_supported_entities, scan_text
 
-# Simple API for core functionality
+
+# Simple API for core functionality (backward compatibility)
 def detect(text: str) -> list:
     """
     Detect PII in text using regex patterns.
@@ -169,6 +172,10 @@ def process(text: str, anonymize: bool = False, method: str = "redact") -> dict:
     "__version__",
     "detect",
     "process",
+    "detect_pii",
+    "anonymize_text",
+    "scan_text",
+    "get_supported_entities",
     "AnnotationResult",
     "AnnotatorRequest",
     "AnonymizationResult",

datafog/core.py

Lines changed: 177 additions & 0 deletions
@@ -0,0 +1,177 @@
+"""
+DataFog Core API - Lightweight PII detection functions.
+
+This module provides simple, lightweight functions for PII detection and anonymization
+without requiring heavy dependencies like spaCy or PyTorch.
+"""
+
+from typing import Dict, List, Union
+
+from datafog.models.anonymizer import AnonymizerType
+
+# Engine types as constants
+REGEX_ENGINE = "regex"
+SPACY_ENGINE = "spacy"
+AUTO_ENGINE = "auto"
+
+
+def detect_pii(text: str) -> Dict[str, List[str]]:
+    """
+    Simple PII detection using lightweight regex engine.
+
+    Args:
+        text: Text to scan for PII
+
+    Returns:
+        Dictionary mapping entity types to lists of detected values
+
+    Example:
+        >>> result = detect_pii("Contact john@example.com at (555) 123-4567")
+        >>> print(result)
+        {'EMAIL': ['john@example.com'], 'PHONE': ['(555) 123-4567']}
+    """
+    try:
+        from datafog.services.text_service import TextService
+
+        # Use lightweight regex engine only
+        service = TextService(engine=REGEX_ENGINE)
+        result = service.annotate_text_sync(text, structured=True)
+
+        # Convert to simple dictionary format, filtering out empty matches
+        pii_dict = {}
+        for annotation in result:
+            if annotation.text.strip():  # Only include non-empty matches
+                entity_type = annotation.label
+                if entity_type not in pii_dict:
+                    pii_dict[entity_type] = []
+                pii_dict[entity_type].append(annotation.text)
+
+        return pii_dict
+
+    except ImportError as e:
+        raise ImportError(
+            "Core dependencies missing. Install with: pip install datafog[all]"
+        ) from e
+
+
+def anonymize_text(text: str, method: Union[str, AnonymizerType] = "redact") -> str:
+    """
+    Simple text anonymization using lightweight regex engine.
+
+    Args:
+        text: Text to anonymize
+        method: Anonymization method ('redact', 'replace', or 'hash')
+
+    Returns:
+        Anonymized text string
+
+    Example:
+        >>> result = anonymize_text("Contact john@example.com", method="redact")
+        >>> print(result)
+        "Contact [EMAIL_REDACTED]"
+    """
+    try:
+        from datafog.models.anonymizer import Anonymizer, AnonymizerType
+        from datafog.services.text_service import TextService
+
+        # Convert string method to enum if needed
+        if isinstance(method, str):
+            method_map = {
+                "redact": AnonymizerType.REDACT,
+                "replace": AnonymizerType.REPLACE,
+                "hash": AnonymizerType.HASH,
+            }
+            if method not in method_map:
+                raise ValueError(
+                    f"Invalid method: {method}. Use 'redact', 'replace', or 'hash'"
+                )
+            method = method_map[method]
+
+        # Use lightweight regex engine only
+        service = TextService(engine=REGEX_ENGINE)
+        span_results = service.annotate_text_sync(text, structured=True)
+
+        # Convert Span objects to AnnotationResult format for anonymizer, filtering empty matches
+        from datafog.models.annotator import AnnotationResult
+
+        annotations = []
+        for span in span_results:
+            if span.text.strip():  # Only include non-empty matches
+                annotation = AnnotationResult(
+                    entity_type=span.label,
+                    start=span.start,
+                    end=span.end,
+                    score=1.0,  # Regex matches are certain
+                    recognition_metadata=None,
+                )
+                annotations.append(annotation)
+
+        # Create anonymizer and apply
+        anonymizer = Anonymizer(anonymizer_type=method)
+        result = anonymizer.anonymize(text, annotations)
+        return result.anonymized_text
+
+    except ImportError as e:
+        raise ImportError(
+            "Core dependencies missing. Install with: pip install datafog[all]"
+        ) from e
+
+
+def scan_text(
+    text: str, return_entities: bool = False
+) -> Union[bool, Dict[str, List[str]]]:
+    """
+    Quick scan to check if text contains any PII.
+
+    Args:
+        text: Text to scan
+        return_entities: If True, return detected entities; if False, return boolean
+
+    Returns:
+        Boolean indicating PII presence, or dictionary of detected entities
+
+    Example:
+        >>> has_pii = scan_text("Contact john@example.com")
+        >>> print(has_pii)
+        True
+
+        >>> entities = scan_text("Contact john@example.com", return_entities=True)
+        >>> print(entities)
+        {'EMAIL': ['john@example.com']}
+    """
+    entities = detect_pii(text)
+
+    if return_entities:
+        return entities
+    else:
+        return len(entities) > 0
+
+
+def get_supported_entities() -> List[str]:
+    """
+    Get list of PII entity types supported by the regex engine.
+
+    Returns:
+        List of supported entity type names
+
+    Example:
+        >>> entities = get_supported_entities()
+        >>> print(entities)
+        ['EMAIL', 'PHONE', 'SSN', 'CREDIT_CARD', 'IP_ADDRESS', 'DOB', 'ZIP']
+    """
+    try:
+        from datafog.processing.text_processing.regex_annotator.regex_annotator import (
+            RegexAnnotator,
+        )
+
+        annotator = RegexAnnotator()
+        return [entity.value for entity in annotator.supported_entities]
+
+    except ImportError:
+        # Fallback to basic list if imports fail
+        return ["EMAIL", "PHONE", "SSN", "CREDIT_CARD", "IP_ADDRESS", "DOB", "ZIP"]
+
+
+# Backward compatibility aliases
+detect = detect_pii
+process = anonymize_text
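The grouping step inside `detect_pii` and the boolean check in `scan_text` can be illustrated without installing the package. The sketch below is an assumption-laden stand-in: `Span` is a hypothetical dataclass carrying only the `label` and `text` attributes the diff reads from each annotation, and `group_spans` and `contains_pii` are illustrative names, not functions from this commit.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Span:
    """Hypothetical stand-in for the span objects returned by the regex engine."""
    label: str
    text: str


def group_spans(spans: Iterable[Span]) -> Dict[str, List[str]]:
    """Group matched text by entity label, skipping empty or whitespace-only matches."""
    grouped: Dict[str, List[str]] = {}
    for span in spans:
        if span.text.strip():
            grouped.setdefault(span.label, []).append(span.text)
    return grouped


def contains_pii(spans: Iterable[Span]) -> bool:
    """True when at least one non-empty match survives the filter."""
    return len(group_spans(spans)) > 0


spans = [
    Span("EMAIL", "john@example.com"),
    Span("PHONE", "(555) 123-4567"),
    Span("ZIP", "   "),  # whitespace-only match, filtered out
]
print(group_spans(spans))
# {'EMAIL': ['john@example.com'], 'PHONE': ['(555) 123-4567']}
print(contains_pii([Span("ZIP", "")]))  # False
```

Filtering before grouping means a label whose only matches are empty never appears as a key, so the emptiness check on the dictionary is enough to answer whether any PII was found.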
