From 5d956c67523affe85b7efd1ee227b214e57b407a Mon Sep 17 00:00:00 2001
From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com>
Date: Thu, 18 Dec 2025 19:04:18 +0000
Subject: [PATCH 1/2] Initial plan


From e8d6b2a33e93343f1707468f4e219fbc19685a83 Mon Sep 17 00:00:00 2001
From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com>
Date: Thu, 18 Dec 2025 19:11:15 +0000
Subject: [PATCH 2/2] Add comprehensive v3.0 spec.md and progress.md for PyRIT
 0.10.0 integration

Co-authored-by: slister1001 <103153180+slister1001@users.noreply.github.com>
---
 .../azure/ai/evaluation/red_team/progress.md  | 252 ++++++++++++++
 .../azure/ai/evaluation/red_team/spec.md      | 314 ++++++++++++++++++
 2 files changed, 566 insertions(+)
 create mode 100644 sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/red_team/progress.md
 create mode 100644 sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/red_team/spec.md

diff --git a/sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/red_team/progress.md b/sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/red_team/progress.md
new file mode 100644
index 000000000000..4b96503e48f9
--- /dev/null
+++ b/sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/red_team/progress.md
@@ -0,0 +1,252 @@
+# PyRIT FoundryScenario Integration - Progress Tracking
+
+**Last Updated:** 2025-12-18  
+**Current Phase:** Planning Complete  
+**Next Milestone:** Phase 1 Implementation Start
+
+## Current Status
+
+### Overall Progress: 0% Complete (implementation; planning is done)
+
+| Phase | Status | Start Date | Completion Date | Progress |
+|-------|--------|------------|-----------------|----------|
+| Planning & Design | ✅ Complete | 2025-10-01 | 2025-12-18 | 100% |
+| Phase 1: Core Infrastructure | ⏳ Not Started | TBD | TBD | 0% |
+| Phase 2: Result Processing | ⏳ Not Started | TBD | TBD | 0% |
+| Phase 3: End-to-End Integration | ⏳ Not Started | TBD | TBD | 0% |
+| Phase 4: Testing & Documentation | ⏳ Not Started | TBD | TBD | 0% |
+
+## Blockers
+
+### High Priority
+
+1. **PyRIT 0.10.0 Upgrade Required:**
+   - **Issue:** The project currently depends on a PyRIT version that may predate the 0.10.0 APIs
+   - **Action:** Upgrade to PyRIT 0.10.0+ in requirements
+   - **Owner:** TBD
+   - **Status:** Not Started
+
+2. **SQLite Migration Path:**
+   - **Issue:** Any existing DuckDB usage must transition cleanly to SQLite
+   - **Action:** Audit codebase for DuckDB references
+   - **Owner:** TBD
+   - **Status:** Not Started
+
+3. **RAI Service Endpoint Availability:**
+   - **Issue:** Need to validate that the RAI simulation endpoint is accessible and configured
+   - **Action:** Verify endpoint credentials and permissions
+   - **Owner:** TBD
+   - **Status:** Not Started
+
+4. **Breaking Change in PyRIT 0.10.0:**
+   - **Issue:** Current code calls `initialize_pyrit(memory_db_type=DUCK_DB)`, but the `DUCK_DB` option no longer exists
+   - **Location:** `_red_team.py` line 222
+   - **Fix Required:** Change to `initialize_pyrit(memory_db_type=SQLITE, memory_db_path=db_path)`
+   - **Impact:** High - must be addressed before Phase 1 implementation
+
+### Medium Priority
+
+5. **Test Infrastructure Setup:**
+   - **Issue:** Need a mock PyRIT scenario for testing without external dependencies
+   - **Action:** Create test fixtures and mocks
+   - **Owner:** TBD
+   - **Status:** Not Started
+
+6. **Performance Baseline:**
+   - **Issue:** Need to establish current performance metrics before migration
+   - **Action:** Run performance benchmarks on existing implementation
+   - **Owner:** TBD
+   - **Status:** Not Started
+
+## Phase 1: Core Infrastructure (Not Started)
+
+### Tasks
+
+- [ ] Create strategy mapping module
+  - [ ] Define `ATTACK_STRATEGY_TO_FOUNDRY_STRATEGY` mapping
+  - [ ] Add unit tests for mapping correctness
+  - [ ] Document strategy equivalence
+
+- [ ] Update PyRIT initialization
+  - [ ] Fix the breaking `initialize_pyrit(memory_db_type=DUCK_DB)` call in `_red_team.py`
+  - [ ] Implement SQLite database path configuration
+  - [ ] Add error handling for initialization failures
+  - [ ] Add logging for initialization steps
+
+- [ ] Implement scenario manager
+  - [ ] Create `_scenario_manager.py` module
+  - [ ] Implement FoundryScenario creation logic
+  - [ ] Add memory label configuration
+  - [ ] Implement scenario execution orchestration
+
+- [ ] Add context preservation
+  - [ ] Design memory label schema
+  - [ ] Implement label attachment during scenario creation
+  - [ ] Add label-based retrieval methods
+  - [ ] Test label persistence and retrieval
+
+### Deliverables
+
+- [ ] `_utils/strategy_mapping.py` with complete mappings
+- [ ] Updated `_red_team.py` with SQLite initialization
+- [ ] `_scenario_manager.py` with basic scenario orchestration
+- [ ] Unit tests with >80% coverage for new modules
+
+## Phase 2: Result Processing (Not Started)
+
+### Tasks
+
+- [ ] Create result converter module
+  - [ ] Implement `_result_converter.py`
+  - [ ] Use `get_message_pieces()` API
+  - [ ] Extract MessagePiece data correctly
+  - [ ] Handle edge cases (empty results, errors)
+
+- [ ] Update result processor
+  - [ ] Migrate from PromptRequestPiece to MessagePiece
+  - [ ] Update data access patterns
+  - [ ] Preserve existing result format
+  - [ ] Add backward compatibility checks
+
+- [ ] Integration with evaluation pipeline
+  - [ ] Connect result converter to evaluation processor
+  - [ ] Validate result schema compatibility
+  - [ ] Add result export functionality
+  - [ ] Test end-to-end result flow
+
+### Deliverables
+
+- [ ] `_result_converter.py` with full conversion logic
+- [ ] Updated `_result_processor.py` using MessagePiece
+- [ ] Integration tests for result processing
+- [ ] Documentation for result schema
+
+## Phase 3: End-to-End Integration (Not Started)
+
+### Tasks
+
+- [ ] Connect all components
+  - [ ] Wire scenario manager into `_red_team.py`
+  - [ ] Connect result converter to main flow
+  - [ ] Add orchestration logic
+  - [ ] Implement cleanup procedures
+
+- [ ] Error handling and resilience
+  - [ ] Add retry logic for transient failures
+  - [ ] Implement proper error propagation
+  - [ ] Add logging and diagnostics
+  - [ ] Handle partial success scenarios
+
+- [ ] Progress tracking
+  - [ ] Implement progress callbacks
+  - [ ] Add status reporting
+  - [ ] Create progress persistence
+  - [ ] Add cancellation support
+
+### Deliverables
+
+- [ ] Fully integrated red team scan functionality
+- [ ] Comprehensive error handling
+- [ ] Progress tracking implementation
+- [ ] Integration tests covering all strategies
+
+## Phase 4: Testing & Documentation (Not Started)
+
+### Tasks
+
+- [ ] Unit testing
+  - [ ] Achieve >90% coverage for new modules
+  - [ ] Add edge case tests
+  - [ ] Add error scenario tests
+  - [ ] Add performance tests
+
+- [ ] Integration testing
+  - [ ] End-to-end scan tests
+  - [ ] Multi-strategy tests
+  - [ ] Context preservation tests
+  - [ ] Backward compatibility tests
+
+- [ ] Documentation
+  - [ ] Update API documentation
+  - [ ] Create migration guide
+  - [ ] Add code examples
+  - [ ] Create sample notebooks
+
+### Deliverables
+
+- [ ] Test suite with >90% coverage
+- [ ] Published API documentation
+- [ ] Migration guide for users
+- [ ] Sample code and notebooks
+
+## Design Decisions
+
+| Decision | Rationale | Date |
+|----------|-----------|------|
+| Target PyRIT 0.10.0+ | Latest stable version with SQLite-only backend | 2025-12-18 |
+| Use SQLite memory | Only option in PyRIT 0.10.0+ (DuckDB removed) | 2025-12-18 |
+| Use MessagePiece data model | PyRIT 0.10.0 renamed PromptRequestPiece | 2025-12-18 |
+| Preserve public API | Ensure backward compatibility for users | 2025-11-15 |
+| Use memory labels for context | Enable filtering and reconstruction of scan sessions | 2025-11-15 |
+| Abstract FoundryStrategy mapping | Decouple Azure AI Evaluation from PyRIT internals | 2025-10-15 |
+| Maintain abstraction layer | Protect against future PyRIT breaking changes | 2025-10-15 |
+
+## Risk Register
+
+| Risk | Status | Mitigation |
+|------|--------|------------|
+| PyRIT API changes in 0.10.0 | ⚠️ Active | Documented in spec; fixes ready to implement |
+| SQLite performance at scale | 🔍 Monitoring | Will benchmark during Phase 2 |
+| Memory label key collisions | ✅ Mitigated | Use namespaced keys |
+| Backward compatibility issues | 🔍 Monitoring | Extensive testing planned in Phase 4 |
+
+## Metrics
+
+### Code Quality Targets
+- Test Coverage: >90%
+- Pylint Score: >9.0
+- Type Coverage: >95%
+- Documentation Coverage: 100%
+
+### Performance Targets
+- Scenario Execution Latency: <5% increase vs current
+- Memory Query Performance: <100ms for typical scan
+- Result Processing Throughput: >100 conversations/sec
+- Resource Utilization: <10% increase in memory/CPU
+
+## Team Communication
+
+### Weekly Sync Topics
+1. Blocker review and resolution
+2. Phase progress updates
+3. Design decision review
+4. Risk assessment
+5. Next week planning
+
+### Stakeholder Updates
+- **Weekly:** Progress summary to team leads
+- **Bi-weekly:** Demo to product management
+- **Monthly:** Executive summary with metrics
+
+## Next Actions
+
+1. **Immediate (This Week):**
+   - Assign owners to Phase 1 tasks
+   - Upgrade PyRIT to 0.10.0
+   - Set up development environment
+
+2. **Short Term (Next 2 Weeks):**
+   - Begin Phase 1 implementation
+   - Create strategy mapping module
+   - Fix breaking change in `_red_team.py`
+
+3. **Medium Term (Next Month):**
+   - Complete Phase 1
+   - Begin Phase 2
+   - Conduct first integration tests
+
+---
+
+**Document Version History:**
+- v2.0 (2025-12-18): Updated for PyRIT 0.10.0 alignment
+- v1.0 (2025-10-01): Initial progress tracking document
diff --git a/sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/red_team/spec.md b/sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/red_team/spec.md
new file mode 100644
index 000000000000..ee0957b9f1d6
--- /dev/null
+++ b/sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/red_team/spec.md
@@ -0,0 +1,314 @@
+# PyRIT FoundryScenario Integration - Technical Specification v3.0
+
+**Last Updated:** 2025-12-18  
+**Status:** Planning Complete  
+**Owner:** Azure AI Evaluation Team  
+**Target PyRIT Version:** 0.10.0
+
+> **Breaking Changes in PyRIT 0.10.0:**
+> - DuckDB support removed (SQLite only)
+> - `PromptRequestPiece` renamed to `MessagePiece`
+> - `get_prompt_request_pieces()` renamed to `get_message_pieces()`
+
+## Executive Summary
+
+This specification describes how PyRIT's FoundryScenario framework will be integrated into Azure AI Evaluation's red teaming module. The integration relies on:
+
+- **Message-based data model** using `MessagePiece` for conversation tracking
+- **SQLite memory** (only option in PyRIT 0.10.0+) for persistent storage
+- **FoundryStrategy** mapping for attack strategy orchestration
+- **Memory labels** for context preservation across scan sessions
+
+## Core Components
+
+### Strategy Mapping Layer
+
+**File:** `_utils/strategy_mapping.py`
+
+Maps Azure AI Evaluation's `AttackStrategy` enum to PyRIT's `FoundryStrategy`:
+
+```python
+from pyrit.scenario.scenarios.foundry_scenario import FoundryStrategy
+
+ATTACK_STRATEGY_TO_FOUNDRY_STRATEGY: dict[AttackStrategy, FoundryStrategy] = {
+    AttackStrategy.Direct: FoundryStrategy.Jailbreak,
+    AttackStrategy.PAIR: FoundryStrategy.Pair,
+    AttackStrategy.ROT13: FoundryStrategy.ROT13,
+    AttackStrategy.Base64: FoundryStrategy.Base64,
+}
+```
+
+### Scenario Manager
+
+**File:** `_scenario_manager.py`
+
+Manages FoundryScenario lifecycle and configuration:
+
+- Initialize PyRIT with SQLite (fixes the `DUCK_DB` breaking change in `_red_team.py`)
+- Use RAI service simulation endpoint for adversarial chat
+- Create FoundryScenario instances per risk category
+
+**Key Responsibilities:**
+- Configure memory database paths
+- Set up prompt target endpoints
+- Manage scenario execution lifecycle
+- Handle objective generation and context injection
+
+### Result Converter
+
+**File:** `_result_converter.py`
+
+Converts PyRIT memory data to Azure AI Evaluation results:
+
+- Use `get_message_pieces()` instead of `get_prompt_request_pieces()`
+- Access `MessagePiece` properties (not `PromptRequestPiece`)
+- Extract conversation history and metadata
+- Generate evaluation-compatible result objects
+
+## PyRIT Memory: SQLite (v0.10.0+)
+
+**PyRIT 0.10.0 removed DuckDB support.** SQLite is now the only supported memory backend.
+
+### Implementation
+
+```python
+import os
+from pyrit.common import initialize_pyrit, SQLITE
+# In ScenarioManager.__init__(): keep the memory database under the scan output directory
+db_path = os.path.join(self.output_dir, "pyrit_memory.db")
+initialize_pyrit(memory_db_type=SQLITE, memory_db_path=db_path)
+```
+
+### Memory Retrieval
+
+```python
+from pyrit.memory import CentralMemory
+
+memory = CentralMemory.get_memory_instance()
+message_pieces = memory.get_message_pieces(
+    labels={"risk_category": risk_category.value}
+)
+```
+
+## Context Preservation with Memory Labels
+
+Memory labels attach metadata to each conversation turn, enabling filtering and reconstruction:
+
+```python
+# Attach labels when creating scenario
+scenario._memory_labels = {
+    "risk_category": risk_category.value,
+    "scan_session_id": scan_session_id,
+    "objective": objective,
+    "context": context_data,
+    "risk_subtype": risk_subtype,
+}
+
+# Retrieve during result processing
+memory = CentralMemory.get_memory_instance()
+message_pieces = memory.get_message_pieces(labels={"risk_category": "violence"})
+
+for piece in message_pieces:
+    context = piece.labels.get("context", {})
+    risk_subtype = piece.labels.get("risk_subtype", "")
+```
+
+## Migration Strategy
+
+### Phase 1: Core Infrastructure (Weeks 1-2)
+
+**⚠️ Breaking Change Alert:** Current `_red_team.py` (line 222) uses:
+```python
+initialize_pyrit(memory_db_type=DUCK_DB)  # ❌ Removed in PyRIT 0.10.0
+```
+Must update to:
+```python
+initialize_pyrit(memory_db_type=SQLITE, memory_db_path=db_path)  # ✅ Only option
+```
+
+**Tasks:**
+1. Create `_utils/strategy_mapping.py` with FoundryStrategy mappings
+2. Update PyRIT initialization in `_red_team.py` to use SQLite
+3. Implement `_scenario_manager.py` for FoundryScenario orchestration
+4. Add memory label configuration for context preservation
+
+**Deliverables:**
+- Working FoundryScenario execution for single risk category
+- SQLite memory database with proper labeling
+- Unit tests for strategy mapping
+
+### Phase 2: Result Processing (Week 3)
+
+**Tasks:**
+1. Implement `_result_converter.py` using `get_message_pieces()`
+2. Update `_result_processor.py` to use MessagePiece data model
+3. Integrate with existing evaluation pipeline
+4. Add result formatting and export logic
+
+**Deliverables:**
+- Complete result conversion pipeline
+- Integration tests with mock PyRIT scenarios
+- Documentation for result schema
+
+### Phase 3: End-to-End Integration (Week 4)
+
+**Tasks:**
+1. Connect all components in `_red_team.py`
+2. Add error handling and retry logic
+3. Implement progress tracking and logging
+4. Performance optimization and validation
+
+**Deliverables:**
+- Full red team scan execution
+- Integration tests covering all attack strategies
+- Performance benchmarks
+
+### Phase 4: Testing & Documentation (Week 5)
+
+**Tasks:**
+1. Comprehensive unit and integration tests
+2. Update API documentation
+3. Create migration guide for existing users
+4. Add code examples and usage samples
+
+**Deliverables:**
+- >90% test coverage
+- Published documentation
+- Sample notebooks
+
+## Architecture Diagrams
+
+### Component Interaction Flow
+
+```
+User API Call
+    ↓
+RedTeam.scan()
+    ↓
+ScenarioManager
+    ├── Initialize PyRIT (SQLite)
+    ├── Create FoundryScenario instances
+    └── Execute attack strategies
+         ↓
+PyRIT Memory (SQLite)
+    └── Store MessagePieces with labels
+         ↓
+ResultConverter
+    ├── Query message_pieces by labels
+    ├── Extract conversation history
+    └── Build RedTeamResult objects
+         ↓
+RedTeamResult
+    └── Return to user
+```
+
+### Data Flow
+
+```
+Attack Objective
+    ↓
+FoundryScenario.execute()
+    ↓
+Adversarial Prompt → Target System → Response
+    ↓
+MessagePiece (with labels)
+    ↓
+SQLite Database
+    ↓
+get_message_pieces(labels=...)
+    ↓
+RedTeamResult
+```
+
+## API Design
+
+### Public Interface (No Changes)
+
+The external API remains unchanged to ensure backward compatibility:
+
+```python
+from azure.ai.evaluation.red_team import AttackStrategy, RedTeam, RiskCategory
+
+red_team = RedTeam(...)
+result = red_team.scan(
+    risk_categories=[RiskCategory.VIOLENCE],
+    attack_strategies=[AttackStrategy.Direct, AttackStrategy.PAIR]
+)
+```
+
+### Internal Changes
+
+All changes are internal implementation details:
+- Strategy mapping happens transparently
+- Memory storage is abstracted
+- Result conversion is automatic
+
+## Testing Strategy
+
+### Unit Tests
+- Strategy mapping correctness
+- Memory label configuration
+- Result conversion logic
+- Error handling
+
+### Integration Tests
+- End-to-end scenario execution
+- Memory persistence and retrieval
+- Multi-strategy orchestration
+- Context preservation
+
+### Performance Tests
+- Scenario execution latency
+- Memory query performance
+- Result processing throughput
+- Resource utilization
+
+## Risks and Mitigation
+
+| Risk | Impact | Mitigation |
+|------|--------|------------|
+| PyRIT API instability | High | Set PyRIT 0.10.0 as the minimum supported version (`pyrit>=0.10.0`) |
+| SQLite performance issues | Medium | Optimize query patterns, add indexes |
+| Memory label collisions | Low | Use namespaced label keys |
+| Breaking changes in future PyRIT versions | Medium | Maintain abstraction layer |
+
+## Success Criteria
+
+1. All attack strategies execute via FoundryScenario
+2. Context preservation works across scan sessions
+3. Results match the existing format (backward compatible)
+4. Performance meets the SLA (<5% degradation)
+5. Test coverage exceeds 90%
+6. Zero breaking changes to the public API
+
+## Appendix
+
+### PyRIT 0.10.0 Breaking Changes Reference
+
+| Old API | New API | Impact |
+|---------|---------|--------|
+| `DUCK_DB` | `SQLITE` | High - required change |
+| `PromptRequestPiece` | `MessagePiece` | High - data model change |
+| `get_prompt_request_pieces()` | `get_message_pieces()` | High - method rename |
+| `memory_db_type=DUCK_DB` | `memory_db_type=SQLITE, memory_db_path=...` | High - signature change |
+
+### Variable Naming Conventions
+
+Use consistent terminology in code:
+- `message_pieces` (not `prompt_request_pieces`)
+- `piece` (not `prompt`)
+- `get_message_pieces()` (not `get_prompt_request_pieces()`)
+
+### Dependencies
+
+```text
+# requirements.txt
+pyrit>=0.10.0
+```
+
+---
+
+**Document Version History:**
+- v3.0 (2025-12-18): Updated for PyRIT 0.10.0 breaking changes
+- v2.0 (2025-11-15): Added context preservation strategy
+- v1.0 (2025-10-01): Initial specification