aws-solutions-library-samples
diff --git a/CHANGELOG.md b/CHANGELOG.md
Lines changed: 1 addition & 1 deletion
diff --git a/config_library/pattern-2/default/config.yaml b/config_library/pattern-2/default/config.yaml
Lines changed: 4 additions & 0 deletions
diff --git a/docs/assessment.md b/docs/assessment.md
Lines changed: 171 additions & 13 deletions
diff --git a/lib/idp_common_pkg/idp_common/appsync/mutations.py b/lib/idp_common_pkg/idp_common/appsync/mutations.py
Lines changed: 5 additions & 0 deletions
diff --git a/lib/idp_common_pkg/idp_common/appsync/service.py b/lib/idp_common_pkg/idp_common/appsync/service.py
Lines changed: 29 additions & 0 deletions
diff --git a/lib/idp_common_pkg/idp_common/assessment/service.py b/lib/idp_common_pkg/idp_common/assessment/service.py
Lines changed: 47 additions & 2 deletions
@@ -12,7 +12,7 @@ SPDX-License-Identifier: MIT-0
 - **Assessment Feature for Extraction Confidence Evaluation (EXPERIMENTAL)**
   - Added new assessment service that evaluates extraction confidence using LLMs to analyze extraction results against source documents
   - Multi-modal assessment capability combining text analysis with document images for comprehensive confidence scoring
-  - UI integration with explainability_info display showing per-attribute confidence scores and explanations
+  - UI integration with explainability_info display showing per-attribute confidence scores, thresholds, and explanations
   - Optional deployment controlled by `IsAssessmentEnabled` parameter (defaults to false)
   - Added e2e-example-with-assessment.ipynb notebook for testing assessment workflow
 
 
@@ -13,10 +13,13 @@ classes:
     attributes:
       - name: sender_name
         description: The name of the person or entity who wrote or sent the letter. Look for text following or near terms like 'from', 'sender', 'authored by', 'written by', or at the end of the letter before a signature.
+        confidence_threshold: '0.85'
       - name: sender_address
         description: The physical address of the sender, typically appearing at the top of the letter. May be labeled as 'address', 'location', or 'from address'.
+        confidence_threshold: '0.8'
       - name: recipient_name
         description: The name of the person or entity receiving the letter. Look for this after 'to', 'recipient', 'addressee', or at the beginning of the letter.
+        confidence_threshold: '0.9'
       - name: recipient_address
         description: The physical address where the letter is to be delivered. Often labeled as 'to address' or 'delivery address', typically appearing below the recipient name.
       - name: date
@@ -588,6 +591,7 @@ summarization:
   system_prompt: >-
     You are a document summarization expert who can analyze and summarize documents from various domains including medical, financial, legal, and general business documents. Your task is to create a summary that captures the key information, main points, and important details from the document. Your output must be in valid JSON format. \nSummarization Style: Balanced\\nCreate a balanced summary that provides a moderate level of detail. Include the main points and key supporting information, while maintaining the document's overall structure. Aim for a comprehensive yet concise summary.\n Your output MUST be in valid JSON format with markdown content. You MUST strictly adhere to the output format specified in the instructions.
 assessment:
+  default_confidence_threshold: '0.9'
   top_p: '0.1'
   max_tokens: '4096'
   top_k: '5'
 
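Note on the config hunk above: the thresholds are quoted YAML strings (`'0.85'`, `'0.9'`), so any consumer has to convert them before comparing against a numeric confidence. The following is a minimal sketch of such a conversion with a range check. `parse_threshold` is a hypothetical helper introduced for illustration and is not part of `idp_common`.

```python
def parse_threshold(value) -> float:
    """Convert a config threshold (e.g. the string '0.85') to a float in [0.0, 1.0].

    Hypothetical helper: the name and the error messages are assumptions.
    """
    try:
        threshold = float(value)
    except (TypeError, ValueError) as exc:
        raise ValueError(f"confidence_threshold is not numeric: {value!r}") from exc
    # NaN fails this comparison as well, so it is rejected here too.
    if not 0.0 <= threshold <= 1.0:
        raise ValueError(f"confidence_threshold out of range [0.0, 1.0]: {threshold}")
    return threshold
```

Rejecting bad values when the config is loaded would surface a typo such as `'85'` early, rather than as an attribute that is always flagged.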
@@ -13,6 +13,8 @@ The Assessment feature provides automated confidence evaluation of document extr
 - **Per-Attribute Scoring**: Provides individual confidence scores and explanations for each extracted attribute
 - **Token-Optimized Processing**: Uses condensed text confidence data for 80-90% token reduction compared to full OCR results
 - **UI Integration**: Seamlessly displays assessment results in the web interface with explainability information
+- **Confidence Threshold Support**: Configurable global and per-attribute confidence thresholds with color-coded visual indicators
+- **Enhanced Visual Feedback**: Green/red/black color coding of each confidence score against its threshold in the Form View and the Visual Editor Modal
 - **Optional Deployment**: Controlled by `IsAssessmentEnabled` parameter (defaults to false for cost optimization)
 - **Flexible Image Usage**: Images only processed when explicitly requested via `{DOCUMENT_IMAGE}` placeholder
 
@@ -174,11 +176,161 @@ Assessment results are appended to extraction results in the `explainability_inf
 }
 ```
 
+## Confidence Thresholds
+
+### Overview
+
+The assessment feature supports flexible confidence threshold configuration to help users identify extraction results that may require review. Thresholds can be set globally or per-attribute, with the UI providing immediate visual feedback through color-coded displays.
+
+### Configuration Options
+
+#### Global Thresholds
+Set system-wide confidence requirements for all attributes:
+
+```json
+{
+  "inference_result": {
+    "YTDNetPay": "75000",
+    "PayPeriodStartDate": "2024-01-01"
+  },
+  "explainability_info": [
+    {
+      "global_confidence_threshold": 0.85,
+      "YTDNetPay": {
+        "confidence": 0.92,
+        "confidence_reason": "Clear match found in document"
+      },
+      "PayPeriodStartDate": {
+        "confidence": 0.75,
+        "confidence_reason": "Moderate OCR confidence"
+      }
+    }
+  ]
+}
+```
+
+#### Per-Attribute Thresholds
+Override global settings for specific fields requiring different confidence levels:
+
+```json
+{
+  "explainability_info": [
+    {
+      "YTDNetPay": {
+        "confidence": 0.92,
+        "confidence_threshold": 0.95,
+        "confidence_reason": "Financial data requires high confidence"
+      },
+      "PayPeriodStartDate": {
+        "confidence": 0.75,
+        "confidence_threshold": 0.70,
+        "confidence_reason": "Date fields can accept moderate confidence"
+      }
+    }
+  ]
+}
+```
+
+#### Mixed Configuration
+Combine global defaults with attribute-specific overrides:
+
+```json
+{
+  "explainability_info": [
+    {
+      "global_confidence_threshold": 0.80,
+      "CriticalField": {
+        "confidence": 0.85,
+        "confidence_threshold": 0.95,
+        "confidence_reason": "Override: higher threshold for critical data"
+      },
+      "StandardField": {
+        "confidence": 0.82,
+        "confidence_reason": "Uses global threshold of 0.80"
+      }
+    }
+  ]
+}
+```
+
+### Assessment Prompt Integration
+
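+Include threshold guidance in your assessment prompts to keep confidence scoring consistent. The assessment service sets `confidence_threshold` from configuration (the attribute's own value, else `default_confidence_threshold`), overwriting any value the model returns: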
+
+```yaml
+assessment:
+  task_prompt: |
+    Assess extraction confidence using these thresholds as guidance:
+    - Financial data (amounts, taxes): 0.90+ confidence required
+    - Personal information (names, addresses): 0.85+ confidence required  
+    - Dates and standard fields: 0.75+ confidence acceptable
+    
+    Provide confidence scores between 0.0 and 1.0 with explanatory reasoning:
+    {
+      "attribute_name": {
+        "confidence": 0.85,
+        "confidence_threshold": 0.90,
+        "confidence_reason": "Explanation of confidence assessment"
+      }
+    }
+```
+
 ## UI Integration
 
-Assessment results automatically appear in the web interface:
+Assessment results automatically appear in the web interface with enhanced visual indicators:
+
+### Visual Feedback System
+
+The UI provides immediate confidence feedback through color-coded displays:
+
+#### Color Coding
+- 🟢 **Green**: Confidence meets or exceeds threshold (high confidence)
+- 🔴 **Red**: Confidence falls below threshold (requires review)
+- ⚫ **Black**: Confidence available but no threshold for comparison
 
-1. **Visual Editor Modal**: Confidence scores and explanations display alongside extraction results
+#### Display Modes
+
+**1. With Threshold (Color-Coded)**
+```
+YTDNetPay: 75000
+Confidence: 92.0% / Threshold: 95.0% [RED - Below Threshold]
+
+PayPeriodStartDate: 2024-01-01
+Confidence: 75.0% / Threshold: 70.0% [GREEN - Above Threshold]
+```
+
+**2. Confidence Only (Black Text)**
+```
+EmployeeName: John Smith
+Confidence: 88.5% [BLACK - No Threshold Set]
+```
+
+**3. No Display**
+When neither confidence nor threshold data is available, no confidence indicator is shown.
+
+### Interface Coverage
+
+**1. Form View (JSONViewer)**
+- Color-coded confidence display in the editable form interface
+- Supports nested data structures (arrays, objects)
+- Real-time visual feedback during data editing
+
+**2. Visual Editor Modal**
+- Same confidence indicators in the document image overlay editor
+- Visual connection between form fields and document bounding boxes
+- Confidence display for deeply nested extraction results
+
+**3. Nested Data Support**
+Confidence indicators work with complex document structures:
+```
+FederalTaxes[0]:
+  ├── YTD: 2111.2 [Confidence: 67.6% / Threshold: 85.0% - RED]
+  └── Period: 40.6 [Confidence: 75.8% - BLACK]
+
+StateTaxes[0]:
+  ├── YTD: 438.36 [Confidence: 84.4% / Threshold: 80.0% - GREEN]
+  └── Period: 8.43 [Confidence: 83.2% / Threshold: 80.0% - GREEN]
+```
 
 ## Cost Optimization
 
@@ -191,14 +343,6 @@ The assessment feature implements several cost optimization techniques:
 3. **Optional Deployment**: Assessment infrastructure only deployed when `IsAssessmentEnabled=true`
 4. **Efficient Prompting**: Optimized prompt templates minimize token usage while maintaining accuracy
 
-### Expected Costs
-
-Cost factors for assessment processing:
-
-- **Text-Only Assessment**: ~500-1,000 tokens per page
-- **Multimodal Assessment**: ~1,500-2,500 tokens per page (including image processing)
-- **Model Choice**: Claude 3.5 Sonnet recommended for balanced cost/performance
-- **Processing Time**: ~2-5 seconds per document section
 
 ## Testing and Validation
 
@@ -252,11 +396,19 @@ ValueError: "Assessment prompt template formatting failed: missing required plac
 - **Claude 3 Haiku**: Consider for high-volume, cost-sensitive scenarios
 - **Temperature 0**: Use deterministic output for consistent confidence scoring
 
-### 4. Integration Patterns
+### 4. Confidence Threshold Configuration
+
+- **Risk-Based Thresholds**: Set higher thresholds (0.90+) for critical financial or personal data
+- **Field-Specific Requirements**: Use per-attribute thresholds for different data types
+- **Global Defaults**: Establish reasonable global thresholds (0.75-0.85) as baselines
+- **Incremental Tuning**: Start with conservative thresholds and adjust based on accuracy analysis
 
-- **Conditional Logic**: Implement business rules based on confidence scores
-- **Human Review**: Route low-confidence extractions for manual review
+### 5. Integration Patterns
+
+- **Conditional Logic**: Implement business rules based on confidence scores and thresholds
+- **Human Review**: Route low-confidence extractions (below threshold) for manual review
 - **Quality Metrics**: Track confidence distributions to identify improvement opportunities
+- **Visual Feedback**: Leverage color-coded UI indicators for immediate quality assessment
 
 ## Troubleshooting
 
@@ -282,6 +434,12 @@ ValueError: "Assessment prompt template formatting failed: missing required plac
    - Consider text-only assessment without images
    - Optimize prompt templates to reduce unnecessary context
 
+5. **Confidence Threshold Issues**
+   - Verify `confidence_threshold` values are between 0.0 and 1.0
+   - Check that the `explainability_info` structure includes threshold data
+   - Ensure UI displays match expected color coding (green/red/black)
+   - Validate nested data confidence display for complex structures
+
 ### Monitoring
 
 Key metrics to monitor:
 
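Note on the color rules documented in the `docs/assessment.md` hunks above: they can be summarized as a small decision function. This is a sketch in Python for illustration only; the actual UI code is not part of this diff, and `confidence_status` and `format_indicator` are hypothetical names. The documentation only says that no indicator is shown when neither value is available; returning `None` whenever the confidence itself is missing is an assumption.

```python
from typing import Optional


def confidence_status(
    confidence: Optional[float], threshold: Optional[float]
) -> Optional[str]:
    """Map a confidence/threshold pair to the documented display color."""
    if confidence is None:
        return None  # assumption: without a confidence there is nothing to display
    if threshold is None:
        return "black"  # confidence available, no threshold for comparison
    # "Meets or exceeds" is green, so equality counts as passing.
    return "green" if confidence >= threshold else "red"


def format_indicator(confidence: Optional[float], threshold: Optional[float]) -> str:
    """Render an indicator in the style of the documented display modes."""
    status = confidence_status(confidence, threshold)
    if status is None:
        return ""
    if threshold is None:
        return f"Confidence: {confidence:.1%}"
    return f"Confidence: {confidence:.1%} / Threshold: {threshold:.1%}"
```

For example, `confidence_status(0.92, 0.95)` yields `"red"` and `confidence_status(0.75, 0.70)` yields `"green"`, matching the `YTDNetPay` and `PayPeriodStartDate` examples in the documentation.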
@@ -35,6 +35,11 @@
             PageIds
             Class
             OutputJSONUri
+            ConfidenceThresholdAlerts {
+                attributeName
+                confidence
+                confidenceThreshold
+            }
         }
         Pages {
             Id
 
@@ -143,6 +143,19 @@ def _document_to_update_input(self, document: Document) -> Dict[str, Any]:
                     "Class": section.classification,
                     "OutputJSONUri": section.extraction_result_uri or "",
                 }
+
+                # Convert confidence threshold alerts
+                if section.confidence_threshold_alerts:
+                    alerts_data = []
+                    for alert in section.confidence_threshold_alerts:
+                        alert_data = {
+                            "attributeName": alert.get("attribute_name"),
+                            "confidence": alert.get("confidence"),
+                            "confidenceThreshold": alert.get("confidence_threshold"),
+                        }
+                        alerts_data.append(alert_data)
+                    section_data["ConfidenceThresholdAlerts"] = alerts_data
+
                 sections_data.append(section_data)
 
             if sections_data:
@@ -225,12 +238,28 @@ def _appsync_to_document(self, appsync_data: Dict[str, Any]) -> Document:
                 # Convert page IDs to strings
                 page_ids = [str(page_id) for page_id in section_data.get("PageIds", [])]
 
+                # Convert confidence threshold alerts
+                confidence_threshold_alerts = []
+                alerts_data = section_data.get("ConfidenceThresholdAlerts", [])
+                if alerts_data:
+                    for alert in alerts_data:
+                        confidence_threshold_alerts.append(
+                            {
+                                "attribute_name": alert.get("attributeName"),
+                                "confidence": alert.get("confidence"),
+                                "confidence_threshold": alert.get(
+                                    "confidenceThreshold"
+                                ),
+                            }
+                        )
+
                 doc.sections.append(
                     Section(
                         section_id=section_data.get("Id", ""),
                         classification=section_data.get("Class", ""),
                         page_ids=page_ids,
                         extraction_result_uri=section_data.get("OutputJSONUri"),
+                        confidence_threshold_alerts=confidence_threshold_alerts,
                     )
                 )
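Note on the two `appsync/service.py` hunks above: they are inverse key mappings between the snake_case alert dictionaries used in Python and the camelCase fields in the GraphQL selection. A minimal sketch of that round trip as a pair of standalone functions is shown below. The function names and the shared mapping table are hypothetical and not part of the diff; the diff inlines both conversions.

```python
from typing import Any, Dict, List, Optional

# Field names taken from the diff; the table itself is an illustrative refactor.
_SNAKE_TO_CAMEL = {
    "attribute_name": "attributeName",
    "confidence": "confidence",
    "confidence_threshold": "confidenceThreshold",
}


def alerts_to_appsync(alerts: Optional[List[Dict[str, Any]]]) -> List[Dict[str, Any]]:
    """Convert Python-side alert dicts to the AppSync (camelCase) shape."""
    return [
        {camel: alert.get(snake) for snake, camel in _SNAKE_TO_CAMEL.items()}
        for alert in alerts or []
    ]


def alerts_from_appsync(data: Optional[List[Dict[str, Any]]]) -> List[Dict[str, Any]]:
    """Convert AppSync (camelCase) alert dicts back to the Python-side shape."""
    return [
        {snake: item.get(camel) for snake, camel in _SNAKE_TO_CAMEL.items()}
        for item in data or []
    ]
```

Unlike the diff, which only sets `ConfidenceThresholdAlerts` when the list is non-empty, this sketch returns an empty list for missing input; whether an empty list should be sent to AppSync is left open here.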
 
 
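Note on the `assessment/service.py` hunk that follows: it resolves a threshold per attribute (the attribute's own `confidence_threshold`, else `default_confidence_threshold`), writes it into the assessment data, and records an alert when the confidence falls below it. The sketch below restates that logic as a pure function so it can be exercised in isolation. `apply_thresholds` is a hypothetical name, and two behaviors are assumptions that go beyond the diff: non-dict assessment entries are passed through unchanged, and a non-numeric confidence is treated as 0.0.

```python
from typing import Any, Dict, List, Tuple


def apply_thresholds(
    assessment_data: Dict[str, Any],
    attributes: List[Dict[str, Any]],
    default_threshold: Any = 0.9,
) -> Tuple[Dict[str, Any], List[Dict[str, Any]]]:
    """Attach thresholds to assessment entries and collect below-threshold alerts."""
    # Index per-attribute overrides once instead of scanning the list per attribute.
    overrides = {
        attr.get("name"): attr["confidence_threshold"]
        for attr in attributes
        if "confidence_threshold" in attr
    }
    enhanced: Dict[str, Any] = {}
    alerts: List[Dict[str, Any]] = []
    for name, assessment in assessment_data.items():
        # Config values may be strings such as '0.85', hence the float() call.
        threshold = float(overrides.get(name, default_threshold))
        if not isinstance(assessment, dict):
            enhanced[name] = assessment  # assumption: leave unexpected shapes untouched
            continue
        enhanced[name] = {**assessment, "confidence_threshold": threshold}
        try:
            confidence = float(assessment.get("confidence", 0.0))
        except (TypeError, ValueError):
            confidence = 0.0  # assumption: unusable confidence counts as lowest
        if confidence < threshold:
            alerts.append(
                {
                    "attribute_name": name,
                    "confidence": confidence,
                    "confidence_threshold": threshold,
                }
            )
    return enhanced, alerts
```

With the thresholds from the config hunk (`sender_name: '0.85'`, `recipient_name: '0.9'`, default `'0.9'`), a `recipient_name` confidence of 0.88 produces an alert while a `sender_name` confidence of 0.86 does not.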
@@ -534,8 +534,45 @@ def process_document_section(self, document: Document, section_id: str) -> Docum
                     }
                 parsing_succeeded = False  # Mark that parsing failed
 
-            # Update the existing extraction result with assessment data
-            extraction_data["explainability_info"] = [assessment_data]
+            # Get confidence thresholds
+            default_confidence_threshold = assessment_config.get(
+                "default_confidence_threshold", 0.9
+            )
+
+            # Enhance assessment data with confidence thresholds and create confidence threshold alerts
+            enhanced_assessment_data = {}
+            confidence_threshold_alerts = []
+
+            for attr_name, attr_assessment in assessment_data.items():
+                # Get the attribute config to check for per-attribute confidence threshold
+                attr_threshold = default_confidence_threshold
+                for attr in attributes:
+                    if attr.get("name") == attr_name:
+                        attr_threshold = attr.get(
+                            "confidence_threshold", default_confidence_threshold
+                        )
+                        break
+                attr_threshold = float(attr_threshold)
+
+                # Add confidence_threshold to the assessment data
+                enhanced_assessment_data[attr_name] = {
+                    **attr_assessment,
+                    "confidence_threshold": attr_threshold,
+                }
+
+                # Coerce confidence to float (model output may be a string or null), then alert if below threshold
+                confidence = float(attr_assessment.get("confidence") or 0.0)
+                if confidence < attr_threshold:
+                    confidence_threshold_alerts.append(
+                        {
+                            "attribute_name": attr_name,
+                            "confidence": confidence,
+                            "confidence_threshold": attr_threshold,
+                        }
+                    )
+
+            # Update the existing extraction result with enhanced assessment data
+            extraction_data["explainability_info"] = [enhanced_assessment_data]
             extraction_data["metadata"] = extraction_data.get("metadata", {})
             extraction_data["metadata"]["assessment_time_seconds"] = total_duration
             extraction_data["metadata"]["assessment_parsing_succeeded"] = (
@@ -548,6 +585,14 @@ def process_document_section(self, document: Document, section_id: str) -> Docum
                 extraction_data, bucket, key, content_type="application/json"
             )
 
+            # Update the section in the document with confidence threshold alerts
+            for doc_section in document.sections:
+                if doc_section.section_id == section_id:
+                    doc_section.confidence_threshold_alerts = (
+                        confidence_threshold_alerts
+                    )
+                    break
+
             # Update document with metering data
             document.metering = utils.merge_metering_data(
                 document.metering, metering or {}