Commit 88383ae

Merge branch 'feature/assessment-thresholds' into 'develop'

Enhanced Confidence Threshold Support with Visual Indicators

See merge request genaiic-reusable-assets/engagement-artifacts/genaiic-idp-accelerator!163

2 parents 04ca584 + d2981d0 commit 88383ae

24 files changed

+800
-111
lines changed

CHANGELOG.md

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@ SPDX-License-Identifier: MIT-0
 - **Assessment Feature for Extraction Confidence Evaluation (EXPERIMENTAL)**
 - Added new assessment service that evaluates extraction confidence using LLMs to analyze extraction results against source documents
 - Multi-modal assessment capability combining text analysis with document images for comprehensive confidence scoring
-- UI integration with explainability_info display showing per-attribute confidence scores and explanations
+- UI integration with explainability_info display showing per-attribute confidence scores, thresholds, and explanations
 - Optional deployment controlled by `IsAssessmentEnabled` parameter (defaults to false)
 - Added e2e-example-with-assessment.ipynb notebook for testing assessment workflow
 

config_library/pattern-2/default/config.yaml

Lines changed: 4 additions & 0 deletions
@@ -13,10 +13,13 @@ classes:
     attributes:
       - name: sender_name
         description: The name of the person or entity who wrote or sent the letter. Look for text following or near terms like 'from', 'sender', 'authored by', 'written by', or at the end of the letter before a signature.
+        confidence_threshold: '0.85'
       - name: sender_address
         description: The physical address of the sender, typically appearing at the top of the letter. May be labeled as 'address', 'location', or 'from address'.
+        confidence_threshold: '0.8'
       - name: recipient_name
         description: The name of the person or entity receiving the letter. Look for this after 'to', 'recipient', 'addressee', or at the beginning of the letter.
+        confidence_threshold: '0.9'
       - name: recipient_address
         description: The physical address where the letter is to be delivered. Often labeled as 'to address' or 'delivery address', typically appearing below the recipient name.
       - name: date
@@ -588,6 +591,7 @@ summarization:
   system_prompt: >-
     You are a document summarization expert who can analyze and summarize documents from various domains including medical, financial, legal, and general business documents. Your task is to create a summary that captures the key information, main points, and important details from the document. Your output must be in valid JSON format. \nSummarization Style: Balanced\\nCreate a balanced summary that provides a moderate level of detail. Include the main points and key supporting information, while maintaining the document's overall structure. Aim for a comprehensive yet concise summary.\n Your output MUST be in valid JSON format with markdown content. You MUST strictly adhere to the output format specified in the instructions.
 assessment:
+  default_confidence_threshold: '0.9'
   top_p: '0.1'
   max_tokens: '4096'
   top_k: '5'
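
The thresholds in this config are quoted strings, so any consumer has to convert them to numbers before comparing. The following is a minimal sketch of that lookup, assuming the parsed YAML is available as plain dicts and lists. The helper names `parse_threshold` and `resolve_threshold` are hypothetical, and the range check is an added assumption rather than something this commit enforces.

```python
from typing import Any, Mapping, Sequence


def parse_threshold(raw: Any) -> float:
    """Convert a config value such as '0.85' to a float in [0.0, 1.0]."""
    value = float(raw)  # raises ValueError for non-numeric strings
    # Assumption: thresholds outside [0.0, 1.0] are treated as config errors.
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"confidence threshold out of range: {value}")
    return value


def resolve_threshold(
    attr_name: str,
    attributes: Sequence[Mapping[str, Any]],
    assessment_config: Mapping[str, Any],
) -> float:
    """Return the per-attribute threshold if set, else the assessment default."""
    default = assessment_config.get("default_confidence_threshold", 0.9)
    for attr in attributes:
        if attr.get("name") == attr_name:
            return parse_threshold(attr.get("confidence_threshold", default))
    return parse_threshold(default)


# Values mirror the config hunk above.
attributes = [
    {"name": "sender_name", "confidence_threshold": "0.85"},
    {"name": "recipient_address"},  # no per-attribute override
]
assessment_config = {"default_confidence_threshold": "0.9"}

print(resolve_threshold("sender_name", attributes, assessment_config))
print(resolve_threshold("recipient_address", attributes, assessment_config))
```

Here `sender_name` resolves to its own threshold, while `recipient_address` falls back to `default_confidence_threshold`.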

docs/assessment.md

Lines changed: 171 additions & 13 deletions
@@ -13,6 +13,8 @@ The Assessment feature provides automated confidence evaluation of document extr
 - **Per-Attribute Scoring**: Provides individual confidence scores and explanations for each extracted attribute
 - **Token-Optimized Processing**: Uses condensed text confidence data for 80-90% token reduction compared to full OCR results
 - **UI Integration**: Seamlessly displays assessment results in the web interface with explainability information
+- **Confidence Threshold Support**: Configurable global and per-attribute confidence thresholds with color-coded visual indicators
+- **Enhanced Visual Feedback**: Real-time confidence assessment with green/red/black color coding in all data viewing interfaces
 - **Optional Deployment**: Controlled by `IsAssessmentEnabled` parameter (defaults to false for cost optimization)
 - **Flexible Image Usage**: Images only processed when explicitly requested via `{DOCUMENT_IMAGE}` placeholder
 
@@ -174,11 +176,161 @@ Assessment results are appended to extraction results in the `explainability_inf
 }
 ```
 
+## Confidence Thresholds
+
+### Overview
+
+The assessment feature supports flexible confidence threshold configuration to help users identify extraction results that may require review. Thresholds can be set globally or per-attribute, with the UI providing immediate visual feedback through color-coded displays.
+
+### Configuration Options
+
+#### Global Thresholds
+Set system-wide confidence requirements for all attributes:
+
+```json
+{
+  "inference_result": {
+    "YTDNetPay": "75000",
+    "PayPeriodStartDate": "2024-01-01"
+  },
+  "explainability_info": [
+    {
+      "global_confidence_threshold": 0.85,
+      "YTDNetPay": {
+        "confidence": 0.92,
+        "confidence_reason": "Clear match found in document"
+      },
+      "PayPeriodStartDate": {
+        "confidence": 0.75,
+        "confidence_reason": "Moderate OCR confidence"
+      }
+    }
+  ]
+}
+```
+
+#### Per-Attribute Thresholds
+Override global settings for specific fields requiring different confidence levels:
+
+```json
+{
+  "explainability_info": [
+    {
+      "YTDNetPay": {
+        "confidence": 0.92,
+        "confidence_threshold": 0.95,
+        "confidence_reason": "Financial data requires high confidence"
+      },
+      "PayPeriodStartDate": {
+        "confidence": 0.75,
+        "confidence_threshold": 0.70,
+        "confidence_reason": "Date fields can accept moderate confidence"
+      }
+    }
+  ]
+}
+```
+
+#### Mixed Configuration
+Combine global defaults with attribute-specific overrides:
+
+```json
+{
+  "explainability_info": [
+    {
+      "global_confidence_threshold": 0.80,
+      "CriticalField": {
+        "confidence": 0.85,
+        "confidence_threshold": 0.95,
+        "confidence_reason": "Override: higher threshold for critical data"
+      },
+      "StandardField": {
+        "confidence": 0.82,
+        "confidence_reason": "Uses global threshold of 0.80"
+      }
+    }
+  ]
+}
+```
+
+### Assessment Prompt Integration
+
+Include threshold guidance in your assessment prompts to ensure consistent confidence evaluation:
+
+```yaml
+assessment:
+  task_prompt: |
+    Assess extraction confidence using these thresholds as guidance:
+    - Financial data (amounts, taxes): 0.90+ confidence required
+    - Personal information (names, addresses): 0.85+ confidence required
+    - Dates and standard fields: 0.75+ confidence acceptable
+
+    Provide confidence scores between 0.0 and 1.0 with explanatory reasoning:
+    {
+      "attribute_name": {
+        "confidence": 0.85,
+        "confidence_threshold": 0.90,
+        "confidence_reason": "Explanation of confidence assessment"
+      }
+    }
+```
+
 ## UI Integration
 
-Assessment results automatically appear in the web interface:
+Assessment results automatically appear in the web interface with enhanced visual indicators:
+
+### Visual Feedback System
+
+The UI provides immediate confidence feedback through color-coded displays:
+
+#### Color Coding
+- 🟢 **Green**: Confidence meets or exceeds threshold (high confidence)
+- 🔴 **Red**: Confidence falls below threshold (requires review)
+- ⚫ **Black**: Confidence available but no threshold for comparison
 
-1. **Visual Editor Modal**: Confidence scores and explanations display alongside extraction results
+#### Display Modes
+
+**1. With Threshold (Color-Coded)**
+```
+YTDNetPay: 75000
+Confidence: 92.0% / Threshold: 95.0% [RED - Below Threshold]
+
+PayPeriodStartDate: 2024-01-01
+Confidence: 85.0% / Threshold: 70.0% [GREEN - Above Threshold]
+```
+
+**2. Confidence Only (Black Text)**
+```
+EmployeeName: John Smith
+Confidence: 88.5% [BLACK - No Threshold Set]
+```
+
+**3. No Display**
+When neither confidence nor threshold data is available, no confidence indicator is shown.
+
+### Interface Coverage
+
+**1. Form View (JSONViewer)**
+- Color-coded confidence display in the editable form interface
+- Supports nested data structures (arrays, objects)
+- Real-time visual feedback during data editing
+
+**2. Visual Editor Modal**
+- Same confidence indicators in the document image overlay editor
+- Visual connection between form fields and document bounding boxes
+- Confidence display for deeply nested extraction results
+
+**3. Nested Data Support**
+Confidence indicators work with complex document structures:
+```
+FederalTaxes[0]:
+├── YTD: 2111.2 [Confidence: 67.6% / Threshold: 85.0% - RED]
+└── Period: 40.6 [Confidence: 75.8% - BLACK]
+
+StateTaxes[0]:
+├── YTD: 438.36 [Confidence: 84.4% / Threshold: 80.0% - GREEN]
+└── Period: 8.43 [Confidence: 83.2% / Threshold: 80.0% - GREEN]
+```
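
As an illustration of the color rules documented in this hunk (and not part of the commit itself), the decision can be sketched as a small pure function. The real UI lives in the web front end, so this Python version only mirrors the rules as written. The names `confidence_color` and `format_indicator` are hypothetical, and showing nothing whenever the confidence is missing is an assumption.

```python
from typing import Optional


def confidence_color(
    confidence: Optional[float], threshold: Optional[float]
) -> Optional[str]:
    """Map a confidence/threshold pair to the documented display color."""
    if confidence is None:
        return None  # assumption: no indicator without a confidence score
    if threshold is None:
        return "black"  # confidence available, nothing to compare against
    # "meets or exceeds" is green; anything lower is red
    return "green" if confidence >= threshold else "red"


def format_indicator(confidence: Optional[float], threshold: Optional[float]) -> str:
    """Render a one-line indicator similar to the display modes above."""
    color = confidence_color(confidence, threshold)
    if color is None:
        return ""
    text = f"Confidence: {confidence * 100:.1f}%"
    if threshold is not None:
        text += f" / Threshold: {threshold * 100:.1f}%"
    return f"{text} [{color.upper()}]"


print(format_indicator(0.676, 0.85))
print(format_indicator(0.844, 0.80))
print(format_indicator(0.758, None))
```

A confidence equal to the threshold counts as green, matching the "meets or exceeds" wording in the color-coding list.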
 
 ## Cost Optimization
 
@@ -191,14 +343,6 @@ The assessment feature implements several cost optimization techniques:
 3. **Optional Deployment**: Assessment infrastructure only deployed when `IsAssessmentEnabled=true`
 4. **Efficient Prompting**: Optimized prompt templates minimize token usage while maintaining accuracy
 
-### Expected Costs
-
-Cost factors for assessment processing:
-
-- **Text-Only Assessment**: ~500-1,000 tokens per page
-- **Multimodal Assessment**: ~1,500-2,500 tokens per page (including image processing)
-- **Model Choice**: Claude 3.5 Sonnet recommended for balanced cost/performance
-- **Processing Time**: ~2-5 seconds per document section
 
 ## Testing and Validation
 
@@ -252,11 +396,19 @@ ValueError: "Assessment prompt template formatting failed: missing required plac
 - **Claude 3 Haiku**: Consider for high-volume, cost-sensitive scenarios
 - **Temperature 0**: Use deterministic output for consistent confidence scoring
 
-### 4. Integration Patterns
+### 4. Confidence Threshold Configuration
+
+- **Risk-Based Thresholds**: Set higher thresholds (0.90+) for critical financial or personal data
+- **Field-Specific Requirements**: Use per-attribute thresholds for different data types
+- **Global Defaults**: Establish reasonable global thresholds (0.75-0.85) as baselines
+- **Incremental Tuning**: Start with conservative thresholds and adjust based on accuracy analysis
 
-- **Conditional Logic**: Implement business rules based on confidence scores
-- **Human Review**: Route low-confidence extractions for manual review
+### 5. Integration Patterns
+
+- **Conditional Logic**: Implement business rules based on confidence scores and thresholds
+- **Human Review**: Route low-confidence extractions (below threshold) for manual review
 - **Quality Metrics**: Track confidence distributions to identify improvement opportunities
+- **Visual Feedback**: Leverage color-coded UI indicators for immediate quality assessment
 
 ## Troubleshooting
 
@@ -282,6 +434,12 @@ ValueError: "Assessment prompt template formatting failed: missing required plac
 - Consider text-only assessment without images
 - Optimize prompt templates to reduce unnecessary context
 
+5. **Confidence Threshold Issues**
+- Verify `confidence_threshold` values are between 0.0 and 1.0
+- Check explainability_info structure includes threshold data
+- Ensure UI displays match expected color coding (green/red/black)
+- Validate nested data confidence display for complex structures
+
 ### Monitoring
 
 Key metrics to monitor:

lib/idp_common_pkg/idp_common/appsync/mutations.py

Lines changed: 5 additions & 0 deletions
@@ -35,6 +35,11 @@
             PageIds
             Class
             OutputJSONUri
+            ConfidenceThresholdAlerts {
+                attributeName
+                confidence
+                confidenceThreshold
+            }
         }
         Pages {
             Id

lib/idp_common_pkg/idp_common/appsync/service.py

Lines changed: 29 additions & 0 deletions
@@ -143,6 +143,19 @@ def _document_to_update_input(self, document: Document) -> Dict[str, Any]:
                     "Class": section.classification,
                     "OutputJSONUri": section.extraction_result_uri or "",
                 }
+
+                # Convert confidence threshold alerts
+                if section.confidence_threshold_alerts:
+                    alerts_data = []
+                    for alert in section.confidence_threshold_alerts:
+                        alert_data = {
+                            "attributeName": alert.get("attribute_name"),
+                            "confidence": alert.get("confidence"),
+                            "confidenceThreshold": alert.get("confidence_threshold"),
+                        }
+                        alerts_data.append(alert_data)
+                    section_data["ConfidenceThresholdAlerts"] = alerts_data
+
                 sections_data.append(section_data)
 
             if sections_data:
@@ -225,12 +238,28 @@ def _appsync_to_document(self, appsync_data: Dict[str, Any]) -> Document:
                 # Convert page IDs to strings
                 page_ids = [str(page_id) for page_id in section_data.get("PageIds", [])]
 
+                # Convert confidence threshold alerts
+                confidence_threshold_alerts = []
+                alerts_data = section_data.get("ConfidenceThresholdAlerts", [])
+                if alerts_data:
+                    for alert in alerts_data:
+                        confidence_threshold_alerts.append(
+                            {
+                                "attribute_name": alert.get("attributeName"),
+                                "confidence": alert.get("confidence"),
+                                "confidence_threshold": alert.get(
+                                    "confidenceThreshold"
+                                ),
+                            }
+                        )
+
                 doc.sections.append(
                     Section(
                         section_id=section_data.get("Id", ""),
                         classification=section_data.get("Class", ""),
                         page_ids=page_ids,
                         extraction_result_uri=section_data.get("OutputJSONUri"),
+                        confidence_threshold_alerts=confidence_threshold_alerts,
                     )
                 )
 
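
The two hunks above are mirror images: one renames the snake_case alert keys to the camelCase AppSync fields, the other reverses it. A minimal standalone sketch of that round trip follows, assuming alerts are plain dicts as in the diff. The helper names and the shared `_FIELD_MAP` table are hypothetical and do not appear in the commit.

```python
from typing import Any, Dict, List, Optional

# snake_case key used on Section -> camelCase field used in the GraphQL selection
_FIELD_MAP = {
    "attribute_name": "attributeName",
    "confidence": "confidence",
    "confidence_threshold": "confidenceThreshold",
}


def alerts_to_appsync(alerts: Optional[List[Dict[str, Any]]]) -> List[Dict[str, Any]]:
    """Rename alert keys for the AppSync update input."""
    return [
        {camel: alert.get(snake) for snake, camel in _FIELD_MAP.items()}
        for alert in (alerts or [])
    ]


def alerts_from_appsync(data: Optional[List[Dict[str, Any]]]) -> List[Dict[str, Any]]:
    """Rename AppSync alert fields back to the internal key names."""
    return [
        {snake: item.get(camel) for snake, camel in _FIELD_MAP.items()}
        for item in (data or [])  # tolerates a null or missing field
    ]


alerts = [
    {"attribute_name": "recipient_name", "confidence": 0.75, "confidence_threshold": 0.9}
]
wire = alerts_to_appsync(alerts)
print(wire)
print(alerts_from_appsync(wire) == alerts)
```

Keeping the key mapping in one table means both directions stay in sync if a field is added later, which is the main reason to prefer it over two hand-written dict literals.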

lib/idp_common_pkg/idp_common/assessment/service.py

Lines changed: 47 additions & 2 deletions
@@ -534,8 +534,45 @@ def process_document_section(self, document: Document, section_id: str) -> Docum
                 }
                 parsing_succeeded = False  # Mark that parsing failed
 
-            # Update the existing extraction result with assessment data
-            extraction_data["explainability_info"] = [assessment_data]
+            # Get confidence thresholds
+            default_confidence_threshold = assessment_config.get(
+                "default_confidence_threshold", 0.9
+            )
+
+            # Enhance assessment data with confidence thresholds and create confidence threshold alerts
+            enhanced_assessment_data = {}
+            confidence_threshold_alerts = []
+
+            for attr_name, attr_assessment in assessment_data.items():
+                # Get the attribute config to check for per-attribute confidence threshold
+                attr_threshold = default_confidence_threshold
+                for attr in attributes:
+                    if attr.get("name") == attr_name:
+                        attr_threshold = attr.get(
+                            "confidence_threshold", default_confidence_threshold
+                        )
+                        break
+                attr_threshold = float(attr_threshold)
+
+                # Add confidence_threshold to the assessment data
+                enhanced_assessment_data[attr_name] = {
+                    **attr_assessment,
+                    "confidence_threshold": attr_threshold,
+                }
+
+                # Check if confidence is below threshold and create alert
+                confidence = attr_assessment.get("confidence", 0.0)
+                if confidence < attr_threshold:
+                    confidence_threshold_alerts.append(
+                        {
+                            "attribute_name": attr_name,
+                            "confidence": confidence,
+                            "confidence_threshold": attr_threshold,
+                        }
+                    )
+
+            # Update the existing extraction result with enhanced assessment data
+            extraction_data["explainability_info"] = [enhanced_assessment_data]
             extraction_data["metadata"] = extraction_data.get("metadata", {})
             extraction_data["metadata"]["assessment_time_seconds"] = total_duration
             extraction_data["metadata"]["assessment_parsing_succeeded"] = (
@@ -548,6 +585,14 @@ def process_document_section(self, document: Document, section_id: str) -> Docum
                 extraction_data, bucket, key, content_type="application/json"
             )
 
+            # Update the section in the document with confidence threshold alerts
+            for doc_section in document.sections:
+                if doc_section.section_id == section_id:
+                    doc_section.confidence_threshold_alerts = (
+                        confidence_threshold_alerts
+                    )
+                    break
+
             # Update document with metering data
             document.metering = utils.merge_metering_data(
                 document.metering, metering or {}
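
To see what the alert logic in this hunk produces, the loop can be exercised outside the service. The sketch below is a simplified stand-in, not the committed code: `apply_thresholds` is a hypothetical name, and it takes a precomputed name-to-threshold dict instead of scanning the `attributes` list. It keeps two behaviors from the diff: thresholds are coerced with `float()`, and a missing `confidence` defaults to 0.0 and therefore always raises an alert.

```python
from typing import Any, Dict, List, Tuple


def apply_thresholds(
    assessment_data: Dict[str, Dict[str, Any]],
    thresholds: Dict[str, Any],
    default_threshold: Any,
) -> Tuple[Dict[str, Dict[str, Any]], List[Dict[str, Any]]]:
    """Attach a threshold to each attribute and collect below-threshold alerts."""
    enhanced: Dict[str, Dict[str, Any]] = {}
    alerts: List[Dict[str, Any]] = []
    for name, assessment in assessment_data.items():
        # Config values may be strings such as '0.85', hence the float() call.
        threshold = float(thresholds.get(name, default_threshold))
        enhanced[name] = {**assessment, "confidence_threshold": threshold}
        confidence = assessment.get("confidence", 0.0)  # missing score -> 0.0
        if confidence < threshold:
            alerts.append(
                {
                    "attribute_name": name,
                    "confidence": confidence,
                    "confidence_threshold": threshold,
                }
            )
    return enhanced, alerts


# Illustrative data only; attribute names follow the config hunk in this commit.
assessment_data = {
    "sender_name": {"confidence": 0.92, "confidence_reason": "Clear match"},
    "recipient_name": {"confidence": 0.75, "confidence_reason": "Moderate OCR confidence"},
    "date": {"confidence_reason": "No score returned"},
}
thresholds = {"sender_name": "0.85", "recipient_name": "0.9"}

enhanced, alerts = apply_thresholds(assessment_data, thresholds, "0.9")
needs_review = [alert["attribute_name"] for alert in alerts]
print(needs_review)  # ['recipient_name', 'date']
```

The `needs_review` list is the kind of input a human-review routing step could consume; how such routing is wired up is outside what this commit shows.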
