Commit d2981d0

Author: Bob Strahan
Commit message: doc updates
1 parent a2997bc commit d2981d0

2 files changed

+172
-14
lines changed

CHANGELOG.md

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@ SPDX-License-Identifier: MIT-0
1212
- **Assessment Feature for Extraction Confidence Evaluation (EXPERIMENTAL)**
1313
- Added new assessment service that evaluates extraction confidence using LLMs to analyze extraction results against source documents
1414
- Multi-modal assessment capability combining text analysis with document images for comprehensive confidence scoring
15-
- UI integration with explainability_info display showing per-attribute confidence scores and explanations
15+
- UI integration with explainability_info display showing per-attribute confidence scores, thresholds, and explanations
1616
- Optional deployment controlled by `IsAssessmentEnabled` parameter (defaults to false)
1717
- Added e2e-example-with-assessment.ipynb notebook for testing assessment workflow
1818

docs/assessment.md

Lines changed: 171 additions & 13 deletions
@@ -13,6 +13,8 @@ The Assessment feature provides automated confidence evaluation of document extr
1313
- **Per-Attribute Scoring**: Provides individual confidence scores and explanations for each extracted attribute
1414
- **Token-Optimized Processing**: Uses condensed text confidence data for 80-90% token reduction compared to full OCR results
1515
- **UI Integration**: Seamlessly displays assessment results in the web interface with explainability information
16+
- **Confidence Threshold Support**: Configurable global and per-attribute confidence thresholds with color-coded visual indicators
17+
- **Enhanced Visual Feedback**: Real-time confidence assessment with green/red/black color coding in all data viewing interfaces
1618
- **Optional Deployment**: Controlled by `IsAssessmentEnabled` parameter (defaults to false for cost optimization)
1719
- **Flexible Image Usage**: Images only processed when explicitly requested via `{DOCUMENT_IMAGE}` placeholder
1820

@@ -174,11 +176,161 @@ Assessment results are appended to extraction results in the `explainability_inf
174176
}
175177
```
176178

179+
## Confidence Thresholds
180+
181+
### Overview
182+
183+
The assessment feature supports flexible confidence threshold configuration to help users identify extraction results that may require review. Thresholds can be set globally or per-attribute, with the UI providing immediate visual feedback through color-coded displays.
184+
185+
### Configuration Options
186+
187+
#### Global Thresholds
188+
Set system-wide confidence requirements for all attributes:
189+
190+
```json
191+
{
192+
"inference_result": {
193+
"YTDNetPay": "75000",
194+
"PayPeriodStartDate": "2024-01-01"
195+
},
196+
"explainability_info": [
197+
{
198+
"global_confidence_threshold": 0.85,
199+
"YTDNetPay": {
200+
"confidence": 0.92,
201+
"confidence_reason": "Clear match found in document"
202+
},
203+
"PayPeriodStartDate": {
204+
"confidence": 0.75,
205+
"confidence_reason": "Moderate OCR confidence"
206+
}
207+
}
208+
]
209+
}
210+
```
211+
212+
#### Per-Attribute Thresholds
213+
Override global settings for specific fields requiring different confidence levels:
214+
215+
```json
216+
{
217+
"explainability_info": [
218+
{
219+
"YTDNetPay": {
220+
"confidence": 0.92,
221+
"confidence_threshold": 0.95,
222+
"confidence_reason": "Financial data requires high confidence"
223+
},
224+
"PayPeriodStartDate": {
225+
"confidence": 0.75,
226+
"confidence_threshold": 0.70,
227+
"confidence_reason": "Date fields can accept moderate confidence"
228+
}
229+
}
230+
]
231+
}
232+
```
233+
234+
#### Mixed Configuration
235+
Combine global defaults with attribute-specific overrides:
236+
237+
```json
238+
{
239+
"explainability_info": [
240+
{
241+
"global_confidence_threshold": 0.80,
242+
"CriticalField": {
243+
"confidence": 0.85,
244+
"confidence_threshold": 0.95,
245+
"confidence_reason": "Override: higher threshold for critical data"
246+
},
247+
"StandardField": {
248+
"confidence": 0.82,
249+
"confidence_reason": "Uses global threshold of 0.80"
250+
}
251+
}
252+
]
253+
}
254+
```
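As an illustration of the override rule in the mixed configuration (not part of this commit), the lookup can be sketched in Python. The function name `effective_threshold` is hypothetical, and the sketch assumes each `explainability_info` entry is a flat dictionary shaped like the JSON examples here: a per-attribute `confidence_threshold` wins, otherwise `global_confidence_threshold` applies, otherwise there is no threshold.

```python
from typing import Optional

GLOBAL_KEY = "global_confidence_threshold"


def effective_threshold(entry: dict, attribute: str) -> Optional[float]:
    """Resolve the threshold for one attribute (hypothetical helper).

    Order assumed from the documented examples:
    per-attribute override, then global default, then None.
    """
    info = entry.get(attribute)
    if isinstance(info, dict) and "confidence_threshold" in info:
        return float(info["confidence_threshold"])
    if GLOBAL_KEY in entry:
        return float(entry[GLOBAL_KEY])
    return None


# Entry mirroring the mixed configuration example.
entry = {
    "global_confidence_threshold": 0.80,
    "CriticalField": {"confidence": 0.85, "confidence_threshold": 0.95},
    "StandardField": {"confidence": 0.82},
}

for name in ("CriticalField", "StandardField"):
    print(name, effective_threshold(entry, name))
```

With this entry, `CriticalField` resolves to its own override and `StandardField` falls back to the global value.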
255+
256+
### Assessment Prompt Integration
257+
258+
Include threshold guidance in your assessment prompts to ensure consistent confidence evaluation:
259+
260+
```yaml
261+
assessment:
262+
  task_prompt: |
263+
    Assess extraction confidence using these thresholds as guidance:
264+
    - Financial data (amounts, taxes): 0.90+ confidence required
265+
    - Personal information (names, addresses): 0.85+ confidence required
266+
    - Dates and standard fields: 0.75+ confidence acceptable
267+
268+
    Provide confidence scores between 0.0 and 1.0 with explanatory reasoning:
269+
    {
270+
      "attribute_name": {
271+
        "confidence": 0.85,
272+
        "confidence_threshold": 0.90,
273+
        "confidence_reason": "Explanation of confidence assessment"
274+
      }
275+
    }
276+
```
277+
177278
## UI Integration
178279

179-
Assessment results automatically appear in the web interface:
280+
Assessment results automatically appear in the web interface with enhanced visual indicators:
281+
282+
### Visual Feedback System
283+
284+
The UI provides immediate confidence feedback through color-coded displays:
285+
286+
#### Color Coding
287+
- 🟢 **Green**: Confidence meets or exceeds threshold (high confidence)
288+
- 🔴 **Red**: Confidence falls below threshold (requires review)
289+
- ⚫ **Black**: Confidence available but no threshold for comparison
180290

181-
1. **Visual Editor Modal**: Confidence scores and explanations display alongside extraction results
291+
#### Display Modes
292+
293+
**1. With Threshold (Color-Coded)**
294+
```
295+
YTDNetPay: 75000
296+
Confidence: 92.0% / Threshold: 95.0% [RED - Below Threshold]
297+
298+
PayPeriodStartDate: 2024-01-01
299+
Confidence: 85.0% / Threshold: 70.0% [GREEN - Above Threshold]
300+
```
301+
302+
**2. Confidence Only (Black Text)**
303+
```
304+
EmployeeName: John Smith
305+
Confidence: 88.5% [BLACK - No Threshold Set]
306+
```
307+
308+
**3. No Display**
309+
When neither confidence nor threshold data is available, no confidence indicator is shown.
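The three display modes reduce to a small decision rule. The following Python sketch is not the UI's actual code (the web interface is not shown in this commit) and the function names are hypothetical. It assumes that a missing confidence value means nothing is shown, even if a threshold exists, which the text does not state explicitly.

```python
from typing import Optional


def confidence_color(confidence: Optional[float],
                     threshold: Optional[float]) -> Optional[str]:
    """Map a confidence/threshold pair to the documented color, or None."""
    if confidence is None:
        return None  # assumed: no confidence means no indicator
    if threshold is None:
        return "black"  # confidence available, nothing to compare against
    # "Meets or exceeds" is green, so equality counts as passing.
    return "green" if confidence >= threshold else "red"


def format_indicator(confidence: Optional[float],
                     threshold: Optional[float]) -> str:
    """Render a one-line indicator similar to the display-mode examples."""
    color = confidence_color(confidence, threshold)
    if color is None:
        return ""
    text = f"Confidence: {confidence * 100:.1f}%"
    if threshold is not None:
        text += f" / Threshold: {threshold * 100:.1f}%"
    return f"{text} [{color.upper()}]"


print(format_indicator(0.92, 0.95))
print(format_indicator(0.885, None))
```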
310+
311+
### Interface Coverage
312+
313+
**1. Form View (JSONViewer)**
314+
- Color-coded confidence display in the editable form interface
315+
- Supports nested data structures (arrays, objects)
316+
- Real-time visual feedback during data editing
317+
318+
**2. Visual Editor Modal**
319+
- Same confidence indicators in the document image overlay editor
320+
- Visual connection between form fields and document bounding boxes
321+
- Confidence display for deeply nested extraction results
322+
323+
**3. Nested Data Support**
324+
Confidence indicators work with complex document structures:
325+
```
326+
FederalTaxes[0]:
327+
├── YTD: 2111.2 [Confidence: 67.6% / Threshold: 85.0% - RED]
328+
└── Period: 40.6 [Confidence: 75.8% - BLACK]
329+
330+
StateTaxes[0]:
331+
├── YTD: 438.36 [Confidence: 84.4% / Threshold: 80.0% - GREEN]
332+
└── Period: 8.43 [Confidence: 83.2% / Threshold: 80.0% - GREEN]
333+
```
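To make the nested case concrete, here is a hedged Python sketch that walks a nested structure and reports which leaves fall below their threshold. The commit does not show how nested entries are laid out in `explainability_info`, so the shape used here is an assumption: dictionaries and lists mirroring the extraction result, with each leaf being a dictionary that carries a `confidence` key. The name `iter_confidence` is hypothetical.

```python
from typing import Any, Iterator, Optional, Tuple


def iter_confidence(node: Any, path: str = "",
                    default_threshold: Optional[float] = None
                    ) -> Iterator[Tuple[str, float, Optional[float]]]:
    """Yield (path, confidence, threshold) for every leaf with a confidence."""
    if isinstance(node, dict):
        if "confidence" in node:
            # Leaf: use its own threshold, else the supplied default.
            yield (path, node["confidence"],
                   node.get("confidence_threshold", default_threshold))
            return
        for key, value in node.items():
            child = f"{path}.{key}" if path else key
            yield from iter_confidence(value, child, default_threshold)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from iter_confidence(item, f"{path}[{index}]",
                                       default_threshold)


# Values taken from the nested display example; the layout is assumed.
info = {
    "FederalTaxes": [{
        "YTD": {"confidence": 0.676, "confidence_threshold": 0.85},
        "Period": {"confidence": 0.758},
    }],
    "StateTaxes": [{
        "YTD": {"confidence": 0.844, "confidence_threshold": 0.80},
        "Period": {"confidence": 0.832, "confidence_threshold": 0.80},
    }],
}

below = [p for p, c, t in iter_confidence(info) if t is not None and c < t]
print(below)
```

Only `FederalTaxes[0].YTD` is reported, matching the single red entry in the example; `FederalTaxes[0].Period` has no threshold and is therefore not compared.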
182334
183335
## Cost Optimization
184336
@@ -191,14 +343,6 @@ The assessment feature implements several cost optimization techniques:
191343
3. **Optional Deployment**: Assessment infrastructure only deployed when `IsAssessmentEnabled=true`
192344
4. **Efficient Prompting**: Optimized prompt templates minimize token usage while maintaining accuracy
193345
194-
### Expected Costs
195-
196-
Cost factors for assessment processing:
197-
198-
- **Text-Only Assessment**: ~500-1,000 tokens per page
199-
- **Multimodal Assessment**: ~1,500-2,500 tokens per page (including image processing)
200-
- **Model Choice**: Claude 3.5 Sonnet recommended for balanced cost/performance
201-
- **Processing Time**: ~2-5 seconds per document section
202346
203347
## Testing and Validation
204348
@@ -252,11 +396,19 @@ ValueError: "Assessment prompt template formatting failed: missing required plac
252396
- **Claude 3 Haiku**: Consider for high-volume, cost-sensitive scenarios
253397
- **Temperature 0**: Use deterministic output for consistent confidence scoring
254398

255-
### 4. Integration Patterns
399+
### 4. Confidence Threshold Configuration
400+
401+
- **Risk-Based Thresholds**: Set higher thresholds (0.90+) for critical financial or personal data
402+
- **Field-Specific Requirements**: Use per-attribute thresholds for different data types
403+
- **Global Defaults**: Establish reasonable global thresholds (0.75-0.85) as baselines
404+
- **Incremental Tuning**: Start with conservative thresholds and adjust based on accuracy analysis
256405

257-
- **Conditional Logic**: Implement business rules based on confidence scores
258-
- **Human Review**: Route low-confidence extractions for manual review
406+
### 5. Integration Patterns
407+
408+
- **Conditional Logic**: Implement business rules based on confidence scores and thresholds
409+
- **Human Review**: Route low-confidence extractions (below threshold) for manual review
259410
- **Quality Metrics**: Track confidence distributions to identify improvement opportunities
411+
- **Visual Feedback**: Leverage color-coded UI indicators for immediate quality assessment
260412

261413
## Troubleshooting
262414

@@ -282,6 +434,12 @@ ValueError: "Assessment prompt template formatting failed: missing required plac
282434
- Consider text-only assessment without images
283435
- Optimize prompt templates to reduce unnecessary context
284436

437+
5. **Confidence Threshold Issues**
438+
- Verify `confidence_threshold` values are between 0.0 and 1.0
439+
- Check explainability_info structure includes threshold data
440+
- Ensure UI displays match expected color coding (green/red/black)
441+
- Validate nested data confidence display for complex structures
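The first check in this troubleshooting item (thresholds between 0.0 and 1.0) is easy to automate. A minimal Python sketch follows, assuming a flat `explainability_info` entry shaped like the JSON examples in this document; the helper names are hypothetical and nested structures are not covered.

```python
def _in_range(value) -> bool:
    """True if value is a real number in [0.0, 1.0] (booleans rejected)."""
    return (isinstance(value, (int, float))
            and not isinstance(value, bool)
            and 0.0 <= value <= 1.0)


def find_invalid_thresholds(entry: dict) -> list:
    """Return the keys whose threshold is missing the 0.0-1.0 range."""
    problems = []
    global_value = entry.get("global_confidence_threshold")
    if global_value is not None and not _in_range(global_value):
        problems.append("global_confidence_threshold")
    for name, info in entry.items():
        if (isinstance(info, dict)
                and "confidence_threshold" in info
                and not _in_range(info["confidence_threshold"])):
            problems.append(name)
    return problems


# A common mistake is entering percentages (85) instead of fractions (0.85).
entry = {
    "global_confidence_threshold": 85,
    "YTDNetPay": {"confidence": 0.92, "confidence_threshold": 0.95},
    "PayPeriodStartDate": {"confidence": 0.75, "confidence_threshold": 1.5},
}
print(find_invalid_thresholds(entry))
```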
442+
285443
### Monitoring
286444

287445
Key metrics to monitor:
