Commit 6cfbda1

Author: Bob Strahan
Add bounding box extraction support to assessment service
1 parent 0b3af05 commit 6cfbda1

6 files changed: +1437 -20 lines changed

docs/assessment-bounding-boxes.md

Lines changed: 344 additions & 0 deletions
@@ -0,0 +1,344 @@
# Bounding Box Integration in Assessment Service

This document describes the bounding box functionality integrated into the IDP Assessment Service, enabling spatial localization of extracted data fields within document images.

## Overview

The Assessment Service now supports **optional bounding box extraction** as part of its confidence assessment workflow. When enabled, the service can:

- Extract bounding box coordinates for each assessed field
- Convert coordinates to UI-compatible geometry format
- Provide spatial localization alongside confidence scores

When disabled, the service remains fully backward compatible.

## Features

### Core Capabilities

- **Optional Feature**: Disabled by default, enabled via configuration
- **UI Compatible**: Outputs geometry format compatible with existing pattern-1 UI
- **Multi-page Support**: Handles bounding boxes across multiple document pages
- **Error Resilient**: Gracefully handles invalid or incomplete bounding box data
- **Coordinate Normalization**: Converts from 0-1000 scale to 0-1 normalized coordinates

### Output Format

When bounding boxes are enabled, the assessment output includes `geometry` arrays:

```json
{
  "account_number": {
    "confidence": 0.95,
    "confidence_reason": "Clear text with high OCR confidence",
    "confidence_threshold": 0.9,
    "geometry": [
      {
        "boundingBox": {
          "top": 0.3751128193254686,
          "left": 0.4474376978868207,
          "width": 0.05959462312246394,
          "height": 0.010745484576798636
        },
        "page": 1
      }
    ]
  }
}
```

## Configuration

### Basic Configuration

Add the `bounding_boxes` section to your assessment configuration and enhance your existing prompt template:

```yaml
assessment:
  enabled: true
  model: us.amazon.nova-pro-v1:0
  temperature: 0.0

  # Enable bounding box extraction
  bounding_boxes:
    enabled: true

  # Enhanced prompt template extending existing assessment sophistication
  task_prompt: |
    <background>
    You are an expert document analysis assessment system. Your task is to evaluate the confidence of extraction results for a document of class {DOCUMENT_CLASS} and provide precise spatial localization for each field.
    </background>

    <task>
    Analyze the extraction results against the source document and provide confidence assessments AND bounding box coordinates for each extracted attribute. Consider factors such as:
    1. Text clarity and OCR quality in the source regions
    2. Alignment between extracted values and document content
    3. Presence of clear evidence supporting the extraction
    4. Potential ambiguity or uncertainty in the source material
    5. Completeness and accuracy of the extracted information
    6. Precise spatial location of each field in the document
    </task>

    <assessment-guidelines>
    For each attribute, provide:
    - A confidence score between 0.0 and 1.0 where:
      - 1.0 = Very high confidence, clear and unambiguous evidence
      - 0.8-0.9 = High confidence, strong evidence with minor uncertainty
      - 0.6-0.7 = Medium confidence, reasonable evidence but some ambiguity
      - 0.4-0.5 = Low confidence, weak or unclear evidence
      - 0.0-0.3 = Very low confidence, little to no supporting evidence
    - A clear explanation of the confidence reasoning
    - Precise spatial coordinates where the field appears in the document
    </assessment-guidelines>

    <spatial-localization-guidelines>
    For each field, provide bounding box coordinates:
    - bbox: [x1, y1, x2, y2] coordinates in normalized 0-1000 scale
    - page: Page number where the field appears (starting from 1)

    Coordinate system:
    - Use normalized scale 0-1000 for both x and y axes
    - x1, y1 = top-left corner of bounding box
    - x2, y2 = bottom-right corner of bounding box
    - Ensure x2 > x1 and y2 > y1
    - Make bounding boxes tight around the actual text content
    </spatial-localization-guidelines>

    # ... (rest of comprehensive prompt structure)
```

### Configuration Options

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `bounding_boxes.enabled` | boolean | `false` | Enable/disable bounding box extraction |

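The default-off behavior in the table can be sketched as a small lookup. This is only an illustration, not the service's actual `_is_bounding_box_enabled()` code: the function name `is_bounding_box_enabled` and the plain-dict configuration are assumptions made for the example.

```python
from typing import Any, Mapping


def is_bounding_box_enabled(config: Mapping[str, Any]) -> bool:
    """Return True only when assessment.bounding_boxes.enabled is exactly True.

    Hypothetical helper: any missing section falls back to the documented
    default of False.
    """
    assessment = config.get("assessment") or {}
    bounding_boxes = assessment.get("bounding_boxes") or {}
    return bounding_boxes.get("enabled", False) is True


enabled_config = {"assessment": {"bounding_boxes": {"enabled": True}}}
legacy_config = {"assessment": {"model": "us.amazon.nova-pro-v1:0"}}

print(is_bounding_box_enabled(enabled_config))  # True
print(is_bounding_box_enabled(legacy_config))   # False
```

Treating a missing `bounding_boxes` section the same as `enabled: false` is what keeps existing configurations working unchanged.
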
## LLM Prompt Requirements

### Expected LLM Response Format

The LLM must return assessment data in this format when bounding boxes are enabled:

```json
{
  "field_name": {
    "confidence": 0.95,
    "confidence_reason": "Clear, readable text with high OCR confidence",
    "bbox": [100, 200, 300, 250],
    "page": 1
  }
}
```

### Coordinate System

- **Scale**: 0-1000 normalized coordinates
- **Format**: `[x1, y1, x2, y2]` (top-left to bottom-right corners)
- **Validation**: x2 > x1 and y2 > y1 (automatically corrected if reversed)
- **Page Numbers**: Start from 1 (not 0-indexed)

### Prompt Template Guidelines

1. **Include `{DOCUMENT_IMAGE}` placeholder** for multimodal analysis
2. **Request both confidence and bbox data** in the prompt
3. **Specify coordinate system clearly** (0-1000 scale)
4. **Provide clear JSON format examples**
5. **Include page number requirements**

## Technical Implementation

### Architecture

```mermaid
flowchart TD
    A[Assessment Service] --> B{Bounding Box Enabled?}
    B -->|No| C[Standard Assessment]
    B -->|Yes| D[Enhanced Assessment with Bounding Boxes]

    D --> E[LLM Invocation with Images]
    E --> F[Parse LLM Response]
    F --> G[Extract Geometry Data]
    G --> H[Convert Coordinates]
    H --> I[Generate UI-Compatible Output]

    C --> J[Final Assessment Result]
    I --> J
```

### Core Methods

#### `_is_bounding_box_enabled()`
Checks configuration to determine if bounding box extraction is enabled.

#### `_convert_bbox_to_geometry(bbox_coords, page_num)`
Converts `[x1, y1, x2, y2]` coordinates to geometry format:
- Normalizes from 0-1000 scale to 0-1
- Converts corner coordinates to position + dimensions
- Ensures proper coordinate ordering

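The three conversion steps can be illustrated with a short sketch. It is written from the description in this document only and is not the service's implementation; the standalone function name `convert_bbox_to_geometry` and its error handling are assumptions.

```python
from typing import Dict, List, Union

Number = Union[int, float]
SCALE = 1000.0  # the LLM reports coordinates on a 0-1000 scale


def convert_bbox_to_geometry(bbox: List[Number], page: int) -> Dict[str, object]:
    """Convert [x1, y1, x2, y2] on a 0-1000 scale to a 0-1 geometry entry."""
    if len(bbox) != 4:
        raise ValueError(f"expected 4 coordinates, got {len(bbox)}")
    x1, y1, x2, y2 = (float(v) for v in bbox)
    # Reorder reversed corners so (x1, y1) is always the top-left corner.
    x1, x2 = min(x1, x2), max(x1, x2)
    y1, y2 = min(y1, y2), max(y1, y2)
    return {
        "boundingBox": {
            "top": y1 / SCALE,
            "left": x1 / SCALE,
            "width": (x2 - x1) / SCALE,
            "height": (y2 - y1) / SCALE,
        },
        "page": page,
    }


# The sample LLM response value [100, 200, 300, 250] on page 1:
print(convert_bbox_to_geometry([100, 200, 300, 250], page=1))
```

For the sample bbox `[100, 200, 300, 250]`, the result has `top` 0.2, `left` 0.1, `width` 0.2 and `height` 0.05. Passing the corners in reverse order, `[300, 250, 100, 200]`, yields the same box.
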
#### `_extract_geometry_from_assessment(assessment_data)`
Processes LLM response to extract and convert bounding box data:
- Validates bbox and page data completeness
- Handles error cases gracefully
- Removes raw bbox data from final output

### Error Handling

The implementation handles the following failure cases:

1. **Invalid Coordinates**: Logs warning and removes invalid data
2. **Missing Page Numbers**: Removes incomplete bounding box data
3. **Malformed Responses**: Continues with confidence assessment only
4. **Coordinate Validation**: Automatically corrects reversed coordinates

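A rough sketch of how these cases might fit together is shown below. It is an illustration under stated assumptions, not the actual `_extract_geometry_from_assessment` code: the function name `extract_geometry`, the log messages, and the choice to return a new dict are all hypothetical.

```python
import logging
from typing import Any, Dict

logger = logging.getLogger(__name__)


def extract_geometry(assessment: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the assessment with raw bbox/page replaced by geometry."""
    result: Dict[str, Any] = {}
    for field, data in assessment.items():
        if not isinstance(data, dict):
            result[field] = data  # malformed entry: pass through untouched
            continue
        # Raw bbox/page never reach the final output.
        entry = {k: v for k, v in data.items() if k not in ("bbox", "page")}
        bbox, page = data.get("bbox"), data.get("page")
        if bbox is not None and page is not None:
            try:
                x1, y1, x2, y2 = (float(v) for v in bbox)
                x1, x2 = sorted((x1, x2))  # correct reversed coordinates
                y1, y2 = sorted((y1, y2))
                entry["geometry"] = [{
                    "boundingBox": {
                        "top": y1 / 1000, "left": x1 / 1000,
                        "width": (x2 - x1) / 1000, "height": (y2 - y1) / 1000,
                    },
                    "page": int(page),
                }]
            except (TypeError, ValueError) as exc:
                logger.warning("Dropping invalid bbox for %s: %s", field, exc)
        elif bbox is not None or page is not None:
            logger.warning("Dropping incomplete bbox data for %s", field)
        result[field] = entry
    return result


raw = {
    "account_number": {"confidence": 0.95, "bbox": [300, 250, 100, 200], "page": 1},
    "name": {"confidence": 0.8, "bbox": [1, 2, 3]},  # no page: geometry is skipped
}
print(extract_geometry(raw))
```

In this sketch a field with unusable bounding box data still keeps its confidence values, which mirrors the "continues with confidence assessment only" behavior described above.
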
## Usage Examples

### Basic Usage

```python
from idp_common.assessment.service import AssessmentService

# Configuration with bounding boxes enabled
config = {
    "assessment": {
        "model": "us.amazon.nova-pro-v1:0",
        "bounding_boxes": {
            "enabled": True
        },
        "task_prompt": "... enhanced prompt template ..."
    }
}

# Initialize service
assessment_service = AssessmentService(config=config)

# Process document section.
# `document` and `section_id` are not defined in this snippet; they come from
# the earlier steps of your processing pipeline.
document = assessment_service.process_document_section(document, section_id)
```

### Checking Results

```python
# Check if geometry data was generated.
# `s3` (an S3 helper exposing get_json_content) and `section` (the processed
# document section) are assumed to be available from the surrounding code.
extraction_data = s3.get_json_content(section.extraction_result_uri)
explainability_info = extraction_data.get("explainability_info", [])

if explainability_info:
    assessment_result = explainability_info[0]

    for field_name, field_assessment in assessment_result.items():
        if "geometry" in field_assessment:
            # Use the first bounding box reported for the field
            geometry = field_assessment["geometry"][0]
            bbox = geometry["boundingBox"]
            page = geometry["page"]

            print(f"{field_name} found on page {page}")
            print(f"Location: top={bbox['top']}, left={bbox['left']}")
            print(f"Size: width={bbox['width']}, height={bbox['height']}")
```

## Integration with UI

The geometry format is fully compatible with the existing pattern-1 UI:

- **Coordinate System**: Normalized 0-1 coordinates
- **Bounding Box Format**: `{top, left, width, height}`
- **Page Support**: Page numbers for multi-page documents
- **Array Structure**: `geometry` is an array, so the format can carry multiple bounding boxes per field

The UI can immediately render bounding box overlays without additional processing.

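Outside the UI, for example when checking overlays against a page image while debugging, the normalized geometry has to be scaled by the image size. The sketch below shows one way to do that; the function name `geometry_to_pixels` and the `(left, top, right, bottom)` return order are assumptions chosen for the example.

```python
from typing import Dict, Tuple


def geometry_to_pixels(
    bounding_box: Dict[str, float], image_width: int, image_height: int
) -> Tuple[int, int, int, int]:
    """Scale a normalized {top, left, width, height} box to pixel corners.

    Returns (left, top, right, bottom), rounded to whole pixels.
    """
    left = round(bounding_box["left"] * image_width)
    top = round(bounding_box["top"] * image_height)
    right = round((bounding_box["left"] + bounding_box["width"]) * image_width)
    bottom = round((bounding_box["top"] + bounding_box["height"]) * image_height)
    return left, top, right, bottom


box = {"top": 0.2, "left": 0.1, "width": 0.2, "height": 0.05}
print(geometry_to_pixels(box, image_width=1000, image_height=2000))
# (100, 400, 300, 500)
```

Because the coordinates are normalized, the same geometry works for any rendering resolution of the page.
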
## Testing

### Unit Tests

Comprehensive unit tests are provided in:
`lib/idp_common_pkg/tests/unit/assessment/test_bounding_box_integration.py`

Test coverage includes:
- Configuration validation
- Coordinate conversion accuracy
- Error handling for invalid data
- Edge cases (reversed coordinates, missing data)
- Integration with existing assessment workflow

### Running Tests

```bash
cd lib/idp_common_pkg
python -m pytest tests/unit/assessment/test_bounding_box_integration.py -v
```

## Performance Considerations

### Impact on Processing Time

- **When Disabled**: No performance impact
- **LLM Processing**: When enabled, inference time may increase slightly because the model also generates coordinates
- **Coordinate Conversion**: Negligible computational overhead

### Memory Usage

- **Geometry Data**: Small additional memory footprint for coordinate storage
- **Error Handling**: Graceful degradation prevents memory issues with invalid data

## Migration and Compatibility

### Backward Compatibility

- **Default Behavior**: Feature is disabled by default
- **Existing Workflows**: No changes required to existing assessment configurations
- **Output Format**: Standard assessment results unchanged when feature is disabled

### Migration Steps

1. **Update Configuration**: Add `bounding_boxes.enabled: true` to assessment config
2. **Enhance Prompts**: Update prompt templates to request bounding box data
3. **Test Integration**: Verify bounding box extraction with sample documents
4. **Monitor Performance**: Validate processing time and accuracy

## Troubleshooting

### Common Issues

#### No Bounding Boxes Generated
- **Check Configuration**: Ensure `bounding_boxes.enabled: true`
- **Verify Prompt**: Confirm prompt requests bbox data
- **Check Logs**: Look for geometry extraction warnings

#### Invalid Coordinates
- **LLM Response**: Verify LLM returns valid `[x1, y1, x2, y2]` format
- **Scale Validation**: Ensure coordinates are in 0-1000 range
- **Page Numbers**: Confirm page numbers start from 1

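When inspecting a raw LLM response by hand, the three checks above can be automated with a small helper. This is a hypothetical diagnostic sketch, not part of the service; the name `bbox_problems` and the message texts are assumptions.

```python
from typing import Any, List


def bbox_problems(bbox: Any, page: Any) -> List[str]:
    """Return human-readable problems with a raw bbox/page pair (empty if none)."""
    if not isinstance(bbox, (list, tuple)) or len(bbox) != 4:
        return ["bbox must be a list of four numbers [x1, y1, x2, y2]"]
    if not all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in bbox):
        return ["bbox values must be numeric"]

    problems: List[str] = []
    if any(v < 0 or v > 1000 for v in bbox):
        problems.append("bbox values must be within the 0-1000 scale")
    x1, y1, x2, y2 = bbox
    if x2 <= x1 or y2 <= y1:
        problems.append("expected x2 > x1 and y2 > y1")
    if not isinstance(page, int) or isinstance(page, bool) or page < 1:
        problems.append("page must be an integer starting from 1")
    return problems


print(bbox_problems([100, 200, 300, 250], 1))   # []
print(bbox_problems([100, 200, 1300, 250], 0))  # two problems: scale and page
```

Note that reversed corners are reported here only for diagnosis; as described earlier, the service corrects them automatically.
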
#### UI Display Issues
- **Coordinate Format**: Verify geometry format matches UI expectations
- **Page Mapping**: Ensure page numbers align with UI page indexing

### Debug Logging

Enable debug logging to trace bounding box processing:

```python
import logging
logging.getLogger('idp_common.assessment.service').setLevel(logging.DEBUG)
```

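If no debug output appears after setting the level, the process may have no log handler configured: Python's fallback handler only emits WARNING and above. A minimal sketch for a local script or notebook follows, assuming you control logging setup there; the format string is just an example.

```python
import logging

# Attach a console handler to the root logger. If the root logger already has
# handlers (as in some managed runtimes), basicConfig leaves them unchanged.
logging.basicConfig(format="%(asctime)s %(name)s %(levelname)s %(message)s")

# Lower the threshold only for the assessment service logger, so other
# libraries stay at their default verbosity.
service_logger = logging.getLogger("idp_common.assessment.service")
service_logger.setLevel(logging.DEBUG)

service_logger.debug("debug logging is active for the assessment service")
```

Records from the service logger propagate to the root logger's handler, so only the one logger needs its level changed.
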
## Future Enhancements

Potential future improvements include:

1. **Multiple Bounding Boxes**: Support for fields spanning multiple locations
2. **Confidence-Based Filtering**: Only generate bounding boxes for high-confidence fields
3. **Coordinate Validation**: Enhanced validation against document dimensions
4. **Performance Optimization**: Caching and batch processing improvements

## Conclusion

The bounding box integration adds spatial localization to the Assessment Service without changing its existing behavior. The feature is designed to be:

- **Optional and Non-Intrusive**
- **UI-Compatible**
- **Error-Resilient**
- **Easy to Configure**

This enhancement enables document annotation and visualization while preserving all existing functionality.
