# Bounding Box Integration in Assessment Service

This document describes the bounding box functionality integrated into the IDP Assessment Service, enabling spatial localization of extracted data fields within document images.

## Overview

The Assessment Service now supports **optional bounding box extraction** as part of its confidence assessment workflow. When enabled, the service can:

- Extract bounding box coordinates for each assessed field
- Convert coordinates to UI-compatible geometry format
- Provide spatial localization alongside confidence scores
- Maintain full backward compatibility when disabled

## Features

### Core Capabilities

- **Optional Feature**: Disabled by default, enabled via configuration
- **UI Compatible**: Outputs geometry format compatible with existing pattern-1 UI
- **Multi-page Support**: Handles bounding boxes across multiple document pages
- **Error Resilient**: Gracefully handles invalid or incomplete bounding box data
- **Coordinate Normalization**: Converts from 0-1000 scale to 0-1 normalized coordinates

### Output Format

When bounding boxes are enabled, the assessment output includes `geometry` arrays:

```json
{
  "account_number": {
    "confidence": 0.95,
    "confidence_reason": "Clear text with high OCR confidence",
    "confidence_threshold": 0.9,
    "geometry": [
      {
        "boundingBox": {
          "top": 0.3751128193254686,
          "left": 0.4474376978868207,
          "width": 0.05959462312246394,
          "height": 0.010745484576798636
        },
        "page": 1
      }
    ]
  }
}
```
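
Code that builds or consumes this output can model its shape with two small dataclasses. The sketch below is illustrative only: the class names `BoundingBox` and `Geometry` and the `to_dict()`/`from_dict()` helpers are hypothetical and not part of `idp_common`. Only the key names (`boundingBox`, `top`, `left`, `width`, `height`, `page`) come from the format above.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict


@dataclass
class BoundingBox:
    """Normalized 0-1 box measured from the page's top-left corner."""

    top: float
    left: float
    width: float
    height: float


@dataclass
class Geometry:
    """One entry of a field's `geometry` array (hypothetical model)."""

    boundingBox: BoundingBox  # camelCase to match the JSON key
    page: int  # 1-based page number

    def to_dict(self) -> Dict[str, Any]:
        # asdict() recurses into the nested BoundingBox dataclass,
        # producing the same nesting as the JSON above
        return asdict(self)

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "Geometry":
        return cls(boundingBox=BoundingBox(**data["boundingBox"]), page=int(data["page"]))


# Example: build the geometry list for one field
geometry = Geometry(BoundingBox(top=0.375, left=0.447, width=0.06, height=0.011), page=1)
field_assessment = {"confidence": 0.95, "geometry": [geometry.to_dict()]}
```

Keeping `geometry` as a list mirrors the array in the output format, which leaves room for more than one box per field.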

## Configuration

### Basic Configuration

Add the `bounding_boxes` section to your assessment configuration and enhance your existing prompt template:

```yaml
assessment:
  enabled: true
  model: us.amazon.nova-pro-v1:0
  temperature: 0.0

  # Enable bounding box extraction
  bounding_boxes:
    enabled: true

  # Enhanced prompt template extending existing assessment sophistication
  task_prompt: |
    <background>
    You are an expert document analysis assessment system. Your task is to evaluate the confidence of extraction results for a document of class {DOCUMENT_CLASS} and provide precise spatial localization for each field.
    </background>

    <task>
    Analyze the extraction results against the source document and provide confidence assessments AND bounding box coordinates for each extracted attribute. Consider factors such as:
    1. Text clarity and OCR quality in the source regions
    2. Alignment between extracted values and document content
    3. Presence of clear evidence supporting the extraction
    4. Potential ambiguity or uncertainty in the source material
    5. Completeness and accuracy of the extracted information
    6. Precise spatial location of each field in the document
    </task>

    <assessment-guidelines>
    For each attribute, provide:
    - A confidence score between 0.0 and 1.0 where:
      - 1.0 = Very high confidence, clear and unambiguous evidence
      - 0.8-0.9 = High confidence, strong evidence with minor uncertainty
      - 0.6-0.7 = Medium confidence, reasonable evidence but some ambiguity
      - 0.4-0.5 = Low confidence, weak or unclear evidence
      - 0.0-0.3 = Very low confidence, little to no supporting evidence
    - A clear explanation of the confidence reasoning
    - Precise spatial coordinates where the field appears in the document
    </assessment-guidelines>

    <spatial-localization-guidelines>
    For each field, provide bounding box coordinates:
    - bbox: [x1, y1, x2, y2] coordinates in normalized 0-1000 scale
    - page: Page number where the field appears (starting from 1)

    Coordinate system:
    - Use normalized scale 0-1000 for both x and y axes
    - x1, y1 = top-left corner of bounding box
    - x2, y2 = bottom-right corner of bounding box
    - Ensure x2 > x1 and y2 > y1
    - Make bounding boxes tight around the actual text content
    </spatial-localization-guidelines>

    # ... (rest of comprehensive prompt structure)
```

### Configuration Options

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `bounding_boxes.enabled` | boolean | `false` | Enable/disable bounding box extraction |
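
Because the default is `false`, a missing `bounding_boxes` section or a missing `enabled` key should leave the feature off. The following is a minimal sketch of that lookup, assuming the dictionary layout from the YAML above. `is_bounding_box_enabled` is a hypothetical stand-in, not the service's `_is_bounding_box_enabled()` method. Treating the strings `"true"`/`"false"` as booleans is also an assumption, which helps when configuration values arrive as strings.

```python
from typing import Any, Mapping


def is_bounding_box_enabled(config: Mapping[str, Any]) -> bool:
    """Return True only if assessment.bounding_boxes.enabled is set to true.

    Missing or empty sections fall back to the documented default (False).
    """
    assessment = config.get("assessment") or {}
    bounding_boxes = assessment.get("bounding_boxes") or {}
    enabled = bounding_boxes.get("enabled", False)
    # Assumption: string values such as "true" / "True" are accepted too
    if isinstance(enabled, str):
        return enabled.strip().lower() == "true"
    return bool(enabled)
```

With this approach, adding or removing the section never requires touching other assessment settings, which matches the backward-compatibility goal.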

## LLM Prompt Requirements

### Expected LLM Response Format

The LLM must return assessment data in this format when bounding boxes are enabled:

```json
{
  "field_name": {
    "confidence": 0.95,
    "confidence_reason": "Clear, readable text with high OCR confidence",
    "bbox": [100, 200, 300, 250],
    "page": 1
  }
}
```
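
Models often wrap JSON in a Markdown code fence or surround it with prose, so the object above usually has to be located in the raw response text before it can be parsed. Here is a minimal standard-library sketch. `parse_assessment_response` is a hypothetical helper, and its fence-stripping heuristic is an assumption rather than the service's documented parsing logic.

```python
import json
import re
from typing import Any, Dict

# A fenced block: three backticks, an optional "json" tag, the body, three backticks
_FENCE_RE = re.compile(r"`{3}(?:json)?\s*(.*?)`{3}", re.DOTALL)


def parse_assessment_response(text: str) -> Dict[str, Any]:
    """Extract the assessment JSON object from raw LLM output text."""
    match = _FENCE_RE.search(text)
    candidate = match.group(1) if match else text
    # Trim any surrounding prose down to the outermost {...} span
    start, end = candidate.find("{"), candidate.rfind("}")
    if start == -1 or end < start:
        raise ValueError("No JSON object found in LLM response")
    # json.JSONDecodeError is a subclass of ValueError
    return json.loads(candidate[start:end + 1])
```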

### Coordinate System

- **Scale**: 0-1000 normalized coordinates
- **Format**: `[x1, y1, x2, y2]` (top-left to bottom-right corners)
- **Validation**: x2 > x1 and y2 > y1 (automatically corrected if reversed)
- **Page Numbers**: Start from 1 (not 0-indexed)
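
Together, these rules determine how a box in the LLM response maps onto the 0-1 `{top, left, width, height}` geometry from the Output Format section. The formulas below assume the normalization is a plain division by 1000, applied after any reversed corners have been swapped. They are followed by a worked example for the sample `bbox` value `[100, 200, 300, 250]`:

```math
\begin{aligned}
&\text{left} = \frac{x_1}{1000}, \quad \text{top} = \frac{y_1}{1000}, \quad
 \text{width} = \frac{x_2 - x_1}{1000}, \quad \text{height} = \frac{y_2 - y_1}{1000} \\
&[100, 200, 300, 250] \;\Rightarrow\; \text{left} = 0.1, \quad \text{top} = 0.2, \quad
 \text{width} = \frac{300 - 100}{1000} = 0.2, \quad \text{height} = \frac{250 - 200}{1000} = 0.05
\end{aligned}
```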

### Prompt Template Guidelines

1. **Include `{DOCUMENT_IMAGE}` placeholder** for multimodal analysis
2. **Request both confidence and bbox data** in the prompt
3. **Specify coordinate system clearly** (0-1000 scale)
4. **Provide clear JSON format examples**
5. **Include page number requirements**
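
Before deploying a new template, a quick automated check can catch a prompt that omits one of these elements. The sketch below is hypothetical: `check_prompt_template` and its keyword list are illustrative heuristics, not a validator shipped with the Assessment Service. It relies on plain substring matching, so it only covers guidelines 1, 2, 3 and 5. Whether a prompt contains a clear JSON example (guideline 4) cannot be judged this way.

```python
from typing import Dict, List

# Hypothetical heuristics: each guideline maps to a substring the template should contain
REQUIRED_ELEMENTS: Dict[str, str] = {
    "document image placeholder": "{DOCUMENT_IMAGE}",
    "confidence request": "confidence",
    "bbox request": "bbox",
    "0-1000 coordinate scale": "0-1000",
    "page number requirement": "page",
}


def check_prompt_template(prompt: str) -> List[str]:
    """Return the guideline elements the prompt template appears to be missing."""
    lowered = prompt.lower()
    missing: List[str] = []
    for description, needle in REQUIRED_ELEMENTS.items():
        if needle.startswith("{"):
            # Placeholders must match exactly, including case
            found = needle in prompt
        else:
            found = needle.lower() in lowered
        if not found:
            missing.append(description)
    return missing
```

For example, a prompt that asks for confidence and a `bbox` on a 0-1000 scale with a page number, but forgets the image placeholder, yields `["document image placeholder"]`.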

## Technical Implementation

### Architecture

```mermaid
flowchart TD
    A[Assessment Service] --> B{Bounding Box Enabled?}
    B -->|No| C[Standard Assessment]
    B -->|Yes| D[Enhanced Assessment with Bounding Boxes]

    D --> E[LLM Invocation with Images]
    E --> F[Parse LLM Response]
    F --> G[Extract Geometry Data]
    G --> H[Convert Coordinates]
    H --> I[Generate UI-Compatible Output]

    C --> J[Final Assessment Result]
    I --> J
```

### Core Methods

#### `_is_bounding_box_enabled()`
Checks configuration to determine if bounding box extraction is enabled.

#### `_convert_bbox_to_geometry(bbox_coords, page_num)`
Converts `[x1, y1, x2, y2]` coordinates to geometry format:
- Normalizes from 0-1000 scale to 0-1
- Converts corner coordinates to position + dimensions
- Ensures proper coordinate ordering
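
The following standalone sketch implements the three conversion steps listed above. It is not the service's `_convert_bbox_to_geometry()`: the function name `convert_bbox_to_geometry` is hypothetical. Rejecting out-of-range values with `ValueError` instead of clamping them is also an assumption made for this example.

```python
from typing import Any, Dict, Sequence

SCALE = 1000.0  # LLM coordinates use a 0-1000 scale


def convert_bbox_to_geometry(bbox_coords: Sequence[float], page_num: int) -> Dict[str, Any]:
    """Convert [x1, y1, x2, y2] on a 0-1000 scale to the UI geometry format."""
    if len(bbox_coords) != 4:
        raise ValueError(f"Expected 4 coordinates, got {len(bbox_coords)}")
    x1, y1, x2, y2 = (float(v) for v in bbox_coords)
    # Ensure proper ordering: (x1, y1) top-left, (x2, y2) bottom-right
    x1, x2 = min(x1, x2), max(x1, x2)
    y1, y2 = min(y1, y2), max(y1, y2)
    # Assumption: out-of-range values are rejected rather than clamped
    if x1 < 0 or y1 < 0 or x2 > SCALE or y2 > SCALE:
        raise ValueError(f"Coordinates outside 0-{SCALE:g} range: {list(bbox_coords)}")
    # Normalize to 0-1 and turn corners into position + dimensions
    return {
        "boundingBox": {
            "top": y1 / SCALE,
            "left": x1 / SCALE,
            "width": (x2 - x1) / SCALE,
            "height": (y2 - y1) / SCALE,
        },
        "page": int(page_num),
    }
```

Calling `convert_bbox_to_geometry([100, 200, 300, 250], 1)` returns a `boundingBox` with `top` 0.2, `left` 0.1, `width` 0.2 and `height` 0.05 on page 1. Passing the reversed box `[300, 250, 100, 200]` gives the same result.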

#### `_extract_geometry_from_assessment(assessment_data)`
Processes the LLM response to extract and convert bounding box data:
- Validates bbox and page data completeness
- Handles error cases gracefully
- Removes raw bbox data from final output

### Error Handling

The implementation includes comprehensive error handling:

1. **Invalid Coordinates**: Logs a warning and removes invalid data
2. **Missing Page Numbers**: Removes incomplete bounding box data
3. **Malformed Responses**: Continues with confidence assessment only
4. **Coordinate Validation**: Automatically corrects reversed coordinates
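
The extraction step for `_extract_geometry_from_assessment()` and the four error-handling rules can be sketched as a single pass over the parsed LLM response. This hypothetical sketch is not the service's implementation, and it makes three assumptions:

- The response is a flat mapping of field names to assessment dictionaries. Nested groups and list attributes are not handled.
- Both `bbox` and `page` are removed once consumed.
- Its own conversion helper is inlined so the block runs on its own.

```python
import logging
from typing import Any, Dict

logger = logging.getLogger(__name__)


def _to_geometry(bbox: Any, page: Any) -> Dict[str, Any]:
    """Convert one raw bbox/page pair; raises TypeError/ValueError if invalid."""
    x1, y1, x2, y2 = (float(v) for v in bbox)  # wrong length raises ValueError
    x1, x2 = sorted((x1, x2))  # correct reversed coordinates
    y1, y2 = sorted((y1, y2))
    if x1 < 0 or y1 < 0 or x2 > 1000 or y2 > 1000:
        raise ValueError(f"coordinates outside 0-1000: {bbox}")
    page_num = int(page)
    if page_num < 1:
        raise ValueError(f"page numbers start from 1, got {page}")
    return {
        "boundingBox": {
            "top": y1 / 1000,
            "left": x1 / 1000,
            "width": (x2 - x1) / 1000,
            "height": (y2 - y1) / 1000,
        },
        "page": page_num,
    }


def extract_geometry(assessment: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a flat assessment with bbox/page replaced by geometry."""
    result: Dict[str, Any] = {}
    for field_name, data in assessment.items():
        if not isinstance(data, dict):
            result[field_name] = data  # not a field assessment; pass through
            continue
        data = dict(data)  # shallow copy so the caller's dict is not mutated
        bbox = data.pop("bbox", None)
        page = data.pop("page", None)
        if bbox is not None and page is not None:
            try:
                data["geometry"] = [_to_geometry(bbox, page)]
            except (TypeError, ValueError) as exc:
                # Invalid coordinates: warn and keep the confidence assessment only
                logger.warning("Dropping invalid bbox for %s: %s", field_name, exc)
        elif bbox is not None or page is not None:
            # Missing bbox or page: remove the incomplete data
            logger.warning("Dropping incomplete bbox data for %s", field_name)
        result[field_name] = data
    return result
```

A field whose box cannot be used keeps its `confidence` and `confidence_reason`. A bad coordinate therefore never costs the field its assessment.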

## Usage Examples

### Basic Usage

```python
from idp_common.assessment.service import AssessmentService

# Configuration with bounding boxes enabled
config = {
    "assessment": {
        "model": "us.amazon.nova-pro-v1:0",
        "bounding_boxes": {
            "enabled": True
        },
        "task_prompt": "... enhanced prompt template ..."
    }
}

# Initialize service
assessment_service = AssessmentService(config=config)

# Process a document section.
# `document` (a document that has already been through extraction) and
# `section_id` are assumed to come from earlier steps in your workflow.
document = assessment_service.process_document_section(document, section_id)
```

### Checking Results

```python
from idp_common import s3  # idp_common S3 helpers (import path assumed)

# Check if geometry data was generated.
# `section` is assumed to be the assessed section from the processed document.
extraction_data = s3.get_json_content(section.extraction_result_uri)
explainability_info = extraction_data.get("explainability_info", [])

if explainability_info:
    assessment_result = explainability_info[0]

    for field_name, field_assessment in assessment_result.items():
        # Skip non-dict entries and fields without geometry
        if not isinstance(field_assessment, dict) or "geometry" not in field_assessment:
            continue

        # The geometry array may hold more than one bounding box per field
        for geometry in field_assessment["geometry"]:
            bbox = geometry["boundingBox"]
            page = geometry["page"]

            print(f"{field_name} found on page {page}")
            print(f"Location: top={bbox['top']}, left={bbox['left']}")
            print(f"Size: width={bbox['width']}, height={bbox['height']}")
```

## Integration with UI

The geometry format is fully compatible with the existing pattern-1 UI:

- **Coordinate System**: Normalized 0-1 coordinates
- **Bounding Box Format**: `{top, left, width, height}`
- **Page Support**: Page numbers for multi-page documents
- **Array Structure**: Supports multiple bounding boxes per field

The UI can immediately render bounding box overlays without additional processing.
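
Because the coordinates are normalized, a renderer only needs the page's pixel dimensions to place an overlay. Here is a minimal sketch of that mapping, assuming the page image is drawn at `page_width` x `page_height` pixels. `geometry_to_pixel_rect` is a hypothetical helper. This document does not show the pattern-1 UI itself, so nothing here describes how that UI actually draws its overlays.

```python
from typing import Any, Dict, Tuple


def geometry_to_pixel_rect(
    geometry: Dict[str, Any], page_width: int, page_height: int
) -> Tuple[int, int, int, int]:
    """Map a normalized boundingBox to an (x, y, width, height) pixel rectangle."""
    box = geometry["boundingBox"]
    # Horizontal values scale with the page width, vertical ones with the height
    x = round(box["left"] * page_width)
    y = round(box["top"] * page_height)
    w = round(box["width"] * page_width)
    h = round(box["height"] * page_height)
    return x, y, w, h
```

The same geometry therefore lines up on a thumbnail and on a full-resolution page, since only the multipliers change.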

## Testing

### Unit Tests

Comprehensive unit tests are provided in:
`lib/idp_common_pkg/tests/unit/assessment/test_bounding_box_integration.py`

Test coverage includes:
- Configuration validation
- Coordinate conversion accuracy
- Error handling for invalid data
- Edge cases (reversed coordinates, missing data)
- Integration with existing assessment workflow

### Running Tests

```bash
cd lib/idp_common_pkg
python -m pytest tests/unit/assessment/test_bounding_box_integration.py -v
```

## Performance Considerations

### Impact on Processing Time

- **No Overhead When Disabled**: The standard assessment path runs unchanged
- **LLM Processing**: When enabled, inference time may increase slightly because the model must also generate `bbox` and `page` values for every field
- **Coordinate Conversion**: Negligible computational overhead

### Memory Usage

- **Geometry Data**: Small additional footprint, since each located field gains a `geometry` entry holding four floats and a page number
- **Error Handling**: Invalid or incomplete bounding box data is discarded rather than carried into the output

## Migration and Compatibility

### Backward Compatibility

- **Default Behavior**: Feature is disabled by default
- **Existing Workflows**: No changes required to existing assessment configurations
- **Output Format**: Standard assessment results unchanged when feature is disabled

### Migration Steps

1. **Update Configuration**: Add `bounding_boxes.enabled: true` to assessment config
2. **Enhance Prompts**: Update prompt templates to request bounding box data
3. **Test Integration**: Verify bounding box extraction with sample documents
4. **Monitor Performance**: Validate processing time and accuracy

## Troubleshooting

### Common Issues

#### No Bounding Boxes Generated
- **Check Configuration**: Ensure `bounding_boxes.enabled: true`
- **Verify Prompt**: Confirm prompt requests bbox data
- **Check Logs**: Look for geometry extraction warnings

#### Invalid Coordinates
- **LLM Response**: Verify LLM returns valid `[x1, y1, x2, y2]` format
- **Scale Validation**: Ensure coordinates are in 0-1000 range
- **Page Numbers**: Confirm page numbers start from 1
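
To find out which fields fail these checks, you can audit the raw LLM response before it reaches the service. The sketch below is a hypothetical diagnostic, `find_bbox_problems`, that mirrors the three checks above: format, 0-1000 range, and 1-based pages. It only reports problems and never modifies the data. It assumes the response has already been parsed into the flat `field_name -> {bbox, page, ...}` shape from Expected LLM Response Format. Reversed corners are not flagged, because the service corrects them automatically.

```python
from numbers import Real
from typing import Any, Dict, List


def _is_number(value: Any) -> bool:
    # bool is a subclass of int, so exclude it explicitly
    return isinstance(value, Real) and not isinstance(value, bool)


def find_bbox_problems(response: Dict[str, Any]) -> Dict[str, List[str]]:
    """Map each problematic field name to a list of bbox/page issues."""
    problems: Dict[str, List[str]] = {}
    for field_name, data in response.items():
        if not isinstance(data, dict):
            continue
        issues: List[str] = []
        bbox, page = data.get("bbox"), data.get("page")
        if not (isinstance(bbox, (list, tuple)) and len(bbox) == 4
                and all(_is_number(v) for v in bbox)):
            issues.append(f"bbox is not [x1, y1, x2, y2]: {bbox!r}")
        elif any(v < 0 or v > 1000 for v in bbox):
            issues.append(f"bbox outside 0-1000 range: {bbox}")
        if not isinstance(page, int) or isinstance(page, bool) or page < 1:
            issues.append(f"page must be an integer >= 1, got {page!r}")
        if issues:
            problems[field_name] = issues
    return problems
```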

#### UI Display Issues
- **Coordinate Format**: Verify geometry format matches UI expectations
- **Page Mapping**: Ensure page numbers align with UI page indexing

### Debug Logging

Enable debug logging to trace bounding box processing:

```python
import logging

# basicConfig attaches a console handler to the root logger; it does nothing
# if the root logger already has handlers (as is typical in AWS Lambda)
logging.basicConfig(level=logging.INFO)

# Lower the threshold for the assessment service logger only
logging.getLogger('idp_common.assessment.service').setLevel(logging.DEBUG)
```

## Future Enhancements

Potential future improvements include:

1. **Multiple Bounding Boxes**: Return more than one `geometry` entry for fields that span multiple locations (the output array already allows this)
2. **Confidence-Based Filtering**: Only generate bounding boxes for high-confidence fields
3. **Coordinate Validation**: Enhanced validation against document dimensions
4. **Performance Optimization**: Caching and batch processing improvements

## Conclusion

The bounding box integration provides powerful spatial localization capabilities while maintaining the robustness and reliability of the existing Assessment Service. The feature is designed to be:

- **Optional and Non-Intrusive**
- **UI-Compatible**
- **Error-Resilient**
- **Easy to Configure**

This enhancement enables rich document annotation and visualization capabilities while preserving all existing functionality.