Commit 42ee455

Author: Bob Strahan

Update assessment documentation with automatic bounding box processing feature

1 parent 2469417 commit 42ee455

1 file changed: +255 -22 lines changed

docs/assessment.md
@@ -209,6 +209,101 @@ task_prompt: |
 
 **Important**: Images are only processed when the `{DOCUMENT_IMAGE}` placeholder is explicitly present in the prompt template.
 
+## Automatic Bounding Box Processing
+
+The assessment feature includes automatic spatial localization capabilities that extract bounding box coordinates from LLM responses and convert them to a UI-compatible geometry format. This provides visual field localization consistent with Pattern-1 (BDA) without requiring additional configuration.
+
+### How It Works
+
+#### 1. Spatial Localization in Task Prompts
+
+Include spatial localization guidelines in your assessment task prompts to request bounding box coordinates from the LLM:
+
+```yaml
+assessment:
+  task_prompt: |
+    <spatial-localization-guidelines>
+    For each field, provide bounding box coordinates:
+    - bbox: [x1, y1, x2, y2] coordinates in normalized 0-1000 scale
+    - page: Page number where the field appears (starting from 1)
+
+    Coordinate system:
+    - Use normalized scale 0-1000 for both x and y axes
+    - x1, y1 = top-left corner of bounding box
+    - x2, y2 = bottom-right corner of bounding box
+    - Ensure x2 > x1 and y2 > y1
+    - Make bounding boxes tight around the actual text content
+    </spatial-localization-guidelines>
+
+    Provide confidence assessments with spatial localization in JSON format:
+    {
+      "attribute_name": {
+        "confidence": 0.85,
+        "confidence_reason": "Clear text with high OCR confidence",
+        "bbox": [100, 200, 300, 250],
+        "page": 1
+      }
+    }
+```
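
The prompt guidelines above imply a few checks a consumer could run on the LLM's `bbox` and `page` values before trusting them. The following is a minimal sketch of such a check, not the project's actual validation logic; the function name `is_valid_bbox` and the constant `SCALE` are hypothetical names introduced for illustration.

```python
from typing import Any

# Assumption: 1000 is the upper bound of the normalized scale requested in the prompt.
SCALE = 1000


def _is_number(value: Any) -> bool:
    """True for ints and floats, but not for booleans."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def is_valid_bbox(bbox: Any, page: Any) -> bool:
    """Check bbox/page against the spatial guidelines in the task prompt (hypothetical helper)."""
    # bbox must be a sequence of exactly four numbers: [x1, y1, x2, y2]
    if not isinstance(bbox, (list, tuple)) or len(bbox) != 4:
        return False
    if not all(_is_number(v) for v in bbox):
        return False
    x1, y1, x2, y2 = bbox
    # All coordinates must lie on the normalized 0-1000 scale
    in_range = all(0 <= v <= SCALE for v in bbox)
    # The bottom-right corner must be strictly below and to the right of the top-left corner
    ordered = x2 > x1 and y2 > y1
    # Page numbers start from 1
    page_ok = isinstance(page, int) and not isinstance(page, bool) and page >= 1
    return in_range and ordered and page_ok
```

A response entry that fails this check could be treated the same as an entry with no spatial data at all, which keeps the assessment itself usable.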
+
+#### 2. Automatic Coordinate Conversion
+
+When the LLM provides bounding box data in the assessment response, the system automatically:
+
+1. **Detects spatial data**: Identifies `bbox` and `page` fields in the LLM response
+2. **Converts coordinates**: Transforms from 0-1000 normalized scale to 0-1 decimal format
+3. **Calculates dimensions**: Converts [x1, y1, x2, y2] to {top, left, width, height} format
+4. **Creates geometry objects**: Formats data for Pattern-1/BDA UI compatibility
+5. **Processes recursively**: Handles nested group attributes and list items automatically
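
Steps 1 through 4 above can be sketched for a single attribute as follows. This is an illustrative sketch under stated assumptions, not the code shipped in this commit: `bbox_to_geometry` and `convert_attribute` are hypothetical names, and adding `confidence_threshold` from configuration is deliberately left out.

```python
from typing import Any, Dict, List

SCALE = 1000.0  # the LLM reports coordinates on a 0-1000 scale


def bbox_to_geometry(bbox: List[float], page: int) -> List[Dict[str, Any]]:
    """Convert [x1, y1, x2, y2] on the 0-1000 scale to a one-element geometry list."""
    x1, y1, x2, y2 = bbox
    return [{
        "boundingBox": {
            "top": y1 / SCALE,
            "left": x1 / SCALE,
            "width": (x2 - x1) / SCALE,
            "height": (y2 - y1) / SCALE,
        },
        "page": page,
    }]


def convert_attribute(assessment: Dict[str, Any]) -> Dict[str, Any]:
    """Replace bbox/page with geometry; return an unchanged copy if either is missing."""
    if "bbox" not in assessment or "page" not in assessment:
        return dict(assessment)
    # Keep every other key (confidence, confidence_reason, ...) as-is
    converted = {k: v for k, v in assessment.items() if k not in ("bbox", "page")}
    converted["geometry"] = bbox_to_geometry(assessment["bbox"], assessment["page"])
    return converted
```

Subtracting before dividing keeps the width and height computation in integer arithmetic for integer inputs, so a box such as `[100, 200, 400, 250]` yields the same decimal values that the documentation lists.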
+
+#### 3. Coordinate System Transformation
+
+The conversion process transforms coordinates from the LLM's 0-1000 scale to the UI's 0-1 decimal format:
+
+```python
+# LLM Response Format
+{
+  "StatementDate": {
+    "confidence": 0.95,
+    "bbox": [100, 200, 400, 250],  # [x1, y1, x2, y2] in 0-1000 scale
+    "page": 1
+  }
+}
+
+# Automatically Converted to UI Format
+{
+  "StatementDate": {
+    "confidence": 0.95,
+    "confidence_threshold": 0.85,
+    "geometry": [{
+      "boundingBox": {
+        "top": 0.2,      # y1 / 1000
+        "left": 0.1,     # x1 / 1000
+        "width": 0.3,    # (x2 - x1) / 1000
+        "height": 0.05   # (y2 - y1) / 1000
+      },
+      "page": 1
+    }]
+  }
+}
+```
+
+#### 4. Pattern-1 Compatibility
+
+The geometry format exactly matches Pattern-1 (BDA) specifications:
+- **boundingBox object**: Contains top, left, width, height as decimal values (0-1)
+- **page field**: 1-based page numbering
+- **Array structure**: geometry as array to support multiple regions per field
+- **Recursive processing**: Handles nested attributes like `CompanyAddress.State`
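
The recursive handling of nested attributes and list items described above could look roughly like the sketch below. It is an assumption about how such a walk might be written, not the project's implementation; `convert_tree` and `_to_geometry` are hypothetical names. Nodes without `bbox` and `page` pass through unchanged, which also illustrates the fallback behavior when the LLM returns no spatial data.

```python
from typing import Any, Dict, List


def _to_geometry(bbox: List[float], page: int) -> List[Dict[str, Any]]:
    """Convert a 0-1000 scale [x1, y1, x2, y2] box to the geometry array format."""
    x1, y1, x2, y2 = bbox
    return [{
        "boundingBox": {
            "top": y1 / 1000,
            "left": x1 / 1000,
            "width": (x2 - x1) / 1000,
            "height": (y2 - y1) / 1000,
        },
        "page": page,
    }]


def convert_tree(node: Any) -> Any:
    """Walk an assessment response and convert every bbox/page pair found."""
    if isinstance(node, list):
        # List attributes (for example, transactions): convert each item
        return [convert_tree(item) for item in node]
    if not isinstance(node, dict):
        # Scalars such as confidence values are returned unchanged
        return node
    if "bbox" in node and "page" in node:
        # Leaf assessment with spatial data: swap bbox/page for geometry
        out = {k: v for k, v in node.items() if k not in ("bbox", "page")}
        out["geometry"] = _to_geometry(node["bbox"], node["page"])
        return out
    # Group attribute or leaf without spatial data: recurse into the values
    return {key: convert_tree(value) for key, value in node.items()}
```

For a response containing `CompanyAddress.State` with a `bbox`, only the `State` node gains a `geometry` array; sibling attributes without spatial data keep their original shape.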
+
+### Configuration-Free Operation
+
+The bounding box feature requires no additional configuration:
+- **Automatic detection**: System detects when LLM provides spatial data
+- **Fallback handling**: Works normally when no bounding boxes are provided
+- **Backward compatibility**: Existing configurations continue to work unchanged
+- **Optional enhancement**: Bounding boxes enhance existing assessment without breaking changes
+
 ## Output Format
 
 Assessment results are appended to extraction results in the `explainability_info` format expected by the UI. The format varies based on the attribute type defined in your document class configuration.
@@ -229,7 +324,7 @@ attributes:
 description: "The date of the bank statement"
 ```
 
-**Assessment Response:**
+**Assessment Response (without spatial data):**
 ```json
 {
   "StatementDate": {
@@ -239,6 +334,26 @@ attributes:
 }
 ```
 
+**Assessment Response (with automatic spatial data):**
+```json
+{
+  "StatementDate": {
+    "confidence": 0.85,
+    "confidence_reason": "Date clearly visible in statement header",
+    "confidence_threshold": 0.85,
+    "geometry": [{
+      "boundingBox": {
+        "top": 0.2,
+        "left": 0.1,
+        "width": 0.15,
+        "height": 0.03
+      },
+      "page": 1
+    }]
+  }
+}
+```
+
 #### 2. Group Attributes
 
 For nested object structures with multiple related fields:
@@ -256,17 +371,37 @@ attributes:
 description: "The bank routing number"
 ```
 
-**Assessment Response:**
+**Assessment Response (with automatic spatial data):**
 ```json
 {
   "AccountDetails": {
     "AccountNumber": {
       "confidence": 0.90,
-      "confidence_reason": "Account number clearly printed in standard location"
+      "confidence_reason": "Account number clearly printed in standard location",
+      "confidence_threshold": 0.90,
+      "geometry": [{
+        "boundingBox": {
+          "top": 0.15,
+          "left": 0.2,
+          "width": 0.25,
+          "height": 0.04
+        },
+        "page": 1
+      }]
     },
     "RoutingNumber": {
       "confidence": 0.75,
-      "confidence_reason": "Routing number visible but slightly blurred"
+      "confidence_reason": "Routing number visible but slightly blurred",
+      "confidence_threshold": 0.90,
+      "geometry": [{
+        "boundingBox": {
+          "top": 0.2,
+          "left": 0.2,
+          "width": 0.2,
+          "height": 0.03
+        },
+        "page": 1
+      }]
     }
   }
 }
@@ -293,36 +428,96 @@ attributes:
 description: "Transaction amount"
 ```
 
-**Assessment Response:**
+**Assessment Response (with automatic spatial data):**
 ```json
 {
   "Transactions": [
     {
       "Date": {
         "confidence": 0.95,
-        "confidence_reason": "Date clearly printed in standard format"
+        "confidence_reason": "Date clearly printed in standard format",
+        "confidence_threshold": 0.80,
+        "geometry": [{
+          "boundingBox": {
+            "top": 0.3,
+            "left": 0.1,
+            "width": 0.12,
+            "height": 0.025
+          },
+          "page": 1
+        }]
       },
       "Description": {
         "confidence": 0.88,
-        "confidence_reason": "Description text is clear and readable"
+        "confidence_reason": "Description text is clear and readable",
+        "confidence_threshold": 0.75,
+        "geometry": [{
+          "boundingBox": {
+            "top": 0.3,
+            "left": 0.25,
+            "width": 0.35,
+            "height": 0.025
+          },
+          "page": 1
+        }]
       },
       "Amount": {
         "confidence": 0.92,
-        "confidence_reason": "Amount aligned in currency column with clear digits"
+        "confidence_reason": "Amount aligned in currency column with clear digits",
+        "confidence_threshold": 0.85,
+        "geometry": [{
+          "boundingBox": {
+            "top": 0.3,
+            "left": 0.65,
+            "width": 0.15,
+            "height": 0.025
+          },
+          "page": 1
+        }]
       }
     },
     {
       "Date": {
         "confidence": 0.90,
-        "confidence_reason": "Date visible but slightly smudged"
+        "confidence_reason": "Date visible but slightly smudged",
+        "confidence_threshold": 0.80,
+        "geometry": [{
+          "boundingBox": {
+            "top": 0.33,
+            "left": 0.1,
+            "width": 0.12,
+            "height": 0.025
+          },
+          "page": 1
+        }]
       },
       "Description": {
         "confidence": 0.85,
-        "confidence_reason": "Description partially cut off but main text readable"
+        "confidence_reason": "Description partially cut off but main text readable",
+        "confidence_threshold": 0.75,
+        "geometry": [{
+          "boundingBox": {
+            "top": 0.33,
+            "left": 0.25,
+            "width": 0.3,
+            "height": 0.025
+          },
+          "page": 1
+        }]
       },
       "Amount": {
         "confidence": 0.94,
-        "confidence_reason": "Amount clearly printed with proper decimal alignment"
+        "confidence_reason": "Amount clearly printed with proper decimal alignment",
+        "confidence_threshold": 0.85,
+        "geometry": [{
+          "boundingBox": {
+            "top": 0.33,
+            "left": 0.65,
+            "width": 0.15,
+            "height": 0.025
+          },
+          "page": 1
+        }]
       }
     }
   ]
@@ -359,53 +554,89 @@ Here's a complete example showing all three attribute types in a single assessme
   "StatementDate": {
     "confidence": 0.95,
     "confidence_reason": "Statement date clearly printed in header",
-    "confidence_threshold": 0.85
+    "confidence_threshold": 0.85,
+    "geometry": [{
+      "boundingBox": {"top": 0.1, "left": 0.1, "width": 0.15, "height": 0.03},
+      "page": 1
+    }]
   },
   "AccountDetails": {
     "AccountNumber": {
       "confidence": 0.90,
       "confidence_reason": "Account number clearly visible in account section",
-      "confidence_threshold": 0.90
+      "confidence_threshold": 0.90,
+      "geometry": [{
+        "boundingBox": {"top": 0.15, "left": 0.2, "width": 0.25, "height": 0.04},
+        "page": 1
+      }]
     },
     "RoutingNumber": {
       "confidence": 0.85,
-      "confidence_reason": "Routing number printed clearly below account number",
-      "confidence_threshold": 0.90
+      "confidence_reason": "Routing number printed clearly below account number",
+      "confidence_threshold": 0.90,
+      "geometry": [{
+        "boundingBox": {"top": 0.2, "left": 0.2, "width": 0.2, "height": 0.03},
+        "page": 1
+      }]
     }
   },
   "Transactions": [
     {
       "Date": {
         "confidence": 0.95,
         "confidence_reason": "Transaction date clearly printed",
-        "confidence_threshold": 0.80
+        "confidence_threshold": 0.80,
+        "geometry": [{
+          "boundingBox": {"top": 0.3, "left": 0.1, "width": 0.12, "height": 0.025},
+          "page": 1
+        }]
       },
       "Description": {
         "confidence": 0.88,
         "confidence_reason": "Description text is clear and complete",
-        "confidence_threshold": 0.75
+        "confidence_threshold": 0.75,
+        "geometry": [{
+          "boundingBox": {"top": 0.3, "left": 0.25, "width": 0.35, "height": 0.025},
+          "page": 1
+        }]
       },
       "Amount": {
         "confidence": 0.92,
         "confidence_reason": "Amount properly aligned in currency format",
-        "confidence_threshold": 0.85
+        "confidence_threshold": 0.85,
+        "geometry": [{
+          "boundingBox": {"top": 0.3, "left": 0.65, "width": 0.15, "height": 0.025},
+          "page": 1
+        }]
       }
     },
     {
       "Date": {
         "confidence": 0.90,
         "confidence_reason": "Date readable with minor print quality issues",
-        "confidence_threshold": 0.80
+        "confidence_threshold": 0.80,
+        "geometry": [{
+          "boundingBox": {"top": 0.33, "left": 0.1, "width": 0.12, "height": 0.025},
+          "page": 1
+        }]
       },
       "Description": {
         "confidence": 0.85,
         "confidence_reason": "Description clear, standard ATM format",
-        "confidence_threshold": 0.75
+        "confidence_threshold": 0.75,
+        "geometry": [{
+          "boundingBox": {"top": 0.33, "left": 0.25, "width": 0.3, "height": 0.025},
+          "page": 1
+        }]
       },
       "Amount": {
         "confidence": 0.94,
         "confidence_reason": "Negative amount clearly indicated with proper formatting",
-        "confidence_threshold": 0.85
+        "confidence_threshold": 0.85,
+        "geometry": [{
+          "boundingBox": {"top": 0.33, "left": 0.65, "width": 0.15, "height": 0.025},
+          "page": 1
+        }]
       }
     }
   ]
@@ -569,7 +800,9 @@ When neither confidence nor threshold data is available, no confidence indicator
 
 **2. Visual Editor Modal**
 - Same confidence indicators in the document image overlay editor
-- Visual connection between form fields and document bounding boxes
+- **Bounding Box Visualization**: When assessment includes geometry data, bounding boxes are automatically displayed on the document page image
+- Visual connection between form fields and document bounding boxes with spatial localization
+- Interactive overlay showing precise field locations from assessment spatial data
 - Confidence display for deeply nested extraction results
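
To draw a bounding box on a rendered page image, the normalized 0-1 geometry values have to be scaled by the image dimensions. The sketch below shows only that arithmetic and is not the UI's actual code; `geometry_to_pixels` is a hypothetical helper, and the choice to round to whole pixels is an assumption.

```python
from typing import Dict, Tuple


def geometry_to_pixels(
    bounding_box: Dict[str, float], image_width: int, image_height: int
) -> Tuple[int, int, int, int]:
    """Scale a 0-1 boundingBox to (left, top, width, height) in pixels."""
    # Horizontal values scale with the image width, vertical values with the height
    left = round(bounding_box["left"] * image_width)
    top = round(bounding_box["top"] * image_height)
    width = round(bounding_box["width"] * image_width)
    height = round(bounding_box["height"] * image_height)
    return left, top, width, height
```

Because the geometry is stored in normalized form, the same assessment output can be overlaid on page images of any resolution.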
 
 **3. Nested Data Support**
