@@ -209,6 +209,101 @@ task_prompt: |
209209
210210**Important**: Images are only processed when the `{DOCUMENT_IMAGE}` placeholder is explicitly present in the prompt template.
211211
212+ ## Automatic Bounding Box Processing
213+
214+ The assessment feature can automatically extract bounding box coordinates from LLM responses and convert them to the geometry format the UI expects. This provides visual field localization consistent with Pattern-1 (BDA), with no configuration changes beyond requesting spatial data in the task prompt.
215+
216+ ### How It Works
217+
218+ #### 1. Spatial Localization in Task Prompts
219+
220+ Include spatial localization guidelines in your assessment task prompts to request bounding box coordinates from the LLM:
221+
222+ ```yaml
223+ assessment:
224+   task_prompt: |
225+     <spatial-localization-guidelines>
226+     For each field, provide bounding box coordinates:
227+     - bbox: [x1, y1, x2, y2] coordinates in normalized 0-1000 scale
228+     - page: Page number where the field appears (starting from 1)
229+
230+     Coordinate system:
231+     - Use normalized scale 0-1000 for both x and y axes
232+     - x1, y1 = top-left corner of bounding box
233+     - x2, y2 = bottom-right corner of bounding box
234+     - Ensure x2 > x1 and y2 > y1
235+     - Make bounding boxes tight around the actual text content
236+     </spatial-localization-guidelines>
237+
238+     Provide confidence assessments with spatial localization in JSON format:
239+     {
240+       "attribute_name": {
241+         "confidence": 0.85,
242+         "confidence_reason": "Clear text with high OCR confidence",
243+         "bbox": [100, 200, 300, 250],
244+         "page": 1
245+       }
246+     }
247+ ```
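
The LLM will not always follow the coordinate rules in the prompt, so it can help to check its spatial output before relying on it. The sketch below checks one field's `bbox` and `page` against the guidelines above (0-1000 range, `x2 > x1`, `y2 > y1`, 1-based page). `is_valid_spatial_data` is a hypothetical helper for illustration, not a function from this solution.

```python
from typing import Any

SCALE_MAX = 1000  # upper bound of the normalized scale requested in the prompt


def is_valid_spatial_data(field: dict[str, Any]) -> bool:
    """Check one field's bbox/page against the prompt's coordinate rules (hypothetical helper)."""
    bbox = field.get("bbox")
    page = field.get("page")
    # bbox must be exactly four numbers: [x1, y1, x2, y2]
    if not isinstance(bbox, (list, tuple)) or len(bbox) != 4:
        return False
    if not all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in bbox):
        return False
    x1, y1, x2, y2 = bbox
    # Every coordinate must lie on the 0-1000 scale
    if not all(0 <= v <= SCALE_MAX for v in bbox):
        return False
    # (x1, y1) is the top-left corner and (x2, y2) the bottom-right corner
    if not (x2 > x1 and y2 > y1):
        return False
    # Page numbers start at 1
    return isinstance(page, int) and not isinstance(page, bool) and page >= 1


response = {
    "attribute_name": {"confidence": 0.85, "bbox": [100, 200, 300, 250], "page": 1},
    "swapped_corners": {"confidence": 0.60, "bbox": [300, 250, 100, 200], "page": 1},
    "no_spatial_data": {"confidence": 0.90},
}
for name, field in response.items():
    print(name, is_valid_spatial_data(field))
```

A field that fails such a check could simply be left without `geometry`, in the same way as a field for which the LLM returned no spatial data.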
248+
249+ #### 2. Automatic Coordinate Conversion
250+
251+ When the LLM includes bounding box data in the assessment response, the system automatically:
252+
253+ 1. **Detects spatial data**: Identifies `bbox` and `page` fields in the LLM response
254+ 2. **Converts coordinates**: Transforms values from the 0-1000 normalized scale to 0-1 decimals
255+ 3. **Calculates dimensions**: Converts `[x1, y1, x2, y2]` to `{top, left, width, height}` format
256+ 4. **Creates geometry objects**: Formats the data for Pattern-1 (BDA) UI compatibility
257+ 5. **Processes recursively**: Handles nested group attributes and list items
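
The five steps above can be sketched as one recursive walk over the assessment result. This is an illustration under stated assumptions, not the solution's implementation: `add_geometry` is a hypothetical name, an entry is treated as a field when it has a `confidence` key, and `bbox`/`page` are replaced by `geometry` as in the converted example in the next subsection. Adding `confidence_threshold` from the attribute configuration is left out. Entries without both `bbox` and `page` are returned unchanged.

```python
from typing import Any

SCALE = 1000  # LLM bounding boxes use a normalized 0-1000 scale


def _to_geometry(bbox: list[float], page: int) -> list[dict[str, Any]]:
    # [x1, y1, x2, y2] on 0-1000 -> {top, left, width, height} on 0-1
    x1, y1, x2, y2 = bbox
    return [{
        "boundingBox": {
            "top": y1 / SCALE,
            "left": x1 / SCALE,
            "width": (x2 - x1) / SCALE,
            "height": (y2 - y1) / SCALE,
        },
        "page": page,
    }]


def add_geometry(node: Any) -> Any:
    """Return a copy of an assessment result with bbox/page converted to geometry."""
    if isinstance(node, list):
        # List attribute: process every item independently
        return [add_geometry(item) for item in node]
    if not isinstance(node, dict):
        return node
    if "confidence" in node:
        # Field entry (top-level, inside a group, or inside a list item)
        entry = dict(node)  # shallow copy so the input is not mutated
        if "bbox" in entry and "page" in entry:
            entry["geometry"] = _to_geometry(entry.pop("bbox"), entry.pop("page"))
        return entry  # without spatial data the entry passes through unchanged
    # Group attribute: recurse into each sub-field
    return {key: add_geometry(value) for key, value in node.items()}


assessment = {
    "StatementDate": {"confidence": 0.95, "bbox": [100, 200, 400, 250], "page": 1},
    "AccountDetails": {
        "AccountNumber": {"confidence": 0.90, "bbox": [200, 150, 450, 190], "page": 1},
    },
    "Transactions": [
        {"Amount": {"confidence": 0.92}},  # no spatial data provided
    ],
}
converted = add_geometry(assessment)
print(converted["AccountDetails"]["AccountNumber"]["geometry"])
```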
258+
259+ #### 3. Coordinate System Transformation
260+
261+ The conversion process transforms coordinates from the LLM's 0-1000 scale to the UI's 0-1 decimal format:
262+
263+ ```python
264+ # LLM Response Format
265+ {
266+ "StatementDate": {
267+ "confidence": 0.95,
268+ "bbox": [100, 200, 400, 250], # [x1, y1, x2, y2] in 0-1000 scale
269+ "page": 1
270+ }
271+ }
272+
273+ # Automatically Converted to UI Format
274+ {
275+ "StatementDate": {
276+ "confidence": 0.95,
277+ "confidence_threshold": 0.85,
278+ "geometry": [{
279+ "boundingBox": {
280+ "top": 0.2, # y1 / 1000
281+ "left": 0.1, # x1 / 1000
282+ "width": 0.3, # (x2 - x1) / 1000
283+ "height": 0.05 # (y2 - y1) / 1000
284+ },
285+ "page": 1
286+ }]
287+ }
288+ }
289+ ```
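
The arithmetic in this example can be written as a small function. This sketch assumes the conversion is exactly the division by 1000 shown in the comments above; `bbox_to_geometry` is a hypothetical name, not the solution's actual function.

```python
from typing import Any

SCALE = 1000  # LLM coordinates are on a normalized 0-1000 scale


def bbox_to_geometry(bbox: list[float], page: int) -> list[dict[str, Any]]:
    """Convert [x1, y1, x2, y2] (0-1000) to a one-region geometry list (0-1 decimals)."""
    x1, y1, x2, y2 = bbox
    return [{
        "boundingBox": {
            "top": y1 / SCALE,
            "left": x1 / SCALE,
            # Subtract on the 0-1000 scale first, then divide, to avoid float noise
            "width": (x2 - x1) / SCALE,
            "height": (y2 - y1) / SCALE,
        },
        "page": page,  # page numbers stay 1-based
    }]


geometry = bbox_to_geometry([100, 200, 400, 250], page=1)
print(geometry)
# Dividing each corner first leaks rounding error into the width:
print(400 / SCALE - 100 / SCALE)  # 0.30000000000000004
```

Subtracting before dividing keeps `width` and `height` equal to the values shown in the example (`0.3` and `0.05`); dividing each corner first yields `0.30000000000000004` for the width.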
290+
291+ #### 4. Pattern-1 Compatibility
292+
293+ The geometry format matches the Pattern-1 (BDA) specification:
294+ - **`boundingBox` object**: Contains `top`, `left`, `width`, and `height` as decimal values between 0 and 1
295+ - **`page` field**: 1-based page numbering
296+ - **Array structure**: `geometry` is an array, so a single field can have multiple regions
297+ - **Recursive processing**: Nested attributes such as `CompanyAddress.State` are handled as well
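
Because nested attributes keep their structure and `geometry` is always an array, a consumer can enumerate every located region with a dotted field path such as `CompanyAddress.State`. The helper below is hypothetical (`iter_regions` is not part of the solution), the `Transactions[0].Date` notation for list items is an illustrative convention, and the coordinate values in the sample are made up.

```python
from collections.abc import Iterator
from typing import Any


def iter_regions(node: Any, path: str = "") -> Iterator[tuple[str, int, dict[str, float]]]:
    """Yield (field_path, page, boundingBox) for every geometry region in a result."""
    if isinstance(node, list):
        # List attribute: index each item in the path
        for index, item in enumerate(node):
            yield from iter_regions(item, f"{path}[{index}]")
    elif isinstance(node, dict):
        if "geometry" in node:
            # geometry is an array, so one field may yield several regions
            for region in node["geometry"]:
                yield path, region["page"], region["boundingBox"]
        elif "confidence" not in node:
            # Group attribute: descend with a dotted path
            for key, value in node.items():
                yield from iter_regions(value, f"{path}.{key}" if path else key)


result = {
    "CompanyAddress": {
        "State": {
            "confidence": 0.90,
            "geometry": [{"boundingBox": {"top": 0.12, "left": 0.5, "width": 0.05, "height": 0.02}, "page": 1}],
        },
    },
    "Transactions": [
        {
            "Date": {
                "confidence": 0.95,
                "geometry": [{"boundingBox": {"top": 0.3, "left": 0.1, "width": 0.12, "height": 0.025}, "page": 1}],
            },
        },
    ],
}
for field_path, page, box in iter_regions(result):
    print(field_path, page, box)
```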
298+
299+ ### Configuration-Free Operation
300+
301+ The bounding box feature requires no additional configuration settings; it only depends on the task prompt requesting spatial data:
302+ - **Automatic detection**: The system detects when the LLM response includes spatial data
303+ - **Fallback handling**: Assessment works normally when no bounding boxes are provided
304+ - **Backward compatibility**: Existing configurations continue to work unchanged
305+ - **Optional enhancement**: Bounding boxes are added to existing assessment results without breaking changes
306+
212307## Output Format
213308
214309Assessment results are appended to extraction results in the `explainability_info` format expected by the UI. The format varies based on the attribute type defined in your document class configuration.
@@ -229,7 +324,7 @@ attributes:
229324 description: "The date of the bank statement"
230325` ` `
231326
232- **Assessment Response:**
327+ **Assessment Response (without spatial data):**
233328```json
234329{
235330 "StatementDate": {
@@ -239,6 +334,26 @@ attributes:
239334}
240335```
241336
337+ **Assessment Response (with automatic spatial data):**
338+ ```json
339+ {
340+ "StatementDate": {
341+ "confidence": 0.85,
342+ "confidence_reason": "Date clearly visible in statement header",
343+ "confidence_threshold": 0.85,
344+ "geometry": [{
345+ "boundingBox": {
346+ "top": 0.2,
347+ "left": 0.1,
348+ "width": 0.15,
349+ "height": 0.03
350+ },
351+ "page": 1
352+ }]
353+ }
354+ }
355+ ```
356+
242357#### 2. Group Attributes
243358
244359For nested object structures with multiple related fields:
@@ -256,17 +371,37 @@ attributes:
256371 description: "The bank routing number"
257372` ` `
258373
259- **Assessment Response:**
374+ **Assessment Response (with automatic spatial data):**
260375` ` ` json
261376{
262377 "AccountDetails": {
263378 "AccountNumber": {
264379 "confidence": 0.90,
265- "confidence_reason": "Account number clearly printed in standard location"
380+ "confidence_reason": "Account number clearly printed in standard location",
381+ "confidence_threshold": 0.90,
382+ "geometry": [{
383+ "boundingBox": {
384+ "top": 0.15,
385+ "left": 0.2,
386+ "width": 0.25,
387+ "height": 0.04
388+ },
389+ "page": 1
390+ }]
266391 },
267392 "RoutingNumber": {
268393 "confidence": 0.75,
269- "confidence_reason": "Routing number visible but slightly blurred"
394+ "confidence_reason": "Routing number visible but slightly blurred",
395+ "confidence_threshold": 0.90,
396+ "geometry": [{
397+ "boundingBox": {
398+ "top": 0.2,
399+ "left": 0.2,
400+ "width": 0.2,
401+ "height": 0.03
402+ },
403+ "page": 1
404+ }]
270405 }
271406 }
272407}
@@ -293,36 +428,96 @@ attributes:
293428 description: "Transaction amount"
294429` ` `
295430
296- **Assessment Response:**
431+ **Assessment Response (with automatic spatial data):**
297432` ` ` json
298433{
299434 "Transactions": [
300435 {
301436 "Date": {
302437 "confidence": 0.95,
303- "confidence_reason": "Date clearly printed in standard format"
438+ "confidence_reason": "Date clearly printed in standard format",
439+ "confidence_threshold": 0.80,
440+ "geometry": [{
441+ "boundingBox": {
442+ "top": 0.3,
443+ "left": 0.1,
444+ "width": 0.12,
445+ "height": 0.025
446+ },
447+ "page": 1
448+ }]
304449 },
305450 "Description": {
306451 "confidence": 0.88,
307- "confidence_reason": "Description text is clear and readable"
452+ "confidence_reason": "Description text is clear and readable",
453+ "confidence_threshold": 0.75,
454+ "geometry": [{
455+ "boundingBox": {
456+ "top": 0.3,
457+ "left": 0.25,
458+ "width": 0.35,
459+ "height": 0.025
460+ },
461+ "page": 1
462+ }]
308463 },
309464 "Amount": {
310465 "confidence": 0.92,
311- "confidence_reason": "Amount aligned in currency column with clear digits"
466+ "confidence_reason": "Amount aligned in currency column with clear digits",
467+ "confidence_threshold": 0.85,
468+ "geometry": [{
469+ "boundingBox": {
470+ "top": 0.3,
471+ "left": 0.65,
472+ "width": 0.15,
473+ "height": 0.025
474+ },
475+ "page": 1
476+ }]
312477 }
313478 },
314479 {
315480 "Date": {
316481 "confidence": 0.90,
317- "confidence_reason": "Date visible but slightly smudged"
482+ "confidence_reason": "Date visible but slightly smudged",
483+ "confidence_threshold": 0.80,
484+ "geometry": [{
485+ "boundingBox": {
486+ "top": 0.33,
487+ "left": 0.1,
488+ "width": 0.12,
489+ "height": 0.025
490+ },
491+ "page": 1
492+ }]
318493 },
319494 "Description": {
320495 "confidence": 0.85,
321- "confidence_reason": "Description partially cut off but main text readable"
496+ "confidence_reason": "Description partially cut off but main text readable",
497+ "confidence_threshold": 0.75,
498+ "geometry": [{
499+ "boundingBox": {
500+ "top": 0.33,
501+ "left": 0.25,
502+ "width": 0.3,
503+ "height": 0.025
504+ },
505+ "page": 1
506+ }]
322507 },
323508 "Amount": {
324509 "confidence": 0.94,
325- "confidence_reason": "Amount clearly printed with proper decimal alignment"
510+ "confidence_reason": "Amount clearly printed with proper decimal alignment",
511+ "confidence_threshold": 0.85,
512+ "geometry": [{
513+ "boundingBox": {
514+ "top": 0.33,
515+ "left": 0.65,
516+ "width": 0.15,
517+ "height": 0.025
518+ },
519+ "page": 1
520+ }]
326521 }
327522 }
328523 ]
@@ -359,53 +554,89 @@ Here's a complete example showing all three attribute types in a single assessme
359554 "StatementDate": {
360555 "confidence": 0.95,
361556 "confidence_reason": "Statement date clearly printed in header",
362- "confidence_threshold": 0.85
557+ "confidence_threshold": 0.85,
558+ "geometry": [{
559+ "boundingBox": {"top": 0.1, "left": 0.1, "width": 0.15, "height": 0.03},
560+ "page": 1
561+ }]
363562 },
364563 "AccountDetails": {
365564 "AccountNumber": {
366565 "confidence": 0.90,
367566 "confidence_reason": "Account number clearly visible in account section",
368- "confidence_threshold": 0.90
567+ "confidence_threshold": 0.90,
568+ "geometry": [{
569+ "boundingBox": {"top": 0.15, "left": 0.2, "width": 0.25, "height": 0.04},
570+ "page": 1
571+ }]
369572 },
370573 "RoutingNumber": {
371574 "confidence": 0.85,
372- "confidence_reason": "Routing number printed clearly below account number",
373- "confidence_threshold": 0.90
575+ "confidence_reason": "Routing number printed clearly below account number",
576+ "confidence_threshold": 0.90,
577+ "geometry": [{
578+ "boundingBox": {"top": 0.2, "left": 0.2, "width": 0.2, "height": 0.03},
579+ "page": 1
580+ }]
374581 }
375582 },
376583 "Transactions": [
377584 {
378585 "Date": {
379586 "confidence": 0.95,
380587 "confidence_reason": "Transaction date clearly printed",
381- "confidence_threshold": 0.80
588+ "confidence_threshold": 0.80,
589+ "geometry": [{
590+ "boundingBox": {"top": 0.3, "left": 0.1, "width": 0.12, "height": 0.025},
591+ "page": 1
592+ }]
382593 },
383594 "Description": {
384595 "confidence": 0.88,
385596 "confidence_reason": "Description text is clear and complete",
386- "confidence_threshold": 0.75
597+ "confidence_threshold": 0.75,
598+ "geometry": [{
599+ "boundingBox": {"top": 0.3, "left": 0.25, "width": 0.35, "height": 0.025},
600+ "page": 1
601+ }]
387602 },
388603 "Amount": {
389604 "confidence": 0.92,
390605 "confidence_reason": "Amount properly aligned in currency format",
391- "confidence_threshold": 0.85
606+ "confidence_threshold": 0.85,
607+ "geometry": [{
608+ "boundingBox": {"top": 0.3, "left": 0.65, "width": 0.15, "height": 0.025},
609+ "page": 1
610+ }]
392611 }
393612 },
394613 {
395614 "Date": {
396615 "confidence": 0.90,
397616 "confidence_reason": "Date readable with minor print quality issues",
398- "confidence_threshold": 0.80
617+ "confidence_threshold": 0.80,
618+ "geometry": [{
619+ "boundingBox": {"top": 0.33, "left": 0.1, "width": 0.12, "height": 0.025},
620+ "page": 1
621+ }]
399622 },
400623 "Description": {
401624 "confidence": 0.85,
402625 "confidence_reason": "Description clear, standard ATM format",
403- "confidence_threshold": 0.75
626+ "confidence_threshold": 0.75,
627+ "geometry": [{
628+ "boundingBox": {"top": 0.33, "left": 0.25, "width": 0.3, "height": 0.025},
629+ "page": 1
630+ }]
404631 },
405632 "Amount": {
406633 "confidence": 0.94,
407634 "confidence_reason": "Negative amount clearly indicated with proper formatting",
408- "confidence_threshold": 0.85
635+ "confidence_threshold": 0.85,
636+ "geometry": [{
637+ "boundingBox": {"top": 0.33, "left": 0.65, "width": 0.15, "height": 0.025},
638+ "page": 1
639+ }]
409640 }
410641 }
411642 ]
@@ -569,7 +800,9 @@ When neither confidence nor threshold data is available, no confidence indicator
569800
570801**2. Visual Editor Modal**
571802- Same confidence indicators in the document image overlay editor
572- - Visual connection between form fields and document bounding boxes
803+ - **Bounding Box Visualization**: When assessment includes geometry data, bounding boxes are automatically displayed on the document page image
804+ - Visual connection between form fields and their bounding boxes on the document
805+ - Interactive overlay showing each field's location as reported in the assessment geometry
573806- Confidence display for deeply nested extraction results
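
To draw a box on a rendered page image, each 0-1 `boundingBox` value only has to be scaled by the image's pixel size. The sketch below is a hypothetical Python illustration of that arithmetic (`to_pixel_rect` and the 1700 x 2200 pixel page size are assumptions), not the UI's actual rendering code.

```python
from typing import NamedTuple


class PixelRect(NamedTuple):
    x: int       # left edge in pixels
    y: int       # top edge in pixels
    width: int   # box width in pixels
    height: int  # box height in pixels


def to_pixel_rect(bounding_box: dict[str, float], image_width: int, image_height: int) -> PixelRect:
    """Scale a 0-1 boundingBox to pixel coordinates on a page image."""
    return PixelRect(
        x=round(bounding_box["left"] * image_width),
        y=round(bounding_box["top"] * image_height),
        width=round(bounding_box["width"] * image_width),
        height=round(bounding_box["height"] * image_height),
    )


# StatementDate geometry from the coordinate transformation example, on a hypothetical page image
rect = to_pixel_rect({"top": 0.2, "left": 0.1, "width": 0.3, "height": 0.05}, 1700, 2200)
print(rect)
```

Rounding `width` and `height` separately can move the right or bottom edge by one pixel; if exact edges matter, round `left + width` and `top + height` and subtract instead.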
574807
575808**3. Nested Data Support**