**`CHANGELOG.md`** — 17 additions, 0 deletions
```diff
@@ -5,6 +5,23 @@ SPDX-License-Identifier: MIT-0
 
 ## [Unreleased]
 
+## [0.3.1]
+
+### Added
+
+- **{DOCUMENT_IMAGE} Placeholder Support in Pattern-2**
+  - Added new `{DOCUMENT_IMAGE}` placeholder for precise image positioning in classification and extraction prompts
+  - Enables strategic placement of document images within prompt templates for enhanced multimodal understanding
+  - Supports both single images and multi-page documents (up to 20 images per Bedrock constraints)
+  - Full backward compatibility - existing prompts without the placeholder continue to work unchanged
+  - Seamless integration with existing `{FEW_SHOT_EXAMPLES}` functionality
+  - Added warning logging when image limits are exceeded to help with debugging
+  - Enhanced documentation across classification.md, extraction.md, few-shot-examples.md, and pattern-2.md
+
+### Fixed
+
+- When encountering excessive Bedrock throttling with the multimodal page-level classification method, the service returned 'unclassified' instead of retrying.
```
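In behavioral terms, the fix means that on a throttling error the classifier should back off and retry rather than fall back to an 'unclassified' label. Below is a minimal sketch of that pattern — illustrative only: the function name, delay values, and the stand-in `ThrottlingError` class are assumptions, and a real implementation would catch the Bedrock client's `ThrottlingException` instead.

```python
import time

class ThrottlingError(Exception):
    """Stand-in for a Bedrock ThrottlingException (illustrative)."""

def classify_with_retry(invoke, max_retries=5, base_delay=0.01):
    """Retry `invoke` with exponential backoff instead of
    returning an 'unclassified' fallback on throttling."""
    for attempt in range(max_retries):
        try:
            return invoke()
        except ThrottlingError:
            if attempt == max_retries - 1:
                raise  # surface the error rather than mislabeling the page
            time.sleep(base_delay * (2 ** attempt))

# Simulate two throttled calls followed by a success.
calls = {"n": 0}
def fake_invoke():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ThrottlingError()
    return {"class": "letter"}

result = classify_with_retry(fake_invoke)
print(result)  # {'class': 'letter'}
```

The key design point is that throttling is transient, so it should never be encoded into the document's classification result.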
**`config_library/pattern-2/default/config.yaml`** — 27 additions, 1 deletion
```diff
@@ -441,17 +441,25 @@ extraction:
     top_k: '5'
     task_prompt: >-
       <background>
+
       You are an expert in document analysis and information extraction.
       You can understand and extract key information from documents classified as type
+
       {DOCUMENT_CLASS}.
+
       </background>
 
+
       <task>
+
       Your task is to take the unstructured text provided and convert it into a well-organized table format using JSON. Identify the main entities, attributes, or categories mentioned in the attributes list below and use them as keys in the JSON object.
-      Then, extract the relevant information from the text and populate the corresponding values in the JSON object.
+      Then, extract the relevant information from the text and populate the corresponding values in the JSON object.
+
       </task>
 
+
       <extraction-guidelines>
+
       Guidelines:
       1. Ensure that the data is accurately represented and properly formatted within
       the JSON structure
@@ -474,19 +482,36 @@ extraction:
       - When a mark touches multiple options, analyze which option was most likely intended based on position and density. For handwritten checks, the mark typically flows from the selected checkbox outward.
       - Carefully analyze visual cues and contextual hints. Think from a human perspective, anticipate natural tendencies, and apply thoughtful reasoning to make the best possible judgment.
       10. Think step by step first and then answer.
+
       </extraction-guidelines>
 
+
       <attributes>
+
       {ATTRIBUTE_NAMES_AND_DESCRIPTIONS}
+
       </attributes>
 
+
       <<CACHEPOINT>>
 
+
       <document-text>
+
       {DOCUMENT_TEXT}
+
       </document-text>
+
+
+      <document_image>
+
+      {DOCUMENT_IMAGE}
+
+      </document_image>
+
 
       <final-instructions>
+
       Extract key information from the document and return a JSON object with the following key steps:
       1. Carefully analyze the document text to identify the requested attributes
       2. Extract only information explicitly found in the document - never make up data
@@ -495,6 +520,7 @@ extraction:
       5. Use null for any fields not found in the document
       6. Ensure the output is properly formatted JSON with quoted keys and values
       7. Think step by step before finalizing your answer
```
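The prompt places all static content (background, guidelines, attribute definitions) before the `<<CACHEPOINT>>` marker and the per-document content after it. Assuming the marker delimits a prompt-cache boundary — an assumption, since this diff does not show how the marker is consumed — splitting a template at it might look like:

```python
def split_at_cachepoint(task_prompt: str):
    """Split a prompt template at <<CACHEPOINT>> into a static prefix
    (guidelines, attribute definitions) and a dynamic suffix (per-document
    text and images). The cache-boundary semantics are assumed here."""
    marker = "<<CACHEPOINT>>"
    if marker not in task_prompt:
        return task_prompt, ""
    prefix, suffix = task_prompt.split(marker, 1)
    return prefix, suffix

template = (
    "<attributes>{ATTRIBUTE_NAMES_AND_DESCRIPTIONS}</attributes> "
    "<<CACHEPOINT>> "
    "<document-text>{DOCUMENT_TEXT}</document-text>"
)
static_part, dynamic_part = split_at_cachepoint(template)
print("<attributes>" in static_part, "{DOCUMENT_TEXT}" in dynamic_part)
```

Keeping the stable instructions in the prefix means the expensive part of the prompt can be reused across documents, while only the suffix changes per request.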
**`config_library/pattern-2/few_shot_example_with_multimodal_page_classification/config.yaml`** — 84 additions, 44 deletions
```diff
@@ -612,6 +612,21 @@ classes:
       description: >-
         A bank statement document containing account information, transactions,
         and financial details
+      attributes:
+        - name: account_holder_name
+          description: >-
+            The name of the account holder.
+        - name: account_name
+          description: >-
+            The name or type of the bank account.
+        - name: account_number
+          description: >-
+            The unique identifier for the bank account. Look for text following
+            'account number', 'account id', or 'account identifier'.
+        - name: transactions
+          description: >-
+            The list of transactions on the account. Look for text following
+            'transactions', 'transaction history', or 'transaction details'.
       examples:
         - classPrompt: Here are example images for each page of a 3 page 'bank-statement '
           name: BankStatement1
@@ -657,22 +672,32 @@ classification:
       {CLASS_NAMES_AND_DESCRIPTIONS}
 
 
-      Respond only with a JSON object containing the class label. For example:
-      {{"class": "letter"}}
-
-      <few_shot_examples>
+      <few_shot_examples>
 
       {FEW_SHOT_EXAMPLES}
 
       </few_shot_examples>
 
+
       <<CACHEPOINT>>
 
-      <document_ocr_data>
-
-      {DOCUMENT_TEXT}
+      <document_ocr_data>
+
+      {DOCUMENT_TEXT}
 
       </document_ocr_data>
+
+
+      <document_image>
+
+      {DOCUMENT_IMAGE}
+
+      </document_image>
+
+
+      Respond only with a JSON object containing the class label. For example:
+      {{"class": "letter"}}
 
 extraction:
   model: us.amazon.nova-pro-v1:0
   temperature: '0.0'
@@ -685,71 +710,86 @@ extraction:
     task_prompt: >
      <background>
 
-      You are an expert in business document analysis and information extraction.
+      You are an expert in document analysis and information extraction.
+      You can understand and extract key information from documents classified as type
 
-      You can understand and extract key information from business documents.
+      {DOCUMENT_CLASS}.
 
-      <task>
+      </background>
 
-      Your task is to take the unstructured text provided and convert it into a
-      well-organized table format using JSON. Identify the main entities,
-      attributes, or categories mentioned in the attributes list below and use
-      them as keys in the JSON object.
-      Then, extract the relevant information from the text and populate the
-      corresponding values in the JSON object.
+      <task>
+      Your task is to take the unstructured text provided and convert it into a well-organized table format using JSON. Identify the main entities, attributes, or categories mentioned in the attributes list below and use them as keys in the JSON object.
+      Then, extract the relevant information from the text and populate the corresponding values in the JSON object.
+      </task>
 
+      <extraction-guidelines>
       Guidelines:
-      Ensure that the data is accurately represented and properly formatted within
-      the JSON structure
-
-      Include double quotes around all keys and values
-
-      Do not make up data - only extract information explicitly found in the
-      document
-
-      Do not use /n for new lines, use a space instead
-
-      If a field is not found or if unsure, return null
-
-      All dates should be in MM/DD/YYYY format
-
-      Do not perform calculations or summations unless totals are explicitly given
-
-      If an alias is not found in the document, return null
-
-      Here are the attributes you should extract:
+      1. Ensure that the data is accurately represented and properly formatted within
+      the JSON structure
+      2. Include double quotes around all keys and values
+      3. Do not make up data - only extract information explicitly found in the
+      document
+      4. Do not use /n for new lines, use a space instead
+      5. If a field is not found or if unsure, return null
+      6. All dates should be in MM/DD/YYYY format
+      7. Do not perform calculations or summations unless totals are explicitly given
+      8. If an alias is not found in the document, return null
+      9. Guidelines for checkboxes:
+      9.A. CAREFULLY examine each checkbox, radio button, and selection field:
+      - Look for marks like ✓, ✗, x, filled circles (●), darkened areas, or handwritten checks indicating selection
+      - For checkboxes and multi-select fields, ONLY INCLUDE options that show clear visual evidence of selection
+      - DO NOT list options that have no visible selection mark
+      9.B. For ambiguous or overlapping tick marks:
+      - If a mark overlaps between two or more checkboxes, determine which option contains the majority of the mark
+      - Consider a checkbox selected if the mark is primarily inside the check box or over the option text
+      - When a mark touches multiple options, analyze which option was most likely intended based on position and density. For handwritten checks, the mark typically flows from the selected checkbox outward.
+      - Carefully analyze visual cues and contextual hints. Think from a human perspective, anticipate natural tendencies, and apply thoughtful reasoning to make the best possible judgment.
+      10. Think step by step first and then answer.
+
+      </extraction-guidelines>
 
       <attributes>
 
       {ATTRIBUTE_NAMES_AND_DESCRIPTIONS}
 
       </attributes>
 
-      <few_shot_examples>
-
-      {FEW_SHOT_EXAMPLES}
-
-      </few_shot_examples>
-
-      </task>
-
-      </background>
-
-      <<CACHEPOINT>>
-
-      The document tpe is {DOCUMENT_CLASS}. Here is the document content:
-
-      <document_ocr_data>
-
-      {DOCUMENT_TEXT}
-
-      </document_ocr_data>
+      <<CACHEPOINT>>
+
+      <document-text>
+
+      {DOCUMENT_TEXT}
+
+      </document-text>
+
+      <document_image>
+
+      {DOCUMENT_IMAGE}
+
+      </document_image>
+
+      <final-instructions>
+
+      Extract key information from the document and return a JSON object with the following key steps:
+      1. Carefully analyze the document text to identify the requested attributes
+      2. Extract only information explicitly found in the document - never make up data
+      3. Format all dates as MM/DD/YYYY and replace newlines with spaces
+      4. For checkboxes, only include options with clear visual selection marks
+      5. Use null for any fields not found in the document
+      6. Ensure the output is properly formatted JSON with quoted keys and values
+      7. Think step by step before finalizing your answer
```
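Note the doubled braces in the classification prompt's example output, `{{"class": "letter"}}`. Assuming these templates are rendered with Python `str.format`-style substitution (an assumption — the diff does not show the rendering code), doubled braces are how a literal JSON brace survives formatting while single-brace placeholders like `{DOCUMENT_CLASS}` are substituted:

```python
# Doubled braces escape literal { } under str.format-style rendering;
# single-brace names are treated as substitution fields.
template = (
    'Respond only with a JSON object containing the class label. '
    'For example: {{"class": "letter"}} Document type: {DOCUMENT_CLASS}'
)
rendered = template.format(DOCUMENT_CLASS="bank-statement")
print(rendered)
```

If the braces were not doubled, `str.format` would raise a `KeyError` trying to substitute a field named `"class": "letter"`.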
**`docs/classification.md`** — 98 additions, 1 deletion
```diff
@@ -11,7 +11,7 @@ The solution supports multiple classification approaches that vary by pattern:
 
 ### Pattern 1: BDA-Based Classification
 
-- Classification is performed by the BDA (Business Document Analysis) project configuration
+- Classification is performed by the BDA (Bedrock Data Automation) project configuration
 - Uses BDA blueprints to define classification rules
 - Not configurable inside the GenAIIDP solution itself
 - Configuration happens at the BDA project level
```
@@ -197,6 +197,103 @@ You can define custom document classes through the Web UI configuration:

The following new section is added after the existing "Save changes" step:

## Image Placement with {DOCUMENT_IMAGE} Placeholder

Pattern 2 supports precise control over where document images are positioned within your classification prompts using the `{DOCUMENT_IMAGE}` placeholder. This feature allows you to specify exactly where images should appear in your prompt template, rather than having them automatically appended at the end.

### How {DOCUMENT_IMAGE} Works

**Without Placeholder (Default Behavior):**

```yaml
classification:
  task_prompt: |
    Analyze this document:

    {DOCUMENT_TEXT}

    Classify it as one of: {CLASS_NAMES_AND_DESCRIPTIONS}
```

Images are automatically appended after the text content.

**With Placeholder (Controlled Placement):**

```yaml
classification:
  task_prompt: |
    Analyze this document:

    {DOCUMENT_IMAGE}

    Text content: {DOCUMENT_TEXT}

    Classify it as one of: {CLASS_NAMES_AND_DESCRIPTIONS}
```

Images are inserted exactly where `{DOCUMENT_IMAGE}` appears in the prompt.
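At the API level, "inserted exactly where the placeholder appears" plausibly means splitting the prompt text at the marker and interleaving image content blocks between the text blocks. The sketch below assumes Bedrock Converse-style content blocks; the function name and block shapes are illustrative, not the solution's actual code.

```python
def build_content_blocks(task_prompt: str, images: list) -> list:
    """Interleave image blocks at the {DOCUMENT_IMAGE} marker.
    Without the marker, images are appended after the text
    (the default behavior described above)."""
    placeholder = "{DOCUMENT_IMAGE}"
    image_blocks = [
        {"image": {"format": "jpeg", "source": {"bytes": img}}}
        for img in images
    ]
    if placeholder in task_prompt:
        before, after = task_prompt.split(placeholder, 1)
        return [{"text": before}] + image_blocks + [{"text": after}]
    return [{"text": task_prompt}] + image_blocks

blocks = build_content_blocks(
    "Analyze this document:\n{DOCUMENT_IMAGE}\nText content: ...",
    [b"page-1-jpeg-bytes"],
)
print([next(iter(b)) for b in blocks])  # ['text', 'image', 'text']
```

This is why placement matters: the model receives the image bytes at the exact position in the message where the template mentions them, instead of at the end.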
### Usage Examples

**Image Before Text Analysis:**

```yaml
task_prompt: |
  Look at this document image first:

  {DOCUMENT_IMAGE}

  Now read the extracted text:
  {DOCUMENT_TEXT}

  Based on both the visual layout and text content, classify this document as one of:
  {CLASS_NAMES_AND_DESCRIPTIONS}
```

**Image in the Middle for Context:**

```yaml
task_prompt: |
  You are classifying business documents. Here are the possible types:
  {CLASS_NAMES_AND_DESCRIPTIONS}

  Examine this document image:
  {DOCUMENT_IMAGE}

  Additional text content extracted from the document:
  {DOCUMENT_TEXT}

  Classification:
```

### Integration with Few-Shot Examples

The `{DOCUMENT_IMAGE}` placeholder works seamlessly with few-shot examples:

```yaml
classification:
  task_prompt: |
    Here are examples of each document type:
    {FEW_SHOT_EXAMPLES}

    Now classify this new document:
    {DOCUMENT_IMAGE}

    Text: {DOCUMENT_TEXT}

    Classification: {CLASS_NAMES_AND_DESCRIPTIONS}
```

### Benefits

- **🎯 Contextual Placement**: Position images where they provide maximum context
- **📱 Better Multimodal Understanding**: Help models correlate visual and textual information
- **🔄 Flexible Prompt Design**: Create prompts that flow naturally between different content types
- **🔒 Backward Compatible**: Existing prompts without the placeholder continue to work unchanged

### Multi-Page Documents

For documents with multiple pages, the system automatically handles image limits:

- **Bedrock Limit**: Maximum 20 images per request (automatically enforced)
- **Warning Logging**: System logs warnings when images are truncated due to limits
- **Smart Handling**: Images are processed in page order, with excess images automatically dropped
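The limit-and-warn behavior described above can be sketched as follows. This is illustrative only — the constant name, function name, and log wording are assumptions, not the solution's actual code; only the 20-image Bedrock limit, the page-order truncation, and the warning log are stated in the docs.

```python
import logging

logger = logging.getLogger(__name__)
BEDROCK_MAX_IMAGES = 20  # Bedrock's per-request image limit (hypothetical constant name)

def limit_images(page_images: list) -> list:
    """Keep images in page order; drop any excess and log a warning."""
    if len(page_images) > BEDROCK_MAX_IMAGES:
        logger.warning(
            "Document has %d page images; truncating to the Bedrock limit of %d.",
            len(page_images), BEDROCK_MAX_IMAGES,
        )
        return page_images[:BEDROCK_MAX_IMAGES]
    return page_images

kept = limit_images([f"page-{i}" for i in range(25)])
print(len(kept))  # 20
```

Because truncation is silent from the model's point of view, the warning log is the only signal that later pages were not seen — worth watching for when classifying very long documents.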
## Setting Up Few Shot Examples in Pattern 2

Pattern 2's multimodal page-level classification supports few-shot example prompting, which can significantly improve classification accuracy by providing concrete document examples. This feature is available when you select the 'few_shot_example_with_multimodal_page_classification' configuration.
0 commit comments