Commit 9255e4d

Author: Bob Strahan
Commit message: Merge branch 'develop' v0.3.1
2 parents: 64f8587 + 2634e0f

File tree: 17 files changed, +947 −894 lines changed


CHANGELOG.md

Lines changed: 17 additions & 0 deletions
@@ -5,6 +5,23 @@ SPDX-License-Identifier: MIT-0
 
 ## [Unreleased]
 
+## [0.3.1]
+
+### Added
+
+- **{DOCUMENT_IMAGE} Placeholder Support in Pattern-2**
+  - Added a new `{DOCUMENT_IMAGE}` placeholder for precise image positioning in classification and extraction prompts
+  - Enables strategic placement of document images within prompt templates for enhanced multimodal understanding
+  - Supports both single images and multi-page documents (up to 20 images, per Bedrock constraints)
+  - Fully backward compatible: existing prompts without the placeholder continue to work unchanged
+  - Integrates seamlessly with the existing `{FEW_SHOT_EXAMPLES}` functionality
+  - Added warning logging when image limits are exceeded, to help with debugging
+  - Enhanced documentation across classification.md, extraction.md, few-shot-examples.md, and pattern-2.md
+
+### Fixed
+- With the multimodal page-level classification method, excessive Bedrock throttling caused the service to return 'unclassified' instead of retrying.
+- Minor documentation issues.
+
 ## [0.3.0]
 
 ### Added

VERSION

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-0.3.0
+0.3.1

config_library/pattern-2/default/config.yaml

Lines changed: 27 additions & 1 deletion
@@ -441,17 +441,25 @@ extraction:
 top_k: '5'
 task_prompt: >-
 <background>
+
 You are an expert in document analysis and information extraction.
 You can understand and extract key information from documents classified as type
+
 {DOCUMENT_CLASS}.
+
 </background>
 
+
 <task>
+
 Your task is to take the unstructured text provided and convert it into a well-organized table format using JSON. Identify the main entities, attributes, or categories mentioned in the attributes list below and use them as keys in the JSON object.
-Then, extract the relevant information from the text and populate the corresponding values in the JSON object.
+Then, extract the relevant information from the text and populate the corresponding values in the JSON object.
+
 </task>
 
+
 <extraction-guidelines>
+
 Guidelines:
 1. Ensure that the data is accurately represented and properly formatted within
 the JSON structure
@@ -474,19 +482,36 @@ extraction:
 - When a mark touches multiple options, analyze which option was most likely intended based on position and density. For handwritten checks, the mark typically flows from the selected checkbox outward.
 - Carefully analyze visual cues and contextual hints. Think from a human perspective, anticipate natural tendencies, and apply thoughtful reasoning to make the best possible judgment.
 10. Think step by step first and then answer.
+
 </extraction-guidelines>
 
+
 <attributes>
+
 {ATTRIBUTE_NAMES_AND_DESCRIPTIONS}
+
 </attributes>
 
+
 <<CACHEPOINT>>
 
+
 <document-text>
+
 {DOCUMENT_TEXT}
+
 </document-text>
+
+
+<document_image>
+
+{DOCUMENT_IMAGE}
+
+</document_image>
+
 
 <final-instructions>
+
 Extract key information from the document and return a JSON object with the following key steps:
 1. Carefully analyze the document text to identify the requested attributes
 2. Extract only information explicitly found in the document - never make up data
@@ -495,6 +520,7 @@ extraction:
 5. Use null for any fields not found in the document
 6. Ensure the output is properly formatted JSON with quoted keys and values
 7. Think step by step before finalizing your answer
+
 </final-instructions>
 temperature: '0.0'
 model: us.amazon.nova-pro-v1:0
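The `<<CACHEPOINT>>` marker in the task_prompt above separates the static, cacheable prefix (instructions and attribute definitions) from the per-document suffix. A minimal sketch of how such a template might be rendered into Bedrock Converse-style content blocks — `render_prompt` is a hypothetical helper, not the solution's actual code:

```python
def render_prompt(template: str, **values: str) -> list:
    """Fill {PLACEHOLDER} slots, then split on <<CACHEPOINT>> so the
    static prefix can be marked as a cache boundary for prompt caching."""
    for key, val in values.items():
        template = template.replace("{" + key + "}", val)
    segments = template.split("<<CACHEPOINT>>")
    blocks = []
    for i, seg in enumerate(segments):
        blocks.append({"text": seg})
        if i < len(segments) - 1:
            # Converse-API-style cache point between static and dynamic parts
            blocks.append({"cachePoint": {"type": "default"}})
    return blocks

blocks = render_prompt(
    "<background>Documents of type {DOCUMENT_CLASS}.</background>"
    "<<CACHEPOINT>>"
    "<document-text>{DOCUMENT_TEXT}</document-text>",
    DOCUMENT_CLASS="bank-statement",
    DOCUMENT_TEXT="Account Number: 1234",
)
```

Everything before the cache point is identical across documents of the same configuration, so only the suffix changes per request.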

config_library/pattern-2/few_shot_example_with_multimodal_page_classification/config.yaml

Lines changed: 84 additions & 44 deletions
@@ -612,6 +612,21 @@ classes:
 description: >-
 A bank statement document containing account information, transactions,
 and financial details
+attributes:
+- name: account_holder_name
+  description: >-
+    The name of the account holder.
+- name: account_name
+  description: >-
+    The name or type of the bank account.
+- name: account_number
+  description: >-
+    The unique identifier for the bank account. Look for text following
+    'account number', 'account id', or 'account identifier'.
+- name: transactions
+  description: >-
+    The list of transactions on the account. Look for text following
+    'transactions', 'transaction history', or 'transaction details'.
 examples:
 - classPrompt: Here are example images for each page of a 3 page 'bank-statement '
   name: BankStatement1
@@ -657,22 +672,32 @@ classification:
 {CLASS_NAMES_AND_DESCRIPTIONS}
 
 
-Respond only with a JSON object containing the class label. For example:
-{{"class": "letter"}}
-
-<few_shot_examples>
+<few_shot_examples>
 
 {FEW_SHOT_EXAMPLES}
 
 </few_shot_examples>
 
+
 <<CACHEPOINT>>
 
-<document_ocr_data>
-
-{DOCUMENT_TEXT}
+
+<document_ocr_data>
+
+{DOCUMENT_TEXT}
 
 </document_ocr_data>
+
+
+<document_image>
+
+{DOCUMENT_IMAGE}
+
+</document_image>
+
+
+Respond only with a JSON object containing the class label. For example:
+{{"class": "letter"}}
 extraction:
 model: us.amazon.nova-pro-v1:0
 temperature: '0.0'
@@ -685,71 +710,86 @@ extraction:
 task_prompt: >
 <background>
 
-You are an expert in business document analysis and information extraction.
+You are an expert in document analysis and information extraction.
+You can understand and extract key information from documents classified as type
 
-You can understand and extract key information from business documents.
+{DOCUMENT_CLASS}.
 
-<task>
+</background>
 
-Your task is to take the unstructured text provided and convert it into a
-well-organized table format using JSON. Identify the main entities,
-attributes, or categories mentioned in the attributes list below and use
-them as keys in the JSON object.
-Then, extract the relevant information from the text and populate the
-corresponding values in the JSON object.
+<task>
+
+Your task is to take the unstructured text provided and convert it into a well-organized table format using JSON. Identify the main entities, attributes, or categories mentioned in the attributes list below and use them as keys in the JSON object.
+Then, extract the relevant information from the text and populate the corresponding values in the JSON object.
+
+</task>
+
+<extraction-guidelines>
 
 Guidelines:
+1. Ensure that the data is accurately represented and properly formatted within
+the JSON structure
+2. Include double quotes around all keys and values
+3. Do not make up data - only extract information explicitly found in the
+document
+4. Do not use /n for new lines, use a space instead
+5. If a field is not found or if unsure, return null
+6. All dates should be in MM/DD/YYYY format
+7. Do not perform calculations or summations unless totals are explicitly given
+8. If an alias is not found in the document, return null
+9. Guidelines for checkboxes:
+9.A. CAREFULLY examine each checkbox, radio button, and selection field:
+- Look for marks like ✓, ✗, x, filled circles (●), darkened areas, or handwritten checks indicating selection
+- For checkboxes and multi-select fields, ONLY INCLUDE options that show clear visual evidence of selection
+- DO NOT list options that have no visible selection mark
+9.B. For ambiguous or overlapping tick marks:
+- If a mark overlaps between two or more checkboxes, determine which option contains the majority of the mark
+- Consider a checkbox selected if the mark is primarily inside the check box or over the option text
+- When a mark touches multiple options, analyze which option was most likely intended based on position and density. For handwritten checks, the mark typically flows from the selected checkbox outward.
+- Carefully analyze visual cues and contextual hints. Think from a human perspective, anticipate natural tendencies, and apply thoughtful reasoning to make the best possible judgment.
+10. Think step by step first and then answer.
+
+</extraction-guidelines>
 
-Ensure that the data is accurately represented and properly formatted within
-the JSON structure
-
-Include double quotes around all keys and values
-
-Do not make up data - only extract information explicitly found in the
-document
-
-Do not use /n for new lines, use a space instead
-
-If a field is not found or if unsure, return null
-
-All dates should be in MM/DD/YYYY format
-
-Do not perform calculations or summations unless totals are explicitly given
-
-If an alias is not found in the document, return null
-
-Here are the attributes you should extract:
 
 <attributes>
 
 {ATTRIBUTE_NAMES_AND_DESCRIPTIONS}
 
 </attributes>
 
-<few_shot_examples>
-
-{FEW_SHOT_EXAMPLES}
-
-</few_shot_examples>
-
-</task>
-
-</background>
-
-<<CACHEPOINT>>
-
-The document tpe is {DOCUMENT_CLASS}. Here is the document content:
-
-<document_ocr_data>
-
-{DOCUMENT_TEXT}
-
-</document_ocr_data>
+<<CACHEPOINT>>
+
+<document-text>
+
+{DOCUMENT_TEXT}
+
+</document-text>
+
+<document_image>
+
+{DOCUMENT_IMAGE}
+
+</document_image>
+
+
+<final-instructions>
+
+Extract key information from the document and return a JSON object with the following key steps:
+1. Carefully analyze the document text to identify the requested attributes
+2. Extract only information explicitly found in the document - never make up data
+3. Format all dates as MM/DD/YYYY and replace newlines with spaces
+4. For checkboxes, only include options with clear visual selection marks
+5. Use null for any fields not found in the document
+6. Ensure the output is properly formatted JSON with quoted keys and values
+7. Think step by step before finalizing your answer
+
+</final-instructions>
 pricing:
 - name: textract/detect_document_text
   units:
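The classification prompt above interleaves per-class `examples` entries (a `classPrompt` plus example page images) into the `{FEW_SHOT_EXAMPLES}` slot. A rough sketch of how that assembly might look — the function name and the `imagePages` key are hypothetical illustrations, not the solution's actual schema:

```python
def build_few_shot_blocks(classes: list) -> list:
    """Interleave each example's classPrompt text with its page images,
    producing content blocks to substitute for {FEW_SHOT_EXAMPLES}."""
    blocks = []
    for cls in classes:
        for example in cls.get("examples", []):
            blocks.append({"text": example["classPrompt"]})
            # hypothetical key holding the example's page image bytes
            for image_bytes in example.get("imagePages", []):
                blocks.append(
                    {"image": {"format": "jpeg", "source": {"bytes": image_bytes}}}
                )
    return blocks

classes = [{
    "name": "bank-statement",
    "examples": [{
        "classPrompt": "Here are example images for each page of a 3 page 'bank-statement'",
        "name": "BankStatement1",
        "imagePages": [b"page1", b"page2", b"page3"],
    }],
}]
blocks = build_few_shot_blocks(classes)
```

Because the few-shot block sits before `<<CACHEPOINT>>` in the template, these example images land in the cacheable prefix and are reused across classification requests.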

docs/classification.md

Lines changed: 98 additions & 1 deletion
@@ -11,7 +11,7 @@ The solution supports multiple classification approaches that vary by pattern:
 
 ### Pattern 1: BDA-Based Classification
 
-- Classification is performed by the BDA (Business Document Analysis) project configuration
+- Classification is performed by the BDA (Bedrock Data Automation) project configuration
 - Uses BDA blueprints to define classification rules
 - Not configurable inside the GenAIIDP solution itself
 - Configuration happens at the BDA project level
@@ -197,6 +197,103 @@ You can define custom document classes through the Web UI configuration:
 - Detailed description (to guide the classification model)
 5. Save changes
 
+## Image Placement with {DOCUMENT_IMAGE} Placeholder
+
+Pattern 2 supports precise control over where document images are positioned within your classification prompts using the `{DOCUMENT_IMAGE}` placeholder. This feature allows you to specify exactly where images should appear in your prompt template, rather than having them automatically appended at the end.
+
+### How {DOCUMENT_IMAGE} Works
+
+**Without Placeholder (Default Behavior):**
+```yaml
+classification:
+  task_prompt: |
+    Analyze this document:
+
+    {DOCUMENT_TEXT}
+
+    Classify it as one of: {CLASS_NAMES_AND_DESCRIPTIONS}
+```
+Images are automatically appended after the text content.
+
+**With Placeholder (Controlled Placement):**
+```yaml
+classification:
+  task_prompt: |
+    Analyze this document:
+
+    {DOCUMENT_IMAGE}
+
+    Text content: {DOCUMENT_TEXT}
+
+    Classify it as one of: {CLASS_NAMES_AND_DESCRIPTIONS}
+```
+Images are inserted exactly where `{DOCUMENT_IMAGE}` appears in the prompt.
+
+### Usage Examples
+
+**Image Before Text Analysis:**
+```yaml
+task_prompt: |
+  Look at this document image first:
+
+  {DOCUMENT_IMAGE}
+
+  Now read the extracted text:
+  {DOCUMENT_TEXT}
+
+  Based on both the visual layout and text content, classify this document as one of:
+  {CLASS_NAMES_AND_DESCRIPTIONS}
+```
+
+**Image in the Middle for Context:**
+```yaml
+task_prompt: |
+  You are classifying business documents. Here are the possible types:
+  {CLASS_NAMES_AND_DESCRIPTIONS}
+
+  Examine this document image:
+  {DOCUMENT_IMAGE}
+
+  Additional text content extracted from the document:
+  {DOCUMENT_TEXT}
+
+  Classification:
+```
+
+### Integration with Few-Shot Examples
+
+The `{DOCUMENT_IMAGE}` placeholder works seamlessly with few-shot examples:
+
+```yaml
+classification:
+  task_prompt: |
+    Here are examples of each document type:
+    {FEW_SHOT_EXAMPLES}
+
+    Now classify this new document:
+    {DOCUMENT_IMAGE}
+
+    Text: {DOCUMENT_TEXT}
+
+    Classification: {CLASS_NAMES_AND_DESCRIPTIONS}
+```
+
+### Benefits
+
+- **🎯 Contextual Placement**: Position images where they provide maximum context
+- **📱 Better Multimodal Understanding**: Help models correlate visual and textual information
+- **🔄 Flexible Prompt Design**: Create prompts that flow naturally between different content types
+- **⚡ Improved Performance**: Strategic image placement can improve classification accuracy
+- **🔒 Backward Compatible**: Existing prompts without the placeholder continue to work unchanged
+
+### Multi-Page Documents
+
+For documents with multiple pages, the system automatically handles image limits:
+
+- **Bedrock Limit**: Maximum 20 images per request (automatically enforced)
+- **Warning Logging**: System logs warnings when images are truncated due to limits
+- **Smart Handling**: Images are processed in page order, with excess images automatically dropped
+
 ## Setting Up Few Shot Examples in Pattern 2
 
 Pattern 2's multimodal page-level classification supports few-shot example prompting, which can significantly improve classification accuracy by providing concrete document examples. This feature is available when you select the 'few_shot_example_with_multimodal_page_classification' configuration.
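The placeholder behavior documented above — insert images at `{DOCUMENT_IMAGE}` if present, otherwise append them, and enforce the 20-image cap with a warning — can be sketched as follows. This is a minimal illustration with hypothetical names (`build_content`, `MAX_BEDROCK_IMAGES`), not the solution's actual implementation:

```python
import logging

logging.basicConfig(level=logging.WARNING)
logger = logging.getLogger(__name__)

# Cap mirroring the documented Bedrock limit of 20 images per request.
MAX_BEDROCK_IMAGES = 20

def build_content(task_prompt: str, images: list) -> list:
    """Split the prompt at {DOCUMENT_IMAGE} and insert image blocks there.

    If the placeholder is absent, images are appended after the text,
    matching the documented default behavior.
    """
    if len(images) > MAX_BEDROCK_IMAGES:
        # Warn so truncation is visible in logs, then keep pages in order
        # and drop the excess.
        logger.warning(
            "Truncating %d images to the %d-image Bedrock limit",
            len(images), MAX_BEDROCK_IMAGES,
        )
        images = images[:MAX_BEDROCK_IMAGES]

    image_blocks = [
        {"image": {"format": "jpeg", "source": {"bytes": b}}} for b in images
    ]

    if "{DOCUMENT_IMAGE}" in task_prompt:
        before, after = task_prompt.split("{DOCUMENT_IMAGE}", 1)
        return [{"text": before}] + image_blocks + [{"text": after}]
    return [{"text": task_prompt}] + image_blocks

# Example: placeholder-controlled placement with a two-page document
prompt = "Analyze this document:\n\n{DOCUMENT_IMAGE}\n\nText content: (OCR text here)"
content = build_content(prompt, [b"img1", b"img2"])
```

With the placeholder present, the two image blocks land between the two text blocks; without it, they would simply follow the full prompt text.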
