Merge branch 'feature/document-image-placeholder' into 'develop'
Add {DOCUMENT_IMAGE} placeholder to classification prompt for multi-modal page...
See merge request genaiic-reusable-assets/engagement-artifacts/genaiic-idp-accelerator!149
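In the updated prompts, `{DOCUMENT_IMAGE}` marks where page images should be spliced into the model input; when the placeholder is absent, images are simply appended after the prompt text (see the docs/classification.md changes below). A minimal sketch of that splicing idea, assuming a Bedrock Converse-style content list — the helper and field names here are illustrative, not the accelerator's actual code:

```python
from typing import Any, Dict, List

DOCUMENT_IMAGE_PLACEHOLDER = "{DOCUMENT_IMAGE}"

def build_content_blocks(task_prompt: str, page_images: List[bytes]) -> List[Dict[str, Any]]:
    """Interleave text and image blocks according to the {DOCUMENT_IMAGE} placeholder."""
    image_blocks = [
        {"image": {"format": "jpeg", "source": {"bytes": img}}} for img in page_images
    ]
    if DOCUMENT_IMAGE_PLACEHOLDER in task_prompt:
        # Controlled placement: images go exactly where the placeholder appears.
        before, after = task_prompt.split(DOCUMENT_IMAGE_PLACEHOLDER, 1)
        blocks = [{"text": before}] + image_blocks + [{"text": after}]
        return [b for b in blocks if b.get("text") != ""]
    # Default behavior: images are appended after the full prompt text.
    return [{"text": task_prompt}] + image_blocks
```

With a prompt such as "Examine this document image: {DOCUMENT_IMAGE} ... Text: {DOCUMENT_TEXT}", the image blocks land between the two text segments instead of trailing the whole prompt.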
File: config_library/pattern-2/few_shot_example_with_multimodal_page_classification/config.yaml (+55 −64)
````diff
@@ -653,26 +653,24 @@ classification:
   task_prompt: >-
     Classify this document into exactly one of these categories:
 
-
     {CLASS_NAMES_AND_DESCRIPTIONS}
 
-
-    Respond only with a JSON object containing the class label. For example:
-    {{"class": "letter"}}
-
     <few_shot_examples>
-
     {FEW_SHOT_EXAMPLES}
-
     </few_shot_examples>
 
     <<CACHEPOINT>>
 
     <document_ocr_data>
-
     {DOCUMENT_TEXT}
-
     </document_ocr_data>
+
+    <document_image>
+    {DOCUMENT_IMAGE}
+    </document_image>
+
+    Respond only with a JSON object containing the class label. For example:
+    {{"class": "letter"}}
 extraction:
   model: us.amazon.nova-pro-v1:0
   temperature: '0.0'
````
````diff
@@ -684,72 +682,65 @@ extraction:
     only provide data found in the document being provided.
   task_prompt: >
     <background>
-
-    You are an expert in business document analysis and information extraction.
-
-    You can understand and extract key information from business documents.
+    You are an expert in document analysis and information extraction.
+    You can understand and extract key information from documents classified as type
+    {DOCUMENT_CLASS}.
+    </background>
 
     <task>
+    Your task is to take the unstructured text provided and convert it into a well-organized table format using JSON. Identify the main entities, attributes, or categories mentioned in the attributes list below and use them as keys in the JSON object.
+    Then, extract the relevant information from the text and populate the corresponding values in the JSON object.
+    </task>
 
-    Your task is to take the unstructured text provided and convert it into a
-
-    well-organized table format using JSON. Identify the main entities,
-
-    attributes, or categories mentioned in the attributes list below and use
-
-    them as keys in the JSON object.
-
-    Then, extract the relevant information from the text and populate the
-
-    corresponding values in the JSON object.
-
+    <extraction-guidelines>
     Guidelines:
-
-    Ensure that the data is accurately represented and properly formatted within
-    the JSON structure
-
-    Include double quotes around all keys and values
-
-    Do not make up data - only extract information explicitly found in the
-    document
-
-    Do not use /n for new lines, use a space instead
-
-    If a field is not found or if unsure, return null
-
-    All dates should be in MM/DD/YYYY format
-
-    Do not perform calculations or summations unless totals are explicitly given
-
-    If an alias is not found in the document, return null
-
-    Here are the attributes you should extract:
+    1. Ensure that the data is accurately represented and properly formatted within
+    the JSON structure
+    2. Include double quotes around all keys and values
+    3. Do not make up data - only extract information explicitly found in the
+    document
+    4. Do not use /n for new lines, use a space instead
+    5. If a field is not found or if unsure, return null
+    6. All dates should be in MM/DD/YYYY format
+    7. Do not perform calculations or summations unless totals are explicitly given
+    8. If an alias is not found in the document, return null
+    9. Guidelines for checkboxes:
+    9.A. CAREFULLY examine each checkbox, radio button, and selection field:
+    - Look for marks like ✓, ✗, x, filled circles (●), darkened areas, or handwritten checks indicating selection
+    - For checkboxes and multi-select fields, ONLY INCLUDE options that show clear visual evidence of selection
+    - DO NOT list options that have no visible selection mark
+    9.B. For ambiguous or overlapping tick marks:
+    - If a mark overlaps between two or more checkboxes, determine which option contains the majority of the mark
+    - Consider a checkbox selected if the mark is primarily inside the check box or over the option text
+    - When a mark touches multiple options, analyze which option was most likely intended based on position and density. For handwritten checks, the mark typically flows from the selected checkbox outward.
+    - Carefully analyze visual cues and contextual hints. Think from a human perspective, anticipate natural tendencies, and apply thoughtful reasoning to make the best possible judgment.
+    10. Think step by step first and then answer.
+    </extraction-guidelines>
 
     <attributes>
-
     {ATTRIBUTE_NAMES_AND_DESCRIPTIONS}
-
     </attributes>
 
-    <few_shot_examples>
-
-    {FEW_SHOT_EXAMPLES}
-
-    </few_shot_examples>
-
-    </task>
-
-    </background>
-
-    <<CACHEPOINT>>
-
-    The document tpe is {DOCUMENT_CLASS}. Here is the document content:
-
-    <document_ocr_data>
+    <<CACHEPOINT>>
 
+    <document-text>
     {DOCUMENT_TEXT}
-
-    </document_ocr_data>
+    </document-text>
+
+    <document_image>
+    {DOCUMENT_IMAGE}
+    </document_image>
+
+    <final-instructions>
+    Extract key information from the document and return a JSON object with the following key steps:
+    1. Carefully analyze the document text to identify the requested attributes
+    2. Extract only information explicitly found in the document - never make up data
+    3. Format all dates as MM/DD/YYYY and replace newlines with spaces
+    4. For checkboxes, only include options with clear visual selection marks
+    5. Use null for any fields not found in the document
+    6. Ensure the output is properly formatted JSON with quoted keys and values
+    7. Think step by step before finalizing your answer
````
File: docs/classification.md (+97 −0)
````diff
@@ -197,6 +197,103 @@ You can define custom document classes through the Web UI configuration:
    - Detailed description (to guide the classification model)
 5. Save changes
 
+## Image Placement with {DOCUMENT_IMAGE} Placeholder
+
+Pattern 2 supports precise control over where document images are positioned within your classification prompts using the `{DOCUMENT_IMAGE}` placeholder. This feature allows you to specify exactly where images should appear in your prompt template, rather than having them automatically appended at the end.
+
+### How {DOCUMENT_IMAGE} Works
+
+**Without Placeholder (Default Behavior):**
+```yaml
+classification:
+  task_prompt: |
+    Analyze this document:
+
+    {DOCUMENT_TEXT}
+
+    Classify it as one of: {CLASS_NAMES_AND_DESCRIPTIONS}
+```
+Images are automatically appended after the text content.
+
+**With Placeholder (Controlled Placement):**
+```yaml
+classification:
+  task_prompt: |
+    Analyze this document:
+
+    {DOCUMENT_IMAGE}
+
+    Text content: {DOCUMENT_TEXT}
+
+    Classify it as one of: {CLASS_NAMES_AND_DESCRIPTIONS}
+```
+Images are inserted exactly where `{DOCUMENT_IMAGE}` appears in the prompt.
+
+### Usage Examples
+
+**Image Before Text Analysis:**
+```yaml
+task_prompt: |
+  Look at this document image first:
+
+  {DOCUMENT_IMAGE}
+
+  Now read the extracted text:
+  {DOCUMENT_TEXT}
+
+  Based on both the visual layout and text content, classify this document as one of:
+  {CLASS_NAMES_AND_DESCRIPTIONS}
+```
+
+**Image in the Middle for Context:**
+```yaml
+task_prompt: |
+  You are classifying business documents. Here are the possible types:
+  {CLASS_NAMES_AND_DESCRIPTIONS}
+
+  Examine this document image:
+  {DOCUMENT_IMAGE}
+
+  Additional text content extracted from the document:
+  {DOCUMENT_TEXT}
+
+  Classification:
+```
+
+### Integration with Few-Shot Examples
+
+The `{DOCUMENT_IMAGE}` placeholder works seamlessly with few-shot examples:
+
+```yaml
+classification:
+  task_prompt: |
+    Here are examples of each document type:
+    {FEW_SHOT_EXAMPLES}
+
+    Now classify this new document:
+    {DOCUMENT_IMAGE}
+
+    Text: {DOCUMENT_TEXT}
+
+    Classification: {CLASS_NAMES_AND_DESCRIPTIONS}
+```
+
+### Benefits
+
+- **🎯 Contextual Placement**: Position images where they provide maximum context
+- **📱 Better Multimodal Understanding**: Help models correlate visual and textual information
+- **🔄 Flexible Prompt Design**: Create prompts that flow naturally between different content types
+- **🔒 Backward Compatible**: Existing prompts without the placeholder continue to work unchanged
+
+### Multi-Page Documents
+
+For documents with multiple pages, the system automatically handles image limits:
+
+- **Bedrock Limit**: Maximum 20 images per request (automatically enforced)
+- **Warning Logging**: System logs warnings when images are truncated due to limits
+- **Smart Handling**: Images are processed in page order, with excess images automatically dropped
+
 ## Setting Up Few Shot Examples in Pattern 2
 
 Pattern 2's multimodal page-level classification supports few-shot example prompting, which can significantly improve classification accuracy by providing concrete document examples. This feature is available when you select the 'few_shot_example_with_multimodal_page_classification' configuration.
````
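A rough sketch of the multi-page image-limit handling described in the "Multi-Page Documents" section above (the 20-image cap, page-order processing, and truncation warning); the constant and function names are illustrative assumptions, not the accelerator's actual implementation:

```python
import logging
from typing import List

logger = logging.getLogger(__name__)

# Bedrock allows at most 20 images per request, per the limit noted above.
MAX_IMAGES_PER_REQUEST = 20

def limit_page_images(page_images: List[bytes]) -> List[bytes]:
    """Keep page images in order and drop any beyond the per-request limit, logging a warning."""
    if len(page_images) > MAX_IMAGES_PER_REQUEST:
        logger.warning(
            "Document has %d page images; truncating to %d to stay within Bedrock's per-request limit.",
            len(page_images),
            MAX_IMAGES_PER_REQUEST,
        )
        return page_images[:MAX_IMAGES_PER_REQUEST]
    return page_images
```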