docs/classification.md: 97 additions & 0 deletions
@@ -197,6 +197,103 @@ You can define custom document classes through the Web UI configuration:
   - Detailed description (to guide the classification model)
5. Save changes

## Image Placement with {DOCUMENT_IMAGE} Placeholder

Pattern 2 supports precise control over where document images are positioned within your classification prompts using the `{DOCUMENT_IMAGE}` placeholder. This feature allows you to specify exactly where images should appear in your prompt template, rather than having them automatically appended at the end.

### How {DOCUMENT_IMAGE} Works

**Without Placeholder (Default Behavior):**
```yaml
classification:
  task_prompt: |
    Analyze this document:

    {DOCUMENT_TEXT}

    Classify it as one of: {CLASS_NAMES_AND_DESCRIPTIONS}
```
Images are automatically appended after the text content.

**With Placeholder (Controlled Placement):**
```yaml
classification:
  task_prompt: |
    Analyze this document:

    {DOCUMENT_IMAGE}

    Text content: {DOCUMENT_TEXT}

    Classify it as one of: {CLASS_NAMES_AND_DESCRIPTIONS}
```
Images are inserted exactly where `{DOCUMENT_IMAGE}` appears in the prompt.
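
The placement rule can be sketched in Python. This is a minimal illustration, not the accelerator's actual implementation: `build_content` is a hypothetical helper, the content-block dictionaries are assumed to follow the general shape of Bedrock Converse messages, and the JPEG format is an assumption.

```python
from typing import Any

IMAGE_PLACEHOLDER = "{DOCUMENT_IMAGE}"


def build_content(prompt: str, images: list[bytes]) -> list[dict[str, Any]]:
    """Assemble content blocks from a prompt and page images.

    Hypothetical helper for illustration only.
    """
    # Assumption: images are JPEG bytes; the real service may use another format.
    image_blocks = [
        {"image": {"format": "jpeg", "source": {"bytes": img}}} for img in images
    ]

    if IMAGE_PLACEHOLDER not in prompt:
        # Default behavior: text first, images appended at the end.
        return [{"text": prompt}] + image_blocks

    # Controlled placement: split once at the placeholder and put the
    # images between the two halves of the prompt.
    before, after = prompt.split(IMAGE_PLACEHOLDER, 1)
    content: list[dict[str, Any]] = []
    if before.strip():
        content.append({"text": before})
    content.extend(image_blocks)
    if after.strip():
        content.append({"text": after})
    return content
```

Splitting only at the first occurrence keeps the sketch simple; how the real service treats a template with several `{DOCUMENT_IMAGE}` tokens is not specified here.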

### Usage Examples

**Image Before Text Analysis:**
```yaml
task_prompt: |
  Look at this document image first:

  {DOCUMENT_IMAGE}

  Now read the extracted text:
  {DOCUMENT_TEXT}

  Based on both the visual layout and text content, classify this document as one of:
  {CLASS_NAMES_AND_DESCRIPTIONS}
```

**Image in the Middle for Context:**
```yaml
task_prompt: |
  You are classifying business documents. Here are the possible types:
  {CLASS_NAMES_AND_DESCRIPTIONS}

  Examine this document image:
  {DOCUMENT_IMAGE}

  Additional text content extracted from the document:
  {DOCUMENT_TEXT}

  Classification:
```

### Integration with Few-Shot Examples

The `{DOCUMENT_IMAGE}` placeholder works seamlessly with few-shot examples:

```yaml
classification:
  task_prompt: |
    Here are examples of each document type:
    {FEW_SHOT_EXAMPLES}

    Now classify this new document:
    {DOCUMENT_IMAGE}

    Text: {DOCUMENT_TEXT}

    Classification: {CLASS_NAMES_AND_DESCRIPTIONS}
```

### Benefits

- **🎯 Contextual Placement**: Position images where they provide maximum context
- **📱 Better Multimodal Understanding**: Help models correlate visual and textual information
- **🔄 Flexible Prompt Design**: Create prompts that flow naturally between different content types
- **🔒 Backward Compatible**: Existing prompts without the placeholder continue to work unchanged

### Multi-Page Documents

For documents with multiple pages, the system automatically handles image limits:

- **Bedrock Limit**: Maximum 20 images per request (automatically enforced)
- **Warning Logging**: System logs warnings when images are truncated due to limits
- **Smart Handling**: Images are processed in page order, with excess images automatically dropped
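
The truncation behavior in this list could look roughly like the sketch below. `limit_images` and the wording of the log message are hypothetical; only the 20-image cap, the page-order processing, and the warning come from the description above.

```python
import logging

logger = logging.getLogger(__name__)

MAX_IMAGES = 20  # per-request image limit stated in the list above


def limit_images(images: list[bytes], max_images: int = MAX_IMAGES) -> list[bytes]:
    """Keep the first `max_images` pages in order, warning if any are dropped.

    Hypothetical helper; `images` is assumed to already be sorted by page.
    """
    if len(images) > max_images:
        logger.warning(
            "Found %d images, truncating to %d; %d dropped",
            len(images),
            max_images,
            len(images) - max_images,
        )
        return images[:max_images]
    return images
```

For a 25-page document, this keeps pages 1 through 20 and logs one warning; the text placeholder is unaffected, so the extracted text of all pages can still be included.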

## Setting Up Few Shot Examples in Pattern 2

Pattern 2's multimodal page-level classification supports few-shot example prompting, which can significantly improve classification accuracy by providing concrete document examples. This feature is available when you select the 'few_shot_example_with_multimodal_page_classification' configuration.
docs/extraction.md: 154 additions & 0 deletions
@@ -72,6 +72,160 @@ extraction:
The extraction service parses the JSON response and makes it available for downstream processing.

## Image Placement with {DOCUMENT_IMAGE} Placeholder

The extraction service supports precise control over where document images are positioned within your extraction prompts using the `{DOCUMENT_IMAGE}` placeholder. This feature allows you to specify exactly where images should appear in your prompt template, enabling better multimodal extraction by strategically positioning visual content relative to text instructions.

### How {DOCUMENT_IMAGE} Works

**Without Placeholder (Default Behavior):**
```yaml
extraction:
  task_prompt: |
    Extract the following fields from this {DOCUMENT_CLASS} document:

    {ATTRIBUTE_NAMES_AND_DESCRIPTIONS}

    Document text:
    {DOCUMENT_TEXT}

    Respond with valid JSON.
```
Images are automatically appended after the text content.

**With Placeholder (Controlled Placement):**
```yaml
extraction:
  task_prompt: |
    Extract the following fields from this {DOCUMENT_CLASS} document:

    {ATTRIBUTE_NAMES_AND_DESCRIPTIONS}

    Examine this document image:
    {DOCUMENT_IMAGE}

    Text content:
    {DOCUMENT_TEXT}

    Respond with valid JSON containing the extracted values.
```
Images are inserted exactly where `{DOCUMENT_IMAGE}` appears in the prompt.
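
One way to picture how the text placeholders in such a template get filled while `{DOCUMENT_IMAGE}` is left for image placement is the sketch below. It assumes simple token substitution; the function name `fill_placeholders` and the sample values are hypothetical, and the real service may work differently. Substitution happens in a single pass, so unknown tokens and literal JSON braces in the prompt stay untouched, which `str.format` would not allow.

```python
import re


def fill_placeholders(template: str, values: dict[str, str]) -> str:
    """Replace {NAME} tokens for the given names in one pass.

    Hypothetical helper. Tokens not listed in `values` (for example
    {DOCUMENT_IMAGE}) and other braces are left exactly as written.
    """
    if not values:
        return template
    names = "|".join(re.escape(name) for name in values)
    pattern = re.compile(r"\{(" + names + r")\}")
    # A function replacement is used so substituted text is not re-scanned.
    return pattern.sub(lambda match: values[match.group(1)], template)


template = (
    "Extract fields from this {DOCUMENT_CLASS} document:\n"
    "{ATTRIBUTE_NAMES_AND_DESCRIPTIONS}\n"
    "{DOCUMENT_IMAGE}\n"
    "{DOCUMENT_TEXT}\n"
    'Respond with JSON such as {"invoice_number": "..."}'
)
prompt = fill_placeholders(
    template,
    {
        "DOCUMENT_CLASS": "invoice",
        "ATTRIBUTE_NAMES_AND_DESCRIPTIONS": "invoice_number: the invoice ID",
        "DOCUMENT_TEXT": "Invoice #123",
    },
)
```

After this step the prompt still contains `{DOCUMENT_IMAGE}`, which marks where the page images belong.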

### Usage Examples

**Visual-First Extraction:**
```yaml
task_prompt: |
  You are extracting data from a {DOCUMENT_CLASS}. Here are the fields to find:
  {ATTRIBUTE_NAMES_AND_DESCRIPTIONS}

  First, examine the document layout and visual structure:
  {DOCUMENT_IMAGE}

  Now analyze the extracted text:
  {DOCUMENT_TEXT}

  Extract the requested fields as JSON:
```

**Image for Context and Verification:**
```yaml
task_prompt: |
  Extract these fields from a {DOCUMENT_CLASS}:
  {ATTRIBUTE_NAMES_AND_DESCRIPTIONS}

  Document text (may contain OCR errors):
  {DOCUMENT_TEXT}

  Use this image to verify and correct any unclear information:
  {DOCUMENT_IMAGE}

  Extracted data (JSON format):
```

**Mixed Content Analysis:**
```yaml
task_prompt: |
  You are processing a {DOCUMENT_CLASS} that may contain both text and visual elements like tables, stamps, or signatures.

  Target fields: {ATTRIBUTE_NAMES_AND_DESCRIPTIONS}

  Document image (shows full layout):
  {DOCUMENT_IMAGE}

  Extracted text (may miss visual-only elements):
  {DOCUMENT_TEXT}

  Extract all available information as JSON:
```

### Integration with Few-Shot Examples

The `{DOCUMENT_IMAGE}` placeholder works seamlessly with few-shot examples:

```yaml
extraction:
  task_prompt: |
    Extract fields from {DOCUMENT_CLASS} documents. Here are examples:

    {FEW_SHOT_EXAMPLES}

    Now process this new document:

    Visual layout:
    {DOCUMENT_IMAGE}

    Text content:
    {DOCUMENT_TEXT}

    Fields to extract: {ATTRIBUTE_NAMES_AND_DESCRIPTIONS}

    JSON response:
```

### Benefits for Extraction

- **🎯 Enhanced Accuracy**: Visual context helps identify field locations and correct OCR errors
- **📊 Table and Form Handling**: Better extraction from structured layouts like tables and forms
- **✍️ Handwritten Content**: Improved handling of signatures, handwritten notes, and annotations
- **🖼️ Visual-Only Elements**: Extract information from stamps, logos, checkboxes, and visual indicators
- **🔍 Verification**: Use images to verify and correct text extraction results
- **📱 Layout Understanding**: Better comprehension of document structure and field relationships

### Multi-Page Document Handling

For documents with multiple pages, the system provides robust image management:

- **Automatic Pagination**: Images are processed in page order
- **Bedrock Compliance**: Maximum 20 images per request (automatically enforced)
- **Smart Truncation**: Excess images are dropped with warning logs
- **Performance Optimization**: Large image sets are efficiently handled

```yaml
# Example configuration for multi-page invoices
extraction:
  task_prompt: |
    Extract data from this multi-page {DOCUMENT_CLASS}:

    {ATTRIBUTE_NAMES_AND_DESCRIPTIONS}

    Document pages (up to 20 images):
    {DOCUMENT_IMAGE}

    Combined text from all pages:
    {DOCUMENT_TEXT}

    Return JSON with extracted fields:
```

### Best Practices for Image Placement

1. **Place Images Before Complex Instructions**: Show the document before giving detailed extraction rules
2. **Use Images for Verification**: Position images after text to help verify and correct extractions
3. **Leverage Visual Context**: Use images when extracting from tables, forms, or structured layouts
4. **Handle OCR Limitations**: Use images to fill gaps where OCR may miss visual-only content
5. **Consider Document Types**: Different document types benefit from different image placement strategies

## Using CachePoint for Extraction

CachePoint is a feature of select Bedrock models that caches partial computations to improve performance and reduce costs. When used with extraction, it provides:
docs/few-shot-examples.md: 45 additions & 1 deletion
@@ -337,8 +337,9 @@ classes:

### Step 4: Update Task Prompts with Cache Points

Ensure your classification and extraction task prompts include the `{FEW_SHOT_EXAMPLES}` placeholder and use `<<CACHEPOINT>>` for optimal performance. You can also use the `{DOCUMENT_IMAGE}` placeholder for precise image positioning:

**Standard Few-Shot Configuration:**
```yaml
classification:
  task_prompt: |
@@ -372,6 +373,49 @@ extraction:
    {DOCUMENT_TEXT}
```

**Enhanced Configuration with Image Placement:**
```yaml
classification:
  task_prompt: |
    Classify this document into exactly one of these categories:

    {CLASS_NAMES_AND_DESCRIPTIONS}

    <few_shot_examples>
    {FEW_SHOT_EXAMPLES}
    </few_shot_examples>

    <<CACHEPOINT>>

    Now examine this new document:
    {DOCUMENT_IMAGE}

    Document text:
    {DOCUMENT_TEXT}

    Classification:

extraction:
  task_prompt: |
    Extract the following attributes from this {DOCUMENT_CLASS} document:

    {ATTRIBUTE_NAMES_AND_DESCRIPTIONS}

    <few_shot_examples>
    {FEW_SHOT_EXAMPLES}
    </few_shot_examples>

    <<CACHEPOINT>>

    Analyze this document:
    {DOCUMENT_IMAGE}

    Text content:
    {DOCUMENT_TEXT}

    Extract as JSON:
```

**Important**: The `<<CACHEPOINT>>` delimiter separates the static portion of your prompt (classes, few-shot examples) from the dynamic portion (document content). This enables Bedrock prompt caching, significantly reducing costs when processing multiple documents with the same configuration.
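
As a rough illustration of how the delimiter divides a prompt, the sketch below splits on `<<CACHEPOINT>>` and emits a cache-point block between the static and dynamic parts. `split_at_cachepoints` is a hypothetical helper, and the `cachePoint` block shape is assumed to match the Bedrock Converse API; the accelerator's real code may differ.

```python
from typing import Any

CACHEPOINT = "<<CACHEPOINT>>"


def split_at_cachepoints(prompt: str) -> list[dict[str, Any]]:
    """Turn a prompt into text blocks separated by cache-point markers.

    Hypothetical helper for illustration only.
    """
    parts = prompt.split(CACHEPOINT)
    content: list[dict[str, Any]] = []
    for index, part in enumerate(parts):
        if part.strip():
            content.append({"text": part})
        if index < len(parts) - 1:
            # Everything before this marker is the static, cacheable prefix.
            content.append({"cachePoint": {"type": "default"}})
    return content
```

Placing `{DOCUMENT_IMAGE}` and `{DOCUMENT_TEXT}` after the marker, as in the configuration above, keeps per-document content out of the cached prefix.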

### Step 5: Configure Path Resolution (If Using Images)