Commit ab341b9

Author: Bob Strahan

add new {DOCUMENT_IMAGE} prompt placeholder to docs

1 parent: 55c505b

4 files changed: 355 additions, 2 deletions

docs/classification.md

Lines changed: 97 additions & 0 deletions
@@ -197,6 +197,103 @@ You can define custom document classes through the Web UI configuration:
- Detailed description (to guide the classification model)
5. Save changes

## Image Placement with {DOCUMENT_IMAGE} Placeholder

Pattern 2 supports precise control over where document images are positioned within your classification prompts using the `{DOCUMENT_IMAGE}` placeholder. This feature allows you to specify exactly where images should appear in your prompt template, rather than having them automatically appended at the end.

### How {DOCUMENT_IMAGE} Works

**Without Placeholder (Default Behavior):**
```yaml
classification:
  task_prompt: |
    Analyze this document:

    {DOCUMENT_TEXT}

    Classify it as one of: {CLASS_NAMES_AND_DESCRIPTIONS}
```
Images are automatically appended after the text content.

**With Placeholder (Controlled Placement):**
```yaml
classification:
  task_prompt: |
    Analyze this document:

    {DOCUMENT_IMAGE}

    Text content: {DOCUMENT_TEXT}

    Classify it as one of: {CLASS_NAMES_AND_DESCRIPTIONS}
```
Images are inserted exactly where `{DOCUMENT_IMAGE}` appears in the prompt.
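
As a rough illustration of the two behaviors described above, the following Python sketch splits a prompt at the placeholder and places the images there, or appends them after the text when the placeholder is absent. The function name `build_content` and the simplified `{"text": ...}` / `{"image": ...}` part dictionaries are assumptions made for this example. They are not the accelerator's actual implementation, nor the exact Bedrock request format.

```python
from typing import Any

IMAGE_PLACEHOLDER = "{DOCUMENT_IMAGE}"


def build_content(prompt: str, images: list[bytes]) -> list[dict[str, Any]]:
    """Return an ordered list of text and image parts (hypothetical shape)."""
    image_parts = [{"image": img} for img in images]

    if IMAGE_PLACEHOLDER not in prompt:
        # Default behavior: images are appended after the text content.
        return [{"text": prompt}] + image_parts

    # Controlled placement: split once at the placeholder and put images there.
    before, after = prompt.split(IMAGE_PLACEHOLDER, 1)
    parts: list[dict[str, Any]] = []
    if before.strip():
        parts.append({"text": before})
    parts.extend(image_parts)
    if after.strip():
        parts.append({"text": after})
    return parts
```

With a prompt such as `"Look:\n{DOCUMENT_IMAGE}\nText: ..."`, this sketch yields a text part, the image parts, and then a second text part, which mirrors the ordering the documentation describes.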

### Usage Examples

**Image Before Text Analysis:**
```yaml
task_prompt: |
  Look at this document image first:

  {DOCUMENT_IMAGE}

  Now read the extracted text:
  {DOCUMENT_TEXT}

  Based on both the visual layout and text content, classify this document as one of:
  {CLASS_NAMES_AND_DESCRIPTIONS}
```

**Image in the Middle for Context:**
```yaml
task_prompt: |
  You are classifying business documents. Here are the possible types:
  {CLASS_NAMES_AND_DESCRIPTIONS}

  Examine this document image:
  {DOCUMENT_IMAGE}

  Additional text content extracted from the document:
  {DOCUMENT_TEXT}

  Classification:
```

### Integration with Few-Shot Examples

The `{DOCUMENT_IMAGE}` placeholder works seamlessly with few-shot examples:

```yaml
classification:
  task_prompt: |
    Here are examples of each document type:
    {FEW_SHOT_EXAMPLES}

    Now classify this new document:
    {DOCUMENT_IMAGE}

    Text: {DOCUMENT_TEXT}

    Classification: {CLASS_NAMES_AND_DESCRIPTIONS}
```

### Benefits

- **🎯 Contextual Placement**: Position images where they provide maximum context
- **📱 Better Multimodal Understanding**: Help models correlate visual and textual information
- **🔄 Flexible Prompt Design**: Create prompts that flow naturally between different content types
- **⚡ Improved Performance**: Strategic image placement can improve classification accuracy
- **🔒 Backward Compatible**: Existing prompts without the placeholder continue to work unchanged

### Multi-Page Documents

For documents with multiple pages, the system automatically handles image limits:

- **Bedrock Limit**: Maximum 20 images per request (automatically enforced)
- **Warning Logging**: System logs warnings when images are truncated due to limits
- **Smart Handling**: Images are processed in page order, with excess images automatically dropped
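
The truncation rules in the list above could look roughly like the following Python sketch. The helper name `limit_images` and the wording of the log message are hypothetical. Only the 20-image limit, the page-order handling, and the warning on truncation come from the documentation.

```python
import logging

logger = logging.getLogger(__name__)

MAX_IMAGES = 20  # per-request image limit stated in the documentation


def limit_images(images: list[bytes], max_images: int = MAX_IMAGES) -> list[bytes]:
    """Keep the first `max_images` pages in order; warn if any are dropped."""
    if len(images) <= max_images:
        return images

    dropped = len(images) - max_images
    logger.warning(
        "Found %d images; keeping the first %d in page order and dropping %d",
        len(images),
        max_images,
        dropped,
    )
    return images[:max_images]
```

For a 25-page document, this sketch keeps pages 1 through 20 and logs a single warning about the five dropped pages.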

## Setting Up Few Shot Examples in Pattern 2

Pattern 2's multimodal page-level classification supports few-shot example prompting, which can significantly improve classification accuracy by providing concrete document examples. This feature is available when you select the 'few_shot_example_with_multimodal_page_classification' configuration.

docs/extraction.md

Lines changed: 154 additions & 0 deletions
@@ -72,6 +72,160 @@ extraction:

The extraction service parses the JSON response and makes it available for downstream processing.

## Image Placement with {DOCUMENT_IMAGE} Placeholder

The extraction service supports precise control over where document images are positioned within your extraction prompts using the `{DOCUMENT_IMAGE}` placeholder. This feature allows you to specify exactly where images should appear in your prompt template, enabling better multimodal extraction by strategically positioning visual content relative to text instructions.

### How {DOCUMENT_IMAGE} Works

**Without Placeholder (Default Behavior):**
```yaml
extraction:
  task_prompt: |
    Extract the following fields from this {DOCUMENT_CLASS} document:

    {ATTRIBUTE_NAMES_AND_DESCRIPTIONS}

    Document text:
    {DOCUMENT_TEXT}

    Respond with valid JSON.
```
Images are automatically appended after the text content.

**With Placeholder (Controlled Placement):**
```yaml
extraction:
  task_prompt: |
    Extract the following fields from this {DOCUMENT_CLASS} document:

    {ATTRIBUTE_NAMES_AND_DESCRIPTIONS}

    Examine this document image:
    {DOCUMENT_IMAGE}

    Text content:
    {DOCUMENT_TEXT}

    Respond with valid JSON containing the extracted values.
```
Images are inserted exactly where `{DOCUMENT_IMAGE}` appears in the prompt.
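
The extraction templates above mix text placeholders such as `{DOCUMENT_CLASS}` and `{DOCUMENT_TEXT}` with the image placeholder. One way to fill the text placeholders while leaving `{DOCUMENT_IMAGE}` in place for a later image step is sketched below in Python. The helper `fill_text_placeholders` is hypothetical and is not taken from the accelerator's code. It uses a single regular-expression pass so that braces in JSON snippets, and placeholder-like strings inside the document text itself, are left untouched.

```python
import re

# Matches tokens such as {DOCUMENT_TEXT}: braces around upper-case letters and underscores.
PLACEHOLDER_PATTERN = re.compile(r"\{([A-Z_]+)\}")


def fill_text_placeholders(template: str, values: dict[str, str]) -> str:
    """Replace known {NAME} tokens in one pass; leave unknown tokens as-is."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        # Unknown names (for example DOCUMENT_IMAGE) are returned unchanged.
        return values.get(name, match.group(0))

    return PLACEHOLDER_PATTERN.sub(substitute, template)
```

Because the substitution happens in a single pass, a document whose OCR text happens to contain `{DOCUMENT_CLASS}` is inserted verbatim rather than being expanded a second time.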

### Usage Examples

**Visual-First Extraction:**
```yaml
task_prompt: |
  You are extracting data from a {DOCUMENT_CLASS}. Here are the fields to find:
  {ATTRIBUTE_NAMES_AND_DESCRIPTIONS}

  First, examine the document layout and visual structure:
  {DOCUMENT_IMAGE}

  Now analyze the extracted text:
  {DOCUMENT_TEXT}

  Extract the requested fields as JSON:
```

**Image for Context and Verification:**
```yaml
task_prompt: |
  Extract these fields from a {DOCUMENT_CLASS}:
  {ATTRIBUTE_NAMES_AND_DESCRIPTIONS}

  Document text (may contain OCR errors):
  {DOCUMENT_TEXT}

  Use this image to verify and correct any unclear information:
  {DOCUMENT_IMAGE}

  Extracted data (JSON format):
```

**Mixed Content Analysis:**
```yaml
task_prompt: |
  You are processing a {DOCUMENT_CLASS} that may contain both text and visual elements like tables, stamps, or signatures.

  Target fields: {ATTRIBUTE_NAMES_AND_DESCRIPTIONS}

  Document image (shows full layout):
  {DOCUMENT_IMAGE}

  Extracted text (may miss visual-only elements):
  {DOCUMENT_TEXT}

  Extract all available information as JSON:
```

### Integration with Few-Shot Examples

The `{DOCUMENT_IMAGE}` placeholder works seamlessly with few-shot examples:

```yaml
extraction:
  task_prompt: |
    Extract fields from {DOCUMENT_CLASS} documents. Here are examples:

    {FEW_SHOT_EXAMPLES}

    Now process this new document:

    Visual layout:
    {DOCUMENT_IMAGE}

    Text content:
    {DOCUMENT_TEXT}

    Fields to extract: {ATTRIBUTE_NAMES_AND_DESCRIPTIONS}

    JSON response:
```

### Benefits for Extraction

- **🎯 Enhanced Accuracy**: Visual context helps identify field locations and correct OCR errors
- **📊 Table and Form Handling**: Better extraction from structured layouts like tables and forms
- **✍️ Handwritten Content**: Improved handling of signatures, handwritten notes, and annotations
- **🖼️ Visual-Only Elements**: Extract information from stamps, logos, checkboxes, and visual indicators
- **🔍 Verification**: Use images to verify and correct text extraction results
- **📱 Layout Understanding**: Better comprehension of document structure and field relationships

### Multi-Page Document Handling

For documents with multiple pages, the system provides robust image management:

- **Automatic Pagination**: Images are processed in page order
- **Bedrock Compliance**: Maximum 20 images per request (automatically enforced)
- **Smart Truncation**: Excess images are dropped with warning logs
- **Performance Optimization**: Large image sets are efficiently handled

```yaml
# Example configuration for multi-page invoices
extraction:
  task_prompt: |
    Extract data from this multi-page {DOCUMENT_CLASS}:

    {ATTRIBUTE_NAMES_AND_DESCRIPTIONS}

    Document pages (up to 20 images):
    {DOCUMENT_IMAGE}

    Combined text from all pages:
    {DOCUMENT_TEXT}

    Return JSON with extracted fields:
```

### Best Practices for Image Placement

1. **Place Images Before Complex Instructions**: Show the document before giving detailed extraction rules
2. **Use Images for Verification**: Position images after text to help verify and correct extractions
3. **Leverage Visual Context**: Use images when extracting from tables, forms, or structured layouts
4. **Handle OCR Limitations**: Use images to fill gaps where OCR may miss visual-only content
5. **Consider Document Types**: Different document types benefit from different image placement strategies

## Using CachePoint for Extraction

CachePoint is a feature of select Bedrock models that caches partial computations to improve performance and reduce costs. When used with extraction, it provides:

docs/few-shot-examples.md

Lines changed: 45 additions & 1 deletion
@@ -337,8 +337,9 @@ classes:

### Step 4: Update Task Prompts with Cache Points

-Ensure your classification and extraction task prompts include the `{FEW_SHOT_EXAMPLES}` placeholder and use `<<CACHEPOINT>>` for optimal performance:
+Ensure your classification and extraction task prompts include the `{FEW_SHOT_EXAMPLES}` placeholder and use `<<CACHEPOINT>>` for optimal performance. You can also use the `{DOCUMENT_IMAGE}` placeholder for precise image positioning:

+**Standard Few-Shot Configuration:**
```yaml
classification:
  task_prompt: |
@@ -372,6 +373,49 @@ extraction:
    {DOCUMENT_TEXT}
```

**Enhanced Configuration with Image Placement:**
```yaml
classification:
  task_prompt: |
    Classify this document into exactly one of these categories:

    {CLASS_NAMES_AND_DESCRIPTIONS}

    <few_shot_examples>
    {FEW_SHOT_EXAMPLES}
    </few_shot_examples>

    <<CACHEPOINT>>

    Now examine this new document:
    {DOCUMENT_IMAGE}

    Document text:
    {DOCUMENT_TEXT}

    Classification:

extraction:
  task_prompt: |
    Extract the following attributes from this {DOCUMENT_CLASS} document:

    {ATTRIBUTE_NAMES_AND_DESCRIPTIONS}

    <few_shot_examples>
    {FEW_SHOT_EXAMPLES}
    </few_shot_examples>

    <<CACHEPOINT>>

    Analyze this document:
    {DOCUMENT_IMAGE}

    Text content:
    {DOCUMENT_TEXT}

    Extract as JSON:
```

**Important**: The `<<CACHEPOINT>>` delimiter separates the static portion of your prompt (classes, few-shot examples) from the dynamic portion (document content). This enables Bedrock prompt caching, significantly reducing costs when processing multiple documents with the same configuration.
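
To make the static/dynamic split concrete, here is a minimal Python sketch that divides a prompt at the `<<CACHEPOINT>>` delimiter. The helper name `split_at_cachepoint` is hypothetical, and the `{"cachePoint": {"type": "default"}}` block is modeled on the cache point content block of the Bedrock Converse API. How the accelerator actually builds its requests is an assumption here, not something this commit shows.

```python
CACHEPOINT = "<<CACHEPOINT>>"


def split_at_cachepoint(prompt: str) -> list[dict]:
    """Split a prompt into a cacheable static part and a per-document part."""
    static_part, marker, dynamic_part = prompt.partition(CACHEPOINT)
    if not marker:
        # No delimiter: the whole prompt is sent as a single text block.
        return [{"text": prompt}]

    blocks: list[dict] = []
    if static_part.strip():
        # Static portion (classes, few-shot examples), followed by the cache marker.
        blocks.append({"text": static_part.strip()})
        blocks.append({"cachePoint": {"type": "default"}})
    if dynamic_part.strip():
        # Dynamic portion (document content) changes with every request.
        blocks.append({"text": dynamic_part.strip()})
    return blocks
```

Everything before the marker stays identical across documents, which is what allows a cache to reuse it. Only the text after the marker varies per document.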

### Step 5: Configure Path Resolution (If Using Images)
