add new {DOCUMENT_IMAGE} prompt placeholder to docs

Bob Strahan · Bob Strahan · commit ab341b9ce518 · 2025-06-02T21:28:49.000Z
diff --git a/docs/classification.md b/docs/classification.md
@@ -197,6 +197,103 @@ You can define custom document classes through the Web UI configuration:
    - Detailed description (to guide the classification model)
 5. Save changes
 
+## Image Placement with {DOCUMENT_IMAGE} Placeholder
+
+Pattern 2 lets you control where document images appear in your classification prompts with the `{DOCUMENT_IMAGE}` placeholder. Images are inserted at the placeholder's position in the prompt template instead of being automatically appended at the end.
+
+### How {DOCUMENT_IMAGE} Works
+
+**Without Placeholder (Default Behavior):**
+```yaml
+classification:
+  task_prompt: |
+    Analyze this document:
+    
+    {DOCUMENT_TEXT}
+    
+    Classify it as one of: {CLASS_NAMES_AND_DESCRIPTIONS}
+```
+Images are automatically appended after the text content.
+
+**With Placeholder (Controlled Placement):**
+```yaml
+classification:
+  task_prompt: |
+    Analyze this document:
+    
+    {DOCUMENT_IMAGE}
+    
+    Text content: {DOCUMENT_TEXT}
+    
+    Classify it as one of: {CLASS_NAMES_AND_DESCRIPTIONS}
+```
+Images are inserted exactly where `{DOCUMENT_IMAGE}` appears in the prompt.
+
+### Usage Examples
+
+**Image Before Text Analysis:**
+```yaml
+task_prompt: |
+  Look at this document image first:
+  
+  {DOCUMENT_IMAGE}
+  
+  Now read the extracted text:
+  {DOCUMENT_TEXT}
+  
+  Based on both the visual layout and text content, classify this document as one of:
+  {CLASS_NAMES_AND_DESCRIPTIONS}
+```
+
+**Image in the Middle for Context:**
+```yaml
+task_prompt: |
+  You are classifying business documents. Here are the possible types:
+  {CLASS_NAMES_AND_DESCRIPTIONS}
+  
+  Examine this document image:
+  {DOCUMENT_IMAGE}
+  
+  Additional text content extracted from the document:
+  {DOCUMENT_TEXT}
+  
+  Classification:
+```
+
+### Integration with Few-Shot Examples
+
+The `{DOCUMENT_IMAGE}` placeholder can be combined with the `{FEW_SHOT_EXAMPLES}` placeholder in the same prompt:
+
+```yaml
+classification:
+  task_prompt: |
+    Here are examples of each document type:
+    {FEW_SHOT_EXAMPLES}
+    
+    Now classify this new document:
+    {DOCUMENT_IMAGE}
+    
+    Text: {DOCUMENT_TEXT}
+    
+    Classify it as one of: {CLASS_NAMES_AND_DESCRIPTIONS}
+```
+
+### Benefits
+
+- **🎯 Contextual Placement**: Position images where they provide maximum context
+- **📱 Better Multimodal Understanding**: Help models correlate visual and textual information
+- **🔄 Flexible Prompt Design**: Create prompts that flow naturally between different content types
+- **⚡ Improved Accuracy**: Strategic image placement can improve classification accuracy
+- **🔒 Backward Compatible**: Existing prompts without the placeholder continue to work unchanged
+
+### Multi-Page Documents
+
+For documents with multiple pages, the system enforces the image limit automatically:
+
+- **Bedrock Limit**: Maximum 20 images per request (automatically enforced)
+- **Warning Logging**: The system logs a warning when images are dropped because of the limit
+- **Page Order**: Images are processed in page order, and images beyond the limit are dropped
+
 ## Setting Up Few Shot Examples in Pattern 2
 
 Pattern 2's multimodal page-level classification supports few-shot example prompting, which can significantly improve classification accuracy by providing concrete document examples. This feature is available when you select the 'few_shot_example_with_multimodal_page_classification' configuration.
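
Note on the `docs/classification.md` change above: the placement rule it documents (images go where `{DOCUMENT_IMAGE}` appears, otherwise after the text) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's implementation: `build_content` and the `{"text": ...}` / `{"image": ...}` block shapes are hypothetical, and the sketch assumes the placeholder occurs at most once.

```python
from typing import Any

PLACEHOLDER = "{DOCUMENT_IMAGE}"


def build_content(prompt: str, images: list[bytes]) -> list[dict[str, Any]]:
    """Interleave prompt text and images around the placeholder (hypothetical helper)."""
    image_blocks = [{"image": img} for img in images]

    if PLACEHOLDER not in prompt:
        # Default behavior described in the docs: images follow the text.
        return [{"text": prompt}] + image_blocks

    # Assumption: only the first occurrence of the placeholder is honored.
    before, after = prompt.split(PLACEHOLDER, 1)
    content: list[dict[str, Any]] = []
    if before.strip():
        content.append({"text": before})
    content.extend(image_blocks)
    if after.strip():
        content.append({"text": after})
    return content
```

With a prompt such as `"Look first: {DOCUMENT_IMAGE} Then read: ..."`, the resulting list is a text block, then the image blocks, then the remaining text, which matches the "Controlled Placement" example in the hunk.
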
diff --git a/docs/extraction.md b/docs/extraction.md
@@ -72,6 +72,160 @@ extraction:
 
 The extraction service parses the JSON response and makes it available for downstream processing.
 
+## Image Placement with {DOCUMENT_IMAGE} Placeholder
+
+The extraction service lets you control where document images appear in your extraction prompts with the `{DOCUMENT_IMAGE}` placeholder. Positioning visual content relative to the text instructions in your prompt template can improve multimodal extraction.
+
+### How {DOCUMENT_IMAGE} Works
+
+**Without Placeholder (Default Behavior):**
+```yaml
+extraction:
+  task_prompt: |
+    Extract the following fields from this {DOCUMENT_CLASS} document:
+    
+    {ATTRIBUTE_NAMES_AND_DESCRIPTIONS}
+    
+    Document text:
+    {DOCUMENT_TEXT}
+    
+    Respond with valid JSON.
+```
+Images are automatically appended after the text content.
+
+**With Placeholder (Controlled Placement):**
+```yaml
+extraction:
+  task_prompt: |
+    Extract the following fields from this {DOCUMENT_CLASS} document:
+    
+    {ATTRIBUTE_NAMES_AND_DESCRIPTIONS}
+    
+    Examine this document image:
+    {DOCUMENT_IMAGE}
+    
+    Text content:
+    {DOCUMENT_TEXT}
+    
+    Respond with valid JSON containing the extracted values.
+```
+Images are inserted exactly where `{DOCUMENT_IMAGE}` appears in the prompt.
+
+### Usage Examples
+
+**Visual-First Extraction:**
+```yaml
+task_prompt: |
+  You are extracting data from a {DOCUMENT_CLASS}. Here are the fields to find:
+  {ATTRIBUTE_NAMES_AND_DESCRIPTIONS}
+  
+  First, examine the document layout and visual structure:
+  {DOCUMENT_IMAGE}
+  
+  Now analyze the extracted text:
+  {DOCUMENT_TEXT}
+  
+  Extract the requested fields as JSON:
+```
+
+**Image for Context and Verification:**
+```yaml
+task_prompt: |
+  Extract these fields from a {DOCUMENT_CLASS}:
+  {ATTRIBUTE_NAMES_AND_DESCRIPTIONS}
+  
+  Document text (may contain OCR errors):
+  {DOCUMENT_TEXT}
+  
+  Use this image to verify and correct any unclear information:
+  {DOCUMENT_IMAGE}
+  
+  Extracted data (JSON format):
+```
+
+**Mixed Content Analysis:**
+```yaml
+task_prompt: |
+  You are processing a {DOCUMENT_CLASS} that may contain both text and visual elements like tables, stamps, or signatures.
+  
+  Target fields: {ATTRIBUTE_NAMES_AND_DESCRIPTIONS}
+  
+  Document image (shows full layout):
+  {DOCUMENT_IMAGE}
+  
+  Extracted text (may miss visual-only elements):
+  {DOCUMENT_TEXT}
+  
+  Extract all available information as JSON:
+```
+
+### Integration with Few-Shot Examples
+
+The `{DOCUMENT_IMAGE}` placeholder can be combined with the `{FEW_SHOT_EXAMPLES}` placeholder in the same prompt:
+
+```yaml
+extraction:
+  task_prompt: |
+    Extract fields from {DOCUMENT_CLASS} documents. Here are examples:
+    
+    {FEW_SHOT_EXAMPLES}
+    
+    Now process this new document:
+    
+    Visual layout:
+    {DOCUMENT_IMAGE}
+    
+    Text content:
+    {DOCUMENT_TEXT}
+    
+    Fields to extract: {ATTRIBUTE_NAMES_AND_DESCRIPTIONS}
+    
+    JSON response:
+```
+
+### Benefits for Extraction
+
+- **🎯 Enhanced Accuracy**: Visual context helps identify field locations and correct OCR errors
+- **📊 Table and Form Handling**: Better extraction from structured layouts like tables and forms
+- **✍️ Handwritten Content**: Improved handling of signatures, handwritten notes, and annotations
+- **🖼️ Visual-Only Elements**: Extract information from stamps, logos, checkboxes, and visual indicators
+- **🔍 Verification**: Use images to verify and correct text extraction results
+- **📱 Layout Understanding**: Better comprehension of document structure and field relationships
+
+### Multi-Page Document Handling
+
+For documents with multiple pages, the system manages images as follows:
+
+- **Page Ordering**: Images are processed in page order
+- **Bedrock Compliance**: Maximum 20 images per request (automatically enforced)
+- **Truncation**: Excess images are dropped, and a warning is logged
+- **Performance Optimization**: Large image sets are efficiently handled
+
+```yaml
+# Example configuration for multi-page invoices
+extraction:
+  task_prompt: |
+    Extract data from this multi-page {DOCUMENT_CLASS}:
+    
+    {ATTRIBUTE_NAMES_AND_DESCRIPTIONS}
+    
+    Document pages (up to 20 images):
+    {DOCUMENT_IMAGE}
+    
+    Combined text from all pages:
+    {DOCUMENT_TEXT}
+    
+    Return JSON with extracted fields:
+```
+
+### Best Practices for Image Placement
+
+1. **Place Images Before Complex Instructions**: Show the document before giving detailed extraction rules
+2. **Use Images for Verification**: Position images after text to help verify and correct extractions
+3. **Leverage Visual Context**: Use images when extracting from tables, forms, or structured layouts
+4. **Handle OCR Limitations**: Use images to fill gaps where OCR may miss visual-only content
+5. **Consider Document Types**: Different document types benefit from different image placement strategies
+
 ## Using CachePoint for Extraction
 
 CachePoint is a feature of select Bedrock models that caches partial computations to improve performance and reduce costs. When used with extraction, it provides:
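
Note on the `docs/extraction.md` change above: the multi-page behavior it describes (a 20-image cap, page order, a warning when images are dropped) might look roughly like the sketch below. The function name `limit_images` and the log message are hypothetical, and keeping the *first* 20 pages is an assumption inferred from "processed in page order"; the docs do not show the actual code.

```python
import logging

logger = logging.getLogger(__name__)

MAX_IMAGES = 20  # per-request limit stated in the docs above


def limit_images(images: list, max_images: int = MAX_IMAGES) -> list:
    """Return at most max_images images, preserving page order (hypothetical helper)."""
    if len(images) > max_images:
        # Assumption: the earliest pages are kept and the rest are dropped.
        logger.warning(
            "Found %d images; keeping the first %d and dropping %d",
            len(images),
            max_images,
            len(images) - max_images,
        )
        return images[:max_images]
    return images
```

A 25-page document would therefore contribute pages 1 through 20 to the request, with one warning logged for the five dropped pages.
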
diff --git a/docs/few-shot-examples.md b/docs/few-shot-examples.md
@@ -337,8 +337,9 @@ classes:
 
 ### Step 4: Update Task Prompts with Cache Points
 
-Ensure your classification and extraction task prompts include the `{FEW_SHOT_EXAMPLES}` placeholder and use `<<CACHEPOINT>>` for optimal performance:
+Ensure your classification and extraction task prompts include the `{FEW_SHOT_EXAMPLES}` placeholder and use `<<CACHEPOINT>>` for optimal performance. You can also use the `{DOCUMENT_IMAGE}` placeholder for precise image positioning:
 
+**Standard Few-Shot Configuration:**
 ```yaml
 classification:
   task_prompt: |
@@ -372,6 +373,49 @@ extraction:
     {DOCUMENT_TEXT}
 ```
 
+**Enhanced Configuration with Image Placement:**
+```yaml
+classification:
+  task_prompt: |
+    Classify this document into exactly one of these categories:
+    
+    {CLASS_NAMES_AND_DESCRIPTIONS}
+    
+    <few_shot_examples>
+    {FEW_SHOT_EXAMPLES}
+    </few_shot_examples>
+    
+    <<CACHEPOINT>>
+    
+    Now examine this new document:
+    {DOCUMENT_IMAGE}
+    
+    Document text:
+    {DOCUMENT_TEXT}
+    
+    Classification:
+
+extraction:
+  task_prompt: |
+    Extract the following attributes from this {DOCUMENT_CLASS} document:
+    
+    {ATTRIBUTE_NAMES_AND_DESCRIPTIONS}
+    
+    <few_shot_examples>
+    {FEW_SHOT_EXAMPLES}
+    </few_shot_examples>
+    
+    <<CACHEPOINT>>
+    
+    Analyze this document:
+    {DOCUMENT_IMAGE}
+    
+    Text content:
+    {DOCUMENT_TEXT}
+    
+    Extract as JSON:
+```
+
 **Important**: The `<<CACHEPOINT>>` delimiter separates the static portion of your prompt (classes, few-shot examples) from the dynamic portion (document content). This enables Bedrock prompt caching, significantly reducing costs when processing multiple documents with the same configuration.
 
 ### Step 5: Configure Path Resolution (If Using Images)
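
Note on the `docs/few-shot-examples.md` change above: the `<<CACHEPOINT>>` delimiter splits a prompt into a static prefix and a dynamic suffix. A minimal sketch of that split follows. `split_on_cachepoint` is a hypothetical helper, and the `{"cachePoint": {"type": "default"}}` block is an assumption modeled on the cache point content block of the Bedrock Converse API; check the current Bedrock documentation before relying on that shape.

```python
CACHEPOINT = "<<CACHEPOINT>>"


def split_on_cachepoint(prompt: str) -> list[dict]:
    """Turn a prompt into text blocks separated by cache point markers (hypothetical helper)."""
    parts = prompt.split(CACHEPOINT)
    content: list[dict] = []
    for index, part in enumerate(parts):
        if part.strip():
            content.append({"text": part})
        if index < len(parts) - 1:
            # Assumed block shape; everything before this marker is the cacheable prefix.
            content.append({"cachePoint": {"type": "default"}})
    return content
```

In the enhanced configuration above, the class list and few-shot examples fall before the marker and the `{DOCUMENT_IMAGE}` and `{DOCUMENT_TEXT}` content falls after it, so only the per-document portion changes between requests.
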
diff --git a/docs/pattern-2.md b/docs/pattern-2.md