docs: add configurable image processing and enhanced resizing logic

Bob Strahan · Bob Strahan · commit 3724982e133e · 2025-06-20T16:19:11.000Z
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -6,6 +6,13 @@ SPDX-License-Identifier: MIT-0
 ## [Unreleased]
 
 ### Added
+- **Configurable Image Processing and Enhanced Resizing Logic**
+  - **Improved Image Resizing Algorithm**: Aspect-ratio-preserving scaling that resizes only when the computed scale factor is below 1.0, so images are downsized to fit the target dimensions and never upscaled, which avoids distortion
+  - **Configurable Image Dimensions**: All processing services (Assessment, Classification, Extraction, OCR) now accept image dimensions through service configuration, defaulting to 951×1268 pixels
+  - **Service-Specific Image Optimization**: Each service can use optimal image dimensions for performance and quality tuning
+  - **Enhanced OCR Service**: Added configurable DPI for PDF-to-image conversion (default: 300) and optional image resizing with dual image strategy (stores original high-DPI images while using resized images for processing)
+  - **Runtime Configuration**: Image processing is adjusted entirely through service configuration; no code changes are needed
+  - **Backward Compatibility**: Default values maintain existing behavior with no immediate action required for existing deployments
 - **Enhanced Configuration Management**
   - **Save as Default**: New button to save current configuration as the new default baseline with confirmation modal and version upgrade warnings
   - **Export Configuration**: Export current configuration to local files in JSON or YAML format with customizable filename
diff --git a/docs/assessment.md b/docs/assessment.md
@@ -522,6 +522,54 @@ StateTaxes[0]:
   └── Period: 8.43 [Confidence: 83.2% / Threshold: 80.0% - GREEN]
 ```
 
+## Image Processing Configuration
+
+The assessment service supports configurable image dimensions for optimal confidence evaluation:
+
+### Default Configuration
+
+```yaml
+assessment:
+  model: "anthropic.claude-3-5-sonnet-20241022-v2:0"
+  # Image processing settings
+  image:
+    target_width: 951    # Default width in pixels
+    target_height: 1268  # Default height in pixels
+```
+
+### Custom Image Dimensions
+
+Configure image dimensions based on assessment requirements:
+
+```yaml
+# For detailed visual assessment
+assessment:
+  image:
+    target_width: 1200
+    target_height: 1600
+---
+# For standard confidence evaluation
+assessment:
+  image:
+    target_width: 800
+    target_height: 1000
+```
+
+### Image Resizing Features for Assessment
+
+- **Aspect Ratio Preservation**: Images maintain proportions for accurate visual analysis
+- **Smart Scaling**: Only downsizes when necessary to preserve visual detail
+- **High-Quality Resampling**: Better image quality for confidence assessment
+- **Performance Optimization**: Optimized images reduce assessment processing time
+
+### Configuration Benefits for Assessment
+
+- **Enhanced Visual Analysis**: Appropriate resolution improves confidence evaluation accuracy
+- **Better OCR Verification**: Higher quality images help verify extraction results against visual content
+- **Improved Confidence Scoring**: Better image quality leads to more accurate confidence assessments
+- **Service-Specific Tuning**: Optimize image dimensions for different assessment complexity levels
+- **Resource Optimization**: Balance assessment quality and processing costs
+
 ## Cost Optimization
 
 ### Token Reduction Strategy
@@ -532,6 +580,7 @@ The assessment feature implements several cost optimization techniques:
 2. **Conditional Image Processing**: Images only processed when `{DOCUMENT_IMAGE}` placeholder is present
 3. **Optional Deployment**: Assessment infrastructure only deployed when `IsAssessmentEnabled=true`
 4. **Efficient Prompting**: Optimized prompt templates minimize token usage while maintaining accuracy
+5. **Configurable Image Dimensions**: Adjust image resolution to balance assessment quality and processing costs
 
 
 ## Testing and Validation
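
Note on the `image` settings documented in docs/assessment.md above: a minimal sketch of how a service might read `target_width` and `target_height` from a loaded configuration and fall back to the documented 951×1268 defaults. The helper name `resolve_image_dimensions` and the plain-dict configuration shape are assumptions for illustration; the actual service code is not shown in this change.

```python
from typing import Any, Mapping, Tuple

# Defaults documented in this change (951x1268 pixels).
DEFAULT_TARGET_WIDTH = 951
DEFAULT_TARGET_HEIGHT = 1268


def resolve_image_dimensions(config: Mapping[str, Any], service: str) -> Tuple[int, int]:
    """Return (target_width, target_height) for one service section.

    Hypothetical helper: the key layout mirrors the YAML in the docs,
    e.g. config["assessment"]["image"]["target_width"].
    """
    # Tolerate a missing service section or a missing "image" block.
    image_cfg = (config.get(service) or {}).get("image") or {}
    width = int(image_cfg.get("target_width", DEFAULT_TARGET_WIDTH))
    height = int(image_cfg.get("target_height", DEFAULT_TARGET_HEIGHT))
    if width <= 0 or height <= 0:
        raise ValueError(f"{service}.image dimensions must be positive, got {width}x{height}")
    return width, height


config = {"assessment": {"image": {"target_width": 1200, "target_height": 1600}}}
print(resolve_image_dimensions(config, "assessment"))      # (1200, 1600)
print(resolve_image_dimensions(config, "classification"))  # (951, 1268)
```

Falling back per key means a configuration that sets only one dimension still resolves to a usable pair.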
diff --git a/docs/classification.md b/docs/classification.md
@@ -337,6 +337,54 @@ The `imagePath` field supports multiple formats:
 
 For comprehensive details on configuring few-shot examples, including multimodal vs. text-only approaches, example management, and advanced features, refer to the [few-shot-examples.md](./few-shot-examples.md) documentation.
 
+## Image Processing Configuration
+
+The classification service supports configurable image dimensions for optimal performance and quality:
+
+### Default Configuration
+
+```yaml
+classification:
+  model: us.amazon.nova-pro-v1:0
+  # Image processing settings
+  image:
+    target_width: 951    # Default width in pixels
+    target_height: 1268  # Default height in pixels
+```
+
+### Custom Image Dimensions
+
+Configure image dimensions based on your specific requirements:
+
+```yaml
+# For high-accuracy classification
+classification:
+  image:
+    target_width: 1200
+    target_height: 1600
+---
+# For fast processing with lower resolution
+classification:
+  image:
+    target_width: 600
+    target_height: 800
+```
+
+### Image Resizing Features
+
+- **Aspect Ratio Preservation**: Images are resized proportionally without distortion
+- **Smart Scaling**: Only downsizes images when necessary (scale factor < 1.0)
+- **High-Quality Resampling**: Better visual quality after resizing
+- **Performance Optimization**: Smaller, optimized images process faster with lower memory usage
+
+### Configuration Benefits
+
+- **Service-Specific Tuning**: Each service can use optimal image dimensions
+- **Runtime Configuration**: No code changes needed to adjust image processing
+- **Backward Compatibility**: Default values maintain existing behavior
+- **Memory Optimization**: Smaller target dimensions reduce the memory needed per image
+- **Better Resource Utilization**: Service-specific sizing reduces unnecessary processing
+
 ## Best Practices for Classification
 
 1. **Provide Clear Class Descriptions**: Include distinctive features and common elements
@@ -346,3 +394,5 @@ For comprehensive details on configuring few-shot examples, including multimodal
 5. **Monitor and Refine**: Use the evaluation framework to track classification accuracy
 6. **Consider Visual Elements**: Describe visual layout and design patterns in class descriptions
 7. **Test with Real Documents**: Validate classification against actual document samples
+8. **Optimize Image Dimensions**: Configure appropriate image sizes based on document complexity and processing requirements
+9. **Balance Quality vs. Performance**: Higher-resolution images can improve accuracy but consume more resources
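
Note on the "Smart Scaling" and "Aspect Ratio Preservation" bullets in docs/classification.md above: the rule they describe (fit inside the target box, resize only when the scale factor is below 1.0) can be sketched as follows. This is an illustrative reimplementation, not the project's code; `fit_within` is a hypothetical name, and the floor rounding of the shorter side is an assumption.

```python
from typing import Tuple


def fit_within(width: int, height: int, target_width: int, target_height: int) -> Tuple[int, int]:
    """Return the size an image would have after aspect-preserving downscaling.

    The scale factor is min(target_width / width, target_height / height).
    When it is >= 1.0 the image already fits and is returned unchanged.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    if width <= target_width and height <= target_height:
        return width, height  # scale factor >= 1.0: never upscale

    # Compare target_width/width with target_height/height using integers
    # to avoid floating-point rounding surprises.
    if target_width * height <= target_height * width:
        # Width is the limiting side.
        return target_width, max(1, height * target_width // width)
    # Height is the limiting side.
    return max(1, width * target_height // height), target_height


print(fit_within(2550, 3300, 951, 1268))  # (951, 1230): Letter page at 300 DPI
print(fit_within(600, 800, 951, 1268))    # (600, 800): already fits, unchanged
```

Only one side reaches its target value; the other is derived from the original aspect ratio, which is what keeps the image undistorted.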
diff --git a/docs/extraction.md b/docs/extraction.md
@@ -399,6 +399,55 @@ extraction:
 
 Examples are class-specific - only examples from the same document class being processed will be included in the prompt.
 
+## Image Processing Configuration
+
+The extraction service supports configurable image dimensions for optimal performance and quality:
+
+### Default Configuration
+
+```yaml
+extraction:
+  model: us.amazon.nova-pro-v1:0
+  # Image processing settings
+  image:
+    target_width: 951    # Default width in pixels
+    target_height: 1268  # Default height in pixels
+```
+
+### Custom Image Dimensions
+
+Configure image dimensions based on your extraction requirements:
+
+```yaml
+# For high-accuracy extraction with detailed visual analysis
+extraction:
+  image:
+    target_width: 1200
+    target_height: 1600
+---
+# For fast processing with standard resolution
+extraction:
+  image:
+    target_width: 800
+    target_height: 1000
+```
+
+### Image Resizing Features
+
+- **Aspect Ratio Preservation**: Images are resized proportionally without distortion
+- **Smart Scaling**: Only downsizes images when necessary (scale factor < 1.0)
+- **High-Quality Resampling**: Better visual quality after resizing for improved field detection
+- **Performance Optimization**: Optimized images reduce processing time and memory usage
+
+### Configuration Benefits for Extraction
+
+- **Enhanced Field Detection**: Appropriate image resolution improves accuracy for table and form extraction
+- **Visual Element Processing**: Better handling of signatures, stamps, checkboxes, and visual indicators
+- **OCR Error Correction**: Higher quality images help verify and correct text extraction results
+- **Service-Specific Tuning**: Optimize image dimensions for different document types and extraction complexity
+- **Runtime Configuration**: Adjust image processing without code changes
+- **Resource Optimization**: Balance quality and performance based on extraction requirements
+
 ## Best Practices
 
 1. **Clear Attribute Descriptions**: Include detail on where and how information appears in the document. More specific descriptions lead to better extraction results.
@@ -421,3 +470,7 @@ Examples are class-specific - only examples from the same document class being p
 8. **Handle Document Variations**: Consider creating separate document classes for significantly different layouts of the same document type rather than trying to handle all variations with a single class.
 
 9. **Test Extraction Pipeline End-to-End**: Validate your extraction configuration with the full pipeline including OCR, classification, and extraction to ensure components work together effectively.
+
+10. **Optimize Image Dimensions**: Match image dimensions to document complexity; use higher resolution for forms and tables and standard resolution for simple text documents.
+
+11. **Balance Quality vs. Performance**: Higher-resolution images can improve extraction accuracy but consume more resources and processing time.
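
Note on the quality versus performance trade-off in docs/extraction.md above: a rough way to compare the example dimension profiles is by raw pixel count. The sketch below uses only the dimensions that appear in the docs; the uncompressed RGB size (3 bytes per pixel) is a simple proxy chosen here for illustration and says nothing about model token usage or actual latency.

```python
from typing import Dict, Tuple

# Dimension profiles taken from the YAML examples in the docs.
PROFILES: Dict[str, Tuple[int, int]] = {
    "high accuracy": (1200, 1600),
    "default": (951, 1268),
    "fast": (800, 1000),
}


def pixel_count(width: int, height: int) -> int:
    """Number of pixels in an image of the given size."""
    return width * height


def raw_rgb_bytes(width: int, height: int) -> int:
    """Uncompressed size assuming 3 bytes per pixel (8-bit RGB)."""
    return pixel_count(width, height) * 3


baseline = pixel_count(*PROFILES["default"])
for label, (w, h) in PROFILES.items():
    pixels = pixel_count(w, h)
    print(
        f"{label:>13}: {w}x{h} = {pixels:,} px, "
        f"{raw_rgb_bytes(w, h) / 1e6:.2f} MB raw RGB, "
        f"{pixels / baseline:.2f}x default"
    )
```

By this measure the 1200×1600 profile carries about 1.59 times the pixels of the default, and the 800×1000 profile about 0.66 times.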
diff --git a/docs/pattern-2.md b/docs/pattern-2.md
@@ -58,6 +58,10 @@ Each step includes comprehensive retry logic for handling transient errors:
 - **Purpose**: Processes input PDFs using Amazon Textract
 - **Key Features**:
   - Concurrent page processing with ThreadPoolExecutor
+  - **Configurable Image Processing**: Enhanced image resizing with aspect-ratio preservation
+  - **Configurable DPI**: Adjustable DPI for PDF-to-image conversion (default: 300)
+  - **Dual Image Strategy**: Stores original high-DPI images while using resized images for OCR processing
+  - **Smart Resizing**: Only downsizes images when necessary (scale factor < 1.0)
   - Image preprocessing and optimization
   - Comprehensive error handling and retries
   - Detailed metrics tracking
@@ -212,6 +216,73 @@ The pattern exports these outputs to the parent stack:
 - Configuration can be updated through the Web UI without stack redeployment
 - Model choices are constrained through enum constraints in the configuration schema
 
+## OCR Configuration
+
+The OCR service in Pattern 2 exposes two image processing settings: the DPI used for PDF-to-image conversion and optional resizing of the images used for OCR processing.
+
+### DPI Configuration
+
+Configure DPI (Dots Per Inch) for PDF-to-image conversion:
+
+```python
+# Example OCR service initialization with custom DPI and resize settings (OcrService import omitted)
+ocr_service = OcrService(
+    dpi=400,  # Higher DPI for better quality (default: 300)
+    resize_config={
+        'target_width': 1200,
+        'target_height': 1600
+    }
+)
+```
+
+### Image Resizing Configuration
+
+The OCR service supports optional image resizing for processing optimization:
+
+```yaml
+# OCR configuration example
+ocr:
+  dpi: 300  # PDF-to-image conversion DPI
+  resize_config:
+    target_width: 951   # Target width for processing
+    target_height: 1268 # Target height for processing
+```
+
+### OCR Image Processing Features
+
+- **Configurable DPI**: Higher DPI (400+) for better quality, standard DPI (300) for balanced performance
+- **Dual Image Strategy**:
+  - Stores original high-DPI images in S3 for archival and downstream processing
+  - Uses resized images for OCR processing to optimize performance
+- **Aspect Ratio Preservation**: Images are resized proportionally without distortion
+- **Smart Scaling**: Only downsizes images when necessary (scale factor < 1.0)
+- **Enhanced Logging**: Detailed logging for DPI and resize operations
+
+### Configuration Benefits
+
+- **Quality Control**: Higher DPI settings improve OCR accuracy for complex documents
+- **Performance Optimization**: Resized images reduce processing time and memory usage
+- **Storage Efficiency**: Dual strategy balances quality preservation with processing efficiency
+- **Flexibility**: Runtime configuration allows adjustment without code changes
+- **Backward Compatibility**: Default values maintain existing behavior
+
+### Best Practices for OCR
+
+1. **DPI Selection**:
+   - Use 300 DPI for standard documents
+   - Use 400+ DPI for documents with small text or complex layouts
+   - Consider processing costs when using higher DPI settings
+
+2. **Image Resizing**:
+   - Enable resizing for large documents to improve processing speed
+   - Maintain aspect ratios to preserve text readability
+   - Test different dimensions based on document types
+
+3. **Performance Tuning**:
+   - Monitor processing times and adjust DPI/resize settings accordingly
+   - Use concurrent processing for multi-page documents
+   - Balance quality requirements with processing costs
+
 ## Customizing Classification
 
 The pattern supports two different classification methods:
@@ -645,3 +716,11 @@ sam local invoke ExtractionFunction --env-vars testing/env.json -e testing/Extra
    - Include examples for all document classes you expect to process
    - Regularly review and update examples based on real-world performance
    - Test configurations with examples before production deployment
+
+8. **Image Processing Optimization**:
+   - Configure appropriate image dimensions for each service based on document complexity
+   - Use higher DPI (400+) for OCR when processing documents with small text or complex layouts
+   - Balance image quality with processing performance and costs
+   - Test different image configurations with your specific document types
+   - Monitor memory usage and processing times when adjusting image settings
+   - Leverage the dual image strategy in OCR to preserve quality while optimizing processing
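
Note on the DPI and dual image strategy described in docs/pattern-2.md above: the sketch below shows how the DPI setting determines the size of the stored original and whether the processing copy would be resized. It assumes a US Letter page (8.5 × 11 inches) purely as an example; `PagePlan` and `plan_page` are hypothetical names and do not reflect the `OcrService` internals.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PagePlan:
    original_size: Tuple[int, int]  # pixels of the stored high-DPI image
    scale: float                    # factor needed to fit the target box

    @property
    def needs_resize(self) -> bool:
        # Resizing happens only when it would shrink the image.
        return self.scale < 1.0


def plan_page(width_in: float, height_in: float, dpi: int,
              target_width: int, target_height: int) -> PagePlan:
    """Compute the rendered size at a given DPI and the fit-to-target scale."""
    if dpi <= 0:
        raise ValueError("dpi must be positive")
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    scale = min(target_width / width_px, target_height / height_px)
    return PagePlan((width_px, height_px), scale)


for dpi in (72, 300, 400):
    plan = plan_page(8.5, 11.0, dpi, target_width=951, target_height=1268)
    print(dpi, plan.original_size, f"scale={plan.scale:.3f}", "resize" if plan.needs_resize else "keep")
```

Raising the DPI enlarges the stored original (2550×3300 at 300 DPI, 3400×4400 at 400 DPI for this page size), while the processing copy stays bounded by the configured target dimensions.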