Commit 3724982

docs: add configurable image processing and enhanced resizing logic

Author: Bob Strahan
Parent: cb8df7b

5 files changed, 238 additions and 0 deletions

CHANGELOG.md

Lines changed: 7 additions & 0 deletions
@@ -6,6 +6,13 @@ SPDX-License-Identifier: MIT-0
## [Unreleased]

### Added
- **Configurable Image Processing and Enhanced Resizing Logic**
  - **Improved Image Resizing Algorithm**: Enhanced aspect-ratio preserving scaling that only downsizes when necessary (scale factor < 1.0) to prevent image distortion
  - **Configurable Image Dimensions**: All processing services (Assessment, Classification, Extraction, OCR) now support configurable image dimensions through configuration with default 951×1268 resolution
  - **Service-Specific Image Optimization**: Each service can use optimal image dimensions for performance and quality tuning
  - **Enhanced OCR Service**: Added configurable DPI for PDF-to-image conversion (default: 300) and optional image resizing with dual image strategy (stores original high-DPI images while using resized images for processing)
  - **Runtime Configuration**: No code changes needed to adjust image processing - all configurable through service configuration
  - **Backward Compatibility**: Default values maintain existing behavior with no immediate action required for existing deployments
- **Enhanced Configuration Management**
  - **Save as Default**: New button to save current configuration as the new default baseline with confirmation modal and version upgrade warnings
  - **Export Configuration**: Export current configuration to local files in JSON or YAML format with customizable filename

docs/assessment.md

Lines changed: 49 additions & 0 deletions
@@ -522,6 +522,54 @@ StateTaxes[0]:
```
└── Period: 8.43 [Confidence: 83.2% / Threshold: 80.0% - GREEN]
```

## Image Processing Configuration

The assessment service supports configurable image dimensions for optimal confidence evaluation:

### Default Configuration

```yaml
assessment:
  model: "anthropic.claude-3-5-sonnet-20241022-v2:0"
  # Image processing settings
  image:
    target_width: 951     # Default width in pixels
    target_height: 1268   # Default height in pixels
```

### Custom Image Dimensions

Configure image dimensions based on assessment requirements. The two settings below are alternatives; apply only one:

```yaml
# For detailed visual assessment
assessment:
  image:
    target_width: 1200
    target_height: 1600

# For standard confidence evaluation
assessment:
  image:
    target_width: 800
    target_height: 1000
```

### Image Resizing Features for Assessment

- **Aspect Ratio Preservation**: Images maintain proportions for accurate visual analysis
- **Smart Scaling**: Downsizes only when necessary (scale factor < 1.0), so images that already fit keep their original detail
- **High-Quality Resampling**: Better image quality for confidence assessment
- **Performance Optimization**: Optimized images reduce assessment processing time
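
The aspect-ratio and smart-scaling behaviour listed above can be sketched as a small dimension calculation. This is a minimal illustration, not the project's implementation: the function name `fit_within` and the rounding choice are assumptions, and only the 951×1268 defaults and the "scale factor < 1.0" rule come from the documentation.

```python
from typing import Tuple


def fit_within(
    width: int,
    height: int,
    target_width: int = 951,
    target_height: int = 1268,
) -> Tuple[int, int]:
    """Return dimensions that fit the target box while keeping the aspect ratio.

    Hypothetical helper: images that already fit are returned unchanged,
    so nothing is ever upscaled.
    """
    if min(width, height, target_width, target_height) <= 0:
        raise ValueError("all dimensions must be positive")

    # One scale factor for both axes keeps the proportions intact.
    scale = min(target_width / width, target_height / height)
    if scale >= 1.0:
        return width, height  # already small enough: no resize

    # Rounding to the nearest pixel is an assumption of this sketch.
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_within(2550, 3300))  # Letter page rendered at 300 DPI
print(fit_within(600, 800))    # smaller than the target: unchanged
```

Because a single scale factor is applied to both axes, the result can be smaller than the target box on one side; the target acts as a bounding box rather than an exact output size.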

### Configuration Benefits for Assessment

- **Enhanced Visual Analysis**: Appropriate resolution improves confidence evaluation accuracy
- **Better OCR Verification**: Higher quality images help verify extraction results against visual content
- **Improved Confidence Scoring**: Better image quality leads to more accurate confidence assessments
- **Service-Specific Tuning**: Optimize image dimensions for different assessment complexity levels
- **Resource Optimization**: Balance assessment quality and processing costs

## Cost Optimization

### Token Reduction Strategy
@@ -532,6 +580,7 @@ The assessment feature implements several cost optimization techniques:
2. **Conditional Image Processing**: Images only processed when `{DOCUMENT_IMAGE}` placeholder is present
3. **Optional Deployment**: Assessment infrastructure only deployed when `IsAssessmentEnabled=true`
4. **Efficient Prompting**: Optimized prompt templates minimize token usage while maintaining accuracy
5. **Configurable Image Dimensions**: Adjust image resolution to balance assessment quality and processing costs

## Testing and Validation

docs/classification.md

Lines changed: 50 additions & 0 deletions
@@ -337,6 +337,54 @@ The `imagePath` field supports multiple formats:

For comprehensive details on configuring few-shot examples, including multimodal vs. text-only approaches, example management, and advanced features, refer to the [few-shot-examples.md](./few-shot-examples.md) documentation.

## Image Processing Configuration

The classification service supports configurable image dimensions for optimal performance and quality:

### Default Configuration

```yaml
classification:
  model: us.amazon.nova-pro-v1:0
  # Image processing settings
  image:
    target_width: 951     # Default width in pixels
    target_height: 1268   # Default height in pixels
```
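
As a rough illustration of how a service could resolve these settings, the sketch below reads the `image` block from an already-parsed configuration and falls back to the documented 951×1268 defaults when it is absent. The helper name `resolve_image_dimensions` and the plain-dictionary config shape are assumptions made for this example, not the project's actual loader.

```python
from typing import Any, Mapping, Tuple

# Defaults taken from the documentation above.
DEFAULT_TARGET_WIDTH = 951
DEFAULT_TARGET_HEIGHT = 1268


def resolve_image_dimensions(config: Mapping[str, Any], service: str) -> Tuple[int, int]:
    """Return (target_width, target_height) for a service section.

    Hypothetical helper: a missing section, a missing `image` block, or a
    missing key all fall back to the default dimensions.
    """
    image_cfg = (config.get(service) or {}).get("image") or {}
    width = int(image_cfg.get("target_width", DEFAULT_TARGET_WIDTH))
    height = int(image_cfg.get("target_height", DEFAULT_TARGET_HEIGHT))
    if width <= 0 or height <= 0:
        raise ValueError(f"{service}: image dimensions must be positive")
    return width, height


# Example config as it might look after YAML parsing (illustrative values).
config = {
    "classification": {
        "model": "us.amazon.nova-pro-v1:0",
        "image": {"target_width": 600, "target_height": 800},
    },
    "extraction": {"model": "us.amazon.nova-pro-v1:0"},  # no image block
}

print(resolve_image_dimensions(config, "classification"))
print(resolve_image_dimensions(config, "extraction"))
```

Falling back per key, rather than per block, means a configuration that sets only one dimension still gets a usable pair.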

### Custom Image Dimensions

Configure image dimensions based on your specific requirements. The two settings below are alternatives; apply only one:

```yaml
# For high-accuracy classification
classification:
  image:
    target_width: 1200
    target_height: 1600

# For fast processing with lower resolution
classification:
  image:
    target_width: 600
    target_height: 800
```

### Image Resizing Features

- **Aspect Ratio Preservation**: Images are resized proportionally without distortion
- **Smart Scaling**: Only downsizes images when necessary (scale factor < 1.0)
- **High-Quality Resampling**: Better visual quality after resizing
- **Performance Optimization**: Smaller, optimized images process faster with lower memory usage

### Configuration Benefits

- **Service-Specific Tuning**: Each service can use optimal image dimensions
- **Runtime Configuration**: No code changes needed to adjust image processing
- **Backward Compatibility**: Default values maintain existing behavior
- **Memory Optimization**: Configurable dimensions allow memory optimization
- **Better Resource Utilization**: Service-specific sizing reduces unnecessary processing

## Best Practices for Classification

1. **Provide Clear Class Descriptions**: Include distinctive features and common elements
@@ -346,3 +394,5 @@ For comprehensive details on configuring few-shot examples, including multimodal
5. **Monitor and Refine**: Use the evaluation framework to track classification accuracy
6. **Consider Visual Elements**: Describe visual layout and design patterns in class descriptions
7. **Test with Real Documents**: Validate classification against actual document samples
8. **Optimize Image Dimensions**: Configure appropriate image sizes based on document complexity and processing requirements
9. **Balance Quality vs Performance**: Higher resolution images provide better accuracy but consume more resources

docs/extraction.md

Lines changed: 53 additions & 0 deletions
@@ -399,6 +399,55 @@ extraction:

Examples are class-specific - only examples from the same document class being processed will be included in the prompt.

## Image Processing Configuration

The extraction service supports configurable image dimensions for optimal performance and quality:

### Default Configuration

```yaml
extraction:
  model: us.amazon.nova-pro-v1:0
  # Image processing settings
  image:
    target_width: 951     # Default width in pixels
    target_height: 1268   # Default height in pixels
```

### Custom Image Dimensions

Configure image dimensions based on your extraction requirements. The two settings below are alternatives; apply only one:

```yaml
# For high-accuracy extraction with detailed visual analysis
extraction:
  image:
    target_width: 1200
    target_height: 1600

# For fast processing with standard resolution
extraction:
  image:
    target_width: 800
    target_height: 1000
```
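
To get a feel for the quality/performance trade-off between these settings, the sketch below compares the maximum pixel count of each target box against the 951×1268 default. The preset labels and the `pixel_ratio` helper are illustrative assumptions; because resizing preserves aspect ratio and never upscales, the pixel count of an actual resized image can be lower than these upper bounds.

```python
from typing import Dict, Tuple

DEFAULT_DIMS: Tuple[int, int] = (951, 1268)

# Target boxes taken from the configuration examples above.
PRESETS: Dict[str, Tuple[int, int]] = {
    "high accuracy": (1200, 1600),
    "default": DEFAULT_DIMS,
    "fast": (800, 1000),
}


def pixel_ratio(dims: Tuple[int, int], baseline: Tuple[int, int] = DEFAULT_DIMS) -> float:
    """Ratio of the maximum pixel count of `dims` to that of `baseline`."""
    return (dims[0] * dims[1]) / (baseline[0] * baseline[1])


for name, dims in PRESETS.items():
    pixels = dims[0] * dims[1]
    print(f"{name:>13}: {dims[0]}x{dims[1]} = {pixels:,} px "
          f"({pixel_ratio(dims):.2f}x default)")
```

Pixel count grows with the product of width and height, so a modest increase in each dimension adds up quickly; measure processing time and cost on your own documents before settling on a larger target.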

### Image Resizing Features

- **Aspect Ratio Preservation**: Images are resized proportionally without distortion
- **Smart Scaling**: Only downsizes images when necessary (scale factor < 1.0)
- **High-Quality Resampling**: Better visual quality after resizing for improved field detection
- **Performance Optimization**: Optimized images reduce processing time and memory usage

### Configuration Benefits for Extraction

- **Enhanced Field Detection**: Appropriate image resolution improves accuracy for table and form extraction
- **Visual Element Processing**: Better handling of signatures, stamps, checkboxes, and visual indicators
- **OCR Error Correction**: Higher quality images help verify and correct text extraction results
- **Service-Specific Tuning**: Optimize image dimensions for different document types and extraction complexity
- **Runtime Configuration**: Adjust image processing without code changes
- **Resource Optimization**: Balance quality and performance based on extraction requirements

## Best Practices

1. **Clear Attribute Descriptions**: Include detail on where and how information appears in the document. More specific descriptions lead to better extraction results.
@@ -421,3 +470,7 @@ Examples are class-specific - only examples from the same document class being p
8. **Handle Document Variations**: Consider creating separate document classes for significantly different layouts of the same document type rather than trying to handle all variations with a single class.

9. **Test Extraction Pipeline End-to-End**: Validate your extraction configuration with the full pipeline including OCR, classification, and extraction to ensure components work together effectively.

10. **Optimize Image Dimensions**: Configure image dimensions based on document complexity - use higher resolution for forms and tables, standard resolution for simple text documents.

11. **Balance Quality vs Performance**: Higher resolution images provide better extraction accuracy but consume more resources and processing time.

docs/pattern-2.md

Lines changed: 79 additions & 0 deletions
@@ -58,6 +58,10 @@ Each step includes comprehensive retry logic for handling transient errors:
- **Purpose**: Processes input PDFs using Amazon Textract
- **Key Features**:
  - Concurrent page processing with ThreadPoolExecutor
  - **Configurable Image Processing**: Enhanced image resizing with aspect-ratio preservation
  - **Configurable DPI**: Adjustable DPI for PDF-to-image conversion (default: 300)
  - **Dual Image Strategy**: Stores original high-DPI images while using resized images for OCR processing
  - **Smart Resizing**: Only downsizes images when necessary (scale factor < 1.0)
  - Image preprocessing and optimization
  - Comprehensive error handling and retries
  - Detailed metrics tracking
@@ -212,6 +216,73 @@ The pattern exports these outputs to the parent stack:
- Configuration can be updated through the Web UI without stack redeployment
- Model choices are constrained through enum constraints in the configuration schema

## OCR Configuration

The OCR service in Pattern 2 supports enhanced image processing capabilities for optimal text extraction:

### DPI Configuration

Configure DPI (Dots Per Inch) for PDF-to-image conversion:

```python
# Example OCR service initialization with custom DPI
# (assumes OcrService has already been imported from the project's OCR module)
ocr_service = OcrService(
    dpi=400,  # Higher DPI for better quality (default: 300)
    resize_config={
        'target_width': 1200,
        'target_height': 1600
    }
)
```

### Image Resizing Configuration

The OCR service supports optional image resizing for processing optimization:

```yaml
# OCR configuration example
ocr:
  dpi: 300                # PDF-to-image conversion DPI
  resize_config:
    target_width: 951     # Target width for processing
    target_height: 1268   # Target height for processing
```
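
To show how `dpi` and `resize_config` interact, the sketch below computes the pixel size of a rendered page and the size of the resized copy used for processing. It is a simplified model rather than the OCR service's code: the function names are hypothetical, the page size is given in inches, and rounding to whole pixels is an assumption.

```python
from typing import Dict, Tuple


def rendered_size(width_in: float, height_in: float, dpi: int = 300) -> Tuple[int, int]:
    """Pixel dimensions of a page rendered at the given DPI (pixels = inches * DPI)."""
    return round(width_in * dpi), round(height_in * dpi)


def plan_page_images(
    width_in: float,
    height_in: float,
    dpi: int = 300,
    target_width: int = 951,
    target_height: int = 1268,
) -> Dict[str, Tuple[int, int]]:
    """Return the sizes of the stored original and the resized processing image.

    Hypothetical helper illustrating the dual image strategy: the original
    keeps its full-DPI size, the processing copy is only ever downsized.
    """
    original = rendered_size(width_in, height_in, dpi)
    # Capping the scale at 1.0 means small pages are never upscaled.
    scale = min(target_width / original[0], target_height / original[1], 1.0)
    processing = (
        max(1, round(original[0] * scale)),
        max(1, round(original[1] * scale)),
    )
    return {"original": original, "processing": processing}


# US Letter page (8.5 x 11 inches) at the default and at a higher DPI.
print(plan_page_images(8.5, 11.0, dpi=300))
print(plan_page_images(8.5, 11.0, dpi=400))
```

In this model, raising the DPI enlarges the stored original while the processing copy stays bounded by the target box, which is why the two settings can be tuned separately.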

### OCR Image Processing Features

- **Configurable DPI**: Higher DPI (400+) for better quality, standard DPI (300) for balanced performance
- **Dual Image Strategy**:
  - Stores original high-DPI images in S3 for archival and downstream processing
  - Uses resized images for OCR processing to optimize performance
- **Aspect Ratio Preservation**: Images are resized proportionally without distortion
- **Smart Scaling**: Only downsizes images when necessary (scale factor < 1.0)
- **Enhanced Logging**: Detailed logging for DPI and resize operations

### Configuration Benefits

- **Quality Control**: Higher DPI settings improve OCR accuracy for complex documents
- **Performance Optimization**: Resized images reduce processing time and memory usage
- **Storage Efficiency**: Dual strategy balances quality preservation with processing efficiency
- **Flexibility**: Runtime configuration allows adjustment without code changes
- **Backward Compatibility**: Default values maintain existing behavior

### Best Practices for OCR

1. **DPI Selection**:
   - Use 300 DPI for standard documents
   - Use 400+ DPI for documents with small text or complex layouts
   - Consider processing costs when using higher DPI settings

2. **Image Resizing**:
   - Enable resizing for large documents to improve processing speed
   - Maintain aspect ratios to preserve text readability
   - Test different dimensions based on document types

3. **Performance Tuning**:
   - Monitor processing times and adjust DPI/resize settings accordingly
   - Use concurrent processing for multi-page documents
   - Balance quality requirements with processing costs

## Customizing Classification

The pattern supports two different classification methods:
@@ -645,3 +716,11 @@ sam local invoke ExtractionFunction --env-vars testing/env.json -e testing/Extra
   - Include examples for all document classes you expect to process
   - Regularly review and update examples based on real-world performance
   - Test configurations with examples before production deployment

8. **Image Processing Optimization**:
   - Configure appropriate image dimensions for each service based on document complexity
   - Use higher DPI (400+) for OCR when processing documents with small text or complex layouts
   - Balance image quality with processing performance and costs
   - Test different image configurations with your specific document types
   - Monitor memory usage and processing times when adjusting image settings
   - Leverage the dual image strategy in OCR to preserve quality while optimizing processing
