| 1 | +# OCR Image Sizing Best Practices Guide |
| 2 | + |
| 3 | +## Overview |
| 4 | + |
| 5 | +The OCR service automatically applies sensible default image size limits to optimize the balance between OCR accuracy and resource consumption. This guide explains the sizing strategy and how to customize it for your specific use cases. |
| 6 | + |
| 7 | +## Default Behavior (NEW) |
| 8 | + |
| 9 | +### Automatic Optimization |
| 10 | +- **Default limits**: 951x1268 pixels when no image sizing is configured |
| 11 | +- **Why defaults matter**: They prevent excessive token consumption, memory issues, and processing delays |
| 12 | +- **Backward compatibility**: Existing explicit configurations continue to work unchanged |
| 13 | + |
| 14 | +### When Defaults Are Applied |
| 15 | +```yaml |
| 16 | +ocr: |
| 17 | + image: |
| 18 | + # No target_width or target_height specified |
| 19 | + # → Automatic 951x1268 limits applied |
| 20 | + dpi: 150 |
| 21 | +``` |
| 22 | + |
| 23 | +### When Defaults Are NOT Applied |
| 24 | +```yaml |
| 25 | +ocr: |
| 26 | + image: |
| 27 | + # Explicit configuration provided |
| 28 | + target_width: 1200 |
| 29 | + target_height: 900 |
| 30 | + # → Your explicit values used |
| 31 | +``` |
| 32 | + |
| 33 | +## Sizing Recommendations by Use Case |
| 34 | + |
| 35 | +| Use Case | Dimensions | Token Usage/Page | Best For | Configuration | |
| 36 | +|----------|------------|------------------|----------|---------------| |
| 37 | +| **High Accuracy** | 1600×1200 | 500-800 | Forms, tables, handwriting | `target_width: 1600`<br/>`target_height: 1200` | |
| 38 | +| **Standard Documents** | 1200×900 | 300-500 | Printed text, simple layouts | `target_width: 1200`<br/>`target_height: 900` | |
| 39 | +| **Token-Conscious** | 800×600 | 150-300 | Basic text extraction | `target_width: 800`<br/>`target_height: 600` | |
| 40 | +| **Minimal Processing** | 600×450 | 100-200 | Speed over accuracy | `target_width: 600`<br/>`target_height: 450` | |
| 41 | +| **No Limits** | Original | 1000-4000+ | When quality is critical | Set only one of `target_width` / `target_height` (see Partial Configuration); empty strings apply the defaults | |
| 42 | + |
| 43 | +## Cost Impact Analysis |
| 44 | + |
| 45 | +### Before Default Sizing |
| 46 | +- **Typical page**: 1000-4000+ tokens |
| 47 | +- **10-page document**: 10,000-40,000+ tokens |
| 48 | +- **Monthly cost impact**: Can be substantial for high-volume processing |
| 49 | + |
| 50 | +### With Default Sizing (951×1268) |
| 51 | +- **Typical page**: 400-600 tokens |
| 52 | +- **10-page document**: ~5,000 tokens |
| 53 | +- **Cost reduction**: 60-85% on vision model costs |
| 54 | + |
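The arithmetic behind these figures can be sketched in a few lines of Python. This is an illustrative calculation only: the `token_reduction` helper and the per-page values passed to it (2,500 tokens before, 500 after) are assumptions picked from within the ranges quoted above, not measurements from the service.

```python
def token_reduction(before_per_page: float, after_per_page: float, pages: int = 10) -> dict:
    """Compare document-level token totals before and after resizing.

    Hypothetical helper for illustration; it is not part of the OCR service.
    """
    if before_per_page <= 0 or pages <= 0:
        raise ValueError("before_per_page and pages must be positive")
    before_total = before_per_page * pages
    after_total = after_per_page * pages
    # Fraction of tokens saved, expressed as a percentage
    reduction_pct = round((1 - after_total / before_total) * 100, 1)
    return {"before": before_total, "after": after_total, "reduction_pct": reduction_pct}


# Assumed mid-range values: 2,500 tokens/page unresized, 500 tokens/page with defaults
estimate = token_reduction(before_per_page=2500, after_per_page=500)
print(estimate)
```

Substituting token counts measured on your own documents gives a more reliable estimate than the ranges in this guide.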
| 55 | +### Resource Benefits |
| 56 | +- **Memory usage**: Reduced OutOfMemory errors during concurrent processing |
| 57 | +- **Processing speed**: Faster uploads, downloads, and processing |
| 58 | +- **Network efficiency**: Lower bandwidth consumption |
| 59 | + |
| 60 | +## Configuration Examples |
| 61 | + |
| 62 | +### Use Automatic Defaults (Recommended) |
| 63 | +```yaml |
| 64 | +ocr: |
| 65 | + image: |
| 66 | + dpi: 150 |
| 67 | + # No sizing specified = automatic 951×1268 defaults applied |
| 68 | +``` |
| 69 | + |
| 70 | +### High-Volume Text Processing |
| 71 | +```yaml |
| 72 | +ocr: |
| 73 | + image: |
| 74 | + dpi: 150 |
| 75 | + target_width: 1200 |
| 76 | + target_height: 900 |
| 77 | + # Balances quality with token efficiency |
| 78 | +``` |
| 79 | + |
| 80 | +### Forms and Complex Documents |
| 81 | +```yaml |
| 82 | +ocr: |
| 83 | + image: |
| 84 | + dpi: 150 |
| 85 | + target_width: 1600 |
| 86 | + target_height: 1200 |
| 87 | + # Maximum recommended size for accuracy |
| 88 | +``` |
| 89 | + |
| 90 | +### Token-Optimized Processing |
| 91 | +```yaml |
| 92 | +ocr: |
| 93 | + image: |
| 94 | + dpi: 150 |
| 95 | + target_width: 800 |
| 96 | + target_height: 600 |
| 97 | + # Minimizes token usage while maintaining readability |
| 98 | +``` |
| 99 | + |
| 100 | +### Working with Configuration Systems |
| 101 | +```yaml |
| 102 | +# Empty strings are treated the same as no configuration |
| 103 | +# This handles configuration systems that return empty strings for unset values |
| 104 | +ocr: |
| 105 | + image: |
| 106 | + dpi: 150 |
| 107 | + target_width: "" |
| 108 | + target_height: "" |
| 109 | + # → Automatic 951x1268 defaults applied (same as if not specified) |
| 110 | +``` |
| 111 | + |
| 112 | +### Partial Configuration (Disables Defaults) |
| 113 | +```yaml |
| 114 | +ocr: |
| 115 | + image: |
| 116 | + dpi: 150 |
| 117 | + target_width: 800 |
| 118 | + # target_height missing - disables automatic defaults |
| 119 | + # → No size limits applied (preserves existing behavior) |
| 120 | +``` |
| 121 | + |
| 122 | +## Migration Guide |
| 123 | + |
| 124 | +### For Existing Deployments |
| 125 | +1. **No action required**: Existing configurations continue to work |
| 126 | +2. **Opt into defaults**: Remove both `target_width` and `target_height` from config |
| 127 | +3. **Monitor improvements**: Track token usage and processing performance |
| 128 | + |
| 129 | +### Performance Monitoring |
| 130 | +- Monitor token consumption in LLM processing stages |
| 131 | +- Watch for memory usage improvements during concurrent processing |
| 132 | +- Track overall document processing times |
| 133 | + |
| 134 | +## Troubleshooting |
| 135 | + |
| 136 | +### OCR Quality Issues |
| 137 | +- **Text unclear**: Increase image dimensions or check source document quality |
| 138 | +- **Tables misaligned**: Try 1600×1200 or higher for complex layouts |
| 139 | +- **Handwriting errors**: Use maximum recommended sizing (1600×1200) |
| 140 | + |
| 141 | +### Performance Issues |
| 142 | +- **Memory errors**: Ensure sizing limits are applied (a partial configuration with only one dimension disables them) |
| 143 | +- **Slow processing**: Reduce image dimensions if quality permits |
| 144 | +- **High costs**: Monitor and optimize based on use case requirements |
| 145 | + |
| 146 | +## Best Practices Summary |
| 147 | + |
| 148 | +1. **Start with defaults**: Let automatic sizing optimize your resource usage |
| 149 | +2. **Measure and adjust**: Monitor token usage and accuracy for your specific documents |
| 150 | +3. **Use case specific**: Different document types may benefit from different sizing |
| 151 | +4. **Test thoroughly**: Validate OCR accuracy with your specific document samples |
| 152 | +5. **Monitor costs**: Track token consumption impact of sizing decisions |
| 153 | + |
| 154 | +## Technical Details |
| 155 | + |
| 156 | +### How Defaults Work |
| 157 | +- Applied when both `target_width` and `target_height` are unspecified, `None`, or empty strings |
| 158 | +- Falls back to defaults when invalid values are provided |
| 159 | +- Partial configurations (only width OR height) disable defaults to preserve existing behavior |
| 160 | + |
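The rules above can be sketched as a small resolver. This is a minimal illustration, not the service's actual code: `resolve_target_size`, `_is_unset`, and the logger name are hypothetical, and treating non-positive numbers as invalid is an assumption.

```python
import logging
from typing import Optional, Tuple

logger = logging.getLogger("ocr.sizing")  # hypothetical logger name

DEFAULT_WIDTH, DEFAULT_HEIGHT = 951, 1268  # defaults stated in this guide


def _is_unset(value) -> bool:
    """None and empty (or whitespace-only) strings both count as 'not configured'."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def resolve_target_size(width=None, height=None) -> Optional[Tuple[int, int]]:
    """Return (width, height) limits, or None when no size limit applies."""
    width_unset, height_unset = _is_unset(width), _is_unset(height)
    if width_unset and height_unset:
        # Nothing configured: apply the automatic defaults
        return DEFAULT_WIDTH, DEFAULT_HEIGHT
    if width_unset or height_unset:
        # Partial configuration disables defaults (preserves existing behavior)
        return None
    try:
        w, h = int(width), int(height)
        if w <= 0 or h <= 0:  # assumption: non-positive sizes are invalid
            raise ValueError("dimensions must be positive")
    except (TypeError, ValueError):
        logger.warning(
            "Invalid resize configuration values: width=%s, height=%s. "
            "Falling back to defaults: %sx%s",
            width, height, DEFAULT_WIDTH, DEFAULT_HEIGHT,
        )
        return DEFAULT_WIDTH, DEFAULT_HEIGHT
    return w, h
```

Returning `None` for the "no limits" case keeps it distinct from any real pair of dimensions, so callers can skip resizing entirely.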
| 161 | +### Memory Optimization |
| 162 | +- Images are extracted at target size to prevent memory issues |
| 163 | +- Concurrent processing optimized to avoid OutOfMemory errors |
| 164 | +- Aggressive memory cleanup after each page processing |
| 165 | + |
| 166 | +### Aspect Ratio Preservation |
| 167 | +- All resizing preserves original aspect ratio |
| 168 | +- Never upscales images (quality would not improve) |
| 169 | +- Scales each image down only as far as needed to fit within the target dimensions |
| 170 | + |
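One common way to implement this behavior is to take the smaller of the two axis scale factors and cap it at 1.0. The sketch below assumes that approach; `fit_within` is a hypothetical helper, and the service's real resizing code may differ.

```python
from typing import Tuple


def fit_within(width: int, height: int, max_width: int, max_height: int) -> Tuple[int, int]:
    """Scale (width, height) to fit inside the limits, keeping the aspect ratio.

    The scale factor is capped at 1.0, so images are never upscaled.
    """
    if min(width, height, max_width, max_height) <= 0:
        raise ValueError("all dimensions must be positive")
    scale = min(max_width / width, max_height / height, 1.0)
    # Round to whole pixels and keep each side at least 1 pixel
    return max(1, round(width * scale)), max(1, round(height * scale))


# A US Letter page rendered at 300 DPI is 2550x3300 pixels
print(fit_within(2550, 3300, 951, 1268))
# An image already inside the limits is returned unchanged
print(fit_within(600, 800, 951, 1268))
```

Because the same factor is applied to both axes, the limiting dimension lands exactly on its target while the other one ends up at or below its limit.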
| 171 | +## Logging and Monitoring |
| 172 | + |
| 173 | +### Configuration Visibility |
| 174 | +``` |
| 175 | +INFO OCR Service initialized - DPI: 150, Image sizing: 1600x1200 |
| 176 | +``` |
| 177 | + |
| 178 | +### Default Application |
| 179 | +``` |
| 180 | +INFO No image sizing configured, applying default limits: 951x1268 to optimize resource usage and token consumption |
| 181 | +``` |
| 182 | + |
| 183 | +### Explicit Configuration |
| 184 | +``` |
| 185 | +INFO Using configured image sizing: 800x600 |
| 186 | +``` |
| 187 | + |
| 188 | +### Error Handling |
| 189 | +``` |
| 190 | +WARNING Invalid resize configuration values: width=abc, height=xyz. Falling back to defaults: 951x1268 |
| 191 | +``` |
| 192 | + |
| 193 | +These log messages show which sizing strategy is in effect and help you troubleshoot configuration issues. |
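
For monitoring, the messages above can be classified with a small parser. This is a sketch that assumes the log text matches the samples in this section; `sizing_from_log_line` and the category labels are hypothetical and would need updating if the service's wording changes.

```python
import re
from typing import Optional, Tuple

# Matches dimensions written as WIDTHxHEIGHT, e.g. "800x600"
SIZE_PATTERN = re.compile(r"(?P<width>\d+)x(?P<height>\d+)")


def sizing_from_log_line(line: str) -> Optional[Tuple[str, int, int]]:
    """Return (kind, width, height) for a sizing log line, or None if no size is found."""
    match = SIZE_PATTERN.search(line)
    if match is None:
        return None
    if "Falling back to defaults" in line:
        kind = "fallback"   # invalid configuration values were replaced
    elif "applying default limits" in line:
        kind = "default"    # nothing was configured
    elif "Using configured image sizing" in line:
        kind = "explicit"   # user-provided dimensions
    else:
        kind = "other"      # e.g. the service initialization message
    return kind, int(match.group("width")), int(match.group("height"))


print(sizing_from_log_line("INFO Using configured image sizing: 800x600"))
```

Counting the `fallback` results over time is a quick way to spot deployments running with invalid sizing values.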