Commit d7784f5

Merge branch 'fix/pdf-resolution' into 'develop'
Fix/pdf resolution
See merge request genaiic-reusable-assets/engagement-artifacts/genaiic-idp-accelerator!228
2 parents af49113 + e18ef6f commit d7784f5

21 files changed: +351 -199 lines changed

CHANGELOG.md

Lines changed: 10 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -14,6 +14,11 @@ SPDX-License-Identifier: MIT-0
1414
- Fixed view toggle behavior - switching between views no longer closes the viewer window
1515
- Reordered view buttons to: Markdown View, Text Confidence View, Text View for better user experience
1616

17+
- **Enhanced OCR DPI Configuration for PDF files**
18+
- DPI for PDF image conversion is now configurable in the configuration editor under OCR image processing settings
19+
- Default DPI increased from 96 to 150 for better out-of-the-box image quality and OCR accuracy
20+
- Configurable through Web UI without requiring code changes or redeployment
21+
1722
### Changed
1823
- **Converted text confidence data format from JSON to markdown table for improved readability and reduced token usage**
1924
- Removed unnecessary "page_count" field
@@ -26,8 +31,13 @@ SPDX-License-Identifier: MIT-0
2631
- Aligned with classification service pattern for better consistency across IDP services
2732
- Backward compatibility maintained - old parameter pattern still supported with deprecation warning
2833
- Updated all lambda functions and notebooks to use new simplified pattern
34+
- Removed fixed image `target_height` and `target_width` from default configurations, so images are processed at their original resolution by default
35+
2936

3037
### Fixed
38+
- **Fixed Image Resizing Behavior for High-Resolution Documents**
39+
- Fixed issue where empty strings in image configuration caused images to be resized to the default 951x1268 pixels instead of preserving original resolution
40+
- Empty strings (`""`) in `target_width` and `target_height` configuration now preserve original document resolution for maximum processing accuracy
3141
- Fixed issue where PNG files were being unnecessarily converted to JPEG format and resized to a lower resolution, losing quality
3242
- Fixed issue where PNG and JPG image files were not rendering inline in the Document Details page
3343
- Fixed issue where PDF files were being downloaded instead of displayed inline
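
The empty-string fix described in the changelog entries above can be sketched as follows. This is a minimal illustration, not the code from this commit: the helper name `parse_target_size` is hypothetical, and it assumes the image settings arrive as a plain dict whose values may be strings, integers, or missing.

```python
from typing import Optional, Tuple


def parse_target_size(image_config: dict) -> Optional[Tuple[int, int]]:
    """Return (width, height), or None when the image should keep its original size.

    Hypothetical helper: empty strings and missing keys both mean "do not resize".
    """

    def to_int(value) -> Optional[int]:
        # Treat None, "" and whitespace-only strings as "not configured".
        if value is None:
            return None
        text = str(value).strip()
        if not text:
            return None
        return int(text)

    width = to_int(image_config.get("target_width"))
    height = to_int(image_config.get("target_height"))
    # Resize only when both dimensions are explicitly configured.
    if width is None or height is None:
        return None
    return width, height
```

With this shape, `{"target_width": "", "target_height": ""}` yields `None` (no resizing), while `{"target_width": "951", "target_height": "1268"}` yields `(951, 1268)`. The important design point is that "not configured" is represented explicitly rather than falling back to a hard-coded default.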

VERSION

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
1-
0.3.8-wip4
1+
0.3.8-wip6

config_library/pattern-2/bank-statement-sample/config.yaml

Lines changed: 9 additions & 8 deletions
@@ -10,8 +10,9 @@ ocr:
1010
features:
1111
- name: LAYOUT
1212
image:
13-
target_width: '951'
14-
target_height: '1268'
13+
dpi: '150'
14+
target_width: ''
15+
target_height: ''
1516
classes:
1617
- name: Bank Statement
1718
description: Monthly bank account statement
@@ -68,8 +69,8 @@ classes:
6869
attributeType: list
6970
classification:
7071
image:
71-
target_height: '1268'
72-
target_width: '951'
72+
target_height: ''
73+
target_width: ''
7374
top_p: '0.1'
7475
max_tokens: '4096'
7576
top_k: '5'
@@ -210,8 +211,8 @@ classification:
210211
classificationMethod: textbasedHolisticClassification
211212
extraction:
212213
image:
213-
target_height: '1268'
214-
target_width: '951'
214+
target_height: ''
215+
target_width: ''
215216
top_p: '0.1'
216217
max_tokens: '10000'
217218
top_k: '5'
@@ -368,8 +369,8 @@ summarization:
368369
You are a document summarization expert who can analyze and summarize documents from various domains including medical, financial, legal, and general business documents. Your task is to create a summary that captures the key information, main points, and important details from the document. Your output must be in valid JSON format. \nSummarization Style: Balanced\\nCreate a balanced summary that provides a moderate level of detail. Include the main points and key supporting information, while maintaining the document's overall structure. Aim for a comprehensive yet concise summary.\n Your output MUST be in valid JSON format with markdown content. You MUST strictly adhere to the output format specified in the instructions.
369370
assessment:
370371
image:
371-
target_height: '1268'
372-
target_width: '951'
372+
target_height: ''
373+
target_width: ''
373374
granular:
374375
enabled: true
375376
max_workers: "20"
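
To give a sense of what the new `dpi: '150'` setting under `ocr.image` means in pixels, here is a rough arithmetic sketch. It assumes a US Letter page (8.5 x 11 inches) and simple rounding; the function name `page_pixels` is illustrative, and the actual PDF renderer used by the project may round differently.

```python
from typing import Tuple


def page_pixels(width_in: float, height_in: float, dpi: int) -> Tuple[int, int]:
    """Pixel size of a page rendered at the given DPI (pixels = inches * dpi)."""
    return round(width_in * dpi), round(height_in * dpi)


LETTER = (8.5, 11.0)  # assumed page size in inches

old = page_pixels(*LETTER, dpi=96)         # previous default
new = page_pixels(*LETTER, dpi=int("150")) # config values are strings, e.g. '150'

# Ratio of total pixel counts between the two renderings.
pixel_ratio = (new[0] * new[1]) / (old[0] * old[1])
```

Under these assumptions a Letter page goes from 816x1056 pixels at 96 DPI to 1275x1650 pixels at 150 DPI, roughly 2.44 times as many pixels. With `target_width` and `target_height` left empty, that rendered size is what downstream services receive.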

config_library/pattern-2/default/config.yaml

Lines changed: 9 additions & 8 deletions
@@ -12,8 +12,9 @@ ocr:
1212
- name: TABLES
1313
- name: SIGNATURES
1414
image:
15-
target_width: '951'
16-
target_height: '1268'
15+
dpi: '150'
16+
target_width: ''
17+
target_height: ''
1718
classes:
1819
- name: letter
1920
description: A formal written correspondence with sender/recipient addresses, date, salutation, body, and closing signature
@@ -308,8 +309,8 @@ classes:
308309
description: Additional notes or remarks about the document. Look for sections labeled 'notes', 'remarks', or 'comments'.
309310
classification:
310311
image:
311-
target_height: '1268'
312-
target_width: '951'
312+
target_height: ''
313+
target_width: ''
313314
top_p: '0.1'
314315
max_tokens: '4096'
315316
top_k: '5'
@@ -450,8 +451,8 @@ classification:
450451
classificationMethod: textbasedHolisticClassification
451452
extraction:
452453
image:
453-
target_width: '951'
454-
target_height: '1268'
454+
target_width: ''
455+
target_height: ''
455456
top_p: '0.1'
456457
max_tokens: '10000'
457458
top_k: '5'
@@ -608,8 +609,8 @@ summarization:
608609
You are a document summarization expert who can analyze and summarize documents from various domains including medical, financial, legal, and general business documents. Your task is to create a summary that captures the key information, main points, and important details from the document. Your output must be in valid JSON format. \nSummarization Style: Balanced\\nCreate a balanced summary that provides a moderate level of detail. Include the main points and key supporting information, while maintaining the document's overall structure. Aim for a comprehensive yet concise summary.\n Your output MUST be in valid JSON format with markdown content. You MUST strictly adhere to the output format specified in the instructions.
609610
assessment:
610611
image:
611-
target_height: '1268'
612-
target_width: '951'
612+
target_height: ''
613+
target_width: ''
613614
granular:
614615
enabled: true
615616
max_workers: "20"

config_library/pattern-2/few_shot_example_with_multimodal_page_classification/config.yaml

Lines changed: 9 additions & 8 deletions
@@ -12,8 +12,9 @@ ocr:
1212
- name: TABLES
1313
- name: SIGNATURES
1414
image:
15-
target_width: '951'
16-
target_height: '1268'
15+
dpi: '150'
16+
target_width: ''
17+
target_height: ''
1718
classes:
1819
- name: letter
1920
description: >-
@@ -647,8 +648,8 @@ classes:
647648

648649
classification:
649650
image:
650-
target_height: '1268'
651-
target_width: '951'
651+
target_height: ''
652+
target_width: ''
652653
classificationMethod: multimodalPageLevelClassification
653654
model: us.amazon.nova-pro-v1:0
654655
temperature: '0.0'
@@ -710,8 +711,8 @@ classification:
710711
{{"class": "letter"}}
711712
extraction:
712713
image:
713-
target_height: '1268'
714-
target_width: '951'
714+
target_height: ''
715+
target_width: ''
715716
model: us.amazon.nova-pro-v1:0
716717
temperature: '0.0'
717718
top_p: '0.1'
@@ -895,8 +896,8 @@ pricing:
895896
price: '3.75E-6'
896897
assessment:
897898
image:
898-
target_height: '1268'
899-
target_width: '951'
899+
target_height: ''
900+
target_width: ''
900901
granular:
901902
enabled: true
902903
max_workers: "20"

config_library/pattern-3/default/config.yaml

Lines changed: 7 additions & 6 deletions
@@ -12,8 +12,9 @@ ocr:
1212
- name: TABLES
1313
- name: SIGNATURES
1414
image:
15-
target_width: '951'
16-
target_height: '1268'
15+
dpi: '150'
16+
target_width: ''
17+
target_height: ''
1718
classes:
1819
- name: letter
1920
description: A formal written correspondence with sender/recipient addresses, date, salutation, body, and closing signature
@@ -310,8 +311,8 @@ classification:
310311
model: Custom fine tuned UDOP model
311312
extraction:
312313
image:
313-
target_width: '951'
314-
target_height: '1268'
314+
target_width: ''
315+
target_height: ''
315316
top_p: '0.1'
316317
max_tokens: '10000'
317318
top_k: '5'
@@ -468,8 +469,8 @@ summarization:
468469
You are a document summarization expert who can analyze and summarize documents from various domains including medical, financial, legal, and general business documents. Your task is to create a summary that captures the key information, main points, and important details from the document. Your output must be in valid JSON format. \nSummarization Style: Balanced\\nCreate a balanced summary that provides a moderate level of detail. Include the main points and key supporting information, while maintaining the document's overall structure. Aim for a comprehensive yet concise summary.\n Your output MUST be in valid JSON format with markdown content. You MUST strictly adhere to the output format specified in the instructions.
469470
assessment:
470471
image:
471-
target_width: '951'
472-
target_height: '1268'
472+
target_width: ''
473+
target_height: ''
473474
granular:
474475
enabled: true
475476
max_workers: "20"

docs/assessment.md

Lines changed: 41 additions & 14 deletions
@@ -536,49 +536,76 @@ StateTaxes[0]:
536536
537537
The assessment service supports configurable image dimensions for optimal confidence evaluation:
538538
539-
### Default Configuration
539+
### New Default Behavior (Preserves Original Resolution)
540+
541+
**Important Change**: Empty strings or unspecified image dimensions now preserve the original document resolution for maximum assessment accuracy:
540542
541543
```yaml
542544
assessment:
543545
model: "anthropic.claude-3-5-sonnet-20241022-v2:0"
544-
# Image processing settings
546+
# Image processing settings - preserves original resolution
545547
image:
546-
target_width: 951 # Default width in pixels
547-
target_height: 1268 # Default height in pixels
548+
target_width: "" # Empty string = no resizing (recommended)
549+
target_height: "" # Empty string = no resizing (recommended)
548550
```
549551

550552
### Custom Image Dimensions
551553

552-
Configure image dimensions based on assessment requirements:
554+
Configure specific dimensions when performance optimization is needed:
553555

554556
```yaml
555-
# For detailed visual assessment
557+
# For detailed visual assessment with controlled dimensions
556558
assessment:
557559
image:
558-
target_width: 1200
559-
target_height: 1600
560+
target_width: "1200" # Resize to 1200 pixels wide
561+
target_height: "1600" # Resize to 1600 pixels tall
560562

561563
# For standard confidence evaluation
562564
assessment:
563565
image:
564-
target_width: 800
565-
target_height: 1000
566+
target_width: "800" # Smaller for faster processing
567+
target_height: "1000" # Maintains good quality
566568
```
567569
568570
### Image Resizing Features for Assessment
569571
570-
- **Aspect Ratio Preservation**: Images maintain proportions for accurate visual analysis
572+
- **Original Resolution Preservation**: Empty strings preserve full document resolution for maximum assessment accuracy
573+
- **Aspect Ratio Preservation**: Images maintain proportions for accurate visual analysis when dimensions are specified
571574
- **Smart Scaling**: Only downsizes when necessary to preserve visual detail
572575
- **High-Quality Resampling**: Better image quality for confidence assessment
573-
- **Performance Optimization**: Optimized images reduce assessment processing time
576+
- **Performance Optimization**: Configurable dimensions allow balancing accuracy vs. speed
574577
575578
### Configuration Benefits for Assessment
576579
577-
- **Enhanced Visual Analysis**: Appropriate resolution improves confidence evaluation accuracy
580+
- **Maximum Assessment Accuracy**: Empty strings preserve full document resolution for best confidence evaluation
581+
- **Enhanced Visual Analysis**: Original resolution improves confidence evaluation accuracy
578582
- **Better OCR Verification**: Higher quality images help verify extraction results against visual content
579583
- **Improved Confidence Scoring**: Better image quality leads to more accurate confidence assessments
580584
- **Service-Specific Tuning**: Optimize image dimensions for different assessment complexity levels
581-
- **Resource Optimization**: Balance assessment quality and processing costs
585+
- **Resource Optimization**: Choose between accuracy (original resolution) and performance (smaller dimensions)
586+
587+
### Migration from Previous Versions
588+
589+
**Previous Behavior**: Empty strings defaulted to 951x1268 pixel resizing
590+
**New Behavior**: Empty strings preserve original image resolution
591+
592+
If you were relying on the previous default resizing behavior, explicitly set dimensions:
593+
594+
```yaml
595+
# To maintain previous default behavior
596+
assessment:
597+
image:
598+
target_width: "951"
599+
target_height: "1268"
600+
```
601+
602+
### Best Practices for Assessment
603+
604+
1. **Use Empty Strings for High Accuracy**: For critical confidence assessment, use empty strings to preserve original resolution
605+
2. **Consider Assessment Complexity**: Complex documents with fine details benefit from higher resolution
606+
3. **Test Assessment Quality**: Evaluate confidence assessment accuracy with your specific document types
607+
4. **Monitor Resource Usage**: Higher resolution images consume more memory and processing time
608+
5. **Balance Accuracy vs Performance**: Choose appropriate settings based on your assessment requirements and processing volume
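
The "Aspect Ratio Preservation" and "Smart Scaling" features listed above can be illustrated with a small sketch. It assumes the target dimensions act as a bounding box and that the scale factor is the smaller of the two axis ratios; the function name `fit_within` is hypothetical and the project's actual resize routine may differ.

```python
from typing import Tuple


def fit_within(width: int, height: int,
               target_width: int, target_height: int) -> Tuple[int, int]:
    """Compute output dimensions that fit inside the target box.

    The aspect ratio is preserved, and the image is never enlarged:
    a scale factor of 1.0 or more leaves the original size untouched.
    """
    scale = min(target_width / width, target_height / height)
    if scale >= 1.0:
        return width, height
    # round() avoids off-by-one results from floating-point error.
    return max(1, round(width * scale)), max(1, round(height * scale))
```

For example, under these assumptions a 1275x1650 page fitted into the former 951x1268 default becomes 951x1231, while an 800x600 image is returned unchanged because it already fits.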
582609
583610
## Granular Assessment
584611

docs/classification.md

Lines changed: 41 additions & 15 deletions
@@ -341,49 +341,75 @@ For comprehensive details on configuring few-shot examples, including multimodal
341341

342342
The classification service supports configurable image dimensions for optimal performance and quality:
343343

344-
### Default Configuration
344+
### New Default Behavior (Preserves Original Resolution)
345+
346+
**Important Change**: Empty strings or unspecified image dimensions now preserve the original document resolution for maximum classification accuracy:
345347

346348
```yaml
347349
classification:
348350
model: us.amazon.nova-pro-v1:0
349-
# Image processing settings
351+
# Image processing settings - preserves original resolution
350352
image:
351-
target_width: 951 # Default width in pixels
352-
target_height: 1268 # Default height in pixels
353+
target_width: "" # Empty string = no resizing (recommended)
354+
target_height: "" # Empty string = no resizing (recommended)
353355
```
354356

355357
### Custom Image Dimensions
356358

357-
Configure image dimensions based on your specific requirements:
359+
Configure specific dimensions when performance optimization is needed:
358360

359361
```yaml
360-
# For high-accuracy classification
362+
# For high-accuracy classification with controlled dimensions
361363
classification:
362364
image:
363-
target_width: 1200
364-
target_height: 1600
365+
target_width: "1200" # Resize to 1200 pixels wide
366+
target_height: "1600" # Resize to 1600 pixels tall
365367
366368
# For fast processing with lower resolution
367369
classification:
368370
image:
369-
target_width: 600
370-
target_height: 800
371+
target_width: "600" # Smaller for faster processing
372+
target_height: "800" # Maintains reasonable quality
371373
```
372374

373375
### Image Resizing Features
374376

375-
- **Aspect Ratio Preservation**: Images are resized proportionally without distortion
377+
- **Original Resolution Preservation**: Empty strings preserve full document resolution for maximum accuracy
378+
- **Aspect Ratio Preservation**: Images are resized proportionally without distortion when dimensions are specified
376379
- **Smart Scaling**: Only downsizes images when necessary (scale factor < 1.0)
377380
- **High-Quality Resampling**: Better visual quality after resizing
378-
- **Performance Optimization**: Smaller, optimized images process faster with lower memory usage
381+
- **Performance Optimization**: Configurable dimensions allow balancing accuracy vs. speed
379382

380383
### Configuration Benefits
381384

385+
- **Maximum Classification Accuracy**: Empty strings preserve full document resolution for best results
382386
- **Service-Specific Tuning**: Each service can use optimal image dimensions
383387
- **Runtime Configuration**: No code changes needed to adjust image processing
384-
- **Backward Compatibility**: Default values maintain existing behavior
385-
- **Memory Optimization**: Configurable dimensions allow memory optimization
386-
- **Better Resource Utilization**: Service-specific sizing reduces unnecessary processing
388+
- **Backward Compatibility**: Existing numeric values continue to work as before
389+
- **Memory Optimization**: Configurable dimensions allow resource optimization
390+
- **Better Resource Utilization**: Choose between accuracy (original resolution) and performance (smaller dimensions)
391+
392+
### Migration from Previous Versions
393+
394+
**Previous Behavior**: Empty strings defaulted to 951x1268 pixel resizing
395+
**New Behavior**: Empty strings preserve original image resolution
396+
397+
If you were relying on the previous default resizing behavior, explicitly set dimensions:
398+
399+
```yaml
400+
# To maintain previous default behavior
401+
classification:
402+
image:
403+
target_width: "951"
404+
target_height: "1268"
405+
```
406+
407+
### Best Practices for Classification
408+
409+
1. **Use Empty Strings for High Accuracy**: For critical document classification, use empty strings to preserve original resolution
410+
2. **Consider Document Types**: Complex layouts benefit from higher resolution, simple text documents may work well with smaller dimensions
411+
3. **Test Performance Impact**: Higher resolution images provide better accuracy but consume more resources
412+
4. **Monitor Processing Time**: Balance classification accuracy with processing speed based on your requirements
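
When migrating, it can help to list which sections of an existing configuration will now run at original resolution. The following is a hedged sketch of such a check: the function name and the `SECTIONS` tuple are illustrative, and it assumes the configuration has already been loaded into a dict with the same section names as the sample `config.yaml` files in this commit.

```python
from typing import Dict, List

# Sections that carry an `image` block in the sample configurations.
SECTIONS = ("ocr", "classification", "extraction", "assessment")


def sections_using_original_resolution(config: Dict) -> List[str]:
    """Return the sections whose images will no longer be resized.

    A section is reported when it exists and either dimension is empty or unset.
    """
    affected = []
    for name in SECTIONS:
        if name not in config:
            continue
        image = (config.get(name) or {}).get("image") or {}
        width = str(image.get("target_width", "") or "").strip()
        height = str(image.get("target_height", "") or "").strip()
        if not width or not height:
            affected.append(name)
    return affected
```

Any section this reports previously fell back to 951x1268 resizing; to keep that behavior, set `target_width: "951"` and `target_height: "1268"` explicitly for that section.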
387413

388414
## JSON and YAML Output Support
389415
