Release v0.3.4 · aws-solutions-library-samples/accelerated-intelligent-document-processing-on-aws

[0.3.4]

Added

Configurable Image Processing and Enhanced Resizing Logic
- Improved Image Resizing Algorithm: Enhanced aspect-ratio preserving scaling that only downsizes when necessary (scale factor < 1.0) to prevent image distortion
- Configurable Image Dimensions: All processing services (Assessment, Classification, Extraction, OCR) now support configurable image dimensions through configuration with default 951×1268 resolution
- Service-Specific Image Optimization: Each service can use optimal image dimensions for performance and quality tuning
- Enhanced OCR Service: Added configurable DPI for PDF-to-image conversion (default: 300) and optional image resizing with dual image strategy (stores original high-DPI images while using resized images for processing)
- Runtime Configuration: No code changes needed to adjust image processing - all configurable through service configuration
- Backward Compatibility: Default values maintain existing behavior with no immediate action required for existing deployments
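
The downsize-only rule described above can be sketched as follows. This is an illustrative sketch, not the project's implementation: the function name `fit_within` is hypothetical, and it assumes the 951×1268 default is width × height.

```python
from typing import Tuple


def fit_within(
    width: int,
    height: int,
    max_width: int = 951,    # assumption: default is width x height
    max_height: int = 1268,
) -> Tuple[int, int]:
    """Return dimensions that fit the target box while preserving aspect ratio."""
    # A single scale factor for both axes keeps the aspect ratio intact.
    scale = min(max_width / width, max_height / height)
    if scale >= 1.0:
        # The image already fits: never upscale.
        return width, height
    return max(1, round(width * scale)), max(1, round(height * scale))


# A 300 DPI US Letter page (2550x3300) is scaled down to fit the box.
print(fit_within(2550, 3300))
# A small image is returned unchanged.
print(fit_within(800, 600))
```

Using the smaller of the two per-axis ratios guarantees that neither dimension exceeds its limit.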
Enhanced Configuration Management
- Save as Default: New button to save current configuration as the new default baseline with confirmation modal and version upgrade warnings
- Export Configuration: Export current configuration to local files in JSON or YAML format with customizable filename
- Import Configuration: Import configuration from local JSON or YAML files with automatic format detection and validation
- Enhanced Lambda resolver with deep merge functionality for proper default configuration updates
- Automatic custom configuration reset when saving as default to maintain clean state
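
A deep merge of the kind mentioned for the Lambda resolver might look like the sketch below. The function and the configuration keys in the example are hypothetical and only illustrate the idea: nested dictionaries are merged key by key rather than replaced wholesale.

```python
from copy import deepcopy
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], updates: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict with `updates` merged recursively into `base`."""
    merged = deepcopy(base)
    for key, value in updates.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dicts: merge them instead of overwriting.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars, lists, and type mismatches: the update wins.
            merged[key] = deepcopy(value)
    return merged


# Hypothetical configuration keys, for illustration only.
default = {"ocr": {"image": {"dpi": 300}}, "extraction": {"model": "model-a"}}
update = {"ocr": {"image": {"target_width": 951}}}
print(deep_merge(default, update))
```

With a shallow merge, the `ocr` entry from `update` would replace the whole default `ocr` section and drop the `dpi` setting. The recursive version keeps it.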
Nested Attribute Groups and Lists Support
- Enhanced document configuration schema to support complex nested attribute structures with three attribute types:
  - Simple attributes: Single-value extractions (existing behavior)
  - Group attributes: Nested object structures with sub-attributes (e.g., address with street, city, state)
  - List attributes: Arrays with item templates containing multiple attributes per item (e.g., transactions with date, amount, description)
- Web UI Enhancements: Configuration editor now supports viewing and editing nested attribute structures with proper validation
- Extraction Service Updates: Enhanced {ATTRIBUTE_NAMES_AND_DESCRIPTIONS} placeholder processing to generate formatted prompts for nested structures
- Assessment Service Enhancements: Added support for nested structure confidence evaluation with recursive processing of group and list attributes, including proper confidence threshold application from configuration
- Evaluation Service Improvements:
  - Implemented pattern matching for list attributes (e.g., Transactions[].Date maps to Transactions[0].Date, Transactions[1].Date)
  - Added data flattening for complex extraction results using dot notation and array indices
  - Fixed numerical sorting for list items (now sorts 0, 1, 2, ..., 10, 11 instead of alphabetically)
  - Individual evaluation methods applied per nested attribute (EXACT, FUZZY, SEMANTIC, etc.)
- Documentation: Comprehensive updates to evaluation docs and README files with nested structure examples and processing explanations
- Use Cases: Enables complex document processing for bank statements (account details + transactions), invoices (vendor info + line items), and medical records (patient info + procedures)
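
The flattening, list-pattern matching, and numerical ordering described for the Evaluation Service can be illustrated with a minimal sketch. The helper names (`flatten`, `matching_keys`, `index_key`) and the sample data are hypothetical. The actual service may differ in details such as how empty lists are handled.

```python
import re
from typing import Any, Dict, List


def flatten(value: Any, prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts and lists into dot notation with array indices."""
    flat: Dict[str, Any] = {}
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            flat.update(flatten(child, path))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            flat.update(flatten(child, f"{prefix}[{index}]"))
    else:
        flat[prefix] = value
    return flat


def index_key(path: str) -> List[int]:
    """Sort key: the array indices in a path, compared as integers."""
    return [int(i) for i in re.findall(r"\[(\d+)\]", path)]


def matching_keys(pattern: str, flat: Dict[str, Any]) -> List[str]:
    """Expand a pattern such as 'Transactions[].Date' to the concrete keys."""
    # Escape the pattern, then let each "[]" match any numeric index.
    regex = re.compile(re.escape(pattern).replace(re.escape("[]"), r"\[\d+\]"))
    hits = [key for key in flat if regex.fullmatch(key)]
    return sorted(hits, key=index_key)  # numeric order: 0, 1, 2, ..., 10, 11


# Illustrative extraction result with one group and one list attribute.
result = {
    "Account": {"Number": "123"},
    "Transactions": [{"Date": f"d{i}", "Amount": i} for i in range(12)],
}
flat = flatten(result)
print(flat["Account.Number"], flat["Transactions[10].Date"])
print(matching_keys("Transactions[].Date", flat)[:3])
```

Sorting on the integer indices rather than on the key strings is what keeps `Transactions[2]` ahead of `Transactions[10]`.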
Enhanced Documentation and Examples
- New example notebooks with improved clarity, modularity, and documentation
Evaluation Framework Enhancements
- Added confidence threshold to evaluation outputs to enable prioritizing accuracy results for attributes with higher confidence thresholds
Comprehensive Metering Data Collection
- The system now captures and stores detailed metering data for analytics, including:
  - Which services were used (Textract, Bedrock, etc.)
  - What operations were performed (analyze_document, Claude, etc.)
  - How many resources were consumed (pages, tokens, etc.)
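
One way to picture such metering data is a mapping from service and operation to consumed units, accumulated across processing steps. The record shape, the `service/operation` key format, and the `merge_metering` helper below are assumptions made for illustration. They are not taken from the project.

```python
from collections import defaultdict
from typing import Dict

# Hypothetical shape: {"service/operation": {"unit": amount}}
Metering = Dict[str, Dict[str, float]]


def merge_metering(total: Metering, update: Metering) -> Metering:
    """Add the unit counts from `update` to `total`, returning a new mapping."""
    merged: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for source in (total, update):
        for operation, units in source.items():
            for unit, amount in units.items():
                merged[operation][unit] += amount
    return {operation: dict(units) for operation, units in merged.items()}


# Illustrative values only.
ocr_step = {"textract/analyze_document": {"pages": 3}}
extraction_step = {"bedrock/claude": {"input_tokens": 1200, "output_tokens": 150}}
retry_step = {"bedrock/claude": {"input_tokens": 300, "output_tokens": 50}}

totals = merge_metering(merge_metering(ocr_step, extraction_step), retry_step)
print(totals)
```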
Reporting Database Documentation
- Added comprehensive reporting database documentation

Changed

Pin packages to tested versions to avoid vulnerabilities introduced by incompatible new package versions.
Update reporting data to use the document's queued_time for consistent timestamps.
Create new extensible SaveReportingData class in idp_common package for saving evaluation results to Parquet format.
Remove save_to_reporting from evaluation_function and replace it with a Lambda invocation, for smaller Lambda packages and better modularity.
Harden publish process and avoid package version bloat by purging previous build artifacts before rebuilding.

Fixed

Defend against non-numeric confidence_threshold values in the configuration - avoid float conversion or numeric comparison exceptions in Assessment step
Prevent creation of empty configuration fields in UI
Firefox browser issues with signed URLs (PR #14)
Improved S3 Partition Key Format for Better Date Range Filtering:
- Updated reporting data partition keys to use YYYY-MM format for month and YYYY-MM-DD format for day
- Enables easier date range filtering in analytics queries across different months and years
- Partition structure now: year=2024/month=2024-03/day=2024-03-15/ instead of year=2024/month=03/day=15/
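
The new partition layout can be produced from a timestamp as in the sketch below. The function name `partition_prefix` is hypothetical, and using the document's queued time as the input is an assumption based on the reporting change listed above.

```python
from datetime import datetime


def partition_prefix(queued_time: datetime) -> str:
    """Build the year/month/day partition prefix in the new key format."""
    return (
        f"year={queued_time:%Y}/"
        f"month={queued_time:%Y-%m}/"
        f"day={queued_time:%Y-%m-%d}/"
    )


print(partition_prefix(datetime(2024, 3, 15)))
# year=2024/month=2024-03/day=2024-03-15/
```

Because the month and day values now carry the year and are zero-padded, comparing them as strings matches chronological order. A range filter such as `month >= '2023-11' AND month <= '2024-02'` therefore works across a year boundary.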
