
Commit 9302d51

Merge branch 'aws-solutions-library-samples:main' into main
2 parents 31303a9 + 9379cf9 commit 9302d51

98 files changed, +6511 -3542 lines changed


.gitignore

Lines changed: 1 addition & 0 deletions
@@ -19,3 +19,4 @@ __pycache__
 rvl_cdip_*
 notebooks/examples/data
 .idea/
+.dsr/

CHANGELOG.md

Lines changed: 77 additions & 0 deletions
@@ -5,6 +5,83 @@ SPDX-License-Identifier: MIT-0
 
 ## [Unreleased]
 
+### Added
+
+
+
+## [0.3.12]
+
+### Added
+
+- **Custom Prompt Generator Lambda Support for Patterns 2 & 3**
+  - Added `custom_prompt_lambda_arn` configuration field to enable injection of custom business logic into extraction processing
+  - **Key Features**: Lambda interface with all template placeholders (DOCUMENT_TEXT, DOCUMENT_CLASS, ATTRIBUTE_NAMES_AND_DESCRIPTIONS, DOCUMENT_IMAGE), URI-based image handling for JSON serialization, comprehensive error handling with fail-fast behavior, scoped IAM permissions requiring GENAIIDP-* function naming
+  - **Use Cases**: Document type-specific processing rules, integration with external systems for customer configurations, conditional processing based on document content, regulatory compliance and industry-specific requirements
+  - **Demo Resources**: Interactive notebook demonstration (`step3_extraction_with_custom_lambda.ipynb`), SAM deployment template for demo Lambda function, comprehensive documentation and examples in `notebooks/examples/demo-lambda/`
+  - **Benefits**: Custom business logic without core code changes, backward compatible (existing deployments unchanged), robust JSON serialization handling all object types, complete observability with detailed logging
+
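Note on the custom prompt Lambda entry above: a minimal sketch of what such a function might look like. The placeholder names come from the changelog; the event shape (`placeholders`), the response keys (`system_prompt`, `task_prompt`), and the bank-statement rule are assumptions made for illustration, not the accelerator's documented interface.

```python
import json
from typing import Any


def lambda_handler(event: dict[str, Any], context: Any) -> dict[str, str]:
    """Hypothetical custom prompt generator (would be deployed under a GENAIIDP-* name)."""
    # Assumed event shape: {"placeholders": {"DOCUMENT_CLASS": ..., ...}}
    placeholders = event.get("placeholders", {})
    doc_class = placeholders.get("DOCUMENT_CLASS", "")
    doc_text = placeholders.get("DOCUMENT_TEXT", "")
    attributes = placeholders.get("ATTRIBUTE_NAMES_AND_DESCRIPTIONS", "")

    if not doc_class:
        # Fail fast instead of returning a prompt built from incomplete input.
        raise ValueError("DOCUMENT_CLASS placeholder is required")

    # Example of document-type-specific business logic (illustrative rule only).
    extra_rules = ""
    if doc_class.lower() == "bank statement":
        extra_rules = "Report all monetary amounts with two decimal places.\n"

    task_prompt = (
        f"{extra_rules}"
        f"Extract the following attributes from this {doc_class}:\n"
        f"{attributes}\n\n"
        f"<document-text>\n{doc_text}\n</document-text>"
    )
    response = {
        "system_prompt": "You are a document assistant. Respond only with JSON.",
        "task_prompt": task_prompt,
    }
    json.dumps(response)  # raises TypeError if the response is not JSON-serializable
    return response
```

Raising on missing input mirrors the fail-fast behavior the changelog describes; how the caller surfaces that error is outside this sketch.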
+- **Refactored Document Classification Service for Enhanced Boundary Detection**
+  - Consolidated `multimodalPageLevelClassification` and the experimental `multimodalPageBoundaryClassification` (from v0.3.11) into a single enhanced `multimodalPageLevelClassification` method
+  - Implemented BIO-like sequence segmentation with document boundary indicators: "start" (new document) and "continue" (same document)
+  - Automatically segments multi-document packets, even when they contain multiple documents of the same type
+  - Added comprehensive classification guide with method comparisons and best practices
+  - **Benefits**: Simplified codebase with single multimodal classification method, improved handling of complex document packets, maintains backward compatibility
+  - **No Breaking Changes**: Existing configurations work unchanged, no configuration updates required
+
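Note on the boundary-detection entry above: the "start"/"continue" indicators can be turned into document segments with a single pass over the page results. This is a sketch under assumptions, not the service's code: the per-page dict shape and the `Segment` type are hypothetical, and treating a change of class as an implicit boundary is a choice made here for robustness.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """One document inside a packet: its class and 1-based page numbers."""
    doc_class: str
    pages: list[int] = field(default_factory=list)


def segment_pages(page_results: list[dict]) -> list[Segment]:
    """Group per-page classifications into documents using start/continue markers."""
    segments: list[Segment] = []
    for page_number, result in enumerate(page_results, start=1):
        doc_class = result["class"]
        boundary = result.get("document_boundary", "continue")
        starts_new = (
            not segments                              # first page always opens a segment
            or boundary == "start"                    # explicit new-document marker
            or segments[-1].doc_class != doc_class    # assumed: class change implies boundary
        )
        if starts_new:
            segments.append(Segment(doc_class=doc_class))
        segments[-1].pages.append(page_number)
    return segments


packet = [
    {"class": "W2", "document_boundary": "start"},
    {"class": "W2", "document_boundary": "continue"},
    {"class": "W2", "document_boundary": "start"},      # second W2 in the same packet
    {"class": "Payslip", "document_boundary": "continue"},
]
segments = segment_pages(packet)
```

The third page shows why the marker matters: two consecutive documents of the same type would merge into one if only the class label were compared.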
+- **Enhanced A2I Template and Workflow Management**
+  - Enhanced A2I template with improved user interface and clearer instructions for reviewers
+  - Added comprehensive instructions for reviewers in A2I template to guide the review process
+  - Implemented capture of failed review tasks with proper error handling and logging
+  - Added workflow orchestration control to stop processing when reviewer rejects A2I task
+  - Removed automatic A2I task creation when Pattern-1 Bedrock Data Automation (BDA) fails to classify document to appropriate Blueprint
+
+- **Dynamic Cost Calculation for Metering Data**
+  - Added automated unit cost and estimated cost calculation to metering table with new `unit_cost` and `estimated_cost` columns
+  - Dynamic pricing configuration loading from configuration
+  - Enhanced cost analysis capabilities with comprehensive Athena queries for cost tracking, trend analysis, and efficiency metrics
+  - Automatic cost calculation as `estimated_cost = value × unit_cost` for all metering records
+
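Note on the cost calculation entry above: the formula `estimated_cost = value × unit_cost` can be sketched as follows. Only the `unit_cost` and `estimated_cost` column names come from the changelog; the record fields `service_api`, `unit`, and `value`, the pricing lookup key, and the prices themselves are placeholders assumed for this example.

```python
from decimal import Decimal
from typing import Optional

# Hypothetical pricing entries keyed by (service_api, unit); prices are placeholders.
EXAMPLE_PRICING = {
    ("bedrock/example-model", "inputTokens"): Decimal("0.000001"),
    ("bedrock/example-model", "outputTokens"): Decimal("0.000004"),
}


def with_estimated_cost(record: dict, pricing: dict) -> dict:
    """Return a copy of a metering record with unit_cost and estimated_cost added."""
    unit_cost: Optional[Decimal] = pricing.get((record["service_api"], record["unit"]))
    # Decimal avoids binary floating-point drift when many small costs are summed.
    value = Decimal(str(record["value"]))
    estimated = value * unit_cost if unit_cost is not None else None
    return {**record, "unit_cost": unit_cost, "estimated_cost": estimated}


priced = with_estimated_cost(
    {"service_api": "bedrock/example-model", "unit": "inputTokens", "value": 2500},
    EXAMPLE_PRICING,
)
```

Leaving both columns as `None` when no price is configured keeps unknown costs distinguishable from zero costs in later queries.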
+- **Configuration-Based Summarization Control**
+  - Summarization can now be enabled/disabled via configuration file `summarization.enabled` property instead of CloudFormation stack parameter
+  - **Key Benefits**: Runtime control without stack redeployment, zero LLM costs when disabled, simplified state machine architecture, backward compatible defaults
+  - **Implementation**: Always calls SummarizationStep but service skips processing when `enabled: false`
+  - **Cost Optimization**: When disabled, no LLM API calls or S3 operations are performed
+  - **Configuration Example**: Set `summarization.enabled: false` to disable, `enabled: true` to enable (default)
+
+- **Configuration-Based Assessment Control**
+  - Assessment can now be enabled/disabled via configuration file `assessment.enabled` property instead of CloudFormation stack parameter
+  - **Key Benefits**: Runtime control without stack redeployment, zero LLM costs when disabled, simplified state machine architecture, backward compatible defaults
+  - **Implementation**: Always calls AssessmentStep but service skips processing when `enabled: false`
+  - **Cost Optimization**: When disabled, no LLM API calls or S3 operations are performed
+  - **Configuration Example**: Set `assessment.enabled: false` to disable, `enabled: true` to enable (default)
+
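Note on the two configuration-based control entries above: the "always call the step, let the service skip" pattern might look like the sketch below. The function names, the document dict, and the injected `summarize` callable are hypothetical; accepting string values such as `'false'` is an assumption based on the quoted values used elsewhere in the sample configs.

```python
from typing import Callable


def is_enabled(config: dict, section: str) -> bool:
    """Read `<section>.enabled`, defaulting to True when the key is absent."""
    value = config.get(section, {}).get("enabled", True)
    if isinstance(value, str):
        # Assumed: config values may arrive as quoted strings.
        return value.strip().lower() not in {"false", "0", "no"}
    return bool(value)


def run_summarization(document: dict, config: dict,
                      summarize: Callable[[str], str]) -> dict:
    """Step entry point: always invoked, but a no-op when summarization is disabled."""
    if not is_enabled(config, "summarization"):
        # Skip before any model call or storage write, so disabled means zero cost.
        return document
    return {**document, "summary": summarize(document["text"])}
```

Because the default is `True`, a stack that previously disabled the feature through a CloudFormation parameter needs `enabled: false` set explicitly after the update, as the breaking-changes note in this changelog describes.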
+- **New guides for setting up development environments**
+  - EC2-based Linux development environment
+  - macOS development environment
+
+### Removed
+- **CloudFormation Parameters**: Removed `IsSummarizationEnabled` and `IsAssessmentEnabled` parameters from all pattern templates
+- **Related Conditions**: Removed parameter conditions and state machine definition substitutions for both features
+- **Conditional Logic**: Eliminated complex conditional logic from state machine definitions for summarization and assessment steps
+
+### ⚠️ Breaking Changes
+- **Configuration Migration Required**: When updating a stack that previously had `IsSummarizationEnabled` or `IsAssessmentEnabled` set to `false`, these features will now default to `enabled: true` after the update. To maintain the disabled behavior:
+  1. Update your configuration file to set `summarization.enabled: false` and/or `assessment.enabled: false` as needed
+  2. Save the configuration changes immediately after the stack update
+  3. This ensures continued cost optimization by preventing unexpected LLM API calls
+- **Action Required**: Review your current CloudFormation parameter settings before updating and update your configuration accordingly to preserve existing behavior
+
+### Changed
+- **Updated Python Lambda Runtime to 3.13**
+
+### Fixed
+- **Fixed B615 "Unsafe Hugging Face Hub download without revision pinning" security finding in Pattern-3 fine-tuning module** - Added revision pinning to prevent supply chain attacks and ensure reproducible deployments
+- **Fixed CloudWatch Log Group Missing Retention regression**
+- **Security: Cross-Site Scripting (XSS) Vulnerability in FileViewer Component** - Fixed high-risk XSS vulnerability in `src/ui/src/components/document-viewer/FileViewer.jsx` where `innerHTML` was used with user-controlled data
+- **Add permissions boundary support to new Lambda function roles introduced in previous releases**
+- **Fixed OutOfMemory Errors in Pattern-2 OCR Lambda for Large High-Resolution Documents**
+  - **Root Cause**: Processing large PDFs with high-resolution images (7469×9623 pixels) caused memory spikes when 20 concurrent workers each held ~101MB images simultaneously, exceeding the 4GB Lambda memory limit
+  - **Optimal Solution**: Refactored image extraction to render directly at target dimensions using PyMuPDF matrix transformations, completely eliminating oversized image creation
 
 ## [0.3.11]
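Note on the OutOfMemory fix listed under "Fixed" in the changelog: rendering directly at target dimensions with a PyMuPDF matrix can be sketched as below. The function names are hypothetical and this is not the commit's actual implementation; `fitz.Matrix`, `Page.get_pixmap(matrix=...)`, and `Pixmap.tobytes` are existing PyMuPDF calls.

```python
def fit_scale(page_width: float, page_height: float,
              target_width: int, target_height: int) -> float:
    """Return one zoom factor that fits the page inside the target box, keeping aspect ratio."""
    if min(page_width, page_height, target_width, target_height) <= 0:
        raise ValueError("dimensions must be positive")
    return min(target_width / page_width, target_height / page_height)


def render_page_at_target(page, target_width: int, target_height: int) -> bytes:
    """Render a PyMuPDF page at roughly the target size and return PNG bytes."""
    import fitz  # PyMuPDF; imported lazily so fit_scale works without it installed

    scale = fit_scale(page.rect.width, page.rect.height, target_width, target_height)
    # The matrix scales the page during rasterization, so a full-resolution
    # pixmap is never created and then downsized afterwards.
    pixmap = page.get_pixmap(matrix=fitz.Matrix(scale, scale))
    return pixmap.tobytes("png")
```

With many concurrent workers, peak memory scales with the size of each rendered pixmap, which is why avoiding the oversized intermediate image matters more than the resize itself.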

README.md

Lines changed: 1 addition & 0 deletions
@@ -33,6 +33,7 @@ White-glove customization, deployment, and integration support for production us
 - **Modular, pluggable patterns**: Pre-built processing patterns using state-of-the-art models and AWS services
 - **Advanced Classification**: Support for page-level and holistic document packet classification
 - **Few Shot Example Support**: Improve accuracy through example-based prompting
+- **Custom Business Logic Integration**: Inject custom prompt generation logic via Lambda functions for specialized document processing
 - **High Throughput Processing**: Handles large volumes of documents through intelligent queuing
 - **Built-in Resilience**: Comprehensive error handling, retries, and throttling management
 - **Cost Optimization**: Pay-per-use pricing model with built-in controls

VERSION

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-0.3.11
+0.3.12

config_library/pattern-1/lending-package-sample/config.yaml

Lines changed: 1 addition & 0 deletions
@@ -5,6 +5,7 @@ notes: Processing configuration in BDA project.
 assessment:
   default_confidence_threshold: '0.8'
 summarization:
+  enabled: true
   top_p: '0.1'
   max_tokens: '4096'
   top_k: '5'

config_library/pattern-2/bank-statement-sample/config.yaml

Lines changed: 2 additions & 0 deletions
@@ -307,6 +307,7 @@ extraction:
   system_prompt: >-
     You are a document assistant. Respond only with JSON. Never make up data, only provide data found in the document being provided.
 summarization:
+  enabled: true
   top_p: '0.1'
   max_tokens: '4096'
   top_k: '5'
@@ -368,6 +369,7 @@ summarization:
   system_prompt: >-
     You are a document summarization expert who can analyze and summarize documents from various domains including medical, financial, legal, and general business documents. Your task is to create a summary that captures the key information, main points, and important details from the document. Your output must be in valid JSON format. \nSummarization Style: Balanced\\nCreate a balanced summary that provides a moderate level of detail. Include the main points and key supporting information, while maintaining the document's overall structure. Aim for a comprehensive yet concise summary.\n Your output MUST be in valid JSON format with markdown content. You MUST strictly adhere to the output format specified in the instructions.
 assessment:
+  enabled: true
   image:
     target_height: ''
     target_width: ''

config_library/pattern-2/lending-package-sample/config.yaml

Lines changed: 28 additions & 29 deletions
@@ -1,3 +1,9 @@
+# SPDX-License-Identifier: MIT-0
+
+notes: Boundary-aware classification example for pattern-2
+
+
+
 # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
 # SPDX-License-Identifier: MIT-0
 
@@ -914,15 +920,26 @@ classes:
             evaluation_method: LLM
         attributeType: group
 classification:
+  classificationMethod: multimodalPageLevelClassification
   image:
     target_height: ''
     target_width: ''
+  model: us.amazon.nova-pro-v1:0
+  temperature: '0.0'
   top_p: '0.1'
   max_tokens: '4096'
   top_k: '5'
+  system_prompt: >-
+    You are a multimodal document classification expert that analyzes business documents using both visual layout and textual content. Your task is to classify single-page documents into predefined categories based on their structural patterns, visual features, and text content. Your output must be valid JSON according to the requested format.
+
+    <variables>
+    <document-ocr-data>: OCR-extracted text content from the document page that provides textual information for classification
+    <document-image>: Visual representation of the document page that provides layout, formatting, and visual structure information
+    <document-types>: List of valid document types with their descriptions that the document must be classified into
+    </variables>
   task_prompt: >-
     <task-description>
-    Analyze the provided document using both its visual layout and textual content to determine its document type. You must classify it into exactly one of the predefined categories.
+    Analyze the provided document using both its visual layout and textual content to determine its document type and whether this page begins a new document or continues the previous one.
     </task-description>
 
     <document-types>
@@ -934,24 +951,16 @@ classification:
     1. Examine the visual layout: headers, logos, formatting, structure, and visual organization
     2. Analyze the textual content: key phrases, terminology, purpose, and information type
     3. Identify distinctive features that match the document type descriptions
-    4. Consider both visual and textual evidence together to determine the best match
-    5. CRITICAL: Only use document types explicitly listed in the <document-types> section
+    4. Decide if this page starts a new document (output "start") or continues the previous document (output "continue")
+    5. Consider both visual and textual evidence together to determine the best match
+    6. CRITICAL: Only use document types explicitly listed in the <document-types> section
     </classification-instructions>
 
-    <reasoning-guidelines>
-    When determining the document type:
-    - First identify the document's primary purpose and function
-    - Note specific visual elements (letterhead, forms, tables, signatures)
-    - Identify key textual indicators (terminology, phrases, structure)
-    - Consider the document's intended audience and use case
-    - Provide specific evidence from both visual and textual analysis
-    </reasoning-guidelines>
-
     <output-format>
-    Return your classification as valid JSON following this exact structure:
     {
       "classification_reason": "Detailed reasoning including specific visual and textual evidence that led to this classification",
-      "class": "exact_document_type_from_list"
+      "class": "exact_document_type_from_list",
+      "document_boundary": "start or continue"
     }
     </output-format>
 
@@ -968,22 +977,10 @@ classification:
     <final-instructions>
     Analyze the document above by:
     1. Applying the <classification-instructions> to examine both visual and textual features
-    2. Following the <reasoning-guidelines> to build your classification rationale
-    3. Selecting ONLY from document types in <document-types>
-    4. Providing clear reasoning with specific evidence before the classification
-    5. Outputting in the exact JSON format specified in <output-format>
+    2. Selecting ONLY from document types in <document-types>
+    3. Providing clear reasoning with specific evidence
+    4. Outputting in the exact JSON format specified in <output-format>
     </final-instructions>
-  temperature: '0.0'
-  model: us.amazon.nova-pro-v1:0
-  system_prompt: >-
-    You are a multimodal document classification expert that analyzes business documents using both visual layout and textual content. Your task is to classify single-page documents into predefined categories based on their structural patterns, visual features, and text content. Your output must be valid JSON according to the requested format.
-
-    <variables>
-    DOCUMENT_TEXT: OCR-extracted text content from the document page that provides textual information for classification
-    DOCUMENT_IMAGE: Visual representation of the document page that provides layout, formatting, and visual structure information
-    CLASS_NAMES_AND_DESCRIPTIONS: List of valid document types with their descriptions that the document must be classified into
-    </variables>
-  classificationMethod: multimodalPageLevelClassification
 extraction:
   image:
     target_width: ''
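Note on the revised `<output-format>` in the classification prompt: a consumer of this response now has to read `document_boundary` in addition to `class`. A minimal validation sketch follows, assuming the model returns a bare JSON object; the function name and the strict raise-on-invalid policy are choices made for this example, not taken from the commit.

```python
import json

VALID_BOUNDARIES = {"start", "continue"}


def parse_classification(raw: str, valid_classes: set[str]) -> tuple[str, str]:
    """Parse a page-level response and return (class, document_boundary)."""
    data = json.loads(raw)
    doc_class = data.get("class")
    if doc_class not in valid_classes:
        # The prompt restricts output to listed types, so anything else is an error.
        raise ValueError(f"unknown document class: {doc_class!r}")
    boundary = str(data.get("document_boundary", "")).strip().lower()
    if boundary not in VALID_BOUNDARIES:
        raise ValueError(f"invalid document_boundary: {boundary!r}")
    return doc_class, boundary


raw_response = (
    '{"classification_reason": "Employer header and wage table", '
    '"class": "Payslip", "document_boundary": "Start"}'
)
result = parse_classification(raw_response, {"Payslip", "W2"})
```

Normalizing case before the membership check tolerates minor formatting drift in model output while still rejecting values outside the two allowed markers.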
@@ -1082,6 +1079,7 @@ extraction:
   system_prompt: >-
     You are a document assistant. Respond only with JSON. Never make up data, only provide data found in the document being provided.
 summarization:
+  enabled: true
   top_p: '0.1'
   max_tokens: '4096'
   top_k: '5'
@@ -1143,6 +1141,7 @@ summarization:
   system_prompt: >-
     You are a document summarization expert who can analyze and summarize documents from various domains including medical, financial, legal, and general business documents. Your task is to create a summary that captures the key information, main points, and important details from the document. Your output must be in valid JSON format. \nSummarization Style: Balanced\\nCreate a balanced summary that provides a moderate level of detail. Include the main points and key supporting information, while maintaining the document's overall structure. Aim for a comprehensive yet concise summary.\n Your output MUST be in valid JSON format with markdown content. You MUST strictly adhere to the output format specified in the instructions.
 assessment:
+  enabled: true
   image:
     target_height: ''
     target_width: ''
