
Commit aa25c3b

Merge remote changes with CloudFormation service role additions
2 parents 42957d7 + bfe74b5 commit aa25c3b

198 files changed: +26724 -4770 lines changed

.gitattributes

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+* text=auto eol=lf
+*.py text eol=lf
+*.sh text eol=lf
+*.yaml text eol=lf
+*.yml text eol=lf

.gitignore

Lines changed: 3 additions & 0 deletions
@@ -4,6 +4,8 @@ build.toml
 model.tar.gz
 .checksum
 .checksums/
+.build_checksum
+.lib_checksum
 .vscode/
 .DS_Store
 dist/
@@ -20,3 +22,4 @@ rvl_cdip_*
 notebooks/examples/data
 .idea/
 .dsr/
+*tmp-dev-assets*

.gitlab-ci.yml

Lines changed: 7 additions & 4 deletions
@@ -28,6 +28,8 @@ developer_tests:
 - apt-get update -y
 - apt-get install make -y
 - pip install ruff
+# Install dependencies needed by publish.py for test imports
+- pip install typer rich boto3
 # Install test dependencies
 - cd lib/idp_common_pkg && pip install -e ".[test]" && cd ../..

@@ -54,14 +56,15 @@ integration_tests:
 # AWS_DEFAULT_REGION: ${AWS_DEFAULT_REGION}
 # IDP_ACCOUNT_ID: ${IDP_ACCOUNT_ID}

-# Add rules to only run on develop branch
+# Add rules to only run on develop branch
+# Add rules to only run on develop branch
 rules:
 - if: $CI_COMMIT_BRANCH == "develop"
-when: manual # always # When idp-accelerator CICD is reconfigured
+when: always # always # When idp-accelerator CICD is reconfigured
 - if: $CI_COMMIT_BRANCH =~ /^feature\/.*/
-when: manual
+when: always
 - if: $CI_COMMIT_BRANCH =~ /^fix\/.*/
-when: manual
+when: always
 - if: $CI_COMMIT_BRANCH =~ /^hotfix\/.*/
 when: manual
 - if: $CI_COMMIT_BRANCH =~ /^release\/.*/

CHANGELOG.md

Lines changed: 124 additions & 0 deletions
@@ -5,8 +5,132 @@ SPDX-License-Identifier: MIT-0

 ## [Unreleased]

+## [0.3.15]
+
+### Added
+
+- **Intelligent Document Discovery Module for Automated Configuration Generation**
+- Added Discovery module that automatically analyzes document samples to identify structure, field types, and organizational patterns
+- **Pattern-Neutral Design**: Works across all processing patterns (1, 2, 3) with unified discovery process and pattern-specific implementations
+- **Dual Discovery Methods**: Discovery without ground truth (exploratory analysis) and with ground truth (optimization using labeled data)
+- **Automated Blueprint Creation**: Pattern 1 includes zero-touch BDA blueprint generation with intelligent change detection and version management
+- **Web UI Integration**: Real-time discovery job monitoring, interactive results review, and seamless configuration integration
+- **Advanced Features**: Multi-model support (Nova, Claude), customizable prompts, configurable parameters, ground truth processing, schema conversion, and lifecycle management
+- **Key Benefits**: Rapid new document type onboarding, reduced time-to-production, configuration optimization, and automated workflow bootstrapping
+- **Use Cases**: New document exploration, configuration improvement, rapid prototyping, and document understanding
+- **Documentation**: Guide in `docs/discovery.md` with architecture details, best practices, and troubleshooting
+
+- **Optional Pattern-2 Regex-Based Classification for Enhanced Performance**
+- Added support for optional regex patterns in document class definitions for performance optimization
+- **Document Name Regex**: Match against document ID/name to classify all pages without LLM processing when all pages should be the same class
+- **Document Page Content Regex**: Match against page text content during multi-modal page-level classification for fast page classification
+- **Key Benefits**: Significant performance improvements and cost savings by bypassing LLM calls for pattern-matched documents, deterministic classification results for known document patterns, seamless fallback to existing LLM classification when regex patterns don't match
+- **Configuration**: Optional `document_name_regex` and `document_page_content_regex` fields in class definitions with automatic regex compilation and validation
+- **Logging**: Comprehensive info-level logging when regex patterns match for observability and debugging
+- **CloudFormation Integration**: Updated Pattern-2 schema to support regex configuration through the Web UI
+- **Demonstration**: New `step2_classification_with_regex.ipynb` notebook showcasing regex configuration and performance comparisons
+- **Documentation**: Enhanced classification module README and main documentation with regex usage examples and best practices
+
+- **Windows WSL Development Environment Setup Guide**
+- Added WSL-based development environment setup guide for Windows developers in `docs/setup-development-env-WSL.md`
+- **Key Features**: Automated setup script (`wsl_setup.sh`) for quick installation of Git, Python, Node.js, AWS CLI, and SAM CLI
+- **Integrated Workflow**: Development setup combining Windows tools (VS Code, browsers) with native Linux environment
+- **Target Use Cases**: Windows developers needing Linux compatibility without Docker Desktop or VM overhead
+
+### Fixed
+- **Throttling Error Detection and Retry Logic for Assessment Functions** - [GitHub Issue #45](https://github.com/aws-solutions-library-samples/accelerated-intelligent-document-processing-on-aws/issues/45)
+- **Assessment Function**: Enhanced throttling detection to check for throttling errors returned in `document.errors` field in addition to thrown exceptions, raising `ThrottlingException` to trigger Step Functions retry when throttling is detected
+- **Granular Assessment Task Caching**: Fixed caching logic to properly cache successful assessment tasks when there are ANY failed tasks (both exception-based and result-based failures), enabling efficient retry optimization by only reprocessing failed tasks while preserving successful results
+- **Impact**: Improved resilience for throttling scenarios, reduced redundant processing during retries, and better Step Functions retry behavior
+
+- **Security Vulnerability Mitigation - Package Updates**
+
+- **GovCloud Compatibility - Hardcoded Service Domain References**
+- Fixed hardcoded `amazonaws.com` references in CloudFormation templates that prevented GovCloud deployment
+- Updated all service principals and endpoints to use dynamic `${AWS::URLSuffix}` expressions for automatic region-based resolution
+- **Templates Updated**: `template.yaml` (main template), `patterns/pattern-3/sagemaker_classifier_endpoint.yaml`
+- **Services Fixed**: EventBridge, Cognito, SageMaker, ECR, CloudFront, CodeBuild, AppSync, Lambda, DynamoDB, CloudWatch Logs, Glue
+- Resolves GitHub Issue #50 - templates now deploy correctly in both standard AWS and GovCloud regions
+
+- **Bug Fixes and Code Improvements**
+- Fixed HITL processing errors in both Pattern-1 (DynamoDB validation with empty strings) and Pattern-2 (string indices error in A2I output processing)
+- Fixed Step Function UI issues including auto-refresh button auto-disable and fetch failures for failed executions with datetime serialization errors
+- Cleaned up unused Step Function subscription infrastructure and removed duplicate code in Pattern-2 HITL function
+- Expanded UI Visual Editor bounding box size with padding for better visibility and user interaction
+- Fixed bug in list of models supporting cache points - previously claude 4 sonnet and opus had been excluded.
+- Validations added at the assessment step for checking valid json response. The validation fails after extraction/assessment is complete if json parsing issues are encountered.
+
+
+## [0.3.14]
+
+### Added
+- Support for 1m token context for Claude Sonnet 4
+- Video demo of "Chat with Document" in [./docs/web-ui.md](./docs/web-ui.md)
+- **Human-in-the-Loop (HITL) Support Extended to Pattern-2**
+- Added HITL review capabilities for Pattern-2 (Textract + Bedrock processing) using Amazon SageMaker Augmented AI (A2I)
+- Enables human validation and correction when extraction confidence falls below configurable threshold
+- Includes same features as Pattern-1 HITL: automatic triggering, review portal integration, and seamless result updates
+- Documentation and video demo in [./docs/human-review.md](./docs/human-review.md)
+
+### Removed
+- Windows development environment guide and setup script removed as it proved insufficiently robust
+
+### Fixed
+- Fix 1-click Launch URL output from the GovCloud template generation script
+- Add Agent Analytics to architecture diagram
+- Fix various UX and error reporting issues with the new Python publish script
+- Simplify UDOP model path construction and avoid invalid default for regions other than us-east-1 and us-west-2
+- Permission regression from previous release affecting "Chat with Document"
+
+
+## [0.3.13]
+
 ### Added

+- **External MCP Agent Integration for Custom Tool Extension**
+- Added External MCP (Model Context Protocol) Agent support that enables integration with custom MCP servers to extend IDP capabilities
+- **Cross-Account Integration**: Host MCP servers in separate AWS accounts or external infrastructure with secure OAuth authentication using AWS Cognito
+- **Dynamic Tool Discovery**: Automatically discovers and integrates available tools from MCP servers through the IDP web interface
+- **Secure Authentication Flow**: Uses AWS Cognito User Pools for OAuth bearer token authentication with proper token validation
+- **Configuration Management**: JSON array configuration in AWS Secrets Manager supporting multiple MCP server connections with optional custom agent names and descriptions
+- **Real-time Integration**: Tools become immediately available through the IDP web interface after configuration
+
+- **AWS GovCloud Support with Automated Template Generation**
+- Added GovCloud compatibility through `scripts/generate_govcloud_template.py` script
+- **ARN Partition Compatibility**: All templates updated to use `arn:${AWS::Partition}:` for both commercial and GovCloud regions
+- **Headless Operation**: Automatically removes UI-related resources (CloudFront, AppSync, Cognito, WAF) for GovCloud deployment
+- **Core Functionality Preserved**: All 3 processing patterns and complete 6-step pipeline (OCR, Classification, Extraction, Assessment, Summarization, Evaluation) remain fully functional
+- **Automated Workflow**: Single script orchestrates build + GovCloud template generation + S3 upload with deployment URLs
+- **Enterprise Ready**: Enables headless document processing for government and enterprise environments requiring GovCloud compliance
+- **Documentation**: New `docs/govcloud-deployment.md` with deployment guide, architecture differences, and access methods
+
+- **Pattern-2 and Pattern-3 Assessment now generate geometry (bounding boxes) for visualization in UI 'Visual Editor' (parity with Pattern-1)**
+- Added comprehensive spatial localization capabilities to both regular and granular assessment services
+- **Automatic Processing**: When LLM provides bbox coordinates, automatically converts to UI-compatible (Visual Edit) geometry format without any configuration
+- **Universal Support**: Works with all attribute types - simple attributes, nested group attributes (e.g., CompanyAddress.State), and list attributes
+- **Enhanced Prompts**: Updated assessment task prompts with spatial-localization-guidelines requesting bbox coordinates in normalized 0-1000 scale
+- **Demo Notebooks**: Assessment notebooks now showcase automatic bounding box processing
+
+- **New Python-Based Publishing System**
+- Replaced `publish.sh` bash script with new `publish.py` Python script
+- Rich console interface with progress bars, spinners, and colored output using Rich library
+- Multi-threaded artifact building and uploading for significantly improved performance
+- Native support for Linux, macOS, and Windows environments
+
+- **Windows Development Environment Setup Guide and Helper Script**
+- New `scripts/dev_setup.bat` (570 lines) for complete Windows development environment configuration
+
+- **OCR Service Default Image Sizing for Resource Optimization**
+- Implemented automatic default image size limits (951×1268) when no image sizing configuration is provided
+- **Key Benefits**: Reduction in vision model token consumption, prevents OutOfMemory errors during concurrent processing, improves processing speed and reduces bandwidth usage
+
+### Changed
+
+- **Reverted to python3.12 runtime to resolve build package dependency problems**
+
+### Fixed
+- **Improved Visual Edit bounding box position when using image zoom or pan**
+


 ## [0.3.12]
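
The regex short-circuit described in the 0.3.15 changelog entry can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: only the field names `document_name_regex` and `document_page_content_regex` come from the changelog, while the class definitions, the function names, the use of `re.search`, and the `llm_fallback` callable are assumptions made for the example.

```python
import re
from typing import Callable, Dict, List, Optional

# Hypothetical class definitions; only the two regex field names
# are taken from the changelog text.
CLASSES: List[Dict[str, str]] = [
    {
        "name": "Payslip",
        "document_name_regex": r"(?i)^payslip_.*",
        "document_page_content_regex": r"(?i)gross pay",
    },
    {
        "name": "W2",
        "document_page_content_regex": r"(?i)wage and tax statement",
    },
]


def classify_document_by_name(
    document_id: str, classes: List[Dict[str, str]]
) -> Optional[str]:
    """Return a class name if the document ID matches a name regex, else None."""
    for cls in classes:
        pattern = cls.get("document_name_regex")
        if pattern and re.search(pattern, document_id):
            return cls["name"]
    return None


def classify_page(
    page_text: str,
    classes: List[Dict[str, str]],
    llm_fallback: Callable[[str], str],
) -> str:
    """Try the page-content regexes first; call the (assumed) LLM fallback otherwise."""
    for cls in classes:
        pattern = cls.get("document_page_content_regex")
        if pattern and re.search(pattern, page_text):
            return cls["name"]
    return llm_fallback(page_text)
```

In this sketch a document-name match would classify every page without any model call, and a page-content match skips the model for that page only; anything unmatched falls through to the existing LLM path.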

Makefile

Lines changed: 37 additions & 6 deletions
@@ -14,7 +14,7 @@ test:
 $(MAKE) -C lib/idp_common_pkg test

 # Run both linting and formatting in one command
-lint: ruff-lint format
+lint: ruff-lint format check-arn-partitions

 # Run linting checks and fix issues automatically
 ruff-lint:
@@ -29,16 +29,47 @@ format:
 lint-cicd:
 @echo "Running code quality checks..."
 @if ! ruff check; then \
-echo "$(RED)ERROR: Ruff linting failed!$(NC)"; \
-echo "$(YELLOW)Please run 'make ruff-lint' locally to fix these issues.$(NC)"; \
+echo -e "$(RED)ERROR: Ruff linting failed!$(NC)"; \
+echo -e "$(YELLOW)Please run 'make ruff-lint' locally to fix these issues.$(NC)"; \
 exit 1; \
 fi
 @if ! ruff format --check; then \
-echo "$(RED)ERROR: Code formatting check failed!$(NC)"; \
-echo "$(YELLOW)Please run 'make format' locally to fix these issues.$(NC)"; \
+echo -e "$(RED)ERROR: Code formatting check failed!$(NC)"; \
+echo -e "$(YELLOW)Please run 'make format' locally to fix these issues.$(NC)"; \
+exit 1; \
+fi
+@echo -e "$(GREEN)All code quality checks passed!$(NC)"
+
+# Check CloudFormation templates for hardcoded AWS partition ARNs and service principals
+check-arn-partitions:
+@echo "Checking CloudFormation templates for hardcoded ARN partitions and service principals..."
+@FOUND_ISSUES=0; \
+for template in template.yaml patterns/*/template.yaml patterns/*/sagemaker_classifier_endpoint.yaml options/*/template.yaml; do \
+if [ -f "$$template" ]; then \
+echo "Checking $$template..."; \
+ARN_MATCHES=$$(grep -n "arn:aws:" "$$template" | grep -v "arn:\$${AWS::Partition}:" || true); \
+if [ -n "$$ARN_MATCHES" ]; then \
+echo -e "$(RED)ERROR: Found hardcoded 'arn:aws:' references in $$template:$(NC)"; \
+echo "$$ARN_MATCHES" | sed 's/^/ /'; \
+echo -e "$(YELLOW) These should use 'arn:\$${AWS::Partition}:' instead for GovCloud compatibility$(NC)"; \
+FOUND_ISSUES=1; \
+fi; \
+SERVICE_MATCHES=$$(grep -n "\.amazonaws\.com" "$$template" | grep -v "\$${AWS::URLSuffix}" | grep -v "^[[:space:]]*#" | grep -v "Description:" | grep -v "Comment:" | grep -v "cognito" | grep -v "ContentSecurityPolicy" || true); \
+if [ -n "$$SERVICE_MATCHES" ]; then \
+echo -e "$(RED)ERROR: Found hardcoded service principal references in $$template:$(NC)"; \
+echo "$$SERVICE_MATCHES" | sed 's/^/ /'; \
+echo -e "$(YELLOW) These should use '\$${AWS::URLSuffix}' instead of 'amazonaws.com' for GovCloud compatibility$(NC)"; \
+echo -e "$(YELLOW) Example: 'lambda.amazonaws.com' should be 'lambda.\$${AWS::URLSuffix}'$(NC)"; \
+FOUND_ISSUES=1; \
+fi; \
+fi; \
+done; \
+if [ $$FOUND_ISSUES -eq 0 ]; then \
+echo -e "$(GREEN)✅ No hardcoded ARN partition or service principal references found!$(NC)"; \
+else \
+echo -e "$(RED)❌ Found hardcoded references that need to be fixed for GovCloud compatibility$(NC)"; \
 exit 1; \
 fi
-@echo "$(GREEN)All code quality checks passed!$(NC)"

 # A convenience Makefile target that runs
 commit: lint test
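
The `check-arn-partitions` target added above is a `grep` pipeline; the same check can be sketched in Python to make the filtering rules easier to read. This is an illustrative equivalent under assumptions, not part of the commit: the function name and the result tuples are hypothetical, and the exclusion tokens are copied from the `grep -v` filters in the Makefile. One behavioural difference worth noting: the shell pipeline applies its comment filter after `grep -n` has prefixed each line with a line number, whereas this sketch tests the raw line for a leading `#`.

```python
from typing import List, Tuple

# Tokens that exempt a line from the service-principal check,
# mirroring the `grep -v` filters in the Makefile target.
SERVICE_EXCLUDES = (
    "${AWS::URLSuffix}",
    "Description:",
    "Comment:",
    "cognito",
    "ContentSecurityPolicy",
)


def find_hardcoded_references(template_text: str) -> List[Tuple[int, str]]:
    """Return (line_number, kind) pairs, where kind is 'arn' or 'service'."""
    findings: List[Tuple[int, str]] = []
    for lineno, line in enumerate(template_text.splitlines(), start=1):
        # Hardcoded commercial-partition ARN, unless the line also uses the
        # partition pseudo parameter (same exemption as the shell pipeline).
        if "arn:aws:" in line and "arn:${AWS::Partition}:" not in line:
            findings.append((lineno, "arn"))
        # Hardcoded service domain, skipping comments and exempted tokens.
        if (
            ".amazonaws.com" in line
            and not line.lstrip().startswith("#")
            and not any(token in line for token in SERVICE_EXCLUDES)
        ):
            findings.append((lineno, "service"))
    return findings


sample = "\n".join(
    [
        "Resource: arn:aws:s3:::bucket",
        "Resource: !Sub arn:${AWS::Partition}:s3:::bucket",
        "Service: lambda.amazonaws.com",
        "Service: !Sub lambda.${AWS::URLSuffix}",
        "# see lambda.amazonaws.com",
    ]
)
print(find_hardcoded_references(sample))  # → [(1, 'arn'), (3, 'service')]
```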

README.md

Lines changed: 5 additions & 0 deletions
@@ -39,6 +39,8 @@ White-glove customization, deployment, and integration support for production us
 - **Cost Optimization**: Pay-per-use pricing model with built-in controls
 - **Comprehensive Monitoring**: Rich CloudWatch dashboard with detailed metrics and logs
 - **Web User Interface**: Modern UI for inspecting document workflow status and results
+- **Human-in-the-Loop (HITL)**: Amazon A2I integration for human review workflows (Pattern 1 & Pattern 2)
+- **Note**: When deploying multiple patterns with HITL, reuse existing private workteam ARN due to AWS account limits
 - **AI-Powered Evaluation**: Framework to assess accuracy against baseline data
 - **Extraction Confidence Assessment**: LLM-powered assessment of extraction confidence with multimodal document analysis
 - **Document Knowledge Base Query**: Ask questions about your processed documents
@@ -124,9 +126,12 @@ For detailed deployment and testing instructions, see the [Deployment Guide](./d
 - [Deployment](./docs/deployment.md) - Build, publish, deploy, and test instructions
 - [Web UI](./docs/web-ui.md) - Web interface features and usage
 - [Agent Analysis](./docs/agent-analysis.md) - Natural language analytics and data visualization feature
+- [Custom MCP Agent](./docs/custom-MCP-agent.md) - Integrating external MCP servers for custom tools and capabilities
 - [Configuration](./docs/configuration.md) - Configuration and customization options
+- [Discovery](./docs/discovery.md) - Pattern-neutral discovery process and BDA blueprint automation
 - [Classification](./docs/classification.md) - Customizing document classification
 - [Extraction](./docs/extraction.md) - Customizing information extraction
+- [Human-in-the-Loop Review](./docs/human-review.md) - Human review workflows with Amazon A2I
 - [Assessment](./docs/assessment.md) - Extraction confidence evaluation using LLMs
 - [Evaluation Framework](./docs/evaluation.md) - Accuracy assessment system with analytics database and reporting
 - [Knowledge Base](./docs/knowledge-base.md) - Document knowledge base query feature

VERSION

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-0.3.12
+0.3.15

config_library/pattern-1/lending-package-sample/config.yaml

Lines changed: 10 additions & 0 deletions
@@ -185,6 +185,16 @@ pricing:
 price: '3.0E-7'
 - name: cacheWriteInputTokens
 price: '3.75E-6'
+- name: bedrock/us.anthropic.claude-sonnet-4-20250514-v1:0:1m
+units:
+- name: inputTokens
+price: '6.0E-6'
+- name: outputTokens
+price: '2.25E-5'
+- name: cacheReadInputTokens
+price: '6.0E-7'
+- name: cacheWriteInputTokens
+price: '7.5E-6'
 - name: bedrock/us.anthropic.claude-opus-4-20250514-v1:0
 units:
 - name: inputTokens
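
To make the new pricing entry concrete, the sketch below multiplies token counts by the per-token prices added for `bedrock/us.anthropic.claude-sonnet-4-20250514-v1:0:1m`. The four prices are copied from the diff; the function name, the shape of the `usage` dictionary, and the assumption that prices are USD per token are illustrative and may not match how the solution actually computes cost.

```python
from decimal import Decimal
from typing import Dict

# Per-token prices copied from the added config entry (assumed to be USD).
PRICES_1M_CONTEXT: Dict[str, Decimal] = {
    "inputTokens": Decimal("6.0E-6"),
    "outputTokens": Decimal("2.25E-5"),
    "cacheReadInputTokens": Decimal("6.0E-7"),
    "cacheWriteInputTokens": Decimal("7.5E-6"),
}


def estimate_cost(
    usage: Dict[str, int], prices: Dict[str, Decimal] = PRICES_1M_CONTEXT
) -> Decimal:
    """Sum price * count over every metered unit present in `usage`."""
    return sum(
        (prices[unit] * count for unit, count in usage.items()),
        Decimal("0"),
    )


# 1,000,000 input tokens and 10,000 output tokens:
# 1_000_000 * 6.0E-6 + 10_000 * 2.25E-5 = 6.0 + 0.225
print(estimate_cost({"inputTokens": 1_000_000, "outputTokens": 10_000}))
```

`Decimal` is used here so that the scientific-notation strings from the config are parsed exactly rather than through binary floating point.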
