Commit 915e724

Merge branch 'feature/a2i-pattern2-merge' into 'develop'
HITL (A2I) integration for Pattern2

See merge request genaiic-reusable-assets/engagement-artifacts/genaiic-idp-accelerator!279
2 parents 779f374 + 49d1ae1 commit 915e724

23 files changed: +2264 -415 lines changed

README.md

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -39,6 +39,8 @@ White-glove customization, deployment, and integration support for production us
3939
- **Cost Optimization**: Pay-per-use pricing model with built-in controls
4040
- **Comprehensive Monitoring**: Rich CloudWatch dashboard with detailed metrics and logs
4141
- **Web User Interface**: Modern UI for inspecting document workflow status and results
42+
- **Human-in-the-Loop (HITL)**: Amazon A2I integration for human review workflows (Pattern 1 & Pattern 2)
43+
- **Note**: When deploying multiple patterns with HITL, reuse the existing private workteam ARN, because AWS account limits prevent each deployment from creating its own private workforce
4244
- **AI-Powered Evaluation**: Framework to assess accuracy against baseline data
4345
- **Extraction Confidence Assessment**: LLM-powered assessment of extraction confidence with multimodal document analysis
4446
- **Document Knowledge Base Query**: Ask questions about your processed documents
@@ -128,6 +130,7 @@ For detailed deployment and testing instructions, see the [Deployment Guide](./d
128130
- [Configuration](./docs/configuration.md) - Configuration and customization options
129131
- [Classification](./docs/classification.md) - Customizing document classification
130132
- [Extraction](./docs/extraction.md) - Customizing information extraction
133+
- [Human-in-the-Loop Review](./docs/human-review.md) - Human review workflows with Amazon A2I
131134
- [Assessment](./docs/assessment.md) - Extraction confidence evaluation using LLMs
132135
- [Evaluation Framework](./docs/evaluation.md) - Accuracy assessment system with analytics database and reporting
133136
- [Knowledge Base](./docs/knowledge-base.md) - Document knowledge base query feature

docs/human-review.md

Lines changed: 172 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,172 @@
1+
# Human-in-the-Loop (HITL) Review
2+
3+
Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
4+
SPDX-License-Identifier: MIT-0
5+
6+
## Table of Contents
7+
8+
- [Overview](#overview)
9+
- [Architecture](#architecture)
10+
- [Workflow](#workflow)
11+
- [Configuration](#configuration)
12+
- [Review Portal](#review-portal)
13+
- [Best Practices](#best-practices)
14+
- [Troubleshooting](#troubleshooting)
15+
- [Known Limitations](#known-limitations)
16+
17+
## Overview
18+
19+
The GenAI-IDP solution supports Human-in-the-Loop (HITL) review capabilities using Amazon SageMaker Augmented AI (A2I). This feature enables human reviewers to validate and correct extracted information when the system's confidence falls below a specified threshold, ensuring accuracy for critical document processing workflows.
20+
21+
**Supported Patterns:**
22+
- Pattern 1: BDA processing with HITL review
23+
- Pattern 2: Textract + Bedrock processing with HITL review
24+
25+
## Architecture
26+
27+
The HITL system integrates with the document processing workflow through:
28+
29+
- **Amazon SageMaker A2I**: Manages human review tasks and workflows
30+
- **Private Workforce**: Secure reviewer access with existing credentials
31+
- **Review Portal**: Web interface for validation and correction
32+
- **Confidence Assessment**: Automated triggering based on extraction confidence scores
33+
34+
<img src="../images/hitl_a2i_workflow.png" alt="HITL Flow diagram" width="800">
35+
36+
## Workflow
37+
38+
### 1. Automatic Triggering
39+
40+
HITL review is automatically triggered when:
41+
- The HITL feature is enabled in your configuration
42+
- The extraction confidence score falls below the configured threshold
43+
- Once both conditions hold, the system creates a human review task in SageMaker A2I
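The triggering rule above can be sketched in a few lines of Python. This is only an illustration, not the accelerator's actual code: the function name `needs_human_review` and the per-field dictionary of confidence scores are assumptions made for the example.

```python
from typing import Mapping


def needs_human_review(
    hitl_enabled: bool,
    field_confidences: Mapping[str, float],
    threshold: float,
) -> bool:
    """Return True when a review task should be created (hypothetical helper).

    Mirrors the two documented conditions: HITL must be enabled, and at
    least one extracted field must score strictly below the threshold.
    """
    if not hitl_enabled:
        return False
    return any(score < threshold for score in field_confidences.values())


# Example: one field scores below a 0.8 threshold, so review is required.
scores = {"invoice_number": 0.97, "total_amount": 0.62}
print(needs_human_review(True, scores, 0.8))
```

A score exactly equal to the threshold does not trigger review in this sketch, since the documentation describes the trigger as scores falling below the threshold.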
44+
45+
### 2. Review Process
46+
47+
**Access:**
48+
- Reviewers access the SageMaker A2I Review Portal (URL available in CloudFormation output `SageMakerA2IReviewPortalURL`)
49+
- Login credentials are the same as those used for the GenAI IDP portal
50+
51+
**Review Tasks:**
52+
- Extracted key-value pairs are presented for validation and correction
53+
- Reviewers can validate correct extractions or make necessary corrections
54+
- Submit corrections using the "Submit" button
55+
56+
### 3. Result Integration
57+
58+
- Corrected key-value pairs automatically update the source results
59+
- The document processing workflow continues with the human-verified data
60+
- Processing status is updated to reflect human review completion
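As a rough illustration of the result-integration step, the sketch below overlays reviewer corrections on the original extraction. The flat key-value layout, the function name `apply_corrections`, and the returned list of changed keys are assumptions for the example; the solution's real result format is not described here.

```python
from typing import Dict, List, Tuple


def apply_corrections(
    extracted: Dict[str, str],
    corrections: Dict[str, str],
) -> Tuple[Dict[str, str], List[str]]:
    """Overlay reviewer corrections on extracted values (hypothetical helper).

    Returns a new dictionary plus the sorted list of keys whose values
    changed, leaving the original extraction untouched.
    """
    merged = dict(extracted)
    changed = []
    for key, value in corrections.items():
        if merged.get(key) != value:
            merged[key] = value
            changed.append(key)
    return merged, sorted(changed)


original = {"invoice_number": "INV-001", "total_amount": "1O0.00"}
reviewed = {"invoice_number": "INV-001", "total_amount": "100.00"}
merged, changed = apply_corrections(original, reviewed)
print(merged, changed)
```

Returning a copy rather than mutating the input keeps the pre-review extraction available for comparison, which is useful for the feedback loop described under Best Practices.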
61+
62+
## Configuration
63+
64+
### Deployment Parameters
65+
66+
**Pattern 1:**
67+
- `EnableHITL`: Boolean parameter to enable/disable the HITL feature
68+
- `Pattern1 - Existing Private Workforce ARN`: Optional parameter for supplying an existing private workteam ARN (reuse the existing ARN whenever one is already deployed, because of AWS account limits)
69+
70+
**Pattern 2:**
71+
- `EnableHITL`: Boolean parameter to enable/disable the HITL feature
72+
- `Pattern2 - Existing Private Workforce ARN`: Optional parameter for supplying an existing private workteam ARN (reuse the existing ARN whenever one is already deployed, because of AWS account limits)
73+
74+
### Confidence Threshold Configuration
75+
76+
The confidence threshold determines when human review is triggered:
77+
78+
1. **Access the Web UI**: Open the Web UI URL from your CloudFormation stack outputs
79+
2. **Navigate to Configuration**: Click on the "Configuration" tab in the navigation menu
80+
3. **Find Assessment & HITL Section**: Scroll to the "Assessment & HITL Configuration" section
81+
4. **Set Confidence Threshold**:
82+
- Enter a value between 0.0 and 1.0 (e.g., 0.8 for an 80% confidence threshold)
83+
- Fields with confidence scores below this threshold will trigger HITL review
84+
5. **Save Configuration**: Click "Save" to apply the changes
85+
86+
The confidence threshold is stored as a configuration parameter and automatically applied to new document processing without requiring stack redeployment.
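If you set the threshold from a script or validate it before saving, a small range check helps catch mistakes such as entering 80 instead of 0.8. The helper below is a minimal sketch; the name `parse_confidence_threshold` is hypothetical and the Web UI may apply its own validation.

```python
def parse_confidence_threshold(raw: str) -> float:
    """Parse a threshold string and enforce the documented 0.0 to 1.0 range."""
    value = float(raw)  # raises ValueError for non-numeric input
    if not 0.0 <= value <= 1.0:
        raise ValueError(
            f"confidence threshold must be between 0.0 and 1.0, got {raw!r}"
        )
    return value


print(parse_confidence_threshold("0.8"))
```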
87+
88+
## Review Portal
89+
90+
### Accessing the Portal
91+
92+
The SageMaker A2I Review Portal URL is available in your CloudFormation stack outputs as `SageMakerA2IReviewPortalURL`.
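The output can also be read programmatically. The sketch below assumes boto3 is installed and that AWS credentials are configured; `find_stack_output` and `get_review_portal_url` are hypothetical helper names, while `describe_stacks` and the `Stacks`/`Outputs`/`OutputKey`/`OutputValue` response fields are part of the CloudFormation API.

```python
from typing import Optional


def find_stack_output(response: dict, output_key: str) -> Optional[str]:
    """Return the value of one output from a describe_stacks response."""
    for stack in response.get("Stacks", []):
        for output in stack.get("Outputs", []):
            if output.get("OutputKey") == output_key:
                return output.get("OutputValue")
    return None


def get_review_portal_url(stack_name: str) -> Optional[str]:
    """Look up the review portal URL for a deployed stack (needs AWS access)."""
    import boto3  # imported lazily so find_stack_output works without boto3

    cfn = boto3.client("cloudformation")
    response = cfn.describe_stacks(StackName=stack_name)
    return find_stack_output(response, "SageMakerA2IReviewPortalURL")
```

Calling `get_review_portal_url("my-idp-stack")`, where the stack name is a placeholder for your own, returns the URL string or `None` if the stack does not expose that output, for example when HITL is disabled.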
93+
94+
### Portal Features
95+
96+
- **Task Queue**: View all pending review tasks
97+
- **Document Preview**: Visual representation of the document being reviewed
98+
- **Key-Value Editor**: Interface for validating and correcting extracted data
99+
- **Submission Controls**: Submit approved or corrected extractions
100+
101+
### Reviewer Credentials
102+
103+
- Use the same credentials as the GenAI IDP Web UI portal
104+
- If using an existing private workforce, provide the workforce ARN during deployment
105+
106+
## Best Practices
107+
108+
### Review Management
109+
- **Regular Monitoring**: Check the Review Portal regularly for pending tasks to avoid processing delays
110+
- **Consistent Guidelines**: Establish consistent correction guidelines if multiple reviewers are involved
111+
- **Quality Assurance**: Implement review quality checks for critical document types
112+
113+
### Threshold Optimization
114+
- **Start Conservative**: Begin with higher confidence thresholds (0.8-0.9), which route more fields to human review, and lower them as measured accuracy allows
115+
- **Monitor Performance**: Track review frequency and accuracy improvements to optimize thresholds
116+
- **Document-Specific**: Consider different thresholds for different document types based on complexity
117+
118+
### Workflow Integration
119+
- **Training**: Ensure reviewers understand the document types and expected extraction fields
120+
- **Escalation**: Define processes for complex cases that require additional expertise
121+
- **Feedback Loop**: Use review corrections to improve extraction prompts and configurations
122+
123+
## Troubleshooting
124+
125+
### Common Issues
126+
127+
**No Review Tasks Appearing:**
128+
- Verify HITL is enabled in deployment parameters
129+
- Check confidence threshold settings in Web UI configuration
130+
- Confirm that the documents actually produce confidence scores below the threshold
131+
132+
**Portal Access Issues:**
133+
- Verify reviewer credentials match GenAI IDP Web UI credentials
134+
- Check private workforce configuration if using existing workforce
135+
- Confirm portal URL from CloudFormation outputs
136+
137+
**Review Submissions Not Processing:**
138+
- Check Step Functions execution for error details
139+
- Verify A2I workflow definition is properly configured
140+
- Review CloudWatch logs for processing errors
141+
142+
### Monitoring
143+
144+
Monitor HITL performance through:
145+
- **CloudWatch Metrics**: Track review task creation and completion rates
146+
- **Step Functions**: Monitor workflow execution and HITL integration points
147+
- **Web UI Dashboard**: View document processing status including HITL stages
148+
149+
## Known Limitations
150+
151+
### Current Limitations
152+
153+
- **Task Navigation**: The current version of SageMaker A2I cannot provide a direct hyperlink to a specific document task. When reviewers open the review document URL, the portal displays all review tasks without task-specific navigation.
154+
155+
- **Template Updates**: Updating the SageMaker A2I template or workflow deletes the A2I flow definition and custom template, then recreates them through a Lambda function. Direct updates to A2I resources through the Python SDK are not supported.
156+
157+
- **Private Workforce Cognito Limitation**: Amazon SageMaker Ground Truth allows only **one private workforce per Cognito User Pool** per AWS account. This creates a critical dependency when deploying multiple GenAI-IDP stacks with HITL enabled:
158+
- Each private workforce must be mapped to a unique Cognito client
159+
- Multiple stacks cannot create separate private workforces if they use the same Cognito User Pool
160+
- **Risk**: If the first stack (that created the private workforce) is deleted, it will break the private workteam for all other stacks using the same workforce
161+
- **Recommendation**: Always reuse existing private workteam ARNs when deploying additional patterns or stacks with HITL enabled
162+
- Use the `ExistingPrivateWorkforceArn` parameter to reference the workforce created by your first HITL-enabled deployment
163+
164+
### Workarounds
165+
166+
- **Task Management**: Reviewers should process tasks in chronological order or use task identifiers to track specific documents
167+
- **Configuration Changes**: Plan A2I template updates during maintenance windows to minimize disruption
168+
- **Multi-Stack HITL Deployment**:
169+
1. Deploy your first HITL-enabled stack and note the `PrivateWorkteamArn` from CloudFormation outputs
170+
2. For subsequent stacks, provide this ARN in the `ExistingPrivateWorkforceArn` parameter
171+
3. Never delete the original stack that created the private workforce without first migrating the workforce to another stack
172+
4. Consider creating a dedicated "HITL infrastructure" stack to manage the private workforce independently
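The multi-stack steps above can be sketched as a helper that builds the HITL-related CloudFormation parameters for a subsequent stack. The parameter names `EnableHITL` and `ExistingPrivateWorkforceArn` come from this guide; the function name, the `"true"` string value, and the loose ARN sanity check are assumptions for illustration only.

```python
from typing import Dict, List


def hitl_parameters_for_additional_stack(workteam_arn: str) -> List[Dict[str, str]]:
    """Build CloudFormation parameters that reuse an existing private workteam.

    `workteam_arn` is the `PrivateWorkteamArn` output noted from the first
    HITL-enabled stack. The check below is a loose sanity test, not a full
    ARN validation.
    """
    if not (workteam_arn.startswith("arn:") and ":workteam/" in workteam_arn):
        raise ValueError(f"not a workteam ARN: {workteam_arn!r}")
    return [
        {"ParameterKey": "EnableHITL", "ParameterValue": "true"},
        {"ParameterKey": "ExistingPrivateWorkforceArn", "ParameterValue": workteam_arn},
    ]


# Placeholder ARN for illustration; substitute the value from your own stack outputs.
example_arn = "arn:aws:sagemaker:us-east-1:111122223333:workteam/private-crowd/example-team"
print(hitl_parameters_for_additional_stack(example_arn))
```

The returned list uses the `ParameterKey`/`ParameterValue` shape that CloudFormation stack creation expects, so it can be merged with the other parameters of your deployment.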

docs/pattern-1.md

Lines changed: 3 additions & 41 deletions
Original file line numberDiff line numberDiff line change
@@ -196,49 +196,11 @@ payload = {
196196

197197
Pattern-1 supports Human-in-the-Loop (HITL) review capabilities using Amazon SageMaker Augmented AI (A2I). This feature allows human reviewers to validate and correct extracted information when the system's confidence falls below a specified threshold.
198198

199-
#### HITL Workflow
200-
1. **Automatic Triggering**:
201-
- HITL is triggered when the feature is enabled in your configuration
202-
- Extraction confidence score falls below your configured confidence threshold
203-
- The system creates a human review task in SageMaker A2I
204-
205-
2. **Review Process**:
206-
- Reviewers access the SageMaker A2I Review Portal (URL available in CloudFormation output `SageMakerA2IReviewPortalURL`)
207-
- Login credentials are the same as those used for the GenAI IDP portal (to use your own private work team, provide your existing private workforce work team ARN as an input parameter for `Pattern1 - Existing Private Workforce ARN`)
208-
- Extracted key-value pairs are presented for validation and correction
209-
- Reviewers validate correct extractions or make necessary corrections
210-
- After review, corrections are submitted with the "Submit" button
211-
212-
3. **Result Integration**:
213-
- Corrected key-value pairs automatically update the source results
214-
- The document processing workflow continues with the human-verified data
215-
216-
<img src="../../images/hitl_a2i_workflow.png" alt="HITL Flow diagram" width="800">
217-
218-
#### Configuration
199+
**Pattern-1 Specific Configuration:**
219200
- `EnableHITL`: Boolean parameter to enable/disable the HITL feature
220-
- **Confidence Threshold**: Configured through the Web UI Portal Configuration tab under "Assessment & HITL Configuration" section. This numeric value (0.0-1.0) determines when human review is triggered based on extraction confidence scores.
201+
- `Pattern1 - Existing Private Workforce ARN`: Optional parameter to use existing private workforce
221202

222-
#### Configuring Confidence Threshold
223-
To set the confidence threshold for HITL triggering:
224-
225-
1. **Access the Web UI**: Open the Web UI URL from your CloudFormation stack outputs
226-
2. **Navigate to Configuration**: Click on the "Configuration" tab in the navigation menu
227-
3. **Find Assessment & HITL Section**: Scroll to the "Assessment & HITL Configuration" section
228-
4. **Set Confidence Threshold**:
229-
- Enter a value between 0.0-1.0 (e.g., 0.8 for 80% confidence threshold)
230-
- Fields with confidence scores below this threshold will trigger HITL review
231-
5. **Save Configuration**: Click "Save" to apply the changes
232-
233-
The confidence threshold is stored as a configuration parameter and automatically applied to new document processing without requiring stack redeployment.
234-
235-
#### Best Practices
236-
- Regularly check the Review Portal for pending tasks to avoid processing delays
237-
- Establish consistent correction guidelines if multiple reviewers are involved
238-
239-
#### Known Limitations
240-
- The current version of SageMaker A2I cannot provide a direct hyperlink to individual document tasks. When a reviewer clicks the review document URL and starts working, the review portal displays all review tasks; the reviewer cannot pick a specific task to work on.
241-
- Updating the SageMaker A2I template and workflow deletes the A2I flow definition and the A2I custom template, then recreates the resources via a Lambda function. Updating A2I resources through the Python SDK is not allowed.
203+
For comprehensive HITL documentation including workflow details, configuration steps, best practices, and troubleshooting, see the [Human-in-the-Loop Review Guide](./human-review.md).
242204

243205
## Best Practices
244206
1. **BDA Project Configuration**:

docs/pattern-2.md

Lines changed: 12 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -16,6 +16,7 @@ This pattern implements an intelligent document processing workflow that uses Am
1616
- [Classification Function](#classification-function)
1717
- [Extraction Function](#extraction-function)
1818
- [ProcessResults Function](#processresults-function)
19+
- [Human-in-the-Loop (HITL)](#human-in-the-loop-hitl)
1920
- [Monitoring and Metrics](#monitoring-and-metrics)
2021
- [Performance Metrics](#performance-metrics)
2122
- [Error Tracking](#error-tracking)
@@ -176,6 +177,17 @@ Each step includes comprehensive retry logic for handling transient errors:
176177
}
177178
```
178179

180+
### Human-in-the-Loop (HITL)
181+
182+
Pattern-2 supports Human-in-the-Loop (HITL) review capabilities using Amazon SageMaker Augmented AI (A2I). This feature allows human reviewers to validate and correct extracted information when the system's confidence falls below a specified threshold.
183+
184+
**Pattern-2 Specific Configuration:**
185+
- `EnableHITL`: Boolean parameter to enable/disable the HITL feature
186+
- `IsPattern2HITLEnabled`: Boolean parameter specific to Pattern-2 HITL enablement
187+
- `Pattern2 - Existing Private Workforce ARN`: Optional parameter to use existing private workforce
188+
189+
For comprehensive HITL documentation including workflow details, configuration steps, best practices, and troubleshooting, see the [Human-in-the-Loop Review Guide](./human-review.md).
190+
179191
### Monitoring and Metrics
180192

181193
The pattern includes a comprehensive CloudWatch dashboard with:
