| 1 | +# Human-in-the-Loop (HITL) Review |
| 2 | + |
| 3 | +Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. |
| 4 | +SPDX-License-Identifier: MIT-0 |
| 5 | + |
| 6 | +## Table of Contents |
| 7 | + |
| 8 | +- [Overview](#overview) |
| 9 | +- [Architecture](#architecture) |
| 10 | +- [Workflow](#workflow) |
| 11 | +- [Configuration](#configuration) |
| 12 | +- [Review Portal](#review-portal) |
| 13 | +- [Best Practices](#best-practices) |
| 14 | +- [Troubleshooting](#troubleshooting) |
| 15 | +- [Known Limitations](#known-limitations) |
| 16 | + |
| 17 | +## Overview |
| 18 | + |
| 19 | +The GenAI-IDP solution supports Human-in-the-Loop (HITL) review through Amazon SageMaker Augmented AI (A2I). Human reviewers validate and correct extracted information when the system's confidence falls below a configured threshold, which helps maintain accuracy in critical document processing workflows. |
| 20 | + |
| 21 | +**Supported Patterns:** |
| 22 | +- Pattern 1: BDA processing with HITL review |
| 23 | +- Pattern 2: Textract + Bedrock processing with HITL review |
| 24 | + |
| 25 | + |
| 26 | +https://github.com/user-attachments/assets/126c9a70-6811-46f3-9166-ef71397ea4bc |
| 27 | + |
| 28 | + |
| 29 | + |
| 30 | +## Architecture |
| 31 | + |
| 32 | +The HITL system integrates with the document processing workflow through: |
| 33 | + |
| 34 | +- **Amazon SageMaker A2I**: Manages human review tasks and workflows |
| 35 | +- **Private Workforce**: Secure reviewer access with existing credentials |
| 36 | +- **Review Portal**: Web interface for validation and correction |
| 37 | +- **Confidence Assessment**: Automated triggering based on extraction confidence scores |
| 38 | + |
| 39 | +<img src="../images/hitl_a2i_workflow.png" alt="HITL Flow diagram" width="800"> |
| 40 | + |
| 41 | +## Workflow |
| 42 | + |
| 43 | +### 1. Automatic Triggering |
| 44 | + |
| 45 | +HITL review is triggered automatically, and the system creates a human review task in SageMaker A2I, when both of the following conditions are met: |
| 46 | +- The HITL feature is enabled in your configuration |
| 47 | +- The extraction confidence score falls below the configured threshold |
| 48 | + |
| 49 | + |
| 50 | +### 2. Review Process |
| 51 | + |
| 52 | +**Access:** |
| 53 | +- Reviewers access the SageMaker A2I Review Portal (URL available in CloudFormation output `SageMakerA2IReviewPortalURL`) |
| 54 | +- Login credentials are the same as those used for the GenAI IDP portal |
| 55 | + |
| 56 | +**Review Tasks:** |
| 57 | +- Extracted key-value pairs are presented for validation and correction |
| 58 | +- Reviewers can validate correct extractions or make necessary corrections |
| 59 | +- Submit corrections using the "Submit" button |
| 60 | + |
| 61 | +### 3. Result Integration |
| 62 | + |
| 63 | +- Corrected key-value pairs automatically update the source results |
| 64 | +- The document processing workflow continues with the human-verified data |
| 65 | +- Processing status is updated to reflect human review completion |
| 66 | + |
| 67 | +## Configuration |
| 68 | + |
| 69 | +### Deployment Parameters |
| 70 | + |
| 71 | +**Pattern 1:** |
| 72 | +- `EnableHITL`: Boolean parameter to enable/disable the HITL feature |
| 73 | +- `Pattern1 - Existing Private Workforce ARN`: ARN of an existing private workteam to reuse instead of creating a new private workforce (recommended because of the AWS account limits described under Known Limitations) |
| 74 | + |
| 75 | +**Pattern 2:** |
| 76 | +- `EnableHITL`: Boolean parameter to enable/disable the HITL feature |
| 77 | +- `Pattern2 - Existing Private Workforce ARN`: ARN of an existing private workteam to reuse instead of creating a new private workforce (recommended because of the AWS account limits described under Known Limitations) |
| 78 | + |
| 79 | +### Confidence Threshold Configuration |
| 80 | + |
| 81 | +The confidence threshold determines when human review is triggered: |
| 82 | + |
| 83 | +1. **Access the Web UI**: Open the Web UI URL from your CloudFormation stack outputs |
| 84 | +2. **Navigate to Configuration**: Click on the "Configuration" tab in the navigation menu |
| 85 | +3. **Find Assessment & HITL Section**: Scroll to the "Assessment & HITL Configuration" section |
| 86 | +4. **Set Confidence Threshold**: |
| 87 | + - Enter a value between 0.0 and 1.0 (e.g., 0.8 for an 80% confidence threshold) |
| 88 | + - Fields with confidence scores below this threshold will trigger HITL review |
| 89 | +5. **Save Configuration**: Click "Save" to apply the changes |
| 90 | + |
| 91 | +The confidence threshold is stored as a configuration parameter and automatically applied to new document processing without requiring stack redeployment. |
| 92 | + |
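The threshold rule can be illustrated with a minimal sketch. It assumes per-field confidence scores are available as a simple name-to-score mapping; the function names and the data shape are hypothetical and are not taken from the GenAI-IDP code base.

```python
from typing import Dict, List


def fields_needing_review(confidences: Dict[str, float], threshold: float) -> List[str]:
    """Return the names of fields whose confidence is strictly below the threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    return [name for name, score in confidences.items() if score < threshold]


def needs_hitl(confidences: Dict[str, float], threshold: float, hitl_enabled: bool = True) -> bool:
    """A review is needed only if HITL is enabled and at least one field is below threshold."""
    return hitl_enabled and bool(fields_needing_review(confidences, threshold))


# Hypothetical extraction result for one document
scores = {"invoice_number": 0.97, "total_amount": 0.62, "due_date": 0.80}
low_fields = fields_needing_review(scores, threshold=0.8)
```

In this sketch a score exactly equal to the threshold does not trigger review, which matches the "below this threshold" wording above.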
| 93 | +## Review Portal |
| 94 | + |
| 95 | +### Accessing the Portal |
| 96 | + |
| 97 | +The SageMaker A2I Review Portal URL is available in your CloudFormation stack outputs as `SageMakerA2IReviewPortalURL`. |
| 98 | + |
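As an alternative to reading the value in the console, the output can be looked up programmatically. The following is a sketch that assumes `boto3` is installed and AWS credentials are configured; the helper names are hypothetical, and the stack name is whatever you chose at deployment.

```python
from typing import Any, Dict, Optional


def find_stack_output(describe_stacks_response: Dict[str, Any], output_key: str) -> Optional[str]:
    """Search a CloudFormation DescribeStacks response for a named output value."""
    for stack in describe_stacks_response.get("Stacks", []):
        for output in stack.get("Outputs", []):
            if output.get("OutputKey") == output_key:
                return output.get("OutputValue")
    return None


def get_review_portal_url(stack_name: str) -> Optional[str]:
    """Fetch the SageMakerA2IReviewPortalURL output of the given stack."""
    import boto3  # imported lazily so find_stack_output works without AWS access

    cfn = boto3.client("cloudformation")
    response = cfn.describe_stacks(StackName=stack_name)
    return find_stack_output(response, "SageMakerA2IReviewPortalURL")
```

Calling `get_review_portal_url("<your-stack-name>")` returns `None` if the stack has no such output, for example when HITL was not enabled at deployment.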
| 99 | +### Portal Features |
| 100 | + |
| 101 | +- **Task Queue**: View all pending review tasks |
| 102 | +- **Document Preview**: Visual representation of the document being reviewed |
| 103 | +- **Key-Value Editor**: Interface for validating and correcting extracted data |
| 104 | +- **Submission Controls**: Submit approved or corrected extractions |
| 105 | + |
| 106 | +### Reviewer Credentials |
| 107 | + |
| 108 | +- Use the same credentials as the GenAI IDP Web UI portal |
| 109 | +- If using an existing private workforce, provide the workforce ARN during deployment |
| 110 | + |
| 111 | +## Best Practices |
| 112 | + |
| 113 | +### Review Management |
| 114 | +- **Regular Monitoring**: Check the Review Portal regularly for pending tasks to avoid processing delays |
| 115 | +- **Consistent Guidelines**: Establish consistent correction guidelines if multiple reviewers are involved |
| 116 | +- **Quality Assurance**: Implement review quality checks for critical document types |
| 117 | + |
| 118 | +### Threshold Optimization |
| 119 | +- **Start Conservative**: Begin with higher confidence thresholds (0.8-0.9) and adjust based on accuracy needs |
| 120 | +- **Monitor Performance**: Track review frequency and accuracy improvements to optimize thresholds |
| 121 | +- **Document-Specific**: Consider different thresholds for different document types based on complexity |
| 122 | + |
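One way to compare candidate thresholds is to estimate how many documents each would send to review. The sketch below assumes you have collected per-field confidence scores for a sample of already processed documents; the data layout and function name are illustrative only.

```python
from typing import Dict, List, Sequence


def review_rate(document_scores: Sequence[Sequence[float]], threshold: float) -> float:
    """Fraction of documents with at least one field score below the threshold."""
    if not document_scores:
        return 0.0
    flagged = sum(
        1 for scores in document_scores if any(score < threshold for score in scores)
    )
    return flagged / len(document_scores)


def compare_thresholds(
    document_scores: Sequence[Sequence[float]], candidates: List[float]
) -> Dict[float, float]:
    """Map each candidate threshold to the review rate it would produce."""
    return {t: review_rate(document_scores, t) for t in candidates}


# Hypothetical sample: one inner list of field confidences per document
sample = [[0.95, 0.99], [0.85, 0.97], [0.60, 0.90]]
rates = compare_thresholds(sample, [0.8, 0.9])
```

Raising the threshold can only keep the review rate the same or increase it, so the comparison shows the reviewer workload each setting would imply.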
| 123 | +### Workflow Integration |
| 124 | +- **Training**: Ensure reviewers understand the document types and expected extraction fields |
| 125 | +- **Escalation**: Define processes for complex cases that require additional expertise |
| 126 | +- **Feedback Loop**: Use review corrections to improve extraction prompts and configurations |
| 127 | + |
| 128 | +## Troubleshooting |
| 129 | + |
| 130 | +### Common Issues |
| 131 | + |
| 132 | +**No Review Tasks Appearing:** |
| 133 | +- Verify HITL is enabled in deployment parameters |
| 134 | +- Check confidence threshold settings in Web UI configuration |
| 135 | +- Confirm that the documents being processed actually produce confidence scores below the threshold |
| 136 | + |
| 137 | +**Portal Access Issues:** |
| 138 | +- Verify reviewer credentials match GenAI IDP Web UI credentials |
| 139 | +- Check private workforce configuration if using existing workforce |
| 140 | +- Confirm portal URL from CloudFormation outputs |
| 141 | + |
| 142 | +**Review Submissions Not Processing:** |
| 143 | +- Check Step Functions execution for error details |
| 144 | +- Verify A2I workflow definition is properly configured |
| 145 | +- Review CloudWatch logs for processing errors |
| 146 | + |
| 147 | +### Monitoring |
| 148 | + |
| 149 | +Monitor HITL performance through: |
| 150 | +- **CloudWatch Metrics**: Track review task creation and completion rates |
| 151 | +- **Step Functions**: Monitor workflow execution and HITL integration points |
| 152 | +- **Web UI Dashboard**: View document processing status including HITL stages |
| 153 | + |
| 154 | +## Known Limitations |
| 155 | + |
| 156 | +### Current Limitations |
| 157 | + |
| 158 | +- **Task Navigation**: The current version of SageMaker A2I cannot link directly to a specific document's task. When reviewers open the review URL for a document, the portal lists all review tasks rather than navigating to that task. |
| 159 | + |
| 160 | +- **Template Updates**: Updating the SageMaker A2I template or workflow deletes the A2I flow definition and custom template, then recreates both through a Lambda function. Updating A2I resources in place through the Python SDK is not supported. |
| 161 | + |
| 162 | +- **Private Workforce Cognito Limitation**: Amazon SageMaker Ground Truth allows only **one private workforce per Cognito User Pool** per AWS account. This creates a critical dependency when you deploy multiple GenAI-IDP stacks with HITL enabled: |
| 163 | + - Each private workforce must be mapped to a unique Cognito client |
| 164 | + - Multiple stacks cannot create separate private workforces if they use the same Cognito User Pool |
| 165 | + - **Risk**: If the first stack (that created the private workforce) is deleted, it will break the private workteam for all other stacks using the same workforce |
| 166 | + - **Recommendation**: Always reuse existing private workteam ARNs when deploying additional patterns or stacks with HITL enabled |
| 167 | + - Use the `ExistingPrivateWorkforceArn` parameter to reference the workforce created by your first HITL-enabled deployment |
| 168 | + |
| 169 | +### Workarounds |
| 170 | + |
| 171 | +- **Task Management**: Reviewers should process tasks in chronological order or use task identifiers to track specific documents |
| 172 | +- **Configuration Changes**: Plan A2I template updates during maintenance windows to minimize disruption |
| 173 | +- **Multi-Stack HITL Deployment**: |
| 174 | + 1. Deploy your first HITL-enabled stack and note the `PrivateWorkteamArn` from CloudFormation outputs |
| 175 | + 2. For subsequent stacks, provide this ARN in the `ExistingPrivateWorkforceArn` parameter |
| 176 | + 3. Never delete the original stack that created the private workforce without first migrating the workforce to another stack |
| 177 | + 4. Consider creating a dedicated "HITL infrastructure" stack to manage the private workforce independently |
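
Steps 1 and 2 above can be sketched as a helper that builds the HITL parameters for a subsequent stack. The parameter keys `EnableHITL` and `ExistingPrivateWorkforceArn` are the names used in this document; the value `"true"` and the exact keys your template version expects are assumptions to verify against the template, and the function name is hypothetical.

```python
from typing import Dict, List


def hitl_parameters_for_additional_stack(workteam_arn: str) -> List[Dict[str, str]]:
    """Build CloudFormation parameters that reuse an existing private workteam."""
    if not workteam_arn.startswith("arn:"):
        raise ValueError("expected the PrivateWorkteamArn output of the first stack")
    return [
        # Assumed string form of the Boolean parameter; check the template
        {"ParameterKey": "EnableHITL", "ParameterValue": "true"},
        {"ParameterKey": "ExistingPrivateWorkforceArn", "ParameterValue": workteam_arn},
    ]


# Placeholder ARN for illustration; use the value from your first stack's outputs
params = hitl_parameters_for_additional_stack(
    "arn:aws:sagemaker:us-east-1:111122223333:workteam/private-crowd/example"
)
```

The resulting list has the shape accepted by the `Parameters` argument of the boto3 CloudFormation `create_stack` call, alongside any other parameters the template requires.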