Commit d52fb0f

Bob Strahan committed
Add Permissions Boundary parameter support to main CloudFormation template
1 parent f6e42aa commit d52fb0f

2 files changed: 93 additions and 91 deletions

memory-bank/activeContext.md

Lines changed: 65 additions & 91 deletions
@@ -2,111 +2,85 @@
 
 ## Current Task Focus
 
-**User Question**: Understanding OCR processing architecture for large PDFs (500+ pages) in the IDP accelerator, specifically:
-1. Is OCR processing sequential or distributed by page?
-2. How does Bedrock-only OCR deployment differ?
-3. What parts of the system run sequentially vs distributed?
-4. Handling massive PDFs with hundreds of forms without clear page boundaries
+**Customer Question**: "We are encountering difficulties deploying your IDP stack outside of a sandbox environment due to an organization-wide Service Control Policy (SCP). This policy mandates the attachment of a Permissions Boundary to any new role. Could you please inform us if it is possible to update the CloudFormation template to include a parameterized Permissions Boundary? Without this update, our ability to transition the code to production will be significantly impeded."
 
-## Key Findings
+**Task Status**: Implementation phase - Need to add Permissions Boundary parameter support to CloudFormation templates
 
-### OCR Processing Models
+## Problem Analysis
 
-The IDP accelerator uses **different processing models depending on the pattern**:
+### Current Situation
+- IDP stack creates numerous IAM roles across main template and pattern templates
+- Organization has SCP requiring Permissions Boundary on all new IAM roles
+- Current templates don't support Permissions Boundary configuration
+- Blocking production deployment
 
-#### Pattern 1 (BDA): Sequential Internal Processing
-- **OCR Approach**: Bedrock Data Automation handles everything internally
-- **Processing**: Entire document processed as single unit by BDA service
-- **Concurrency**: Not user-controllable, managed by BDA
-- **Large Documents**: Subject to BDA service limits and timeouts
+### Affected Templates
+- **Main Template**: `template.yaml` - ~15 IAM roles
+- **Pattern 1**: `patterns/pattern-1/template.yaml` - ~8 IAM roles
+- **Pattern 2**: `patterns/pattern-2/template.yaml` - ~6 roles
+- **Pattern 3**: `patterns/pattern-3/template.yaml` - ~5 roles
+- **Options**: `options/bda-lending-project/template.yaml`, `options/bedrockkb/template.yaml`
 
-#### Pattern 2/3 (Textract + Bedrock): Distributed Page Processing
-- **OCR Approach**: AWS Textract with concurrent page processing
-- **Processing**: **Pages processed in parallel** using ThreadPoolExecutor
-- **Concurrency**: Configurable (default: 20 concurrent workers)
-- **Large Documents**: Optimal for 500+ page documents
+## Solution Design
 
-### Sequential vs Distributed Components
+### Approach: Parameterized Permissions Boundary
+1. **Add optional parameter** to main template for Permissions Boundary ARN
+2. **Conditionally apply boundary** to all IAM roles when provided
+3. **Maintain backward compatibility** for deployments without boundaries
+4. **Cascade parameter** to all nested pattern stacks
 
-#### Sequential Processing:
-1. **Step Functions Workflow**: OCR → Classification → Extraction → Assessment → Summarization
-2. **Classification**: Analyzes all pages to create document boundaries
-3. **BDA Internal Processing**: Everything handled as single unit
+### Implementation Plan
 
-#### Distributed Processing:
-1. **OCR Pages (Pattern 2/3)**: Up to 20 pages processed simultaneously
-2. **Extraction Sections**: Up to 10 document sections processed in parallel
-3. **Independent API Calls**: Each page makes separate Textract calls
+#### Step 1: Main Template Updates (`template.yaml`)
+- Add `PermissionsBoundaryArn` parameter
+- Add `HasPermissionsBoundary` condition
+- Update all IAM role resources with conditional boundary
+- Pass parameter to nested stacks
+- Update CloudFormation interface metadata
 
-## Customer Scenario Analysis
+#### Step 2: Pattern Template Updates
+- Add parameter to each pattern template
+- Update all IAM roles in patterns
+- Maintain consistency across all patterns
 
-### 500+ Page PDF with Multiple Forms
+#### Step 3: Options Template Updates
+- Update BDA lending project template
+- Update Bedrock KB template
 
-**Challenge**: Single PDF containing hundreds of forms without clear page boundaries
+### Key Implementation Details
 
-**Recommended Approach**: Pattern 2 or 3 for optimal performance
-
-**Why Pattern 2/3 is Better**:
-- **Page-Level Parallelism**: 500 pages processed 20 at a time
-- **Memory Efficiency**: Individual pages loaded, not entire document
-- **Fault Tolerance**: Page failures don't stop entire processing
-- **Granular Control**: Can optimize per-page processing
-
-**Classification Strategy**:
-- Use "holistic" classification method to analyze entire document
-- Creates logical sections grouping related pages
-- Handles form boundaries that don't align with page boundaries
-
-## Technical Implementation Details
+**Parameter Definition:**
+```yaml
+PermissionsBoundaryArn:
+  Type: String
+  Default: ""
+  Description: (Optional) ARN of IAM Permissions Boundary policy
+  AllowedPattern: "^(|arn:aws:iam::[0-9]{12}:policy/.+)$"
+```
 
-### OCR Service Configuration for Large Documents
+**Condition:**
+```yaml
+HasPermissionsBoundary: !Not [!Equals [!Ref PermissionsBoundaryArn, ""]]
+```
 
+**Role Update Pattern:**
 ```yaml
-ocr:
-  backend: "textract"
-  max_workers: 20 # Increase for more parallelism
-  image:
-    dpi: 150 # Balance quality vs processing time
-    target_width: 1024
-    target_height: 1024
-  features:
-    - name: "LAYOUT"
-    - name: "TABLES"
-    - name: "FORMS"
+SomeRole:
+  Type: AWS::IAM::Role
+  Properties:
+    # existing properties...
+    PermissionsBoundary: !If [HasPermissionsBoundary, !Ref PermissionsBoundaryArn, !Ref AWS::NoValue]
 ```
 
-### Processing Flow for Large PDFs
-
-1. **Document Load**: PyMuPDF loads PDF structure
-2. **Page Distribution**: ThreadPoolExecutor creates 20 concurrent workers
-3. **Parallel OCR**: Each page processed independently via Textract
-4. **Result Assembly**: Pages sorted and combined into document structure
-5. **Classification**: Holistic analysis creates logical document sections
-6. **Parallel Extraction**: Sections processed concurrently (MaxConcurrency: 10)
-
-## Performance Implications
-
-### For 500-Page Document:
-- **Pattern 1 (BDA)**: Single job, BDA-managed processing
-- **Pattern 2/3**: ~25 batches of 20 pages each, highly parallelized
-
-### Bottlenecks to Consider:
-1. **Textract Rate Limits**: May need to adjust max_workers
-2. **Memory Usage**: 20 concurrent pages require significant memory
-3. **S3 Operations**: Parallel uploads/downloads for page results
-4. **Lambda Timeouts**: Ensure sufficient timeout for large documents
-
-## Next Steps and Considerations
-
-### For Customer Implementation:
-1. **Choose Pattern 2 or 3** for large document processing
-2. **Configure max_workers** based on Textract limits and memory
-3. **Use holistic classification** to handle form boundaries
-4. **Monitor memory usage** during processing
-5. **Consider document splitting** if single PDF approach is problematic
-
-### Optimization Opportunities:
-- **Adaptive Concurrency**: Adjust workers based on document size
-- **Progressive Processing**: Start classification while OCR continues
-- **Caching Strategy**: Cache page images for reprocessing
-- **Error Recovery**: Implement page-level retry with exponential backoff
+## Benefits
+- **SCP Compliance**: Satisfies organizational requirements
+- **Backward Compatible**: Existing deployments unaffected
+- **Flexible**: Works with any Permissions Boundary policy
+- **Comprehensive**: Covers all IAM roles across all components
+
+## Next Steps
+1. Implement main template changes
+2. Update all pattern templates
+3. Update options templates
+4. Test deployment scenarios
+5. Document usage examples
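
Step 2 of the plan (pattern template updates) could look roughly like the excerpt below. This is a minimal sketch, assuming each nested pattern template declares the same parameter and condition names as the main template. `ExampleFunctionRole` is a hypothetical resource name, not one taken from the repository.

```yaml
# Hypothetical excerpt of a nested pattern template,
# for example patterns/pattern-2/template.yaml.
Parameters:
  PermissionsBoundaryArn:
    Type: String
    Default: ""
    Description: (Optional) ARN of IAM Permissions Boundary policy, passed down from the main stack

Conditions:
  HasPermissionsBoundary: !Not [!Equals [!Ref PermissionsBoundaryArn, ""]]

Resources:
  ExampleFunctionRole: # hypothetical role name
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Principal:
              Service: lambda.amazonaws.com
            Action: sts:AssumeRole
      # Omitted entirely when no boundary ARN is supplied.
      PermissionsBoundary: !If [HasPermissionsBoundary, !Ref PermissionsBoundaryArn, !Ref AWS::NoValue]
```

CloudFormation rejects a nested stack whose parent passes a parameter that the child template does not declare, so the child declaration has to ship together with the parent's pass-through.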

template.yaml

Lines changed: 28 additions & 0 deletions
@@ -290,6 +290,18 @@ Parameters:
     Description: Maximum acceptable execution time in milliseconds before alerting (default 300000 = 300 seconds)
     MinValue: 1000
 
+  # Security Configuration
+  PermissionsBoundaryArn:
+    Type: String
+    Default: ""
+    Description: >-
+      (Optional) ARN of an existing IAM Permissions Boundary policy to attach to all IAM roles.
+      Required by some organizations with Service Control Policies (SCPs).
+      Format: arn:aws:iam::account-id:policy/policy-name
+      Leave blank if no Permissions Boundary is required.
+    AllowedPattern: "^(|arn:aws:iam::[0-9]{12}:policy/.+)$"
+    ConstraintDescription: Must be empty or a valid IAM policy ARN
+
   # Logging configuration
   LogLevel:
     Type: String
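
The `AllowedPattern` in this hunk matches only the `aws` partition and policies owned by a 12-digit account ID. If deployments to other partitions (for example `aws-us-gov` or `aws-cn`) matter, a looser pattern along the following lines might work. This is an untested sketch and not part of the commit.

```yaml
Parameters:
  PermissionsBoundaryArn:
    Type: String
    Default: ""
    # Assumption: accept any partition name that starts with "aws"
    # (aws, aws-us-gov, aws-cn) instead of the literal "aws" only.
    AllowedPattern: "^(|arn:aws[a-z-]*:iam::[0-9]{12}:policy/.+)$"
    ConstraintDescription: Must be empty or a valid IAM policy ARN
```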
@@ -428,6 +440,7 @@ Conditions:
   ShouldCreateDocumentKnowledgeBase: !Equals [!Ref DocumentKnowledgeBase, "BEDROCK_KNOWLEDGE_BASE (Create)"]
   ShouldUseDocumentKnowledgeBase: !Condition ShouldCreateDocumentKnowledgeBase
   DocumentSectionsCrawlerScheduleEnabled: !Not [!Equals [!Ref DocumentSectionsCrawlerFrequency, "Manual"]]
+  HasPermissionsBoundary: !Not [!Equals [!Ref PermissionsBoundaryArn, ""]]
 
 
 Metadata:
@@ -492,6 +505,10 @@ Metadata:
         Parameters:
           - BedrockGuardrailId
           - BedrockGuardrailVersion
+      - Label:
+          default: "Security Configuration"
+        Parameters:
+          - PermissionsBoundaryArn
       - Label:
           default: "General Configuration"
         Parameters:
@@ -550,6 +567,8 @@ Metadata:
         default: "Bedrock Guardrail Id"
       BedrockGuardrailVersion:
         default: "Bedrock Guardrail Version"
+      PermissionsBoundaryArn:
+        default: "Permissions Boundary ARN"
       MaxConcurrentWorkflows:
         default: "Maximum Concurrent Workflows"
       DataRetentionInDays:
@@ -828,6 +847,7 @@ Resources:
           - "s3://${ConfigurationBucket}/config_library/pattern-1/${ConfigPath}/config.yaml"
           - ConfigPath: !FindInMap [Pattern1ConfigurationMap, !Ref Pattern1Configuration, ConfigPath]
         ConfigLibraryHash: "<CONFIG_LIBRARY_HASH_TOKEN>"
+        PermissionsBoundaryArn: !Ref PermissionsBoundaryArn
         SageMakerA2IReviewPortalURL: !If
           - IsPattern1HITLEnabled
           - !GetAtt WorkforceURLResource.PortalURL
@@ -867,6 +887,7 @@ Resources:
           - "s3://${ConfigurationBucket}/config_library/pattern-2/${ConfigPath}/config.yaml"
           - ConfigPath: !FindInMap [Pattern2ConfigurationMap, !Ref Pattern2Configuration, ConfigPath]
         ConfigLibraryHash: "<CONFIG_LIBRARY_HASH_TOKEN>"
+        PermissionsBoundaryArn: !Ref PermissionsBoundaryArn
 
   PATTERN3STACK:
     DependsOn:
@@ -901,6 +922,7 @@ Resources:
           - "s3://${ConfigurationBucket}/config_library/pattern-3/${ConfigPath}/config.yaml"
           - ConfigPath: !FindInMap [Pattern3ConfigurationMap, !Ref Pattern3Configuration, ConfigPath]
         ConfigLibraryHash: "<CONFIG_LIBRARY_HASH_TOKEN>"
+        PermissionsBoundaryArn: !Ref PermissionsBoundaryArn
 
   ##########################################################################
   # Encryption key
@@ -1671,6 +1693,7 @@ Resources:
             Action: sts:AssumeRole
       ManagedPolicyArns:
         - arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole
+      PermissionsBoundary: !If [HasPermissionsBoundary, !Ref PermissionsBoundaryArn, !Ref AWS::NoValue]
       Policies:
         - PolicyName: DocumentSectionsCrawlerS3Access
           PolicyDocument:
@@ -3504,6 +3527,7 @@ Resources:
                 "cognito-identity.amazonaws.com:aud": !Ref IdentityPool
               "ForAnyValue:StringLike":
                 "cognito-identity.amazonaws.com:amr": authenticated
+      PermissionsBoundary: !If [HasPermissionsBoundary, !Ref PermissionsBoundaryArn, !Ref AWS::NoValue]
       Policies:
         - PolicyName: S3
           PolicyDocument:
@@ -3703,6 +3727,7 @@ Resources:
             Action: sts:AssumeRole
       ManagedPolicyArns:
         - arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
+      PermissionsBoundary: !If [HasPermissionsBoundary, !Ref PermissionsBoundaryArn, !Ref AWS::NoValue]
       Policies:
         - PolicyName: CognitoClientUpdaterPolicy
           PolicyDocument:
@@ -3799,6 +3824,7 @@ Resources:
             Principal:
               Service: sagemaker.amazonaws.com
             Action: sts:AssumeRole
+      PermissionsBoundary: !If [HasPermissionsBoundary, !Ref PermissionsBoundaryArn, !Ref AWS::NoValue]
       Policies:
         - PolicyName: A2IFlowDefinitionAccess
           PolicyDocument:
@@ -4046,6 +4072,7 @@ Resources:
     Properties:
       ManagedPolicyArns:
         - arn:aws:iam::aws:policy/service-role/AWSAppSyncPushToCloudWatchLogs
+      PermissionsBoundary: !If [HasPermissionsBoundary, !Ref PermissionsBoundaryArn, !Ref AWS::NoValue]
       AssumeRolePolicyDocument:
         Version: 2012-10-17
         Statement:
@@ -4209,6 +4236,7 @@ Resources:
             Action: sts:AssumeRole
       ManagedPolicyArns:
         - arn:aws:iam::aws:policy/service-role/AWSAppSyncPushToCloudWatchLogs
+      PermissionsBoundary: !If [HasPermissionsBoundary, !Ref PermissionsBoundaryArn, !Ref AWS::NoValue]
       Policies:
         - PolicyName: DynamoDBAccess
           PolicyDocument:
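
The hunks in this file touch explicitly declared `AWS::IAM::Role` resources. If the template also relies on AWS SAM to generate execution roles for `AWS::Serverless::Function` resources (an assumption; the diff does not show this), those implicit roles would need the boundary as well. SAM exposes a `PermissionsBoundary` property on the function for that purpose. The sketch below uses a hypothetical function and path, and it assumes the `!If` intrinsic is passed through to the generated role unchanged.

```yaml
Transform: AWS::Serverless-2016-10-31

Parameters:
  PermissionsBoundaryArn:
    Type: String
    Default: ""

Conditions:
  HasPermissionsBoundary: !Not [!Equals [!Ref PermissionsBoundaryArn, ""]]

Resources:
  ExampleFunction: # hypothetical function, not from the repository
    Type: AWS::Serverless::Function
    Properties:
      Handler: index.handler
      Runtime: python3.12
      CodeUri: ./src/example/ # hypothetical path
      # SAM applies this to the execution role it generates for the function.
      # It has no effect when the function is given an explicit Role.
      PermissionsBoundary: !If [HasPermissionsBoundary, !Ref PermissionsBoundaryArn, !Ref AWS::NoValue]
```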
