| 1 | +Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. |
| 2 | +SPDX-License-Identifier: MIT-0 |
| 3 | + |
| 4 | +# Bank Statement Sample Configuration |
| 5 | + |
| 6 | +This directory contains a specialized configuration for processing bank statements using the GenAI IDP Accelerator. It demonstrates the new nested attribute support, covering simple, group, and list attributes. |
| 7 | + |
| 8 | +## Pattern Association |
| 9 | + |
| 10 | +**Pattern**: Pattern-2 - Uses Amazon Bedrock with Nova or Claude models for both page classification/grouping and information extraction |
| 11 | + |
| 12 | +## Validation Level |
| 13 | + |
| 14 | +**Level**: 0 - Modest Testing |
| 15 | + |
| 16 | +- **Testing Evidence**: This configuration has been tested with the provided sample document `samples/bank-statement-multipage.pdf`. It demonstrates accurate extraction of account information, address details, and individual transaction records. See the [notebook examples](../../../notebooks/usecase-specific-examples/multi-page-bank-statement). |
| 17 | +- **Production Usage**: This configuration serves as a reference implementation for financial document processing workflows. |
| 18 | +- **Known Limitations**: Statements from specialized financial institutions with unique layouts will require adjustments to this configuration. |
| 19 | + |
| 20 | +## Overview |
| 21 | + |
| 22 | +The bank statement sample configuration is designed to handle multi-page bank account statements with complex nested data structures. This configuration showcases the full capabilities of the GenAI IDP Accelerator's nested attribute support. |
| 23 | + |
| 24 | +The configuration processes bank statements to extract: |
| 25 | + |
| 26 | +- Account-level information (simple attributes) |
| 27 | +- Customer address details (group attributes) |
| 28 | +- Individual transaction records (list attributes) |
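|  | + |
|  | +The three kinds of output listed above nest roughly as sketched below. This is a hypothetical illustration of the result shape only: every value is a placeholder, and the actual serialization produced by the accelerator (for example, JSON) and its exact key names may differ. |
|  | + |
|  | +```yaml |
|  | +# Hypothetical result shape; all values are placeholders, not real output. |
|  | +Account Number: "<account number>"          # simple attribute |
|  | +Statement Period: "<statement period>"      # simple attribute |
|  | +Account Holder Address:                     # group attribute (nested object) |
|  | +  Street Number: "<street number>" |
|  | +  Street Name: "<street name>" |
|  | +  City: "<city>" |
|  | +  State: "<state>" |
|  | +  ZIP Code: "<zip code>" |
|  | +Transactions:                               # list attribute (one entry per transaction) |
|  | +  - Date: "<MM/DD/YYYY>" |
|  | +    Description: "<description>" |
|  | +    Amount: "<amount>" |
|  | +``` |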
| 29 | + |
| 30 | +## Key Components |
| 31 | + |
| 32 | +### Document Classes |
| 33 | + |
| 34 | +The configuration defines one document class with comprehensive nested attributes: |
| 35 | + |
| 36 | +- **Bank Statement**: Monthly bank account statement |
| 37 | +  - **Simple Attributes**: Account Number, Statement Period |
| 38 | +  - **Group Attributes**: Account Holder Address (Street Number, Street Name, City, State, ZIP Code) |
| 39 | +  - **List Attributes**: Transactions (Date, Description, Amount for each transaction) |
| 40 | + |
| 41 | +### Attribute Types Demonstration |
| 42 | + |
| 43 | +This configuration demonstrates all three attribute types supported by the GenAI IDP Accelerator: |
| 44 | + |
| 45 | +#### 1. Simple Attributes |
| 46 | +- **Account Number**: Primary account identifier (EXACT evaluation) |
| 47 | +- **Statement Period**: Statement period like "January 2024" (FUZZY evaluation with 0.8 threshold) |
| 48 | + |
| 49 | +#### 2. Group Attributes |
| 50 | +- **Account Holder Address**: Nested object structure containing: |
| 51 | +  - **Street Number**: House or building number (FUZZY evaluation with 0.9 threshold) |
| 52 | +  - **Street Name**: Name of the street (FUZZY evaluation with 0.8 threshold) |
| 53 | +  - **City**: City name (FUZZY evaluation with 0.9 threshold) |
| 54 | +  - **State**: State abbreviation (EXACT evaluation) |
| 55 | +  - **ZIP Code**: 5 or 9 digit postal code (EXACT evaluation) |
| 56 | + |
| 57 | +#### 3. List Attributes |
| 58 | +- **Transactions**: Array of transaction records, each containing: |
| 59 | +  - **Date**: Transaction date in MM/DD/YYYY format (FUZZY evaluation with 0.9 threshold) |
| 60 | +  - **Description**: Transaction description or merchant name (SEMANTIC evaluation with 0.7 threshold) |
| 61 | +  - **Amount**: Transaction amount with positive/negative values (NUMERIC_EXACT evaluation) |
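|  | + |
|  | +As a rough sketch, the three attribute types above might be declared together as follows. The keys `name`, `description`, `attributeType: simple`, `evaluation_method`, `evaluation_threshold`, `groupAttributes`, and `itemAttributes` appear elsewhere in this README; the `classes` and `attributes` wrappers, the `group` and `list` values, and the `listItemTemplate` key are assumptions made for illustration and should be checked against the actual configuration file. Only one field per nested structure is shown. |
|  | + |
|  | +```yaml |
|  | +# Illustrative sketch, not the shipped configuration. |
|  | +# Assumed (unconfirmed here): classes, attributes, attributeType: group/list, listItemTemplate. |
|  | +classes: |
|  | +  - name: "Bank Statement" |
|  | +    description: "Monthly bank account statement" |
|  | +    attributes: |
|  | +      - name: "Account Number" |
|  | +        description: "Primary account identifier" |
|  | +        attributeType: simple |
|  | +        evaluation_method: EXACT |
|  | +      - name: "Account Holder Address" |
|  | +        description: "Address of the account holder" |
|  | +        attributeType: group |
|  | +        groupAttributes: |
|  | +          - name: "Street Name" |
|  | +            description: "Name of the street" |
|  | +            evaluation_method: FUZZY |
|  | +            evaluation_threshold: '0.8' |
|  | +      - name: "Transactions" |
|  | +        description: "Array of transaction records" |
|  | +        attributeType: list |
|  | +        listItemTemplate: |
|  | +          itemAttributes: |
|  | +            - name: "Description" |
|  | +              description: "Transaction description or merchant name" |
|  | +              evaluation_method: SEMANTIC |
|  | +              evaluation_threshold: '0.7' |
|  | +``` |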
| 62 | + |
| 63 | +### Classification Settings |
| 64 | + |
| 65 | +- **Model**: Amazon Nova Pro |
| 66 | +- **Method**: Text-based holistic classification |
| 67 | +- **Temperature**: 0 (deterministic outputs) |
| 68 | +- **Top-k**: 5 |
| 69 | + |
| 70 | +The classification component analyzes the entire document package to identify bank statement sections and page boundaries. |
| 71 | + |
| 72 | +### Extraction Settings |
| 73 | + |
| 74 | +- **Model**: Amazon Nova Pro |
| 75 | +- **Temperature**: 0 (deterministic outputs) |
| 76 | +- **Top-k**: 5 |
| 77 | +- **Document Image Support**: Uses `{DOCUMENT_IMAGE}` placeholder for multimodal extraction |
| 78 | + |
| 79 | +The extraction component processes each bank statement section to extract structured data including nested address information and transaction lists. |
| 80 | + |
| 81 | +### Assessment Settings |
| 82 | + |
| 83 | +- **Model**: Claude 3.7 Sonnet |
| 84 | +- **Default Confidence Threshold**: 0.9 |
| 85 | +- **Temperature**: 0 (deterministic outputs) |
| 86 | + |
| 87 | +The assessment component evaluates extraction confidence for each attribute, including nested structures, with individual confidence thresholds per attribute type. |
| 88 | + |
| 89 | +### Evaluation Settings |
| 90 | + |
| 91 | +- **Model**: Claude 3 Haiku (for LLM evaluations) |
| 92 | +- **Evaluation Methods**: |
| 93 | +  - EXACT: For account numbers, state abbreviations, ZIP codes |
| 94 | +  - FUZZY: For statement periods, street and city fields, and transaction dates (with configurable thresholds) |
| 95 | +  - SEMANTIC: For transaction descriptions |
| 96 | +  - NUMERIC_EXACT: For transaction amounts |
| 97 | + |
| 98 | +## Key Differences from Default Configuration |
| 99 | + |
| 100 | +### 1. Nested Attribute Support |
| 101 | + |
| 102 | +Unlike the default configuration, which uses only simple attributes, this configuration demonstrates: |
| 103 | +- **Group attributes** for structured address information |
| 104 | +- **List attributes** for variable-length transaction records |
| 105 | +- **Mixed evaluation methods** tailored to each attribute type |
| 106 | + |
| 107 | +### 2. Financial Document Specialization |
| 108 | + |
| 109 | +- Optimized prompts for bank statement processing |
| 110 | +- Specialized evaluation methods for financial data (NUMERIC_EXACT for amounts) |
| 111 | +- Higher confidence thresholds for critical financial information |
| 112 | + |
| 113 | +### 3. Multimodal Enhancement |
| 114 | + |
| 115 | +- Uses `{DOCUMENT_IMAGE}` placeholder for improved extraction accuracy |
| 116 | +- Combines text and visual analysis for complex document layouts |
| 117 | +- Enhanced assessment capabilities with image-based confidence evaluation |
| 118 | + |
| 119 | +## Sample Documents |
| 120 | + |
| 121 | +This configuration includes the following sample document: |
| 122 | + |
| 123 | +- `samples/bank-statement-multipage.pdf`: A multi-page bank statement demonstrating account information, customer address, and multiple transaction records across several pages |
| 124 | + |
| 125 | +## Performance Metrics |
| 126 | + |
| 127 | +Based on testing with the provided sample document: |
| 128 | + |
| 129 | +| Metric | Value | Notes | |
| 130 | +|--------|-------|-------| |
| 131 | +| Classification Accuracy | >95% | Accurate identification of bank statement sections | |
| 132 | +| Simple Attribute Accuracy | >90% | Account numbers and periods extracted reliably | |
| 133 | +| Group Attribute Accuracy | >85% | Address components extracted with high fidelity | |
| 134 | +| List Attribute Accuracy | >90% | Transaction details extracted accurately per item | |
| 135 | + |
| 136 | +## Usage Instructions |
| 137 | + |
| 138 | +To use this bank statement configuration: |
| 139 | + |
| 140 | +1. **Deploy with Sample**: Upload `samples/bank-statement-multipage.pdf` to test the configuration |
| 141 | +2. **Review Results**: Examine the extracted nested data structure in the UI |
| 142 | +3. **Evaluate Performance**: Use the evaluation framework to compare against baseline results |
| 143 | +4. **Customize**: Modify attribute definitions for your specific bank statement formats |
| 144 | + |
| 145 | +## Customization Guidance |
| 146 | + |
| 147 | +### Adding New Attributes |
| 148 | + |
| 149 | +To add new simple attributes: |
| 150 | +```yaml |
| 151 | +- name: "Routing Number" |
| 152 | +  description: "Bank routing number" |
| 153 | +  attributeType: simple |
| 154 | +  evaluation_method: EXACT |
| 155 | +``` |
| 156 | + |
| 157 | +To extend the address group: |
| 158 | +```yaml |
| 159 | +groupAttributes: |
| 160 | +  - name: "Country" |
| 161 | +    description: "Country name" |
| 162 | +    evaluation_method: EXACT |
| 163 | +``` |
| 164 | + |
| 165 | +To add new transaction fields: |
| 166 | +```yaml |
| 167 | +itemAttributes: |
| 168 | +  - name: "Category" |
| 169 | +    description: "Transaction category" |
| 170 | +    evaluation_method: SEMANTIC |
| 171 | +    evaluation_threshold: '0.8' |
| 172 | +``` |
| 173 | +
|
| 174 | +### Modifying Evaluation Methods |
| 175 | + |
| 176 | +Adjust evaluation methods and thresholds based on your accuracy requirements: |
| 177 | +- Use `EXACT` for critical identifiers |
| 178 | +- Use `FUZZY` with high thresholds (0.8-0.9) for names and addresses |
| 179 | +- Use `SEMANTIC` for descriptive fields |
| 180 | +- Use `NUMERIC_EXACT` for financial amounts |
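|  | + |
|  | +For example, tightening the match on the Street Name field could look like the fragment below. It is a sketch that reuses the keys shown in the customization examples above; the 0.9 value is an arbitrary illustration, not a recommended setting. |
|  | + |
|  | +```yaml |
|  | +# Illustrative only: Street Name ships with FUZZY at 0.8; here the threshold is raised to 0.9. |
|  | +- name: "Street Name" |
|  | +  description: "Name of the street" |
|  | +  evaluation_method: FUZZY |
|  | +  evaluation_threshold: '0.9' |
|  | +``` |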
| 181 | + |
| 182 | +## Contributors |
| 183 | + |
| 184 | +- GenAI IDP Accelerator Team |
| 185 | + |
| 186 | +## Version History |
| 187 | + |
| 188 | +- v1.0 (2025-06-19): Initial release demonstrating nested attribute support for bank statements |