Commit ee3c034

Author: Taniya Mathur
Commit message: Merge develop branch updates
Parents: e595938 + 0d2e5a9

File tree

30 files changed: +3204 -4660 lines changed

CHANGELOG.md

Lines changed: 30 additions & 0 deletions

@@ -5,6 +5,36 @@ SPDX-License-Identifier: MIT-0

## [Unreleased]

## [0.4.5]

### Added

- **Document Split Classification Metrics for Evaluating Page-Level Classification and Document Segmentation**
  - Added `DocSplitClassificationMetrics` class for comprehensive evaluation of document splitting and classification accuracy
  - **Three Accuracy Types**: Page-level classification accuracy, split accuracy without order consideration, and split accuracy with exact page-order matching
  - **Visual Reporting**: Generates markdown reports with color-coded indicators (🟢 Excellent, 🟡 Good, 🟠 Fair, 🔴 Poor), progress bars, and detailed section analysis tables
  - **Automatic Integration**: Integrates with the evaluation service when ground truth and predicted sections are available
  - **Documentation**: Guide in `lib/idp_common_pkg/idp_common/evaluation/README.md` with usage examples, metric explanations, and best practices

### Fixed

- **Evaluation Output URI Fields Lost Across All Patterns**, causing (1) missing Page Text Confidence content in the UI, and (2) a failed Assessment step when reprocessing a document after editing classes (`No module named 'fitz'`)
  - Fixed bug where `text_confidence_uri` was being set to null in evaluation output for all three patterns
  - Root cause: the AppSync service `_appsync_to_document()` method incorrectly mapped page URIs, and evaluation functions overwrote correct documents with corrupted AppSync responses
- **UI: Metering Data Not Displayed During Document Processing**
  - Fixed UI subscription query missing the `Metering` field, which prevented real-time cost display
  - Users can now see estimated costs accumulate in real time without a manual page refresh
- **UI: Estimated Cost Panel Arrow Misalignment**
  - Fixed the expand/contract arrow displaying above the "Estimated Cost" heading

### Templates
- us-west-2: `https://s3.us-west-2.amazonaws.com/aws-ml-blog-us-west-2/artifacts/genai-idp/idp-main_0.4.5.yaml`
- us-east-1: `https://s3.us-east-1.amazonaws.com/aws-ml-blog-us-east-1/artifacts/genai-idp/idp-main_0.4.5.yaml`
- eu-central-1: `https://s3.eu-central-1.amazonaws.com/aws-ml-blog-eu-central-1/artifacts/genai-idp/idp-main_0.4.5.yaml`

## [0.4.4]

### Added

README.md

Lines changed: 4 additions & 2 deletions

@@ -34,6 +34,8 @@ White-glove customization, deployment, and integration support for production us

  **Prefer AWS CDK?** This solution is also available as [GenAI IDP Accelerator for AWS CDK](https://github.com/cdklabs/genai-idp), providing the same functional capabilities through AWS CDK constructs for customers who prefer Infrastructure-as-Code with CDK.

+ **Prefer Terraform?** This solution is also available as [GenAI IDP Terraform](https://github.com/awslabs/genai-idp-terraform), providing the same functional capabilities as a Terraform module that integrates with existing infrastructure and supports customization through module variables.
+
  ## Key Features

  - **Serverless Architecture**: Built entirely on AWS serverless technologies including Lambda, Step Functions, SQS, and DynamoDB

@@ -121,7 +123,7 @@ idp-cli download-results \
    --output-dir ./results/
  ```

- **See [IDP CLI Documentation](./idp_cli/README.md)** for:
+ **See [IDP CLI Documentation](./docs/idp-cli.md)** for:
  - CLI-based stack deployment and updates
  - Batch document processing
  - Complete evaluation workflows with baselines

@@ -162,7 +164,7 @@ For detailed deployment and testing instructions, see the [Deployment Guide](./d

  - [Architecture](./docs/architecture.md) - Detailed component architecture and data flow
  - [Deployment](./docs/deployment.md) - Build, publish, deploy, and test instructions
- - [IDP CLI](./idp_cli/README.md) - Command line interface for batch processing and evaluation workflows
+ - [IDP CLI](./docs/idp-cli.md) - Command line interface for batch processing and evaluation workflows
  - [Web UI](./docs/web-ui.md) - Web interface features and usage
  - [Agent Analysis](./docs/agent-analysis.md) - Natural language analytics and data visualization feature
  - [Custom MCP Agent](./docs/custom-MCP-agent.md) - Integrating external MCP servers for custom tools and capabilities

VERSION

Lines changed: 1 addition & 1 deletion

@@ -1 +1 @@
- 0.4.5-wip1
+ 0.4.5-wip4

docs/deployment.md

Lines changed: 2 additions & 2 deletions

@@ -98,7 +98,7 @@ idp-cli deploy \
  - Integration with CI/CD pipelines
  - No manual console clicking required

- **For complete CLI documentation**, see [IDP CLI Documentation](../idp_cli/README.md).
+ **For complete CLI documentation**, see [IDP CLI Documentation](./idp-cli.md).

  ---

@@ -416,7 +416,7 @@ idp-cli download-results \
  cat ./eval-results/eval-test/invoice.pdf/evaluation/report.md
  ```

- **For complete evaluation workflow documentation**, see [IDP CLI - Complete Evaluation Workflow](../idp_cli/README.md#complete-evaluation-workflow).
+ **For complete evaluation workflow documentation**, see [IDP CLI - Complete Evaluation Workflow](./idp-cli.md#complete-evaluation-workflow).

  ---

docs/evaluation.md

Lines changed: 193 additions & 0 deletions

@@ -649,6 +649,132 @@ This multi-level analysis helps identify specific areas for improvement, such as

- Performance degradation with larger transaction lists
- Specific list item attributes that frequently fail evaluation

## Document Split Classification Metrics

In addition to extraction accuracy evaluation, the framework now includes document split classification metrics to assess how accurately documents are classified and split into sections. This provides a comprehensive evaluation of both **what** was extracted and **how** documents were classified and organized.

### Overview

Document split classification metrics evaluate three key aspects:

1. **Page-Level Classification**: Accuracy of classifying individual pages
2. **Document Split Grouping**: Accuracy of grouping pages into sections
3. **Page Order Preservation**: Accuracy of maintaining correct page order within sections

These metrics are calculated by comparing the `document_class` and `split_document.page_indices` fields in each section's result JSON.

### Three Types of Accuracy

#### 1. Page Level Accuracy

**Purpose**: Measures how accurately individual pages are classified, regardless of how they're grouped into sections.

**Calculation**: For each page index across all sections, compare the expected `document_class` with the predicted `document_class`.

**Use Case**: Identify whether pages are being assigned to the correct document types.

**Example**:
```
Expected:  Page 0 → Invoice, Page 1 → Invoice, Page 2 → Receipt
Predicted: Page 0 → Invoice, Page 1 → Receipt, Page 2 → Receipt
Result: 2/3 pages correct = 66.7% accuracy
```
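The page-level calculation described above can be sketched in a few lines of Python. This is an illustrative helper, not the actual `DocSplitClassificationMetrics` implementation; the dict-based page-to-class mapping is an assumption made for the example.

```python
def page_level_accuracy(expected: dict, predicted: dict) -> float:
    """Fraction of pages whose predicted class matches the expected class.

    `expected` and `predicted` map page index -> document class.
    Pages missing from `predicted` count as misclassified.
    """
    if not expected:
        return 0.0
    correct = sum(1 for page, cls in expected.items() if predicted.get(page) == cls)
    return correct / len(expected)


expected = {0: "Invoice", 1: "Invoice", 2: "Receipt"}
predicted = {0: "Invoice", 1: "Receipt", 2: "Receipt"}
print(round(page_level_accuracy(expected, predicted), 3))  # 0.667
```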
#### 2. Split Accuracy (Without Page Order)

**Purpose**: Measures whether pages are correctly grouped into sections with the right classification, regardless of page order.

**Calculation**: For each expected section, check whether any predicted section has:
- The same set of page indices (as a set, order doesn't matter)
- The same `document_class`

Both conditions must be met for a section to be marked as correct.

**Use Case**: Verify that pages belonging together are kept together, even if their order might vary.

**Example**:
```
Expected Section A: Class=Invoice, Pages={0, 1, 2}
Predicted Section X: Class=Invoice, Pages={2, 0, 1} ✅ Match (same set)

Expected Section B: Class=Receipt, Pages={3, 4}
Predicted Section Y: Class=Receipt, Pages={3, 4} ✅ Match

Expected Section C: Class=Payslip, Pages={5}
Predicted Section Z: Class=Invoice, Pages={5} ❌ No match (wrong class)

Result: 2/3 sections correct = 66.7% accuracy
```

#### 3. Split Accuracy (With Page Order)

**Purpose**: The strictest evaluation; measures correct grouping with exact page-order preservation.

**Calculation**: Same as "Without Order", but the page-indices list must match exactly (same pages, same order).

**Use Case**: Verify that multi-page documents maintain the correct page sequence.

**Example**:
```
Expected Section A: Class=Invoice, Pages=[0, 1, 2]
Predicted Section X: Class=Invoice, Pages=[0, 1, 2] ✅ Match (exact order)

Expected Section B: Class=Receipt, Pages=[3, 4]
Predicted Section Y: Class=Receipt, Pages=[4, 3] ❌ No match (wrong order)

Result: 1/2 sections correct = 50% accuracy
```
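Both split-accuracy variants can be sketched with one function, since they differ only in whether page indices are compared as a list or as a set. This is an illustrative sketch under the assumption that each predicted section may satisfy at most one expected section; it is not the actual `DocSplitClassificationMetrics` code.

```python
def split_accuracy(expected_sections, predicted_sections, ordered: bool) -> float:
    """Fraction of expected sections matched by some predicted section.

    Each section is a (document_class, page_indices) pair. A match requires
    the same class plus the same pages: as an exact ordered tuple when
    `ordered` is True, as an unordered set when False.
    """
    def key(section):
        cls, pages = section
        return (cls, tuple(pages) if ordered else frozenset(pages))

    if not expected_sections:
        return 0.0
    remaining = [key(s) for s in predicted_sections]
    correct = 0
    for section in expected_sections:
        k = key(section)
        if k in remaining:
            remaining.remove(k)  # each predicted section matches at most once
            correct += 1
    return correct / len(expected_sections)


expected = [("Invoice", [0, 1, 2]), ("Receipt", [3, 4]), ("Payslip", [5])]
predicted = [("Invoice", [2, 0, 1]), ("Receipt", [4, 3]), ("Invoice", [5])]
print(round(split_accuracy(expected, predicted, ordered=False), 3))  # 0.667
print(split_accuracy(expected, predicted, ordered=True))             # 0.0
```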
### Report Structure

Document split metrics are integrated into the unified evaluation report:

```markdown
# Evaluation Report

## Summary
**Document Split Classification:**
- Page Level Accuracy: 🟢 85/100 pages [████████████████░░░░] 85%
- Split Accuracy (Without Order): 🟡 15/20 sections [███████████████░░░░░] 75%
- Split Accuracy (With Order): 🟠 12/20 sections [████████████░░░░░░░░] 60%

**Document Extraction:**
- Match Rate: 🟢 145/150 attributes matched [███████████████████░] 97%
- Precision: 0.97 | Recall: 0.95 | F1 Score: 🟢 0.96

## Overall Metrics

### Document Split Classification Metrics
| Metric | Value | Rating |
|--------|-------|--------|
| page_level_accuracy | 0.8500 | 🟡 Good |
| split_accuracy_without_order | 0.7500 | 🟡 Good |
| split_accuracy_with_order | 0.6000 | 🟠 Fair |

### Document Extraction Metrics
| Metric | Value | Rating |
|--------|-------|--------|
| precision | 0.9700 | 🟢 Excellent |
| recall | 0.9500 | 🟢 Excellent |
| f1_score | 0.9600 | 🟢 Excellent |
```
### Data Structure Requirements

For doc split metrics to be calculated, each section's result JSON must include:

```json
{
  "document_class": {
    "type": "Invoice"
  },
  "split_document": {
    "page_indices": [0, 1]
  },
  "inference_result": {
    // Extracted attributes
  }
}
```

- Page indices are **0-based** and **may be non-sequential**
- Missing or null fields are handled gracefully (treated as "Unknown" class or empty page list)
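The graceful handling of missing or null fields described above can be sketched as follows. `read_section` is a hypothetical helper written for illustration; the field names match the documented result JSON, but the function itself is not part of the library.

```python
import json


def read_section(result_json: str):
    """Extract (document_class, page_indices) from a section result JSON.

    Missing or null fields degrade gracefully, as the docs describe:
    no class -> "Unknown", no split info -> empty page list.
    """
    data = json.loads(result_json)
    doc_class = (data.get("document_class") or {}).get("type") or "Unknown"
    pages = (data.get("split_document") or {}).get("page_indices") or []
    return doc_class, list(pages)


print(read_section(
    '{"document_class": {"type": "Invoice"}, "split_document": {"page_indices": [0, 1]}}'
))  # ('Invoice', [0, 1])
print(read_section('{"split_document": null}'))  # ('Unknown', [])
```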
## Setup and Usage

### Step 1: Creating Baseline Data

@@ -840,12 +966,23 @@ The evaluation framework includes comprehensive monitoring through CloudWatch me

The framework calculates the following detailed metrics for each document and section:

**Extraction Accuracy Metrics:**
- **Precision**: Accuracy of positive predictions (TP / (TP + FP))
- **Recall**: Coverage of actual positive cases (TP / (TP + FN))
- **F1 Score**: Harmonic mean of precision and recall
- **Accuracy**: Overall correctness ((TP + TN) / (TP + TN + FP + FN))
- **False Alarm Rate**: Rate of false positives among negatives (FP / (FP + TN))
- **False Discovery Rate**: Rate of false positives among positive predictions (FP / (FP + TP))
- **Weighted Overall Score**: Field-importance-weighted aggregate score

**Document Split Classification Metrics:**
- **Page Level Accuracy**: Classification accuracy for individual pages
- **Split Accuracy (Without Order)**: Correct page grouping regardless of order
- **Split Accuracy (With Order)**: Correct page grouping with exact order
- **Total Pages**: Total number of pages evaluated
- **Total Splits**: Total number of document sections/splits evaluated
- **Correctly Classified Pages**: Count of pages with correct classification
- **Correctly Split Sections**: Count of sections with correct page grouping

The evaluation also tracks different evaluation statuses:
- **RUNNING**: Evaluation is in progress

@@ -866,11 +1003,24 @@ The evaluation framework automatically saves detailed metrics to an AWS Glue dat

#### 1. document_evaluations
Stores document-level metrics including:

**Extraction Metrics:**
- Document ID, input key, evaluation date
- Overall accuracy, precision, recall, F1 score
- False alarm rate, false discovery rate
- Weighted overall score
- Execution time performance metrics

**Document Split Classification Metrics:**
- Page level accuracy (double)
- Split accuracy without order (double)
- Split accuracy with order (double)
- Total pages (int)
- Total splits (int)
- Correctly classified pages (int)
- Correctly split without order (int)
- Correctly split with order (int)

#### 2. section_evaluations
Stores section-level metrics including:
- Document ID, section ID, section type

@@ -932,6 +1082,49 @@ SELECT section_type,

FROM "your-database-name".section_evaluations
GROUP BY section_type
ORDER BY avg_accuracy DESC;

-- Example: Query doc split classification performance
SELECT document_id,
       page_level_accuracy,
       split_accuracy_without_order,
       split_accuracy_with_order,
       total_pages,
       total_splits,
       evaluation_date
FROM "your-database-name".document_evaluations
WHERE page_level_accuracy < 0.9
ORDER BY page_level_accuracy ASC;

-- Example: Compare doc split vs extraction accuracy
SELECT
    AVG(page_level_accuracy) as avg_page_classification_accuracy,
    AVG(split_accuracy_without_order) as avg_split_grouping_accuracy,
    AVG(precision) as avg_extraction_precision,
    AVG(recall) as avg_extraction_recall,
    AVG(f1_score) as avg_extraction_f1
FROM "your-database-name".document_evaluations
WHERE evaluation_date >= current_date - interval '7' day;

-- Example: Identify documents with page classification issues
SELECT document_id,
       total_pages,
       correctly_classified_pages,
       page_level_accuracy,
       ROUND((total_pages - correctly_classified_pages), 0) as misclassified_pages
FROM "your-database-name".document_evaluations
WHERE page_level_accuracy < 1.0
ORDER BY misclassified_pages DESC;

-- Example: Analyze split accuracy trends over time
SELECT
    DATE_TRUNC('day', evaluation_date) as eval_day,
    COUNT(*) as documents_evaluated,
    AVG(split_accuracy_without_order) as avg_split_accuracy_unordered,
    AVG(split_accuracy_with_order) as avg_split_accuracy_ordered
FROM "your-database-name".document_evaluations
WHERE evaluation_date >= current_date - interval '30' day
GROUP BY DATE_TRUNC('day', evaluation_date)
ORDER BY eval_day DESC;
```

### Analytics Notebook

lib/idp_common_pkg/idp_common/agents/analytics/schema_provider.py

Lines changed: 8 additions & 0 deletions

@@ -131,6 +131,14 @@ def get_evaluation_tables_description() -> str:
  - `false_discovery_rate` (double): False discovery rate (0-1)
  - `weighted_overall_score` (double): Weighted overall score (0-1)
  - `execution_time` (double): Time taken to evaluate (seconds)
+ - `page_level_accuracy` (double): Page-level classification accuracy (0-1)
+ - `split_accuracy_without_order` (double): Document split accuracy without considering order (0-1)
+ - `split_accuracy_with_order` (double): Document split accuracy with order considered (0-1)
+ - `total_pages` (int): Total number of pages in the document
+ - `total_splits` (int): Total number of document splits/sections
+ - `correctly_classified_pages` (int): Number of pages correctly classified
+ - `correctly_split_without_order` (int): Number of correctly split sections (unordered)
+ - `correctly_split_with_order` (int): Number of correctly split sections (ordered)

  **Partitioned by**: date (YYYY-MM-DD format)

lib/idp_common_pkg/idp_common/appsync/mutations.py

Lines changed: 1 addition & 0 deletions

@@ -46,6 +46,7 @@
        Class
        ImageUri
        TextUri
+       TextConfidenceUri
      }
      Metering
      EvaluationReportUri

lib/idp_common_pkg/idp_common/appsync/service.py

Lines changed: 11 additions & 3 deletions

@@ -306,11 +306,19 @@ def _appsync_to_document(self, appsync_data: Dict[str, Any]) -> Document:
      if pages_data is not None:  # Ensure pages_data is not None before iterating
          for page_data in pages_data:
              page_id = str(page_data.get("Id"))
+
+             # Get URI values and handle empty strings
+             # Note: TextUri in AppSync schema contains the parsed text URI
+             text_uri = page_data.get("TextUri") or None
+             text_conf_uri = page_data.get("TextConfidenceUri") or None
+             image_uri = page_data.get("ImageUri") or None
+
              doc.pages[page_id] = Page(
                  page_id=page_id,
-                 image_uri=page_data.get("ImageUri"),
-                 raw_text_uri=page_data.get("TextUri"),
-                 text_confidence_uri=page_data.get("TextConfidenceUri"),
+                 image_uri=image_uri,
+                 raw_text_uri=text_uri,  # TextUri maps to both for backward compatibility
+                 parsed_text_uri=text_uri,  # Fix: TextUri contains parsed text URI
+                 text_confidence_uri=text_conf_uri,  # Fix: Convert empty strings to None
                  classification=page_data.get("Class"),
              )

lib/idp_common_pkg/idp_common/assessment/granular_service.py

Lines changed: 6 additions & 17 deletions

@@ -1288,23 +1288,12 @@ def _get_text_confidence_data(self, page) -> str:
              )
              raise

-     # Fallback: use raw OCR data if text confidence is not available (for backward compatibility)
-     if page.raw_text_uri:
-         try:
-             from idp_common.ocr.service import OcrService
-
-             ocr_service = OcrService()
-             raw_ocr_data = s3.get_json_content(page.raw_text_uri)
-             text_confidence_data = ocr_service._generate_text_confidence_data(
-                 raw_ocr_data
-             )
-             return json.dumps(text_confidence_data, indent=2)
-         except Exception as e:
-             logger.warning(
-                 f"Failed to generate text confidence data for page {page.page_id}: {str(e)}"
-             )
-             raise
-     return ""
+     # Text confidence URI not available
+     logger.error(
+         f"Text confidence data unavailable for page {page.page_id}. "
+         f"The text_confidence_uri field is missing or empty."
+     )
+     return "Text Confidence Data Unavailable"

  def _convert_bbox_to_geometry(
      self, bbox_coords: List[float], page_num: int
