Commit ee3c034

Author: Taniya Mathur
Commit message: Merge develop branch updates
Parents: e595938 + 0d2e5a9

File tree

30 files changed: +3204 -4660 lines changed

CHANGELOG.md

Lines changed: 30 additions & 0 deletions

@@ -5,6 +5,36 @@ SPDX-License-Identifier: MIT-0

## [Unreleased]

## [0.4.5]

### Added

- **Document Split Classification Metrics for Evaluating Page-Level Classification and Document Segmentation**
  - Added `DocSplitClassificationMetrics` class for comprehensive evaluation of document splitting and classification accuracy
  - **Three Accuracy Types**: Page-level classification accuracy, split accuracy without order consideration, and split accuracy with exact page-order matching
  - **Visual Reporting**: Generates markdown reports with color-coded indicators (🟢 Excellent, 🟡 Good, 🟠 Fair, 🔴 Poor), progress bars, and detailed section analysis tables
  - **Automatic Integration**: Integrates with the evaluation service when ground truth and predicted sections are available
  - **Documentation**: Guide in `lib/idp_common_pkg/idp_common/evaluation/README.md` with usage examples, metric explanations, and best practices

### Fixed

- **Evaluation Output URI Fields Lost Across All Patterns**, causing (1) missing Page Text Confidence content in the UI, and (2) a failed Assessment step when reprocessing a document after editing classes (`No module named 'fitz'`)
  - Fixed bug where `text_confidence_uri` was being set to null in evaluation output for all three patterns
  - Root cause: the AppSync service `_appsync_to_document()` method incorrectly mapped page URIs, and evaluation functions overwrote correct documents with corrupted AppSync responses
- **UI: Metering Data Not Displayed During Document Processing**
  - Fixed UI subscription query missing the `Metering` field, which prevented real-time cost display
  - Users can now see estimated costs accumulate in real time without a manual page refresh
- **UI: Estimated Cost Panel Arrow Misalignment**
  - Fixed the expand/contract arrow displaying above the "Estimated Cost" heading

### Templates
- us-west-2: `https://s3.us-west-2.amazonaws.com/aws-ml-blog-us-west-2/artifacts/genai-idp/idp-main_0.4.5.yaml`
- us-east-1: `https://s3.us-east-1.amazonaws.com/aws-ml-blog-us-east-1/artifacts/genai-idp/idp-main_0.4.5.yaml`
- eu-central-1: `https://s3.eu-central-1.amazonaws.com/aws-ml-blog-eu-central-1/artifacts/genai-idp/idp-main_0.4.5.yaml`

## [0.4.4]

### Added

README.md

Lines changed: 4 additions & 2 deletions

@@ -34,6 +34,8 @@ White-glove customization, deployment, and integration support for production us

  **Prefer AWS CDK?** This solution is also available as [GenAI IDP Accelerator for AWS CDK](https://github.com/cdklabs/genai-idp), providing the same functional capabilities through AWS CDK constructs for customers who prefer Infrastructure-as-Code with CDK.

+ **Prefer Terraform?** This solution is also available as [GenAI IDP Terraform](https://github.com/awslabs/genai-idp-terraform), providing the same functional capabilities as a Terraform module that integrates with existing infrastructure and supports customization through module variables.
+
  ## Key Features

  - **Serverless Architecture**: Built entirely on AWS serverless technologies including Lambda, Step Functions, SQS, and DynamoDB

@@ -121,7 +123,7 @@ idp-cli download-results \
    --output-dir ./results/
  ```

- **See [IDP CLI Documentation](./idp_cli/README.md)** for:
+ **See [IDP CLI Documentation](./docs/idp-cli.md)** for:
  - CLI-based stack deployment and updates
  - Batch document processing
  - Complete evaluation workflows with baselines

@@ -162,7 +164,7 @@ For detailed deployment and testing instructions, see the [Deployment Guide](./d

  - [Architecture](./docs/architecture.md) - Detailed component architecture and data flow
  - [Deployment](./docs/deployment.md) - Build, publish, deploy, and test instructions
- - [IDP CLI](./idp_cli/README.md) - Command line interface for batch processing and evaluation workflows
+ - [IDP CLI](./docs/idp-cli.md) - Command line interface for batch processing and evaluation workflows
  - [Web UI](./docs/web-ui.md) - Web interface features and usage
  - [Agent Analysis](./docs/agent-analysis.md) - Natural language analytics and data visualization feature
  - [Custom MCP Agent](./docs/custom-MCP-agent.md) - Integrating external MCP servers for custom tools and capabilities

VERSION

Lines changed: 1 addition & 1 deletion

@@ -1 +1 @@
- 0.4.5-wip1
+ 0.4.5-wip4

docs/deployment.md

Lines changed: 2 additions & 2 deletions

@@ -98,7 +98,7 @@ idp-cli deploy \
  - Integration with CI/CD pipelines
  - No manual console clicking required

- **For complete CLI documentation**, see [IDP CLI Documentation](../idp_cli/README.md).
+ **For complete CLI documentation**, see [IDP CLI Documentation](./idp-cli.md).

  ---

@@ -416,7 +416,7 @@ idp-cli download-results \
  cat ./eval-results/eval-test/invoice.pdf/evaluation/report.md
  ```

- **For complete evaluation workflow documentation**, see [IDP CLI - Complete Evaluation Workflow](../idp_cli/README.md#complete-evaluation-workflow).
+ **For complete evaluation workflow documentation**, see [IDP CLI - Complete Evaluation Workflow](./idp-cli.md#complete-evaluation-workflow).

  ---

docs/evaluation.md

Lines changed: 193 additions & 0 deletions

@@ -649,6 +649,132 @@ This multi-level analysis helps identify specific areas for improvement, such as

- Performance degradation with larger transaction lists
- Specific list item attributes that frequently fail evaluation

## Document Split Classification Metrics

In addition to extraction accuracy evaluation, the framework now includes document split classification metrics to assess how accurately documents are classified and split into sections. This provides a comprehensive evaluation of both **what** was extracted and **how** documents were classified and organized.

### Overview

Document split classification metrics evaluate three key aspects:

1. **Page-Level Classification**: Accuracy of classifying individual pages
2. **Document Split Grouping**: Accuracy of grouping pages into sections
3. **Page Order Preservation**: Accuracy of maintaining correct page order within sections

These metrics are calculated by comparing the `document_class` and `split_document.page_indices` fields in each section's result JSON.

### Three Types of Accuracy

#### 1. Page Level Accuracy

**Purpose**: Measures how accurately individual pages are classified, regardless of how they're grouped into sections.

**Calculation**: For each page index across all sections, compare the expected `document_class` with the predicted `document_class`.

**Use Case**: Identify whether pages are being assigned to the correct document types.

**Example**:
```
Expected:  Page 0 → Invoice, Page 1 → Invoice, Page 2 → Receipt
Predicted: Page 0 → Invoice, Page 1 → Receipt, Page 2 → Receipt
Result: 2/3 pages correct = 66.7% accuracy
```
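The page-level calculation described above can be sketched in a few lines of Python. This is an illustrative helper, not the actual `DocSplitClassificationMetrics` implementation; the dict-based page-to-class mapping is an assumption made for the example.

```python
def page_level_accuracy(expected: dict, predicted: dict) -> float:
    """Fraction of pages whose predicted class matches the expected class.

    `expected` and `predicted` map page index -> document class.
    Pages missing from `predicted` count as misclassified.
    """
    if not expected:
        return 0.0
    correct = sum(1 for page, cls in expected.items() if predicted.get(page) == cls)
    return correct / len(expected)


expected = {0: "Invoice", 1: "Invoice", 2: "Receipt"}
predicted = {0: "Invoice", 1: "Receipt", 2: "Receipt"}
print(round(page_level_accuracy(expected, predicted), 3))  # 0.667
```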
#### 2. Split Accuracy (Without Page Order)

**Purpose**: Measures whether pages are correctly grouped into sections with the right classification, regardless of page order.

**Calculation**: For each expected section, check whether any predicted section has:
- The same set of page indices (as a set, order doesn't matter)
- The same `document_class`

Both conditions must be met for a section to be marked as correct.

**Use Case**: Verify that pages belonging together are kept together, even if their order might vary.

**Example**:
```
Expected Section A: Class=Invoice, Pages={0, 1, 2}
Predicted Section X: Class=Invoice, Pages={2, 0, 1} ✅ Match (same set)

Expected Section B: Class=Receipt, Pages={3, 4}
Predicted Section Y: Class=Receipt, Pages={3, 4} ✅ Match

Expected Section C: Class=Payslip, Pages={5}
Predicted Section Z: Class=Invoice, Pages={5} ❌ No match (wrong class)

Result: 2/3 sections correct = 66.7% accuracy
```

#### 3. Split Accuracy (With Page Order)

**Purpose**: The strictest evaluation; measures correct grouping with exact page-order preservation.

**Calculation**: Same as "Without Order", but the page-indices list must match exactly (same pages, same order).

**Use Case**: Verify that multi-page documents maintain the correct page sequence.

**Example**:
```
Expected Section A: Class=Invoice, Pages=[0, 1, 2]
Predicted Section X: Class=Invoice, Pages=[0, 1, 2] ✅ Match (exact order)

Expected Section B: Class=Receipt, Pages=[3, 4]
Predicted Section Y: Class=Receipt, Pages=[4, 3] ❌ No match (wrong order)

Result: 1/2 sections correct = 50% accuracy
```
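Both split-accuracy variants can be sketched with one function, since they differ only in whether page indices are compared as a list or as a set. This is an illustrative sketch under the assumption that each predicted section may satisfy at most one expected section; it is not the actual `DocSplitClassificationMetrics` code.

```python
def split_accuracy(expected_sections, predicted_sections, ordered: bool) -> float:
    """Fraction of expected sections matched by some predicted section.

    Each section is a (document_class, page_indices) pair. A match requires
    the same class plus the same pages: as an exact ordered tuple when
    `ordered` is True, as an unordered set when False.
    """
    def key(section):
        cls, pages = section
        return (cls, tuple(pages) if ordered else frozenset(pages))

    if not expected_sections:
        return 0.0
    remaining = [key(s) for s in predicted_sections]
    correct = 0
    for section in expected_sections:
        k = key(section)
        if k in remaining:
            remaining.remove(k)  # each predicted section matches at most once
            correct += 1
    return correct / len(expected_sections)


expected = [("Invoice", [0, 1, 2]), ("Receipt", [3, 4]), ("Payslip", [5])]
predicted = [("Invoice", [2, 0, 1]), ("Receipt", [4, 3]), ("Invoice", [5])]
print(round(split_accuracy(expected, predicted, ordered=False), 3))  # 0.667
print(split_accuracy(expected, predicted, ordered=True))             # 0.0
```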
### Report Structure

Document split metrics are integrated into the unified evaluation report:

```markdown
# Evaluation Report

## Summary
**Document Split Classification:**
- Page Level Accuracy: 🟢 85/100 pages [████████████████░░░░] 85%
- Split Accuracy (Without Order): 🟡 15/20 sections [███████████████░░░░░] 75%
- Split Accuracy (With Order): 🟠 12/20 sections [████████████░░░░░░░░] 60%

**Document Extraction:**
- Match Rate: 🟢 145/150 attributes matched [███████████████████░] 97%
- Precision: 0.97 | Recall: 0.95 | F1 Score: 🟢 0.96

## Overall Metrics

### Document Split Classification Metrics
| Metric | Value | Rating |
|--------|-------|--------|
| page_level_accuracy | 0.8500 | 🟡 Good |
| split_accuracy_without_order | 0.7500 | 🟡 Good |
| split_accuracy_with_order | 0.6000 | 🟠 Fair |

### Document Extraction Metrics
| Metric | Value | Rating |
|--------|-------|--------|
| precision | 0.9700 | 🟢 Excellent |
| recall | 0.9500 | 🟢 Excellent |
| f1_score | 0.9600 | 🟢 Excellent |
```
### Data Structure Requirements

For doc split metrics to be calculated, each section's result JSON must include:

```json
{
  "document_class": {
    "type": "Invoice"
  },
  "split_document": {
    "page_indices": [0, 1]
  },
  "inference_result": {
    // Extracted attributes
  }
}
```

- Page indices are **0-based** and **may be non-sequential**
- Missing or null fields are handled gracefully (treated as "Unknown" class or empty page list)
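The graceful handling of missing or null fields described above can be sketched as follows. `read_section` is a hypothetical helper written for illustration; the field names match the documented result JSON, but the function itself is not part of the library.

```python
import json


def read_section(result_json: str):
    """Extract (document_class, page_indices) from a section result JSON.

    Missing or null fields degrade gracefully, as the docs describe:
    no class -> "Unknown", no split info -> empty page list.
    """
    data = json.loads(result_json)
    doc_class = (data.get("document_class") or {}).get("type") or "Unknown"
    pages = (data.get("split_document") or {}).get("page_indices") or []
    return doc_class, list(pages)


print(read_section(
    '{"document_class": {"type": "Invoice"}, "split_document": {"page_indices": [0, 1]}}'
))  # ('Invoice', [0, 1])
print(read_section('{"split_document": null}'))  # ('Unknown', [])
```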
## Setup and Usage

### Step 1: Creating Baseline Data

@@ -840,12 +966,23 @@ The evaluation framework includes comprehensive monitoring through CloudWatch me

The framework calculates the following detailed metrics for each document and section:

**Extraction Accuracy Metrics:**
- **Precision**: Accuracy of positive predictions (TP / (TP + FP))
- **Recall**: Coverage of actual positive cases (TP / (TP + FN))
- **F1 Score**: Harmonic mean of precision and recall
- **Accuracy**: Overall correctness ((TP + TN) / (TP + TN + FP + FN))
- **False Alarm Rate**: Rate of false positives among negatives (FP / (FP + TN))
- **False Discovery Rate**: Rate of false positives among positive predictions (FP / (FP + TP))
- **Weighted Overall Score**: Field-importance-weighted aggregate score

**Document Split Classification Metrics:**
- **Page Level Accuracy**: Classification accuracy for individual pages
- **Split Accuracy (Without Order)**: Correct page grouping regardless of order
- **Split Accuracy (With Order)**: Correct page grouping with exact order
- **Total Pages**: Total number of pages evaluated
- **Total Splits**: Total number of document sections/splits evaluated
- **Correctly Classified Pages**: Count of pages with correct classification
- **Correctly Split Sections**: Count of sections with correct page grouping

The evaluation also tracks different evaluation statuses:
- **RUNNING**: Evaluation is in progress

@@ -866,11 +1003,24 @@ The evaluation framework automatically saves detailed metrics to an AWS Glue dat

#### 1. document_evaluations
Stores document-level metrics including:

**Extraction Metrics:**
- Document ID, input key, evaluation date
- Overall accuracy, precision, recall, F1 score
- False alarm rate, false discovery rate
- Weighted overall score
- Execution time performance metrics

**Document Split Classification Metrics:**
- Page level accuracy (double)
- Split accuracy without order (double)
- Split accuracy with order (double)
- Total pages (int)
- Total splits (int)
- Correctly classified pages (int)
- Correctly split without order (int)
- Correctly split with order (int)

#### 2. section_evaluations
Stores section-level metrics including:
- Document ID, section ID, section type

@@ -932,6 +1082,49 @@ SELECT section_type,

FROM "your-database-name".section_evaluations
GROUP BY section_type
ORDER BY avg_accuracy DESC;

-- Example: Query doc split classification performance
SELECT document_id,
       page_level_accuracy,
       split_accuracy_without_order,
       split_accuracy_with_order,
       total_pages,
       total_splits,
       evaluation_date
FROM "your-database-name".document_evaluations
WHERE page_level_accuracy < 0.9
ORDER BY page_level_accuracy ASC;

-- Example: Compare doc split vs extraction accuracy
SELECT
    AVG(page_level_accuracy) as avg_page_classification_accuracy,
    AVG(split_accuracy_without_order) as avg_split_grouping_accuracy,
    AVG(precision) as avg_extraction_precision,
    AVG(recall) as avg_extraction_recall,
    AVG(f1_score) as avg_extraction_f1
FROM "your-database-name".document_evaluations
WHERE evaluation_date >= current_date - interval '7' day;

-- Example: Identify documents with page classification issues
SELECT document_id,
       total_pages,
       correctly_classified_pages,
       page_level_accuracy,
       ROUND((total_pages - correctly_classified_pages), 0) as misclassified_pages
FROM "your-database-name".document_evaluations
WHERE page_level_accuracy < 1.0
ORDER BY misclassified_pages DESC;

-- Example: Analyze split accuracy trends over time
SELECT
    DATE_TRUNC('day', evaluation_date) as eval_day,
    COUNT(*) as documents_evaluated,
    AVG(split_accuracy_without_order) as avg_split_accuracy_unordered,
    AVG(split_accuracy_with_order) as avg_split_accuracy_ordered
FROM "your-database-name".document_evaluations
WHERE evaluation_date >= current_date - interval '30' day
GROUP BY DATE_TRUNC('day', evaluation_date)
ORDER BY eval_day DESC;
```

### Analytics Notebook

lib/idp_common_pkg/idp_common/agents/analytics/schema_provider.py

Lines changed: 8 additions & 0 deletions

@@ -131,6 +131,14 @@ def get_evaluation_tables_description() -> str:
  - `false_discovery_rate` (double): False discovery rate (0-1)
  - `weighted_overall_score` (double): Weighted overall score (0-1)
  - `execution_time` (double): Time taken to evaluate (seconds)
+ - `page_level_accuracy` (double): Page-level classification accuracy (0-1)
+ - `split_accuracy_without_order` (double): Document split accuracy without considering order (0-1)
+ - `split_accuracy_with_order` (double): Document split accuracy with order considered (0-1)
+ - `total_pages` (int): Total number of pages in the document
+ - `total_splits` (int): Total number of document splits/sections
+ - `correctly_classified_pages` (int): Number of pages correctly classified
+ - `correctly_split_without_order` (int): Number of correctly split sections (unordered)
+ - `correctly_split_with_order` (int): Number of correctly split sections (ordered)

  **Partitioned by**: date (YYYY-MM-DD format)

lib/idp_common_pkg/idp_common/appsync/mutations.py

Lines changed: 1 addition & 0 deletions

@@ -46,6 +46,7 @@
        Class
        ImageUri
        TextUri
+       TextConfidenceUri
      }
      Metering
      EvaluationReportUri

lib/idp_common_pkg/idp_common/appsync/service.py

Lines changed: 11 additions & 3 deletions

@@ -306,11 +306,19 @@ def _appsync_to_document(self, appsync_data: Dict[str, Any]) -> Document:
      if pages_data is not None:  # Ensure pages_data is not None before iterating
          for page_data in pages_data:
              page_id = str(page_data.get("Id"))
+
+             # Get URI values and handle empty strings
+             # Note: TextUri in AppSync schema contains the parsed text URI
+             text_uri = page_data.get("TextUri") or None
+             text_conf_uri = page_data.get("TextConfidenceUri") or None
+             image_uri = page_data.get("ImageUri") or None
+
              doc.pages[page_id] = Page(
                  page_id=page_id,
-                 image_uri=page_data.get("ImageUri"),
-                 raw_text_uri=page_data.get("TextUri"),
-                 text_confidence_uri=page_data.get("TextConfidenceUri"),
+                 image_uri=image_uri,
+                 raw_text_uri=text_uri,  # TextUri maps to both for backward compatibility
+                 parsed_text_uri=text_uri,  # Fix: TextUri contains parsed text URI
+                 text_confidence_uri=text_conf_uri,  # Fix: Convert empty strings to None
                  classification=page_data.get("Class"),
              )

lib/idp_common_pkg/idp_common/assessment/granular_service.py

Lines changed: 6 additions & 17 deletions

@@ -1288,23 +1288,12 @@ def _get_text_confidence_data(self, page) -> str:
              )
              raise

-     # Fallback: use raw OCR data if text confidence is not available (for backward compatibility)
-     if page.raw_text_uri:
-         try:
-             from idp_common.ocr.service import OcrService
-
-             ocr_service = OcrService()
-             raw_ocr_data = s3.get_json_content(page.raw_text_uri)
-             text_confidence_data = ocr_service._generate_text_confidence_data(
-                 raw_ocr_data
-             )
-             return json.dumps(text_confidence_data, indent=2)
-         except Exception as e:
-             logger.warning(
-                 f"Failed to generate text confidence data for page {page.page_id}: {str(e)}"
-             )
-             raise
-     return ""
+     # Text confidence URI not available
+     logger.error(
+         f"Text confidence data unavailable for page {page.page_id}. "
+         f"The text_confidence_uri field is missing or empty."
+     )
+     return "Text Confidence Data Unavailable"

  def _convert_bbox_to_geometry(
      self, bbox_coords: List[float], page_num: int
