
Commit 4aaace4

Author: Bob Strahan (committed)
Merge branch 'develop' into feature/customfinetuned-model
2 parents 4b9dd7e + 51a0a6d, commit 4aaace4

72 files changed: 13,441 additions, 172 deletions


CHANGELOG.md

Lines changed: 32 additions & 0 deletions
@@ -5,6 +5,38 @@ SPDX-License-Identifier: MIT-0
 
 ## [Unreleased]
 
+## [0.3.3]
+
+### Added
+
+- **Assessment Feature for Extraction Confidence Evaluation (EXPERIMENTAL)**
+  - Added a new assessment service that uses LLMs to evaluate extraction confidence by analyzing extraction results against the source document
+  - Multi-modal assessment capability combining text analysis with document images for comprehensive confidence scoring
+  - UI integration with an explainability_info display showing per-attribute confidence scores, thresholds, and explanations
+  - Optional deployment controlled by the `IsAssessmentEnabled` parameter (defaults to false)
+  - Added the e2e-example-with-assessment.ipynb notebook for testing the assessment workflow
+
+- **Enhanced Evaluation Framework with Confidence Integration**
+  - Added expected_confidence and actual_confidence fields to evaluation reports for quality analysis
+  - Automatic extraction and display of confidence scores from assessment explainability_info
+  - Enhanced JSON and Markdown evaluation reports with confidence columns
+  - Backward-compatible integration: shows "N/A" when confidence data is unavailable
+
+- **Evaluation Analytics Database and Reporting System**
+  - Added a comprehensive ReportingDatabase (AWS Glue) with structured storage of evaluation metrics
+  - Three-tier analytics tables: document_evaluations, section_evaluations, and attribute_evaluations
+  - Automatic partitioning by date and document for efficient querying with Amazon Athena
+  - Detailed metrics tracking, including accuracy, precision, recall, F1 score, execution time, and evaluation method
+  - Added the evaluation_reporting_analytics.ipynb notebook for performance analysis and visualization
+  - Multi-level analytics with document-, section-, and attribute-level insights
+  - Visual dashboards showing accuracy distributions, performance trends, and problematic patterns
+  - Configurable filters for date ranges, document types, and evaluation thresholds
+  - Integration with the existing evaluation framework: metrics are saved to the database automatically
+  - ReportingDatabase output added to the CloudFormation template for easy reference
+
+### Fixed
+- Fixed a build failure caused by pandas and numpy dependency conflicts in the idp_common_pkg package
+
 ## [0.3.2]
 
 ### Added
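The three analytics tables above are designed for querying with Amazon Athena. As a hedged sketch, not tooling shipped in this commit, the snippet below starts an aggregate query with boto3; the table name comes from the changelog, while the Glue database name, the column names, and the S3 output location are placeholder assumptions to replace with your stack's values (the database name is surfaced as the new ReportingDatabase CloudFormation output).

```python
# Hedged sketch: aggregate attribute-level evaluation metrics with Athena.
# Assumed, not taken from this commit: the database name "idp_reporting",
# the column names, and the S3 results location.
import boto3

athena = boto3.client("athena")

QUERY = """
SELECT document_class,
       AVG(accuracy) AS avg_accuracy,
       AVG(f1_score) AS avg_f1
FROM attribute_evaluations
GROUP BY document_class
ORDER BY avg_f1 ASC
"""

response = athena.start_query_execution(
    QueryString=QUERY,
    QueryExecutionContext={"Database": "idp_reporting"},  # placeholder name
    ResultConfiguration={"OutputLocation": "s3://example-bucket/athena-results/"},
)
print("QueryExecutionId:", response["QueryExecutionId"])
```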

README.md

Lines changed: 4 additions & 2 deletions
@@ -35,6 +35,7 @@ A scalable, serverless solution for automated document processing and informatio
 - **Comprehensive Monitoring**: Rich CloudWatch dashboard with detailed metrics and logs
 - **Web User Interface**: Modern UI for inspecting document workflow status and results
 - **AI-Powered Evaluation**: Framework to assess accuracy against baseline data
+- **Extraction Confidence Assessment**: LLM-powered assessment of extraction confidence with multimodal document analysis
 - **Document Knowledge Base Query**: Ask questions about your processed documents
 
 ## Architecture Overview
@@ -110,11 +111,12 @@ For detailed deployment and testing instructions, see the [Deployment Guide](./d
 - [Architecture](./docs/architecture.md) - Detailed component architecture and data flow
 - [Deployment](./docs/deployment.md) - Build, publish, deploy, and test instructions
 - [Web UI](./docs/web-ui.md) - Web interface features and usage
-- [Knowledge Base](./docs/knowledge-base.md) - Document knowledge base query feature
-- [Evaluation Framework](./docs/evaluation.md) - Accuracy assessment system
 - [Configuration](./docs/configuration.md) - Configuration and customization options
 - [Classification](./docs/classification.md) - Customizing document classification
 - [Extraction](./docs/extraction.md) - Customizing information extraction
+- [Assessment](./docs/assessment.md) - Extraction confidence evaluation using LLMs
+- [Evaluation Framework](./docs/evaluation.md) - Accuracy assessment system with analytics database and reporting
+- [Knowledge Base](./docs/knowledge-base.md) - Document knowledge base query feature
 - [Monitoring](./docs/monitoring.md) - Monitoring and logging capabilities
 - [Troubleshooting](./docs/troubleshooting.md) - Troubleshooting and performance guides

VERSION

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-0.3.3-alpha
+0.3.3-beta

config_library/pattern-2/default/config.yaml

Lines changed: 108 additions & 1 deletion
@@ -13,10 +13,13 @@ classes:
     attributes:
       - name: sender_name
         description: The name of the person or entity who wrote or sent the letter. Look for text following or near terms like 'from', 'sender', 'authored by', 'written by', or at the end of the letter before a signature.
+        confidence_threshold: '0.85'
       - name: sender_address
         description: The physical address of the sender, typically appearing at the top of the letter. May be labeled as 'address', 'location', or 'from address'.
+        confidence_threshold: '0.8'
       - name: recipient_name
         description: The name of the person or entity receiving the letter. Look for this after 'to', 'recipient', 'addressee', or at the beginning of the letter.
+        confidence_threshold: '0.9'
       - name: recipient_address
         description: The physical address where the letter is to be delivered. Often labeled as 'to address' or 'delivery address', typically appearing below the recipient name.
       - name: date
@@ -587,6 +590,110 @@ summarization:
   model: us.anthropic.claude-3-7-sonnet-20250219-v1:0
   system_prompt: >-
     You are a document summarization expert who can analyze and summarize documents from various domains including medical, financial, legal, and general business documents. Your task is to create a summary that captures the key information, main points, and important details from the document. Your output must be in valid JSON format. \nSummarization Style: Balanced\\nCreate a balanced summary that provides a moderate level of detail. Include the main points and key supporting information, while maintaining the document's overall structure. Aim for a comprehensive yet concise summary.\n Your output MUST be in valid JSON format with markdown content. You MUST strictly adhere to the output format specified in the instructions.
+assessment:
+  default_confidence_threshold: '0.9'
+  top_p: '0.1'
+  max_tokens: '4096'
+  top_k: '5'
+  task_prompt: >-
+    <background>
+
+    You are an expert document analysis assessment system. Your task is to evaluate the confidence and accuracy of extraction results for a document of class {DOCUMENT_CLASS}.
+
+    </background>
+
+
+    <task>
+
+    Analyze the extraction results against the source document and provide confidence assessments for each extracted attribute. Consider factors such as:
+
+    1. Text clarity and OCR quality in the source regions
+    2. Alignment between extracted values and document content
+    3. Presence of clear evidence supporting the extraction
+    4. Potential ambiguity or uncertainty in the source material
+    5. Completeness and accuracy of the extracted information
+
+    </task>
+
+
+    <assessment-guidelines>
+
+    For each attribute, provide:
+    1. A confidence score between 0.0 and 1.0 where:
+       - 1.0 = Very high confidence, clear and unambiguous evidence
+       - 0.8-0.9 = High confidence, strong evidence with minor uncertainty
+       - 0.6-0.7 = Medium confidence, reasonable evidence but some ambiguity
+       - 0.4-0.5 = Low confidence, weak or unclear evidence
+       - 0.0-0.3 = Very low confidence, little to no supporting evidence
+
+    2. A clear reason explaining the confidence score, including:
+       - What evidence supports or contradicts the extraction
+       - Any OCR quality issues that affect confidence
+       - Clarity of the source document in relevant areas
+       - Any ambiguity or uncertainty factors
+
+    Guidelines:
+    - Base assessments on actual document content and OCR quality
+    - Consider both text-based evidence and visual/layout clues
+    - Account for OCR confidence scores when provided
+    - Be objective and specific in reasoning
+    - If an extraction appears incorrect, score accordingly with explanation
+
+    </assessment-guidelines>
+
+    <attributes-definitions>
+
+    {ATTRIBUTE_NAMES_AND_DESCRIPTIONS}
+
+    </attributes-definitions>
+
+
+    <<CACHEPOINT>>
+
+
+    <extraction-results>
+
+    {EXTRACTION_RESULTS}
+
+    </extraction-results>
+
+
+    <document-image>
+
+    {DOCUMENT_IMAGE}
+
+    </document-image>
+
+
+    <ocr-text-confidence-results>
+
+    {OCR_TEXT_CONFIDENCE}
+
+    </ocr-text-confidence-results>
+
+
+    <final-instructions>
+
+    Analyze the extraction results against the source document and provide confidence assessments. Return a JSON object with the following structure:
+
+    {
+      "attribute_name_1": {
+        "confidence": 0.85,
+        "confidence_reason": "Clear text evidence found in document header with high OCR confidence (0.98). Value matches exactly."
+      },
+      "attribute_name_2": {
+        "confidence": 0.65,
+        "confidence_reason": "Text is partially unclear due to poor scan quality. OCR confidence low (0.72) in this region."
+      }
+    }
+
+    Include assessments for ALL attributes present in the extraction results.
+
+    </final-instructions>
+  temperature: '0.0'
+  model: us.amazon.nova-pro-v1:0
+  system_prompt: >-
+    You are a document analysis assessment expert. Your task is to evaluate the confidence and accuracy of extraction results by analyzing the source document evidence. Respond only with JSON containing confidence scores and reasoning for each extracted attribute.
 evaluation:
   llm_method:
     top_p: '0.1'
@@ -622,7 +729,7 @@ evaluation:
       "reason": "Your explanation here"
     }
     temperature: '0.0'
-    model: us.anthropic.claude-3-5-sonnet-20241022-v2:0
+    model: us.anthropic.claude-3-haiku-20240307-v1:0
     system_prompt: >-
       You are an evaluator that helps determine if the predicted and expected values match for document attribute extraction. You will consider the context and meaning rather than just exact string matching.
 pricing:
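Taken together, the per-attribute confidence_threshold values above (0.85, 0.8, 0.9), the default_confidence_threshold of '0.9', and the JSON structure the task_prompt requests are enough to sketch how the per-attribute explainability output might be assembled. The helper below is a minimal illustrative sketch, not code from this commit: the field names come from the config, while the function and dictionary names are hypothetical.

```python
# Minimal sketch: join the assessment model's JSON response with the
# configured thresholds. Field names (confidence, confidence_reason,
# confidence_threshold, default_confidence_threshold) come from the config;
# the helper itself and the "below_threshold" flag are assumptions.
import json

DEFAULT_THRESHOLD = 0.9            # default_confidence_threshold above
ATTRIBUTE_THRESHOLDS = {           # per-attribute values from the letter class
    "sender_name": 0.85,
    "sender_address": 0.80,
    "recipient_name": 0.90,
}

def build_explainability_info(assessment_response: str) -> dict:
    """Attach the applicable threshold to each assessed attribute."""
    assessments = json.loads(assessment_response)
    info = {}
    for attr, result in assessments.items():
        threshold = ATTRIBUTE_THRESHOLDS.get(attr, DEFAULT_THRESHOLD)
        info[attr] = {
            "confidence": result["confidence"],
            "confidence_threshold": threshold,
            "confidence_reason": result["confidence_reason"],
            "below_threshold": result["confidence"] < threshold,
        }
    return info

sample = """{
  "sender_name": {"confidence": 0.85,
                  "confidence_reason": "Clear text evidence with high OCR confidence."}
}"""
print(build_explainability_info(sample))
```

Attributes without an explicit threshold, such as recipient_address and date, fall back to the default, which is why the config pairs per-attribute values with a single default_confidence_threshold.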

config_library/pattern-2/few_shot_example_with_multimodal_page_classification/config.yaml

Lines changed: 104 additions & 1 deletion
@@ -877,6 +877,109 @@ pricing:
         price: '3.0E-7'
       - name: cacheWriteInputTokens
         price: '3.75E-6'
+assessment:
+  top_p: '0.1'
+  max_tokens: '4096'
+  top_k: '5'
+  task_prompt: >-
+    <background>
+
+    You are an expert document analysis assessment system. Your task is to evaluate the confidence and accuracy of extraction results for a document of class {DOCUMENT_CLASS}.
+
+    </background>
+
+
+    <task>
+
+    Analyze the extraction results against the source document and provide confidence assessments for each extracted attribute. Consider factors such as:
+
+    1. Text clarity and OCR quality in the source regions
+    2. Alignment between extracted values and document content
+    3. Presence of clear evidence supporting the extraction
+    4. Potential ambiguity or uncertainty in the source material
+    5. Completeness and accuracy of the extracted information
+
+    </task>
+
+
+    <assessment-guidelines>
+
+    For each attribute, provide:
+    1. A confidence score between 0.0 and 1.0 where:
+       - 1.0 = Very high confidence, clear and unambiguous evidence
+       - 0.8-0.9 = High confidence, strong evidence with minor uncertainty
+       - 0.6-0.7 = Medium confidence, reasonable evidence but some ambiguity
+       - 0.4-0.5 = Low confidence, weak or unclear evidence
+       - 0.0-0.3 = Very low confidence, little to no supporting evidence
+
+    2. A clear reason explaining the confidence score, including:
+       - What evidence supports or contradicts the extraction
+       - Any OCR quality issues that affect confidence
+       - Clarity of the source document in relevant areas
+       - Any ambiguity or uncertainty factors
+
+    Guidelines:
+    - Base assessments on actual document content and OCR quality
+    - Consider both text-based evidence and visual/layout clues
+    - Account for OCR confidence scores when provided
+    - Be objective and specific in reasoning
+    - If an extraction appears incorrect, score accordingly with explanation
+
+    </assessment-guidelines>
+
+    <attributes-definitions>
+
+    {ATTRIBUTE_NAMES_AND_DESCRIPTIONS}
+
+    </attributes-definitions>
+
+
+    <<CACHEPOINT>>
+
+
+    <extraction-results>
+
+    {EXTRACTION_RESULTS}
+
+    </extraction-results>
+
+
+    <document-image>
+
+    {DOCUMENT_IMAGE}
+
+    </document-image>
+
+
+    <ocr-text-confidence-results>
+
+    {OCR_TEXT_CONFIDENCE}
+
+    </ocr-text-confidence-results>
+
+
+    <final-instructions>
+
+    Analyze the extraction results against the source document and provide confidence assessments. Return a JSON object with the following structure:
+
+    {
+      "attribute_name_1": {
+        "confidence": 0.85,
+        "confidence_reason": "Clear text evidence found in document header with high OCR confidence (0.98). Value matches exactly."
+      },
+      "attribute_name_2": {
+        "confidence": 0.65,
+        "confidence_reason": "Text is partially unclear due to poor scan quality. OCR confidence low (0.72) in this region."
+      }
+    }
+
+    Include assessments for ALL attributes present in the extraction results.
+
+    </final-instructions>
+  temperature: '0.0'
+  model: us.amazon.nova-pro-v1:0
+  system_prompt: >-
+    You are a document analysis assessment expert. Your task is to evaluate the confidence and accuracy of extraction results by analyzing the source document evidence. Respond only with JSON containing confidence scores and reasoning for each extracted attribute.
 evaluation:
   llm_method:
     top_p: '0.1'
@@ -916,7 +1019,7 @@ evaluation:
       "reason": "Your explanation here"
     }
     temperature: '0.0'
-    model: us.anthropic.claude-3-5-sonnet-20241022-v2:0
+    model: us.anthropic.claude-3-haiku-20240307-v1:0
     system_prompt: >
       You are an evaluator that helps determine if the predicted and expected
      values match for document attribute extraction. You will consider the
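Both assessment task_prompts place a <<CACHEPOINT>> marker between the static instructions (background, guidelines, attribute definitions) and the per-document content (extraction results, images, OCR confidence). A plausible reading, assumed here rather than stated in this commit, is that the marker splits the prompt so the static prefix can be reused through Bedrock prompt caching. The sketch below shows that split using the Converse API's cachePoint content block; it requires a boto3 version with prompt-caching support, and the wiring is illustrative rather than the solution's actual code.

```python
# Hedged sketch: split a task_prompt at <<CACHEPOINT>> so the static prefix
# can be cached across Bedrock invocations. How this solution actually wires
# the marker up is an assumption; cachePoint is Bedrock's prompt-caching block.
import boto3

def build_messages(task_prompt: str) -> list:
    prefix, marker, suffix = task_prompt.partition("<<CACHEPOINT>>")
    content = [{"text": prefix}]
    if marker:
        # Everything before the marker is stable per document class; the
        # extraction results and images after it change on every call.
        content.append({"cachePoint": {"type": "default"}})
        content.append({"text": suffix})
    return [{"role": "user", "content": content}]

client = boto3.client("bedrock-runtime")
response = client.converse(
    modelId="us.amazon.nova-pro-v1:0",  # model configured above
    messages=build_messages("static instructions <<CACHEPOINT>> per-document content"),
)
print(response["output"]["message"]["content"][0]["text"])
```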
