
Commit d2d5881

Bob Strahan committed: Merge branch 'develop' v0.3.3
2 parents: 3c6926b + 83fd6c1

File tree: 79 files changed (+13582 −200 lines). Only a subset of the changed files is shown below.


CHANGELOG.md

Lines changed: 58 additions & 0 deletions
@@ -5,6 +5,64 @@ SPDX-License-Identifier: MIT-0
 
 ## [Unreleased]
 
+## [0.3.3]
+
+### Added
+
+- **Amazon Nova Model Fine-tuning Support**
+  - Added comprehensive `ModelFinetuningService` class for managing Nova model fine-tuning workflows
+  - Support for fine-tuning Amazon Nova models (Nova Lite, Nova Pro) using Amazon Bedrock
+  - Complete end-to-end workflow including dataset preparation, job creation, provisioned throughput management, and inference
+  - CLI tools for the fine-tuning workflow:
+    - `prepare_nova_finetuning_data.py` - Dataset preparation from RVL-CDIP or custom datasets
+    - `create_finetuning_job.py` - Fine-tuning job creation with automatic IAM role setup
+    - `create_provisioned_throughput.py` - Provisioned throughput management for fine-tuned models
+    - `inference_example.py` - Model inference and evaluation with comparison capabilities
+  - CloudFormation integration with new parameters:
+    - `CustomClassificationModelARN` - Support for custom fine-tuned classification models in Pattern-2
+    - `CustomExtractionModelARN` - Support for custom fine-tuned extraction models in Pattern-2
+  - Automatic integration of fine-tuned models in classification and extraction model selection dropdowns
+  - Comprehensive documentation in `docs/nova-finetuning.md` with step-by-step instructions
+  - Example notebooks:
+    - `finetuning_dataset_prep.ipynb` - Interactive dataset preparation
+    - `finetuning_model_service_demo.ipynb` - Service usage demonstration
+    - `finetuning_model_document_classification_evaluation.ipynb` - Model evaluation
+  - Built-in support for the Bedrock fine-tuning format with multi-modal capabilities
+  - Data splitting and validation set creation
+  - Cost optimization features, including provisioned throughput deletion
+  - Performance metrics and accuracy evaluation tools
+
+- **Assessment Feature for Extraction Confidence Evaluation (EXPERIMENTAL)**
+  - Added a new assessment service that evaluates extraction confidence by using LLMs to analyze extraction results against source documents
+  - Multi-modal assessment capability combining text analysis with document images for comprehensive confidence scoring
+  - UI integration with explainability_info display showing per-attribute confidence scores, thresholds, and explanations
+  - Optional deployment controlled by the `IsAssessmentEnabled` parameter (defaults to false)
+  - Added `e2e-example-with-assessment.ipynb` notebook for testing the assessment workflow
+
+- **Enhanced Evaluation Framework with Confidence Integration**
+  - Added confidence fields to evaluation reports for quality analysis
+  - Automatic extraction and display of confidence scores from assessment explainability_info
+  - Enhanced JSON and Markdown evaluation reports with confidence columns
+  - Backward-compatible integration - shows "N/A" when confidence data is unavailable
+
+- **Evaluation Analytics Database and Reporting System**
+  - Added comprehensive ReportingDatabase (AWS Glue) with structured evaluation metrics storage
+  - Three-tier analytics tables: document_evaluations, section_evaluations, and attribute_evaluations
+  - Automatic partitioning by date and document for efficient querying with Amazon Athena
+  - Detailed metrics tracking including accuracy, precision, recall, F1 score, execution time, and evaluation methods
+  - Added `evaluation_reporting_analytics.ipynb` notebook for comprehensive performance analysis and visualization
+  - Multi-level analytics with document-, section-, and attribute-level insights
+  - Visual dashboards showing accuracy distributions, performance trends, and problematic patterns
+  - Configurable filters for date ranges, document types, and evaluation thresholds
+  - Integration with the existing evaluation framework - metrics are automatically saved to the database
+  - ReportingDatabase output added to the CloudFormation template for easy reference
+
+### Fixed
+
+- Fixed build failure related to pandas, numpy, and PyMuPDF dependency conflicts in the idp_common_pkg package
+- Fixed deployment failure caused by a CodeBuild project timeout by raising the TimeoutInMinutes property
+- Added missing cached-token metrics to CloudWatch dashboards
+- Added the Bedrock model access prerequisite to the README and deployment docs
+
 ## [0.3.2]
 
 ### Added
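For orientation, the fine-tuning entries above map onto a small number of Amazon Bedrock control-plane calls. The sketch below shows the core job-creation and provisioned-throughput steps with plain boto3; the role ARN, bucket URIs, resource names, and hyperparameters are placeholder assumptions, and the repo's `create_finetuning_job.py` and `create_provisioned_throughput.py` scripts wrap equivalent logic with extra setup such as automatic IAM role creation.

```python
# Minimal sketch of the Bedrock fine-tuning flow (all names/ARNs are placeholders).
import time
import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

# 1. Create the fine-tuning (model customization) job. The training file is the
#    JSONL produced by the dataset-preparation step.
job = bedrock.create_model_customization_job(
    jobName="idp-nova-classification-ft",                    # placeholder
    customModelName="idp-nova-lite-classifier",              # placeholder
    roleArn="arn:aws:iam::123456789012:role/BedrockFtRole",  # placeholder
    baseModelIdentifier="amazon.nova-lite-v1:0:300k",        # verify available base models
    customizationType="FINE_TUNING",
    trainingDataConfig={"s3Uri": "s3://my-bucket/train.jsonl"},
    outputDataConfig={"s3Uri": "s3://my-bucket/output/"},
    hyperParameters={"epochCount": "2", "learningRate": "0.00001"},  # illustrative values
)

# 2. Poll until the job completes, then fetch the custom model ARN.
while True:
    desc = bedrock.get_model_customization_job(jobIdentifier=job["jobArn"])
    if desc["status"] in ("Completed", "Failed", "Stopped"):
        break
    time.sleep(60)

# 3. Custom models need provisioned throughput before they can be invoked.
#    The returned ARN is what Pattern-2 accepts as CustomClassificationModelARN.
pt = bedrock.create_provisioned_model_throughput(
    provisionedModelName="idp-nova-lite-classifier-pt",      # placeholder
    modelId=desc["outputModelArn"],
    modelUnits=1,
)
print(pt["provisionedModelArn"])
```

Deleting the provisioned throughput when idle (`delete_provisioned_model_throughput`) is what the cost-optimization bullet above refers to: provisioned capacity bills by the hour whether or not it is invoked.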

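The analytics tables are likewise meant to be queried with Athena. Here is a hedged sketch assuming a database named `idp_reporting`, a boolean `matched` column on `attribute_evaluations`, and an S3 results location; the real database name comes from the stack's ReportingDatabase output, and the column names from the Glue table schemas.

```python
# Sketch: find the worst-performing attributes via Athena (names are assumed).
import time
import boto3

athena = boto3.client("athena", region_name="us-east-1")

query = """
SELECT attribute_name,
       AVG(CASE WHEN matched THEN 1.0 ELSE 0.0 END) AS match_rate,
       COUNT(*) AS n
FROM attribute_evaluations
GROUP BY attribute_name
ORDER BY match_rate ASC
LIMIT 20
"""

qid = athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={"Database": "idp_reporting"},            # from stack output
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},  # placeholder
)["QueryExecutionId"]

# Poll until the query finishes, then print the result rows.
while True:
    state = athena.get_query_execution(QueryExecutionId=qid)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]
    for row in rows[1:]:  # first row is the header
        print([col.get("VarCharValue") for col in row["Data"]])
```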
Makefile

Lines changed: 16 additions & 1 deletion
@@ -6,6 +6,13 @@ GREEN := \033[0;32m
 YELLOW := \033[1;33m
 NC := \033[0m # No Color
 
+# Default target - run both lint and test
+all: lint test
+
+# Run tests in idp_common_pkg directory
+test:
+	$(MAKE) -C lib/idp_common_pkg test
+
 # Run both linting and formatting in one command
 lint: ruff-lint format
 
@@ -31,4 +38,12 @@ lint-cicd:
 		echo "$(YELLOW)Please run 'make format' locally to fix these issues.$(NC)"; \
 		exit 1; \
 	fi
-	@echo "$(GREEN)All code quality checks passed!$(NC)"
+	@echo "$(GREEN)All code quality checks passed!$(NC)"
+
+# A convenience Makefile target that runs lint and test, infers a commit message with Amazon Q, then commits and pushes
+commit: lint test
+	$(info Generating commit message...)
+	export COMMIT_MESSAGE="$(shell q chat --no-interactive --trust-all-tools "Understand pending local git change and changes to be committed, then infer a commit message. Return this commit message only" | tail -n 1 | sed 's/\x1b\[[0-9;]*m//g')" && \
+	git add . && \
+	git commit -am "$${COMMIT_MESSAGE}" && \
+	git push
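Note that the `commit` target shells out to `q chat`, so it presumably requires the Amazon Q Developer CLI (`q`) to be installed and authenticated locally; the trailing `sed` strips ANSI color codes from the generated message before it is handed to `git commit`. Because `all` is now the first target, running plain `make` executes `lint` followed by `test`.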

README.md

Lines changed: 10 additions & 4 deletions
@@ -35,6 +35,7 @@ A scalable, serverless solution for automated document processing and information extraction
 - **Comprehensive Monitoring**: Rich CloudWatch dashboard with detailed metrics and logs
 - **Web User Interface**: Modern UI for inspecting document workflow status and results
 - **AI-Powered Evaluation**: Framework to assess accuracy against baseline data
+- **Extraction Confidence Assessment**: LLM-powered assessment of extraction confidence with multimodal document analysis
 - **Document Knowledge Base Query**: Ask questions about your processed documents
 
 ## Architecture Overview
@@ -71,8 +72,8 @@ After deployment, you can quickly process a document and view results:
    - **Via S3**: Upload directly to the S3 input bucket (find the bucket URL in CloudFormation stack Outputs)
 
 2. **Use Sample Documents**:
-   - For Pattern 1 (BDA): Use `samples/lending_package.pdf`
-   - For Patterns 2 and 3: Use `samples/rvl_cdip_package.pdf`
+   - For Pattern 1 (BDA): Use [samples/lending_package.pdf](./samples/lending_package.pdf)
+   - For Patterns 2 and 3: Use [samples/rvl_cdip_package.pdf](./samples/rvl_cdip_package.pdf)
 
 3. **Monitor Processing**:
    - **Via Web UI**: Track document status on the dashboard
@@ -84,6 +85,10 @@ After deployment, you can quickly process a document and view results:
 
 See the [Deployment Guide](./docs/deployment.md#testing-the-solution) for more detailed testing instructions.
 
+IMPORTANT: If you have not previously done so, you must [request access](https://docs.aws.amazon.com/bedrock/latest/userguide/model-access.html) to the following Amazon Bedrock models:
+- Amazon: All Nova models, plus Titan Text Embeddings V2
+- Anthropic: Claude 3.x models, Claude 4.x models
+
 ## Updating an Existing Deployment
 
 To update an existing GenAIIDP stack to a new version:
@@ -110,11 +115,12 @@ For detailed deployment and testing instructions, see the [Deployment Guide](./docs/deployment.md)
 - [Architecture](./docs/architecture.md) - Detailed component architecture and data flow
 - [Deployment](./docs/deployment.md) - Build, publish, deploy, and test instructions
 - [Web UI](./docs/web-ui.md) - Web interface features and usage
-- [Knowledge Base](./docs/knowledge-base.md) - Document knowledge base query feature
-- [Evaluation Framework](./docs/evaluation.md) - Accuracy assessment system
 - [Configuration](./docs/configuration.md) - Configuration and customization options
 - [Classification](./docs/classification.md) - Customizing document classification
 - [Extraction](./docs/extraction.md) - Customizing information extraction
+- [Assessment](./docs/assessment.md) - Extraction confidence evaluation using LLMs
+- [Evaluation Framework](./docs/evaluation.md) - Accuracy assessment system with analytics database and reporting
+- [Knowledge Base](./docs/knowledge-base.md) - Document knowledge base query feature
 - [Monitoring](./docs/monitoring.md) - Monitoring and logging capabilities
 - [Troubleshooting](./docs/troubleshooting.md) - Troubleshooting and performance guides
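The quick start's "Via S3" option needs nothing more than an object upload to kick off processing. A minimal sketch, assuming a placeholder bucket name (read the real one from the CloudFormation stack Outputs):

```python
# Sketch: drop a sample document into the input bucket to trigger processing.
import boto3

s3 = boto3.client("s3")
s3.upload_file(
    Filename="samples/rvl_cdip_package.pdf",  # sample for Patterns 2 and 3
    Bucket="my-genaiidp-input-bucket",        # placeholder - see stack Outputs
    Key="rvl_cdip_package.pdf",
)
```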

VERSION

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-0.3.2
+0.3.3

config_library/pattern-2/default/config.yaml

Lines changed: 108 additions & 1 deletion
@@ -13,10 +13,13 @@ classes:
     attributes:
       - name: sender_name
         description: The name of the person or entity who wrote or sent the letter. Look for text following or near terms like 'from', 'sender', 'authored by', 'written by', or at the end of the letter before a signature.
+        confidence_threshold: '0.85'
       - name: sender_address
         description: The physical address of the sender, typically appearing at the top of the letter. May be labeled as 'address', 'location', or 'from address'.
+        confidence_threshold: '0.8'
       - name: recipient_name
         description: The name of the person or entity receiving the letter. Look for this after 'to', 'recipient', 'addressee', or at the beginning of the letter.
+        confidence_threshold: '0.9'
       - name: recipient_address
         description: The physical address where the letter is to be delivered. Often labeled as 'to address' or 'delivery address', typically appearing below the recipient name.
       - name: date
@@ -587,6 +590,110 @@ summarization:
   model: us.anthropic.claude-3-7-sonnet-20250219-v1:0
   system_prompt: >-
     You are a document summarization expert who can analyze and summarize documents from various domains including medical, financial, legal, and general business documents. Your task is to create a summary that captures the key information, main points, and important details from the document. Your output must be in valid JSON format. \nSummarization Style: Balanced\\nCreate a balanced summary that provides a moderate level of detail. Include the main points and key supporting information, while maintaining the document's overall structure. Aim for a comprehensive yet concise summary.\n Your output MUST be in valid JSON format with markdown content. You MUST strictly adhere to the output format specified in the instructions.
+assessment:
+  default_confidence_threshold: '0.9'
+  top_p: '0.1'
+  max_tokens: '4096'
+  top_k: '5'
+  task_prompt: >-
+    <background>
+
+    You are an expert document analysis assessment system. Your task is to evaluate the confidence and accuracy of extraction results for a document of class {DOCUMENT_CLASS}.
+
+    </background>
+
+
+    <task>
+
+    Analyze the extraction results against the source document and provide confidence assessments for each extracted attribute. Consider factors such as:
+
+    1. Text clarity and OCR quality in the source regions
+    2. Alignment between extracted values and document content
+    3. Presence of clear evidence supporting the extraction
+    4. Potential ambiguity or uncertainty in the source material
+    5. Completeness and accuracy of the extracted information
+
+    </task>
+
+
+    <assessment-guidelines>
+
+    For each attribute, provide:
+    1. A confidence score between 0.0 and 1.0 where:
+       - 1.0 = Very high confidence, clear and unambiguous evidence
+       - 0.8-0.9 = High confidence, strong evidence with minor uncertainty
+       - 0.6-0.7 = Medium confidence, reasonable evidence but some ambiguity
+       - 0.4-0.5 = Low confidence, weak or unclear evidence
+       - 0.0-0.3 = Very low confidence, little to no supporting evidence
+
+    2. A clear reason explaining the confidence score, including:
+       - What evidence supports or contradicts the extraction
+       - Any OCR quality issues that affect confidence
+       - Clarity of the source document in relevant areas
+       - Any ambiguity or uncertainty factors
+
+    Guidelines:
+    - Base assessments on actual document content and OCR quality
+    - Consider both text-based evidence and visual/layout clues
+    - Account for OCR confidence scores when provided
+    - Be objective and specific in reasoning
+    - If an extraction appears incorrect, score accordingly with explanation
+
+    </assessment-guidelines>
+
+    <attributes-definitions>
+
+    {ATTRIBUTE_NAMES_AND_DESCRIPTIONS}
+
+    </attributes-definitions>
+
+
+    <<CACHEPOINT>>
+
+
+    <extraction-results>
+
+    {EXTRACTION_RESULTS}
+
+    </extraction-results>
+
+
+    <document-image>
+
+    {DOCUMENT_IMAGE}
+
+    </document-image>
+
+
+    <ocr-text-confidence-results>
+
+    {OCR_TEXT_CONFIDENCE}
+
+    </ocr-text-confidence-results>
+
+
+    <final-instructions>
+
+    Analyze the extraction results against the source document and provide confidence assessments. Return a JSON object with the following structure:
+
+    {
+      "attribute_name_1": {
+        "confidence": 0.85,
+        "confidence_reason": "Clear text evidence found in document header with high OCR confidence (0.98). Value matches exactly."
+      },
+      "attribute_name_2": {
+        "confidence": 0.65,
+        "confidence_reason": "Text is partially unclear due to poor scan quality. OCR confidence low (0.72) in this region."
+      }
+    }
+
+    Include assessments for ALL attributes present in the extraction results.
+
+    </final-instructions>
+  temperature: '0.0'
+  model: us.amazon.nova-pro-v1:0
+  system_prompt: >-
+    You are a document analysis assessment expert. Your task is to evaluate the confidence and accuracy of extraction results by analyzing the source document evidence. Respond only with JSON containing confidence scores and reasoning for each extracted attribute.
 evaluation:
   llm_method:
     top_p: '0.1'
@@ -622,7 +729,7 @@ evaluation:
         "reason": "Your explanation here"
       }
     temperature: '0.0'
-    model: us.anthropic.claude-3-5-sonnet-20241022-v2:0
+    model: us.anthropic.claude-3-haiku-20240307-v1:0
     system_prompt: >-
       You are an evaluator that helps determine if the predicted and expected values match for document attribute extraction. You will consider the context and meaning rather than just exact string matching.
 pricing:
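To make the threshold mechanics concrete: each attribute may carry its own `confidence_threshold`, with the assessment-level `default_confidence_threshold` as the fallback, and the assessment prompt returns per-attribute `confidence` scores. A small sketch of how the two might be combined to flag attributes for review; the function and variable names are illustrative, not the repo's actual API:

```python
# Sketch: flag extracted attributes whose assessed confidence falls below
# their configured threshold. Names are illustrative, not the repo's API.
from typing import Any

def flag_low_confidence(
    assessment: dict[str, dict[str, Any]],   # assessment output, keyed by attribute
    attribute_config: list[dict[str, Any]],  # the class's `attributes` list from config.yaml
    default_threshold: float = 0.9,          # assessment.default_confidence_threshold
) -> list[str]:
    # Per-attribute thresholds, falling back to the assessment-level default.
    thresholds = {
        attr["name"]: float(attr.get("confidence_threshold", default_threshold))
        for attr in attribute_config
    }
    flagged = []
    for name, result in assessment.items():
        if result["confidence"] < thresholds.get(name, default_threshold):
            flagged.append(name)
    return flagged

# Example with the letter class above: sender_address (threshold 0.8) passes,
# recipient_name (threshold 0.9) is flagged for review.
assessment = {
    "sender_address": {"confidence": 0.82, "confidence_reason": "Minor OCR noise."},
    "recipient_name": {"confidence": 0.70, "confidence_reason": "Ambiguous addressee."},
}
attrs = [
    {"name": "sender_address", "confidence_threshold": "0.8"},
    {"name": "recipient_name", "confidence_threshold": "0.9"},
]
print(flag_low_confidence(assessment, attrs))  # ['recipient_name']
```

The `<<CACHEPOINT>>` marker in the task prompt splits the prompt into a stable prefix (instructions and attribute definitions) and a per-document suffix, which is consistent with the cacheReadInputTokens/cacheWriteInputTokens pricing entries elsewhere in this config.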

config_library/pattern-2/few_shot_example_with_multimodal_page_classification/config.yaml

Lines changed: 104 additions & 1 deletion
@@ -877,6 +877,109 @@ pricing:
       price: '3.0E-7'
     - name: cacheWriteInputTokens
       price: '3.75E-6'
+assessment:
+  top_p: '0.1'
+  max_tokens: '4096'
+  top_k: '5'
+  task_prompt: >-
+    <background>
+
+    You are an expert document analysis assessment system. Your task is to evaluate the confidence and accuracy of extraction results for a document of class {DOCUMENT_CLASS}.
+
+    </background>
+
+
+    <task>
+
+    Analyze the extraction results against the source document and provide confidence assessments for each extracted attribute. Consider factors such as:
+
+    1. Text clarity and OCR quality in the source regions
+    2. Alignment between extracted values and document content
+    3. Presence of clear evidence supporting the extraction
+    4. Potential ambiguity or uncertainty in the source material
+    5. Completeness and accuracy of the extracted information
+
+    </task>
+
+
+    <assessment-guidelines>
+
+    For each attribute, provide:
+    1. A confidence score between 0.0 and 1.0 where:
+       - 1.0 = Very high confidence, clear and unambiguous evidence
+       - 0.8-0.9 = High confidence, strong evidence with minor uncertainty
+       - 0.6-0.7 = Medium confidence, reasonable evidence but some ambiguity
+       - 0.4-0.5 = Low confidence, weak or unclear evidence
+       - 0.0-0.3 = Very low confidence, little to no supporting evidence
+
+    2. A clear reason explaining the confidence score, including:
+       - What evidence supports or contradicts the extraction
+       - Any OCR quality issues that affect confidence
+       - Clarity of the source document in relevant areas
+       - Any ambiguity or uncertainty factors
+
+    Guidelines:
+    - Base assessments on actual document content and OCR quality
+    - Consider both text-based evidence and visual/layout clues
+    - Account for OCR confidence scores when provided
+    - Be objective and specific in reasoning
+    - If an extraction appears incorrect, score accordingly with explanation
+
+    </assessment-guidelines>
+
+    <attributes-definitions>
+
+    {ATTRIBUTE_NAMES_AND_DESCRIPTIONS}
+
+    </attributes-definitions>
+
+
+    <<CACHEPOINT>>
+
+
+    <extraction-results>
+
+    {EXTRACTION_RESULTS}
+
+    </extraction-results>
+
+
+    <document-image>
+
+    {DOCUMENT_IMAGE}
+
+    </document-image>
+
+
+    <ocr-text-confidence-results>
+
+    {OCR_TEXT_CONFIDENCE}
+
+    </ocr-text-confidence-results>
+
+
+    <final-instructions>
+
+    Analyze the extraction results against the source document and provide confidence assessments. Return a JSON object with the following structure:
+
+    {
+      "attribute_name_1": {
+        "confidence": 0.85,
+        "confidence_reason": "Clear text evidence found in document header with high OCR confidence (0.98). Value matches exactly."
+      },
+      "attribute_name_2": {
+        "confidence": 0.65,
+        "confidence_reason": "Text is partially unclear due to poor scan quality. OCR confidence low (0.72) in this region."
+      }
+    }
+
+    Include assessments for ALL attributes present in the extraction results.
+
+    </final-instructions>
+  temperature: '0.0'
+  model: us.amazon.nova-pro-v1:0
+  system_prompt: >-
+    You are a document analysis assessment expert. Your task is to evaluate the confidence and accuracy of extraction results by analyzing the source document evidence. Respond only with JSON containing confidence scores and reasoning for each extracted attribute.
 evaluation:
   llm_method:
     top_p: '0.1'
@@ -916,7 +1019,7 @@ evaluation:
         "reason": "Your explanation here"
      }
     temperature: '0.0'
-    model: us.anthropic.claude-3-5-sonnet-20241022-v2:0
+    model: us.anthropic.claude-3-haiku-20240307-v1:0
     system_prompt: >
       You are an evaluator that helps determine if the predicted and expected
       values match for document attribute extraction. You will consider the
