Commit 4e56f61

Feature/stickler
1 parent 2fdcd14 commit 4e56f61

61 files changed, +8380 -1679 lines changed

CHANGELOG.md

Lines changed: 29 additions & 1 deletion
@@ -5,17 +5,36 @@ SPDX-License-Identifier: MIT-0
 
 ## [Unreleased]
 
+### Added
+
 ## [0.4.2]
 
 ### Added
 
+- **Stickler-Based Evaluation System for Enhanced Comparison Capabilities**
+  - Migrated evaluation service from custom comparison logic to [AWS Labs Stickler library](https://github.com/awslabs/stickler/tree/main) for structured object evaluation
+  - **Field Importance Weights**: New capability to assign business criticality weights to fields (e.g., shipment ID weight=3.0 vs notes weight=0.5)
+  - **Enhanced Configuration**: Added `x-aws-idp-evaluation-*` extensions for evaluation configuration
+  - **Backward compatible**: Maintained API compatibility - all existing code works unchanged
+  - **Enhanced Comparators**: Leverages Stickler's optimized comparison algorithms (Exact, Levenshtein, Numeric, Fuzzy, Semantic) with LLM evaluation preserved through custom wrapper
+  - **Better List Matching**: Hungarian algorithm via Stickler for optimal list comparisons regardless of order
+
+- **UI: Evaluation Configuration in Document Schema UI**
+  - Added evaluation weight, threshold (with conditional display), and document-level match threshold fields for complete Stickler configuration control
+  - Added LEVENSHTEIN and HUNGARIAN evaluation methods with auto-populated threshold defaults based on selected method
+
 - **IDP CLI Force Delete All Resources Option**
   - Added `--force-delete-all` flag to `idp-cli delete` command for comprehensive stack cleanup
   - **Post-CloudFormation Cleanup**: Analyzes resources after CloudFormation deletion completes to identify retained resources (DELETE_SKIPPED status)
   - **Use Cases**: Complete test environment cleanup, CI/CD pipelines requiring full teardown, cost optimization by removing all retained resources
 
 ### Changed
 
+- **Containerized Pattern-1 and Pattern-3 Deployment Pipelines**
+  - Migrated Pattern-1 and Pattern-3 Lambda functions to Docker image deployments (following Pattern-2 approach from v0.3.20)
+  - Builds and pushes all Lambda images via CodeBuild with automated ECR cleanup
+  - Increases Lambda package size limit from 250 MB (zip) to 10 GB (Docker image) to accommodate larger dependencies
+
 - **Agent Companion Chat - Chat History Feature**
   - Added chat history feature from Agent Analysis back into Agent Companion Chat
   - Users can now load and view previous chat sessions with full conversation context
@@ -28,9 +47,18 @@ SPDX-License-Identifier: MIT-0
   - Prompt input is disabled during active streaming responses to prevent concurrent requests
   - Fixed issue where charts in loaded chat history were not displaying
 
-- **GovCloud Template Generation - Missing Chat Resources**
+- **GovCloud Template Generation errors**
   - Fixed CloudFormation deployment error `Fn::GetAtt references undefined resource GraphQLApi` when deploying GovCloud templates
 
+- **Example Notebook error fixed**
+  - Example notebooks updated to work with new v0.4.0+ JSON schema
+
+
+### Templates
+- us-west-2: `https://s3.us-west-2.amazonaws.com/aws-ml-blog-us-west-2/artifacts/genai-idp/idp-main_0.4.2.yaml`
+- us-east-1: `https://s3.us-east-1.amazonaws.com/aws-ml-blog-us-east-1/artifacts/genai-idp/idp-main_0.4.2.yaml`
+- eu-central-1: `https://s3.eu-central-1.amazonaws.com/aws-ml-blog-eu-central-1/artifacts/genai-idp/idp-main_0.4.2.yaml`
+
 ## [0.4.1]
 
 ### Changed

Makefile

Lines changed: 8 additions & 0 deletions
@@ -16,6 +16,7 @@ test:
 
 # Run both linting and formatting in one command
 lint: ruff-lint format check-arn-partitions validate-buildspec ui-lint
+fastlint: ruff-lint format check-arn-partitions validate-buildspec
 
 # Run linting checks and fix issues automatically
 ruff-lint:
@@ -123,3 +124,10 @@ commit: lint test
 	git add . && \
 	git commit -am "$${COMMIT_MESSAGE}" && \
 	git push
+
+fastcommit: fastlint
+	$(info Generating commit message...)
+	export COMMIT_MESSAGE="$(shell q chat --no-interactive --trust-all-tools "Understand pending local git change and changes to be committed, then infer a commit message. Return this commit message only" | tail -n 1 | sed 's/\x1b\[[0-9;]*m//g')" && \
+	git add . && \
+	git commit -am "$${COMMIT_MESSAGE}" && \
+	git push

VERSION

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-0.4.2-wip2
+0.4.2-rc1

docs/README.md

Lines changed: 1 addition & 1 deletion
@@ -13,7 +13,7 @@ This folder contains detailed documentation on various aspects of the GenAI Inte
 - [Agent Analysis](./agent-analysis.md) - Natural language analytics and data visualization feature
 - [Knowledge Base](./knowledge-base.md) - Document knowledge base query feature
 - [Post-Processing Lambda Hook](./post-processing-lambda-hook.md) - Custom downstream processing integration
-- [Evaluation Framework](./evaluation.md) - Accuracy assessment system
+- [Evaluation Framework](./evaluation.md) - Accuracy assessment system powered by Stickler
 - [Assessment Feature](./assessment.md) - Extraction confidence evaluation using LLMs
 - [Configuration](./configuration.md) - Configuration and customization options
 - [Classification](./classification.md) - Customizing document classification

docs/configuration.md

Lines changed: 57 additions & 0 deletions
@@ -312,6 +312,63 @@ Status lookup returns comprehensive information:
 }
 ```
 
+## Evaluation Extensions in JSON Schema
+
+Document class schemas support evaluation-specific extensions for fine-grained control over accuracy assessment. These extensions work with the [Stickler](https://github.com/awslabs/stickler)-based evaluation framework to provide flexible, business-aligned evaluation capabilities.
+
+### Available Extensions
+
+- `x-aws-idp-evaluation-method`: Comparison method (EXACT, FUZZY, NUMERIC_EXACT, SEMANTIC, LLM, HUNGARIAN)
+- `x-aws-idp-evaluation-threshold`: Minimum score to consider a match (0.0-1.0)
+- `x-aws-idp-evaluation-weight`: Field importance for weighted scoring (default: 1.0, higher values = more important)
+
+### Example Configuration
+
+```yaml
+classes:
+  - $schema: "https://json-schema.org/draft/2020-12/schema"
+    x-aws-idp-document-type: "Invoice"
+    x-aws-idp-evaluation-match-threshold: 0.8  # Document-level threshold
+    properties:
+      invoice_number:
+        type: string
+        x-aws-idp-evaluation-method: EXACT
+        x-aws-idp-evaluation-weight: 2.0  # Critical field - double weight
+      invoice_date:
+        type: string
+        x-aws-idp-evaluation-method: FUZZY
+        x-aws-idp-evaluation-threshold: 0.9
+        x-aws-idp-evaluation-weight: 1.5  # Important field
+      vendor_name:
+        type: string
+        x-aws-idp-evaluation-method: FUZZY
+        x-aws-idp-evaluation-threshold: 0.85
+        x-aws-idp-evaluation-weight: 1.0  # Normal weight (default)
+      vendor_notes:
+        type: string
+        x-aws-idp-evaluation-method: SEMANTIC
+        x-aws-idp-evaluation-threshold: 0.7
+        x-aws-idp-evaluation-weight: 0.5  # Less critical - half weight
+```
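As an illustration of the configuration above (not part of this commit), the sketch below gathers the `x-aws-idp-evaluation-*` extensions from a class schema into a plain per-field dictionary. It is not the project's `SticklerConfigMapper`: the function name `collect_eval_settings`, the output shape, and the `due_date` field are hypothetical and exist only to show how the default weight of 1.0 could apply when a field omits it.

```python
from typing import Any

PREFIX = "x-aws-idp-evaluation-"


def collect_eval_settings(class_schema: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Gather per-field evaluation extensions (hypothetical helper).

    Keys are returned without the common prefix; a missing weight
    falls back to 1.0, the default documented for the weight extension.
    """
    settings: dict[str, dict[str, Any]] = {}
    for name, spec in class_schema.get("properties", {}).items():
        field = {
            key[len(PREFIX):]: value
            for key, value in spec.items()
            if key.startswith(PREFIX)
        }
        field.setdefault("weight", 1.0)
        settings[name] = field
    return settings


# A Python dict mirroring part of the YAML example; `due_date` is a
# hypothetical field added to show the default weight.
invoice_class = {
    "x-aws-idp-document-type": "Invoice",
    "x-aws-idp-evaluation-match-threshold": 0.8,
    "properties": {
        "invoice_number": {
            "type": "string",
            "x-aws-idp-evaluation-method": "EXACT",
            "x-aws-idp-evaluation-weight": 2.0,
        },
        "vendor_notes": {
            "type": "string",
            "x-aws-idp-evaluation-method": "SEMANTIC",
            "x-aws-idp-evaluation-threshold": 0.7,
            "x-aws-idp-evaluation-weight": 0.5,
        },
        "due_date": {"type": "string"},
    },
}

settings = collect_eval_settings(invoice_class)
print(settings)
```

Non-evaluation keys such as `type` are ignored, so the helper can be pointed at a full class schema without filtering it first.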
+
+### Stickler Backend Integration
+
+The evaluation framework uses [Stickler](https://github.com/awslabs/stickler) as its evaluation engine. The `SticklerConfigMapper` automatically translates these IDP extensions to Stickler's native format, providing:
+
+- **Field-level weighting** for business-critical attributes
+- **Optimal list matching** using the Hungarian algorithm
+- **Extensible comparator system** with exact, fuzzy, numeric, semantic, and LLM-based comparison
+- **Native JSON Schema support** with $ref resolution
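To make the idea of order-independent list matching concrete (this sketch is not part of the commit), the example below finds the pairing of expected and actual items with the highest total similarity. It does not call Stickler and does not implement the Hungarian algorithm: it brute-forces every assignment with `itertools.permutations`, which yields the same optimum for small lists but scales factorially. The similarity function uses `difflib.SequenceMatcher` as a stand-in for a fuzzy comparator, and `best_assignment` is a hypothetical name.

```python
from difflib import SequenceMatcher
from itertools import permutations


def similarity(a: str, b: str) -> float:
    """Stand-in fuzzy comparator: 1.0 for identical strings, lower otherwise."""
    return SequenceMatcher(None, a, b).ratio()


def best_assignment(
    expected: list[str], actual: list[str]
) -> tuple[list[tuple[int, int]], float]:
    """Return (pairs, total) for the assignment with the highest total similarity.

    Each pair is (index in expected, index in actual). Brute force over all
    assignments; assumes len(expected) <= len(actual).
    """
    if len(expected) > len(actual):
        raise ValueError("expected must not be longer than actual")
    best_pairs: list[tuple[int, int]] = []
    best_total = -1.0
    for perm in permutations(range(len(actual)), len(expected)):
        total = sum(similarity(expected[i], actual[j]) for i, j in enumerate(perm))
        if total > best_total:
            best_pairs, best_total = list(enumerate(perm)), total
    return best_pairs, best_total


expected = ["widget", "gadget", "sprocket"]
actual = ["sprocket", "widget", "gadget"]  # same items, different order
pairs, total = best_assignment(expected, actual)
print(pairs, total)
```

Because the pairing is chosen globally rather than position by position, a reordered list that contains the same items still matches in full.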
+
+### Benefits
+
+1. **Business Alignment**: Weight critical fields higher to ensure evaluation scores reflect business priorities
+2. **Flexible Comparison**: Choose the right evaluation method for each field type
+3. **Tunable Thresholds**: Set field-specific thresholds for matching sensitivity
+4. **Dynamic Schema Generation**: Auto-generates evaluation schema from baseline data when configuration is missing (for development/prototyping)
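The effect of field weights on a document score can be sketched as a weighted average of per-field match results (illustrative only, not part of this commit). This assumes the score is the sum of weights of matched fields divided by the sum of all weights; Stickler's actual aggregation may differ. The weights are taken from the example configuration, and `weighted_score` is a hypothetical helper.

```python
def weighted_score(fields: dict[str, tuple[float, bool]]) -> float:
    """Weighted fraction of matched fields.

    `fields` maps a field name to (weight, matched). Returns 0.0 when
    the total weight is zero to avoid dividing by zero.
    """
    total_weight = sum(weight for weight, _ in fields.values())
    if total_weight == 0:
        return 0.0
    matched_weight = sum(weight for weight, matched in fields.values() if matched)
    return matched_weight / total_weight


# Same number of misses (one of four fields), very different scores.
missed_notes = {
    "invoice_number": (2.0, True),
    "invoice_date": (1.5, True),
    "vendor_name": (1.0, True),
    "vendor_notes": (0.5, False),  # low-weight field missed
}
missed_number = {
    "invoice_number": (2.0, False),  # high-weight field missed
    "invoice_date": (1.5, True),
    "vendor_name": (1.0, True),
    "vendor_notes": (0.5, True),
}

print(weighted_score(missed_notes))
print(weighted_score(missed_number))
```

Under this assumption the first document scores 0.9 and the second 0.6, while an unweighted average would give both 0.75. Against the document-level threshold of 0.8 from the example, only the first would count as a match.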
+
+For detailed evaluation capabilities and best practices, see [evaluation.md](evaluation.md).
+
 ## Cost Tracking and Optimization
 
 The solution includes built-in cost tracking capabilities:
