CHANGELOG.md (+29 -1: 29 additions & 1 deletion)
@@ -5,17 +5,36 @@ SPDX-License-Identifier: MIT-0

 ## [Unreleased]

+### Added
+
 ## [0.4.2]

 ### Added

+- **Stickler-Based Evaluation System for Enhanced Comparison Capabilities**
+  - Migrated evaluation service from custom comparison logic to [AWS Labs Stickler library](https://github.com/awslabs/stickler/tree/main) for structured object evaluation
+  - **Field Importance Weights**: New capability to assign business criticality weights to fields (e.g., shipment ID weight=3.0 vs notes weight=0.5)
+  - **Enhanced Configuration**: Added `x-aws-idp-evaluation-*` extensions for evaluation configuration
+  - **Backward compatible**: Maintained API compatibility - all existing code works unchanged
+  - **Enhanced Comparators**: Leverages Stickler's optimized comparison algorithms (Exact, Levenshtein, Numeric, Fuzzy, Semantic) with LLM evaluation preserved through custom wrapper
+  - **Better List Matching**: Hungarian algorithm via Stickler for optimal list comparisons regardless of order
+
+- **UI: Evaluation Configuration in Document Schema UI**
+  - Added evaluation weight, threshold (with conditional display), and document-level match threshold fields for complete Stickler configuration control
+  - Added LEVENSHTEIN and HUNGARIAN evaluation methods with auto-populated threshold defaults based on selected method
+
 - **IDP CLI Force Delete All Resources Option**
   - Added `--force-delete-all` flag to `idp-cli delete` command for comprehensive stack cleanup
   - **Post-CloudFormation Cleanup**: Analyzes resources after CloudFormation deletion completes to identify retained resources (DELETE_SKIPPED status)
   - **Use Cases**: Complete test environment cleanup, CI/CD pipelines requiring full teardown, cost optimization by removing all retained resources

 ### Changed

+- **Containerized Pattern-1 and Pattern-3 Deployment Pipelines**
+  - Migrated Pattern-1 and Pattern-3 Lambda functions to Docker image deployments (following Pattern-2 approach from v0.3.20)
+  - Builds and pushes all Lambda images via CodeBuild with automated ECR cleanup
+  - Increases Lambda package size limit from 250 MB (zip) to 10 GB (Docker image) to accommodate larger dependencies
+
 - **Agent Companion Chat - Chat History Feature**
   - Added chat history feature from Agent Analysis back into Agent Companion Chat
   - Users can now load and view previous chat sessions with full conversation context
@@ -28,9 +47,18 @@ SPDX-License-Identifier: MIT-0
   - Prompt input is disabled during active streaming responses to prevent concurrent requests
   - Fixed issue where charts in loaded chat history were not displaying
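
The "Better List Matching" changelog entry describes order-independent list comparison through optimal assignment. The sketch below illustrates that idea only: it brute-forces every pairing with `itertools.permutations` as a stand-in for the Hungarian algorithm (which reaches the same optimum in polynomial time), and `item_similarity`, `pair_score`, and `best_assignment_score` are hypothetical names, not part of Stickler's API.

```python
from itertools import permutations


def item_similarity(expected: dict, actual: dict) -> float:
    """Fraction of expected fields whose values match exactly (illustrative metric)."""
    if not expected:
        return 1.0
    hits = sum(1 for key, value in expected.items() if actual.get(key) == value)
    return hits / len(expected)


def pair_score(expected, actual) -> float:
    """Score one pairing; a padded (missing) item on either side scores 0."""
    if expected is None or actual is None:
        return 0.0
    return item_similarity(expected, actual)


def best_assignment_score(expected: list, actual: list) -> float:
    """Mean similarity under the best one-to-one pairing of the two lists.

    Brute force over permutations, so only suitable for a handful of items.
    """
    size = max(len(expected), len(actual))
    if size == 0:
        return 1.0
    # Pad the shorter list with None so unmatched items count against the score.
    padded_expected = list(expected) + [None] * (size - len(expected))
    padded_actual = list(actual) + [None] * (size - len(actual))
    best_total = max(
        sum(pair_score(e, a) for e, a in zip(padded_expected, candidate))
        for candidate in permutations(padded_actual)
    )
    return best_total / size


expected_items = [{"sku": "A", "qty": 1}, {"sku": "B", "qty": 2}]
reordered_items = [{"sku": "B", "qty": 2}, {"sku": "A", "qty": 1}]
print(best_assignment_score(expected_items, reordered_items))
```

Because the pairing is chosen to maximize total similarity, a list that contains the same items in a different order scores the same as an identically ordered one, which a position-by-position comparison would not guarantee.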
Makefile (+8: 8 additions & 0 deletions)
@@ -16,6 +16,7 @@ test:

 # Run both linting and formatting in one command
 lint: ruff-lint format check-arn-partitions validate-buildspec ui-lint
+fastlint: ruff-lint format check-arn-partitions validate-buildspec

 # Run linting checks and fix issues automatically
 ruff-lint:
@@ -123,3 +124,10 @@ commit: lint test
 	git add . && \
 	git commit -am "$${COMMIT_MESSAGE}" && \
 	git push
+
+fastcommit: fastlint
+	$(info Generating commit message...)
+	export COMMIT_MESSAGE="$(shell q chat --no-interactive --trust-all-tools "Understand pending local git change and changes to be committed, then infer a commit message. Return this commit message only" | tail -n 1 | sed 's/\x1b\[[0-9;]*m//g')" && \
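
The `tail -n 1 | sed 's/\x1b\[[0-9;]*m//g'` stage in the `fastcommit` recipe keeps only the last line of the tool's output and strips ANSI color codes from it. A minimal Python sketch of the same post-processing follows, useful for reasoning about edge cases; `last_line_without_ansi` is a hypothetical helper and is not part of this repository.

```python
import re

# Same pattern as the sed expression: ESC [ digits/semicolons m (SGR color codes only).
ANSI_SGR = re.compile(r"\x1b\[[0-9;]*m")


def last_line_without_ansi(output: str) -> str:
    """Mirror `tail -n 1 | sed 's/\\x1b\\[[0-9;]*m//g'` on a captured string."""
    lines = output.splitlines()
    last = lines[-1] if lines else ""
    return ANSI_SGR.sub("", last)


raw_output = "Inspecting changes...\n\x1b[32mAdd fastlint and fastcommit targets\x1b[0m\n"
print(last_line_without_ansi(raw_output))
```

Two consequences of this pipeline are worth keeping in mind: a multi-line commit message is reduced to its final line, and escape sequences that do not end in `m` (cursor movement, for example) pass through unchanged.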
docs/configuration.md (+57: 57 additions & 0 deletions)
@@ -312,6 +312,63 @@ Status lookup returns comprehensive information:
 }
```

+## Evaluation Extensions in JSON Schema
+
+Document class schemas support evaluation-specific extensions for fine-grained control over accuracy assessment. These extensions work with the [Stickler](https://github.com/awslabs/stickler)-based evaluation framework to provide flexible, business-aligned evaluation capabilities.
+    x-aws-idp-evaluation-weight: 2.0 # Critical field - double weight
+  invoice_date:
+    type: string
+    x-aws-idp-evaluation-method: FUZZY
+    x-aws-idp-evaluation-threshold: 0.9
+    x-aws-idp-evaluation-weight: 1.5 # Important field
+  vendor_name:
+    type: string
+    x-aws-idp-evaluation-method: FUZZY
+    x-aws-idp-evaluation-threshold: 0.85
+    x-aws-idp-evaluation-weight: 1.0 # Normal weight (default)
+  vendor_notes:
+    type: string
+    x-aws-idp-evaluation-method: SEMANTIC
+    x-aws-idp-evaluation-threshold: 0.7
+    x-aws-idp-evaluation-weight: 0.5 # Less critical - half weight
```
+
+### Stickler Backend Integration
+
+The evaluation framework uses [Stickler](https://github.com/awslabs/stickler) as its evaluation engine. The `SticklerConfigMapper` automatically translates these IDP extensions to Stickler's native format, providing:
+
+- **Field-level weighting** for business-critical attributes
+- **Optimal list matching** using the Hungarian algorithm
+- **Extensible comparator system** with exact, fuzzy, numeric, semantic, and LLM-based comparison
+- **Native JSON Schema support** with $ref resolution
+
+### Benefits
+
+1. **Business Alignment**: Weight critical fields higher to ensure evaluation scores reflect business priorities
+2. **Flexible Comparison**: Choose the right evaluation method for each field type
+3. **Tunable Thresholds**: Set field-specific thresholds for matching sensitivity
+4. **Dynamic Schema Generation**: Auto-generates evaluation schema from baseline data when configuration is missing (for development/prototyping)
+
+For detailed evaluation capabilities and best practices, see [evaluation.md](evaluation.md).
+
 ## Cost Tracking and Optimization

 The solution includes built-in cost tracking capabilities:
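
To make the effect of the `x-aws-idp-evaluation-weight` and `x-aws-idp-evaluation-threshold` extensions in docs/configuration.md concrete, here is a rough sketch of a weighted document score. It is not how `SticklerConfigMapper` or Stickler compute results: the weighted-mean formula, the `difflib` stand-in similarity (used for every field regardless of its configured method), the assumed threshold default of 1.0, and the names `similarity` and `weighted_score` are all assumptions made for illustration. Only the extension keys, the sample values, and the weight default of 1.0 come from the documentation in this diff.

```python
from difflib import SequenceMatcher

WEIGHT_KEY = "x-aws-idp-evaluation-weight"
THRESHOLD_KEY = "x-aws-idp-evaluation-threshold"


def similarity(expected: str, actual: str) -> float:
    """Stand-in string similarity in [0, 1]; not one of Stickler's comparators."""
    return SequenceMatcher(None, expected, actual).ratio()


def weighted_score(properties: dict, expected: dict, actual: dict) -> float:
    """Share of total field weight carried by fields that meet their threshold."""
    total_weight = 0.0
    matched_weight = 0.0
    for name, spec in properties.items():
        weight = float(spec.get(WEIGHT_KEY, 1.0))        # 1.0 is the documented default
        threshold = float(spec.get(THRESHOLD_KEY, 1.0))  # assumed default: exact match
        score = similarity(str(expected.get(name, "")), str(actual.get(name, "")))
        total_weight += weight
        if score >= threshold:
            matched_weight += weight
    return matched_weight / total_weight if total_weight else 0.0


# Field settings copied from the schema excerpt in the diff.
properties = {
    "invoice_date": {THRESHOLD_KEY: 0.9, WEIGHT_KEY: 1.5},
    "vendor_name": {THRESHOLD_KEY: 0.85, WEIGHT_KEY: 1.0},
    "vendor_notes": {THRESHOLD_KEY: 0.7, WEIGHT_KEY: 0.5},
}
expected = {
    "invoice_date": "2024-01-15",
    "vendor_name": "Acme Corporation",
    "vendor_notes": "Net 30 payment terms",
}
wrong_notes = dict(expected, vendor_notes="call before delivery")
wrong_date = dict(expected, invoice_date="2023-11-30")

print(weighted_score(properties, expected, wrong_notes))
print(weighted_score(properties, expected, wrong_date))
```

Under these assumptions a miss on the low-weight `vendor_notes` field costs far less than a miss on `invoice_date`, which is the business-alignment behavior the weights are meant to provide.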