SPDX-License-Identifier: MIT-0

# Evaluation Framework

## Table of Contents

- [Evaluation Framework](#evaluation-framework)
- [Stickler Evaluation Engine](#stickler-evaluation-engine)
- [Architecture](#architecture)
- [How It Works](#how-it-works)
- [Dynamic Schema Generation](#dynamic-schema-generation)
- [How It Works](#how-it-works-1)
- [Type Inference Rules](#type-inference-rules)
- [Auto-Generated Schema Example](#auto-generated-schema-example)
- [Result Annotation](#result-annotation)
- [When to Use Auto-Generation](#when-to-use-auto-generation)
- [Logging and Monitoring](#logging-and-monitoring)
- [Implementation Details](#implementation-details)
- [Evaluation Methods](#evaluation-methods)
- [Supported Methods and Their Characteristics](#supported-methods-and-their-characteristics)
- [Threshold Display in Reports](#threshold-display-in-reports)
- [Field Weighting for Business Criticality](#field-weighting-for-business-criticality)
- [Configuration](#configuration)
- [Weighted Score Calculation](#weighted-score-calculation)
- [Benefits](#benefits)
- [Best Practices](#best-practices)
- [Type Coercion and Data Compatibility](#type-coercion-and-data-compatibility)
- [Automatic Type Conversion](#automatic-type-conversion)
- [When Type Coercion Happens](#when-type-coercion-happens)
- [Benefits](#benefits-1)
- [Limitations](#limitations)
- [Best Practices](#best-practices-1)
- [Assessment Confidence Integration](#assessment-confidence-integration)
- [Confidence Score Display](#confidence-score-display)
- [Enhanced Evaluation Reports](#enhanced-evaluation-reports)
- [Quality Analysis Benefits](#quality-analysis-benefits)
- [Backward Compatibility](#backward-compatibility)
- [Configuration](#configuration-1)
- [Stack Deployment Parameters](#stack-deployment-parameters)
- [Runtime Configuration](#runtime-configuration)
- [Attribute-Specific Evaluation Methods](#attribute-specific-evaluation-methods)
- [Simple Attributes](#simple-attributes)
- [Group Attributes](#group-attributes)
- [List Attributes](#list-attributes)
- [Understanding Threshold vs Match-Threshold](#understanding-threshold-vs-match-threshold)
- [Method Compatibility Rules](#method-compatibility-rules)
- [Attribute Processing and Evaluation](#attribute-processing-and-evaluation)
- [Group Attribute Processing](#group-attribute-processing)
- [List Attribute Processing](#list-attribute-processing)
- [Evaluation Reports for Nested Structures](#evaluation-reports-for-nested-structures)
- [Evaluation Metrics for Complex Documents](#evaluation-metrics-for-complex-documents)
- [Document Split Classification Metrics](#document-split-classification-metrics)
- [Overview](#overview)
- [Three Types of Accuracy](#three-types-of-accuracy)
- [Report Structure](#report-structure)
- [Data Structure Requirements](#data-structure-requirements)
- [Setup and Usage](#setup-and-usage)
- [Step 1: Creating Baseline Data](#step-1-creating-baseline-data)
- [Understanding the Baseline Structure](#understanding-the-baseline-structure)
- [Step 2: Viewing Evaluation Reports](#step-2-viewing-evaluation-reports)
- [Best Practices](#best-practices-2)
- [Baseline Management](#baseline-management)
- [Evaluation Strategy](#evaluation-strategy)
- [Configuration Best Practices](#configuration-best-practices)
- [Automatic Field Discovery](#automatic-field-discovery)
- [Semantic vs LLM Evaluation](#semantic-vs-llm-evaluation)
- [Metrics and Monitoring](#metrics-and-monitoring)
- [Aggregate Evaluation Analytics and Reporting](#aggregate-evaluation-analytics-and-reporting)
- [ReportingDatabase Overview](#reportingdatabase-overview)
- [Querying Evaluation Results](#querying-evaluation-results)
- [Analytics Notebook](#analytics-notebook)
- [Data Retention and Partitioning](#data-retention-and-partitioning)
- [Best Practices for Analytics](#best-practices-for-analytics)
- [Migration from Legacy Evaluation](#migration-from-legacy-evaluation)
- [What Changed](#what-changed)
- [What Stayed the Same](#what-stayed-the-same)
- [Migration Checklist](#migration-checklist)
- [Stickler Version Information](#stickler-version-information)
- [Troubleshooting Evaluation Issues](#troubleshooting-evaluation-issues)

The GenAIIDP solution includes a built-in evaluation framework to assess the accuracy of document processing outputs. This allows you to:

- Compare processing outputs against baseline (ground truth) data