
Commit e7671bf

Author: Bob Strahan (committed)
Update docs/evaluation.md with comprehensive table of contents
1 parent 94cf47d commit e7671bf

File tree: 1 file changed (+76 −0 lines changed)


docs/evaluation.md

Lines changed: 76 additions & 0 deletions
@@ -3,6 +3,82 @@ SPDX-License-Identifier: MIT-0
 
 # Evaluation Framework
 
+## Table of Contents
+
+- [Evaluation Framework](#evaluation-framework)
+- [Stickler Evaluation Engine](#stickler-evaluation-engine)
+- [Architecture](#architecture)
+- [How It Works](#how-it-works)
+- [Dynamic Schema Generation](#dynamic-schema-generation)
+- [How It Works](#how-it-works-1)
+- [Type Inference Rules](#type-inference-rules)
+- [Auto-Generated Schema Example](#auto-generated-schema-example)
+- [Result Annotation](#result-annotation)
+- [When to Use Auto-Generation](#when-to-use-auto-generation)
+- [Logging and Monitoring](#logging-and-monitoring)
+- [Implementation Details](#implementation-details)
+- [Evaluation Methods](#evaluation-methods)
+- [Supported Methods and Their Characteristics](#supported-methods-and-their-characteristics)
+- [Threshold Display in Reports](#threshold-display-in-reports)
+- [Field Weighting for Business Criticality](#field-weighting-for-business-criticality)
+- [Configuration](#configuration)
+- [Weighted Score Calculation](#weighted-score-calculation)
+- [Benefits](#benefits)
+- [Best Practices](#best-practices)
+- [Type Coercion and Data Compatibility](#type-coercion-and-data-compatibility)
+- [Automatic Type Conversion](#automatic-type-conversion)
+- [When Type Coercion Happens](#when-type-coercion-happens)
+- [Benefits](#benefits-1)
+- [Limitations](#limitations)
+- [Best Practices](#best-practices-1)
+- [Assessment Confidence Integration](#assessment-confidence-integration)
+- [Confidence Score Display](#confidence-score-display)
+- [Enhanced Evaluation Reports](#enhanced-evaluation-reports)
+- [Quality Analysis Benefits](#quality-analysis-benefits)
+- [Backward Compatibility](#backward-compatibility)
+- [Configuration](#configuration-1)
+- [Stack Deployment Parameters](#stack-deployment-parameters)
+- [Runtime Configuration](#runtime-configuration)
+- [Attribute-Specific Evaluation Methods](#attribute-specific-evaluation-methods)
+- [Simple Attributes](#simple-attributes)
+- [Group Attributes](#group-attributes)
+- [List Attributes](#list-attributes)
+- [Understanding Threshold vs Match-Threshold](#understanding-threshold-vs-match-threshold)
+- [Method Compatibility Rules](#method-compatibility-rules)
+- [Attribute Processing and Evaluation](#attribute-processing-and-evaluation)
+- [Group Attribute Processing](#group-attribute-processing)
+- [List Attribute Processing](#list-attribute-processing)
+- [Evaluation Reports for Nested Structures](#evaluation-reports-for-nested-structures)
+- [Evaluation Metrics for Complex Documents](#evaluation-metrics-for-complex-documents)
+- [Document Split Classification Metrics](#document-split-classification-metrics)
+- [Overview](#overview)
+- [Three Types of Accuracy](#three-types-of-accuracy)
+- [Report Structure](#report-structure)
+- [Data Structure Requirements](#data-structure-requirements)
+- [Setup and Usage](#setup-and-usage)
+- [Step 1: Creating Baseline Data](#step-1-creating-baseline-data)
+- [Understanding the Baseline Structure](#understanding-the-baseline-structure)
+- [Step 2: Viewing Evaluation Reports](#step-2-viewing-evaluation-reports)
+- [Best Practices](#best-practices-2)
+- [Baseline Management](#baseline-management)
+- [Evaluation Strategy](#evaluation-strategy)
+- [Configuration Best Practices](#configuration-best-practices)
+- [Automatic Field Discovery](#automatic-field-discovery)
+- [Semantic vs LLM Evaluation](#semantic-vs-llm-evaluation)
+- [Metrics and Monitoring](#metrics-and-monitoring)
+- [Aggregate Evaluation Analytics and Reporting](#aggregate-evaluation-analytics-and-reporting)
+- [ReportingDatabase Overview](#reportingdatabase-overview)
+- [Querying Evaluation Results](#querying-evaluation-results)
+- [Analytics Notebook](#analytics-notebook)
+- [Data Retention and Partitioning](#data-retention-and-partitioning)
+- [Best Practices for Analytics](#best-practices-for-analytics)
+- [Migration from Legacy Evaluation](#migration-from-legacy-evaluation)
+- [What Changed](#what-changed)
+- [What Stayed the Same](#what-stayed-the-same)
+- [Migration Checklist](#migration-checklist)
+- [Stickler Version Information](#stickler-version-information)
+- [Troubleshooting Evaluation Issues](#troubleshooting-evaluation-issues)
+
 The GenAIIDP solution includes a built-in evaluation framework to assess the accuracy of document processing outputs. This allows you to:
 
 - Compare processing outputs against baseline (ground truth) data
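The anchors in the added table of contents follow GitHub's markdown heading-to-anchor convention: heading text is lowercased and hyphenated, and repeated headings (for example the multiple "How It Works" and "Best Practices" sections) receive numeric suffixes such as #how-it-works-1 and #best-practices-2. A minimal Python sketch of that slugification rule, for illustration only; the function name and its exact character handling are assumptions, not the tool used to produce this commit:

```python
import re
from collections import defaultdict

def github_slug(heading: str, seen: dict) -> str:
    """Approximate GitHub's heading-to-anchor rule (assumed, simplified)."""
    # Lowercase, keep word characters, spaces, and hyphens, then hyphenate spaces.
    slug = re.sub(r"[^\w\- ]", "", heading.lower()).replace(" ", "-")
    # Repeated headings get -1, -2, ... suffixes so anchors stay unique.
    n = seen[slug]
    seen[slug] += 1
    return slug if n == 0 else f"{slug}-{n}"

seen = defaultdict(int)
for heading in ["How It Works", "How It Works", "Best Practices",
                "Best Practices", "Best Practices"]:
    print(f"- [{heading}](#{github_slug(heading, seen)})")
# - [How It Works](#how-it-works)
# - [How It Works](#how-it-works-1)
# - [Best Practices](#best-practices)
# - [Best Practices](#best-practices-1)
# - [Best Practices](#best-practices-2)
```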
