@@ -125,6 +125,17 @@ Refer to the **agentcore_runtime_deployment.ipynb** notebook to deploy your agen
 
 The platform includes comprehensive evaluation capabilities to assess agent performance across multiple dimensions.
 
+### How Evaluation Works
+
+The evaluation system runs test queries against your agent, collects execution traces, and measures performance:
+
+1. **Load test queries** from `groundtruth.json` with expected tool usage
+2. **Send queries to agent** endpoint and capture responses with trace IDs
+3. **Wait for traces** to be available in Langfuse observability platform
+4. **Extract metrics** from traces including tool calls, retrieval scores, and latencies
+5. **Evaluate response quality** using Bedrock LLM to score faithfulness, correctness, and helpfulness
+6. **Calculate performance metrics** and save comprehensive results to CSV files
+
 ### Evaluation Setup
 
 The evaluation system consists of:
@@ -134,10 +145,16 @@ The evaluation system consists of:
 
 ### Prerequisites
 
-1. **Langfuse Configuration**: Ensure Langfuse is properly configured for trace collection
-2. **Agent Endpoint**: Have your agent running locally or deployed
-3. **AWS Credentials**: For Bedrock access (response quality evaluation)
-4. **Test Data**: Create `groundtruth.json` with test queries:
+1. **Environment Variables**: Export Langfuse and AWS credentials:
+   ```bash
+   export LANGFUSE_SECRET_KEY="your-key"
+   export LANGFUSE_PUBLIC_KEY="your-key"
+   export LANGFUSE_HOST="your-langfuse-host"
+   export AWS_ACCESS_KEY_ID="your-key"
+   export AWS_SECRET_ACCESS_KEY="your-key"
+   ```
+2. **Agent Endpoint**: Have your agent running locally (`http://localhost:8080`) or deployed on Bedrock AgentCore
+3. **Test Data**: Create `groundtruth.json` with test queries:
 
 ```json
 [
@@ -177,11 +194,12 @@ python response_quality_evaluator.py
 
 ### Configuration
 
-Set environment variables:
+Set agent endpoint (local or AgentCore):
 ```bash
-export AGENT_ARN="http://localhost:8080"  # or your deployed endpoint
-export LANGFUSE_SECRET_KEY="your-key"
-export LANGFUSE_PUBLIC_KEY="your-key"
-export LANGFUSE_HOST="your-langfuse-host"
+# For local agent
+export AGENT_ARN="http://localhost:8080"
+
+# For Bedrock AgentCore deployment
+export AGENT_ARN="your-agentcore-endpoint"
 ```
 
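
Because this change splits configuration between the Prerequisites and Configuration sections, a quick pre-flight check can catch a missing variable before an evaluation run starts. The following is only a minimal sketch, not part of the repository: the variable names come from the README text in the diff, while `REQUIRED_VARS`, `missing_vars`, and `is_local_agent` are hypothetical helpers introduced for illustration. It assumes that a local agent is identified by a `localhost` or `127.0.0.1` URL and that anything else is a deployed endpoint.

```python
import os
from urllib.parse import urlparse

# Variable names are taken from the README; everything else here is hypothetical.
# AWS credentials are left out on purpose: boto3 can also resolve them from
# profiles or instance roles, so their absence from the environment is not an error.
REQUIRED_VARS = (
    "LANGFUSE_SECRET_KEY",
    "LANGFUSE_PUBLIC_KEY",
    "LANGFUSE_HOST",
    "AGENT_ARN",
)


def missing_vars(env=None):
    """Return the required variable names that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


def is_local_agent(agent_arn):
    """Assumption: a local agent is addressed by a localhost URL."""
    host = urlparse(agent_arn).hostname
    return host in {"localhost", "127.0.0.1"}


if __name__ == "__main__":
    missing = missing_vars()
    if missing:
        raise SystemExit("Missing environment variables: " + ", ".join(missing))
    target = "local agent" if is_local_agent(os.environ["AGENT_ARN"]) else "deployed agent"
    print("Configuration looks complete; targeting a " + target + ".")
```

Passing the environment as a plain mapping keeps the check easy to exercise without touching the real process environment.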