Commit 5f7f119

Update readme
1 parent f16f057 commit 5f7f119

1 file changed, +27 -9 lines changed

README.md

Lines changed: 27 additions & 9 deletions
@@ -125,6 +125,17 @@ Refer to the **agentcore_runtime_deployment.ipynb** notebook to deploy your agen
 
 The platform includes comprehensive evaluation capabilities to assess agent performance across multiple dimensions.
 
+### How Evaluation Works
+
+The evaluation system runs test queries against your agent, collects execution traces, and measures performance:
+
+1. **Load test queries** from `groundtruth.json` with expected tool usage
+2. **Send queries to agent** endpoint and capture responses with trace IDs
+3. **Wait for traces** to be available in Langfuse observability platform
+4. **Extract metrics** from traces including tool calls, retrieval scores, and latencies
+5. **Evaluate response quality** using Bedrock LLM to score faithfulness, correctness, and helpfulness
+6. **Calculate performance metrics** and save comprehensive results to CSV files
+
 ### Evaluation Setup
 
 The evaluation system consists of:
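
The six steps added under "How Evaluation Works" in the hunk above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the repository's evaluator: the agent call, the trace lookup, and the LLM judge are passed in as plain callables (`send_query`, `fetch_trace`, `judge`, all hypothetical names), and the `query`, `expected_tools`, and `tool_calls` field names are assumed because the diff does not show the `groundtruth.json` schema or the trace format.

```python
import csv
import time


def wait_for_trace(fetch_trace, trace_id, timeout_s=30.0, poll_s=1.0):
    """Step 3: poll until the trace is available or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        trace = fetch_trace(trace_id)  # hypothetical lookup; returns None until ready
        if trace is not None:
            return trace
        if time.monotonic() >= deadline:
            raise TimeoutError(f"trace {trace_id} not available after {timeout_s}s")
        time.sleep(poll_s)


def run_evaluation(queries, send_query, fetch_trace, judge,
                   timeout_s=30.0, poll_s=1.0):
    """Run steps 1-5 for each test query and return one result dict per query."""
    rows = []
    for item in queries:  # step 1: items as loaded from groundtruth.json (assumed schema)
        started = time.perf_counter()
        response, trace_id = send_query(item["query"])  # step 2
        latency_s = time.perf_counter() - started
        trace = wait_for_trace(fetch_trace, trace_id, timeout_s, poll_s)  # step 3
        called = list(trace.get("tool_calls", []))  # step 4 (assumed trace field)
        expected = list(item.get("expected_tools", []))
        rows.append({
            "query": item["query"],
            "expected_tools": ";".join(expected),
            "called_tools": ";".join(called),
            "tool_match": set(expected) == set(called),
            "latency_s": round(latency_s, 3),
            "quality": judge(item["query"], response),  # step 5
        })
    return rows


def save_csv(rows, path):
    """Step 6: write one CSV row per test query."""
    if not rows:
        return
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
```

Passing the three collaborators in as callables keeps the sketch independent of any particular Langfuse or Bedrock client version, and lets the loop be exercised with stubs before pointing it at a real endpoint.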
@@ -134,10 +145,16 @@ The evaluation system consists of:
 
 ### Prerequisites
 
-1. **Langfuse Configuration**: Ensure Langfuse is properly configured for trace collection
-2. **Agent Endpoint**: Have your agent running locally or deployed
-3. **AWS Credentials**: For Bedrock access (response quality evaluation)
-4. **Test Data**: Create `groundtruth.json` with test queries:
+1. **Environment Variables**: Export Langfuse and AWS credentials:
+```bash
+export LANGFUSE_SECRET_KEY="your-key"
+export LANGFUSE_PUBLIC_KEY="your-key"
+export LANGFUSE_HOST="your-langfuse-host"
+export AWS_ACCESS_KEY_ID="your-key"
+export AWS_SECRET_ACCESS_KEY="your-key"
+```
+2. **Agent Endpoint**: Have your agent running locally (`http://localhost:8080`) or deployed on Bedrock AgentCore
+3. **Test Data**: Create `groundtruth.json` with test queries:
 
 ```json
 [
@@ -177,11 +194,12 @@ python response_quality_evaluator.py
 
 ### Configuration
 
-Set environment variables:
+Set agent endpoint (local or AgentCore):
 ```bash
-export AGENT_ARN="http://localhost:8080" # or your deployed endpoint
-export LANGFUSE_SECRET_KEY="your-key"
-export LANGFUSE_PUBLIC_KEY="your-key"
-export LANGFUSE_HOST="your-langfuse-host"
+# For local agent
+export AGENT_ARN="http://localhost:8080"
+
+# For Bedrock AgentCore deployment
+export AGENT_ARN="your-agentcore-endpoint"
 ```
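
A quick pre-flight check for the variables this commit documents might look like the sketch below. It assumes all six variables are supplied through the environment (AWS credentials can also come from a profile or role, in which case the two AWS names would be dropped from the list), and it assumes a local agent can be recognised by an `http(s)://localhost` URL in `AGENT_ARN`. The helper names `missing_vars` and `is_local_endpoint` are hypothetical and do not come from the repository.

```python
import os
from urllib.parse import urlparse

# Variable names taken from the README diff above.
REQUIRED_VARS = (
    "LANGFUSE_SECRET_KEY",
    "LANGFUSE_PUBLIC_KEY",
    "LANGFUSE_HOST",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AGENT_ARN",
)


def missing_vars(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


def is_local_endpoint(agent_arn):
    """Assumption: a local agent is addressed by an http(s) URL on localhost."""
    parsed = urlparse(agent_arn)
    return parsed.scheme in ("http", "https") and parsed.hostname in (
        "localhost",
        "127.0.0.1",
    )


if __name__ == "__main__":
    missing = missing_vars()
    if missing:
        raise SystemExit("Missing environment variables: " + ", ".join(missing))
    target = "local agent" if is_local_endpoint(os.environ["AGENT_ARN"]) else "deployed agent"
    print(f"Configuration looks complete; AGENT_ARN points at a {target}.")
```

Note that the two `export AGENT_ARN=...` lines in the new Configuration block are alternatives: running both in the same shell leaves only the second value in effect.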
