@nikhilwoodruff nikhilwoodruff commented Dec 30, 2025

Adds infrastructure for testing and improving the agent's efficiency at answering policy questions.

API improvements

| Endpoint | Improvement |
| --- | --- |
| /parameters/ | Added tax_benefit_model_name filter for country filtering |
| /variables/ | Added search and tax_benefit_model_name filters |
| /datasets/ | Added tax_benefit_model_name filter |
| /parameter-values/ | Added current=true filter to get only current values |
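
As a rough illustration of how a client might use these filters, the sketch below builds query URLs for three of the endpoints. The endpoint paths, filter names, and the `policyengine-uk` model name come from this PR; `BASE_URL` and the `build_url` helper are hypothetical placeholders, and the search term is only an example.

```python
from urllib.parse import urlencode

# Hypothetical host; substitute the real API base URL.
BASE_URL = "https://api.example.org"


def build_url(endpoint, **filters):
    """Build a query URL for an endpoint, skipping filters set to None."""
    query = {}
    for key, value in filters.items():
        if value is None:
            continue
        # Render booleans as lowercase strings, e.g. current=true.
        query[key] = str(value).lower() if isinstance(value, bool) else value
    path = f"{BASE_URL}/{endpoint.strip('/')}/"
    return f"{path}?{urlencode(query)}" if query else path


# Restrict parameters to a single country model.
params_url = build_url("parameters", tax_benefit_model_name="policyengine-uk")

# Search variables within a single country model.
variables_url = build_url(
    "variables", search="income tax", tax_benefit_model_name="policyengine-uk"
)

# Ask only for current parameter values.
current_url = build_url("parameter-values", current=True)
```

Filtering on the server side means the agent does not have to separate UK and US results itself, which is where the extra turns were going.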

Other changes

  • Updated agent system prompt to use country filter
  • Fixed duplicate parameters in seed script (deduplicate by name)
  • Created test suite with policy questions across UK/US, household/economy
  • Created docs/AGENT_TESTING.md to track ongoing work
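
The seed-script fix (deduplicate by name) could look roughly like the sketch below. It assumes each parameter is a dict with a `name` key and that the first occurrence should win; both are assumptions for illustration, and the actual seed script may differ.

```python
def deduplicate_by_name(records):
    """Return records with duplicate names removed, keeping the first seen."""
    seen = {}
    for record in records:
        # setdefault only stores the record if the name is not present yet.
        seen.setdefault(record["name"], record)
    return list(seen.values())


# Illustrative data only; these names are not taken from the seed script.
raw = [
    {"name": "example.rate", "label": "first"},
    {"name": "example.threshold", "label": "only"},
    {"name": "example.rate", "label": "duplicate"},
]
unique = deduplicate_by_name(raw)
```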

Baseline measurements (before these changes)

| Question type | Turns | Target |
| --- | --- | --- |
| Parameter lookup (UK personal allowance) | 9-10 | 3-4 |
| Household calculation (UK £50k income) | 6 | 5-6 |

The main issue is that parameter lookups return mixed UK and US results, so the agent spends extra turns narrowing them down. The country filter should bring parameter lookups much closer to the 3-4 turn target.

nikhilwoodruff and others added 10 commits December 30, 2025 11:27
Reduces agent turns for parameter lookups by allowing country filtering.
Updated system prompt with parameter search tips.
…king)

Reorganised test categories:
- Parameter lookups are now separate from household calcs
- Economy-wide tests are actual budgetary/distributional analyses
Allows the agent to get just the current value with current=true.

Key improvements:
- Fix model name in system prompt (policyengine-uk with hyphen)
- Add case-insensitive search using ILIKE for parameters and variables
- Update API docstrings with correct model names

Agent can now find UK personal allowance in 3 turns vs 10 baseline.
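
ILIKE is PostgreSQL's case-insensitive form of LIKE. To show the effect of the search change without a database, the sketch below emulates a `name ILIKE '%term%'` match in plain Python. It is not the API's query code: the helper names and sample labels are hypothetical, and unlike real ILIKE it treats `%` and `_` in the term as literal characters.

```python
def ilike_contains(value, term):
    """Case-insensitive substring match, similar to value ILIKE '%term%'."""
    return term.lower() in value.lower()


def search_names(names, term):
    """Return the names that contain the term, ignoring case."""
    return [name for name in names if ilike_contains(name, term)]


# Illustrative labels only.
labels = ["Personal Allowance", "personal_allowance", "Basic rate"]
matches = search_names(labels, "personal")
```

With a case-sensitive match, a query for "personal" would miss "Personal Allowance", which forces the agent to retry with different capitalisation.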

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@nikhilwoodruff nikhilwoodruff merged commit d160709 into main Dec 30, 2025
1 check passed
@nikhilwoodruff nikhilwoodruff deleted the agent-testing branch December 30, 2025 13:16