Use num_vehicles as predictor for fuel spending imputation #244

MaxGhenis · 2025-12-03T18:43:57Z

Summary

- Adds num_vehicles as a predictor for fuel spending imputation in the consumption model
- Imputes num_vehicles to LCFS training data using the WAS-trained wealth model
- Swaps imputation order: wealth before consumption (since consumption now depends on num_vehicles)
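
The ordering constraint in the last item can be sketched as a small dependency graph. This is only an illustration, not the repository's pipeline code: the stage names `wealth` and `consumption` are hypothetical labels for the two imputation steps.

```python
from graphlib import TopologicalSorter

# Hypothetical stage names. Each key maps to the set of stages it depends on.
# Consumption needs num_vehicles, which the wealth imputation produces.
dependencies = {
    "wealth": set(),
    "consumption": {"wealth"},
}

# static_order() yields every stage after all of its dependencies.
imputation_order = list(TopologicalSorter(dependencies).static_order())
print(imputation_order)
```

Declaring the dependency explicitly, rather than relying on call order, makes a future reordering fail loudly instead of silently dropping the predictor.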

This builds on #243 which added vehicle ownership calibration.

Why this matters

Vehicle ownership is a strong predictor of fuel spending:

Households with 1 vehicle: ~£917/year fuel
Households with 2 vehicles: ~£1,242/year fuel
Households with 3+ vehicles: ~£1,386/year fuel

The correlation between imputed num_vehicles and fuel spending in LCFS is ~0.13, so adding num_vehicles as a predictor should improve fuel duty incidence estimates.
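
For reference, a correlation figure like this can be computed as a Pearson coefficient. The sketch below is a plain-Python version on made-up toy data. The values are hypothetical, not LCFS records, and the PR does not say which correlation measure or weighting was used.

```python
import math

def pearson(xs, ys):
    """Unweighted Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical toy households: vehicle counts and annual fuel spending (GBP).
num_vehicles = [0, 1, 1, 2, 2, 3]
fuel_spending = [0.0, 800.0, 1000.0, 1100.0, 1400.0, 1300.0]

r = pearson(num_vehicles, fuel_spending)
print(round(r, 2))
```

A survey-weighted version would multiply each term by the household weight; that is omitted here to keep the sketch short.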

Technical details

Since LCFS doesn't collect vehicle counts directly, we impute them using the same WAS model that's used for the FRS. For wealth-model predictors that are not available in LCFS (capital_income, num_bedrooms, council_tax, is_renting), we use fixed default values.

🤖 Generated with Claude Code

- Add num_vehicles to consumption model predictors
- Impute num_vehicles to LCFS training data using WAS wealth model
- Swap imputation order: wealth before consumption (num_vehicles dependency)

This improves fuel spending predictions by using vehicle ownership, which has ~0.13 correlation with fuel spending in LCFS.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

Train a separate QRF model for vehicle imputation using only predictors available in both WAS and LCFS. This avoids biasing predictions with hardcoded values for council_tax, num_bedrooms, is_renting, etc.

Improves correlation with fuel spending from 0.13 to 0.17.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
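
A minimal sketch of the predictor selection this commit describes, assuming the wealth model's predictor list and the LCFS column set are both available as plain Python collections. Apart from the four variables named in this PR (capital_income, num_bedrooms, council_tax, is_renting), every name below is hypothetical, and the QRF training itself is not shown.

```python
# Predictors of the full WAS wealth model. Only the last four names come from
# this PR; the others are hypothetical placeholders.
WAS_MODEL_PREDICTORS = [
    "household_net_income",
    "num_adults",
    "num_children",
    "region",
    "capital_income",
    "num_bedrooms",
    "council_tax",
    "is_renting",
]

# Hypothetical set of columns present in the LCFS training data.
LCFS_COLUMNS = {
    "household_net_income",
    "num_adults",
    "num_children",
    "region",
    "petrol_spending",
    "diesel_spending",
}

# Keep only predictors observed in both surveys, preserving model order,
# so no default value has to be substituted at prediction time.
shared_predictors = [p for p in WAS_MODEL_PREDICTORS if p in LCFS_COLUMNS]
dropped_predictors = [p for p in WAS_MODEL_PREDICTORS if p not in LCFS_COLUMNS]

print(shared_predictors)
print(dropped_predictors)
```

The trade-off is fewer predictors in exchange for inputs that are actually observed rather than filled with a constant.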

Documents all imputation models with:
- Source datasets (WAS, LCFS, SPI, ETB, etc.)
- Predictor variables for each model
- Output variables
- Pipeline order and dependencies
- Calibration targets

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

Instead of imputing num_vehicles to LCFS (which lacks vehicle data), we now:
1. Create has_fuel_consumption in WAS from vehicle ownership:
   - has_fuel = (num_vehicles > 0) AND (random < 0.90)
   - 90% accounts for EVs/PHEVs per NTS 2024 fuel type data
2. Train QRF to predict has_fuel_consumption from demographics
3. Apply to LCFS for consumption model training
4. At FRS time, compute has_fuel_consumption from num_vehicles

This properly bridges vehicle ownership (~78% of households per NTS) to fuel consumption (~70% after EV adjustment), fixing the LCFS diary undercount issue (only 58% recorded any fuel purchase).

Sources cited in code:
- NTS 2024 vehicle ownership: 22% none, 44% one, 34% two+
- NTS 2024 fuel type: 59% petrol, 30% diesel, 4% BEV, 6% hybrid, 2% PHEV

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
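
The flag construction in step 1 can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the merged code: the function names, the seed, and the synthetic household list are hypothetical, while the 0.90 factor and the 22% / 44% / 34% ownership shares are taken from the commit message.

```python
import random

# Share of vehicle-owning households assumed to buy fuel (from the commit message).
FUEL_SHARE_GIVEN_VEHICLE = 0.90

def has_fuel_consumption(num_vehicles, rng):
    """has_fuel = (num_vehicles > 0) AND (random < 0.90)."""
    return num_vehicles > 0 and rng.random() < FUEL_SHARE_GIVEN_VEHICLE

def fuel_household_share(vehicle_counts, seed=0):
    """Fraction of households flagged as fuel consumers (seed is arbitrary)."""
    rng = random.Random(seed)
    flags = [has_fuel_consumption(n, rng) for n in vehicle_counts]
    return sum(flags) / len(flags)

# Synthetic households matching the quoted NTS 2024 ownership shares:
# 22% with no vehicle, 44% with one, 34% with two or more.
households = [0] * 2200 + [1] * 4400 + [2] * 3400

share = fuel_household_share(households)
print(f"{share:.3f}")  # close to 0.78 * 0.90 = 0.702, up to sampling noise
```

Because the draw is random, the flagged share matches the ~70% target only in expectation; fixing the seed keeps the result reproducible across runs.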

MaxGhenis and others added 5 commits December 3, 2025 13:43

bd sync: 2025-12-03 13:44:07

be54ada

MaxGhenis merged commit 6023f8c into main Dec 3, 2025
4 checks passed

MaxGhenis deleted the improve-fuel-imputation branch December 3, 2025 23:01

MaxGhenis mentioned this pull request Dec 3, 2025

Zero out fuel spending for non-fuel households #247

Merged

2 tasks
