
@MaxGhenis
Contributor

Summary

  • Adds num_vehicles as a predictor for fuel spending imputation in the consumption model
  • Imputes num_vehicles to LCFS training data using the WAS-trained wealth model
  • Swaps imputation order: wealth before consumption (since consumption now depends on num_vehicles)
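The reordering follows from a dependency: once the consumption model uses num_vehicles as a predictor, the wealth step that produces num_vehicles must run first. Here is a minimal sketch of that ordering using Python's standard-library `graphlib`. The stage map and stage names are hypothetical and illustrative, not the repository's actual pipeline code.

```python
from graphlib import TopologicalSorter

# Hypothetical stage map: each imputation stage lists the stages whose
# outputs it consumes. Names are illustrative only.
STAGE_DEPENDENCIES = {
    "wealth": set(),            # WAS-trained; produces num_vehicles
    "consumption": {"wealth"},  # LCFS-trained; now uses num_vehicles
}


def imputation_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return stages in an order where every dependency runs first."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    print(imputation_order(STAGE_DEPENDENCIES))  # ['wealth', 'consumption']
```

Declaring the dependency, rather than hard-coding the sequence, means a future predictor that creates a cycle would raise `graphlib.CycleError` instead of silently running in the wrong order.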

This builds on #243, which added vehicle ownership calibration.

Why this matters

Vehicle ownership is a strong predictor of fuel spending:

  • Households with 1 vehicle: ~£917/year fuel
  • Households with 2 vehicles: ~£1,242/year fuel
  • Households with 3+ vehicles: ~£1,386/year fuel
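The gaps between the bands, in £ per year as quoted above, suggest that each additional vehicle raises fuel spending by a shrinking amount. The count therefore carries information that a simple owns/does-not-own flag would lose:

```math
\begin{aligned}
1242 - 917 &= 325 \quad (\text{one} \to \text{two vehicles}) \\
1386 - 1242 &= 144 \quad (\text{two} \to \text{three or more}) \\
1242 / 917 &\approx 1.35, \qquad 1386 / 917 \approx 1.51
\end{aligned}
```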

In LCFS, imputed num_vehicles has a correlation of ~0.13 with fuel spending. Adding it as a predictor should improve fuel duty incidence estimates.
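The ~0.13 figure is a Pearson correlation between imputed vehicle counts and recorded fuel spending. The sketch below shows how such a check can be computed with NumPy. The arrays are toy values, not LCFS data.

```python
import numpy as np


def pearson_r(x, y) -> float:
    """Pearson correlation between two equal-length 1-D sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.corrcoef(x, y)[0, 1])


# Toy example only: imputed vehicle counts and annual fuel spend (£).
num_vehicles = np.array([0, 1, 1, 2, 3])
fuel_spend = np.array([0.0, 900.0, 0.0, 1200.0, 1400.0])
print(round(pearson_r(num_vehicles, fuel_spend), 2))
```

Rows where a household owns a vehicle but recorded no fuel purchase (like the third toy row) pull the correlation down. A modest value can therefore still reflect a real relationship.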

Technical details

Since LCFS doesn't collect vehicle counts directly, we impute them using the same WAS model that's used for the FRS. For predictors the WAS model uses but LCFS lacks (capital_income, num_bedrooms, council_tax, is_renting), we fill in fixed default values.
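Applying the WAS model to LCFS requires a value for every predictor the model was trained on. Below is a minimal sketch of the default-filling step. It assumes a hypothetical `PREDICTOR_DEFAULTS` mapping whose values are placeholders, not the ones used in the PR.

```python
# Hypothetical placeholder defaults for WAS-model predictors that LCFS
# lacks; the values actually used in the PR are not shown here.
PREDICTOR_DEFAULTS = {
    "capital_income": 0.0,
    "num_bedrooms": 3,
    "council_tax": 0.0,
    "is_renting": False,
}


def fill_missing_predictors(record: dict, predictors: list[str]) -> dict:
    """Return a copy of one household record with every model predictor set.

    Observed values are kept; missing predictors fall back to the default.
    Raises KeyError if a predictor is missing and has no default.
    """
    filled = dict(record)
    for name in predictors:
        if name not in filled:
            filled[name] = PREDICTOR_DEFAULTS[name]
    return filled
```

Every LCFS household then receives the same value for these predictors, so the model cannot distinguish households on them. The second commit below removes this source of bias by training only on predictors the two surveys share.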

🤖 Generated with Claude Code

MaxGhenis and others added 5 commits December 3, 2025 13:43
- Add num_vehicles to consumption model predictors
- Impute num_vehicles to LCFS training data using WAS wealth model
- Swap imputation order: wealth before consumption (num_vehicles dependency)

This improves fuel spending predictions by using vehicle ownership,
which has ~0.13 correlation with fuel spending in LCFS.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Train a separate QRF model for vehicle imputation using only predictors
available in both WAS and LCFS. This avoids biasing predictions with
hardcoded values for council_tax, num_bedrooms, is_renting, etc.

Improves the correlation between imputed num_vehicles and fuel spending from 0.13 to 0.17.
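The key change is selecting predictors by overlap rather than by default-filling. A minimal sketch follows. The column lists are hypothetical and only illustrate the selection.

```python
def shared_predictors(was_columns: list[str], lcfs_columns: list[str]) -> list[str]:
    """Predictors present in both surveys, kept in WAS column order."""
    lcfs = set(lcfs_columns)
    return [col for col in was_columns if col in lcfs]


# Hypothetical column lists for illustration only.
WAS_COLUMNS = ["age", "region", "employment_income",
               "council_tax", "num_bedrooms", "is_renting"]
LCFS_COLUMNS = ["age", "region", "employment_income", "num_adults"]

print(shared_predictors(WAS_COLUMNS, LCFS_COLUMNS))
# ['age', 'region', 'employment_income']
```

A vehicle model trained on WAS with only these columns can be applied to LCFS without placeholder values.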

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Documents all imputation models with:
- Source datasets (WAS, LCFS, SPI, ETB, etc.)
- Predictor variables for each model
- Output variables
- Pipeline order and dependencies
- Calibration targets
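One way to keep that documentation in sync with the code is to hold the model metadata in a small registry and render it. The sketch below is hypothetical: the class, the function, and every predictor or output name other than num_vehicles are illustrative assumptions, not the PR's files.

```python
from dataclasses import dataclass, field


@dataclass
class ImputationModel:
    """One entry in a hypothetical registry of imputation models."""
    name: str
    source: str                 # training dataset, e.g. "WAS"
    predictors: list[str]
    outputs: list[str]
    depends_on: list[str] = field(default_factory=list)


def to_markdown(models: list[ImputationModel]) -> str:
    """Render the registry as a Markdown table for the docs."""
    lines = [
        "| Model | Source | Predictors | Outputs | Depends on |",
        "|---|---|---|---|---|",
    ]
    for m in models:
        lines.append(
            f"| {m.name} | {m.source} | {', '.join(m.predictors)} | "
            f"{', '.join(m.outputs)} | {', '.join(m.depends_on) or '-'} |"
        )
    return "\n".join(lines)


# Illustrative entries; predictor/output names besides num_vehicles are made up.
MODELS = [
    ImputationModel("wealth", "WAS", ["age", "region"], ["num_vehicles"]),
    ImputationModel("consumption", "LCFS", ["age", "region", "num_vehicles"],
                    ["fuel_spending"], depends_on=["wealth"]),
]
```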

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Instead of imputing num_vehicles to LCFS (which lacks vehicle data),
we now:
1. Create has_fuel_consumption in WAS from vehicle ownership:
   - has_fuel = (num_vehicles > 0) AND (random < 0.90)
   - 90% accounts for EVs/PHEVs per NTS 2024 fuel type data
2. Train QRF to predict has_fuel_consumption from demographics
3. Apply to LCFS for consumption model training
4. At FRS time, compute has_fuel_consumption from num_vehicles
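Steps 1 and 4 apply the same rule. The NumPy sketch below implements `has_fuel = (num_vehicles > 0) AND (random < 0.90)`. The helper name and the use of a seeded generator are assumptions for illustration.

```python
import numpy as np

# Share of vehicle-owning households assumed to buy petrol/diesel; the
# commit attributes the remaining 10% to EVs/PHEVs (NTS 2024).
FUEL_SHARE_OF_OWNERS = 0.90


def has_fuel_consumption(num_vehicles, rng: np.random.Generator) -> np.ndarray:
    """Flag households with fuel consumption from their vehicle counts.

    Households with no vehicle are always False; owners are True with
    probability FUEL_SHARE_OF_OWNERS.
    """
    num_vehicles = np.asarray(num_vehicles)
    owns_vehicle = num_vehicles > 0
    draw = rng.random(num_vehicles.shape)  # uniform in [0, 1)
    return owns_vehicle & (draw < FUEL_SHARE_OF_OWNERS)


rng = np.random.default_rng(0)
flags = has_fuel_consumption(np.array([0, 1, 2, 0, 3]), rng)
```

Passing the generator in explicitly keeps the WAS-side and FRS-side draws reproducible.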

This properly bridges vehicle ownership (~78% of households per NTS)
to fuel consumption (~70% after EV adjustment), fixing the LCFS diary
undercount issue (only 58% recorded any fuel purchase).
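Steps 2 and 3 move the WAS-derived flag onto LCFS households with a QRF. As a deliberately simpler stand-in (not the PR's method), the sketch below estimates the share of WAS households with fuel consumption in each demographic cell. It then draws a Bernoulli flag for each LCFS household from its cell's share. The cell keys and the `fallback` rate for unseen cells are hypothetical.

```python
from collections import defaultdict

import numpy as np


def fit_cell_rates(cells, flags) -> dict:
    """Share of True flags within each demographic cell of the training survey."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for cell, flag in zip(cells, flags):
        totals[cell] += 1
        hits[cell] += bool(flag)
    return {cell: hits[cell] / totals[cell] for cell in totals}


def impute_flags(cells, rates: dict, rng: np.random.Generator,
                 fallback: float) -> np.ndarray:
    """Draw one Bernoulli flag per household from its cell's rate."""
    p = np.array([rates.get(cell, fallback) for cell in cells], dtype=float)
    return rng.random(len(p)) < p
```

A QRF conditions on many predictors jointly instead of on discrete cells. The transfer logic is the same: learn the flag where it is observed, then sample it where it is not.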

Sources cited in code:
- NTS 2024 vehicle ownership: 22% none, 44% one, 34% two+
- NTS 2024 fuel type: 59% petrol, 30% diesel, 4% BEV, 6% hybrid, 2% PHEV
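The ownership and fuel-consumption shares quoted above follow from these NTS figures. The 90% factor is applied per household, as in step 1:

```math
\begin{aligned}
P(\text{owns a vehicle}) &= 1 - 0.22 = 0.44 + 0.34 = 0.78 \\
P(\text{buys fuel}) &\approx 0.78 \times 0.90 = 0.702 \approx 70\%
\end{aligned}
```

This compares with the 58% of LCFS households that recorded any fuel purchase in the diary.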

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@MaxGhenis MaxGhenis merged commit 6023f8c into main Dec 3, 2025
4 checks passed
@MaxGhenis MaxGhenis deleted the improve-fuel-imputation branch December 3, 2025 23:01