-
Notifications
You must be signed in to change notification settings - Fork 2
Zero out fuel spending for non-fuel households #247
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Households with has_fuel_consumption=0 (non-vehicle owners or EV owners) now have petrol_spending and diesel_spending set to zero after imputation. This prevents the QRF from assigning fuel spending based on other predictors to households that shouldn't have any. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
|
@PolicyEngine can you PTAL at this and see if you can get it to a point we can merge |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR correctly addresses the fuel duty over-estimation issue by zeroing out fuel spending for non-fuel households after QRF imputation. The logic is sound and follows the existing pattern in the codebase.
What's good:
- The fix is minimal and targeted - just 6 lines of code
- The logic correctly uses the already-computed
has_fuel_consumptionvariable - Documentation is updated to explain the new step
- The approach makes sense: QRF is a continuous model, so post-hoc zeroing for categorical exclusions is reasonable
Minor consideration:
- The code assumes
has_fuel_consumptionis still in scope at line 347. Looking at the context, it's computed around line 336-340 before imputation, so this should work fine.
The fix should significantly reduce the fuel duty over-estimation from £49.5B towards the £24.4B target, since ~30% of households (non-vehicle owners + EV owners) will no longer incorrectly contribute to fuel spending.
| for column in output_df.columns: | ||
| dataset.household[column] = output_df[column].values | ||
|
|
||
| # Zero out fuel spending for households without fuel consumption |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quick verification: has_fuel_consumption is computed earlier in this function (around line 336-340) and is a numpy array that persists in scope. The == 0 comparison is correct for boolean-like integers. This looks good.
| dataset.household[column] = output_df[column].values | ||
|
|
||
| # Zero out fuel spending for households without fuel consumption | ||
| # This ensures only ICE vehicle owners contribute to fuel duty |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This direct assignment to a subset of dataset.household[...] works because the household dict stores numpy arrays. The boolean mask indexing correctly updates only the non-fuel households in place.
| 3. **Apply to LCFS**: Impute `has_fuel_consumption` to LCFS households before training consumption model | ||
|
|
||
| 4. **At FRS imputation time**: Compute `has_fuel_consumption` directly from `num_vehicles` (already calibrated to NTS targets) | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good addition - the documentation clearly explains what happens and why.
Summary
has_fuel_consumption=0(non-vehicle owners or EV owners) now havepetrol_spendinganddiesel_spendingset to zero after imputationContext
After PR #244 added
has_fuel_consumptionas a predictor, fuel duties were over-estimated by ~100%. This happened because:Test plan
🤖 Generated with Claude Code