Zero out fuel spending for non-fuel households#247
Conversation
Households with has_fuel_consumption=0 (non-vehicle owners or EV owners) now have petrol_spending and diesel_spending set to zero after imputation. This prevents the QRF from assigning fuel spending based on other predictors to households that shouldn't have any. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
|
@PolicyEngine can you PTAL at this and see if you can get it to a point we can merge |
There was a problem hiding this comment.
This PR correctly addresses the fuel duty over-estimation issue by zeroing out fuel spending for non-fuel households after QRF imputation. The logic is sound and follows the existing pattern in the codebase.
What's good:
- The fix is minimal and targeted - just 6 lines of code
- The logic correctly uses the already-computed
has_fuel_consumptionvariable - Documentation is updated to explain the new step
- The approach makes sense: QRF is a continuous model, so post-hoc zeroing for categorical exclusions is reasonable
Minor consideration:
- The code assumes
has_fuel_consumptionis still in scope at line 347. Looking at the context, it's computed around line 336-340 before imputation, so this should work fine.
The fix should significantly reduce the fuel duty over-estimation from £49.5B towards the £24.4B target, since ~30% of households (non-vehicle owners + EV owners) will no longer incorrectly contribute to fuel spending.
| for column in output_df.columns: | ||
| dataset.household[column] = output_df[column].values | ||
|
|
||
| # Zero out fuel spending for households without fuel consumption |
There was a problem hiding this comment.
Quick verification: has_fuel_consumption is computed earlier in this function (around line 336-340) and is a numpy array that persists in scope. The == 0 comparison is correct for boolean-like integers. This looks good.
| dataset.household[column] = output_df[column].values | ||
|
|
||
| # Zero out fuel spending for households without fuel consumption | ||
| # This ensures only ICE vehicle owners contribute to fuel duty |
There was a problem hiding this comment.
This direct assignment to a subset of dataset.household[...] works because the household dict stores numpy arrays. The boolean mask indexing correctly updates only the non-fuel households in place.
| @@ -106,6 +106,8 @@ LCFS 2-week diaries undercount fuel purchasers (58%) compared to actual vehicle | |||
|
|
|||
| 4. **At FRS imputation time**: Compute `has_fuel_consumption` directly from `num_vehicles` (already calibrated to NTS targets) | |||
|
|
|||
There was a problem hiding this comment.
Good addition - the documentation clearly explains what happens and why.
Summary
has_fuel_consumption=0(non-vehicle owners or EV owners) now havepetrol_spendinganddiesel_spendingset to zero after imputationContext
After PR #244 added
has_fuel_consumptionas a predictor, fuel duties were over-estimated by ~100%. This happened because:Test plan
🤖 Generated with Claude Code