Use num_vehicles as predictor for fuel spending imputation #244
Merged
- Add num_vehicles to consumption model predictors
- Impute num_vehicles to LCFS training data using WAS wealth model
- Swap imputation order: wealth before consumption (num_vehicles dependency)

This improves fuel spending predictions by using vehicle ownership, which has ~0.13 correlation with fuel spending in LCFS.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Train a separate QRF model for vehicle imputation using only predictors available in both WAS and LCFS. This avoids biasing predictions with hardcoded values for council_tax, num_bedrooms, is_renting, etc.

Improves correlation with fuel spending from 0.13 to 0.17.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
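As a rough illustration of restricting a model to predictors available in both surveys, the sketch below picks the overlap of two column lists. The function name `shared_predictors` and the column names other than council_tax, num_bedrooms and is_renting are hypothetical and are not taken from the repository.

```python
def shared_predictors(source_columns, target_columns, exclude=()):
    """Return the predictors present in both datasets, in source order."""
    target = set(target_columns)
    excluded = set(exclude)
    return [c for c in source_columns if c in target and c not in excluded]


# Hypothetical column lists, for illustration only.
was_columns = [
    "household_net_income",
    "num_adults",
    "num_children",
    "region",
    "council_tax",
    "num_bedrooms",
    "is_renting",
    "num_vehicles",  # the variable being imputed, so not a predictor
]
lcfs_columns = ["household_net_income", "num_adults", "num_children", "region"]

predictors = shared_predictors(was_columns, lcfs_columns, exclude=["num_vehicles"])
```

A model trained on `predictors` alone can be applied to LCFS without inventing values for the columns LCFS lacks.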
Documents all imputation models with:
- Source datasets (WAS, LCFS, SPI, ETB, etc.)
- Predictor variables for each model
- Output variables
- Pipeline order and dependencies
- Calibration targets

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Instead of imputing num_vehicles to LCFS (which lacks vehicle data), we now:
1. Create has_fuel_consumption in WAS from vehicle ownership:
   - has_fuel = (num_vehicles > 0) AND (random < 0.90)
   - 90% accounts for EVs/PHEVs per NTS 2024 fuel type data
2. Train QRF to predict has_fuel_consumption from demographics
3. Apply to LCFS for consumption model training
4. At FRS time, compute has_fuel_consumption from num_vehicles

This properly bridges vehicle ownership (~78% of households per NTS) to fuel consumption (~70% after EV adjustment), fixing the LCFS diary undercount issue (only 58% recorded any fuel purchase).

Sources cited in code:
- NTS 2024 vehicle ownership: 22% none, 44% one, 34% two+
- NTS 2024 fuel type: 59% petrol, 30% diesel, 4% BEV, 6% hybrid, 2% PHEV

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
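The rule in step 1 and the ~70% figure can be checked with a small simulation. This is a minimal sketch using only the standard library, not the code from this PR: the function names are hypothetical, and the value 2 stands in for the whole "two or more vehicles" group.

```python
import random

# Share of vehicle-owning households assumed to buy fuel (from the commit message).
FUEL_SHARE_AMONG_OWNERS = 0.90


def has_fuel_consumption(num_vehicles, rng):
    """Flag a household as a fuel consumer: owns a vehicle AND passes the 90% draw."""
    return num_vehicles > 0 and rng.random() < FUEL_SHARE_AMONG_OWNERS


def simulate_fuel_share(n_households, seed=0):
    """Simulated share of households flagged as fuel consumers."""
    rng = random.Random(seed)
    # NTS 2024 ownership shares quoted above: 22% none, 44% one, 34% two+.
    vehicle_counts = rng.choices([0, 1, 2], weights=[22, 44, 34], k=n_households)
    flags = [has_fuel_consumption(v, rng) for v in vehicle_counts]
    return sum(flags) / n_households


share = simulate_fuel_share(200_000)
```

With 78% of households owning at least one vehicle, the expected share is 0.78 × 0.90 ≈ 0.70, which is what the simulated `share` should approach.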
Summary
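- Add num_vehicles as a predictor for fuel spending imputation in the consumption model
- Impute num_vehicles to LCFS training data using the WAS-trained wealth model
- Swap the imputation order so that wealth runs before consumption (consumption now depends on num_vehicles)

This builds on #243, which added vehicle ownership calibration.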
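The ordering constraint can be expressed as a small dependency graph. The sketch below uses the standard-library `graphlib` module (Python 3.9+); the step names and the `imputation_order` helper are hypothetical and only illustrate why wealth has to run first.

```python
from graphlib import TopologicalSorter


def imputation_order(dependencies):
    """Return the steps in an order that respects their dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


# Each step maps to the set of steps that must run before it.
# "wealth" produces num_vehicles; "consumption" uses it as a predictor.
dependencies = {
    "wealth": set(),
    "consumption": {"wealth"},
}

order = imputation_order(dependencies)
```

Declaring the dependency explicitly means a later reordering that puts consumption first would be caught by the sorter rather than silently running without num_vehicles.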
Why this matters
Vehicle ownership is a strong predictor of fuel spending:
The correlation between imputed num_vehicles and fuel spending in LCFS is ~0.13, which should improve fuel duty incidence estimates.

Technical details
Since LCFS doesn't collect vehicle counts directly, we impute them using the same WAS model that's used for the FRS. For predictors that the model expects but LCFS does not provide (capital_income, num_bedrooms, council_tax, is_renting), we use fixed default values.
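A minimal sketch of filling in missing predictors before calling a model trained on another survey. The helper name and every default value below are placeholders chosen for illustration; they are not the values used in this PR.

```python
def fill_missing_predictors(record, required, defaults):
    """Return a copy of record with every required predictor present."""
    filled = dict(record)
    for name in required:
        if name not in filled:
            if name not in defaults:
                raise KeyError(f"no default for missing predictor {name!r}")
            filled[name] = defaults[name]
    return filled


# Placeholder defaults (hypothetical values, not taken from the repository).
DEFAULTS = {
    "capital_income": 0.0,
    "num_bedrooms": 2,
    "council_tax": 0.0,
    "is_renting": False,
}

# Hypothetical LCFS household with one predictor already available.
lcfs_household = {"household_net_income": 30_000.0}
required = ["household_net_income", *DEFAULTS]

model_input = fill_missing_predictors(lcfs_household, required, DEFAULTS)
```

Because every household receives the same defaults, those predictors carry no information for LCFS rows, so the quality of the imputed num_vehicles rests on the predictors the two surveys share.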
🤖 Generated with Claude Code