Impute three variables related to new OBBBA deductions#397
Conversation
|
@donboyd5, Thanks for your helpful review of the changes in PR #397. Please download the new version of PR #397 to your computer and execute: and let me know exactly what happens. Thanks again for your help on this. |
|
Great. It passes all not-skipped tests other than auto loan interest mean and standard deviation. It does not miss them by a lot, but still, the miss is surprisingly large (0.35% and 0.65%). Here's the key result, followed by full output: E ValueError: tests/test_imputed_variables.py:190: ValueError |
|
@martinholmer FWIW, the stats on s006 in the newly generated cached_allvars.csv match the stats on yesterday's file, so the reproducibility within a single machine (rather than across machines) seems to be pretty good for weight generation - and weight generation is probably the one area that has any meaningful randomness. |
|
@donboyd5, Again, thanks for all the helpful feedback on how the unit tests work on your computer. The newest commit to this PR loosens the tolerances for the As before, download the newest version of PR #397 to your computer, and then execute: and let me know if you can pass all the tests. |
|
Everything passes now - see below. |
|
Using Full output below: |
|
@donboyd5, Thanks for the testing on your computer as reported in this comment. |
|
Please take a look at this:
https://drive.google.com/drive/folders/1gtUF1Q66Kmu8Fa1aiEdcB9Eggbh9irKs?usp=drive_link
(It should work only for you.)
…On Tue, Feb 10, 2026 at 9:20 AM Martin Holmer wrote:
martinholmer left a comment (PSLmodels/tax-microdata-benchmarking#397 (comment)):
@donboyd5, Thanks for the testing on your computer as reported in this comment.
Could you somehow make the three tmd* files in the storage/output folder available to me?
|
|
@donboyd5, Thanks for the three private |
|
@donboyd5, comparing the So, now that the test tolerances are adjusted so that we can both pass all the unit tests, you might have a look at the new code in this PR to see if you can identify any logic problems. |
|
@martinholmer, thanks for doing all this work. I had an extensive conversation with Claude to try to understand potential issues or improvements. I give what I consider the highlights below. In a subsequent comment I will copy the version of the entire Claude conversation that Claude put into a .md file (there was a lot more back and forth than Claude put into the file). I did not verify everything Claude said but did ask some follow-up questions to make sure we understood each other. In the full conversation I am not sure Claude did the best possible comparisons that I asked for, and we could improve them with more effort, I am sure. My advice is to consider the 3 items I discuss below, make any changes you deem warranted, and merge. There are three areas where I think improvements might be worth doing before merging:
I did not do a serious search for any data sources that might be better for current purposes than the SIPP and CEX. I am pretty familiar with most major U.S. socio-economic survey datasets and I think these two are probably best. You'll find my full Q&A below. I think there are some useful insights but I would not recommend holding up the merge to explore any of them in detail. |
|
@martinholmer, here is the full .md file that Claude created from our conversation (it leaves out some of the questioning back and forth where we worked to make sure we were talking about the same things). There is a link within to some Python code it wrote that's in my associated branch in my fork, although I'm not sure we need it for anything:
PR #397 (impute-obbba-vars) — Q&A Notes
Background
PR #397 on the upstream PSLmodels/tax-microdata-benchmarking repo adds imputation of three OBBBA (One Big Beautiful Bill Act) variables:
Q1. Is my high-level understanding correct?
Question: Here is my high-level summary of what the PR does. Is it correct?
Answer: The high-level understanding is mostly correct, with three clarifications:
Correction 1: Auto loan interest is imputed as dollar amounts, not fractions
The fraction-based approach applies only to overtime and tips (from SIPP). For those, the code computes For auto loan interest (from CEX), the code imputes the dollar amount directly — there is no conversion to a fraction of income. The
Correction 2: The tree creates flexible leaf-node groups, not predefined cells
The phrase "for each marital status, age, and wage or income group" suggests predefined cross-tabulated cells. In reality, the ExtraTreeRegressor decides how to partition the predictor space. It may split on age at 45 in one branch and at 62 in another; it may split on wages at $30K for married people but $50K for unmarried people. The resulting leaf nodes are data-adaptive groups, not a fixed grid of marital-status x age-bracket x income-bracket cells. The conceptual idea is correct — similar people get matched together — but the mechanism is more flexible than fixed cells.
On deflation with fractions
The intuition is right that fractions themselves don't need deflation. However, deflation still matters because wages ( For auto loan interest, deflation is essential since the imputed quantity is a dollar amount.
On
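To make the leaf-node matching idea in Correction 2 concrete, here is a minimal, self-contained sketch on synthetic data. It is not the PR's actual tmd/utils/mice.py code; all variable names, data, and parameter values (e.g., min_samples_leaf=100) are illustrative assumptions.

```python
# Illustrative sketch only -- synthetic data, not the PR's MICE implementation.
import numpy as np
import pandas as pd
from sklearn.tree import ExtraTreeRegressor

rng = np.random.default_rng(0)
predictors = ["age", "is_married", "e00200"]

# "donors" stand in for SIPP persons with an observed overtime fraction;
# "recipients" stand in for TMD persons whose fraction must be imputed.
donors = pd.DataFrame({
    "age": rng.integers(18, 80, 5000),
    "is_married": rng.integers(0, 2, 5000),
    "e00200": rng.gamma(2.0, 25000.0, 5000),
})
donors["otm_frac"] = np.where(rng.random(5000) < 0.06, rng.beta(1.0, 10.0, 5000), 0.0)
recipients = pd.DataFrame({
    "age": rng.integers(18, 80, 2000),
    "is_married": rng.integers(0, 2, 2000),
    "e00200": rng.gamma(2.0, 25000.0, 2000),
})

# The tree partitions the predictor space into data-adaptive leaves,
# not a fixed marital-status x age-bracket x wage-bracket grid.
tree = ExtraTreeRegressor(min_samples_leaf=100, random_state=0)
tree.fit(donors[predictors], donors["otm_frac"])

donor_leaf = tree.apply(donors[predictors])
recipient_leaf = tree.apply(recipients[predictors])

# Impute each recipient by drawing a random donor value from its leaf.
donor_values = donors["otm_frac"].to_numpy()
imputed = np.array([
    rng.choice(donor_values[donor_leaf == leaf]) for leaf in recipient_leaf
])
```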
|
| Group | What it tests |
|---|---|
| Constructor validation (~20 tests) | Every parameter is type-checked and range-checked: wrong types, out-of-range values, mismatched lengths, duplicate indices, etc. Pure input-validation testing. |
| Basic imputation (~8 tests) | Does imputation actually fill in NaNs? Does it leave non-missing values untouched? Does it work with ignored variables? Does it validate input shape and missing patterns? |
| Monotone vs. non-monotone mode (~8 tests) | Does monotone mode enforce the monotone missing pattern? Does it skip initialization? Do the post-imputation adjustments (shift, scale, zero_below_abs, convert_zero_prob) actually change values in the expected direction? |
| Reproducibility (~5 tests) | Same seed = identical results? Different seed = different results? Statistics reproducible? Seed increments across iterations? |
None of these tests say anything about whether the imputed tax data is reasonable. They only verify that the MICE algorithm is mechanically correct.
tests/test_weights.py (34 lines, 1 test)
Checks that the TMD weight variable (s006) has the expected mean and standard deviation. This is a regression/sanity test to ensure the TMD file hasn't been corrupted — not specific to imputation quality.
Category 2: Quality / plausibility tests
tests/test_imputed_variables.py (194 lines, 2 tests)
These are the substantive tests that examine whether the imputed values produce reasonable policy results:
Test 1: test_obbba_deduction_tax_benefits — The key quality test. It runs Tax-Calculator to simulate each new OBBBA deduction in 2026 and checks three statistics per deduction:
| Statistic | Meaning |
|---|---|
| totben | Total tax benefit in billions of dollars |
| affpct | Percent of tax units affected |
| affben | Average dollar benefit among those affected |
These are compared against expected values that were calibrated to match Tax Policy Center (TPC) estimates. The comments in the code are explicit about this — e.g., for overtime, the calibration parameters (convert_zero_prob=0.073, scale=2.6) were chosen so that affpct is close to TPC's 8.8% and affben is close to TPC's $1,400. For the combined "ALL" scenario, the test checks against TPC's published 29.6% affected and $1,081 average benefit.
This is the test that validates the end-to-end plausibility of the imputation: do the imputed values, when run through the tax system, produce results consistent with independent expert estimates?
Test 2: test_imputed_variable_distribution — A simpler distributional check. It verifies that the unweighted mean and standard deviation of each imputed variable match expected values within tight tolerances. This is more of a regression test (did the imputation produce the same distribution as last time?) than a plausibility test, but it does guard against the distribution shifting unexpectedly.
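For readers unfamiliar with this style of test, here is a generic sketch of the kind of relative-tolerance check described above. The expected values and tolerances are placeholders (loosely based on the ALI figures quoted later in this document), not the actual test constants.

```python
# Generic tolerance-check sketch; expected values and rtol are placeholders.
import numpy as np

def check_stat(actual, expected, rtol, label):
    if not np.isclose(actual, expected, rtol=rtol):
        raise ValueError(f"{label}: actual {actual} vs expected {expected} (rtol={rtol})")

# e.g., total benefit (billions), percent of units affected, average benefit
check_stat(1.70, 1.72, rtol=0.02, label="totben")
check_stat(10.30, 10.29, rtol=0.01, label="affpct")
check_stat(86.0, 87.0, rtol=0.02, label="affben")
```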
Summary
| Test file | Category | What it validates |
|---|---|---|
| test_mice.py | Technical | MICE algorithm works correctly as software |
| test_weights.py | Technical | TMD file integrity (weights unchanged) |
| test_imputed_variables.py test 1 | Quality | Imputed values produce tax policy results matching TPC estimates |
| test_imputed_variables.py test 2 | Quality/regression | Distributional statistics of imputed values are stable |
Q3. Where do the expected values in test_imputed_variable_distribution come from?
Question: When you say "Checks unweighted mean and std dev of each imputed variable against expected values" — where do the expected values come from? SIPP and CEX?
Answer: Neither. The expected values are hardcoded snapshots of the imputation's own output. Someone ran the imputation code, observed the resulting mean and standard deviation of each variable in the output TMD file, and recorded those numbers as the expected values in the test.
They don't come from SIPP, CEX, or any external benchmark. This is purely a regression test — "did the code produce the same output as last time?" If someone changes the imputation logic, the MICE seed, the post-imputation adjustments, or the input data, this test will catch that the distribution shifted.
It does not tell you whether those means and standard deviations are correct in any external sense. That job belongs to test 1 (test_obbba_deduction_tax_benefits), which compares against TPC estimates.
Q4. Where exactly are the calibration knobs, and what do they do?
Question: Where in the code are the calibration parameters set, and what is their practical effect?
Answer: They are set directly in tmd/create_taxcalc_imputed_variables.py when constructing MICE:
- SIPP-based overtime/tip imputation uses convert_zero_prob=[0.073, 0.012] and scale=[2.6, 1.1], in create_sipp_imputed_tmd(...) at tmd/create_taxcalc_imputed_variables.py:562.
- CEX-based auto-loan-interest imputation uses convert_zero_prob=[0.000] and scale=[1.0], in create_cex_imputed_tmd(...) at tmd/create_taxcalc_imputed_variables.py:605.
The adjustment order inside MICE.impute(...) is:
- Draw donor values from matching tree leaves
- Apply shift/floor/ceiling/zeroing rules
- Optionally convert some zeros to nonzero values
- Multiply by scale
That order is implemented in tmd/utils/mice.py:473.
For overtime/tips, after imputation and adjustments, fractions are clipped to [0, 0.5] in tmd/create_taxcalc_imputed_variables.py:572, then converted to dollars by multiplying by e00200 and summing to tax-unit level in tmd/create_taxcalc_imputed_variables.py:526.
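A minimal sketch of that sequence on synthetic data follows (shift/floor/ceiling rules omitted). The 0.073 and 2.6 values come from the text above, but the function and variable names are illustrative, not the actual mice.py implementation.

```python
# Illustrative sketch of the post-imputation adjustment order described above.
import numpy as np

rng = np.random.default_rng(0)

def adjust_imputed(values, convert_zero_prob, scale):
    v = np.asarray(values, dtype=float).copy()
    # (shift/floor/ceiling/zeroing rules would be applied here)
    zeros = np.flatnonzero(v == 0.0)
    nonzero_pool = v[v != 0.0]
    # optionally convert some zeros to nonzero values drawn from the nonzero pool
    flip = zeros[rng.random(zeros.size) < convert_zero_prob]
    if flip.size and nonzero_pool.size:
        v[flip] = rng.choice(nonzero_pool, size=flip.size)
    return v * scale  # scale is applied last

# synthetic MICE output: mostly zero overtime fractions, some positive
raw_otm_frac = np.where(rng.random(1000) < 0.06, rng.beta(1.0, 10.0, 1000), 0.0)
e00200 = rng.gamma(2.0, 25000.0, 1000)  # person-level wages

otm_frac = adjust_imputed(raw_otm_frac, convert_zero_prob=0.073, scale=2.6)
otm_frac = np.clip(otm_frac, 0.0, 0.5)       # clip fractions to [0, 0.5]
overtime_income = otm_frac * e00200          # convert fractions to dollars
```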
Q5. What does the current test suite not verify?
Question: If tests pass, what important things could still be wrong or fragile?
Answer: Several things could still be off even with all tests green:
- No subgroup plausibility checks. The policy-result test checks national totals (totben, affpct, affben) only. It does not verify distributions by income decile, filing status, age, etc. (tests/test_imputed_variables.py:20).
- No external benchmark for auto loan interest. Comments explicitly say ALI calibration is not tuned to external targets because TPC did not provide comparable statistics (tests/test_imputed_variables.py:76).
- Distribution regression is unweighted. test_imputed_variable_distribution uses simple unweighted mean/std (varray.mean(), varray.std()), not weighted moments using s006 (tests/test_imputed_variables.py:177); a weighted-moment sketch appears after this list.
- Snapshot tests can lock in accidental behavior. Because expected values are hardcoded outputs, a subtle modeling issue can persist if it was present when the snapshot was recorded.
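A minimal sketch of the s006-weighted alternative mentioned in the third bullet; the DataFrame and column names are illustrative, not the repository's code.

```python
# Sketch of weighted moments as an alternative to unweighted mean()/std().
import numpy as np

def weighted_mean_std(values, weights):
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    wmean = np.average(values, weights=weights)
    wstd = np.sqrt(np.average((values - wmean) ** 2, weights=weights))
    return wmean, wstd

# hypothetical usage with a TMD DataFrame:
# wmean, wstd = weighted_mean_std(tmd_df["overtime_income"], tmd_df["s006"])
```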
Q6. Is there any implementation detail worth flagging for future refinement?
Question: Any code-level choices that are valid but maybe worth revisiting?
Answer: Three notable ones:
- Zero-to-nonzero replacement is global. As discussed earlier, replacement draws from all nonzero predicted values for that variable, not from leaf-local values (tmd/utils/mice.py:489). That can weaken local predictor consistency.
- Overtime/tip clipping at 50% is a hard cap. Clipping at 0.5 prevents extreme fractions (tmd/create_taxcalc_imputed_variables.py:572), which is practical, but it is still a modeling assumption rather than an empirically estimated bound.
- CEX income is processed only via clip(lower=0.0) plus rounding, with no explicit upper-tail treatment. This is not necessarily wrong, but if the fit is sensitive in the upper income tail, explicit diagnostics could be useful (tmd/create_taxcalc_imputed_variables.py:358 and tmd/create_taxcalc_imputed_variables.py:415).
Q7. Desk audit: Do the external data sources match the tax-law concepts?
Question: For each of the three variables, what exactly did OBBBA change in tax law, what does Tax-Calculator need as input, and what do SIPP and CEX actually measure? Are there conceptual mismatches?
Answer:
7A. Overtime income
What the tax law requires
The OBBBA overtime deduction (effective 2025–2028) is for qualified overtime compensation, defined as the overtime premium only — the amount exceeding the regular rate of pay. For time-and-a-half, that's the "half" portion. The IRS instructs workers whose employer reports total overtime to divide by 3 to isolate the premium. (IRS guidance)
Additional rules:
- Only overtime required under FLSA Section 7 (hours over 40/week) qualifies
- Cap: $12,500 ($25,000 joint)
- Phase-out: begins at MAGI $150,000 ($300,000 joint)
- Self-employed individuals are not eligible
What Tax-Calculator expects
The input variable overtime_income is described as "Filing unit's OBBBA-qualified overtime income." The MiscDed function in calcfunctions.py treats it directly as the deductible amount (subject to cap and phase-out). It does not divide by 3 internally — it expects the premium amount as input. (Tax-Calculator records_variables.json)
What SIPP measures
SIPP variable TJB{n}_OXAMT is the monthly overtime amount for each job. The SIPP data dictionary PDF in the repo is not text-searchable, and the Census Bureau's online documentation does not provide a precise definition distinguishing total overtime pay from premium. However, based on the SIPP questionnaire structure (which asks about earnings components), OXAMT most likely reports total overtime pay, not the premium portion. The SIPP was designed well before OBBBA and has no reason to separate the premium from the regular-rate portion of overtime.
Additionally, TJB{n}_MSUM appears to be the monthly earnings summary. The code computes e00200 = wage + otm + tip (line 287 of create_taxcalc_imputed_variables.py), treating these as additive components.
Conceptual mismatch: SIGNIFICANT
The imputation code appears to impute total overtime pay, but Tax-Calculator needs the overtime premium. If overtime is paid at time-and-a-half (the FLSA minimum), the premium is only 1/3 of total overtime pay. The code does not perform the divide-by-3 adjustment. This means the imputed overtime_income values may be roughly 3x too large before the scale and cap adjustments are applied.
However, the calibration parameters (scale=2.6, convert_zero_prob=0.073) were tuned to produce results matching TPC estimates. So the end result may be reasonable in aggregate, but the path from SIPP data to the imputed variable involves a conceptual mismatch that is compensated for by the calibration knobs rather than by a principled conversion.
Universe mismatch: SIPP covers all persons age 15+, including non-filers. The imputation maps SIPP individual-level data to TMD tax-unit-level data, so non-filers in SIPP serve as donors for filers in TMD. This is not necessarily wrong — SIPP donors with zero/low wages would naturally match to TMD records with similar characteristics — but the populations differ.
7B. Tip income
What the tax law requires
The deduction covers qualified tips — voluntary cash or charged tips received from customers, or through tip sharing, in occupations customarily receiving tips (IRS to publish a list). (IRS overview)
- Cap: $25,000/year
- Phase-out: begins at MAGI $150,000 ($300,000 joint)
- Self-employed in Specified Service Trades or Businesses are ineligible, as are their employees
What Tax-Calculator expects
tip_income is "Filing unit's OBBBA-qualified tip income." Like overtime, it's used directly as the deductible amount (subject to cap and phase-out).
What SIPP measures
TJB{n}_TXAMT is the monthly tip amount for each job. This likely captures total tips reported by the respondent, which is conceptually close to the qualified amount (tips are tips — there is no "premium" vs. "base" distinction).
Conceptual mismatch: MINOR
The tip concept is a much closer match than overtime. SIPP tip amounts should approximate qualified tip income reasonably well. Minor issues:
- SIPP may capture tips in occupations not on the IRS "customarily receiving tips" list (though such cases are likely rare)
- SIPP self-reports may understate actual tips
- The restriction excluding Specified Service Trade employees may not be captured
- Universe mismatch same as overtime (SIPP includes non-filers)
7C. Auto loan interest
What the tax law requires
The deduction covers interest on loans for qualified passenger vehicles — defined as cars, minivans, vans, SUVs, pickups, or motorcycles under 14,000 lbs GVWR. (IRS overview)
Critical restrictions:
- Vehicle must be new (original use starts with taxpayer — used vehicles do not qualify)
- Vehicle must have undergone final assembly in the United States
- Loan must be originated after December 31, 2024
- For personal use only (not business)
- Secured by a lien on the vehicle
- Leases do not qualify
- Cap: $10,000/year
- Phase-out: begins at MAGI $100,000 ($200,000 joint)
What Tax-Calculator expects
auto_loan_interest is "Filing unit's interest payments on OBBBA-qualified auto loans." Again, used directly as the deductible amount.
What CEX measures
VEHFINCQ is described in the CEX data dictionary as "Vehicle finance charges this quarter," defined as SUM(510110, 510901, 510902, 850300) — a sum of four UCC (Universal Classification Code) expenditure categories. Based on general BLS CEX documentation:
- "Finance charges" in the consumer lending context means the total cost of borrowing, which can include interest, origination fees, processing fees, and other lender charges — not just interest. (Experian: What is a finance charge on a car loan?)
- The CEX vehicle section covers all owned vehicles — new and used, domestic and foreign
- It does not distinguish vehicles by country of assembly
- It likely covers all vehicle types owned by the consumer unit
Conceptual mismatch: SUBSTANTIAL
Several conceptual gaps:
| Tax law requirement | CEX measures | Match? |
|---|---|---|
| Interest only | Total finance charges (interest + fees) | Wider than needed |
| New vehicles only | All vehicles (new and used) | Much wider |
| Domestically assembled only | All vehicles regardless of origin | Much wider |
| Loans originated after 12/31/2024 | All current loan payments | Much wider |
| Personal use only | All consumer-unit vehicles | Likely close (CEX is consumer survey) |
| Loans only (no leases) | Unclear if lease finance charges are included | Uncertain |
The CEX measure is systematically broader than the tax-law concept in every dimension. It captures finance charges (not just interest) on all vehicles (not just new, not just domestic). This means the raw CEX data overstates the qualifying amount.
However, the calibration for auto loan interest uses scale=1.0 and convert_zero_prob=0.0 — i.e., no post-imputation adjustment at all. The code comments note that TPC did not provide comparable statistics for this deduction. The resulting imputed values are therefore the raw MICE output without any external calibration.
Universe mismatch: CEX "consumer units" approximate households/families, not tax-filing units. A consumer unit can contain multiple tax-filing units, or non-filers. The CEX FINCBTAX income concept (income before tax) differs from AGI.
Summary of conceptual alignment
| Variable | Tax-law concept | External data concept | Severity of mismatch | Compensated by calibration? |
|---|---|---|---|---|
| overtime_income | Overtime premium (e.g., the "half" in time-and-a-half) | Likely total overtime pay | Significant — ~3x overstatement | Yes — scale=2.6 and convert_zero_prob=0.073 tuned to match TPC |
| tip_income | Qualified tips | Total reported tips | Minor — concepts are close | Lightly — scale=1.1, convert_zero_prob=0.012 |
| auto_loan_interest | Interest on new, domestically assembled vehicle loans originated after 2024 | Finance charges on all vehicle loans | Substantial — much wider scope | No — no calibration applied |
Q8. Would it be cleaner to apply the 1/3 rule explicitly before calibrating overtime?
Question: For overtime, wouldn't it be conceptually cleaner to (1) impute total overtime pay from SIPP, (2) apply the IRS 1/3 rule to get the premium, (3) then calibrate? It probably wouldn't change any final numbers (just re-tune the calibration knobs), but it would be more principled and future-proof.
Answer: Yes, this is a sound idea. Here's the analysis:
Why it's cleaner
Currently the pipeline is:
SIPP total overtime → fraction of comp → MICE impute → calibrate (scale=2.6) → dollars
The imputed quantity has no clean real-world interpretation at the intermediate stage — it's a fraction of total overtime pay that has been scaled by 2.6 to match a policy target that's based on premium pay. The scale=2.6 conflates two things: (a) correcting for the total-vs-premium mismatch, and (b) calibrating the level to match TPC.
The proposed pipeline would be:
SIPP total overtime → fraction of comp → MICE impute → apply 1/3 → calibrate → dollars
Now the intermediate quantities have clear interpretations at each stage:
- After MICE: imputed total-overtime fraction (directly comparable to SIPP donor data)
- After 1/3: imputed overtime-premium fraction (aligned with the tax-law concept)
- After calibration: calibrated overtime-premium fraction (tuned to match TPC)
Would it change final numbers?
No, not if you re-tune the calibration parameters. Inserting a * (1/3) step before the scale would require roughly scale ≈ 2.6 * 3 ≈ 7.8 to produce the same final values. (Not exactly 7.8 because the 50% cap interacts with the order of operations, but close.)
Why it's future-proof
If external targets for total overtime pay ever become available (e.g., from BLS or IRS W-2 data), you could validate the MICE-imputed total-overtime fraction directly against those targets before the 1/3 conversion. Right now, that's not possible because the calibration bakes the total-to-premium conversion into the scale factor.
One nuance: the 1/3 rule is a simplification
The IRS "divide by 3" assumes time-and-a-half across the board. Some workers earn double-time (premium = 1/2 of total overtime) or other rates. The true premium-to-total ratio varies by worker. Using a flat 1/3 is still more principled than no conversion, and it matches IRS guidance, but it's worth noting as an approximation.
Where the change would go
In prep_sipp_for_imputation() at create_taxcalc_imputed_variables.py:290-291, after computing otm_frac, add a line like:
otm_frac /= 3.0  # convert total overtime fraction to premium fraction per IRS guidance
Or equivalently, apply it after MICE imputation but before calibration. Then re-tune scale (and possibly convert_zero_prob) to re-match TPC targets.
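A hedged sketch of that alternative placement on synthetic values; the ~7.8 re-tuned scale is the back-of-the-envelope figure from the discussion above, and the variable names are illustrative.

```python
# Sketch of applying the 1/3 premium conversion after MICE, before calibration.
import numpy as np

rng = np.random.default_rng(0)
imputed_total_otm_frac = rng.beta(1.0, 10.0, 1000)   # stand-in for MICE output

PREMIUM_SHARE = 1.0 / 3.0   # IRS divide-by-3 guidance (assumes time-and-a-half)
premium_frac = imputed_total_otm_frac * PREMIUM_SHARE

# The current scale=2.6 folds the total-vs-premium correction into calibration;
# after an explicit 1/3 step, re-matching TPC targets would need roughly
# scale ~ 2.6 * 3 = 7.8 (not exact, because the 0.5 cap interacts with ordering).
calibrated_frac = np.clip(premium_frac * 7.8, 0.0, 0.5)
```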
Q9. Auto loan interest mismatch — adjustments and data sources
Question: The auto loan interest mismatch seems especially important because there are no external targets. Can we confirm that? And what adjustments to the CEX data might narrow the mismatch — specifically for new vehicles only, interest rates, proportion assembled in the US, and excluding leases? What data sources could inform each adjustment?
Answer:
Confirming: no external targets for auto loan interest
Confirmed. The code comments at test_imputed_variables.py:75-80 explicitly state:
"The ALI imputation calibration parameters used in the create_taxcalc_imputed_variables.py module do not adjust the MICE-imputed values of auto_loan_interest because the Tax Policy Center did not provide any statistics for this new deduction."
The calibration uses scale=1.0 and convert_zero_prob=0.0 — no adjustment at all. PR #397 discussion on GitHub contains no substantive conversation about ALI targets or validation; the only ALI-related discussion concerned loosening test tolerances for cross-machine reproducibility. No issues or PRs in either the tax-microdata-benchmarking or Tax-Calculator repos discuss external benchmarks for auto loan interest.
The test_obbba_deduction_tax_benefits test does produce ALI results (totben=1.72B, affpct=10.29%, affben=$87) but these are self-referential snapshots, not externally validated.
Proposed adjustments, ranked by impact
Here are the mismatches, roughly ordered from largest expected impact to smallest:
1. New vehicles only (LARGE impact — likely ~75% reduction)
The problem: CEX VEHFINCQ covers finance charges on all vehicles — new and used. Used vehicles account for roughly 70-75% of vehicle sales (~36M used vs. ~16M new in 2023). Used vehicles are more likely to be fully paid off, but those with outstanding loans contribute to VEHFINCQ.
Possible adjustment approach: Apply a scalar reduction factor based on the share of total vehicle finance charges attributable to new vehicles. This is not simply the sales share because new cars have higher loan balances and higher financing rates.
Data sources:
- BTS New and Used Passenger Car Sales and Leases — sales volume split
- Experian State of Auto Finance reports — average loan amounts for new ($40K+) vs. used ($25K) vehicles, financing penetration rates (most new vehicles are financed; only ~35% of used)
- CFPB Auto Loans data — outstanding balances by loan type
- CEX detail files (OVB section) may distinguish new vs. used vehicle purchases within the interview data — worth investigating whether the individual vehicle records in the CEX can be filtered
A rough estimate: New vehicles represent ~30% of vehicle sales but a higher share of outstanding loan balances (higher amounts, higher financing rate). A reasonable first-cut scalar might be 0.35–0.45 of total VEHFINCQ. This is the single biggest adjustment.
2. US final assembly only (MODERATE impact — likely ~50% further reduction)
The problem: Only vehicles with final assembly in the United States qualify.
Data: Approximately 50% of new vehicles sold in the US were assembled domestically in 2024. Of the ~400 light-duty models sold in the US for 2025, 133 are made solely in the US, 248 are imported, and 19 split.
Data sources:
- Kogod Made in America Auto Index — detailed domestic content rankings
- Cars.com American-Made Index — models ranked by US assembly
- FRED Domestic Auto Production (DAUPSA) — domestic production volume
- NHTSA VIN decoder data — could theoretically determine assembly location by model
A rough estimate: ~50% of new vehicles sold are domestically assembled. So the qualifying share of new-vehicle finance charges would be roughly halved again.
3. Exclude leases (MODERATE impact — ~20-25% reduction of new-vehicle charges)
The problem: Leases do not qualify; only purchased vehicles with loans do.
Data: Leasing accounted for ~21% of new vehicle transactions in 2023 and ~26% in 2024.
Data sources:
- Statista: Share of new vehicles on lease (may require subscription)
- Experian State of Auto Finance quarterly reports — lease vs. purchase share
- Cox Automotive market insights
Note: It's unclear whether CEX VEHFINCQ includes lease finance charges. If it does, removing ~20-25% of the new-vehicle portion accounts for leases. If CEX already excludes leases (which is possible — the interview may ask specifically about loan payments), this adjustment is unnecessary. The CEX questionnaire would need to be checked.
4. Finance charges vs. interest only (SMALL impact — likely <5%)
The problem: "Finance charges" include origination fees, documentation fees, etc. in addition to interest. The tax deduction is for interest only.
In practice: For most auto loans, interest dominates the finance charge. Fees are typically a small fraction of total finance charges — an origination fee of $200-500 on a $35K loan where total interest is $5K-8K is only 3-6% of total charges.
Data sources:
- CFPB auto loan key terms
- Experian/LendingTree average auto loan data
Verdict: This is a small adjustment and probably not worth worrying about given the much larger issues above.
5. Business vs. personal use, secured by lien (MINIMAL impact)
Agreed — likely minimal. The CEX is a consumer/household survey, so business vehicles are largely excluded. Virtually all auto loans are secured by a lien on the vehicle (that's how auto lending works). These don't need adjustment.
6. The $10K cap and the loan-origination-date requirement
The $10K cap is applied by Tax-Calculator during the simulation (via the AutoLoanInterestDed_c parameter), not in the imputation. So the imputation doesn't need to worry about it — Tax-Calculator handles it.
The loan-origination-date requirement (after 12/31/2024) is conceptually the hardest to handle. In steady state, most qualifying loans would be relatively recent. But for a 2026 simulation, only loans originated in 2025 qualify. This significantly limits the eligible population compared to CEX data that covers all outstanding loans. Over time this constraint relaxes (by 2028, loans from 3+ years qualify). This is hard to adjust for with CEX data alone because the CEX doesn't distinguish loan origination dates.
Combined rough estimate
If we compose the adjustment factors multiplicatively:
| Step | Factor | Cumulative |
|---|---|---|
| Start: CEX VEHFINCQ | 1.00 | 1.00 |
| New vehicles only | × 0.40 | 0.40 |
| US-assembled only | × 0.50 | 0.20 |
| Exclude leases | × 0.80 | 0.16 |
| Interest vs. finance charges | × 0.95 | ~0.15 |
This very rough calculation suggests that the qualifying amount is perhaps ~15% of the raw CEX vehicle finance charges. The current code uses 100% (scale=1.0). If this estimate is even directionally correct, the current imputation substantially overstates qualifying auto loan interest.
The current test shows totben=$1.72B and affpct=10.29%. If the true qualifying amount is ~15% of what's imputed, the actual tax benefit would be much smaller and fewer tax units would be affected. (Though the $10K cap and phase-outs applied by Tax-Calculator would compress the impact somewhat, since many imputed values may already exceed the cap.)
Recommended approach
A principled adjustment would apply a composite scalar to CEX VEHFINCQ before imputation (similar to the proposed 1/3 rule for overtime). Then calibrate if/when external targets become available. The scalar could be documented with its components, making each assumption transparent and individually adjustable as better data emerges.
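A small sketch of that composite scalar using the rough factors from the table above; every factor is an assumption to be revisited as better data emerge, and the column name is illustrative.

```python
# Rough composite adjustment scalar for CEX vehicle finance charges (assumptions).
ADJUSTMENT_FACTORS = {
    "new_vehicles_only": 0.40,
    "us_assembled_only": 0.50,
    "exclude_leases": 0.80,
    "interest_vs_finance_charges": 0.95,
}

composite = 1.0
for factor in ADJUSTMENT_FACTORS.values():
    composite *= factor
print(f"composite scalar ~= {composite:.3f}")  # ~0.152

# hypothetical application before imputation:
# cex_df["VEHFINCQ"] *= composite
```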
Q10. Data comparison: How do SIPP and CEX source data compare to TMD imputed results?
Question: For each of the three imputed variables, compare the source data (SIPP for overtime/tips, CEX for auto loan interest) to the TMD output. Compute % nonzero, distributional statistics for nonzero values (min, max, median, weighted mean, weighted std dev), broken down by "cells" (marital status × age group). Also compute correlations between the imputed variables and their predictors, and between overtime and tip income. Weight records appropriately and make populations as comparable as possible.
Answer: Analysis code is in analysis_compare_sources.py. The script replicates the data-reading logic from create_taxcalc_imputed_variables.py to ensure comparable populations: SIPP individuals with positive weight (aggregated to annual, self-employment excluded, wages deflated to TMD year 2021); CEX units with positive weight (quarterly weights ÷ 4, VEHFINCQ annualized × 4, deflated to TMD year); TMD tax units as output. Fractions are computed as overtime_income/e00200 and tip_income/e00200 for comparability between SIPP (which imputes fractions) and TMD (which stores dollar amounts).
Key caveat: SIPP statistics are at the individual level (persons with positive weight), while TMD statistics are at the tax-unit level. For married couples, the TMD pipeline splits each tax unit into head and spouse for MICE imputation, then re-aggregates. The TMD fractions shown here are tax-unit-level ratios (overtime_income / e00200), which blend the head and spouse fractions. This makes the TMD fractions not perfectly comparable to SIPP individual-level fractions, but the comparison is still informative.
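Before the findings, here is a minimal sketch of the kind of cell-level comparison used in this section (weighted percent nonzero and weighted mean among nonzero values by marital status and age group). The DataFrame and column names are illustrative, not the actual analysis_compare_sources.py code.

```python
# Sketch of weighted cell statistics (illustrative names, not the actual script).
import numpy as np
import pandas as pd

def cell_stats(df, value_col, weight_col):
    d = df.copy()
    d["age_group"] = pd.cut(d["age"], [0, 25, 35, 45, 55, 65, np.inf], right=False)
    rows = []
    for (married, age_grp), g in d.groupby(["is_married", "age_group"], observed=True):
        w = g[weight_col].to_numpy(dtype=float)
        v = g[value_col].to_numpy(dtype=float)
        nz = v != 0.0
        rows.append({
            "is_married": married,
            "age_group": age_grp,
            "pct_nonzero": 100.0 * w[nz].sum() / w.sum() if w.sum() > 0 else np.nan,
            "wtd_mean_nonzero": np.average(v[nz], weights=w[nz]) if nz.any() else np.nan,
        })
    return pd.DataFrame(rows)

# hypothetical usage:
# sipp_cells = cell_stats(sipp_df, value_col="otm_frac", weight_col="person_weight")
# tmd_cells = cell_stats(tmd_df, value_col="otm_frac", weight_col="s006")
```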
Key Findings
- Overtime fraction: SIPP has 6.2% nonzero (weighted mean 0.081); TMD has 16.4% nonzero (weighted mean 0.149). TMD overtime is ~2.6× more prevalent and ~1.8× larger in magnitude, reflecting convert_zero_prob=0.073 and scale=2.6. Dollar aggregates: SIPP $63B vs TMD $238B (3.8× amplification).
- Tip fraction: SIPP has 2.7% nonzero (weighted mean 0.166); TMD has 5.4% nonzero (weighted mean 0.171). Tips are moderately amplified, matching the light calibration (convert_zero_prob=0.012, scale=1.1). Dollar aggregates: SIPP $33B vs TMD $62B (~1.9×).
- Auto loan interest: CEX has 21.1% nonzero (weighted mean $557); TMD has 19.3% nonzero (weighted mean $509). Distributions are remarkably close — consistent with no post-imputation adjustment (scale=1.0, convert_zero_prob=0.0). Dollar aggregates: CEX $15.8B vs TMD $18.0B (difference due to larger TMD weighted population).
- Correlations: Signs are perfectly preserved between source and TMD for all variable pairs. Magnitudes are systematically attenuated (weaker in TMD), especially for ALI. The tip_frac–age correlation is remarkably well preserved (−0.080 in both SIPP and TMD).
10A. Overtime fraction: SIPP vs TMD
Population: records with positive wages (e00200 > 0).
| Source | Weighted pop (M) | % nonzero (wtd) | Wtd mean (nonzero) | Wtd std (nonzero) | Wtd median (nonzero) |
|---|---|---|---|---|---|
| SIPP (individuals) | 162.1 | 6.2% | 0.0809 | 0.0876 | 0.0515 |
| TMD (tax units) | 128.2 | 16.4% | 0.1486 | 0.1504 | 0.0928 |
Key observations:
- Prevalence is ~2.6× higher in TMD (16.4% vs 6.2%). This is the direct effect of convert_zero_prob=0.073, which converts 7.3% of zeros to nonzero, plus the married-couple expansion (married TMD units get two individual draws, so either head or spouse having nonzero overtime makes the tax unit nonzero).
- Mean fraction among nonzero is ~1.8× higher in TMD (0.149 vs 0.081). This reflects scale=2.6 combined with the 0.5 cap (which compresses the upper tail — without the cap, the ratio would be closer to 2.6×).
- Dollar aggregates diverge sharply: SIPP $63B vs TMD $238B (3.8× ratio). The combination of more nonzero records and higher fractions, applied to TMD's wage base, produces this amplification. This is by design — the calibration targets TPC estimates of total overtime deductions.
- Cell-level patterns: In SIPP, overtime prevalence is fairly uniform across cells (4–9%). In TMD, married units show much higher prevalence (17–43%) than unmarried (12–15%), likely because the two-individual expansion doubles the chance of getting a nonzero imputation.
10B. Tip fraction: SIPP vs TMD
| Source | Weighted pop (M) | % nonzero (wtd) | Wtd mean (nonzero) | Wtd std (nonzero) | Wtd median (nonzero) |
|---|---|---|---|---|---|
| SIPP (individuals) | 162.1 | 2.7% | 0.1660 | 0.1546 | 0.1082 |
| TMD (tax units) | 128.2 | 5.4% | 0.1711 | 0.1602 | 0.1064 |
Key observations:
- Prevalence roughly doubled (5.4% vs 2.7%). The light calibration (convert_zero_prob=0.012, scale=1.1) plus the married-couple expansion accounts for this.
- Mean fractions are nearly identical (0.171 vs 0.166). The scale=1.1 barely changes the distribution shape. This is a much closer match than overtime.
- Dollar aggregates: SIPP $33B vs TMD $62B (1.9× ratio). The amplification comes primarily from the prevalence doubling, not from fraction inflation.
- Age gradient preserved: Both SIPP and TMD show younger workers (<25) having the highest tip prevalence (7–10%), consistent with food-service employment patterns.
10C. Auto loan interest: CEX vs TMD
| Source | Weighted pop (M) | % nonzero (wtd) | Wtd mean (nonzero) | Wtd std (nonzero) | Wtd median (nonzero) |
|---|---|---|---|---|---|
| CEX (consumer units) | 134.6 | 21.1% | $557 | $611 | $355 |
| TMD (tax units) | 184.0 | 19.3% | $509 | $586 | $322 |
Key observations:
- Distributions are remarkably similar — consistent with scale=1.0 and convert_zero_prob=0.0 (no post-imputation adjustment). The MICE algorithm is transferring the CEX distribution to TMD with minimal distortion.
- Weighted totals: CEX $15.8B vs TMD $18.0B. The difference arises because TMD has a larger weighted population (184M vs 135M tax units), not from distributional changes.
- Cell-level patterns preserved: Married units have higher prevalence (~25–31% in both) than unmarried (~9–22%), and middle-aged units (45-54) have the highest mean amounts. These patterns transfer cleanly from CEX to TMD.
- This confirms the Q9 concern: The raw CEX data flows through essentially unchanged, meaning all the conceptual mismatches identified in Q9 (new vehicles only, US-assembled, interest vs. finance charges) remain unaddressed in the TMD output.
10D. Correlations
| Pair | SIPP | TMD | Comment |
|---|---|---|---|
| overtime_frac, tip_frac | −0.015 | −0.025 | Both show weak negative — having overtime slightly reduces tip likelihood. Preserved. |
| overtime_frac, age | −0.004 | +0.003 | Near zero in both. No age gradient for overtime. |
| overtime_frac, e00200 | +0.033 | +0.006 | Weak positive in SIPP, nearly zero in TMD. Some attenuation. |
| tip_frac, age | −0.080 | −0.080 | Identical. Young workers get more tips. Well preserved. |
| tip_frac, e00200 | −0.026 | −0.011 | Both negative (tipped workers earn less). Some attenuation in TMD. |
| Pair | CEX | TMD | Comment |
|---|---|---|---|
| ALI, age | −0.087 | −0.028 | Both negative (younger = more car loans). Attenuated in TMD. |
| ALI, income | +0.107 | +0.025 | Both positive (higher income = more car spending). Substantially attenuated in TMD. |
Correlation observations:
- Sign preservation is perfect — every correlation has the same sign in source and TMD. The MICE algorithm preserves the directional relationships.
- Magnitude is systematically attenuated — TMD correlations are weaker than source correlations, especially for ALI. This is expected: MICE with ExtraTreeRegressor produces noisier imputed values than the true donor relationships, and the random sampling within tree leaves adds variance.
- tip_frac vs age is remarkably well preserved (−0.080 in both SIPP and TMD), suggesting age is a strong predictor that the tree captures well.
- ALI correlations are most attenuated — the CEX→TMD transfer weakens the ALI-income correlation from +0.11 to +0.02. This may reflect the different income definitions (CEX FINCBTAX vs TMD constructed income) and the population mismatch between consumer units and tax units.
10E. Population comparison notes
- SIPP weighted population (162M individuals) is larger than TMD wage-earning tax units (128M) because SIPP counts all wage-earning individuals including those who are spouses within a married tax unit. Each married TMD tax unit may correspond to two SIPP individuals.
- CEX weighted population (135M consumer units) is smaller than TMD total tax units (184M) because consumer units approximate households, which can contain multiple tax-filing units (e.g., adult children filing separately).
- These population differences mean weighted dollar aggregates are not directly comparable. The distributional statistics (fractions, percentages, correlations) are more meaningful comparisons.
|
@donboyd5 said in an above comment:
|
|
@martinholmer, thanks for opening issues #398 and #399. With the two new commits addressing item 1 above, the only test failures are those below, which are extremely close to passing with current tolerances. |
|
@martinholmer, please see issue #400. I suggest that I address that right after you merge this PR. There is a good chance that implementing the proposed steps in #400 will reduce or eliminate the need to worry about cross-machine differences, and I think reweighting should also then run much more quickly on CPU-only machines. |
|
All tests now pass after commit: Loosen totben relative tolerance in test_obbba_deduction_tax_benefits. The tail end of output is below: |
|
Done, @martinholmer. I will start working on #400 after I finish certain must-do things tomorrow. I don't think it should take too long. It would be nice to get to the point where cross-machine differences are so small as to be unnoticeable (and to have faster optimization); this seems possible. Would definitely appreciate your review once I have a PR. |

Fixes issue #396 by imputing overtime_income and tip_income from SIPP data and imputing auto_loan_interest from CEX data.
Read the new tests/test_imputed_variables.py and tmd/create_taxcalc_imputed_variables.py code for details on how the MICE-imputed variable values were adjusted given Tax Policy Center results.
As mentioned in issue #396, this pull request has benefitted substantially from the
work on this topic by @maxkossek. Max identified SIPP data as being useful for
imputing overtime and tip income and CEX data as being useful for imputing auto
loan interest. In addition, this pull request is built on code that he developed to read
these two complex data sets. And it benefits from his experience with different
imputation methods. Many thanks to Max for all of his very helpful work.
Be sure to read the revised top-level README.md file about how to install the SIPP and CEX public-use data files.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NOTE on the new tests/test_mice.py module:
Claude Code (running Sonnet 4.5) was asked this:
And Claude responded like this:
Then I said this to Claude:
Then Claude Code generated the test_mice.py file included in this PR.