Optimize select() statements by removing redundant conditions #7242
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## main #7242 +/- ##
===========================================
- Coverage 100.00% 98.97% -1.03%
===========================================
Files 12 16 +4
Lines 205 294 +89
Branches 0 3 +3
===========================================
+ Hits 205 291 +86
- Misses 0 3 +3
Optimization suggestion: Remove redundant conditions
When using np.select(), the most efficient pattern is to have the default handle the most common case(s), eliminating explicit condition checks.

This commit:
- Removes explicit conditions that return the same value as the default
- Adds clarifying comments documenting what cases the default covers
- Reduces condition evaluations from N to N-k, where k conditions matched the default

Key optimizations:
- taxsim_mstat: SINGLE and HOH both return 1, now handled by default
- age_group: WORKING_AGE (most common) now handled by default
- 50+ state tax files: SINGLE filing status now handled by default

Performance benefit: Each removed condition eliminates one array comparison per element during vectorized calculations.

56 files changed with optimizations.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
399c93a to ad9d8cb
- Format ny_supplemental_tax.py with black
- Add changelog_entry.yaml

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
/rebase
Summary
Optimizes all np.select() statements in the codebase by removing conditions that return the same value as the default. This is the numpy-efficient pattern where the default handles the most common case(s).
Key insight: When using np.select(), each condition requires an array comparison. By setting default= to the most common value and removing explicit conditions for that value, we reduce evaluations from N to N-k, where k is the number of conditions removed.
Example optimization (taxsim_mstat.py)
Before:
After:
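As a rough illustration of the pattern only, and not the actual contents of taxsim_mstat.py: the sketch below assumes integer filing-status codes and assumes return codes of 2 and 6 for JOINT and SEPARATE. The only detail taken from this PR is that SINGLE and HOH both map to 1. The function names `mstat_before` and `mstat_after` are hypothetical.

```python
import numpy as np

# Hypothetical filing-status codes (assumption; the real model has its own enum).
SINGLE, JOINT, SEPARATE, HOH = 0, 1, 2, 3

filing_status = np.array([SINGLE, JOINT, HOH, SEPARATE, SINGLE])


def mstat_before(fs):
    # Four conditions, two of which return the same value as the default.
    return np.select(
        [fs == SINGLE, fs == HOH, fs == JOINT, fs == SEPARATE],
        [1, 1, 2, 6],  # 2 and 6 are assumed codes for illustration
        default=1,
    )


def mstat_after(fs):
    # Default (1) covers SINGLE and HOH, so only two comparisons remain.
    return np.select([fs == JOINT, fs == SEPARATE], [2, 6], default=1)


before = mstat_before(filing_status)
after = mstat_after(filing_status)
```

Because np.select() uses the first matching condition for each element, dropping a condition preserves the result only when the elements it matched do not also match one of the remaining conditions. That holds here because the filing statuses are mutually exclusive.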
Changes
taxsim_mstat (SINGLE + HOH → 1), age_group (WORKING_AGE as default)
Performance benefit
Each removed condition eliminates one boolean array comparison per element. For microsimulations with millions of tax units, this reduces memory allocations and CPU cycles.
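One way to check both the equivalence and the cost on a given machine is a small timing sketch like the following. It is not part of this PR: the age thresholds (18 and 65), the integer labels, and the function names are assumptions for illustration, and the measured times will vary by hardware and numpy version.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
age = rng.integers(0, 100, size=200_000)

# Hypothetical labels and thresholds (assumptions, not the model's values).
CHILD, WORKING_AGE, SENIOR = 0, 1, 2


def age_group_explicit(age):
    # Every group has its own condition, including the most common one.
    return np.select(
        [age < 18, (age >= 18) & (age < 65), age >= 65],
        [CHILD, WORKING_AGE, SENIOR],
        default=WORKING_AGE,
    )


def age_group_default(age):
    # Default (WORKING_AGE) covers ages 18 to 64.
    return np.select([age < 18, age >= 65], [CHILD, SENIOR], default=WORKING_AGE)


# The two versions should agree element by element.
same = np.array_equal(age_group_explicit(age), age_group_default(age))

# Time each version; compare the numbers rather than assuming a speedup.
t_explicit = timeit.timeit(lambda: age_group_explicit(age), number=5)
t_default = timeit.timeit(lambda: age_group_default(age), number=5)
print(f"identical={same} explicit={t_explicit:.4f}s default={t_default:.4f}s")
```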
Test plan
🤖 Generated with Claude Code