I ran `RUSTFLAGS=-Ccodegen-units=1 cargo bench` to see what changed versus just `cargo bench`. There were a few that stuck out in particular:

- [ ] wrapping ops/widening_mul
  - `I512xI512`: -25%
  - `I1024xI1024`: -4%
- [ ] widening ops/concatenating_mul
  - `I128xI128`: -18%
  - `I512xI512`: -31%
- [ ] wrapping ops/sub
  - `I512-I512`: -35%
  - `I1024-I1024`: -25%
- [ ] bounded random/random_mod
  - `U1024`: -11%
  - `U1024 tiny high limb`: -9%
- [ ] wrapping ops/widening_mul
  - `U256xU256`: -12%
- [ ] wrapping ops/mul_mod_special
  - `U256`: -10%
- [ ] extended greatest common divisor/xgcd
  - 1: -3%
  - 2: -3%
  - 3: -2%
  - 5: -23%
  - 6: -16%
  - 7: -18%
  - 8: -5%
  - 16: -22%
  - 32: -16%
  - 128: -8%
  - 256: -5%
- [ ] left shift/shl_vartime
  - small, `U2048`: -40%
  - large, `U2048`: -41%
- [ ] left shift/shl_vartime_wide
  - large, `U2048`: -22%
- [ ] right shift/shr
  - `U2048`: -8%
- [ ] right shift/shr_vartime_wide
  - large, `U2048`: -24%
- [ ] modular ops/invert_mod2k_vartime
  - `U256`: -18%

cc @andrewwhitehead @erik-3milabs
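
For anyone who wants to reproduce this without setting `RUSTFLAGS` on every run, a rough equivalent is to set the same option in the bench profile of `Cargo.toml`. This is only a sketch and not something I have verified gives identical numbers: `RUSTFLAGS` and profile settings are applied through different paths, so small differences are possible.

```toml
# Assumption: added to the crate's (or workspace root's) Cargo.toml.
# The bench profile inherits from release, where codegen-units defaults to 16.
[profile.bench]
codegen-units = 1
```

With that in place, a plain `cargo bench` should build with a single codegen unit, so it would need to be removed or commented out again to get the default baseline.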