Return a BandedMatrix from a view of a BandedBlockBandedMatrix#223
Return a BandedMatrix from a view of a BandedBlockBandedMatrix#223dlfivefifty wants to merge 3 commits intomasterfrom
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #223 +/- ##
==========================================
- Coverage 88.16% 86.06% -2.10%
==========================================
Files 11 11
Lines 1115 1134 +19
==========================================
- Hits 983 976 -7
- Misses 132 158 +26 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
@MikaelSlevinsky With this (and the just tagged BlockArrays v1.6.2) we have the following: julia> using AppleAccelerate # needed so banded matrices aren't insanely slow
julia> for n in 20:20:200
ax = BlockArrays.BlockedOneTo(ArrayLayouts.RangeCumsum(Base.OneTo(n)))
D = BandedBlockBandedMatrix{Float64}(I, (ax,ax), (0, 0), (0, 0))
x = BlockVector(randn(sum(1:n)), (ax,))
y = zero(x)
@time my_special_mul_through_data!(y, D, x);
@time my_special_mul!(y, D, x);
@time my_special_mul_with_a_view!(y, D, x);
@time mul!(y, D, x);
end
0.000003 seconds
0.000011 seconds (80 allocations: 5.938 KiB)
0.000003 seconds
0.000012 seconds (2 allocations: 64 bytes)
0.000002 seconds
0.000008 seconds (160 allocations: 18.219 KiB)
0.000002 seconds
0.000026 seconds (2 allocations: 64 bytes)
0.000002 seconds
0.000012 seconds (240 allocations: 37.188 KiB)
0.000003 seconds
0.000007 seconds (2 allocations: 64 bytes)
0.000003 seconds
0.000017 seconds (320 allocations: 62.438 KiB)
0.000006 seconds
0.000011 seconds (2 allocations: 64 bytes)
0.000004 seconds
0.000029 seconds (400 allocations: 94.500 KiB)
0.000009 seconds
0.000011 seconds (2 allocations: 64 bytes)
0.000005 seconds
0.000034 seconds (480 allocations: 133.469 KiB)
0.000013 seconds
0.000015 seconds (2 allocations: 64 bytes)
0.000007 seconds
0.000039 seconds (560 allocations: 178.156 KiB)
0.000013 seconds
0.000020 seconds (2 allocations: 64 bytes)
0.000009 seconds
0.000049 seconds (640 allocations: 229.531 KiB)
0.000022 seconds
0.000027 seconds (2 allocations: 64 bytes)
0.000012 seconds
0.000052 seconds (720 allocations: 287.469 KiB)
0.000059 seconds
0.000034 seconds (2 allocations: 64 bytes)
0.000017 seconds
0.000072 seconds (800 allocations: 351.938 KiB)
0.000029 seconds
0.000045 seconds (2 allocations: 64 bytes)So we have got rid of almost all allocations. The remaining speed difference is due to julia> n = 10_000; D = BandedMatrix(0 => 1:n); x = randn(n); @time D*x;
0.000042 seconds (3 allocations: 78.188 KiB)
julia> D = Diagonal(1:n); x = randn(n); @time D*x;
0.000010 seconds (3 allocations: 78.188 KiB)We could special case diagonal blocks not to call BLAS... actually I suspect the fastest would be to get rid of all calls to banded BLAS and just do naive for loops.... |
|
Awesome, thanks! |
|
The reason ApproxFun is broken is that is assuming that @jishnub thoughts? |
@MikaelSlevinsky This gets rid of the allocations (for lazy axes at least).
Still need to figure out why it's slow.