See JuliaLang/julia#61452 for history. This reduces to the following:
# Minimal reproducer: SubArray of SparseMatrixCSC loses specialized triangular solve
#
# `ldiv!(UpperTriangular(R), x)` is fast when R is a SparseMatrixCSC,
# but falls back to a generic O(n^2) method when R is a SubArray view.
#
# Introduced in JuliaSparse/SparseArrays.jl#676 (commit 95b6ac4).
# The new ldiv! uses a @view of F.R instead of indexing:
# New (slow): https://github.com/JuliaSparse/SparseArrays.jl/blob/95b6ac4b0fa4d99e17920505dd36d7b95d91a0ab/src/solvers/spqr.jl#L513
# Old (fast): https://github.com/JuliaSparse/SparseArrays.jl/blob/4500d8656d26a9328f50a48eaf9eca2f1fabc8ef/src/solvers/spqr.jl#L440
using SparseArrays, LinearAlgebra
n = 9000
# Upper-bidiagonal test matrix: 2.0 on the diagonal, 1.0 on the first superdiagonal
R = spdiagm(0 => fill(2.0, n), 1 => fill(1.0, n - 1))
x = ones(n)
R_copy = R[Base.OneTo(n), Base.OneTo(n)] # SparseMatrixCSC
R_view = @view R[Base.OneTo(n), Base.OneTo(n)] # SubArray
# Warm up both code paths so compilation time is excluded from the timings
ldiv!(UpperTriangular(R_copy), copy(x))
ldiv!(UpperTriangular(R_view), copy(x))
t_copy = @elapsed for _ in 1:10; ldiv!(UpperTriangular(R_copy), copy(x)); end
t_view = @elapsed for _ in 1:10; ldiv!(UpperTriangular(R_view), copy(x)); end
println("SparseMatrixCSC: $(round(t_copy/10*1e6, digits=1)) μs")
println("SubArray view: $(round(t_view/10*1e6, digits=1)) μs")
println("Slowdown: $(round(t_view/t_copy, digits=0))x")
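
A possible workaround, sketched here under the assumption that copying the viewed block is acceptable, is to turn the view back into a `SparseMatrixCSC` before the triangular solve, in the spirit of the old indexing-based code path. The helper `materialize_view` is a hypothetical name used only for illustration and is not part of SparseArrays; it relies only on `parent` and `parentindices` from Base. A smaller `n` is used so the slow path finishes quickly.

```julia
using SparseArrays, LinearAlgebra

n = 1000
R = spdiagm(0 => fill(2.0, n), 1 => fill(1.0, n - 1))
x = ones(n)
R_view = @view R[Base.OneTo(n), Base.OneTo(n)]  # SubArray, as in the reproducer

# Hypothetical helper (not a SparseArrays API): index the parent matrix with
# the view's own indices, which allocates a copy of the viewed block.
materialize_view(V::SubArray) = parent(V)[parentindices(V)...]

R_mat = materialize_view(R_view)

# Solve once through the view and once through the materialized copy.
y_view = ldiv!(UpperTriangular(R_view), copy(x))
y_mat = ldiv!(UpperTriangular(R_mat), copy(x))

println("materialized type: ", typeof(R_mat))
println("solutions agree: ", y_view ≈ y_mat)
```

This trades one extra allocation for the specialized sparse solve, so it does not address the missing dispatch for `SubArray` itself.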