[perf] avoid @autoopt for partition function
#245
Conversation
lkdvos left a comment:
Looks good to me! Any chance you have some timing results too?
Considering the planar and braiding things: happy to explain, but this isn't super relevant here, because they require @planar anyway, and we don't really support that in PEPSKit.jl right now. In principle we could do this for the partition functions too, but until someone actually needs it I'm fine with just keeping the @tensor.
(Combining planarity and efficiency would actually be kind of a nightmare: for braided categories at least we could replace the permutations with braidings and their inverses, but for non-braided ones we cannot arbitrarily choose the intermediary permutations so we're more limited in what can be done)
I now specialize all renormalize edge contractions for partition functions. The motivation is that the partition function tensor is often actually a contracted double-layer quantum wavefunction. The new contraction scheme may not be optimal in the limit of very small D, very large χ and non-abelian symmetry, but I think this is a pretty uncommon case. In other cases, either one contraction will dominate, or if D is large, having D or χ as the first leg would be equivalent. I also fixed variable names in some other methods: although the contraction schemes were correct, the names employed were confusing. Todo: some timing
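To make the small-D versus large-D argument above concrete, here is a back-of-the-envelope FLOP model for the staged corner enlargement (E * C, then * E, then * A). This is my own rough estimate of the leading multiply-add counts, not anything computed by PEPSKit:

```julia
# Rough leading-order FLOP counts for the three stages of the
# enlarged-corner contraction (dense tensors assumed):
#   (E * C)   ~ 2χ^3 D
#   (EC * E)  ~ 2χ^3 D^2
#   (ECE * A) ~ 2χ^2 D^4
corner_flops(χ, D) = 2χ^3 * D + 2χ^3 * D^2 + 2χ^2 * D^4

χ = 100
# Small D (e.g. Ising, D = 2): the χ^3 D^2 edge-edge step dominates,
# so leg orderings around the χ legs matter most.
@assert 2χ^3 * 2^2 > 2χ^2 * 2^4
# Large D (e.g. D = 16): the D^4 contraction with A takes over.
@assert 2χ^2 * 16^4 > 2χ^3 * 16^2
```

In both regimes one stage clearly dominates, which is consistent with the claim that the choice of first leg only matters in the uncommon intermediate regime.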
I did the benchmarks in 3 cases.

benchmark code:

using TensorOperations: @tensor
using TensorKit
using TensorKit: ×
using PEPSKit
using PEPSKit: @autoopt, CTMRGCornerTensor, CTMRG_PF_EdgeTensor, EnlargedCorner, PFTensor, dtmap!!, eachcoordinate, leading_boundary, select_algorithm, simultaneous_projectors
using BenchmarkTools
# ==================== master =======================================================
function enlarge_northwest_corner_autoopt(
E_west::CTMRG_PF_EdgeTensor, C_northwest::CTMRGCornerTensor,
E_north::CTMRG_PF_EdgeTensor, A::PFTensor,
)
return @autoopt @tensor corner[χ_S D_S; χ_E D_E] :=
E_west[χ_S D1; χ1] * C_northwest[χ1; χ2] * E_north[χ2 D2; χ_E] * A[D1 D_S; D2 D_E]
end
function enlarge_northeast_corner_autoopt(
E_north::CTMRG_PF_EdgeTensor, C_northeast::CTMRGCornerTensor,
E_east::CTMRG_PF_EdgeTensor, A::PFTensor,
)
return @autoopt @tensor corner[χ_W D_W; χ_S D_S] :=
E_north[χ_W D1; χ1] * C_northeast[χ1; χ2] * E_east[χ2 D2; χ_S] * A[D_W D_S; D1 D2]
end
function enlarge_southeast_corner_autoopt(
E_east::CTMRG_PF_EdgeTensor, C_southeast::CTMRGCornerTensor,
E_south::CTMRG_PF_EdgeTensor, A::PFTensor,
)
return @autoopt @tensor corner[χ_N D_N; χ_W D_W] :=
E_east[χ_N D1; χ1] * C_southeast[χ1; χ2] * E_south[χ2 D2; χ_W] * A[D_W D2; D_N D1]
end
function enlarge_southwest_corner_autoopt(
E_south::CTMRG_PF_EdgeTensor, C_southwest::CTMRGCornerTensor,
E_west::CTMRG_PF_EdgeTensor, A::PFTensor,
)
return @autoopt @tensor corner[χ_E D_E; χ_N D_N] :=
E_south[χ_E D1; χ1] * C_southwest[χ1; χ2] * E_west[χ2 D2; χ_N] * A[D2 D1; D_N D_E]
end
function renormalize_north_edge_rotate(E_north, P_right, P_left, A)
A_west = PEPSKit._rotl90_localsandwich(A)
return renormalize_west_edge_autoopt(E_north, P_right, P_left, A_west)
end
function renormalize_east_edge_rotate(E_east, P_bottom, P_top, A)
A_west = PEPSKit._rot180_localsandwich(A)
return renormalize_west_edge_autoopt(E_east, P_bottom, P_top, A_west)
end
function renormalize_south_edge_rotate(E_south, P_left, P_right, A)
A_west = PEPSKit._rotr90_localsandwich(A)
return renormalize_west_edge_autoopt(E_south, P_left, P_right, A_west)
end
function renormalize_west_edge_autoopt(E_west::CTMRG_PF_EdgeTensor, P_top, P_bottom, A::PFTensor)
return @autoopt @tensor edge[χ_S D_E; χ_N] :=
E_west[χ1 D1; χ2] * A[D1 D5; D3 D_E] * P_top[χ2 D3; χ_N] * P_bottom[χ_S; χ1 D5]
end
# mixed
function renormalize_north_edge_rotate_explicit(E_north, P_right, P_left, A)
A_west = PEPSKit._rotl90_localsandwich(A)
return renormalize_west_edge_explicit(E_north, P_right, P_left, A_west)
end
function renormalize_east_edge_rotate_explicit(E_east, P_bottom, P_top, A)
A_west = PEPSKit._rot180_localsandwich(A)
return renormalize_west_edge_explicit(E_east, P_bottom, P_top, A_west)
end
function renormalize_south_edge_rotate_explicit(E_south, P_left, P_right, A)
A_west = PEPSKit._rotr90_localsandwich(A)
return renormalize_west_edge_explicit(E_south, P_left, P_right, A_west)
end
# ==================== explicit =======================================================
function enlarge_northwest_corner_explicit(
E_west::CTMRG_PF_EdgeTensor, C_northwest::CTMRGCornerTensor,
E_north::CTMRG_PF_EdgeTensor, A::PFTensor,
)
return @tensor begin
EC[χ_S DW; χ2] := E_west[χ_S DW; χ1] * C_northwest[χ1; χ2]
ECE[χ_S χ_E; DW DN] := EC[χ_S DW; χ2] * E_north[χ2 DN; χ_E]
corner[χ_S D_S; χ_E D_E] := ECE[χ_S χ_E; DW DN] * A[DW D_S; DN D_E]
end
end
function enlarge_northeast_corner_explicit(
E_north::CTMRG_PF_EdgeTensor, C_northeast::CTMRGCornerTensor,
E_east::CTMRG_PF_EdgeTensor, A::PFTensor,
)
return @tensor begin
EC[χ_W DN; χ2] := E_north[χ_W DN; χ1] * C_northeast[χ1; χ2]
ECE[χ_W χ_S; DN DE] := EC[χ_W DN; χ2] * E_east[χ2 DE; χ_S]
corner[χ_W D_W; χ_S D_S] := ECE[χ_W χ_S; DN DE] * A[D_W D_S; DN DE]
end
end
function enlarge_northeast_corner_explicit_NE(
E_north::CTMRG_PF_EdgeTensor, C_northeast::CTMRGCornerTensor,
E_east::CTMRG_PF_EdgeTensor, A::PFTensor,
)
return @tensor begin
EC[DN χ_W; χ2] := E_north[χ_W DN; χ1] * C_northeast[χ1; χ2]
ECE[DN DE; χ_S χ_W] := EC[DN χ_W; χ2] * E_east[χ2 DE; χ_S]
corner[χ_W D_W; χ_S D_S] := A[D_W D_S; DN DE] * ECE[DN DE; χ_S χ_W]
end
end
function enlarge_southeast_corner_explicit(
E_east::CTMRG_PF_EdgeTensor, C_southeast::CTMRGCornerTensor,
E_south::CTMRG_PF_EdgeTensor, A::PFTensor,
)
return @tensor begin
EC[χ_N D1; χ2] := E_east[χ_N D1; χ1] * C_southeast[χ1; χ2]
ECE[χ_N χ_W; D1 D2] := EC[χ_N D1; χ2] * E_south[χ2 D2; χ_W]
corner[χ_N D_N; χ_W D_W] := ECE[χ_N χ_W; D1 D2] * A[D_W D2; D_N D1]
end
end
function enlarge_southwest_corner_explicit(
E_south::CTMRG_PF_EdgeTensor, C_southwest::CTMRGCornerTensor,
E_west::CTMRG_PF_EdgeTensor, A::PFTensor,
)
return @tensor begin
EC[χ_E D1; χ2] := E_south[χ_E D1; χ1] * C_southwest[χ1; χ2]
ECE[χ_E χ_N; D2 D1] := EC[χ_E D1; χ2] * E_west[χ2 D2; χ_N]
corner[χ_E D_E; χ_N D_N] := ECE[χ_E χ_N; D2 D1] * A[D2 D1; D_N D_E]
end
end
function renormalize_north_edge_explicit(E_north::CTMRG_PF_EdgeTensor, P_right, P_left, A::PFTensor)
return @tensor begin
temp = permute(E_north, ((2, 1), (3,))) # impose D_N as 1st leg
PE[D_N D_E; χNW χ_E] := temp[D_N χNW; χNE] * P_right[χNE D_E; χ_E]
PEA[D_W χNW; D_S χ_E] := A[D_W D_S; D_N D_E] * PE[D_N D_E; χNW χ_E]
P_leftp = permute(P_left, ((1,), (3, 2)))
edge[χ_W D_S; χ_E] := P_leftp[χ_W; D_W χNW] * PEA[D_W χNW; D_S χ_E]
end
end
function renormalize_east_edge_explicit(E_east::CTMRG_PF_EdgeTensor, P_bottom, P_top, A::PFTensor)
return @tensor begin
temp = permute(P_top, ((3, 1), (2,))) # impose D_N as 1st leg
PE[D_N D_E; χ_N χSE] := temp[D_N χ_N; χNE] * E_east[χNE D_E; χSE]
PEA[D_W χ_N; χSE D_S] := A[D_W D_S; D_N D_E] * PE[D_N D_E; χ_N χSE]
edge[χ_N D_W; χ_S] := PEA[D_W χ_N; χSE D_S] * P_bottom[χSE D_S; χ_S]
end
end
function renormalize_south_edge_explicit(E_south::CTMRG_PF_EdgeTensor, P_left, P_right, A::PFTensor)
# specialize to avoid extra permute on A when calling renormalize_west_edge
return @tensor begin
P_leftp = permute(P_left, ((3, 2), (1,))) # impose χ_W as 1st leg
PE[χ_W χSE; D_W D_S] := P_leftp[χ_W D_W; χSW] * E_south[χSE D_S; χSW]
PEA[χ_W D_N; χSE D_E] := PE[χ_W χSE; D_W D_S] * A[D_W D_S; D_N D_E]
edge[χ_E D_N; χ_W] := PEA[χ_W D_N; χSE D_E] * P_right[χ_E; χSE D_E]
end
end
function renormalize_west_edge_explicit(E_west::CTMRG_PF_EdgeTensor, P_top, P_bottom, A::PFTensor)
return @tensor begin
PE[χ_S χNW; D_W D_S] := P_bottom[χ_S; χSW D_S] * E_west[χSW D_W; χNW]
PEA[χ_S D_E; χNW D_N] := PE[χ_S χNW; D_W D_S] * A[D_W D_S; D_N D_E]
edge[χ_S D_E; χ_N] := PEA[χ_S D_E; χNW D_N] * P_top[χNW D_N; χ_N]
end
end
# ============================================================================================
function get_projectors(env, Z)
alg = select_algorithm(leading_boundary, env; projector_alg=:fullinfinite)
network = InfiniteSquareNetwork(Z)
coordinates = eachcoordinate(network, 1:4)
T_corners = Base.promote_op(
TensorMap ∘ EnlargedCorner, typeof(network), typeof(env), eltype(coordinates)
)
enlarged_corners′ = similar(coordinates, T_corners)
enlarged_corners::typeof(enlarged_corners′) =
dtmap!!(enlarged_corners′, eachcoordinate(network, 1:4)) do idx
return TensorMap(EnlargedCorner(network, env, idx))
end # expand environment
projectors, info = simultaneous_projectors(enlarged_corners, env, alg.projector_alg) # compute projectors on all coordinates
return projectors
end

Ising partition function, Trivial sector,
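As a sanity check that the staged west-edge scheme reproduces the one-shot contraction, here is a small numerical test on plain arrays instead of TensorKit tensors. The index positions mirror the signatures in the benchmark code above, which is an assumption on my part:

```julia
using TensorOperations: @tensor

χ, D = 4, 3
E  = randn(χ, D, χ)      # E_west[χSW, D_W, χNW]
A  = randn(D, D, D, D)   # A[D_W, D_S, D_N, D_E]
Pt = randn(χ, D, χ)      # P_top[χNW, D_N, χ_N]
Pb = randn(χ, χ, D)      # P_bottom[χ_S, χSW, D_S]

# reference: one-shot contraction, order left to the macro
@tensor ref[χ_S, D_E, χ_N] :=
    E[χ1, D1, χ2] * A[D1, D5, D3, D_E] * Pt[χ2, D3, χ_N] * Pb[χ_S, χ1, D5]

# staged scheme mirroring renormalize_west_edge_explicit
@tensor PE[χ_S, χNW, D_W, D_S] := Pb[χ_S, χSW, D_S] * E[χSW, D_W, χNW]
@tensor PEA[χ_S, D_E, χNW, D_N] := PE[χ_S, χNW, D_W, D_S] * A[D_W, D_S, D_N, D_E]
@tensor edge[χ_S, D_E, χ_N] := PEA[χ_S, D_E, χNW, D_N] * Pt[χNW, D_N, χ_N]

@assert edge ≈ ref
```

The two results agree to floating-point precision, so the explicit staging only changes the cost, not the value.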
Thanks a lot for the detailed benchmark! I do have to admit that the results are somewhat surprising to me. Am I reading that wrong, or are there actual regressions from making this change too? I think it is indeed expected that the speedup isn't uniform over all the directions, but I would have guessed that we should be able to get an improvement overall, and seemingly right now that is only sometimes true. I realize also that the cases vary quite wildly. In particular, what surprised me is some of the choices of putting a
These results are a bit confusing. Here is an attempt to make things clearer; the smaller, the better.
Here are the edge renormalizations. Since for direction west, explicit is always better than
Hence the questions are
But I think these benchmarks show that this PR improves corners NW-SE-SW and renormalize west
More benchmarks: I considered tensors as found in https://arxiv.org/abs/2505.05889 with D=16 and χ=100. This looks more relevant than the Ising square lattice with D=2, which was too simple.

Kagome Ising D=16 code:

A_kagome = randn(ℂ^16 ⊗ ℂ^16 ← ℂ^16 ⊗ ℂ^16)
Z_kagome = InfinitePartitionFunction(A_kagome)
χ_kagome = ℂ^100
env0 = CTMRGEnv(Z_kagome, χ_kagome)
env_kagome, = leading_boundary(env0, Z_kagome; alg=:simultaneous, maxiter=20, projector_alg=:fullinfinite)
projectors_kagome = get_projectors(env_kagome, Z_kagome)
E_north_kagome, E_east_kagome, E_south_kagome, E_west_kagome = env_kagome.edges[:, 1, 1]
C_northwest_kagome, C_northeast_kagome, C_southeast_kagome, C_southwest_kagome = env_kagome.corners[:, 1, 1]

enlarge corner benchmark:

julia> @benchmark enlarge_northwest_corner_autoopt(E_west_kagome, C_northwest_kagome, E_north_kagome, A_kagome)
BenchmarkTools.Trial: 107 samples with 1 evaluation per sample.
Range (min … max): 42.886 ms … 53.565 ms ┊ GC (min … max): 0.00% … 15.96%
Time (median): 47.162 ms ┊ GC (median): 9.43%
Time (mean ± σ): 46.745 ms ± 2.163 ms ┊ GC (mean ± σ): 6.87% ± 4.48%
█ ▆▂
▅█▆▆▆▁▅▃▃▃▁▁▁▁▃▃▁▁▁▁▃▇▆█▆▆▇▅██▄▅▆▆▆▃▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃ ▃
42.9 ms Histogram: frequency by time 53.3 ms <
Memory estimate: 79.85 MiB, allocs estimate: 62.
julia> @benchmark enlarge_northwest_corner_explicit(E_west_kagome, C_northwest_kagome, E_north_kagome, A_kagome)
BenchmarkTools.Trial: 134 samples with 1 evaluation per sample.
Range (min … max): 33.823 ms … 42.002 ms ┊ GC (min … max): 0.00% … 11.31%
Time (median): 38.438 ms ┊ GC (median): 11.78%
Time (mean ± σ): 37.491 ms ± 1.946 ms ┊ GC (mean ± σ): 8.95% ± 5.19%
▁▄▂▂ ▆██▄
████▅▁▅▅▅▁▁▁▁▁▁▅▁▁▁▁▁▁▁▁▁▅▅▁▁▁▁▁▁▁▁▁▁▅▁▁▁▁▁█████▆▆█▅▅▁▁▁▅▁▅ ▅
33.8 ms Histogram: log(frequency) by time 39.9 ms <
Memory estimate: 79.85 MiB, allocs estimate: 62.
julia>
julia> @benchmark enlarge_northeast_corner_autoopt(E_north_kagome, C_northeast_kagome, E_east_kagome, A_kagome)
BenchmarkTools.Trial: 108 samples with 1 evaluation per sample.
Range (min … max): 42.761 ms … 60.854 ms ┊ GC (min … max): 0.00% … 25.63%
Time (median): 47.327 ms ┊ GC (median): 9.67%
Time (mean ± σ): 46.664 ms ± 2.455 ms ┊ GC (mean ± σ): 7.18% ± 4.79%
▃ ▃ ▇▇ ▃ █
▆▃▆▅██▇▅▁▁▁▁▁▅▁▁▁▃▁▁▃▁▁▁▁▁▁▁▁▁▁▁█▆██▇▆█▇██▇▁▆▃▁▁▁▅▁▁▁▁▁▁▁▃▃ ▃
42.8 ms Histogram: frequency by time 50.4 ms <
Memory estimate: 79.35 MiB, allocs estimate: 56.
julia> @benchmark enlarge_northeast_corner_explicit(E_north_kagome, C_northeast_kagome, E_east_kagome, A_kagome)
BenchmarkTools.Trial: 133 samples with 1 evaluation per sample.
Range (min … max): 34.070 ms … 40.255 ms ┊ GC (min … max): 0.00% … 11.48%
Time (median): 38.683 ms ┊ GC (median): 11.75%
Time (mean ± σ): 37.692 ms ± 1.981 ms ┊ GC (mean ± σ): 8.93% ± 5.20%
▃█▅
▅▅█▄▃▃▃▁▃▃▁▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▁▁▁▁▁▁▁▁▃███▆▆▄▃▃▃▁▃▃▃▃▃ ▃
34.1 ms Histogram: frequency by time 40 ms <
Memory estimate: 79.85 MiB, allocs estimate: 62.
julia> @benchmark enlarge_northeast_corner_explicit_NE(E_north_kagome, C_northeast_kagome, E_east_kagome, A_kagome)
BenchmarkTools.Trial: 109 samples with 1 evaluation per sample.
Range (min … max): 42.475 ms … 48.382 ms ┊ GC (min … max): 0.00% … 9.79%
Time (median): 46.905 ms ┊ GC (median): 9.64%
Time (mean ± σ): 45.980 ms ± 1.802 ms ┊ GC (mean ± σ): 7.00% ± 4.43%
█▆▃
▄▄▆▃▄▅▄▄▁▄▃▁▃▁▁▁▁▃▁▁▁▁▃▁▁▃▁▁▁▁▁▁▁▃▁▁▃▁▁▁▁▁▁▁▁▁▃▄▇███▅▇▄▄▄▄▄ ▃
42.5 ms Histogram: frequency by time 47.7 ms <
Memory estimate: 80.57 MiB, allocs estimate: 67.
julia>
julia>
julia> @benchmark enlarge_southeast_corner_autoopt(E_east_kagome, C_southeast_kagome, E_south_kagome, A_kagome)
BenchmarkTools.Trial: 107 samples with 1 evaluation per sample.
Range (min … max): 42.998 ms … 54.184 ms ┊ GC (min … max): 0.00% … 16.20%
Time (median): 47.820 ms ┊ GC (median): 9.46%
Time (mean ± σ): 46.842 ms ± 2.101 ms ┊ GC (mean ± σ): 7.15% ± 4.31%
▁▂█▁
▃▅▃▇▅▅▃▃▁▃▁▁▁▁▁▁▁▁▁▃▁▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▅▅████▆▅▃▃▁▁▁▁▁▁▁▁▁▁▃ ▃
43 ms Histogram: frequency by time 49.9 ms <
Memory estimate: 79.85 MiB, allocs estimate: 62.
julia> @benchmark enlarge_southeast_corner_explicit(E_east_kagome, C_southeast_kagome, E_south_kagome, A_kagome)
BenchmarkTools.Trial: 134 samples with 1 evaluation per sample.
Range (min … max): 33.557 ms … 41.648 ms ┊ GC (min … max): 0.00% … 11.41%
Time (median): 38.479 ms ┊ GC (median): 12.37%
Time (mean ± σ): 37.537 ms ± 2.099 ms ┊ GC (mean ± σ): 9.47% ± 5.50%
▂█▇
▅▅▅▆▅▁▃▃▁▁▁▃▁▃▁▁▁▁▁▁▁▃▁▁▁▁▁▁▃▁▁▁▁▁▁▃▅███▆▆▃▃▁▃▃▃▁▄▃▃▁▁▁▁▁▁▃ ▃
33.6 ms Histogram: frequency by time 41.1 ms <
Memory estimate: 79.85 MiB, allocs estimate: 62.
julia>
julia> @benchmark enlarge_southwest_corner_autoopt(E_south_kagome, C_southwest_kagome, E_west_kagome, A_kagome)
BenchmarkTools.Trial: 108 samples with 1 evaluation per sample.
Range (min … max): 42.717 ms … 49.468 ms ┊ GC (min … max): 0.00% … 9.29%
Time (median): 47.813 ms ┊ GC (median): 9.46%
Time (mean ± σ): 46.698 ms ± 2.134 ms ┊ GC (mean ± σ): 6.85% ± 4.34%
█▃
▃▁▄▄▅▄▇▇▄▃▃▃▁▁▁▁▃▁▁▁▁▁▁▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▃████▇▆▅▄▃▅▄▁▃▃▃ ▃
42.7 ms Histogram: frequency by time 49.2 ms <
Memory estimate: 79.85 MiB, allocs estimate: 62.
julia> @benchmark enlarge_southwest_corner_explicit(E_south_kagome, C_southwest_kagome, E_west_kagome, A_kagome)
BenchmarkTools.Trial: 135 samples with 1 evaluation per sample.
Range (min … max): 33.540 ms … 39.497 ms ┊ GC (min … max): 0.00% … 11.53%
Time (median): 38.313 ms ┊ GC (median): 11.75%
Time (mean ± σ): 37.282 ms ± 2.019 ms ┊ GC (mean ± σ): 8.89% ± 5.22%
▁█▃▂
▆▅█▄▃▃▁▁▃▁▁▁▁▃▃▁▃▁▁▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▁▃▁▁▁▁▁▁████▆▇▃▄▄▃▃▃ ▃
33.5 ms Histogram: frequency by time 39.3 ms <
Memory estimate: 79.35 MiB, allocs estimate: 56.

renormalize edge benchmark:

julia> @benchmark renormalize_north_edge_autoopt(E_north_kagome, projectors_kagome[1][1, 1, 1], projectors_kagome[2][1, 1, 1], A_kagome)
BenchmarkTools.Trial: 89 samples with 1 evaluation per sample.
Range (min … max): 52.544 ms … 63.477 ms ┊ GC (min … max): 0.00% … 13.68%
Time (median): 57.143 ms ┊ GC (median): 7.78%
Time (mean ± σ): 56.386 ms ± 2.180 ms ┊ GC (mean ± σ): 5.69% ± 3.65%
▂ ▆▂▅ ▃▂█
▅▅█▅█▄▅▅▁▁▅▁▁▄▄▁▁▁▁▁▁▁▁▁▄▁▄▁▁▁▁▁▁▁▁▁▁▁▁████▅▇▇███▅▅▄▄▁▁▁▄▅▄ ▁
52.5 ms Histogram: frequency by time 59.1 ms <
Memory estimate: 79.35 MiB, allocs estimate: 58.
julia> @benchmark renormalize_north_edge_rotate(E_north_kagome, projectors_kagome[1][1, 1, 1], projectors_kagome[2][1, 1, 1], A_kagome)
BenchmarkTools.Trial: 88 samples with 1 evaluation per sample.
Range (min … max): 52.712 ms … 61.252 ms ┊ GC (min … max): 0.00% … 7.17%
Time (median): 58.150 ms ┊ GC (median): 7.57%
Time (mean ± σ): 56.877 ms ± 2.415 ms ┊ GC (mean ± σ): 5.28% ± 3.64%
▆▂█▃
▄▃▄▅▆▆▅▃▄▃▄▃▁▃▁▁▁▁▁▃▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▁▁▁▄▃▆████▃▄▁▃▄▁▁▃▅▁▁▁▃ ▁
52.7 ms Histogram: frequency by time 60.3 ms <
Memory estimate: 80.35 MiB, allocs estimate: 70.
julia> @benchmark renormalize_north_edge_explicit(E_north_kagome, projectors_kagome[1][1, 1, 1], projectors_kagome[2][1, 1, 1], A_kagome)
BenchmarkTools.Trial: 95 samples with 1 evaluation per sample.
Range (min … max): 49.137 ms … 55.720 ms ┊ GC (min … max): 0.00% … 8.08%
Time (median): 54.098 ms ┊ GC (median): 8.23%
Time (mean ± σ): 52.891 ms ± 2.164 ms ┊ GC (mean ± σ): 5.88% ± 3.82%
▃█▇▃▆
▄▇▇▆█▃▇▃▁▁▄▁▄▁▁▁▁▁▁▁▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▄█████▆▃▄▃▄▃▃▃ ▁
49.1 ms Histogram: frequency by time 55.3 ms <
Memory estimate: 81.79 MiB, allocs estimate: 70.
julia> @benchmark renormalize_north_edge_rotate_explicit(E_north_kagome, projectors_kagome[1][1, 1, 1], projectors_kagome[2][1, 1, 1], A_kagome)
BenchmarkTools.Trial: 112 samples with 1 evaluation per sample.
Range (min … max): 40.868 ms … 47.124 ms ┊ GC (min … max): 0.00% … 9.73%
Time (median): 45.718 ms ┊ GC (median): 9.89%
Time (mean ± σ): 44.648 ms ± 2.002 ms ┊ GC (mean ± σ): 7.26% ± 4.49%
▂█▅▃
▃▆▆▃▄▃▄▁▃▃▁▁▃▃▁▃▁▁▁▃▁▁▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▃▁▁▁▁▁▃████▃▁▅▁▁▃▃▃▃ ▃
40.9 ms Histogram: frequency by time 46.9 ms <
Memory estimate: 81.07 MiB, allocs estimate: 70.
julia>
julia>
julia> @benchmark renormalize_east_edge_autoopt(E_east_kagome, projectors_kagome[1][2, 1, 1], projectors_kagome[2][2, 1, 1], A_kagome)
BenchmarkTools.Trial: 82 samples with 1 evaluation per sample.
Range (min … max): 57.533 ms … 66.594 ms ┊ GC (min … max): 0.00% … 12.92%
Time (median): 62.341 ms ┊ GC (median): 7.14%
Time (mean ± σ): 61.258 ms ± 2.121 ms ┊ GC (mean ± σ): 4.99% ± 3.49%
▂ ▂▅▆█▂ ▃▃
▇▅▄▇▄▄█▄▄▇▁▄▁▁▄▁▁▄▁▄▁▁▁▄▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▇█████▄██▄▁▄▄▁▁▁▁▄ ▁
57.5 ms Histogram: frequency by time 64 ms <
Memory estimate: 79.85 MiB, allocs estimate: 64.
julia> @benchmark renormalize_east_edge_rotate(E_east_kagome, projectors_kagome[1][2, 1, 1], projectors_kagome[2][2, 1, 1], A_kagome)
BenchmarkTools.Trial: 88 samples with 1 evaluation per sample.
Range (min … max): 52.969 ms … 61.811 ms ┊ GC (min … max): 0.00% … 12.36%
Time (median): 58.318 ms ┊ GC (median): 7.52%
Time (mean ± σ): 57.217 ms ± 2.274 ms ┊ GC (mean ± σ): 5.37% ± 3.65%
▃█▂
▄▃▁▆▄▃▃▄▅▄▄▇▄▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▁▃▁▆███▆▇▁▅▄▅▁▃▅▁▁▁▁▁▁▁▁▁▃ ▁
53 ms Histogram: frequency by time 61.2 ms <
Memory estimate: 80.35 MiB, allocs estimate: 70.
julia> @benchmark renormalize_east_edge_explicit(E_east_kagome, projectors_kagome[1][2, 1, 1], projectors_kagome[2][2, 1, 1], A_kagome)
BenchmarkTools.Trial: 97 samples with 1 evaluation per sample.
Range (min … max): 46.477 ms … 55.295 ms ┊ GC (min … max): 0.00% … 8.10%
Time (median): 53.226 ms ┊ GC (median): 8.40%
Time (mean ± σ): 51.701 ms ± 2.875 ms ┊ GC (mean ± σ): 6.10% ± 3.88%
▂▃▂█▃
▄▇▅▅█▄▇█▄▄▄▄▄▁▄▁▁▁▁▇▁▁▁▁▁▁▁▁▁▁▁▁▁▄▅▄▅▄▁▅▅▅▁▁▁▅███████▄▄█▇▄▇ ▁
46.5 ms Histogram: frequency by time 54.9 ms <
Memory estimate: 81.79 MiB, allocs estimate: 75.
julia> @benchmark renormalize_east_edge_rotate_explicit(E_east_kagome, projectors_kagome[1][2, 1, 1], projectors_kagome[2][2, 1, 1], A_kagome)
BenchmarkTools.Trial: 109 samples with 1 evaluation per sample.
Range (min … max): 41.175 ms … 52.484 ms ┊ GC (min … max): 0.00% … 18.13%
Time (median): 46.925 ms ┊ GC (median): 10.30%
Time (mean ± σ): 45.899 ms ± 2.562 ms ┊ GC (mean ± σ): 7.56% ± 4.96%
▂ █▄ ▂▂▂▂
▅▅▃██▃▅▅▇▃▆▁▃▃▁▃▁▃▁▁▁▁▁▁▁▁▃▅▅█▆▅██▆████▇▇▃▅▃▁▁▁▁▁▁▃▁▁▁▁▁▁▁▃ ▃
41.2 ms Histogram: frequency by time 51.5 ms <
Memory estimate: 81.07 MiB, allocs estimate: 70.
julia>
julia> @benchmark renormalize_south_edge_autoopt(E_south_kagome, projectors_kagome[1][3, 1, 1], projectors_kagome[2][3, 1, 1], A_kagome)
BenchmarkTools.Trial: 82 samples with 1 evaluation per sample.
Range (min … max): 57.679 ms … 68.490 ms ┊ GC (min … max): 0.00% … 12.67%
Time (median): 62.410 ms ┊ GC (median): 7.17%
Time (mean ± σ): 61.463 ms ± 2.229 ms ┊ GC (mean ± σ): 5.08% ± 3.46%
▂ ▃ ▆█▃█▆
▅▄█▄██▇▁▁▁▁▁▁▄▁▄▁▁▁▁▄▁▁▁▁▄▁▁▁▁▁▁▁▄▅█████▇▇▅▁▁▁▁▁▁▁▄▄▁▁▄▁▁▁▄ ▁
57.7 ms Histogram: frequency by time 65.3 ms <
Memory estimate: 79.85 MiB, allocs estimate: 64.
julia> @benchmark renormalize_south_edge_rotate(E_south_kagome, projectors_kagome[1][3, 1, 1], projectors_kagome[2][3, 1, 1], A_kagome)
BenchmarkTools.Trial: 86 samples with 1 evaluation per sample.
Range (min … max): 51.854 ms … 78.354 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 58.419 ms ┊ GC (median): 8.23%
Time (mean ± σ): 58.266 ms ± 3.277 ms ┊ GC (mean ± σ): 6.08% ± 4.39%
▁ ▇█▃▁ ▁
▅▁▅▁▅▇█▇▇▁▁▁▁▅▁▅▁▇▇▇▅████▅▅█▅▇▅▅▁▅▁▁▁▅▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▅ ▁
51.9 ms Histogram: log(frequency) by time 69.3 ms <
Memory estimate: 80.35 MiB, allocs estimate: 70.
julia> @benchmark renormalize_south_edge_explicit(E_south_kagome, projectors_kagome[1][3, 1, 1], projectors_kagome[2][3, 1, 1], A_kagome)
BenchmarkTools.Trial: 107 samples with 1 evaluation per sample.
Range (min … max): 42.487 ms … 58.384 ms ┊ GC (min … max): 0.00% … 26.47%
Time (median): 47.465 ms ┊ GC (median): 9.49%
Time (mean ± σ): 46.956 ms ± 2.427 ms ┊ GC (mean ± σ): 7.34% ± 4.82%
▁▁▂▁ ▅█▅▂
████▅▁▅▁▁▁▅▅▁▅▁▇▁▅▁▅▅▅████▇▅█▇█▅▇▅▁▁▅▅▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▅ ▅
42.5 ms Histogram: log(frequency) by time 55.1 ms <
Memory estimate: 84.23 MiB, allocs estimate: 87.
julia> @benchmark renormalize_south_edge_rotate_explicit(E_south_kagome, projectors_kagome[1][3, 1, 1], projectors_kagome[2][3, 1, 1], A_kagome)
BenchmarkTools.Trial: 112 samples with 1 evaluation per sample.
Range (min … max): 40.898 ms … 47.712 ms ┊ GC (min … max): 0.00% … 9.54%
Time (median): 45.744 ms ┊ GC (median): 9.85%
Time (mean ± σ): 44.768 ms ± 2.011 ms ┊ GC (mean ± σ): 7.22% ± 4.47%
▁▆▇█
▅▅▅▆▇▅▃▄▁▁▃▁▃▁▁▁▁▁▃▁▁▁▁▁▁▁▁▄▃▃▁▁▁▁▁▁▁▁▁▁▆████▆▆▄▆▃▃▃▃▁▁▄▁▁▃ ▃
40.9 ms Histogram: frequency by time 47.6 ms <
Memory estimate: 81.07 MiB, allocs estimate: 70.
julia>
julia> @benchmark renormalize_west_edge_autoopt(E_west_kagome, projectors_kagome[1][4, 1, 1], projectors_kagome[2][4, 1, 1], A_kagome)
BenchmarkTools.Trial: 88 samples with 1 evaluation per sample.
Range (min … max): 52.704 ms … 61.470 ms ┊ GC (min … max): 0.00% … 7.15%
Time (median): 58.146 ms ┊ GC (median): 7.58%
Time (mean ± σ): 56.923 ms ± 2.439 ms ┊ GC (mean ± σ): 5.32% ± 3.68%
▂ ▅▇█▄
▃▃▆▅▆▆█▅█▃▃▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▁▁▁▁▁▃▁▃▁▆████▆█▅▃▆▃▁▃▁▁▃▃▁▅ ▁
52.7 ms Histogram: frequency by time 60.3 ms <
Memory estimate: 79.85 MiB, allocs estimate: 64.
julia> @benchmark renormalize_west_edge_explicit(E_west_kagome, projectors_kagome[1][4, 1, 1], projectors_kagome[2][4, 1, 1], A_kagome)
BenchmarkTools.Trial: 113 samples with 1 evaluation per sample.
Range (min … max): 40.600 ms … 47.968 ms ┊ GC (min … max): 0.00% … 9.56%
Time (median): 45.330 ms ┊ GC (median): 9.97%
Time (mean ± σ): 44.291 ms ± 2.096 ms ┊ GC (mean ± σ): 7.35% ± 4.51%
▁▆▆█▄
▅█▇▅▇▆▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▁▁▁▄█████▆▆▃▁▃▃▁▃▁▁▁▁▃▁▁▃ ▃
40.6 ms Histogram: frequency by time 47.6 ms <
Memory estimate: 80.57 MiB, allocs estimate: 64.

Updated summary, enlarge corner; the smaller, the better.
Updated summary, renormalize edge; the smaller, the better. Since for direction west, explicit is always better than
Going from D=2 to D=16, for corner north-east, autoopt and explicit NE are now very close, with explicit NE already slightly faster. For edge renormalization, the order did not change, but the gap between rotate and explicit is closing; the two are now very close. With this new data, I now favor having this PR for renormalize_edge in all cases. The case of enlarge corner NE is more complicated.
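For reference, here are the speedups implied by the median timings quoted above for the kagome D=16, χ=100 case (ratios computed by me from those numbers):

```julia
# autoopt median / explicit median, from the BenchmarkTools output above (ms)
nw_corner_speedup = 47.162 / 38.438   # northwest corner, roughly 1.2x
west_edge_speedup = 58.146 / 45.330   # west edge, roughly 1.3x

@assert nw_corner_speedup > 1.2
@assert west_edge_speedup > 1.25
```

So both the corner enlargement and the west-edge renormalization gain on the order of 20-30% at these sizes.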
5192f5a to c3d7e4a
To summarize:
@leburgel it seems like something did actually go wrong with the last changes, as now the example tests seem to time out. Any idea what could be the cause?
I actually don't think things went wrong with #261, but rather that it fixed the oversight from #246 that caused the gradient computation with the fallback linear solver to terminate long before it actually converged. This means it was just moving on with very bad gradients, which I think didn't cause any problems because this usually happens at the start of an optimization. I had a look, and in the test that timed out it got completely stuck in the first LBFGS iteration for the variational optimization of the Heisenberg model starting from the simple update result. How stuck it gets seems to vary a lot, but in the one that timed out it was particularly bad. I think if we come up with a generic procedure to 'kick' the simple update starting guess a bit before feeding it into the variational optimization this should be solved. I don't think there's a way out algorithmically, it seems both the eigsolve and linsolve methods for computing the fixed point gradient just have a lot of trouble converging.
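One generic way to implement such a 'kick' would be to add a small random perturbation, scaled relative to the tensor norm, before handing the guess to the optimizer. A hypothetical sketch on a plain array (the actual PEPS types and a good noise strength ε would still need choosing):

```julia
using LinearAlgebra: norm

# Hypothetical helper: perturb a starting guess by a relative amount ε.
# ΔA is rescaled so that norm(kick(A) - A) == ε * norm(A) exactly.
function kick(A::AbstractArray; ε = 1e-2)
    ΔA = randn(eltype(A), size(A))
    return A + ε * (norm(A) / norm(ΔA)) * ΔA
end

A  = randn(2, 2, 4)
A′ = kick(A)
@assert size(A′) == size(A)
@assert isapprox(norm(A′ - A), 1e-2 * norm(A))
```

The relative scaling keeps the perturbation small no matter how the starting tensors are normalized.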
Ok, in that case I'll merge this, since that is unrelated.
This PR is a follow-up to #229 and #237. It replaces @autoopt by explicit contraction schemes in CTMRG partition function contractions. I assumed the optimal permutation was the same as for a wavefunction and reproduced the same order. I do not really understand which constraints are imposed by planar non-braiding categories, so I may be doing illegal permutations. I do not know how to check for these; I am happy to learn.