[perf] avoid @autoopt for partition function
#245
Conversation
lkdvos left a comment:
Looks good to me! Any chance you have some timing results too?
Considering the planar and braiding things: happy to explain, but this isn't super relevant here, because they require @planar anyway, and we don't really support that in PEPSKit.jl right now. In principle we could do this for the partition functions too, but until someone actually needs it I'm fine with just keeping the @tensor.
(Combining planarity and efficiency would actually be kind of a nightmare: for braided categories at least we could replace the permutations with braidings and their inverses, but for non-braided ones we cannot arbitrarily choose the intermediary permutations so we're more limited in what can be done)
I now specialize all renormalize edge contractions for partition functions. The motivation is that the partition function tensor is often actually a contracted double-layer quantum wavefunction. The new contraction scheme may not be optimal in the limit of very small D, very large χ and non-abelian symmetry, but I think this is a pretty uncommon case. In other cases, either one contraction will dominate, or if D is large, having D or χ as the first leg would be equivalent. I also fixed variable names in some other methods: although the contraction schemes were correct, the names employed were confusing. Todo: some timing
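To make the small-D versus large-D argument above concrete, here is a back-of-the-envelope FLOP model for the staged corner enlargement (E * C, then * E, then * A). This is my own rough estimate of the leading multiply-add counts, not anything computed by PEPSKit:

```julia
# Rough leading-order FLOP counts for the three stages of the
# enlarged-corner contraction (dense tensors assumed):
#   (E * C)   ~ 2χ^3 D
#   (EC * E)  ~ 2χ^3 D^2
#   (ECE * A) ~ 2χ^2 D^4
corner_flops(χ, D) = 2χ^3 * D + 2χ^3 * D^2 + 2χ^2 * D^4

χ = 100
# Small D (e.g. Ising, D = 2): the χ^3 D^2 edge-edge step dominates,
# so leg orderings around the χ legs matter most.
@assert 2χ^3 * 2^2 > 2χ^2 * 2^4
# Large D (e.g. D = 16): the D^4 contraction with A takes over.
@assert 2χ^2 * 16^4 > 2χ^3 * 16^2
```

In both regimes one stage clearly dominates, which is consistent with the claim that the choice of first leg only matters in the uncommon intermediate regime.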
I did the benchmarks in 3 cases.

benchmark code:

using TensorOperations: @tensor
using TensorKit
using TensorKit: ×
using PEPSKit
using PEPSKit: @autoopt, CTMRGCornerTensor, CTMRG_PF_EdgeTensor, EnlargedCorner, PFTensor, dtmap!!, eachcoordinate, leading_boundary, select_algorithm, simultaneous_projectors
using BenchmarkTools
# ==================== master =======================================================
function enlarge_northwest_corner_autoopt(
E_west::CTMRG_PF_EdgeTensor, C_northwest::CTMRGCornerTensor,
E_north::CTMRG_PF_EdgeTensor, A::PFTensor,
)
return @autoopt @tensor corner[χ_S D_S; χ_E D_E] :=
E_west[χ_S D1; χ1] * C_northwest[χ1; χ2] * E_north[χ2 D2; χ_E] * A[D1 D_S; D2 D_E]
end
function enlarge_northeast_corner_autoopt(
E_north::CTMRG_PF_EdgeTensor, C_northeast::CTMRGCornerTensor,
E_east::CTMRG_PF_EdgeTensor, A::PFTensor,
)
return @autoopt @tensor corner[χ_W D_W; χ_S D_S] :=
E_north[χ_W D1; χ1] * C_northeast[χ1; χ2] * E_east[χ2 D2; χ_S] * A[D_W D_S; D1 D2]
end
function enlarge_southeast_corner_autoopt(
E_east::CTMRG_PF_EdgeTensor, C_southeast::CTMRGCornerTensor,
E_south::CTMRG_PF_EdgeTensor, A::PFTensor,
)
return @autoopt @tensor corner[χ_N D_N; χ_W D_W] :=
E_east[χ_N D1; χ1] * C_southeast[χ1; χ2] * E_south[χ2 D2; χ_W] * A[D_W D2; D_N D1]
end
function enlarge_southwest_corner_autoopt(
E_south::CTMRG_PF_EdgeTensor, C_southwest::CTMRGCornerTensor,
E_west::CTMRG_PF_EdgeTensor, A::PFTensor,
)
return @autoopt @tensor corner[χ_E D_E; χ_N D_N] :=
E_south[χ_E D1; χ1] * C_southwest[χ1; χ2] * E_west[χ2 D2; χ_N] * A[D2 D1; D_N D_E]
end
function renormalize_north_edge_rotate(E_north, P_right, P_left, A)
A_west = PEPSKit._rotl90_localsandwich(A)
return renormalize_west_edge_autoopt(E_north, P_right, P_left, A_west)
end
function renormalize_east_edge_rotate(E_east, P_bottom, P_top, A)
A_west = PEPSKit._rot180_localsandwich(A)
return renormalize_west_edge_autoopt(E_east, P_bottom, P_top, A_west)
end
function renormalize_south_edge_rotate(E_south, P_left, P_right, A)
A_west = PEPSKit._rotr90_localsandwich(A)
return renormalize_west_edge_autoopt(E_south, P_left, P_right, A_west)
end
function renormalize_west_edge_autoopt(E_west::CTMRG_PF_EdgeTensor, P_top, P_bottom, A::PFTensor)
return @autoopt @tensor edge[χ_S D_E; χ_N] :=
E_west[χ1 D1; χ2] * A[D1 D5; D3 D_E] * P_top[χ2 D3; χ_N] * P_bottom[χ_S; χ1 D5]
end
# mixed
function renormalize_north_edge_rotate_explicit(E_north, P_right, P_left, A)
A_west = PEPSKit._rotl90_localsandwich(A)
return renormalize_west_edge_explicit(E_north, P_right, P_left, A_west)
end
function renormalize_east_edge_rotate_explicit(E_east, P_bottom, P_top, A)
A_west = PEPSKit._rot180_localsandwich(A)
return renormalize_west_edge_explicit(E_east, P_bottom, P_top, A_west)
end
function renormalize_south_edge_rotate_explicit(E_south, P_left, P_right, A)
A_west = PEPSKit._rotr90_localsandwich(A)
return renormalize_west_edge_explicit(E_south, P_left, P_right, A_west)
end
# ==================== explicit =======================================================
function enlarge_northwest_corner_explicit(
E_west::CTMRG_PF_EdgeTensor, C_northwest::CTMRGCornerTensor,
E_north::CTMRG_PF_EdgeTensor, A::PFTensor,
)
return @tensor begin
EC[χ_S DW; χ2] := E_west[χ_S DW; χ1] * C_northwest[χ1; χ2]
ECE[χ_S χ_E; DW DN] := EC[χ_S DW; χ2] * E_north[χ2 DN; χ_E]
corner[χ_S D_S; χ_E D_E] := ECE[χ_S χ_E; DW DN] * A[DW D_S; DN D_E]
end
end
function enlarge_northeast_corner_explicit(
E_north::CTMRG_PF_EdgeTensor, C_northeast::CTMRGCornerTensor,
E_east::CTMRG_PF_EdgeTensor, A::PFTensor,
)
return @tensor begin
EC[χ_W DN; χ2] := E_north[χ_W DN; χ1] * C_northeast[χ1; χ2]
ECE[χ_W χ_S; DN DE] := EC[χ_W DN; χ2] * E_east[χ2 DE; χ_S]
corner[χ_W D_W; χ_S D_S] := ECE[χ_W χ_S; DN DE] * A[D_W D_S; DN DE]
end
end
function enlarge_northeast_corner_explicit_NE(
E_north::CTMRG_PF_EdgeTensor, C_northeast::CTMRGCornerTensor,
E_east::CTMRG_PF_EdgeTensor, A::PFTensor,
)
return @tensor begin
EC[DN χ_W; χ2] := E_north[χ_W DN; χ1] * C_northeast[χ1; χ2]
ECE[DN DE; χ_S χ_W] := EC[DN χ_W; χ2] * E_east[χ2 DE; χ_S]
corner[χ_W D_W; χ_S D_S] := A[D_W D_S; DN DE] * ECE[DN DE; χ_S χ_W]
end
end
function enlarge_southeast_corner_explicit(
E_east::CTMRG_PF_EdgeTensor, C_southeast::CTMRGCornerTensor,
E_south::CTMRG_PF_EdgeTensor, A::PFTensor,
)
return @tensor begin
EC[χ_N D1; χ2] := E_east[χ_N D1; χ1] * C_southeast[χ1; χ2]
ECE[χ_N χ_W; D1 D2] := EC[χ_N D1; χ2] * E_south[χ2 D2; χ_W]
corner[χ_N D_N; χ_W D_W] := ECE[χ_N χ_W; D1 D2] * A[D_W D2; D_N D1]
end
end
function enlarge_southwest_corner_explicit(
E_south::CTMRG_PF_EdgeTensor, C_southwest::CTMRGCornerTensor,
E_west::CTMRG_PF_EdgeTensor, A::PFTensor,
)
return @tensor begin
EC[χ_E D1; χ2] := E_south[χ_E D1; χ1] * C_southwest[χ1; χ2]
ECE[χ_E χ_N; D2 D1] := EC[χ_E D1; χ2] * E_west[χ2 D2; χ_N]
corner[χ_E D_E; χ_N D_N] := ECE[χ_E χ_N; D2 D1] * A[D2 D1; D_N D_E]
end
end
function renormalize_north_edge_explicit(E_north::CTMRG_PF_EdgeTensor, P_right, P_left, A::PFTensor)
return @tensor begin
temp = permute(E_north, ((2, 1), (3,))) # impose D_N as 1st leg
PE[D_N D_E; χNW χ_E] := temp[D_N χNW; χNE] * P_right[χNE D_E; χ_E]
PEA[D_W χNW; D_S χ_E] := A[D_W D_S; D_N D_E] * PE[D_N D_E; χNW χ_E]
P_leftp = permute(P_left, ((1,), (3, 2)))
edge[χ_W D_S; χ_E] := P_leftp[χ_W; D_W χNW] * PEA[D_W χNW; D_S χ_E]
end
end
function renormalize_east_edge_explicit(E_east::CTMRG_PF_EdgeTensor, P_bottom, P_top, A::PFTensor)
return @tensor begin
temp = permute(P_top, ((3, 1), (2,))) # impose D_N as 1st leg
PE[D_N D_E; χ_N χSE] := temp[D_N χ_N; χNE] * E_east[χNE D_E; χSE]
PEA[D_W χ_N; χSE D_S] := A[D_W D_S; D_N D_E] * PE[D_N D_E; χ_N χSE]
edge[χ_N D_W; χ_S] := PEA[D_W χ_N; χSE D_S] * P_bottom[χSE D_S; χ_S]
end
end
function renormalize_south_edge_explicit(E_south::CTMRG_PF_EdgeTensor, P_left, P_right, A::PFTensor)
# specialize to avoid extra permute on A when calling renormalize_west_edge
return @tensor begin
P_leftp = permute(P_left, ((3, 2), (1,))) # impose χ_W as 1st leg
PE[χ_W χSE; D_W D_S] := P_leftp[χ_W D_W; χSW] * E_south[χSE D_S; χSW]
PEA[χ_W D_N; χSE D_E] := PE[χ_W χSE; D_W D_S] * A[D_W D_S; D_N D_E]
edge[χ_E D_N; χ_W] := PEA[χ_W D_N; χSE D_E] * P_right[χ_E; χSE D_E]
end
end
function renormalize_west_edge_explicit(E_west::CTMRG_PF_EdgeTensor, P_top, P_bottom, A::PFTensor)
return @tensor begin
PE[χ_S χNW; D_W D_S] := P_bottom[χ_S; χSW D_S] * E_west[χSW D_W; χNW]
PEA[χ_S D_E; χNW D_N] := PE[χ_S χNW; D_W D_S] * A[D_W D_S; D_N D_E]
edge[χ_S D_E; χ_N] := PEA[χ_S D_E; χNW D_N] * P_top[χNW D_N; χ_N]
end
end
# ============================================================================================
function get_projectors(env, Z)
alg = select_algorithm(leading_boundary, env; projector_alg=:fullinfinite)
network = InfiniteSquareNetwork(Z)
coordinates = eachcoordinate(network, 1:4)
T_corners = Base.promote_op(
TensorMap ∘ EnlargedCorner, typeof(network), typeof(env), eltype(coordinates)
)
enlarged_corners′ = similar(coordinates, T_corners)
enlarged_corners::typeof(enlarged_corners′) =
dtmap!!(enlarged_corners′, eachcoordinate(network, 1:4)) do idx
return TensorMap(EnlargedCorner(network, env, idx))
end # expand environment
projectors, info = simultaneous_projectors(enlarged_corners, env, alg.projector_alg) # compute projectors on all coordinates
return projectors
end

Ising partition function, Trivial sector,
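As a sanity check that the staged west-edge scheme reproduces the one-shot contraction, here is a small numerical test on plain arrays instead of TensorKit tensors. The index positions mirror the signatures in the benchmark code above, which is an assumption on my part:

```julia
using TensorOperations: @tensor

χ, D = 4, 3
E  = randn(χ, D, χ)      # E_west[χSW, D_W, χNW]
A  = randn(D, D, D, D)   # A[D_W, D_S, D_N, D_E]
Pt = randn(χ, D, χ)      # P_top[χNW, D_N, χ_N]
Pb = randn(χ, χ, D)      # P_bottom[χ_S, χSW, D_S]

# reference: one-shot contraction, order left to the macro
@tensor ref[χ_S, D_E, χ_N] :=
    E[χ1, D1, χ2] * A[D1, D5, D3, D_E] * Pt[χ2, D3, χ_N] * Pb[χ_S, χ1, D5]

# staged scheme mirroring renormalize_west_edge_explicit
@tensor PE[χ_S, χNW, D_W, D_S] := Pb[χ_S, χSW, D_S] * E[χSW, D_W, χNW]
@tensor PEA[χ_S, D_E, χNW, D_N] := PE[χ_S, χNW, D_W, D_S] * A[D_W, D_S, D_N, D_E]
@tensor edge[χ_S, D_E, χ_N] := PEA[χ_S, D_E, χNW, D_N] * Pt[χNW, D_N, χ_N]

@assert edge ≈ ref
```

The two results agree to floating-point precision, so the explicit staging only changes the cost, not the value.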
Thanks a lot for the detailed benchmark! I do have to admit that the results are somewhat surprising to me. Am I reading that wrong, or are there actual regressions from making this change too? I think it is indeed expected that the speedup isn't uniform over all the directions, but I would have guessed that we should be able to get an improvement overall, and seemingly right now that is only sometimes true. I realize also that the cases vary quite wildly. In particular, what surprised me is some of the choices of putting a
These results are a bit confusing. Here is an attempt to make things clearer; the smaller, the better.
Here are the edge renormalizations. Since for direction west, explicit is always better than
Hence the questions are
But I think these benchmarks show that this PR improves corners NW-SE-SW and renormalize west
More benchmarks: I considered tensors as found in https://arxiv.org/abs/2505.05889 with D=16 and χ=100. This looks more relevant than the Ising square lattice with D=2, which was too simple.

Kagome Ising D=16 code:

A_kagome = randn(ℂ^16 ⊗ ℂ^16 ← ℂ^16 ⊗ ℂ^16)
Z_kagome = InfinitePartitionFunction(A_kagome)
χ_kagome = ℂ^100
env0 = CTMRGEnv(Z_kagome, χ_kagome)
env_kagome, = leading_boundary(env0, Z_kagome; alg=:simultaneous, maxiter=20, projector_alg=:fullinfinite)
projectors_kagome = get_projectors(env_kagome, Z_kagome)
E_north_kagome, E_east_kagome, E_south_kagome, E_west_kagome = env_kagome.edges[:, 1, 1]
C_northwest_kagome, C_northeast_kagome, C_southeast_kagome, C_southwest_kagome = env_kagome.corners[:, 1, 1]

enlarge corner benchmark:

julia> @benchmark enlarge_northwest_corner_autoopt(E_west_kagome, C_northwest_kagome, E_north_kagome, A_kagome)
BenchmarkTools.Trial: 107 samples with 1 evaluation per sample.
Range (min … max): 42.886 ms … 53.565 ms ┊ GC (min … max): 0.00% … 15.96%
Time (median): 47.162 ms ┊ GC (median): 9.43%
Time (mean ± σ): 46.745 ms ± 2.163 ms ┊ GC (mean ± σ): 6.87% ± 4.48%
█ ▆▂
▅█▆▆▆▁▅▃▃▃▁▁▁▁▃▃▁▁▁▁▃▇▆█▆▆▇▅██▄▅▆▆▆▃▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃ ▃
42.9 ms Histogram: frequency by time 53.3 ms <
Memory estimate: 79.85 MiB, allocs estimate: 62.
julia> @benchmark enlarge_northwest_corner_explicit(E_west_kagome, C_northwest_kagome, E_north_kagome, A_kagome)
BenchmarkTools.Trial: 134 samples with 1 evaluation per sample.
Range (min … max): 33.823 ms … 42.002 ms ┊ GC (min … max): 0.00% … 11.31%
Time (median): 38.438 ms ┊ GC (median): 11.78%
Time (mean ± σ): 37.491 ms ± 1.946 ms ┊ GC (mean ± σ): 8.95% ± 5.19%
▁▄▂▂ ▆██▄
████▅▁▅▅▅▁▁▁▁▁▁▅▁▁▁▁▁▁▁▁▁▅▅▁▁▁▁▁▁▁▁▁▁▅▁▁▁▁▁█████▆▆█▅▅▁▁▁▅▁▅ ▅
33.8 ms Histogram: log(frequency) by time 39.9 ms <
Memory estimate: 79.85 MiB, allocs estimate: 62.
julia>
julia> @benchmark enlarge_northeast_corner_autoopt(E_north_kagome, C_northeast_kagome, E_east_kagome, A_kagome)
BenchmarkTools.Trial: 108 samples with 1 evaluation per sample.
Range (min … max): 42.761 ms … 60.854 ms ┊ GC (min … max): 0.00% … 25.63%
Time (median): 47.327 ms ┊ GC (median): 9.67%
Time (mean ± σ): 46.664 ms ± 2.455 ms ┊ GC (mean ± σ): 7.18% ± 4.79%
▃ ▃ ▇▇ ▃ █
▆▃▆▅██▇▅▁▁▁▁▁▅▁▁▁▃▁▁▃▁▁▁▁▁▁▁▁▁▁▁█▆██▇▆█▇██▇▁▆▃▁▁▁▅▁▁▁▁▁▁▁▃▃ ▃
42.8 ms Histogram: frequency by time 50.4 ms <
Memory estimate: 79.35 MiB, allocs estimate: 56.
julia> @benchmark enlarge_northeast_corner_explicit(E_north_kagome, C_northeast_kagome, E_east_kagome, A_kagome)
BenchmarkTools.Trial: 133 samples with 1 evaluation per sample.
Range (min … max): 34.070 ms … 40.255 ms ┊ GC (min … max): 0.00% … 11.48%
Time (median): 38.683 ms ┊ GC (median): 11.75%
Time (mean ± σ): 37.692 ms ± 1.981 ms ┊ GC (mean ± σ): 8.93% ± 5.20%
▃█▅
▅▅█▄▃▃▃▁▃▃▁▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▁▁▁▁▁▁▁▁▃███▆▆▄▃▃▃▁▃▃▃▃▃ ▃
34.1 ms Histogram: frequency by time 40 ms <
Memory estimate: 79.85 MiB, allocs estimate: 62.
julia> @benchmark enlarge_northeast_corner_explicit_NE(E_north_kagome, C_northeast_kagome, E_east_kagome, A_kagome)
BenchmarkTools.Trial: 109 samples with 1 evaluation per sample.
Range (min … max): 42.475 ms … 48.382 ms ┊ GC (min … max): 0.00% … 9.79%
Time (median): 46.905 ms ┊ GC (median): 9.64%
Time (mean ± σ): 45.980 ms ± 1.802 ms ┊ GC (mean ± σ): 7.00% ± 4.43%
█▆▃
▄▄▆▃▄▅▄▄▁▄▃▁▃▁▁▁▁▃▁▁▁▁▃▁▁▃▁▁▁▁▁▁▁▃▁▁▃▁▁▁▁▁▁▁▁▁▃▄▇███▅▇▄▄▄▄▄ ▃
42.5 ms Histogram: frequency by time 47.7 ms <
Memory estimate: 80.57 MiB, allocs estimate: 67.
julia>
julia>
julia> @benchmark enlarge_southeast_corner_autoopt(E_east_kagome, C_southeast_kagome, E_south_kagome, A_kagome)
BenchmarkTools.Trial: 107 samples with 1 evaluation per sample.
Range (min … max): 42.998 ms … 54.184 ms ┊ GC (min … max): 0.00% … 16.20%
Time (median): 47.820 ms ┊ GC (median): 9.46%
Time (mean ± σ): 46.842 ms ± 2.101 ms ┊ GC (mean ± σ): 7.15% ± 4.31%
▁▂█▁
▃▅▃▇▅▅▃▃▁▃▁▁▁▁▁▁▁▁▁▃▁▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▅▅████▆▅▃▃▁▁▁▁▁▁▁▁▁▁▃ ▃
43 ms Histogram: frequency by time 49.9 ms <
Memory estimate: 79.85 MiB, allocs estimate: 62.
julia> @benchmark enlarge_southeast_corner_explicit(E_east_kagome, C_southeast_kagome, E_south_kagome, A_kagome)
BenchmarkTools.Trial: 134 samples with 1 evaluation per sample.
Range (min … max): 33.557 ms … 41.648 ms ┊ GC (min … max): 0.00% … 11.41%
Time (median): 38.479 ms ┊ GC (median): 12.37%
Time (mean ± σ): 37.537 ms ± 2.099 ms ┊ GC (mean ± σ): 9.47% ± 5.50%
▂█▇
▅▅▅▆▅▁▃▃▁▁▁▃▁▃▁▁▁▁▁▁▁▃▁▁▁▁▁▁▃▁▁▁▁▁▁▃▅███▆▆▃▃▁▃▃▃▁▄▃▃▁▁▁▁▁▁▃ ▃
33.6 ms Histogram: frequency by time 41.1 ms <
Memory estimate: 79.85 MiB, allocs estimate: 62.
julia>
julia> @benchmark enlarge_southwest_corner_autoopt(E_south_kagome, C_southwest_kagome, E_west_kagome, A_kagome)
BenchmarkTools.Trial: 108 samples with 1 evaluation per sample.
Range (min … max): 42.717 ms … 49.468 ms ┊ GC (min … max): 0.00% … 9.29%
Time (median): 47.813 ms ┊ GC (median): 9.46%
Time (mean ± σ): 46.698 ms ± 2.134 ms ┊ GC (mean ± σ): 6.85% ± 4.34%
█▃
▃▁▄▄▅▄▇▇▄▃▃▃▁▁▁▁▃▁▁▁▁▁▁▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▃████▇▆▅▄▃▅▄▁▃▃▃ ▃
42.7 ms Histogram: frequency by time 49.2 ms <
Memory estimate: 79.85 MiB, allocs estimate: 62.
julia> @benchmark enlarge_southwest_corner_explicit(E_south_kagome, C_southwest_kagome, E_west_kagome, A_kagome)
BenchmarkTools.Trial: 135 samples with 1 evaluation per sample.
Range (min … max): 33.540 ms … 39.497 ms ┊ GC (min … max): 0.00% … 11.53%
Time (median): 38.313 ms ┊ GC (median): 11.75%
Time (mean ± σ): 37.282 ms ± 2.019 ms ┊ GC (mean ± σ): 8.89% ± 5.22%
▁█▃▂
▆▅█▄▃▃▁▁▃▁▁▁▁▃▃▁▃▁▁▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▁▃▁▁▁▁▁▁████▆▇▃▄▄▃▃▃ ▃
33.5 ms Histogram: frequency by time 39.3 ms <
Memory estimate: 79.35 MiB, allocs estimate: 56.

renormalize edge benchmark:

julia> @benchmark renormalize_north_edge_autoopt(E_north_kagome, projectors_kagome[1][1, 1, 1], projectors_kagome[2][1, 1, 1], A_kagome)
BenchmarkTools.Trial: 89 samples with 1 evaluation per sample.
Range (min … max): 52.544 ms … 63.477 ms ┊ GC (min … max): 0.00% … 13.68%
Time (median): 57.143 ms ┊ GC (median): 7.78%
Time (mean ± σ): 56.386 ms ± 2.180 ms ┊ GC (mean ± σ): 5.69% ± 3.65%
▂ ▆▂▅ ▃▂█
▅▅█▅█▄▅▅▁▁▅▁▁▄▄▁▁▁▁▁▁▁▁▁▄▁▄▁▁▁▁▁▁▁▁▁▁▁▁████▅▇▇███▅▅▄▄▁▁▁▄▅▄ ▁
52.5 ms Histogram: frequency by time 59.1 ms <
Memory estimate: 79.35 MiB, allocs estimate: 58.
julia> @benchmark renormalize_north_edge_rotate(E_north_kagome, projectors_kagome[1][1, 1, 1], projectors_kagome[2][1, 1, 1], A_kagome)
BenchmarkTools.Trial: 88 samples with 1 evaluation per sample.
Range (min … max): 52.712 ms … 61.252 ms ┊ GC (min … max): 0.00% … 7.17%
Time (median): 58.150 ms ┊ GC (median): 7.57%
Time (mean ± σ): 56.877 ms ± 2.415 ms ┊ GC (mean ± σ): 5.28% ± 3.64%
▆▂█▃
▄▃▄▅▆▆▅▃▄▃▄▃▁▃▁▁▁▁▁▃▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▁▁▁▄▃▆████▃▄▁▃▄▁▁▃▅▁▁▁▃ ▁
52.7 ms Histogram: frequency by time 60.3 ms <
Memory estimate: 80.35 MiB, allocs estimate: 70.
julia> @benchmark renormalize_north_edge_explicit(E_north_kagome, projectors_kagome[1][1, 1, 1], projectors_kagome[2][1, 1, 1], A_kagome)
BenchmarkTools.Trial: 95 samples with 1 evaluation per sample.
Range (min … max): 49.137 ms … 55.720 ms ┊ GC (min … max): 0.00% … 8.08%
Time (median): 54.098 ms ┊ GC (median): 8.23%
Time (mean ± σ): 52.891 ms ± 2.164 ms ┊ GC (mean ± σ): 5.88% ± 3.82%
▃█▇▃▆
▄▇▇▆█▃▇▃▁▁▄▁▄▁▁▁▁▁▁▁▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▄█████▆▃▄▃▄▃▃▃ ▁
49.1 ms Histogram: frequency by time 55.3 ms <
Memory estimate: 81.79 MiB, allocs estimate: 70.
julia> @benchmark renormalize_north_edge_rotate_explicit(E_north_kagome, projectors_kagome[1][1, 1, 1], projectors_kagome[2][1, 1, 1], A_kagome)
BenchmarkTools.Trial: 112 samples with 1 evaluation per sample.
Range (min … max): 40.868 ms … 47.124 ms ┊ GC (min … max): 0.00% … 9.73%
Time (median): 45.718 ms ┊ GC (median): 9.89%
Time (mean ± σ): 44.648 ms ± 2.002 ms ┊ GC (mean ± σ): 7.26% ± 4.49%
▂█▅▃
▃▆▆▃▄▃▄▁▃▃▁▁▃▃▁▃▁▁▁▃▁▁▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▃▁▁▁▁▁▃████▃▁▅▁▁▃▃▃▃ ▃
40.9 ms Histogram: frequency by time 46.9 ms <
Memory estimate: 81.07 MiB, allocs estimate: 70.
julia>
julia>
julia> @benchmark renormalize_east_edge_autoopt(E_east_kagome, projectors_kagome[1][2, 1, 1], projectors_kagome[2][2, 1, 1], A_kagome)
BenchmarkTools.Trial: 82 samples with 1 evaluation per sample.
Range (min … max): 57.533 ms … 66.594 ms ┊ GC (min … max): 0.00% … 12.92%
Time (median): 62.341 ms ┊ GC (median): 7.14%
Time (mean ± σ): 61.258 ms ± 2.121 ms ┊ GC (mean ± σ): 4.99% ± 3.49%
▂ ▂▅▆█▂ ▃▃
▇▅▄▇▄▄█▄▄▇▁▄▁▁▄▁▁▄▁▄▁▁▁▄▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▇█████▄██▄▁▄▄▁▁▁▁▄ ▁
57.5 ms Histogram: frequency by time 64 ms <
Memory estimate: 79.85 MiB, allocs estimate: 64.
julia> @benchmark renormalize_east_edge_rotate(E_east_kagome, projectors_kagome[1][2, 1, 1], projectors_kagome[2][2, 1, 1], A_kagome)
BenchmarkTools.Trial: 88 samples with 1 evaluation per sample.
Range (min … max): 52.969 ms … 61.811 ms ┊ GC (min … max): 0.00% … 12.36%
Time (median): 58.318 ms ┊ GC (median): 7.52%
Time (mean ± σ): 57.217 ms ± 2.274 ms ┊ GC (mean ± σ): 5.37% ± 3.65%
▃█▂
▄▃▁▆▄▃▃▄▅▄▄▇▄▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▁▃▁▆███▆▇▁▅▄▅▁▃▅▁▁▁▁▁▁▁▁▁▃ ▁
53 ms Histogram: frequency by time 61.2 ms <
Memory estimate: 80.35 MiB, allocs estimate: 70.
julia> @benchmark renormalize_east_edge_explicit(E_east_kagome, projectors_kagome[1][2, 1, 1], projectors_kagome[2][2, 1, 1], A_kagome)
BenchmarkTools.Trial: 97 samples with 1 evaluation per sample.
Range (min … max): 46.477 ms … 55.295 ms ┊ GC (min … max): 0.00% … 8.10%
Time (median): 53.226 ms ┊ GC (median): 8.40%
Time (mean ± σ): 51.701 ms ± 2.875 ms ┊ GC (mean ± σ): 6.10% ± 3.88%
▂▃▂█▃
▄▇▅▅█▄▇█▄▄▄▄▄▁▄▁▁▁▁▇▁▁▁▁▁▁▁▁▁▁▁▁▁▄▅▄▅▄▁▅▅▅▁▁▁▅███████▄▄█▇▄▇ ▁
46.5 ms Histogram: frequency by time 54.9 ms <
Memory estimate: 81.79 MiB, allocs estimate: 75.
julia> @benchmark renormalize_east_edge_rotate_explicit(E_east_kagome, projectors_kagome[1][2, 1, 1], projectors_kagome[2][2, 1, 1], A_kagome)
BenchmarkTools.Trial: 109 samples with 1 evaluation per sample.
Range (min … max): 41.175 ms … 52.484 ms ┊ GC (min … max): 0.00% … 18.13%
Time (median): 46.925 ms ┊ GC (median): 10.30%
Time (mean ± σ): 45.899 ms ± 2.562 ms ┊ GC (mean ± σ): 7.56% ± 4.96%
▂ █▄ ▂▂▂▂
▅▅▃██▃▅▅▇▃▆▁▃▃▁▃▁▃▁▁▁▁▁▁▁▁▃▅▅█▆▅██▆████▇▇▃▅▃▁▁▁▁▁▁▃▁▁▁▁▁▁▁▃ ▃
41.2 ms Histogram: frequency by time 51.5 ms <
Memory estimate: 81.07 MiB, allocs estimate: 70.
julia>
julia> @benchmark renormalize_south_edge_autoopt(E_south_kagome, projectors_kagome[1][3, 1, 1], projectors_kagome[2][3, 1, 1], A_kagome)
BenchmarkTools.Trial: 82 samples with 1 evaluation per sample.
Range (min … max): 57.679 ms … 68.490 ms ┊ GC (min … max): 0.00% … 12.67%
Time (median): 62.410 ms ┊ GC (median): 7.17%
Time (mean ± σ): 61.463 ms ± 2.229 ms ┊ GC (mean ± σ): 5.08% ± 3.46%
▂ ▃ ▆█▃█▆
▅▄█▄██▇▁▁▁▁▁▁▄▁▄▁▁▁▁▄▁▁▁▁▄▁▁▁▁▁▁▁▄▅█████▇▇▅▁▁▁▁▁▁▁▄▄▁▁▄▁▁▁▄ ▁
57.7 ms Histogram: frequency by time 65.3 ms <
Memory estimate: 79.85 MiB, allocs estimate: 64.
julia> @benchmark renormalize_south_edge_rotate(E_south_kagome, projectors_kagome[1][3, 1, 1], projectors_kagome[2][3, 1, 1], A_kagome)
BenchmarkTools.Trial: 86 samples with 1 evaluation per sample.
Range (min … max): 51.854 ms … 78.354 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 58.419 ms ┊ GC (median): 8.23%
Time (mean ± σ): 58.266 ms ± 3.277 ms ┊ GC (mean ± σ): 6.08% ± 4.39%
▁ ▇█▃▁ ▁
▅▁▅▁▅▇█▇▇▁▁▁▁▅▁▅▁▇▇▇▅████▅▅█▅▇▅▅▁▅▁▁▁▅▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▅ ▁
51.9 ms Histogram: log(frequency) by time 69.3 ms <
Memory estimate: 80.35 MiB, allocs estimate: 70.
julia> @benchmark renormalize_south_edge_explicit(E_south_kagome, projectors_kagome[1][3, 1, 1], projectors_kagome[2][3, 1, 1], A_kagome)
BenchmarkTools.Trial: 107 samples with 1 evaluation per sample.
Range (min … max): 42.487 ms … 58.384 ms ┊ GC (min … max): 0.00% … 26.47%
Time (median): 47.465 ms ┊ GC (median): 9.49%
Time (mean ± σ): 46.956 ms ± 2.427 ms ┊ GC (mean ± σ): 7.34% ± 4.82%
▁▁▂▁ ▅█▅▂
████▅▁▅▁▁▁▅▅▁▅▁▇▁▅▁▅▅▅████▇▅█▇█▅▇▅▁▁▅▅▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▅ ▅
42.5 ms Histogram: log(frequency) by time 55.1 ms <
Memory estimate: 84.23 MiB, allocs estimate: 87.
julia> @benchmark renormalize_south_edge_rotate_explicit(E_south_kagome, projectors_kagome[1][3, 1, 1], projectors_kagome[2][3, 1, 1], A_kagome)
BenchmarkTools.Trial: 112 samples with 1 evaluation per sample.
Range (min … max): 40.898 ms … 47.712 ms ┊ GC (min … max): 0.00% … 9.54%
Time (median): 45.744 ms ┊ GC (median): 9.85%
Time (mean ± σ): 44.768 ms ± 2.011 ms ┊ GC (mean ± σ): 7.22% ± 4.47%
▁▆▇█
▅▅▅▆▇▅▃▄▁▁▃▁▃▁▁▁▁▁▃▁▁▁▁▁▁▁▁▄▃▃▁▁▁▁▁▁▁▁▁▁▆████▆▆▄▆▃▃▃▃▁▁▄▁▁▃ ▃
40.9 ms Histogram: frequency by time 47.6 ms <
Memory estimate: 81.07 MiB, allocs estimate: 70.
julia>
julia> @benchmark renormalize_west_edge_autoopt(E_west_kagome, projectors_kagome[1][4, 1, 1], projectors_kagome[2][4, 1, 1], A_kagome)
BenchmarkTools.Trial: 88 samples with 1 evaluation per sample.
Range (min … max): 52.704 ms … 61.470 ms ┊ GC (min … max): 0.00% … 7.15%
Time (median): 58.146 ms ┊ GC (median): 7.58%
Time (mean ± σ): 56.923 ms ± 2.439 ms ┊ GC (mean ± σ): 5.32% ± 3.68%
▂ ▅▇█▄
▃▃▆▅▆▆█▅█▃▃▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▁▁▁▁▁▃▁▃▁▆████▆█▅▃▆▃▁▃▁▁▃▃▁▅ ▁
52.7 ms Histogram: frequency by time 60.3 ms <
Memory estimate: 79.85 MiB, allocs estimate: 64.
julia> @benchmark renormalize_west_edge_explicit(E_west_kagome, projectors_kagome[1][4, 1, 1], projectors_kagome[2][4, 1, 1], A_kagome)
BenchmarkTools.Trial: 113 samples with 1 evaluation per sample.
Range (min … max): 40.600 ms … 47.968 ms ┊ GC (min … max): 0.00% … 9.56%
Time (median): 45.330 ms ┊ GC (median): 9.97%
Time (mean ± σ): 44.291 ms ± 2.096 ms ┊ GC (mean ± σ): 7.35% ± 4.51%
▁▆▆█▄
▅█▇▅▇▆▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▁▁▁▄█████▆▆▃▁▃▃▁▃▁▁▁▁▃▁▁▃ ▃
40.6 ms Histogram: frequency by time 47.6 ms <
Memory estimate: 80.57 MiB, allocs estimate: 64.

Updated summary, enlarge corner; the smaller, the better.
Updated summary, renormalize edge; the smaller, the better. Since for direction west, explicit is always better than
Going from D=2 to D=16, for corner north-east, autoopt and explicit NE are now very close, with explicit NE already slightly faster. For edge renormalization, the order did not change, but the gap between rotate and explicit is closing; the two are now very close. With this new data, I now favor having this PR for renormalize_edge in all cases. The case of enlarge corner NE is more complicated.
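For reference, here are the speedups implied by the median timings quoted above for the kagome D=16, χ=100 case (ratios computed by me from those numbers):

```julia
# autoopt median / explicit median, from the BenchmarkTools output above (ms)
nw_corner_speedup = 47.162 / 38.438   # northwest corner, roughly 1.2x
west_edge_speedup = 58.146 / 45.330   # west edge, roughly 1.3x

@assert nw_corner_speedup > 1.2
@assert west_edge_speedup > 1.25
```

So both the corner enlargement and the west-edge renormalization gain on the order of 20-30% at these sizes.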
5192f5a to c3d7e4a
To summarize:
@leburgel it seems like something did actually go wrong with the last changes, as now the example tests seem to time out. Any idea what could be the cause?
I actually don't think things went wrong with #261, but rather that it fixed the oversight from #246 that caused the gradient computation with the fallback linear solver to terminate long before it actually converged. This means it was just moving on with very bad gradients, which I think didn't cause any problems because this usually happens at the start of an optimization. I had a look, and in the test that timed out it got completely stuck in the first LBFGS iteration for the variational optimization of the Heisenberg model starting from the simple update result. How stuck it gets seems to vary a lot, but in the one that timed out it was particularly bad. I think if we come up with a generic procedure to 'kick' the simple update starting guess a bit before feeding it into the variational optimization this should be solved. I don't think there's a way out algorithmically, it seems both the eigsolve and linsolve methods for computing the fixed point gradient just have a lot of trouble converging.
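One generic way to implement such a 'kick' would be to add a small random perturbation, scaled relative to the tensor norm, before handing the guess to the optimizer. A hypothetical sketch on a plain array (the actual PEPS types and a good noise strength ε would still need choosing):

```julia
using LinearAlgebra: norm

# Hypothetical helper: perturb a starting guess by a relative amount ε.
# ΔA is rescaled so that norm(kick(A) - A) == ε * norm(A) exactly.
function kick(A::AbstractArray; ε = 1e-2)
    ΔA = randn(eltype(A), size(A))
    return A + ε * (norm(A) / norm(ΔA)) * ΔA
end

A  = randn(2, 2, 4)
A′ = kick(A)
@assert size(A′) == size(A)
@assert isapprox(norm(A′ - A), 1e-2 * norm(A))
```

The relative scaling keeps the perturbation small no matter how the starting tensors are normalized.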
Ok, in that case I'll merge this, since that is unrelated.
This PR is a follow-up to #229 and #237. It replaces @autoopt by explicit contraction schemes in CTMRG partition function contractions. I assumed the optimal permutation was the same as for a wavefunction and reproduced the same order. I do not really understand which constraints are imposed by planar non-braiding categories, so I may be doing illegal permutations. I do not know how to check for these; I am happy to learn.