The issue was originally filed as a convergence failure of SYEVD in Jax. It can be traced to a convergence failure of DLAED4. Here is a minimal standalone reproducer.
// main.cpp
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <vector>
#include <iostream>
#include <iomanip>
extern "C" {
void dlaed4_(const int* N, const int* I,
const double* D, const double* Z,
double* DELTA, const double* RHO,
double* DLAM, int* INFO);
}
int main(int argc, char** argv) {
const int n = 29;
const int i = 23;
const double rho = 6.2964808955495258e-05;
const double d[n] = {
-8.9907411945275305e-17, 3.1527281176230556e-06, 7.8104209108493992e-06, 1.4528378412105169e-05, 1.6571840586254615e-05, 1.6791959834500457e-05, 1.6951003434697501e-05, 1.6961635049587819e-05, 1.6969218460422806e-05, 1.7028037406879576e-05, 1.9135232081591496e-05, 2.1876238530929982e-05, 2.1920146416725336e-05, 3.9357203097119847e-05, 1.0557423781645928e-04, 1.3686127039388258e-04, 1.3686300570614894e-04, 1.3686313427158908e-04, 1.3686314099616321e-04, 1.3686314127052412e-04, 1.3686314210337733e-04, 1.3686314231692531e-04, 1.3686314257213152e-04, 5.3532577970471340e-02, 9.9996866934433004e-01, 1.0000000849390769e+00, 1.0000000901678945e+00, 1.0000000918404301e+00, 1.0000001095998177e+00
};
const double z[n] = {
1.6700411548440888e-07, 2.2704845744451106e-06, -6.8596999842130412e-06, -1.9813464524908256e-04, 3.4676472662621390e-03, 5.8131816598176476e-02, -2.4389840480820292e-03, 2.2571085689296944e-03, -2.0150012240231657e-03, -9.7432047705138750e-03, 7.3184880793391433e-04, -2.9319792346511997e-02, 8.3364827297377855e-07, -8.9722126354645391e-09, 7.0412738449665979e-01, -1.4042299148200350e-06, 3.8731670039916887e-06, -5.4330381400333468e-07, 4.4026694557354993e-07, 2.2528424563293299e-07, 3.6064772311683760e-06, 2.8472995783701232e-07, 8.3086036055888118e-07, 2.8464273601162125e-02, -7.0642255076744132e-01, -5.2009238883566685e-08, 8.6348330718758654e-08, -2.9664496252475970e-08, 1.4932776629316714e-09
};
std::vector<double> delta(n, 0.0);
double dlam = 0.0;
int info = 0;
dlaed4_(&n, &i, d, z, delta.data(), &rho, &dlam, &info);
std::cout << std::setprecision(17) << std::scientific;
std::cout << "info = " << info << "\n";
std::cout << "dlam = " << dlam << "\n";
return (info == 0) ? 0 : 1;
}
The code returns info = 1, which means not converged. The maximum number of iterations in DLAED4 is 30; the algorithm requires 31 iterations.
The eigenvalue to be found is very close to the left pole. There are several iterations where the Newton-like step overshoots and a fallback is taken. This happens for the first time in iteration 5. The step taken by the fallback turns out to be a bad step. The objective function value (W in the table) grows a lot and the remaining iterations do not suffice to correct this. Recall that this is a root finder, so W should go to zero.
| iter |
W |
relative position |
| 3 |
-15.183206754205958 |
9.3224424664962507E-012 |
| 4 |
-7.2541345644250672 |
2.8321247865713130E-011 |
| 5 |
15845.035200581398 |
0.25000000001416062 |
| 6 |
7937.0973190230861 |
5.8695311144112583E-004 |
| 7 |
5292.0409126050627 |
2.9347656988118687E-004 |
| 8 |
2646.0775072154233 |
1.1736977032438319E-004 |
| 9 |
1443.4324592139662 |
5.8684899322815527E-005 |
| 10 |
721.72516385211111 |
2.7943564578650639E-005 |
| 11 |
369.27924724566680 |
1.3971796449949251E-005 |
| 12 |
184.64794882583919 |
6.9033154364403488E-006 |
| 13 |
92.881305918680368 |
3.4516718788441071E-006 |
| 14 |
46.448843381611326 |
1.7204602732982808E-006 |
| 15 |
-2.9323820134484166 |
8.3977374357369141E-011 |
| 16 |
-1.4556166910784307 |
1.8246643456796507E-010 |
| 17 |
23.277161269594469 |
8.6032136986642434E-007 |
| 18 |
11.646269558195501 |
4.2954626252879772E-007 |
| 19 |
-0.91250965489136449 |
3.0284357148388565E-010 |
| 20 |
-0.45016815654621373 |
6.3928690716673472E-010 |
| 21 |
5.8486784388543924 |
2.1509277471798222E-007 |
| 22 |
2.9300774463199879 |
1.0726528303886306E-007 |
| 23 |
1.4829047906270059 |
5.3952284973014894E-008 |
| 24 |
0.73997721204365996 |
2.6852690757367693E-008 |
| 25 |
-0.35749940737873870 |
8.0525079151536220E-010 |
| 26 |
-0.15628904239033362 |
1.6119748312661070E-009 |
| 27 |
7.0768362172978744E-002 |
4.8489293641088358E-009 |
| 28 |
1.1908566236531963E-002 |
3.6314105287554539E-009 |
| 29 |
-3.2274931252138529E-004 |
3.4136194909394072E-009 |
| 30 |
-2.3743868398512502E-007 |
3.4192031981775864E-009 |
The root cause of the slow convergence is the fallback to bisection.
|
TEMP = TAU + ETA |
|
IF( TEMP.GT.DLTUB .OR. TEMP.LT.DLTLB ) THEN |
|
IF( W.LT.ZERO ) THEN |
|
ETA = ( DLTUB-TAU ) / TWO |
|
ELSE |
|
ETA = ( DLTLB-TAU ) / TWO |
|
END IF |
|
END IF |
As we see by the relative position in iteration 5, bisection is really bad in this case and moves the candidate point far away from the root.
I would like to understand why bisection was chosen as the fallback.
- LAPACK 2.0 had a different update (I did not try to decipher what exactly it does) and goes into a cyclic infinite loop for this case.
- LAPACK 3.0 introduced bisection as fallback. This part has not been changed since.
Are there any release notes for LAPACK 3.0? Was there any other kind of dissemination, email lists etc, where I could find a hint?
The issue was originally filed as a convergence failure of SYEVD in Jax. It can be traced to a convergence failure of DLAED4. Here is a minimal standalone reproducer.
The code returns
info = 1, which means not converged. The maximum number of iterations in DLAED4 is 30; the algorithm requires 31 iterations.The eigenvalue to be found is very close to the left pole. There are several iterations where the Newton-like step overshoots and a fallback is taken. This happens for the first time in iteration 5. The step taken by the fallback turns out to be a bad step. The objective function value (
Win the table) grows a lot and the remaining iterations do not suffice to correct this. Recall that this is a root finder, soWshould go to zero.The root cause of the slow convergence is the fallback to bisection.
lapack/SRC/dlaed4.f
Lines 849 to 856 in 06f5ba3
As we see by the relative position in iteration 5, bisection is really bad in this case and moves the candidate point far away from the root.
I would like to understand why bisection was chosen as the fallback.
Are there any release notes for LAPACK 3.0? Was there any other kind of dissemination, email lists etc, where I could find a hint?