Commit 83d822f: Update Readme.md

math formatting for webpage style

1 parent: fe78c4a

File tree: 1 file changed (+21, -21 lines)

NLDF/Readme.md

Lines changed: 21 additions & 21 deletions
@@ -118,11 +118,11 @@ Inside our 'main', we will initialize all our variables and create our handle.
long cpuser = 0;
```

- We also define $t$ as an array of 21 points from $0.5$ to $2.5$.
+ We also define $$t$$ as an array of 21 points from $$0.5$$ to $$2.5$$.
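
For reference, a minimal sketch of how such a grid can be built (the array name `t` comes from the text; the loop itself is illustrative, not the repository's code):

```java
// 21 evenly spaced points from 0.5 to 2.5, i.e. a step of 0.1
double[] t = new double[21];
for (int i = 0; i < t.length; i++) {
    t[i] = 0.5 + 0.1 * i;
}
```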

## Single-outlier example

- To investigate the robustness aspect, here’s a toy dataset which is generated from $\sin(t)$ and has an outlier at $t=1.5$, which is generated by $5\sin(t)$.
+ To investigate the robustness aspect, here’s a toy dataset generated from $$\sin(t)$$, with a single outlier at $$t=1.5$$ generated by $$5\sin(t)$$.

![toy1](images/fig1.png)
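
A sketch of how this dataset could be produced (the helper name `toydata1` is hypothetical, chosen by analogy with the `toydata2` helper shown later; the actual generator is not part of this diff):

```java
// sin(t) everywhere, except the grid point at t = 1.5, which is
// generated by 5*sin(t) and acts as the single outlier
private static double[] toydata1(double[] t) {
    double[] y = new double[t.length];
    for (int i = 0; i < t.length; i++) {
        double scale = (Math.abs(t[i] - 1.5) < 1e-9) ? 5.0 : 1.0;
        y[i] = scale * Math.sin(t[i]);
    }
    return y;
}
```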

@@ -172,22 +172,22 @@ private static class LSQGRD extends E04GN.Abstract_E04GN_LSQGRD {
}
```

- ### Start with $l_2$-norm loss function - Example 1
- Starting with one of the most common loss functions, the $l_2$-norm, we form the problem
+ ### Start with $$l_2$$-norm loss function - Example 1
+ Starting with one of the most common loss functions, the $$l_2$$-norm, we form the problem

$$
\underset{x \in \mathbb{R}^{2}}{\text{minimize}}~f(x) =\sum_{i=1}^{21} r_i(x)^2
$$

- which is just least squares regression. $l_2$-norm loss has low robustness against outliers, so we should expect that the solution will be affected heavily by this one outlier. Let’s solve from a starting point at
+ which is just least squares regression. $$l_2$$-norm loss has low robustness against outliers, so we should expect the solution to be heavily affected by this one outlier. Let’s solve from a starting point at

$$
x = (2.1, 1.4)
$$

to see what this outlier does to the minimum.
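
For concreteness: the contour-plot discussion later implies the two-parameter sine model $$\varphi(t;x) = x_1 \sin(x_2 t)$$ (so that $$x = (5,1)$$ reproduces $$5\sin(t)$$); under that assumption the residuals read

$$
r_i(x) = x_1 \sin(x_2 t_i) - y_i, \qquad i = 1,\dots,21,
$$

where $$(t_i, y_i)$$ are the 21 data points.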

- For this Java example, we set up a function to reset $x$ variable to the starting point, since it gets passed to the solver and returns the solution.
+ For this Java example, we set up a function to reset the $$x$$ variable to the starting point, since it is passed to the solver and comes back holding the solution.

```java
private static double[] init_x() {
@@ -197,7 +197,7 @@ private static double[] init_x() {
```
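
The body of `init_x` is elided by the diff; given the starting point $$x = (2.1, 1.4)$$ quoted above, a minimal sketch of what it presumably contains:

```java
// return a fresh copy of the starting point, since the solver
// overwrites x in place with the solution
private static double[] init_x() {
    return new double[] {2.1, 1.4};
}
```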
We first set up the options parameter to select the loss function and the printing options.

- Since we already set up the handle and initialized the loss function to $l2$, we can just set our initial guess and solve.
+ Since we already set up the handle and initialized the loss function to $$l_2$$, we can just set our initial guess and solve.

```java
ifail = 0;
@@ -222,18 +222,18 @@ And the curve this produces looks like this:

![L2](images/fig2.png)

- The single outlier was able to disrupt the fit, since $l_2$-norm loss makes outliers contribute heavily to the objective function and search direction.
+ The single outlier was able to disrupt the fit, since $$l_2$$-norm loss makes outliers contribute heavily to the objective function and search direction.

- ### Try $l_1$-norm loss function - Example 2
- Using $l_1$-norm loss gives us the problem
+ ### Try $$l_1$$-norm loss function - Example 2
+ Using $$l_1$$-norm loss gives us the problem

$$
\underset{x \in \mathbb{R}^{2}}{\text{minimize}}~f(x) =\sum_{i=1}^{21} |r_i(x)|,
$$

- which is more robust against outliers. This means if some large portion of the data is well-fitted by some solution $x^\ast$, there is likely to be a local minimum very close to $x^\ast$ which is relatively undisturbed by the remaining data that is outlying to the solution $x^\ast$. Here’s the solution, again starting at $x=(2.1,1.4)$, using $l_1$ loss.
+ which is more robust against outliers. This means that if a large portion of the data is well fitted by some solution $$x^\ast$$, there is likely to be a local minimum very close to $$x^\ast$$ which is relatively undisturbed by the remaining data that is outlying with respect to $$x^\ast$$. Here’s the solution, again starting at $$x=(2.1,1.4)$$, using $$l_1$$ loss.

- Now all we need to do is change the loss function parameter, reset $x$, and solve again.
+ Now all we need to do is change the loss function parameter, reset $$x$$, and solve again.

```java
ifail = 0;
x = init_x();
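// The diff elides the option change that follows. A hedged sketch,
// assuming NAG's option-setting routine E04ZM and its documented
// "NLDF Loss Function Type" option (the exact Java wrapper call
// style is an assumption, not taken from this repository):
//
//     E04ZM optset = new E04ZM();
//     optset.eval(handle, "NLDF Loss Function Type = L1", ifail);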
@@ -264,7 +264,7 @@ We can reuse the handle, the residual function (and gradient). Just changing the

There is a danger in choosing a very robust loss function. During an iterative optimization process, a loss function which is robust against outliers will usually prefer the data which is close to the current model. This means that if the objective function has local minima, the search can fall into one in which the model fits some subset of the data very well but fits the majority of the data very badly.

- To illustrate this, here’s a new dataset which we will try to fit with the same model, again starting at $x= (2.1,1.4)$. Most of the data was generated by $5\sin(t)$, with the 3 data points at either end being generated by $\sin(t)$.
+ To illustrate this, here’s a new dataset which we will try to fit with the same model, again starting at $$x = (2.1,1.4)$$. Most of the data was generated by $$5\sin(t)$$, with the 3 data points at either end being generated by $$\sin(t)$$.

![toy2](images/fig4.png)
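
Only the tail of the `toydata2` helper appears in the next hunk; here is a sketch consistent with the description above (an assumption, not the repository's exact body):

```java
// first 3 and last 3 points follow sin(t); the middle 15 follow 5*sin(t)
private static double[] toydata2(double[] t) {
    double[] y = new double[t.length];
    for (int i = 0; i < t.length; i++) {
        boolean edge = (i < 3) || (i >= t.length - 3);
        y[i] = (edge ? 1.0 : 5.0) * Math.sin(t[i]);
    }
    return y;
}
```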

@@ -286,7 +286,7 @@ private static double[] toydata2(double [] t) {
}
```

- We will fit this data set using 3 different loss functions with the same model $\varphi(t;x)$ each time and discuss the results under the plots all at once below.
+ We will fit this dataset using 3 different loss functions with the same model $$\varphi(t;x)$$ each time, and discuss all the results together below the plots.

```java
ifail = 0;
@@ -312,24 +312,24 @@ Here are all the curves plotted together:

![All](images/fig5.png)

- In the first row of plots, the data is fitted using $l_2$-norm loss, $l_1$-norm loss, and $\arctan$ loss. Shown below each is the contour plot of the objective function value, where the black circles represent the parameters used to generate the data, the cyan circles represents the starting point for the solver, and the cyan wedges represent the optimized solution found by the solver.
+ In the first row of plots, the data is fitted using $$l_2$$-norm loss, $$\arctan$$ loss, and $$l_1$$-norm loss (left to right, matching the column-by-column discussion below). Shown below each is the contour plot of the objective function value, where the black circles represent the parameters used to generate the data, the cyan circles represent the starting point for the solver, and the cyan wedges represent the optimized solution found by the solver.

![Contour](images/nldf_contour.png)

- In the $l_2$-norm case in the left column, the outliers generated by $\sin(t)$ have pulled the optimal solution away from $x = (5,1)$. The contour plot for $l_2$-norm loss indicates that we don’t have to worry too much about what starting point to use, since there are no local minima in the region displayed, other than global best solution.
+ In the $$l_2$$-norm case in the left column, the outliers generated by $$\sin(t)$$ have pulled the optimal solution away from $$x = (5,1)$$. The contour plot for $$l_2$$-norm loss indicates that we don’t have to worry too much about what starting point to use, since there are no local minima in the region displayed other than the global best solution.

- The behaviour of the solver is quite different when using an extremely robust loss function like $\arctan$ loss, which looks like
+ The behaviour of the solver is quite different when using an extremely robust loss function like $$\arctan$$ loss, which gives the problem

$$
\underset{x \in \mathbb{R}^{2}}{\text{minimize}} ~ f(x) =\sum_{i=1}^{21} \arctan(r_i(x)^2)
$$
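
The extreme robustness comes from boundedness: each term satisfies $$\arctan(r_i(x)^2) < \pi/2$$, so even an arbitrarily large residual contributes at most $$\pi/2$$ to the objective, and the solver gains little by accommodating points it currently fits badly.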

- The fitted model and corresponding contour plot for the $\arctan$ case are in the middle. Here, there are eight local minima in the contour plot for $\arctan$ loss, with seven of them being substantially worse solutions than the global minimum, and it is one of these we’ve converged to. Therefore, in this case the selection of initial estimation of the parameters is much more important.
+ The fitted model and corresponding contour plot for the $$\arctan$$ case are in the middle. Here, there are eight local minima in the contour plot for $$\arctan$$ loss, with seven of them being substantially worse solutions than the global minimum, and it is one of these we’ve converged to. Therefore, in this case the selection of the initial parameter estimates is much more important.

- The model fitted with $l_1$-norm loss and the corresponding contour plot are in the right column. Looking at the contour plot, there are still a few local minima that do not correspond to the optimal solution, but the starting point of $x = (2.1,1.4)$ still converges to the global minimum, which lies at
- $x = (5,1)$, meaning the part of the dataset generated from $\sin(t)$ is effectively being ignoring. From the plots of the loss functions, we can see that $l_1$-norm loss is more robust than $l_2$-norm loss but less so than $\arctan$ loss.
+ The model fitted with $$l_1$$-norm loss and the corresponding contour plot are in the right column. Looking at the contour plot, there are still a few local minima that do not correspond to the optimal solution, but the starting point of $$x = (2.1,1.4)$$ still converges to the global minimum, which lies at
+ $$x = (5,1)$$, meaning the part of the dataset generated from $$\sin(t)$$ is effectively being ignored. From the plots of the loss functions, we can see that $$l_1$$-norm loss is more robust than $$l_2$$-norm loss but less so than $$\arctan$$ loss.

- So, what has happened in each case is: using $l_2$-norm loss, we move to the global minimum which is affected by the whole dataset. Using $l_1$-norm loss, we move to the global minimum which fits most of the data very well and ignores a small portion, treating them as outliers. Using $\arctan$ loss we move to a local minimum which ignores a large portion of the data (treating them as outliers) and fits a small amount of data very well.
+ So, what has happened in each case is: using $$l_2$$-norm loss, we move to the global minimum, which is affected by the whole dataset. Using $$l_1$$-norm loss, we move to the global minimum, which fits most of the data very well and ignores a small portion, treating it as outliers. Using $$\arctan$$ loss, we move to a local minimum which ignores a large portion of the data (treating it as outliers) and fits a small amount of data very well.

## Conclusion