Saturday, 24 March 2018

Lagrange Duality and Karush-Kuhn-Tucker review for SVM

1. Lagrange Duality
Solve the optimization problem:
$
\mathbf{minf_{0}(x)}\\
\mathbf{\text{such that }f_{i}(x)\leq 0,\forall i\in 1,...m}\\
\text{Consider the equivalent problem:}\\
J(x)=\left\{\begin{matrix}
f_{0}(x), & \text{if }f_{i}(x)\leq 0\\
\infty , & otherwise
\end{matrix}\right.=f_{0}(x)+\sum_{i}^{ }I[f_{i}(x)]\\
\text{where:}\\
I[u]=\left\{\begin{matrix}
0, & \text{if }u\leq 0\\
\infty , & otherwise
\end{matrix}\right.\\
\text{We replace I[u] with }\lambda u \text{ for } \lambda \geq 0\\
\lambda u \text{ is a lower bound of I[u]}\\
\text{Here we punish the } J(x)=\infty \text{ if a constraint is dissatisfied}
$
Figure: I[u] and $\lambda u$
$
\text{We got new function is called the Lagrangian:}\\
L(x,\lambda )=f_{0}(x)+\sum_{i}^{ }\lambda _{i}f_{i}(x)\
\text{We have:}\\
max_{\lambda }L(x, \lambda )=J(x)\\
\text{because}\\
\text{if }f_{i}(x)\leq 0,\forall i \text{ we can choose }\lambda _{i}=0,\forall i\\
\text{if }f_{i}(x)\geq 0,\text{ for some } i \text{ we can choose }\lambda _{i}=\infty,\text{ for some } i\\
\text{Our problem becomes}\\
min_{x}max_{\lambda }L(x,\lambda )\\
\text{Consider the reversed order of minimization}\\
max_{\lambda }min_{x}L(x,\lambda )=max_{\lambda }g(\lambda )\\
\text{where }g(\lambda ) = min_{x}L(x,\lambda )\\
g(\lambda )\text{ is called dual function}\\
\mathbf{\text{Maximize }g(\lambda )=max_{\lambda }min_{x}L(x,\lambda )\text{ is called dual problem}}\\
\mathbf{\text{While }min_{x}max_{\lambda }L(x,\lambda )\text{ is called primal problem}}\\
\lambda u\text{ is lower bound of I[u] for all }\lambda \geq 0\\
\Rightarrow L(x,\lambda )\leq J(x)\\
\Rightarrow min_{x}L(x,\lambda )=g(\lambda )\leq min_{x}J(x)=p^{*}\\
\Rightarrow d^{*}=max_{\lambda }g(\lambda ) \leq p^{*}\\
p^{*}, d^{*} \text{ are the optimal of the primal and dual problems}\\
\Rightarrow g(\lambda ) \text{ gives lower bound of optimal problem}\\
\mathbf{\Rightarrow max_{\lambda }min_{x}L(x,\lambda )\leq min_{x}max_{\lambda }L(x,\lambda )}\\
\mathbf{\text{This is called weak duality}}\\
\mathbf{p^{*}-d^{*} \text{ is called optimal duality gap}}\\
\mathbf{p^{*}=d^{*} \text{ is called strong duality}}\\
\text{And the solution of primal and dual is equivalent}$
2. Karush-Kuhn-Tucker (KKT) conditions
$\text{If strong duality holds and }(x^{*},\lambda ^{*}) \text{ is optimal then }x^{*}\text{ minimizes }L(x,\lambda ^{*})\\
\text{ We have first KKT condition:}\\
\mathbf{\bigtriangledown _{x}L(x^{*}, \lambda^{*})=\bigtriangledown _{x}f_{0}(x^{*})+\sum_{i}^{ }\lambda _{i}^{*}\bigtriangledown _{x}f_{i}(x^{*})=0}\\
\Rightarrow \text{gradients must be parallel}\\
\text{Moreover, we have:}\\
f_{0}(x^{*})=g(\lambda ^{*})=min_{x}L(x, \lambda ^{*})\leq f_{0}(x^{*})+\sum_{i}^{}\lambda _{i}^{*}f_{i}(x^{*})\leq f_{0}(x^{*})\\
\text{because}\\
g(\lambda ^{*})=max_{\lambda }min_{x}L(x,\lambda )=min_{x}max_{\lambda }L(x,\lambda )=f_{0}(x^{*})\\
f_{0}(x^{*}) = f_{0}(x^{*})+\sum_{i}^{ }I[f_{i}(x^{*})]\\
\Rightarrow \sum_{i}^{ }\lambda _{i}^{*}f_{i}(x^{*})=0\\\text{Since }\lambda _{i}^{*}\geq 0 \text{ and }f_{i}(x^{*})\leq 0,\forall i\\
\text{We have second KKT condition:}\\
\mathbf{\lambda _{i}^{*}f_{i}(x^{*})=0}\\
\text{This condition is called complementary slackness}\\
\text{If }\lambda _{i}^{*}=0 \text{ then }f_{i}(x^{*})\leq 0 \text{ this constraint is not active}\\
\text{If }\lambda _{i}^{*}\geq \text{ then }f_{i}(x^{*})=0\text{ this constraint is active}\\
\text{In both cases the lower bound }\lambda u \text{ and I[u] are tight}\\
\text{The last conditions are primal and dual constraints}\\
\mathbf{\lambda _{i}^{*} \geq 0}$
3. Example
Example 1:
$\text{Minimize } f(x_{1},x_{2})=x_{1}^{2}+x_{2}^{2}-14x_{1}-6x_{2}\\
\text{Subject to: }\\
g_{1}(x_{1}, x_{2})=x_{1}+x_{2}-2\leq 0\\
g_{2}(x_{1}, x_{2})=x_{1}+2x_{2}-3\leq 0$
Solution:
The Lagrangian:
$
L(x_{1}, x_{2},\lambda _{1},\lambda _{2})=x_{1}^{2}+x_{2}^{2}-14x_{1}-6x_{2}+\lambda _{1}(x_{1}+x_{2}-2)+\lambda _{2}(x_{1}+2x_{2}-3)$
Apply KKT conditions:
$\text{KKT 1}\\
2x_{1}-14+\lambda _{1}+\lambda _{2}=0\\
2x_{2}-6+\lambda _{1}+2\lambda _{2}=0\\
\text{KKT 2}\\
\lambda _{1}(x_{1}+x_{2}-2)=0\\
\lambda _{2}(x_{1}+2x_{2}-3)=0\\
\text{KKT 3}\\
\lambda _{1}\geq 0,\lambda _{2}\geq 0
$
There are 4 cases to check:
$
\mathbf{\lambda _{1}=0\text{ and }\lambda _{2}=0}\\
\Rightarrow x_{1}=7,x_{2}=3,g_{1}=8,g_{2}=10 (g_{1},g_{2}\text{ are not satified})\\
\mathbf{\lambda _{1}=0\text{ and }\lambda _{2}>0}\\
\Rightarrow x_{1}=5,x_{2}=-1,\lambda _{2}=4, g_{1}=2,g_{2}=0 (g_{1}\text{ is not satified})\\
\mathbf{\lambda _{1}>0\text{ and }\lambda _{2}=0}\\
\Rightarrow x_{1}=3,x_{2}=-1,\lambda _{1}=8, g_{1}=0,g_{2}=-2,f=-26 (\text{optimal})\\
\mathbf{\lambda _{1}>0\text{ and }\lambda _{2}>0}\\
\Rightarrow x_{1}=1,x_{2}=1,\lambda _{1}=20,\lambda _{1}=-8 (\lambda_{2}\text{ is not satified})$
Example 2:
$\text{Maximize }f=x^{3}-3x\\
\text{Subject to }g=x\leq 2$
The Lagrangian:
$L=-(x^{3}-3x)+\lambda (x-2)\\
\text{We convert maximizing problem to minimizing problem by add '-'}$
Apply KKT conditions:
$
-(3x^{2}-3)+\lambda =0\\
\lambda (x-2)=0)\\
\lambda \geq 0
$
There are 2 cases to check:
$
\mathbf{\lambda = 0}\\
\Rightarrow x=1 \text{ or }x=-1\text{ and }f(1)=-2\text{ or }f(-1)=2\text{(both are satisfied)}\\
\mathbf{\lambda \geq  0}\\
\Rightarrow x=2,\lambda =9 \text{ and } f(2)=2\text{(satisfied)}\\
\text{So we have 2 solutions }x=-1\text{ and }x=2
$

0 comments:

Post a Comment