# Lagrange Duality and Karush-Kuhn-Tucker Review for SVM

## 1. Lagrange Duality
Solve the optimization problem:
$$\min f_{0}(x)\quad \text{such that } f_{i}(x)\leq 0,\ \forall i\in 1,\dots,m$$

Consider the equivalent problem:

$$J(x)=\begin{cases} f_{0}(x), & \text{if } f_{i}(x)\leq 0\ \forall i\\ \infty, & \text{otherwise} \end{cases}=f_{0}(x)+\sum_{i}I[f_{i}(x)]$$

where:

$$I[u]=\begin{cases} 0, & \text{if } u\leq 0\\ \infty, & \text{otherwise} \end{cases}$$

Here $J(x)=\infty$ punishes any violated constraint. We replace $I[u]$ with $\lambda u$ for $\lambda \geq 0$; $\lambda u$ is a lower bound of $I[u]$.
Figure: $I[u]$ and its lower bound $\lambda u$
This new function is called the Lagrangian:

$$L(x,\lambda)=f_{0}(x)+\sum_{i}\lambda_{i}f_{i}(x)$$

We have

$$\max_{\lambda\geq 0}L(x,\lambda)=J(x)$$

because:
- if $f_{i}(x)\leq 0\ \forall i$, we can choose $\lambda_{i}=0\ \forall i$;
- if $f_{i}(x)> 0$ for some $i$, we can let $\lambda_{i}\to\infty$ for that $i$.

Our problem becomes

$$\min_{x}\max_{\lambda}L(x,\lambda)$$

Consider the reversed order of optimization:

$$\max_{\lambda}\min_{x}L(x,\lambda)=\max_{\lambda}g(\lambda)$$

where $g(\lambda)=\min_{x}L(x,\lambda)$ is called the **dual function**.

**Maximizing $g(\lambda)$, i.e. $\max_{\lambda}\min_{x}L(x,\lambda)$, is called the dual problem, while $\min_{x}\max_{\lambda}L(x,\lambda)$ is called the primal problem.**

Since $\lambda u$ is a lower bound of $I[u]$ for all $\lambda\geq 0$:

$$L(x,\lambda)\leq J(x)\ \Rightarrow\ g(\lambda)=\min_{x}L(x,\lambda)\leq \min_{x}J(x)=p^{*}\ \Rightarrow\ d^{*}=\max_{\lambda}g(\lambda)\leq p^{*}$$

where $p^{*},d^{*}$ are the optima of the primal and dual problems. Hence $g(\lambda)$ gives a lower bound on the primal optimum:

$$\max_{\lambda}\min_{x}L(x,\lambda)\leq \min_{x}\max_{\lambda}L(x,\lambda)$$

**This is called weak duality.** $p^{*}-d^{*}$ is called the **optimal duality gap**, and $p^{*}=d^{*}$ is called **strong duality**; in that case solving the dual problem is equivalent to solving the primal.
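The weak-duality inequality $d^{*}\leq p^{*}$ can be checked numerically. The sketch below uses a toy one-dimensional problem of my own choosing (not from these notes): minimize $f_{0}(x)=x^{2}$ subject to $f_{1}(x)=1-x\leq 0$, whose primal and dual optima are both $1$ (strong duality holds since the problem is convex).

```python
# Numerical check of weak/strong duality for the toy problem
#   minimize f0(x) = x^2  subject to f1(x) = 1 - x <= 0   (i.e. x >= 1)
# This example problem is an illustration, not one from the notes.

def f0(x):
    return x * x

def f1(x):
    return 1.0 - x

def lagrangian(x, lam):
    return f0(x) + lam * f1(x)

def g(lam, xs):
    # Dual function g(lambda) = min_x L(x, lambda), approximated on a grid.
    return min(lagrangian(x, lam) for x in xs)

xs = [i / 100.0 for i in range(-500, 501)]     # x in [-5, 5]
lams = [i / 100.0 for i in range(0, 501)]      # lambda in [0, 5]

p_star = min(f0(x) for x in xs if f1(x) <= 0)  # primal optimum: 1 at x = 1
d_star = max(g(lam, xs) for lam in lams)       # dual optimum: 1 at lambda = 2

print(p_star, d_star)  # d_star <= p_star, and here they coincide
```

Every dual value $g(\lambda)$ computed in the scan stays below $p^{*}$, and the best one meets it, illustrating both weak duality and (for this convex problem) strong duality.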
## 2. Karush-Kuhn-Tucker (KKT) conditions
If strong duality holds and $(x^{*},\lambda^{*})$ is optimal, then $x^{*}$ minimizes $L(x,\lambda^{*})$. This gives the first KKT condition (stationarity):

$$\nabla_{x}L(x^{*},\lambda^{*})=\nabla_{x}f_{0}(x^{*})+\sum_{i}\lambda_{i}^{*}\nabla_{x}f_{i}(x^{*})=0$$

so the gradients must be parallel. Moreover, we have:

$$f_{0}(x^{*})=g(\lambda^{*})=\min_{x}L(x,\lambda^{*})\leq f_{0}(x^{*})+\sum_{i}\lambda_{i}^{*}f_{i}(x^{*})\leq f_{0}(x^{*})$$

because, by strong duality,

$$g(\lambda^{*})=\max_{\lambda}\min_{x}L(x,\lambda)=\min_{x}\max_{\lambda}L(x,\lambda)=f_{0}(x^{*})$$

and, since $x^{*}$ is feasible, $f_{0}(x^{*})=f_{0}(x^{*})+\sum_{i}I[f_{i}(x^{*})]$. The chain of inequalities forces

$$\sum_{i}\lambda_{i}^{*}f_{i}(x^{*})=0$$

Since $\lambda_{i}^{*}\geq 0$ and $f_{i}(x^{*})\leq 0$ for all $i$, every term in the sum must vanish individually. This is the second KKT condition:

$$\lambda_{i}^{*}f_{i}(x^{*})=0$$

This condition is called **complementary slackness**:
- if $\lambda_{i}^{*}=0$, the constraint may be inactive ($f_{i}(x^{*})<0$ is allowed);
- if $\lambda_{i}^{*}>0$, then $f_{i}(x^{*})=0$ and the constraint is active.

In both cases the lower bound $\lambda_{i}u$ agrees with $I[u]$ at $u=f_{i}(x^{*})$, i.e. the bound is tight. The last conditions are the primal and dual constraints:

$$f_{i}(x^{*})\leq 0,\qquad \mathbf{\lambda_{i}^{*}\geq 0}$$
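The four conditions can be verified mechanically at a candidate point. The sketch below uses an illustrative convex problem of my own (minimize $x^{2}$ subject to $1-x\leq 0$, not from the notes), whose optimum is $x^{*}=1$ with multiplier $\lambda^{*}=2$:

```python
# KKT check for the toy problem: minimize f0(x) = x^2  s.t.  f1(x) = 1 - x <= 0.
# Illustrative problem; optimum x* = 1, lambda* = 2.

def f1(x):
    return 1.0 - x

def grad_f0(x):
    return 2.0 * x

def grad_f1(x):
    return -1.0

def kkt_holds(x, lam, tol=1e-9):
    stationarity = abs(grad_f0(x) + lam * grad_f1(x)) <= tol  # KKT 1
    comp_slack = abs(lam * f1(x)) <= tol                      # KKT 2
    primal_feasible = f1(x) <= tol                            # primal constraint
    dual_feasible = lam >= -tol                               # KKT 3
    return stationarity and comp_slack and primal_feasible and dual_feasible

print(kkt_holds(1.0, 2.0))  # True: all conditions hold at the optimum
print(kkt_holds(0.0, 0.0))  # False: x = 0 violates primal feasibility
```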
## 3. Examples
Example 1:
Minimize $f(x_{1},x_{2})=x_{1}^{2}+x_{2}^{2}-14x_{1}-6x_{2}$
subject to:

$$g_{1}(x_{1},x_{2})=x_{1}+x_{2}-2\leq 0$$
$$g_{2}(x_{1},x_{2})=x_{1}+2x_{2}-3\leq 0$$
Solution:
The Lagrangian:
$L(x_{1}, x_{2},\lambda _{1},\lambda _{2})=x_{1}^{2}+x_{2}^{2}-14x_{1}-6x_{2}+\lambda _{1}(x_{1}+x_{2}-2)+\lambda _{2}(x_{1}+2x_{2}-3)$
Apply KKT conditions:
KKT 1 (stationarity):
$$2x_{1}-14+\lambda_{1}+\lambda_{2}=0$$
$$2x_{2}-6+\lambda_{1}+2\lambda_{2}=0$$
KKT 2 (complementary slackness):
$$\lambda_{1}(x_{1}+x_{2}-2)=0$$
$$\lambda_{2}(x_{1}+2x_{2}-3)=0$$
KKT 3 (dual feasibility):
$$\lambda_{1}\geq 0,\quad \lambda_{2}\geq 0$$
There are 4 cases to check:
- $\mathbf{\lambda_{1}=0}$ **and** $\mathbf{\lambda_{2}=0}$: $x_{1}=7,\ x_{2}=3 \Rightarrow g_{1}=8,\ g_{2}=10$ ($g_{1}$ and $g_{2}$ are violated)
- $\mathbf{\lambda_{1}=0}$ **and** $\mathbf{\lambda_{2}>0}$: $x_{1}=5,\ x_{2}=-1,\ \lambda_{2}=4 \Rightarrow g_{1}=2,\ g_{2}=0$ ($g_{1}$ is violated)
- $\mathbf{\lambda_{1}>0}$ **and** $\mathbf{\lambda_{2}=0}$: $x_{1}=3,\ x_{2}=-1,\ \lambda_{1}=8 \Rightarrow g_{1}=0,\ g_{2}=-2,\ f=-26$ (optimal)
- $\mathbf{\lambda_{1}>0}$ **and** $\mathbf{\lambda_{2}>0}$: $x_{1}=1,\ x_{2}=1,\ \lambda_{1}=20,\ \lambda_{2}=-8$ ($\lambda_{2}<0$ violates dual feasibility)
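The case analysis can be sanity-checked with a brute-force grid search over the feasible region (a rough numerical sketch, not part of the original solution; the search box $[-10,10]^{2}$ and step $0.1$ are my own choices):

```python
# Brute-force confirmation of Example 1:
#   minimize x1^2 + x2^2 - 14 x1 - 6 x2
#   s.t.     x1 + x2 - 2 <= 0,  x1 + 2 x2 - 3 <= 0
# The KKT solution is (x1, x2) = (3, -1) with f = -26.

def f(x1, x2):
    return x1**2 + x2**2 - 14*x1 - 6*x2

best = None
for i in range(-100, 101):       # x1 in [-10, 10], step 0.1
    for j in range(-100, 101):   # x2 in [-10, 10], step 0.1
        x1, x2 = i / 10.0, j / 10.0
        if x1 + x2 - 2 <= 0 and x1 + 2*x2 - 3 <= 0:  # feasibility check
            if best is None or f(x1, x2) < best[0]:
                best = (f(x1, x2), x1, x2)

print(best)  # (-26.0, 3.0, -1.0): matches the KKT solution
```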
Example 2:
Maximize $f=x^{3}-3x$ subject to $g=x-2\leq 0$
The Lagrangian:
We convert the maximization problem into a minimization problem by negating the objective:

$$L=-(x^{3}-3x)+\lambda(x-2)$$
Apply KKT conditions:
$$-(3x^{2}-3)+\lambda=0$$
$$\lambda(x-2)=0$$
$$\lambda\geq 0$$
There are 2 cases to check:
- $\mathbf{\lambda=0}$: $x=1$ or $x=-1$, with $f(1)=-2$ and $f(-1)=2$ (both satisfy the KKT conditions)
- $\mathbf{\lambda>0}$: $x=2,\ \lambda=9$, with $f(2)=2$ (satisfied)

Among these candidates, $x=1$ gives only $f=-2$, so the maximum $f=2$ is attained at the two solutions $x=-1$ and $x=2$.
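The candidates can be checked numerically (a quick sketch; the scan range $[-5, 2]$ for the feasible set is my own choice):

```python
# Checking the KKT candidates of Example 2: maximize f = x^3 - 3x  s.t.  x <= 2.
# Stationary candidates: x = -1 and x = 1 (lambda = 0), and x = 2 (lambda = 9).

def f(x):
    return x**3 - 3*x

candidates = [-1.0, 1.0, 2.0]
values = {x: f(x) for x in candidates}
print(values)  # {-1.0: 2.0, 1.0: -2.0, 2.0: 2.0}

# A coarse scan of the feasible set confirms no feasible point beats f = 2.
scan_max = max(f(i / 100.0) for i in range(-500, 201))  # x in [-5, 2]
print(scan_max)  # 2.0, attained at x = -1 and x = 2
```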