Primal-dual interior point

The material of this section is based on the following references: [NW99,BV04,CGT00f].

Duality

Consider an optimization problem in this form:

$\displaystyle \min f(x) \; \;$    Subject to: $\displaystyle \; \; c_i(x) \geq 0, \; i=1,\ldots,m$ (8.9)

We assume its domain is $ \displaystyle {\cal D}= \left( \bigcap_{i=1}^m \text{{\bf dom}}\, c_i \right) \bigcap \left( \text{{\bf dom}}\, f \right)$. We define the Lagrangian $ \L : \Re^n \times \Re^m \rightarrow \Re$ associated with problem 8.9 (see section 13.3 about the Lagrangian function):

$\displaystyle \L (x,\lambda)=f(x)-
 \sum_{i=1}^m \lambda_i c_i(x)$ (8.10)

We refer to $ \lambda_i$ as the Lagrange multiplier (or dual variable) associated with the $ i^{th}$ inequality constraint $ c_i(x) \geq 0$. We define the Lagrange dual function (or just dual function) $ g: \Re^m \rightarrow \Re$ as the minimum value of the Lagrangian over $ x$:

$\displaystyle g(\lambda)= \inf_{x \in {\cal D}} \L (x,\lambda)$ (8.11)

Lower Bounds on optimal value

The dual function yields lower bounds on the optimal value $ f(x^*)$ of problem 8.9: for any $ \lambda \geq 0$, we have

$\displaystyle g(\lambda) \leq f(x^*)$ (8.12)

The proof follows. Suppose $ \bar{x}$ is a feasible point of problem 8.9. Then we have:

$\displaystyle \sum_{i=1}^m \lambda_i
 c_i(\bar{x}) \geq 0$ (8.13)

since each term is non-negative ($ \lambda_i \geq 0$ and $ c_i(\bar{x}) \geq 0$), and therefore:

$\displaystyle \L (\bar{x},\lambda)=f(\bar{x})-\sum_{i=1}^m
 \lambda_i c_i(\bar{x}) \leq f(\bar{x})$ (8.14)

Hence

$\displaystyle g(\lambda)=\inf_{x \in {\cal D}} \L (x,\lambda) \leq
 \L (\bar{x},\lambda) \leq f(\bar{x})$ (8.15)

Since $ g(\lambda)\leq f(\bar{x})$ holds for every feasible point $ \bar{x}$, the inequality 8.12 follows.
When the feasible domain and the objective function are both convex, the problem is said to be convex. In this case, we have $ \max_{\lambda} g(\lambda) = \min_x f(x) $ (the proof is skipped). When the problem is not convex, we define the duality gap $ \displaystyle = \min_x f(x) - \max_{\lambda} g(\lambda) \geq 0$.
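To make weak duality concrete, here is a small numerical check on a hypothetical one-dimensional instance (the problem $ \min x^2$ subject to $ x - 1 \geq 0$ is our own illustration, not taken from the text). Its Lagrangian is $ \L (x,\lambda)=x^2-\lambda(x-1)$, whose infimum over $ x$ is attained at $ x=\lambda/2$, giving $ g(\lambda)=-\lambda^2/4+\lambda$:

```python
import numpy as np

# Hypothetical instance: min x^2  subject to  x - 1 >= 0
# (primal optimum is f(x*) = 1 at x* = 1).
# The dual function is g(lam) = -lam^2/4 + lam.
def g(lam):
    return -lam**2 / 4.0 + lam

lams = np.linspace(0.0, 4.0, 401)
g_vals = g(lams)
f_star = 1.0

# Weak duality (eq. 8.12): every g(lam) is a lower bound on f(x*).
assert np.all(g_vals <= f_star + 1e-12)

# The problem is convex, so the duality gap is zero: max g(lam) = f(x*).
best = g_vals.max()
print(best)  # 1.0, attained at lam = 2
```

Since this instance is convex, the best lower bound is tight, matching the zero-duality-gap statement above.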

The Lagrange dual problem of a linear problem in inequality form

Let's calculate the Lagrange dual of an inequality-form LP:

$\displaystyle \min c^t x \; \;$    Subject to: $\displaystyle \; \; A x \leq b
 \Leftrightarrow b-Ax \geq 0$ (8.16)

The Lagrangian is

$\displaystyle \L (x,\lambda)=c^t x - \lambda^t (b - Ax)$ (8.17)

So the dual function is

$\displaystyle g(\lambda)= \inf_{x} \L (x,\lambda) = -b^t \lambda
 + \inf_x (A^t \lambda+c)^t x$ (8.18)

The infimum of a linear function is $ -\infty$, except in the special case where it is identically zero, so the dual function is:

$\displaystyle g(\lambda)=
 \begin{cases}-b^t \lambda \; & A^t \lambda +c =0
 \\  -\infty & otherwise. \end{cases}$ (8.19)

The dual variable $ \lambda $ is dual feasible if $ \lambda \geq 0$ and $ A^t \lambda + c = 0$. The Lagrange dual of the LP 8.16 is to maximize $ g$ over all $ \lambda \geq 0$. We can reformulate this by explicitly including the dual feasibility conditions as constraints, as in:

$\displaystyle \max -b^t \lambda \; \;$    Subject to: $\displaystyle \; \;
 \begin{cases}A^t \lambda +c =0 \\  \lambda \geq 0 \end{cases}$ (8.20)

which is an LP in standard form.
Since the feasible domain is convex and the objective function is also convex, this is a convex problem: the optimal value of this dual problem is thus equal to the optimal value of the primal problem.
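As a numerical sanity check of this zero duality gap, we can solve a small illustrative instance of the inequality-form LP 8.16 and of its dual with `scipy.optimize.linprog` (the instance below is our own example; scipy is not used elsewhere in this text). Per equation 8.19, the dual maximizes $ g(\lambda) = -b^t\lambda$ subject to $ A^t\lambda + c = 0$, $ \lambda \geq 0$, which we hand to `linprog` as a minimization of $ b^t\lambda$:

```python
import numpy as np
from scipy.optimize import linprog

# Illustrative inequality-form LP: min c^T x  s.t.  A x <= b
# (the box 0 <= x <= 1 in R^2, objective -x1 - x2, optimum -2 at (1, 1)).
c = np.array([-1.0, -1.0])
A = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
b = np.array([1.0, 1.0, 0.0, 0.0])

primal = linprog(c, A_ub=A, b_ub=b, bounds=(None, None))

# Dual: max -b^T lam  s.t.  A^T lam = -c, lam >= 0,
# rewritten as min b^T lam for linprog.
dual = linprog(b, A_eq=A.T, b_eq=-c, bounds=(0, None))

print(primal.fun, -dual.fun)  # both equal -2: zero duality gap
```

The two optimal values coincide, as strong duality for LPs predicts.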

A primal-dual Algorithm

Consider an optimization problem in this form:

$\displaystyle \min c^t x \; \;$    Subject to: $\displaystyle \; \; Ax=b, x \geq 0,$    

where $ c, x \in \Re^n, b \in \Re^m, A \in \Re^{m \times n}$. We will now compute the dual problem:

$\displaystyle \L (x,\lambda,s)=$ $\displaystyle c^t x-\lambda^t (Ax-b)-s^t x$    
$\displaystyle g(\lambda,s)=$ $\displaystyle \inf_{x \in {\cal D}} \L (x,\lambda,s)$    

Since the problem is convex, we have ( $ \lambda \in \Re^m$ are the Lagrange multipliers associated to the constraints $ Ax=b$ and $ s \in \Re^n$ are the Lagrange multipliers associated to the constraints $ x \geq 0$):


$\displaystyle g(\lambda,s) = \inf_{x \in {\cal D}} x^t (c-A^t \lambda-s) + b^t \lambda = \begin{cases} b^t \lambda \; & A^t \lambda + s = c \\ -\infty & otherwise \end{cases}$    

$\displaystyle \min_x c^t x \; \;$    Subject to: $\displaystyle \begin{array}{c} Ax=b \\ x \geq 0 \end{array} \;\; = \;\; \max_{\lambda,s} g(\lambda,s) = \max_{\lambda} b^t \lambda \; \;$    Subject to: $\displaystyle \begin{array}{c} A^t \lambda + s=c \\ s \geq 0 \end{array}$    

The associated KKT conditions are:
$\displaystyle A^t \lambda +s$ $\displaystyle =$ $\displaystyle c,$ (8.21)
$\displaystyle Ax$ $\displaystyle =$ $\displaystyle b,$ (8.22)
$\displaystyle x_i s_i$ $\displaystyle =$ $\displaystyle 0, \; \; i=1,\ldots,n \;\;$ (8.23)
    $\displaystyle \hspace{1.5cm} ($the complementarity condition for the constraint $\displaystyle x \geq 0)$  
$\displaystyle (x,s)$ $\displaystyle \geq$ 0 (8.24)

Primal dual methods find solutions $ (x^*,\lambda^*,s^*)$ of this system by applying variants of Newton's method (see section 13.6 for Newton's method for non-linear equations) to the three equalities 8.21-8.23 and modifying the search direction and step length so that inequalities $ (x,s)
\geq 0$ are satisfied strictly at every iteration. The nonnegativity condition is the source of all the complications in the design and analysis of interior-point methods. Let's rewrite equations 8.21-8.24 in a slightly different form:
$\displaystyle F(x,\lambda,s)= \left[ \begin{array}{c} A^t \lambda +s -c \\ Ax-b \\ X S e \end{array}\right] = 0$ (8.25)
$\displaystyle (x,s) \geq 0$ (8.26)

where $ X=diag(x_1,\ldots,x_n)$, $ S=diag(s_1,\ldots,s_n)$ and $ e=(1,\ldots,1)^t$. Primal-dual methods generate iterates $ (x^k, \lambda^k, s^k)$ that satisfy the bounds 8.26 strictly. This property is the origin of the term interior-point.
As mentioned above, the search direction procedure has its origin in Newton's method for the set of nonlinear equations 8.25 (see section 13.6 for Newton's method for non-linear equations). We obtain the search direction $ (\Delta
x,\Delta \lambda, \Delta s)$ by solving the following system of linear equations:

$\displaystyle J(x,\lambda, s) \left[ \begin{array}{l} \Delta x \\
 \Delta \lambda \\  \Delta s \end{array} \right]= -F(x,\lambda,s)$ (8.27)

where $ J$ is the Jacobian of $ F$. If the current point is strictly feasible, we have:

$\displaystyle \left[ \begin{array}{ccc} 0 & A^t & I \\  A & 0 & 0 \\  S & 0 & X \end{array} \right] \left[ \begin{array}{c} \Delta x \\  \Delta \lambda \\  \Delta s \end{array} \right]=\left[ \begin{array}{c} 0 \\  0 \\  -X S e \end{array} \right]$ (8.28)

A full step along this direction is not permissible, since it would violate the bound $ (x,s)>0$. To avoid this difficulty, we perform a line search along the Newton direction so that the new iterate is

$\displaystyle (x,\lambda,s)+\alpha (\Delta x , \Delta \lambda , \Delta s)$ (8.29)

for some line search parameter $ \alpha \in (0,1]$. Unfortunately, we can often take only a small step along this direction (the affine-scaling direction). To overcome this difficulty, primal-dual methods modify the basic Newton procedure in two important ways:
  1. They bias the search direction toward the interior of the nonnegative orthant $ (x,s) \geq 0$, so that we can move further along the direction before one of the components of $ (x,s)$ becomes negative.
  2. They keep the components of $ (x,s)$ from moving "too close" to the boundary of the nonnegative orthant.
We will consider these two modifications in turn in the next subsections.

Central path

The central path $ \cal C$ is an arc of strictly feasible points that is parameterized by a scalar $ \tau >0$. We can define the central path as

$\displaystyle {\cal C}= \{ (x_\tau,\lambda_\tau,s_\tau) \vert
 \tau >0 \}$ (8.30)

where each point $ (x_\tau,\lambda_\tau,s_\tau) \in {\cal C}$ solves the following system, which is a perturbation of the original system 8.25:
$\displaystyle F(x,\lambda,s)= \left[ \begin{array}{c} A^t \lambda +s -c \\ Ax-b \\ X S e - \tau e \end{array}\right] = 0$ (8.31)
$\displaystyle (x,s) \geq 0$ (8.32)

A plot of $ \cal C$ for a typical problem, projected into the space of primal variables $ x$, is shown in figure 8.5.

Figure 8.5: Central Path

Equation 8.31 approximates 8.25 more and more closely as $ \tau$ goes to zero. Primal-dual algorithms take Newton steps toward points on $ {\cal C}$ for which $ \tau >0$. Since these steps are biased toward the interior of the nonnegative orthant $ (x,s) \geq 0$, it is usually possible to take longer steps along them than along pure Newton steps for $ F$ before violating the positivity condition. To describe the biased search direction, we introduce a duality measure $ \mu$ defined by

$\displaystyle \mu=\frac{1}{n}\sum_{i=1}^n x_i s_i= \frac{x^t
 s}{n}$ (8.33)

which measures the average value of the pairwise products $ x_i s_i$. We also define a centering parameter $ \displaystyle \sigma = \frac{\tau}{\mu} \in [0,1]$. Applying Newton's method to the system 8.31, we obtain:

$\displaystyle \left[ \begin{array}{ccc} 0 & A^t & I \\  A & 0 & 0 \\  S & 0 & X \end{array} \right] \left[ \begin{array}{c} \Delta x \\  \Delta \lambda \\  \Delta s \end{array} \right]=\left[ \begin{array}{c} 0 \\  0 \\  -X S e + \sigma \mu e \end{array} \right]$ (8.34)

If $ \sigma=1$, the equations 8.34 define a centering direction, a Newton step toward the point $ (x_\mu,\lambda_\mu,s_\mu) \in {\cal C}$. Centering directions are usually biased strongly toward the interior of the nonnegative orthant and make little progress in reducing the duality measure $ \mu$. However, by moving closer to $ \cal C$, they set the scene for substantial progress on the next iterations. At the other extreme, the value $ \sigma=0$ gives the standard Newton step.
The choices of the centering parameter $ \sigma_k$ and step length $ \alpha_k$ are crucial to the performance of the method. Techniques for controlling these parameters, directly and indirectly, give rise to a wide variety of methods with varying theoretical properties.
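As an illustration, here is a minimal path-following sketch (our own illustration, not code from the references): each iteration solves the linear system 8.34 and then picks a step length keeping $ (x,s)$ strictly positive. The tiny LP instance and the fixed choice $ \sigma = 0.1$ are assumptions made for the demo:

```python
import numpy as np

def pd_step(A, b, c, x, lam, s, sigma=0.1):
    """One primal-dual Newton step on the perturbed KKT system (eq. 8.34)."""
    m, n = A.shape
    mu = x @ s / n                          # duality measure (eq. 8.33)
    J = np.zeros((2 * n + m, 2 * n + m))    # Jacobian of F
    J[:n, n:n + m] = A.T
    J[:n, n + m:] = np.eye(n)
    J[n:n + m, :n] = A
    J[n + m:, :n] = np.diag(s)
    J[n + m:, n + m:] = np.diag(x)
    rhs = np.concatenate([c - A.T @ lam - s,            # dual residual
                          b - A @ x,                    # primal residual
                          -x * s + sigma * mu * np.ones(n)])
    d = np.linalg.solve(J, rhs)
    dx, dlam, ds = d[:n], d[n:n + m], d[n + m:]
    alpha = 1.0                  # fraction-to-boundary rule: keep (x, s) > 0
    for v, dv in ((x, dx), (s, ds)):
        neg = dv < 0
        if neg.any():
            alpha = min(alpha, 0.9995 * np.min(-v[neg] / dv[neg]))
    return x + alpha * dx, lam + alpha * dlam, s + alpha * ds

# Tiny assumed LP: min x1 + 2*x2  s.t.  x1 + x2 = 1, x >= 0 (optimum x = (1, 0)).
A = np.array([[1.0, 1.0]]); b = np.array([1.0]); c = np.array([1.0, 2.0])
x = np.array([0.5, 0.5]); lam = np.zeros(1); s = c.copy()  # strictly feasible start
for _ in range(50):
    x, lam, s = pd_step(A, b, c, x, lam, s)
    if x @ s / x.size < 1e-10:   # stop once the duality measure is tiny
        break
print(c @ x)  # close to the optimal value 1
```

Note how the iterates stay strictly inside $ (x,s) > 0$ while $ \mu$ is driven toward zero, which is exactly the behavior described above.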

Link between Barrier method and Interior point method

In this section we want to find the solution of:

$\displaystyle f(x^*)= \min_x f(x) \; \;$    Subject to: $\displaystyle c_i(x) \geq 0 , \; i=1,\ldots,m$ (8.35)

There exists a value $ \lambda^* \in \Re^m$ of the dual variables and a value $ x^* \in \Re^n$ of the primal variables which satisfy the KKT conditions:
$\displaystyle \nabla f(x^*) - \sum_{i=1}^m \lambda_i^* \nabla c_i(x^*)$ $\displaystyle =$ 0 (8.36)
$\displaystyle c_i(x^*)$ $\displaystyle \geq$ 0 (8.37)
$\displaystyle \lambda_i^* c_i(x^*)$ $\displaystyle =$ 0 (8.38)
$\displaystyle \lambda^*$ $\displaystyle \geq$ 0 (8.39)

Equations 8.36-8.39 are comparable to equations 8.21-8.24. These are the base equations for the primal-dual iterations. In the previous section, we motivated the following perturbation of these equations:
$\displaystyle \nabla f(x^*) - \sum_{i=1}^m \lambda_i^* \nabla c_i(x^*)$ $\displaystyle =$ 0 (8.40)
$\displaystyle c_i(x)$ $\displaystyle \geq$ 0 (8.41)
$\displaystyle \lambda_i^* c_i(x^*)$ $\displaystyle =$ $\displaystyle \tau$ (8.42)
$\displaystyle \lambda^*$ $\displaystyle \geq$ 0 (8.43)

Let's now consider the following related optimization problem (the barrier problem):

$\displaystyle \varphi_t(x) := f(x) - t \sum_{i=1}^m \ln (c_i(x)) \;$    with $\displaystyle t\rightarrow 0$ (8.44)

For a given $ t$, at the optimum $ x^*$ of equation 8.44, we have:

$\displaystyle 0=\nabla \varphi_t(x^*)= \nabla f(x^*)- \sum_{i=1}^m
 \frac{t }{c_i(x^*)} \nabla c_i(x^*)$ (8.45)

If we define

$\displaystyle \lambda_i^*=\frac{t}{c_i(x^*)}$ (8.46)

We will now prove that $ \lambda^*$ is dual feasible. First, it's clear that $ \lambda^* \geq 0$, since $ c_i(x^*) > 0$: $ x^*$ lies strictly inside the feasible domain. We now have to prove that $ g(\lambda^*) \rightarrow f(x^*)$ as $ t \rightarrow 0$. Let's compute the value of the dual function of 8.35 at $ \lambda^*$:

$\displaystyle \L (x,\lambda) =$ $\displaystyle f(x)- \sum_{i=1}^m \lambda_i c_i(x)$    
$\displaystyle g(\lambda^*) =$ $\displaystyle f(x^*)- \sum_{i=1}^m \lambda_i^* c_i(x^*)$    

Using equation 8.46:


$\displaystyle g(\lambda^*) =$ $\displaystyle f(x^*)- m t$ (8.47)

When $ t \rightarrow 0$, we obtain $ g(\lambda^*)=f(x^*)=p^*$, which concludes the proof. The quantity $ m t$ is the duality gap.
Let's define $ p^*$ as the minimum value of the original problem 8.35, and $ (x^*,\lambda^*)$ as the solution of $ \min_x \varphi_t(x)$ for a given value of $ t$. From equation 8.47, we have the following interesting result: $ f(x^*)-p^* \leq m t$. That is, $ x^*$ is no more than $ m t$-suboptimal.
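As a quick check of this bound, take the hypothetical one-dimensional problem $ \min x^2$ subject to $ x - 1 \geq 0$ (our own illustration; $ m=1$, $ p^*=1$). Setting $ \varphi_t'(x)=2x-t/(x-1)=0$ gives the closed-form barrier minimizer $ x_t=(1+\sqrt{1+2t})/2$:

```python
import numpy as np

# Hypothetical instance: min x^2  s.t.  x - 1 >= 0, so m = 1 and p* = 1.
# phi_t(x) = x^2 - t*ln(x - 1); phi_t'(x) = 0 gives 2x^2 - 2x - t = 0,
# whose feasible root (x > 1) is x_t = (1 + sqrt(1 + 2t)) / 2.
for t in [1.0, 0.1, 0.01, 0.001]:
    x_t = (1.0 + np.sqrt(1.0 + 2.0 * t)) / 2.0
    gap = x_t**2 - 1.0                 # f(x_t) - p*
    assert 0.0 < gap <= t              # the m*t suboptimality bound (m = 1)
    print(t, gap)
```

The printed gaps track $ t$ closely, showing that the $ m t$ bound is nearly tight on this instance.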
What's the link between equations 8.40-8.43, which are used inside the primal-dual algorithm, and the barrier method? We see that if we combine equations 8.45 and 8.46, we obtain 8.40. The equations 8.42 and 8.46 are the same, except that $ t=\tau $. The ``perturbation parameter'' $ \tau$ of primal-dual algorithms is simply the barrier parameter $ t$ of barrier algorithms. In fact, barrier methods and primal-dual methods are very close. The main difference is that primal-dual methods update the primal variables $ x$ AND the dual variables $ \lambda $ at each iteration, while barrier methods update only the primal variables.

A final note on primal-dual algorithms.

The primal-dual algorithms usually follow this scheme:
  1. Set $ l:=0$
  2. Solve approximately the barrier problem 8.44 for a given value of the centering parameter $ \sigma_l$ (see equation 8.34 about $ \sigma$), using as starting point the solution of the previous iteration.
  3. Update the barrier parameter $ \sigma$. For example: $ \sigma_{l+1} := \min \{\; 0.2 \; \sigma_l, \;\sigma_l^{1.5} \; \} $
    Increase the iteration counter: $ l:=l+1$
    If not finished, go to step 2.
These kinds of algorithms are useful when the number of constraints is huge. In that case, identifying the set of active constraints can be very time consuming (it's a combinatorial problem which can really explode). Barrier methods and primal-dual methods completely avoid this problem. In fact, most recent algorithms for linear programming and for quadratic programming are based on primal-dual methods for this reason. The main field of application of primal-dual methods is thus optimization in very large dimensions (possibly more than 10000 variables) with many constraints (possibly more than 6000000). In our case, since we have only a reduced number of constraints, we can use an active set method without any problem.

SQP Methods

The material of this section is based on the following references: [NW99,Fle87].
SQP stands for ``Sequential Quadratic Programming''. We want to find the solution of:

$\displaystyle f(x^*)= \min_x f(x) \; \;$    Subject to: $\displaystyle c_i(x) \geq 0 , \; i=1,\ldots,m$ (8.48)

As with Newton's method in unconstrained optimization, we will build a quadratic approximation of the function to optimize and search for the minimum of this quadratic. The function to be approximated is the Lagrangian function $ \L $.
The quadratic approximation of $ \L $ is:
$\displaystyle \L (x_k+\delta_x,\lambda_k+\delta_\lambda) \approx {\cal Q}(\delta_x,\delta_\lambda) = \L (x_k,\lambda_k)+ \nabla \L (x_k,\lambda_k)^t \left( \begin{array}{c}\delta_x \\ \delta_\lambda \end{array} \right) + \frac{1}{2} \left( \begin{array}{cc}\delta_x^t & \delta_\lambda^t \end{array} \right) \nabla^2 \L (x_k,\lambda_k) \left( \begin{array}{c}\delta_x \\ \delta_\lambda \end{array} \right)$    

$\displaystyle \Longleftrightarrow {\cal Q}(\delta_x,\delta_\lambda) = \L (x_k,\lambda_k)+ \left( \begin{array}{cc} (g_k-A_k^t \lambda_k)^t & -c(x_k)^t \end{array}\right) \left( \begin{array}{c} \delta_x \\ \delta_\lambda \end{array}\right)+ \frac{1}{2} \left( \begin{array}{cc} \delta_x^t & \delta_\lambda^t \end{array} \right) \left[ \begin{array}{cc} H_k & -A_k^t \\ -A_k & 0 \end{array} \right] \left( \begin{array}{c} \delta_x \\ \delta_\lambda \end{array} \right)$ (8.49)

with $ A_k$ the Jacobian matrix of the constraints evaluated at $ x_k$: $ \displaystyle (A_k)_{i,j}= \frac{\partial c_i(x_k)}{\partial x_j} \; \;$ or, row-wise, $ \; \; \displaystyle (A_k)_{i}= \nabla c_i(x_k)^t$,
and $ H_k$ the Hessian matrix of the Lagrangian function: $ \displaystyle H_k= \nabla_x^2 \L (x_k, \lambda_k) = \nabla_x^2 f(x_k)- \sum_i (\lambda_{k})_i \nabla^2 c_i(x_k) $

The full Hessian $ W_k$ of $ \L $ is thus:
$\displaystyle W_k=\left[ \begin{array}{cc} H_k & -A_k^t \\  -A_k & 0 \end{array} \right] \;\; (= \nabla^2 \L (x_k,\lambda_k))$ (8.50)


If we are on the boundary of the $ i^{th}$ constraint, we have $ (\delta_x)^t \nabla c_i(x) =0$; thus we can write:

on the constraints boundaries: $\displaystyle A_k \delta_x \approx 0$ (8.51)


We want to find the $ \delta_x$ which minimizes $ {\cal Q}(\delta_x,\delta_\lambda)$, subject to the constraints $ c_i(x) \geq 0, \: i=1,\ldots,m $.
From equation 8.49 and using equation 8.51, we obtain:

$\displaystyle \min_{\delta_x} {\cal Q}(\delta_x,\delta_\lambda) \; \;$    subject to $\displaystyle \; c_j(x_k+\delta_x) \geq 0, \: j=1,\ldots,m$ (8.52)

Using a first order approximation of the constraints around $ x_k$, we obtain the

Quadratic Program QP:
$\displaystyle \min_{\delta_x} \; g_k^t \delta_x + \frac{1}{2} \delta_x^t H_k \delta_x \; \;$    subject to $\displaystyle \; c_j(x_k) + (\delta_x)^t \nabla c_j(x_k) \geq 0, \; \; j=1,\ldots,m$ (8.53)




DEFINITION: a Quadratic Program (QP) is an optimization problem which finds the minimum of a quadratic function subject to linear constraints.
Note that $ H_k= \nabla_x^2 \L (x_k,\lambda_k) \neq \nabla^2 f(x_k)$.
We can now define the SQP algorithm:
  1. Solve the QP subproblem described in equation 8.53 to determine $ \delta_x$, and let $ \lambda_{k+1}$ be the vector of the Lagrange multipliers of the linear constraints obtained from the QP.
  2. Set $ x_{k+1}=x_k+\delta_x$
  3. Increment $ k$. Stop if $ \nabla \L (x_k,\lambda_k)\approx 0$. Otherwise, go to step 1.
This algorithm is the generalization of Newton's method to the constrained case, and it has the same properties: near the optimum (and under some special conditions), it converges quadratically.
A more robust implementation of the SQP algorithm adjusts the length of the steps and performs ``second order correction steps'' (or ``SOC steps'').

A small note about the H matrix in the SQP algorithm.

As already mentioned, we have $ H_k= \nabla_x^2 \L (x_k,\lambda_k) = \nabla_x^2 f(x_k)- \sum_i (\lambda_{k})_i \nabla^2 c_i(x_k) \neq \nabla^2 f(x_k)$.
The inclusion of the second order constraint terms in the subproblem is important in that otherwise second order convergence for nonlinear constraints would not be obtained. This is well illustrated by the following problem:

$\displaystyle \begin{array}{l} \text{minimize } - x_1 - x_2 \\  \text{subject to } 1 - x_1^2-x_2^2 \geq 0 \end{array}$ (8.54)

in which the objective function is linear so that it is only the curvature of the constraint which causes a solution to exist. In this case the sequence followed by the SQP algorithm is only well-defined if the constraint curvature terms are included.
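To illustrate on problem 8.54, here is a minimal SQP sketch (our own illustration, written under the assumption that the single constraint is active at every iterate, so each QP subproblem reduces to an equality-constrained QP solved through its KKT system):

```python
import numpy as np

# Problem (8.54): min -x1 - x2  subject to  1 - x1^2 - x2^2 >= 0.
# The objective is linear, so all curvature comes from the constraint:
# H_k = grad^2 f - lam * grad^2 c = 0 - lam*(-2I) = 2*lam*I.
def sqp(x, lam, iters=15):
    for _ in range(iters):
        g = np.array([-1.0, -1.0])      # gradient of the objective
        a = -2.0 * x                    # gradient of c(x) = 1 - ||x||^2
        c = 1.0 - x @ x
        H = 2.0 * lam * np.eye(2)       # Hessian of the Lagrangian
        # KKT system of the QP subproblem (constraint treated as active):
        #   H d - lam_new * a = -g,   a^T d = -c
        K = np.block([[H, -a[:, None]], [a[None, :], np.zeros((1, 1))]])
        sol = np.linalg.solve(K, np.concatenate([-g, [-c]]))
        x, lam = x + sol[:2], sol[2]
    return x, lam

x, lam = sqp(np.array([0.5, 0.5]), 1.0)
print(x, lam)  # x -> (0.7071, 0.7071), lam -> 0.7071 (both 1/sqrt(2))
```

Without the constraint curvature term ($ H_k = 0$ here), the KKT matrix of each subproblem would be singular and the iteration would be undefined, which is exactly the point made above.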
We can obtain a good approximation of this matrix using an extension of the BFGS update to the constrained case.

A final note about the SQP algorithm.

The following table compares the number of function evaluations and gradient evaluations required to solve Colville's (1969) [Col73] first three problems and gives some idea of the relative performances of the algorithms.

Problem | Extrapolated barrier function | Multiplier penalty function | SQP method (Powell, 1978a) [Pow77]
--------|-------------------------------|-----------------------------|------------------------------------
TP1     | 177                           | 47                          | 6
TP2     | 245                           | 172                         | 17
TP3     | 123                           | 73                          | 3
These excellent results make the SQP method very attractive.
Frank Vanden Berghen 2004-04-19