The SQP algorithm

The material of this section is based on the following references: [NW99,Fle87,PT93].
The SQP algorithm is:
  1. set $ k:=1$
  2. Solve the QP subproblem described in equation 8.53 to determine $ \delta_k$ and let $ \lambda_{k+1}$ be the vector of the Lagrange multipliers of the linear constraints obtained from the QP.
  3. Compute the length $ \alpha_k$ of the step and set $ x_{k+1}=x_k+ \alpha_k \delta_k$
  4. Compute $ H_{k+1}$ from $ H_k$ using a quasi-Newton formula
  5. Increment $ k$. Stop if $ \nabla_x \L (x_k,\lambda_k)\approx 0$. Otherwise, go to step 2.
We will now give a detailed discussion of each step. The QP subproblem has been described in the previous section.

Length of the step.

From the QP, we obtain a search direction $ \delta_k$. To make the code more robust, we apply the first Wolfe condition (see equation 13.4), which is recalled here:

$\displaystyle f(\alpha)
 \leq f(0) + \rho \alpha f'(0) \; \; \; \;\; \; \;\;\; \; \;\rho
 \in (0, \frac{1}{2})$ (9.18)

where $ f(\alpha)=f(x_k+\alpha \delta_k)$. This condition ensures a sufficient reduction of the objective function at each step. Unfortunately, in the constrained case, a sufficient reduction of the objective function is not enough: we must also ensure a reduction of the infeasibility. Therefore, we will use a modified form of the first Wolfe condition where $ f(\alpha)=\phi(x_k+\alpha
\delta_k)$ and $ \phi(x)$ is a merit function. We will use in the optimizer the $ l_1$ merit function:

$\displaystyle \phi (x)=f(x)+
 \frac{1}{\mu} \Vert c(x) \Vert _1$ (9.19)

where $ \mu
>0 $ is called the penalty parameter. A suitable value for the penalty parameter is obtained by choosing a constant $ K>0$ and defining $ \mu$ at every iteration to be

$\displaystyle \mu^{-1}= \Vert
 \lambda_{k+1} \Vert _\infty + K$ (9.20)
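As an illustration, here is a small Python sketch (not the optimizer's actual code) of the $ l_1$ merit function 9.19 and of the penalty-parameter rule 9.20. For inequality constraints $ c_i(x) \geq 0$, the $ l_1$ norm is taken over the violated constraints only; the functions `f`, `c` and the multiplier vector used below are hypothetical stand-ins.

```python
import numpy as np

def l1_merit(f, c, x, mu):
    """phi(x) = f(x) + (1/mu) * ||c(x)||_1, counting only the violated
    inequality constraints c_i(x) >= 0 (the set F_k defined below)."""
    violation = np.maximum(-c(x), 0.0)   # |c_i(x)| where c_i(x) < 0
    return f(x) + np.sum(violation) / mu

def penalty_parameter(lam, K=1.0):
    """mu^{-1} = ||lambda_{k+1}||_inf + K  (equation 9.20)."""
    return 1.0 / (np.linalg.norm(lam, np.inf) + K)

# toy problem: f(x) = x0^2 + x1^2 with one constraint c(x) = x0 - 1 >= 0
f = lambda x: float(x[0]**2 + x[1]**2)
c = lambda x: np.array([x[0] - 1.0])

mu = penalty_parameter(np.array([2.0]), K=1.0)   # mu = 1/3
print(l1_merit(f, c, np.zeros(2), mu))           # 3.0: f = 0, violation = 1, 1/mu = 3
```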

In equation 9.18, we must calculate $ f'(0)$: the directional derivative of $ \phi$ in the direction $ \delta_k$ at $ x_k$:

$\displaystyle f'(0)= D_{\delta_k} \phi (x_k)= \delta_k^t g_k - \frac{1}{\mu}
 \left( \sum_{i \in {\cal F}_k} \delta_k^t a_i + \sum_{i \in {\cal
 Z}_k} \max ( -\delta_k^t a_i, 0 ) \right)$ (9.21)

where $ {\cal F}_k=
\{ i: c_i(x_k)<0 \}$ , $ {\cal Z}_k= \{ i: c_i(x_k)=0 \}$ , $ g_k=
\nabla f(x_k)$ , $ a_i= \nabla c_i(x_k) $. The algorithm which computes the length of the step is thus:
  1. Set $ \alpha_k:=1$, $ x_k:=$ current point, $ \delta_k:=$ search direction
  2. Test the Wolfe condition of equation 9.18: is $ \phi(x_k+\alpha_k \delta_k) \leq \phi(x_k) +
\rho \alpha_k D_{\delta_k} \phi (x_k)$? If it fails, reduce $ \alpha_k$ (e.g., halve it) and go back to step 2.
  3. We have successfully computed the length $ \Vert \alpha_k
\delta_k\Vert$ of the step.
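The step-length loop above can be sketched in Python as a simple backtracking search. The halving factor, the value of $ \rho$ and the iteration cap are arbitrary choices, not fixed by the algorithm:

```python
import numpy as np

def step_length(phi, x, delta, dphi0, rho=1e-4, alpha=1.0, max_halvings=30):
    """Backtracking search: start from alpha = 1 and halve until the
    Wolfe-type condition (eq. 9.18) holds on the merit function phi.
    dphi0 is the directional derivative D_{delta_k} phi(x_k)."""
    phi0 = phi(x)
    for _ in range(max_halvings):
        if phi(x + alpha * delta) <= phi0 + rho * alpha * dphi0:
            return alpha        # sufficient decrease achieved
        alpha *= 0.5            # reject the step, shrink it
    return alpha

# toy check on phi(x) = x^2 from x = 1 along delta = -1 (dphi0 = -2):
alpha = step_length(lambda x: float(x**2), 1.0, -1.0, -2.0)
print(alpha)   # the full step alpha = 1 is accepted here
```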

Maratos effect: The SOC step.

Figure 9.5: Maratos effect.

The $ l_1$ merit function $ \phi (x)=f(x)+ \frac{1}{\mu} \Vert c(x)
\Vert _1 $ can sometimes reject steps (i.e., give $ \alpha_k\approx 0$) that are making good progress towards the solution. This phenomenon is known as the Maratos effect. It is illustrated by the following example:

$\displaystyle \min_{x,y} f(x,y)= 2 (x^2+y^2-1)-x, \; \; \; \;$    Subject to $\displaystyle \; \; x^2+y^2-1=0$ (9.22)

The optimal solution is $ x^*=(1,0)^t$. The situation is illustrated in figure 9.5. The SQP method moves ( $ \delta_1=(1,0)^t$) from $ x_1=(0,1)^t$ to $ x_2=(1,1)^t$.
In this example, a step $ \delta_k$ along the constraint will always be rejected ( $ \alpha_k\approx 0$) by the $ l_1$ merit function $ \phi$. If no measures are taken, the Maratos effect can dramatically slow down SQP methods. To avoid it, we can use a second-order correction step (SOC), in which we add to $ \delta_k$ a step $ {\delta_k}'$ which is computed at $ c(x_k+\delta_k)$ and which provides a sufficient decrease in the constraints. Another solution is to allow the merit function $ \phi$ to increase on certain iterations (watchdog, non-monotone strategy).
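The rejection is easy to verify numerically. In this Python check on example 9.22 ($ \mu=1$ is an arbitrary choice), the full step from $ x_1$ to $ x_2$ increases both $ f$ and the constraint violation, so the $ l_1$ merit function rejects it for any $ \mu>0$:

```python
import numpy as np

f   = lambda x: 2.0 * (x[0]**2 + x[1]**2 - 1.0) - x[0]   # objective of 9.22
c   = lambda x: x[0]**2 + x[1]**2 - 1.0                  # equality constraint
phi = lambda x, mu: f(x) + abs(c(x)) / mu                # l1 merit function

x1 = np.array([0.0, 1.0])
x2 = np.array([1.0, 1.0])            # x1 + delta_1 with delta_1 = (1, 0)
print(phi(x1, 1.0))                  # 0.0
print(phi(x2, 1.0))                  # 2.0 -> the full step is rejected
```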
Suppose we have found the SQP step $ \delta_k$. $ \delta_k$ is the solution of the following QP problem:

\begin{displaymath}\begin{array}{l} \displaystyle \min_\delta g_k^t \delta +
 \frac{1}{2} \delta^t H_k \delta \\
 \textrm{subject to: } \; c_i(x_k)+ \nabla c_i(x_k)^t \delta \geq
 0 \;\;\; i=1,\ldots,m
 \end{array}\end{displaymath} (9.23)

where we have used a linear approximation of the constraints at the point $ x_k$ to find $ \delta_k$. Suppose this first-order approximation of the constraints is poor. It would be better to replace $ \delta_k$ with $ s_k$, the solution of the following problem, in which a quadratic approximation of the constraints is used:

  $\displaystyle \min_s g_k^t s + \frac{1}{2} s^t H_k s$ (9.24)
  $\displaystyle \displaystyle$   subject to:$\displaystyle \; \; c_i(x_k)+ \nabla
 c_i(x_k)^t s + \frac{1}{2} s^t \nabla^2 c_i(x_k) s \geq 0 \;\;\;
 i=1,\ldots,m$ (9.25)

but this is not practical, even if the Hessians of the constraints are individually available, because the subproblem becomes very hard to solve. Instead, we evaluate the constraints at the new point $ x_k+\delta_k$ and make use of the following approximation. Ignoring third-order terms, we have:

$\displaystyle c_i(x_k+\delta_k)= c_i(x_k)+ \nabla c_i(x_k)^t \delta_k
 + \frac{1}{2} \delta_k^t \nabla^2 c_i(x_k) \delta_k$ (9.26)

We will assume that, near $ x_k$, we have:

$\displaystyle \delta_k^t \nabla^2 c_i(x_k) \delta_k \approx
 s^t \nabla^2 c_i(x_k) s \; \; \; \; \; \; \forall s \; \;$    small (9.27)

Using 9.27 inside 9.26, we obtain:

$\displaystyle \frac{1}{2} s^t \nabla^2 c_i(x_k) s
 \approx c_i(x_k+\delta_k)- c_i(x_k) - \nabla c_i(x_k)^t \delta_k$ (9.28)

Using 9.28 inside 9.25, we obtain:

$\displaystyle c_i(x_k)+$ $\displaystyle \nabla c_i(x_k)^t s +
 \frac{1}{2} s^t \nabla^2 c_i(x_k) s$    
  $\displaystyle \approx
 c_i(x_k)+ \nabla c_i(x_k)^t s + \Big( c_i(x_k+\delta_k)- c_i(x_k)
 - \nabla c_i(x_k)^t \delta_k \Big)$    
  $\displaystyle =
 \Big(c_i(x_k+\delta_k) - \nabla c_i(x_k)^t \delta_k \Big) + \nabla
 c_i(x_k)^t s$ (9.29)

Combining 9.24 and 9.29, we have:

\begin{displaymath}\begin{array}{l} \displaystyle \min_s
 g_k^t s + \frac{1}{2} s^t H_k s \\
 \textrm{subject to: } \; \Big(c_i(x_k+\delta_k) - \nabla c_i(x_k)^t \delta_k \Big) + \nabla c_i(x_k)^t s \geq 0
 \;\;\; i=1,\ldots,m \end{array}\end{displaymath} (9.30)

Let us define $ s_k$ as the solution to problem 9.30. What we really want is $ {\delta_k}'=s_k-\delta_k \Leftrightarrow
s_k=\delta_k+{\delta_k}'$. Using this last equation inside 9.30, we obtain that $ {\delta_k}'$ is the solution of

\begin{displaymath}\begin{array}{l} \displaystyle \min_{{\delta}'} g_k^t
 (\delta_k+{\delta}') + \frac{1}{2} (\delta_k+{\delta}')^t H_k (\delta_k+{\delta}') \\
 \textrm{subject to: } \; c_i(x_k+\delta_k) + \nabla c_i(x_k)^t {\delta}' \geq 0 \;\;\; i=1,\ldots,m
 \end{array}\end{displaymath}    

If we assume that $ \nabla f(x_k+\delta_k) \approx \nabla f(x_k)$ (which implies $ H_k \delta_k \approx 0 $, and is equivalent to assumption 9.27), we obtain:

\begin{displaymath}\begin{array}{l} \displaystyle \min_{{\delta}'}
 g_k^t {\delta}' + \frac{1}{2} {{\delta}'}^t H_k {\delta}' \\
 \textrm{subject to: } \; c_i(x_k+\delta_k) + \nabla c_i(x_k)^t {\delta}' \geq 0\;\;\; i=1,\ldots,m
 \end{array}\end{displaymath} (9.31)

which is similar to the original equation 9.23, except that the constant terms of the constraints are evaluated at $ x_k+\delta_k$ instead of $ x_k$. In other words, there has been a small shift of the constraints (see illustration 9.4). We will assume that the active sets of the QP described by 9.23 and of the QP described by 9.31 are the same. Using the notation of section 9.1.1, we have:

$\displaystyle {\delta_k}' =-Y b' \; \;\;$   where $\displaystyle {b_i}'= c_i(x_k+\delta_k) \;\;\; i=1,\ldots,m$ (9.32)

where $ Y$ is the matrix calculated during the first QP 9.23, which is used to compute $ \delta_k$. The SOC step is thus not computationally intensive: all we need is a new evaluation of the active constraints at $ x_k+\delta_k$. The SOC step is illustrated in figures 9.4 and 9.5. It is a shift perpendicular to the active constraints, with length proportional to the amplitude of the constraint violation. Using classical notation, the SOC step is:

$\displaystyle {\delta_k}'= -A_k^t (A_k
 A_k^t)^{-1} c(x_k+\delta_k)$ (9.33)

where $ A_k$ is the Jacobian matrix of the active constraints at $ x_k$.
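Equation 9.33 can be sketched in a few lines of Python (a generic illustration, not the optimizer's actual code). The numerical values below reuse example 9.22 at the point $ x_2=(1,1)^t$, where $ c(x_2)=1$ and $ \nabla c(x_2)=(2,2)^t$:

```python
import numpy as np

def soc_step(A, c_new):
    """Second-order correction (eq. 9.33):
    delta' = -A^t (A A^t)^{-1} c(x_k + delta_k),
    the minimum-norm shift that cancels the linearized violation of the
    active constraints.  A is the m x n Jacobian of the active
    constraints at x_k; c_new holds their values at x_k + delta_k."""
    y = np.linalg.solve(A @ A.T, c_new)   # solve (A A^t) y = c(x_k + delta_k)
    return -A.T @ y

# example 9.22 at x_2 = (1, 1): A = (2x, 2y) = (2, 2), c(x_2) = 1
A = np.array([[2.0, 2.0]])
print(soc_step(A, np.array([1.0])))       # -> (-0.25, -0.25), back towards the circle
```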

Update of $ H_k$

$ H_k$ is an approximation of the Hessian matrix of the Lagrangian of the optimization problem:

$\displaystyle H_k \approx \nabla_x^2 \L (x_k,\lambda_k)=\nabla_x^2 f(x_k)- \sum_i
 (\lambda_{k})_i \nabla^2 c_i(x_k)$ (9.34)

The QP problem assumes that $ H_k$ is positive definite. To obtain a positive definite approximation of $ \nabla_x^2 \L $ we will use the damped BFGS update for SQP (with $ H_1=I$):
$\displaystyle s_k$ $\displaystyle :=$ $\displaystyle x_{k+1}- x_k$  
$\displaystyle y_k$ $\displaystyle :=$ $\displaystyle \nabla_x \L (x_{k+1},\lambda_{k+1}) - \nabla_x \L (x_k,\lambda_{k+1})$  
$\displaystyle \theta_k$ $\displaystyle :=$ \begin{displaymath}\begin{cases}1 & \displaystyle \text{if } \; \;
s_k^t y_k \geq 0.2 \, s_k^t H_k s_k \\ \displaystyle \frac{0.8 \, s_k^t H_k s_k}{s_k^t H_k s_k - s_k^t
y_k} & \text{otherwise} \end{cases}\end{displaymath}  
$\displaystyle r_k$ $\displaystyle :=$ $\displaystyle \displaystyle \theta_k y_k + (1-\theta_k) H_k s_k$  
$\displaystyle H_{k+1}$ $\displaystyle :=$ $\displaystyle H_k - \frac{H_k s_k s_k^t H_k}{s_k^t H_k
s_k}+\frac{r_k r_k^t}{s_k^t r_k}$ (9.35)

The formula 9.35 is simply the standard BFGS update formula, with $ y_k$ replaced by $ r_k$. It guarantees that $ H_{k+1}$ is positive definite.
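The update 9.35 translates directly into Python. This sketch uses the constants 0.2 and 0.8 of the standard damped (Powell) BFGS formula; the toy vectors below are chosen so that $ s_k^t y_k < 0$, the case the damping is designed for:

```python
import numpy as np

def damped_bfgs(H, s, y):
    """Damped BFGS update (eq. 9.35): the standard BFGS formula with
    y_k replaced by r_k = theta_k y_k + (1 - theta_k) H_k s_k, which
    keeps H positive definite even when s^t y <= 0."""
    sHs = s @ H @ s
    sy = s @ y
    theta = 1.0 if sy >= 0.2 * sHs else 0.8 * sHs / (sHs - sy)
    r = theta * y + (1.0 - theta) * (H @ s)
    Hs = H @ s
    return H - np.outer(Hs, Hs) / sHs + np.outer(r, r) / (s @ r)

H = np.eye(2)
s = np.array([1.0, 0.0])
y = np.array([-1.0, 0.0])                   # s^t y < 0: damping kicks in
H1 = damped_bfgs(H, s, y)
print(np.all(np.linalg.eigvalsh(H1) > 0))   # True: H1 stays positive definite
```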

Stopping condition

All the tests are in the form:

$\displaystyle \Vert r \Vert \leq (1+ \Vert V \Vert)\tau$ (9.36)

where $ r$ is a residual and $ V$ is a related reference value.
We will stop in the following conditions:
  1. The length of the step is very small.
  2. The maximum number of iterations is reached.
  3. The current point is inside the feasible space, all the values of $ \lambda $ are positive or null, and the length of the step is small.
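As a sketch, the test of equation 9.36 is a one-liner; the tolerance $ \tau=10^{-6}$ below is an arbitrary choice, and `r` and `V` stand for whichever residual and reference value each of the three conditions above uses:

```python
import numpy as np

def small_enough(r, V, tau=1e-6):
    """Relative test of eq. 9.36: ||r|| <= (1 + ||V||) * tau.
    Behaves like an absolute test when V is near zero and like a
    relative test when V is large."""
    return np.linalg.norm(r) <= (1.0 + np.linalg.norm(V)) * tau

print(small_enough(np.array([1e-7]), np.array([0.0])))   # True (absolute regime)
print(small_enough(np.array([1e-3]), np.array([1e3])))   # True (relative regime)
```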

The SQP in detail

The SQP algorithm is:
  1. set $ k:=1$
  2. If termination test is satisfied then stop.
  3. Solve the QP subproblem described in equation 8.53 to determine $ \delta_k$ and let $ \lambda_{k+1}$ be the vector of the Lagrange multipliers of the linear constraints obtained from the QP.
  4. Choose $ \mu_k$ such that $ \delta_k$ is a descent direction for $ \phi$ at $ x_k$, using equation 9.20.
  5. Set $ \alpha_k:=1$
  6. Test the Wolfe condition of equation 9.18: is $ \phi(x_k+\alpha_k \delta_k) \leq \phi(x_k) + \rho \alpha_k D_{\delta_k} \phi (x_k)$? If it fails, reduce $ \alpha_k$ and repeat this step; otherwise set $ x_{k+1}=x_k+\alpha_k \delta_k$.
  7. Compute $ H_{k+1}$ from $ H_k$ using a quasi-Newton formula
  8. Increment $ k$. Stop if $ \nabla_x \L (x_k,\lambda_k)\approx 0$; otherwise, go to step 2.
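To show the loop end to end, here is a minimal Python sketch on example 9.22 (one equality constraint). It is a simplified illustration, not the optimizer's code: the QP subproblem collapses to its KKT linear system, $ H_k$ is frozen at the identity (which happens to equal the exact Lagrangian Hessian at $ x^*$ for this example), and full steps $ \alpha_k=1$ are taken; a production code would add the merit-function line search, the SOC step and the damped BFGS update described above.

```python
import numpy as np

# example 9.22: min 2(x^2 + y^2 - 1) - x  subject to  x^2 + y^2 - 1 = 0
f_grad = lambda x: np.array([4.0 * x[0] - 1.0, 4.0 * x[1]])
c_val  = lambda x: x[0]**2 + x[1]**2 - 1.0
c_grad = lambda x: np.array([2.0 * x[0], 2.0 * x[1]])

x = np.array([0.0, 1.0])                 # starting point x_1 of figure 9.5
for k in range(20):
    g, a, c = f_grad(x), c_grad(x), c_val(x)
    # KKT system of the QP subproblem (H_k = I):
    #   [ I   -a ] [ delta  ]   [ -g ]
    #   [ a^t  0 ] [ lambda ] = [ -c ]
    K = np.block([[np.eye(2), -a.reshape(2, 1)],
                  [a.reshape(1, 2), np.zeros((1, 1))]])
    sol = np.linalg.solve(K, np.append(-g, -c))
    x = x + sol[:2]                      # full step alpha_k = 1

print(np.round(x, 6))                    # -> close to x* = (1, 0)
```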

Frank Vanden Berghen 2004-04-19