Conic Optimization

Introduction

This note introduces some central ideas behind the implementation of efficient interior-point methods for solving conic programs. A large portion of the theory is based on the papers for the conic solvers Clarabel [1] (you can play around with a few examples using a Wasm-compiled Clarabel here) and CVXOPT [2]. What this note adds are some essential implementation details that were left out of the original two papers (e.g. iterative refinement, matrix equilibration, and handling of the structure of the scaling matrices).

Primal and dual problems

We will in this text consider optimization problems where the objective function is at most quadratic and the constraints are conic (including the zero cone in order to model equality constraints). The general form of such (primal) problems is the following

$$\begin{aligned} \min_{x \in \mathbb{R}^n} \quad & \tfrac{1}{2} x^\top P x + q^\top x \\ \text{s.t.} \quad & Ax + s = b \\ & s \in \mathcal{K} \end{aligned}$$

where $x$ is the decision variable, $s$ is the slack variable, and $P \in \mathbb{S}^n_+$, $q \in \mathbb{R}^n$, $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$. Furthermore, $\mathcal{K}$ is a composite cone, i.e. the Cartesian product of cones ($\mathcal{K} = \mathcal{K}_1 \times \cdots \times \mathcal{K}_p$), with each of the cones $\mathcal{K}_i$ of dimension $d$ being of the following types

$$\begin{aligned} &\text{zero cone:} && \{0\}^{d} \\ &\text{nonnegative cone:} && \mathbb{R}^{d}_+ = \{ s \mid s_j \geq 0 \} \\ &\text{second-order cone:} && \mathcal{Q}^{d} = \{ s \mid s_1 \geq \|s_{2:d}\| \} \\ &\text{positive semidefinite cone:} && \mathcal{S}^{d}_+ = \{ s \mid \mathrm{mat}(s) \succeq 0 \} \\ &\text{exponential cone:} && \mathcal{K}_{\exp} = \operatorname{cl}\{ s \in \mathbb{R}^3 \mid s_2 e^{s_1/s_2} \leq s_3,\ s_2 > 0 \} \\ &\text{power cone:} && \mathcal{K}_{\mathrm{pow}}^{\alpha} = \{ s \in \mathbb{R}^3 \mid s_1^{\alpha} s_2^{1-\alpha} \geq |s_3|,\ s_1, s_2 \geq 0 \} \end{aligned}$$
where $\mathrm{mat}(\cdot)$ creates a symmetric matrix from the vector $s$. This can be done in a plethora of ways, but an often used definition is

$$\mathrm{mat}(s) = \begin{bmatrix} s_1 & s_2/\sqrt{2} & s_4/\sqrt{2} \\ s_2/\sqrt{2} & s_3 & s_5/\sqrt{2} \\ s_4/\sqrt{2} & s_5/\sqrt{2} & s_6 \end{bmatrix}$$

(shown here for a $3 \times 3$ matrix; the pattern continues row-wise down the lower triangle), where the scaling of $\sqrt{2}$ has been introduced on the off-diagonal elements in order to preserve the inner product, i.e. $s^\top z = \operatorname{tr}\!\big(\mathrm{mat}(s)\,\mathrm{mat}(z)\big)$. Furthermore, we denote the inverse operation by $\mathrm{vec}(\cdot)$, i.e. an operation that creates a vector from a symmetric matrix such that $\mathrm{vec}(\mathrm{mat}(s)) = s$. The nonnegative, second-order, and positive semidefinite cones are examples of symmetric cones, which are a special class of cones with a lot of nice properties. The exponential and power cones are examples of nonsymmetric cones, which do not have the same nice properties as symmetric cones but are still very useful in practice. The zero cone is a special case that is used to model equality constraints.
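To make the convention concrete, here is a minimal Python sketch (the helper names mat and vec are ours, and we use the row-wise lower-triangle ordering; other orderings are equally valid) that checks the inner-product-preserving property:

```python
import numpy as np

def mat(s):
    """Build a symmetric d-by-d matrix from a vector of length d*(d+1)/2.

    Off-diagonal entries are divided by sqrt(2) so that the inner
    product is preserved: <mat(s), mat(z)>_F = s @ z.
    """
    d = int((np.sqrt(8 * len(s) + 1) - 1) / 2)
    S = np.zeros((d, d))
    S[np.tril_indices(d)] = s
    S[np.triu_indices(d, 1)] = S.T[np.triu_indices(d, 1)]  # mirror lower to upper
    off = ~np.eye(d, dtype=bool)
    S[off] /= np.sqrt(2.0)
    return S

def vec(S):
    """Inverse of mat: stack the lower triangle, scaling off-diagonals by sqrt(2)."""
    d = S.shape[0]
    T = S * np.sqrt(2.0)
    np.fill_diagonal(T, np.diag(S))
    return T[np.tril_indices(d)]

# Inner products are preserved and vec inverts mat:
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3)); A = A + A.T
B = rng.standard_normal((3, 3)); B = B + B.T
assert np.isclose(np.trace(A @ B), vec(A) @ vec(B))
assert np.allclose(mat(vec(A)), A)
```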

The dual problem

The primal optimization problem of interest can equivalently be written as

where the conic constraint can be understood as a generalized inequality, i.e.

However, the standard form (similar to LPs) requires that the inequality is turned the other way. As such we negate the expression, resulting in

We now write up the Lagrangian as

Since the slack must lie in the cone, the multiplier must lie in the dual cone $\mathcal{K}^*$ in order for the Lagrangian to be a lower bound for any feasible point. This comes directly from the definition of the dual cone, i.e.

$$\mathcal{K}^* = \{\, \lambda \mid \langle \lambda, s \rangle \geq 0 \ \ \forall s \in \mathcal{K} \,\}$$
Now, since we want to find a stationary point, we compute the gradient of the Lagrangian and set it to zero

Now notice that the Lagrangian can be written as

The thing inside the parentheses is almost equal to the zero constraint on the gradient. By performing the mathematician's favorite magic trick of adding zero, followed by rearranging the expressions, we end up with

As such the dual formulation of the problem becomes

Homogeneous Embedding

The standard KKT conditions for the primal problem are

However, we can take a different path in order to solve the same problem. First we define the duality gap as the difference between the primal and dual objectives

where we used that (from the gradient constraint) and (from the definition of the equality constraints). The duality gap is always nonnegative for any feasible point and is zero at the optimal point when strong duality holds. We can therefore pose a different optimization problem by combining the constraints of the primal and dual problems with an objective that minimizes the duality gap

An issue with the above is that the duality gap is explicitly set to zero in the first equality constraint. This condition can be relaxed by a change of variables, after which we instead solve

In [1] they go on to show the existence of solutions and how infeasibility is detected. This will be skipped here.

Barrier Functions

Barrier functions are functions that are defined only inside the feasible region (i.e. when $s \in \operatorname{int} \mathcal{K}$) and tend towards infinity as we move towards the boundary of the feasible set. They are useful as they can be used to approximate the constrained problem by an unconstrained problem

As the barrier parameter grows, the solution to the unconstrained problem tends towards the solution of the constrained problem. A naive approach would simply be to pick a large initial value of the parameter in order to get a good approximation of the original problem. However, this is not a good idea, as the resulting optimization problem is very ill-conditioned and hard to solve. Instead, interior-point methods start with a small value and then track the solution of the unconstrained problem as the parameter increases, until we are reasonably close to the optimal solution of the original problem. Methods that rely on this strategy of solving a series of simpler problems are known as homotopy methods. As an example we will now describe the properties of the logarithmic barrier functions for the three symmetric cones (that is, the nonnegative, second-order, and positive semidefinite cones); later we will also discuss the barrier functions for the exponential and power cones. (What symmetric and nonsymmetric cones are and how they differ will be explained in a later section.)
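As a tiny illustration of the homotopy idea, consider minimizing $x$ subject to $x \geq 1$. Under one common parameterization (scaling the objective by a parameter $t$, which is an assumption of this sketch rather than the note's own convention), the barrier subproblem has a closed-form minimizer that we can track:

```python
# Toy homotopy: minimize x subject to x >= 1.
# Barrier subproblem: minimize t*x - log(x - 1), minimized at x(t) = 1 + 1/t.
# Tracking x(t) as t grows traces the central path towards x* = 1.
for t in [1.0, 10.0, 100.0, 1000.0]:
    x = 1.0 + 1.0 / t  # closed-form minimizer of the barrier subproblem
    print(f"t = {t:7.1f}   x(t) = {x:.6f}   gap to x* = {x - 1.0:.1e}")
```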

The logarithmic barrier function for the three symmetric cones

For a composite cone $\mathcal{K} = \mathcal{K}_1 \times \cdots \times \mathcal{K}_p$ where each component is one of the three symmetric cones, we have that the logarithmic barrier function is given by

$$f(s) = \sum_{i=1}^{p} f_i(s_i)$$

where the barrier for the $i$th cone of dimension $d$ is

$$f_i(s) = \begin{cases} -\sum_{j=1}^{d} \log s_j & \mathcal{K}_i = \mathbb{R}^d_+ \\ -\tfrac{1}{2} \log\!\big(s_1^2 - \|s_{2:d}\|^2\big) & \mathcal{K}_i = \mathcal{Q}^d \\ -\log\det\!\big(\mathrm{mat}(s)\big) & \mathcal{K}_i = \mathcal{S}^d_+ \end{cases}$$

meaning that the barrier is finite only on the interior of the cone. (The notation here is a little sloppy, as we reuse $s$ for both the composite slack and its components.) Furthermore, the logarithmic barrier function satisfies the following relation for $\lambda > 0$

$$f_i(\lambda s) = f_i(s) - \nu_i \log \lambda$$

where $\nu_i$ equals $d$ for the nonnegative cone, $1$ for the second-order cone (with the barrier convention above), and the matrix dimension for the positive semidefinite cone. We refer to $\nu_i$ as the degree of the $i$th cone, so that $\nu = \sum_{i=1}^{p} \nu_i$ is the degree of the composite cone $\mathcal{K}$.

Gradients of the barrier function for the three symmetric cones

By linearity, the gradient of the logarithmic barrier function is given by the sum of the gradients of the individual barrier functions for each symmetric cone, given by

$$\nabla f_i(s) = \begin{cases} -\operatorname{diag}(s)^{-1}\mathbf{1} & \mathcal{K}_i = \mathbb{R}^d_+ \\ \dfrac{1}{s_1^2 - \|s_{2:d}\|^2}\,(-s_1,\, s_2,\, \ldots,\, s_d) & \mathcal{K}_i = \mathcal{Q}^d \\ -\mathrm{vec}\!\big(\mathrm{mat}(s)^{-1}\big) & \mathcal{K}_i = \mathcal{S}^d_+ \end{cases}$$

where $\mathbf{1}$ is a $d$-vector of ones. It can be shown that $\langle \nabla f_i(s), s \rangle = -\nu_i$ and that $\nabla f_i(\lambda s) = \lambda^{-1}\,\nabla f_i(s)$ for $\lambda > 0$.
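The following sketch implements these gradients (under the barrier conventions assumed above, in particular the $-\tfrac{1}{2}\log(\cdot)$ convention for the second-order cone, which gives it degree 1; some references use $-\log(\cdot)$ with degree 2) and numerically checks the identity $\langle \nabla f(s), s \rangle = -\nu$:

```python
import numpy as np

def grad_nonneg(s):
    # f(s) = -sum(log s_j);  degree nu = len(s)
    return -1.0 / s

def grad_soc(s):
    # f(s) = -1/2 log(s_1^2 - ||s_2:||^2);  degree nu = 1 in this convention
    q = s[0] ** 2 - s[1:] @ s[1:]
    g = s / q
    g[0] = -s[0] / q
    return g

def grad_psd(S):
    # f(S) = -log det S (matrix form);  degree nu = d
    return -np.linalg.inv(S)

# The identity <grad f(s), s> = -nu, checked numerically:
rng = np.random.default_rng(1)
s = rng.uniform(1, 2, size=5)
assert np.isclose(grad_nonneg(s) @ s, -5)

u = np.concatenate(([10.0], rng.standard_normal(4)))  # interior SOC point
assert np.isclose(grad_soc(u) @ u, -1)

M = rng.standard_normal((4, 4)); S = M @ M.T + 4 * np.eye(4)  # interior PSD point
assert np.isclose(np.trace(grad_psd(S) @ S), -4)
```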

Hessians of the barrier function for the three symmetric cones

We denote the Hessians of $f_i$ and $f$ as $H_i$ and $H$:

$$H_i(s) = \begin{cases} \operatorname{diag}(s)^{-2} & \mathcal{K}_i = \mathbb{R}^d_+ \\ \dfrac{2\,(Js)(Js)^\top - (s^\top J s)\, J}{(s^\top J s)^2} & \mathcal{K}_i = \mathcal{Q}^d \\ \mathrm{mat}(s)^{-1} \otimes_s \mathrm{mat}(s)^{-1} & \mathcal{K}_i = \mathcal{S}^d_+ \end{cases}$$

where $J = \operatorname{diag}(1, -1, \ldots, -1)$ and $\otimes_s$ is the symmetric Kronecker product introduced in a later section. In addition, we have that

$$H_i(s)\,s = -\nabla f_i(s)$$

Further properties of the Hessian of the logarithmic barrier function are

$$H_i(\lambda s) = \lambda^{-2} H_i(s) \quad \text{and} \quad \langle s,\, H_i(s)\, s \rangle = \nu_i$$
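A quick numerical check of these identities for the second-order cone barrier (again assuming the $-\tfrac{1}{2}\log(\cdot)$ convention; a sketch, not solver code):

```python
import numpy as np

def soc_grad_hess(s):
    # Barrier f(s) = -1/2 log(s' J s) with J = diag(1, -1, ..., -1).
    J = np.diag(np.concatenate(([1.0], -np.ones(len(s) - 1))))
    q = s @ J @ s
    g = -(J @ s) / q
    H = -J / q + 2.0 * np.outer(J @ s, J @ s) / q ** 2
    return g, H

rng = np.random.default_rng(2)
s = np.concatenate(([5.0], rng.standard_normal(3)))  # interior point
g, H = soc_grad_hess(s)

assert np.allclose(H @ s, -g)                 # H(s) s = -grad f(s)
lam = 3.0
_, H_scaled = soc_grad_hess(lam * s)
assert np.allclose(H_scaled, H / lam ** 2)    # H(lam s) = lam^{-2} H(s)
assert np.all(np.linalg.eigvalsh(H) > 0)      # positive definite on the interior
```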
The Jordan Product for the three symmetric cones

The Jordan products for the three symmetric cones are given by

$$u \circ v = \begin{cases} \big(u_1 v_1,\, \ldots,\, u_d v_d\big) & \mathcal{K}_i = \mathbb{R}^d_+ \\ \big(u^\top v,\ u_1 v_{2:d} + v_1 u_{2:d}\big) & \mathcal{K}_i = \mathcal{Q}^d \\ \tfrac{1}{2}\,\mathrm{vec}\!\big(UV + VU\big), \quad U = \mathrm{mat}(u),\ V = \mathrm{mat}(v) & \mathcal{K}_i = \mathcal{S}^d_+ \end{cases}$$

where $e$ is the idempotent for the cones given by

$$e = \begin{cases} (1, \ldots, 1) & \mathcal{K}_i = \mathbb{R}^d_+ \\ (1, 0, \ldots, 0) & \mathcal{K}_i = \mathcal{Q}^d \\ \mathrm{vec}(I) & \mathcal{K}_i = \mathcal{S}^d_+ \end{cases}$$

Note that $u \circ v = v \circ u$ and that $u \circ e = u$. In addition, the inverse, square, and square root are defined by the relations $u \circ u^{-1} = e$, $u^2 = u \circ u$, and $u^{1/2} \circ u^{1/2} = u$.
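A minimal sketch of the Jordan products and idempotents for the nonnegative and second-order cones, checking $u \circ e = u$ and $u \circ u^{-1} = e$ (we use the standard closed-form SOC inverse $u^{-1} = Ju / (u^\top J u)$):

```python
import numpy as np

def jordan_nonneg(u, v):
    # Elementwise product; idempotent e = vector of ones.
    return u * v

def jordan_soc(u, v):
    # (u o v) = (u' v, u_1 * v_bar + v_1 * u_bar); idempotent e = (1, 0, ..., 0).
    return np.concatenate(([u @ v], u[0] * v[1:] + v[0] * u[1:]))

def inv_soc(u):
    # u^{-1} = J u / (u' J u) with J = diag(1, -1, ..., -1), so u o u^{-1} = e.
    Ju = np.concatenate(([u[0]], -u[1:]))
    return Ju / (u @ Ju)

rng = np.random.default_rng(3)
u = np.concatenate(([4.0], rng.standard_normal(3)))
e = np.zeros(4); e[0] = 1.0

assert np.allclose(jordan_soc(u, e), u)            # u o e = u
assert np.allclose(jordan_soc(u, inv_soc(u)), e)   # u o u^{-1} = e

w = rng.uniform(1, 2, size=5)
assert np.allclose(jordan_nonneg(w, 1.0 / w), np.ones(5))
```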

Logarithmically homogeneous self-concordant barrier

The ideas of the logarithmic barrier functions can be extended to the nonsymmetric cones as well. We often refer to the general class of logarithmically homogeneous self-concordant barrier functions, which are a special class of barrier functions with nice properties that can be exploited in the design of interior-point methods, as follows: A function is called a logarithmically homogeneous self-concordant barrier with degree $\nu$ ($\nu$-LHSCB; the logarithmic barrier functions from the previous section are all $\nu$-LHSCB functions) for the convex cone $\mathcal{K}$ if it satisfies the following properties

The conjugate barrier function $f^*$ is defined as (this is not to be confused with the convex conjugate of a function; the two conjugates are related through a sign flip of the argument)

$$f^*(z) = \sup_{s \in \mathcal{K}} \big\{ -\langle s, z \rangle - f(s) \big\}$$
The function $f^*$ is $\nu$-LHSCB for the dual cone $\mathcal{K}^*$ if the associated $f$ is. The primal gradient satisfies

$$-\nabla f(s) \in \operatorname{int} \mathcal{K}^* \quad \text{for all } s \in \operatorname{int} \mathcal{K}$$
In cases where the conjugate barrier function is known only through the definition (i.e. rather than a closed-form representation) we can compute its gradient as the solution to

$$\nabla f(\bar{s}) = -z, \qquad \nabla f^*(z) = -\bar{s}$$
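As a sketch of this recipe, the snippet below inverts the gradient map with a damped Newton iteration. We use the nonnegative-cone barrier, where the answer is known in closed form ($\bar{s} = 1/z$), purely so the result can be verified; the function name is our own:

```python
import numpy as np

def conj_grad_newton(z, iters=50):
    """Compute grad f*(z) = -s where s solves grad f(s) = -z,
    for the nonnegative-cone barrier f(s) = -sum(log s).

    The answer is known here in closed form (s = 1/z), which lets us
    verify the generic Newton recipe."""
    s = np.ones_like(z)
    for _ in range(iters):
        g = -1.0 / s + z                # residual grad f(s) + z
        H = np.diag(1.0 / s ** 2)       # Hessian of f
        step = np.linalg.solve(H, -g)
        alpha = 1.0                      # damp to stay inside the cone
        while np.any(s + alpha * step <= 0):
            alpha *= 0.5
        s = s + alpha * step
        if np.linalg.norm(g) < 1e-12:
            break
    return -s

z = np.array([0.5, 2.0, 4.0])
assert np.allclose(conj_grad_newton(z), -1.0 / z)
```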

The Central Path

As described previously, the idea of interior-point methods is to track the solution of an approximate unconstrained problem as the barrier parameter increases, until we are reasonably close to the optimal solution of the original constrained problem. This path is known as the central path. In order to describe the central path mathematically, we define the function representing the equality constraints of the homogeneous embedding

Now assume that $f$ is a $\nu$-LHSCB function on the cone with conjugate barrier $f^*$. Given any initial point, we define the central path as the unique solution to

Now, inserting the definition of the function above and simplifying, it follows that

If the cone is symmetric, then we can replace the condition with a simpler condition defined in terms of the Jordan product on , i.e.

where $e$ is the idempotent for each of the symmetric cones.

Scaling Matrices

We now introduce the idea of a scaling matrix. The choice of scaling matrix depends on which way we linearize the central path. For symmetric cones the most common choice is the Nesterov-Todd scaling. For nonsymmetric cones the central path is defined by the set of points satisfying (32). For the zero cone the scaling is trivial.

Symmetric cones

For symmetric cones, the central path description (34) is linearized. The Nesterov-Todd scaling exploits the self-scaled property of symmetric cones to compute a unique scaling point $w$ satisfying

$$H(w)\, s = z$$

We can factorize $H(w)^{-1}$ as $H(w)^{-1} = W W^\top$.
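For the nonnegative cone everything is available in closed form, which makes for an easy numerical check (a sketch using the factorization convention we assumed above):

```python
import numpy as np

# Nesterov-Todd scaling point for the nonnegative cone: w = sqrt(s / z).
# Then H(w) = diag(w)^{-2} satisfies H(w) s = z, and with W = diag(w)
# the scaled variable lambda = W^{-1} s = W z is shared by both sides.
rng = np.random.default_rng(4)
s = rng.uniform(0.5, 2.0, size=6)
z = rng.uniform(0.5, 2.0, size=6)

w = np.sqrt(s / z)
assert np.allclose(s / w ** 2, z)   # H(w) s = z
assert np.allclose(s / w, w * z)    # W^{-1} s = W z
```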

Nonsymmetric cones

As nonsymmetric cones are not self-scaled, we cannot use (34) when describing the central path. Instead, we must linearize (32). A general primal-dual scaling strategy suitable for nonsymmetric cones was proposed in [3], and used later in [4], which relies on the satisfaction of the two secant equations

We now define shadow iterates as

with

A scaling matrix can be obtained from the rank-4 Broyden-Fletcher-Goldfarb-Shanno (BFGS) update, which is commonly used in quasi-Newton methods,

where , , and is an approximation of the Hessian. In our implementation we choose following [4]. Putting things together means that the computation of reduces to a rank-3 update as

A more detailed description can be found in [4].

The above is defined in terms of the conjugate barrier function $f^*$. This is in contrast to [4], which defines it in terms of the standard barrier function.

Computing Step Directions

The overall aim is to find the optimal solution to the homogeneous embedding. This amounts to solving the nonlinear system of equations with the additional conic constraints. The aim of most interior-point methods is to track the central path until we are reasonably close to the optimum. In order to solve the nonlinear system, Newton's method is applied. This requires a linearization of the central path, which can be seen below,

where is the vector of residuals, while and is a positive definite block-diagonal matrix with blocks equal to the scaling matrices (ignoring the sparse form of the scaling matrices that is usually required in practice; this will be explained in a later section). Note that in order to allow for nonsymmetric cones we use , which is in contrast to [2] that uses with .

We now eliminate part of the variables from the system, resulting in

The above system can be solved by a pair of linear systems with a common coefficient matrix, meaning that the matrix only has to be factorized once
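A sketch of this factorize-once, solve-twice pattern using SciPy's sparse LU on a toy quasi-definite KKT matrix (a production solver would typically use an LDLᵀ factorization instead; all names and sizes here are illustrative):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

# Toy symmetric quasi-definite KKT matrix [[P, A'], [A, -W'W]].
rng = np.random.default_rng(5)
n, m = 8, 5
P = sp.random(n, n, density=0.3, random_state=5)
P = (P @ P.T + sp.eye(n)).tocsc()              # positive definite (1,1) block
A = sp.random(m, n, density=0.4, random_state=6).tocsc()
W2 = sp.diags(rng.uniform(0.5, 2.0, size=m))   # stands in for W'W

K = sp.bmat([[P, A.T], [A, -W2]], format="csc")

lu = splu(K)                                   # factor once ...
rhs1 = rng.standard_normal(n + m)              # first right-hand side
rhs2 = rng.standard_normal(n + m)              # second right-hand side
x1, x2 = lu.solve(rhs1), lu.solve(rhs2)        # ... solve twice

assert np.allclose(K @ x1, rhs1) and np.allclose(K @ x2, rhs2)
```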

From the above solution we can recover the search direction as follows

The actual step is then obtained by computing the maximal step size that ensures that the new update is still in the interior of the conic constraints.
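For the nonnegative cone this maximal step size has a simple closed form; a small sketch (the back-off factor eta is a common heuristic, not something prescribed by the note):

```python
import numpy as np

def max_step_nonneg(s, ds, eta=0.99):
    """Largest alpha in (0, 1] with s + alpha * ds > 0, shrunk by a
    fraction eta so the update stays strictly inside the cone."""
    neg = ds < 0
    if not np.any(neg):
        return 1.0
    return min(1.0, eta * np.min(-s[neg] / ds[neg]))

s = np.array([1.0, 2.0, 0.5])
ds = np.array([-2.0, 1.0, -0.1])
alpha = max_step_nonneg(s, ds)
assert np.all(s + alpha * ds > 0)
```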

The affine and centering directions

At every interior-point iteration we solve the KKT system for two sets of right-hand sides, corresponding to the so-called affine and centering directions. In short this means solving (43) twice; however, since the system matrix stays the same and part of the right-hand side stays the same, the computation requires only a single numerical factorization (as the sparsity pattern does not change, the symbolic factorization can be reused) and three solves.

The two sets of right-hand sides correspond to a predictor step and a corrector step, also known as the affine step and the centering step. In our case the affine step has the right-hand side of

while the corrector step has the right-hand side of

where $\backslash$ denotes the inverse operator of $\circ$. Computation of a higher-order correction term is a heuristic technique that is known to accelerate the convergence of IPMs significantly. The choice of this term varies depending on the choice of the scaling matrix and whether a given constraint is symmetric or not. For symmetric cones we use the Mehrotra correction, while for nonsymmetric cones we compute the term using the 3rd-order correction from [4], i.e.

We set where is the step size of the affine step.

Solver initialization

An essential consideration is how to choose the initial point. The initialization strategies depend on whether the objective is linear or quadratic, and on whether the cones are symmetric or not. However, in all cases the initial scalar values of the homogeneous embedding are set as $\tau = \kappa = 1$.

Symmetric Cones

The initialization strategies below can be thought of as projections of the equality constraints onto cones in the direction of the idempotent for each symmetric cone.

Quadratic Objective

In this case the initialization happens by solving the following linear system

with a matrix that has ones on the diagonal entries corresponding to zero cones (i.e. equality constraints) and zeros otherwise. The above linear system corresponds to solving the following optimization problem

From this solution we project the primal and dual slack variables back to the interior of the cone

where and .

Linear Objective

In the case of a linear objective we instead solve two linear systems

From this we set

where and .

Nonsymmetric cones

When the composite cone contains any nonsymmetric cone, both primal and dual variables are set to the same value (corresponding to a central point of the cone). This is equivalent to solving the unconstrained optimization problem

which is strictly convex and has a unique solution. It yields for symmetric cones, and

for exponential cones, and

for power cones with parameter .

Termination criteria

All termination criteria are based on unscaled problem data and iterates, i.e. after the Ruiz scaling has been reverted.

Feasibility Checks

For checks of primal and dual feasibility we introduce normalized variables and define the primal and dual residuals as follows

Likewise, we define the primal and dual objectives as

Using the above definitions we can declare convergence if all three of the following hold:

We specify a default value as well as a weaker threshold when testing for “near optimality” in cases of early termination (e.g. lack of progress, timeout, iteration limit, etc.).

Infeasibility checks

When testing for infeasibility we do not normalize iterates, but rather work directly with the unscaled variables, since infeasibility corresponds to the case where . We declare primal infeasibility if the following holds

Similarly, we declare dual infeasibility if the following holds

We set the relative and absolute tolerances as and allow for weaker thresholds to declare “near infeasibility” certificates in cases of early termination.

Implementation Details

Solving the KKT system

The main computational effort in the interior-point method is solving the Karush-Kuhn-Tucker (KKT) system of the form

The scaling matrix, however, has the unfortunate property that it becomes ill-conditioned when the iterates are close to the boundary. As such, one instead solves the regularized problem

where the last term denotes the added diagonal part. Note that while solving (62) is more numerically stable, it is not the system that we actually aim to solve, i.e. (61). The regularization is computed as

Regularized Iterative Refinement

As described in [5], the regularization term can be removed by solving a sequence of systems of the form

The sequence converges to the true solution provided that it exists. To show this, first notice that we can rewrite (64) as

Now assume that we have the solution to the equation that we actually aim to solve. We again perform the mathematician's favorite trick of adding zero, and it follows that

We now subtract (66) from (65)

Rewriting the above gives us

The final step is to take norms on both sides resulting in

Thus, the iterates converge to the true solution as long as the contraction factor induced by the regularization is strictly less than one.
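A small numerical sketch of the scheme: factor the regularized matrix once and iterate; the residual against the unregularized system shrinks geometrically (sizes and the value of the regularization are illustrative):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

rng = np.random.default_rng(7)
n = 50
M = sp.random(n, n, density=0.1, random_state=7)
K = (M + M.T + 10 * sp.eye(n)).tocsc()      # the system we actually want to solve
r = rng.standard_normal(n)

delta = 1e-4
lu = splu((K + delta * sp.eye(n)).tocsc())  # factor the regularized matrix once

# Iterate (K + D) x_{k+1} = r + D x_k; the fixed point satisfies K x = r.
x = lu.solve(r)
for _ in range(5):
    x = lu.solve(r + delta * x)
    print("residual:", np.linalg.norm(K @ x - r))
```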

Matrix Equilibration

The equilibration in Clarabel is a modified version of Ruiz scaling [6], similar to what is done in the OSQP package [5]. The equilibration is based on scaling the constant part of the KKT system (i.e. the part that does not involve the primal-dual scaling matrices) as shown below

The aim is to find a diagonal scaling matrix such that

is better conditioned. The above can be seen as a transformation of input data to a conic optimization problem

where , , , .

The main idea of the equilibration procedure is to scale the rows (and columns) of the matrix so that they have equal $\infty$-norm. An efficient approximate solution to this problem is the so-called modified Ruiz scaling described in [5], based on the Ruiz scaling described in [6].
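A sketch of the basic (unmodified) symmetric Ruiz iteration on a badly scaled 2×2 example; the modified variant in [5] adds cost scaling and safeguards that we omit here:

```python
import numpy as np

def ruiz_equilibrate(K, iters=10):
    """Symmetric Ruiz scaling: find a diagonal d so that diag(d) K diag(d)
    has rows (and, by symmetry, columns) of roughly unit infinity norm."""
    d = np.ones(K.shape[0])
    M = K.copy()
    for _ in range(iters):
        norms = np.max(np.abs(M), axis=1)
        norms[norms == 0] = 1.0                 # leave empty rows alone
        step = 1.0 / np.sqrt(norms)
        M = step[:, None] * M * step[None, :]   # scale rows and columns
        d *= step
    return d, M

K = np.array([[1e4, 1.0], [1.0, 1e-4]])
d, M = ruiz_equilibrate(K)
print(np.max(np.abs(M), axis=1))  # row norms approach 1
```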

Sparse Solvers

More often than not, the main computational cost in an interior-point method is solving (62). Luckily, the linear system is often sparse, which admits fast factorization algorithms. As such, it is important to utilize an efficient implementation of a direct sparse solver if one wants the interior-point solver to be computationally efficient. In Clarabel it is possible to utilize the following sparse solvers

Sparsity exploitation

Augmented sparsity

In numerical computations one often meets matrices of sparse-plus-low-rank form. The general form is the following

where the first matrix is sparse and the remaining term is a low-rank outer product. The drawback of the above form is that forming the outer products (i.e. assembling the plus-low-rank part) will ruin the sparsity, and the result will in turn be a dense matrix. Given that our aim is to solve a system of this form, let's look at the structure by performing the product

Combining the above with the added constraints, we can solve the original system by solving the following equivalent system

Note that while the augmented matrix is sparse, it is larger. An important property of the augmented system is that if the (1,1) block is positive definite and the (2,2) block is negative definite, then the matrix is quasi-definite and therefore strongly factorizable, meaning that for every symmetric permutation one can find $L$ and $D$ (with $D$ diagonal) so that the permuted matrix equals $LDL^\top$ [8].
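The following sketch solves a sparse-plus-low-rank system through the augmented form and compares against a dense solve (the matrix names U, V and all sizes are our own; with U = V the augmented matrix would be symmetric quasi-definite, but we keep them distinct for generality):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

rng = np.random.default_rng(8)
n, k = 200, 2                                   # k << n low-rank term
M = sp.random(n, n, density=0.01, random_state=8)
H = (M + M.T + 20 * sp.eye(n)).tocsc()          # sparse, positive definite
U = rng.standard_normal((n, k))
V = rng.standard_normal((n, k))
b = rng.standard_normal(n)

# Solve (H + U V') x = b without forming the dense product:
#   [ H   U ] [x]   [b]
#   [ V' -I ] [y] = [0]   =>   y = V' x   and   H x + U y = b.
K = sp.bmat([[H, U], [V.T, -np.eye(k)]], format="csc")
xy = splu(K).solve(np.concatenate([b, np.zeros(k)]))
x = xy[:n]

x_dense = np.linalg.solve(H.toarray() + U @ V.T, b)
assert np.allclose(x, x_dense)
```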

Second-order cones

The Clarabel approach

Clarabel uses the same approach as in ECOS [9]. In short, they utilize that the structure of for a second-order cone is augmented-sparse, meaning that it can be split as follows

where . We can generalize the above for all second-order cones by defining

meaning that

As such, a KKT system of the form

can equivalently be solved by solving

The approach is simple as it leaves the sparse part intact. However, one has to deal with computing the split, which is not unique. In fact, Clarabel and ECOS apply different approaches when calculating it. A detail here is that one wants to compute the split in a way for which the system remains quasi-definite (and therefore strongly factorizable).

The CVXOPT approach

Another approach is that of CVXOPT. Here the diagonal-plus-low-rank structure of the scaling matrix of a second-order cone (and its inverse) is used, i.e. that

where the and are diagonal. We can again bunch all the scaling matrices together by defining

so that

Note that the quantities are defined as previously but are different, since they now come from (81) rather than (76). We now show how the rank update and the known inverse can be used explicitly in order to preserve sparsity. The KKT system of interest using Nesterov-Todd scaling is the following

By scaling the variable and multiplying the last row accordingly, one finds that

For simplicity, we now define

We can now introduce new variables resulting in

where we have used the structure of the scaling to show why the system is also symmetric. The above now preserves sparsity. For SOCs we can simplify further, reducing the above to

Rotated second-order cones

By rotating the rotated second-order cone back to second-order cone form, we can exploit sparsity in the same way as for the regular second-order cone. This rotation can be done by applying the following orthogonal transformation

$$T = \begin{bmatrix} \tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{2}} & \\ \tfrac{1}{\sqrt{2}} & -\tfrac{1}{\sqrt{2}} & \\ & & I \end{bmatrix}$$
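A small check of this transformation (a sketch; the $1/\sqrt{2}$ normalization keeps the transformation orthogonal):

```python
import numpy as np

def rsoc_to_soc(x):
    """Map the rotated SOC {2*x1*x2 >= ||xbar||^2, x1, x2 >= 0} onto the
    standard SOC via u1 = (x1 + x2)/sqrt(2), u2 = (x1 - x2)/sqrt(2),
    ubar = xbar, so that u1^2 - u2^2 = 2*x1*x2 >= ||ubar||^2."""
    r = 1.0 / np.sqrt(2.0)
    u = x.copy()
    u[0] = r * (x[0] + x[1])
    u[1] = r * (x[0] - x[1])
    return u

x = np.array([2.0, 1.5, 1.0, -1.2])         # 2*2*1.5 = 6 >= 1 + 1.44
assert 2 * x[0] * x[1] >= x[2:] @ x[2:]     # in the rotated SOC
u = rsoc_to_soc(x)
assert u[0] >= np.linalg.norm(u[1:])        # in the standard SOC
```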

Power Cones

The conjugate barrier for certain power cones has a Hessian that, similar to that of second-order cones, can be represented using augmented sparsity [10]. The intuition here is that e.g. the generalized power cone reduces to a second-order cone for particular parameter choices. Given that the augmented sparsity is only available for the conjugate barrier, the following is only relevant in the case of dual scaling.

The generalized power cone is parametrized by such that

and is defined as

with dual cone

The power mean cone is likewise parametrized by a weight vector such that

and is defined as

with dual cone

The relative entropy cone is defined as

with dual cone

Generalized Power Cones

As shown in [10], the Hessian of the conjugate barrier for the generalized power cone is augmented sparse, i.e. that

where and with

where

and , and .

Power mean cone

For the power mean cone it is shown in [10] that the Hessian of the conjugate barrier is likewise augmented sparse, i.e. that

where and with

where

and

, and .

Relative Entropy Cone

For the relative entropy cone it is shown in [10] that the Hessian of the conjugate barrier is sparse with the following structure

where and

with .

Semi-definite cones

We start by introducing the vectorization of a symmetric matrix, defined below

$$\mathrm{vec}(S) = \big(S_{11},\ \sqrt{2}\,S_{21},\ S_{22},\ \sqrt{2}\,S_{31},\ \sqrt{2}\,S_{32},\ S_{33},\ \ldots\big)$$

Note that the scaling of $\sqrt{2}$ is there in order to preserve inner products, i.e. that $\mathrm{vec}(S)^\top \mathrm{vec}(Z) = \operatorname{tr}(SZ)$. We equivalently introduce the matrization of a vector (and denote it by $\mathrm{mat}(\cdot)$), which unsurprisingly has the property that $\mathrm{mat}(\mathrm{vec}(S)) = S$; the scaling again ensures that inner products are preserved.

Computing the dense scaling matrix

In CVXOPT the Nesterov-Todd scaling of a semidefinite cone is described by the product

However, stating it like this gives no insight into the structure of the scaling, which is needed when assembling the KKT system. It turns out that the structure comes from the symmetric Kronecker product given by

Using the above, we see that the Nesterov-Todd scaling of the semidefinite cone is

from which it becomes clear that . We're now almost ready to describe the structure, but first we need to introduce the following property of the symmetric Kronecker product [11]

Using the above it follows that

Now how do we define the symmetric Kronecker product? First we need to introduce a mapping that works as follows

Using it is now possible to describe the symmetric Kronecker product as

This definition intuitively makes sense. The mapping in front can be seen to lift the symmetric (vectorized) representation to a regular vector. We then apply a regular Kronecker product (which is symmetrized due to the sum) and pull the product back down to the symmetric representation. Note that in general the symmetric Kronecker product is not symmetric as a matrix. Instead, it represents a single half of an otherwise symmetric product. In the case where the two factors coincide, as is the case for NT scaling, the product is symmetric and in fact
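A sketch implementing the symmetric Kronecker product directly from its defining action (building the matrix one column at a time from unit vectors; helper names are our own, and the vec/mat helpers repeat the earlier convention):

```python
import numpy as np

def vec(S):
    d = S.shape[0]
    T = S * np.sqrt(2.0)
    np.fill_diagonal(T, np.diag(S))
    return T[np.tril_indices(d)]

def mat(s):
    d = int((np.sqrt(8 * len(s) + 1) - 1) / 2)
    S = np.zeros((d, d))
    S[np.tril_indices(d)] = s
    S = S + S.T - np.diag(np.diag(S))
    off = ~np.eye(d, dtype=bool)
    S[off] /= np.sqrt(2.0)
    return S

def skron(A, B):
    """Symmetric Kronecker product, defined through its action
    (A skron B) vec(X) = vec( (A X B' + B X A') / 2 ),
    assembled one column at a time from unit vectors."""
    d = A.shape[0]
    nbar = d * (d + 1) // 2
    out = np.zeros((nbar, nbar))
    for j in range(nbar):
        ej = np.zeros(nbar); ej[j] = 1.0
        X = mat(ej)
        out[:, j] = vec((A @ X @ B.T + B @ X @ A.T) / 2.0)
    return out

rng = np.random.default_rng(9)
A = rng.standard_normal((3, 3))
X = rng.standard_normal((3, 3)); X = X + X.T

# For equal factors the defining identity becomes vec(A X A'):
assert np.allclose(skron(A, A) @ vec(X), vec(A @ X @ A.T))
```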

Bibliography