Conic Optimization
Introduction
This note introduces some central ideas behind the implementation of efficient interior-point methods for solving conic programs. A large portion of the theory is based on the papers for the conic solvers Clarabel [1] (you can play around with a few examples using a Wasm-compiled Clarabel online) and CVXOPT [2]. What this note adds are some essential implementation details that were left out of the original two papers (e.g. iterative refinement, matrix equilibration, and handling of the structure of the scaling matrices).
Primal and dual problems
We will in this text consider optimization problems where the objective function is at most quadratic and the constraints are conic (including the zero cone in order to model equality constraints). The general form of such (primal) problems is the following

$$\begin{aligned}
\min_{x} \quad & \tfrac{1}{2} x^\top P x + q^\top x \\
\text{s.t.} \quad & Ax + s = b \\
& s \in \mathcal{K}
\end{aligned}$$

where $x \in \mathbb{R}^n$ is the decision variable, $s \in \mathbb{R}^m$ is the slack variable, and $P \in \mathbb{S}_+^n$, $q \in \mathbb{R}^n$, $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$. Furthermore, $\mathcal{K}$ is a composite cone, i.e. the Cartesian product of $p$ cones ($\mathcal{K} = \mathcal{K}_1 \times \cdots \times \mathcal{K}_p$), with each cone $\mathcal{K}_i$ of dimension $m_i$ being of one of the following types
- The zero cone: $\mathcal{K}_{\mathrm{zero}} = \{0\}^{m_i}$ (used to model equality constraints).
- The nonnegative cone: $\mathcal{K}_{+} = \{ s \in \mathbb{R}^{m_i} : s_j \geq 0 \}$.
- The second-order cone: $\mathcal{K}_{\mathrm{soc}} = \{ (s_0, \bar{s}) \in \mathbb{R} \times \mathbb{R}^{m_i - 1} : \|\bar{s}\|_2 \leq s_0 \}$.
- The exponential cone: $\mathcal{K}_{\exp} = \mathrm{cl}\{ (x, y, z) \in \mathbb{R}^3 : y e^{x/y} \leq z,\ y > 0 \}$.
- The power cone: $\mathcal{K}_{\mathrm{pow}} = \{ (x, y, z) \in \mathbb{R}^3 : x^\alpha y^{1-\alpha} \geq |z|,\ x, y \geq 0 \}$ with $\alpha \in (0, 1)$.
- The positive semidefinite cone: $\mathcal{K}_{\mathrm{psd}} = \{ s \in \mathbb{R}^{m_i} : \mathrm{mat}(s) \in \mathbb{S}_+^k \}$.
where $\mathrm{mat}(\cdot)$ creates a symmetric matrix from the vector $s$. This can be done in a plethora of ways, but an often-used definition is (shown here for $k = 3$)

$$\mathrm{mat}(s) = \begin{bmatrix} s_1 & s_2/\sqrt{2} & s_3/\sqrt{2} \\ s_2/\sqrt{2} & s_4 & s_5/\sqrt{2} \\ s_3/\sqrt{2} & s_5/\sqrt{2} & s_6 \end{bmatrix}$$

where the scaling of $1/\sqrt{2}$ has been introduced on the off-diagonal elements in order to preserve the inner product, i.e. $\langle \mathrm{mat}(s), \mathrm{mat}(z) \rangle = s^\top z$. Furthermore, we denote the inverse operation by $\mathrm{svec}(\cdot)$, i.e. an operation that creates a vector from a symmetric matrix so that $\mathrm{svec}(\mathrm{mat}(s)) = s$. The nonnegative, second-order, and positive semidefinite cones are examples of symmetric cones, a special class of cones with many useful properties. The exponential and power cones are examples of nonsymmetric cones, which do not enjoy all the nice properties of symmetric cones but are still very useful in practice. The zero cone is a special case that is used to model equality constraints.
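To make the conventions concrete, below is a minimal Python sketch of the mat/svec pair described above (the names `svec` and `mat` are my own for illustration; real implementations differ in indexing conventions), verifying that the $\sqrt{2}$ scaling preserves inner products.

```python
import numpy as np

def svec(X):
    """Vectorize a symmetric k-by-k matrix, scaling off-diagonals by sqrt(2)."""
    k = X.shape[0]
    # stack the lower triangle column by column
    return np.concatenate([
        np.concatenate(([X[j, j]], np.sqrt(2) * X[j + 1:, j])) for j in range(k)
    ])

def mat(s, k):
    """Inverse of svec: rebuild the symmetric matrix from the vector."""
    X = np.zeros((k, k))
    idx = 0
    for j in range(k):
        X[j, j] = s[idx]; idx += 1
        X[j + 1:, j] = s[idx:idx + k - j - 1] / np.sqrt(2)
        X[j, j + 1:] = X[j + 1:, j]      # mirror to the upper triangle
        idx += k - j - 1
    return X

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3)); A = A + A.T
B = rng.standard_normal((3, 3)); B = B + B.T
# inner product is preserved: <A, B> = tr(AB) = svec(A)^T svec(B)
assert np.isclose(np.trace(A @ B), svec(A) @ svec(B))
assert np.allclose(mat(svec(A), 3), A)
```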
The dual problem
The primal optimization problem of interest can equivalently be written as

$$\begin{aligned}
\min_{x} \quad & \tfrac{1}{2} x^\top P x + q^\top x \\
\text{s.t.} \quad & b - Ax \in \mathcal{K}
\end{aligned}$$

where the conic constraint can be understood as a generalized inequality, i.e.

$$b - Ax \succeq_{\mathcal{K}} 0$$

However, the standard form (similar to LPs) requires that the inequality is turned the other way. As such we negate the expression, resulting in

$$Ax - b \preceq_{\mathcal{K}} 0$$

We now write up the Lagrangian as

$$L(x, z) = \tfrac{1}{2} x^\top P x + q^\top x + z^\top (Ax - b)$$

Since $Ax - b \preceq_{\mathcal{K}} 0$, we must have that $z$ is in the dual cone $\mathcal{K}^*$ in order for the Lagrangian to be a lower bound for any feasible $x$. This comes directly from the definition of the dual cone, i.e.

$$\mathcal{K}^* = \{ z : \langle s, z \rangle \geq 0 \ \ \forall s \in \mathcal{K} \}$$

Now since we want to find a stationary point we compute the gradient of the Lagrangian and set it to zero

$$\nabla_x L(x, z) = Px + q + A^\top z = 0$$

Now notice that the Lagrangian can be written as

$$L(x, z) = x^\top \left( Px + q + A^\top z \right) - \tfrac{1}{2} x^\top P x - b^\top z$$

The term inside the parentheses is almost equal to the zero constraint on the gradient. By performing the mathematician's favorite magic trick of adding zero (here $\tfrac{1}{2} x^\top P x - \tfrac{1}{2} x^\top P x$) followed by rearranging the expressions, we end up with the form above. As such the dual formulation of the problem becomes

$$\begin{aligned}
\max_{x, z} \quad & -\tfrac{1}{2} x^\top P x - b^\top z \\
\text{s.t.} \quad & Px + A^\top z + q = 0 \\
& z \in \mathcal{K}^*
\end{aligned}$$
Homogeneous Embedding
The standard KKT conditions for the primal problem are

$$\begin{aligned}
Px + A^\top z + q &= 0 \\
Ax + s - b &= 0 \\
s^\top z &= 0, \quad s \in \mathcal{K}, \quad z \in \mathcal{K}^*
\end{aligned}$$
However, we can take a different path in order to solve the same problem. First we define the duality gap as the difference between the primal and dual objectives

$$g = \left( \tfrac{1}{2} x^\top P x + q^\top x \right) - \left( -\tfrac{1}{2} x^\top P x - b^\top z \right) = x^\top P x + q^\top x + b^\top z = s^\top z$$

where we used that $q^\top x = -x^\top P x - x^\top A^\top z$ (from the gradient constraint) and $b^\top z = x^\top A^\top z + s^\top z$ (from the definition of the equality constraints). The duality gap is always nonnegative for any feasible point and is zero at the optimal point when strong duality holds. We can therefore come up with a different optimization problem by combining the constraints of the primal and dual problem with an objective to minimize the duality gap

$$\begin{aligned}
\text{find} \quad & x, z, s \\
\text{s.t.} \quad & x^\top P x + q^\top x + b^\top z = 0 \\
& Px + A^\top z + q = 0 \\
& Ax + s - b = 0 \\
& s \in \mathcal{K}, \quad z \in \mathcal{K}^*
\end{aligned}$$

An issue with the above is that the duality gap is explicitly set to zero in the first equality constraint. This condition can be relaxed after a change of variables $x \mapsto x/\tau$ and $z \mapsto z/\tau$ (introducing the nonnegative homogenization variables $\tau$ and $\kappa$) and instead solve

$$\begin{bmatrix} 0 \\ s \\ \kappa \end{bmatrix} = \begin{bmatrix} P & A^\top & q \\ -A & 0 & b \\ -q^\top & -b^\top & 0 \end{bmatrix} \begin{bmatrix} x \\ z \\ \tau \end{bmatrix} - \begin{bmatrix} 0 \\ 0 \\ \tfrac{1}{\tau} x^\top P x \end{bmatrix}, \qquad (s, z, \tau, \kappa) \in \mathcal{K} \times \mathcal{K}^* \times \mathbb{R}_+ \times \mathbb{R}_+$$
In [1] the authors go on to show the existence of solutions and infeasibility detection; this will be skipped here.
Barrier Functions
Barrier functions are functions that are defined only inside the feasible region (i.e. when $s \in \mathrm{int}(\mathcal{K})$) and tend towards infinity as we move towards the boundary of the feasible set. They are useful as they can be used to approximate the constrained problem by an unconstrained problem

$$\min_{x} \ \tfrac{1}{2} x^\top P x + q^\top x + \tfrac{1}{t} f(b - Ax)$$

where $f$ is a barrier function for $\mathcal{K}$.

As $t \to \infty$ the solution to the unconstrained problem tends towards the solution of the constrained problem. A naive approach would be to simply pick $t$ to be a large value initially in order to get a good approximation of the original problem. However, this is not a good idea as the resulting optimization problem is very ill-conditioned and hard to solve. Instead, interior-point methods start with a small value of $t$ and then track the solution of the unconstrained problem as $t$ increases until we are reasonably close to the optimal solution of the original problem. Methods that rely on this strategy of solving a series of simpler problems are known as homotopy methods. As an example we will now describe the properties of the logarithmic barrier functions for the three symmetric cones (that is, the nonnegative, second-order, and positive semidefinite cones), but later we will also discuss the barrier functions for the exponential and power cones. (What symmetric and nonsymmetric cones are and how they differ will be explained in a later section.)
The logarithmic barrier function for the three symmetric cones
For a composite cone $\mathcal{K} = \mathcal{K}_1 \times \cdots \times \mathcal{K}_p$ where each component is one of the three symmetric cones we have that the logarithmic barrier function is given by

$$f(s) = \sum_{i=1}^{p} f_i(s_i)$$

where $s = (s_1, \ldots, s_p)$ with $s_i \in \mathcal{K}_i$ and

$$f_i(s_i) = \begin{cases} -\sum_{j} \log\,(s_i)_j & \text{(nonnegative cone)} \\ -\tfrac{1}{2} \log\left( s_{i,0}^2 - \|\bar{s}_i\|_2^2 \right) & \text{(second-order cone)} \\ -\log \det(\mathrm{mat}(s_i)) & \text{(positive semidefinite cone)} \end{cases}$$

meaning that $f_i(s_i) \to \infty$ as $s_i$ approaches the boundary of $\mathcal{K}_i$ (the notation here is a little sloppy, as we write $s_i = (s_{i,0}, \bar{s}_i)$ for the second-order cone). Furthermore, the logarithmic barrier function satisfies the following relation for $r > 0$

$$f_i(r s_i) = f_i(s_i) - d_i \log r$$

where $d_i = m_i$ for the nonnegative cone, $d_i = 1$ for the second-order cone, and $d_i = k$ for the $k \times k$ positive semidefinite cone. We refer to $d_i$ as the degree of the $i$th cone, so that $d = \sum_{i=1}^{p} d_i$ is the degree of the composite cone $\mathcal{K}$.
Gradients of the barrier function for the three symmetric cones
By linearity the gradient of the logarithmic barrier function is given by the sum of the gradients of the individual barrier functions for each symmetric cone, given by

$$\nabla f_i(s) = \begin{cases} -\mathrm{diag}(s)^{-1} e & \text{(nonnegative cone)} \\ \dfrac{1}{s_0^2 - \|\bar{s}\|_2^2} \begin{bmatrix} -s_0 \\ \bar{s} \end{bmatrix} & \text{(second-order cone)} \\ -\mathrm{svec}\!\left( \mathrm{mat}(s)^{-1} \right) & \text{(positive semidefinite cone)} \end{cases}$$

where $e$ is an $m_i$-vector of ones. It can be shown that $-\nabla f_i(s) \in \mathrm{int}(\mathcal{K}_i^*)$ and that $s^\top \nabla f_i(s) = -d_i$ for $s \in \mathrm{int}(\mathcal{K}_i)$.
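As a quick numerical sanity check of the degree identity $s^\top \nabla f_i(s) = -d_i$, here is a small sketch using the barrier gradients stated above (function names are illustrative only):

```python
import numpy as np

def grad_nonneg(s):
    # barrier f(s) = -sum(log s); gradient is -1/s elementwise
    return -1.0 / s

def grad_soc(s):
    # barrier f(s) = -0.5 * log(s0^2 - ||s_bar||^2)
    s0, sbar = s[0], s[1:]
    delta = s0**2 - sbar @ sbar
    return np.concatenate(([-s0], sbar)) / delta

s_nn = np.array([0.5, 2.0, 3.0])          # interior of the nonnegative cone
s_soc = np.array([2.0, 0.3, -0.4, 1.0])   # s0 > ||s_bar||, interior of the SOC

# degree identity: s^T grad f(s) = -d
assert np.isclose(s_nn @ grad_nonneg(s_nn), -3.0)   # d = m = 3
assert np.isclose(s_soc @ grad_soc(s_soc), -1.0)    # d = 1
```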
Hessians of the barrier function for the three symmetric cones
We denote the Hessians of the individual barrier functions $f_i$ at $s$ by $H_i(s) = \nabla^2 f_i(s)$:

$$H_i(s) = \begin{cases} \mathrm{diag}(s)^{-2} & \text{(nonnegative cone)} \\ \dfrac{2}{\Delta^2} (Js)(Js)^\top - \dfrac{1}{\Delta} J & \text{(second-order cone)} \\ \mathrm{mat}(s)^{-1} \otimes_s \mathrm{mat}(s)^{-1} & \text{(positive semidefinite cone)} \end{cases}$$

where $\Delta = s_0^2 - \|\bar{s}\|_2^2$ with $J = \mathrm{diag}(1, -1, \ldots, -1)$, and where $\otimes_s$ denotes the symmetric Kronecker product introduced in a later section. In addition, we have that

$$\nabla^2 f_i(s)\, s = -\nabla f_i(s)$$

Further properties of the Hessian of the logarithmic barrier function are

$$\nabla^2 f_i(r s) = \tfrac{1}{r^2} \nabla^2 f_i(s) \quad \text{and} \quad \nabla^2 f_i(s) \succ 0 \qquad \text{for } r > 0,\ s \in \mathrm{int}(\mathcal{K}_i)$$
The Jordan Product for the three symmetric cones
The Jordan products for the three symmetric cones are given by

$$s \circ z = \begin{cases} s \odot z & \text{(nonnegative cone)} \\ \left( s^\top z,\ s_0 \bar{z} + z_0 \bar{s} \right) & \text{(second-order cone)} \\ \tfrac{1}{2}\,\mathrm{svec}(SZ + ZS) & \text{(positive semidefinite cone)} \end{cases}$$

where $\odot$ denotes the elementwise product and $S = \mathrm{mat}(s)$, $Z = \mathrm{mat}(z)$. The idempotent $e$ for the cones is given by

$$e = \begin{cases} (1, \ldots, 1) & \text{(nonnegative cone)} \\ (1, 0, \ldots, 0) & \text{(second-order cone)} \\ \mathrm{svec}(I) & \text{(positive semidefinite cone)} \end{cases}$$

Note that $s \circ e = s$ and $e \circ e = e$. In addition, the inverse, square, and square root are defined by the relations $s \circ s^{-1} = e$, $s^2 = s \circ s$, and $s^{1/2} \circ s^{1/2} = s$.
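The sketch below implements the Jordan products and idempotents for the nonnegative and second-order cones as stated above (names are illustrative) and checks the identity $s \circ e = s$:

```python
import numpy as np

def jordan_nonneg(s, z):
    return s * z                       # elementwise product

def e_nonneg(m):
    return np.ones(m)

def jordan_soc(s, z):
    # (s o z) = (s^T z, s0 * z_bar + z0 * s_bar)
    return np.concatenate(([s @ z], s[0] * z[1:] + z[0] * s[1:]))

def e_soc(m):
    e = np.zeros(m); e[0] = 1.0
    return e

s = np.array([1.5, 0.2, -0.3])
assert np.allclose(jordan_soc(s, e_soc(3)), s)       # e is the identity element
assert np.allclose(jordan_nonneg(s, e_nonneg(3)), s)
```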
Logarithmically homogeneous self-concordant barrier
The ideas of the logarithmic barrier functions can be extended to the nonsymmetric cones as well. We often refer to the general class of logarithmically homogeneous self-concordant barrier functions, a special class of barrier functions with properties that can be exploited in the design of interior-point methods, as follows: A function $f$ is called a logarithmically homogeneous self-concordant barrier with degree $\nu$ ($\nu$-LHSCB) for the convex cone $\mathcal{K}$ if it satisfies the following properties (the logarithmic barrier functions from the previous section are all $d_i$-LHSCB functions)

$$\left| \nabla^3 f(s)[u, u, u] \right| \leq 2 \left( u^\top \nabla^2 f(s)\, u \right)^{3/2} \qquad \text{(self-concordance)}$$

$$f(r s) = f(s) - \nu \log r \quad \forall r > 0 \qquad \text{(logarithmic homogeneity)}$$

The conjugate barrier function is defined as

$$f^*(z) = \sup_{s \in \mathrm{int}(\mathcal{K})} \left\{ -s^\top z - f(s) \right\}$$

(This is not to be confused with the convex conjugate of a function, which is denoted by $\bar{f}$; the relation between the two conjugates is $f^*(z) = \bar{f}(-z)$.) The function $f^*$ is $\nu$-LHSCB for the dual cone $\mathcal{K}^*$ if the associated $f$ is $\nu$-LHSCB for $\mathcal{K}$. The primal gradient satisfies

$$\nabla f^*(-\nabla f(s)) = -s \qquad \forall s \in \mathrm{int}(\mathcal{K})$$

In cases where the conjugate barrier function is known only through the definition (i.e. rather than a closed-form representation) we can compute its gradient as the solution to

$$\nabla f^*(z) = -\underset{s \in \mathrm{int}(\mathcal{K})}{\arg\max} \left\{ -s^\top z - f(s) \right\}$$
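When no closed form is available, this argmax can be computed with a few damped Newton iterations on the optimality condition $\nabla f(s) = -z$. The sketch below illustrates the idea on the nonnegative-cone barrier, where the answer is known in closed form ($\nabla f^*(z) = -1/z$ elementwise) so the result can be verified; the iteration count and step-size safeguard are arbitrary illustrative choices, not anything a particular solver uses:

```python
import numpy as np

def grad_fstar(z, iters=20):
    """Compute grad f*(z) = -argmax_s {-s^T z - f(s)} for f(s) = -sum(log s).
    The optimality condition is grad f(s) = -z, i.e. -1/s = -z."""
    s = np.ones_like(z)                    # interior starting point
    for _ in range(iters):
        g = -1.0 / s + z                   # residual of grad f(s) + z = 0
        H = np.diag(1.0 / s**2)            # Hessian of the barrier at s
        ds = np.linalg.solve(H, -g)        # Newton direction
        # damp the step so that s stays strictly inside the cone
        alpha = min(1.0, 0.99 / max(1e-16, np.max(-ds / s)))
        s = s + alpha * ds
    return -s

z = np.array([0.5, 2.0, 4.0])
assert np.allclose(grad_fstar(z), -1.0 / z)   # known closed form for this cone
```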
The Central Path
As described previously, the idea of interior-point methods is to track the solution of an approximate unconstrained problem until we are reasonably close to the optimal solution of the original constrained problem. The central path denotes this path. In order to describe the central path mathematically we define the function representing the equality constraints of the homogeneous embedding

$$G(x, z, s, \tau, \kappa) = \begin{bmatrix} P & A^\top & q \\ -A & 0 & b \\ -q^\top & -b^\top & 0 \end{bmatrix} \begin{bmatrix} x \\ z \\ \tau \end{bmatrix} - \begin{bmatrix} 0 \\ 0 \\ \tfrac{1}{\tau} x^\top P x \end{bmatrix} - \begin{bmatrix} 0 \\ s \\ \kappa \end{bmatrix}$$

Now assume that $f$ is a $\nu$-LHSCB function on $\mathcal{K}$ with conjugate barrier $f^*$ for some $\nu \geq 1$. Given any initial point $(x^0, z^0, s^0, \tau^0, \kappa^0)$, we define the central path as the unique solution $(x, z, s, \tau, \kappa)(\mu)$ to

$$\begin{aligned}
G(x, z, s, \tau, \kappa) &= \mu\, G(x^0, z^0, s^0, \tau^0, \kappa^0) \\
z &= -\mu \nabla f(s) \\
\tau \kappa &= \mu
\end{aligned}$$

Now inserting the definition of the conjugate gradient identity $\nabla f^*(-\nabla f(s)) = -s$ while using that $\nabla f^*$ is homogeneous of degree $-1$, it follows that the centrality condition can equivalently be written as

$$s = -\mu \nabla f^*(z)$$

If the cone is symmetric, then we can replace the condition $z = -\mu \nabla f(s)$ with a simpler condition defined in terms of the Jordan product on $\mathcal{K}$, i.e.

$$s \circ z = \mu e$$

where $e$ is the idempotent for each of the symmetric cones.
Scaling Matrices
We now introduce the idea of a scaling matrix $W$. The choice of scaling matrix depends on which way we linearize the central path. For symmetric cones the most common choice is the Nesterov-Todd scaling. For nonsymmetric cones the central path is defined by the set of points satisfying the gradient condition $z = -\mu \nabla f(s)$. For the zero cone no scaling is needed and we simply set $W = 0$.
Symmetric cones
For symmetric cones the Jordan-product form $s \circ z = \mu e$ of the central path is linearized. The Nesterov-Todd scaling exploits the self-scaled property of symmetric cones to compute a unique scaling point $w \in \mathrm{int}(\mathcal{K})$ satisfying

$$\nabla^2 f(w)\, s = z$$

We can factorize $\nabla^2 f(w)$ as $\nabla^2 f(w) = (W W^\top)^{-1}$, so that the scaled variable $\lambda = W^{-1} s = W^\top z$ is well defined.
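For the nonnegative cone everything is diagonal and the scaling point has the closed form $w = \sqrt{s/z}$ (elementwise), which makes the defining property easy to verify numerically; a minimal sketch:

```python
import numpy as np

s = np.array([1.0, 2.5, 0.4])   # s in int(K)
z = np.array([0.5, 1.0, 3.0])   # z in int(K*)

w = np.sqrt(s / z)              # NT scaling point for the nonnegative cone
H = np.diag(1.0 / w**2)         # Hessian of the barrier at w

assert np.allclose(H @ s, z)    # defining property: grad^2 f(w) s = z

# with W = diag(w), the scaled variable lambda = W^T z = W^{-1} s
W = np.diag(w)
assert np.allclose(W @ z, np.linalg.solve(W, s))
```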
Nonsymmetric cones
As nonsymmetric cones are not self-scaled, we cannot linearize the Jordan-product form of the central path; instead, we must linearize the gradient condition $z = -\mu \nabla f(s)$. A general primal-dual scaling strategy suitable for nonsymmetric cones was proposed in [3], and used later in [4], which relies on the satisfaction of the two secant equations

$$W^\top W s = z, \qquad W^\top W \tilde{s} = \tilde{z}$$

We now define the shadow iterates as

$$\tilde{s} = -\nabla f^*(z), \qquad \tilde{z} = -\nabla f(s)$$

with

$$\mu = \frac{s^\top z}{\nu}, \qquad \tilde{\mu} = \frac{\tilde{s}^\top \tilde{z}}{\nu}$$

A scaling matrix satisfying both secant equations can be obtained from the rank-4 Broyden-Fletcher-Goldfarb-Shanno (BFGS) update, which is commonly used in quasi-Newton methods, applied to an initial approximation of the Hessian. In our implementation we choose the initial approximation following [4]. Putting things together means that the computation of $W^\top W$ reduces to a rank-3 update. A more detailed description can be found in [4]. Note that the above is defined in terms of the conjugate barrier function $f^*$; this is in contrast to [4], which defines it in terms of the standard barrier function $f$.
Computing Step Directions
The overall aim is to find the optimal solution of the homogeneous embedding. This amounts to solving the nonlinear system of equations $G(x, z, s, \tau, \kappa) = 0$ with the additional constraint $(s, z, \tau, \kappa) \in \mathcal{K} \times \mathcal{K}^* \times \mathbb{R}_+ \times \mathbb{R}_+$. The aim of most interior-point methods is to track the central path for decreasing $\mu$ until reasonably close to the optimum. In order to solve the nonlinear system, Newton's method is applied. This requires a linearization of the central path, which can be seen below

$$\nabla G \,\Delta = -r$$

where $r$ is the vector of residuals, $\Delta = (\Delta x, \Delta z, \Delta s, \Delta \tau, \Delta \kappa)$ is the search direction, and the linearized centrality conditions take the form

$$H \Delta z + \Delta s = d_s, \qquad \kappa \Delta\tau + \tau \Delta\kappa = d_\kappa$$

where $H$ is a positive definite block-diagonal matrix with blocks equal to the scaling matrices $W_i^\top W_i$ (ignoring the sparse form of the scaling matrices that is usually required in practice; this will be explained in a later section). Note that in order to allow for nonsymmetric cones we use $H$ directly, in contrast to [2], which uses $W$ together with the scaled variable $\lambda = Wz = W^{-\top}s$.

We now eliminate $\Delta s$ and $\Delta \kappa$ from the system, resulting in a reduced system in $(\Delta x, \Delta z, \Delta\tau)$ in which $\Delta\tau$ enters only through the data vectors $q$ and $b$. The reduced system can therefore be solved by a pair of linear systems with a common coefficient matrix, meaning that it only has to be factorized once

$$\begin{bmatrix} P & A^\top \\ A & -H \end{bmatrix} \begin{bmatrix} \Delta x_1 \\ \Delta z_1 \end{bmatrix} = \begin{bmatrix} d_x \\ d_z \end{bmatrix}, \qquad \begin{bmatrix} P & A^\top \\ A & -H \end{bmatrix} \begin{bmatrix} \Delta x_2 \\ \Delta z_2 \end{bmatrix} = \begin{bmatrix} -q \\ b \end{bmatrix}$$

From the above solution we can recover the search direction as follows

$$\Delta x = \Delta x_1 + \Delta\tau\, \Delta x_2, \qquad \Delta z = \Delta z_1 + \Delta\tau\, \Delta z_2$$

where $\Delta\tau$, $\Delta s$, and $\Delta\kappa$ are recovered by back-substitution into the eliminated equations. The actual step is then obtained by computing the maximal step size $\alpha$ that ensures that the new iterate is still in the interior of the conic constraints.
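For the nonnegative cone this maximal step size has a simple closed form; a hedged sketch (the backoff fraction 0.99 is an arbitrary illustrative choice):

```python
import numpy as np

def max_step_nonneg(s, ds, eta=0.99):
    """Largest alpha <= 1 such that s + alpha*ds stays strictly positive."""
    neg = ds < 0
    if not np.any(neg):
        return 1.0
    alpha_max = np.min(-s[neg] / ds[neg])   # step that hits the boundary
    return min(1.0, eta * alpha_max)        # back off by a fraction eta

s = np.array([1.0, 0.5, 2.0])
ds = np.array([-2.0, 0.1, -1.0])
alpha = max_step_nonneg(s, ds)
assert np.all(s + alpha * ds > 0)           # new iterate remains interior
```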
The affine and centering directions
At every interior-point iteration we solve the KKT system for two sets of right-hand sides, corresponding to the so-called affine and centering directions. In short this means solving the reduced system above twice; however, since the system matrix stays the same and the right-hand side $(-q, b)$ of the second system stays the same, the computation requires only a single numerical factorization (as the sparsity pattern does not change, the symbolic factorization can even be reused between iterations) and three solves.

The two sets of right-hand sides correspond to a predictor step and a corrector step, also known as the affine step and the centering step. In our case the affine step has a right-hand side built from the current residuals and complementarity terms, while the corrector step has a right-hand side that additionally contains the centering term $\sigma\mu$ and a higher-order correction term $\eta$; here $\circ^{-1}$ denotes the inverse operator of the Jordan product $\circ$ (i.e. $u = \lambda \circ^{-1} v$ is the solution of $\lambda \circ u = v$). Computation of a higher-order correction term is a heuristic technique that is known to accelerate the convergence of IPMs significantly. The choice of this term varies depending on the choice of the scaling matrix and whether a given constraint is symmetric or not. For symmetric cones we use the Mehrotra correction $\eta = (W^{-\top} \Delta s_{\mathrm{aff}}) \circ (W \Delta z_{\mathrm{aff}})$, while for nonsymmetric cones we compute $\eta$ using the 3rd-order correction from [4], i.e. a term based on the third directional derivative of the barrier along the affine direction. We set $\sigma = (1 - \alpha_{\mathrm{aff}})^3$, where $\alpha_{\mathrm{aff}}$ is the step size of the affine step.
Solver initialization
An essential consideration is how to choose the initial point $(x^0, z^0, s^0, \tau^0, \kappa^0)$. The initialization strategies depend on whether the objective is linear or quadratic, and on whether the cones are symmetric or not. However, in all cases the initial scalar values are set as $\tau^0 = \kappa^0 = 1$.
Symmetric Cones
The initialization strategies below can be thought of as first solving for the equality constraints and then projecting onto the cones in the direction of the idempotent $e$ of each symmetric cone.
Quadratic Objective
If $P \neq 0$ the initialization happens by solving the following linear system

$$\begin{bmatrix} P & A^\top \\ A & -E \end{bmatrix} \begin{bmatrix} \hat{x} \\ \hat{z} \end{bmatrix} = \begin{bmatrix} -q \\ b \end{bmatrix}$$

with $E$ being a matrix with ones on the diagonal corresponding to zero cones (i.e. equality constraints) but otherwise zero. The above linear system corresponds to solving a least-squares-type relaxation of the primal-dual optimality conditions. From this solution we set $\hat{s} = b - A\hat{x}$ and project $\hat{s}$ and $\hat{z}$ back to the interior of the cone

$$s^0 = \hat{s} + \alpha_p e, \qquad z^0 = \hat{z} + \alpha_d e$$

where $\alpha_p$ and $\alpha_d$ are chosen just large enough that $s^0 \in \mathrm{int}(\mathcal{K})$ and $z^0 \in \mathrm{int}(\mathcal{K}^*)$ (and zero if $\hat{s}$ and $\hat{z}$ are already interior).
Linear Objective
In the case of $P = 0$ we instead solve two linear systems: one for the primal variables (a least-norm solution of $A\hat{x} + \hat{s} = b$) and one for the dual variables (a least-norm solution of $A^\top \hat{z} + q = 0$). From this we set

$$s^0 = \hat{s} + \alpha_p e, \qquad z^0 = \hat{z} + \alpha_d e$$

where $\alpha_p$ and $\alpha_d$ are chosen as above.
Nonsymmetric cones
When the composite cone contains any nonsymmetric cone, both primal and dual variables are set as $s^0 = z^0 = s^\star$ (corresponding to a point on the central path with $\mu = 1$), where $s^\star$ is the solution of the unconstrained optimization problem

$$\min_{s} \ \tfrac{1}{2} \|s\|_2^2 + f(s)$$

which is strictly convex and has a unique solution characterized by $s = -\nabla f(s)$. It yields $s^\star = e$ for symmetric cones, and

$$s^\star \approx (-1.051383,\ 0.556409,\ 1.258967)$$

for exponential cones, and

$$s^\star = \left( \sqrt{1 + \alpha},\ \sqrt{2 - \alpha},\ 0 \right)$$

for power cones with parameter $\alpha$.
Termination criteria
All termination criteria are based on unscaled problem data and iterates, i.e. after the Ruiz scaling has been reverted.
Feasibility Checks
For checks of primal and dual feasibility we introduce the normalized variables $\bar{x} = x/\tau$, $\bar{z} = z/\tau$, $\bar{s} = s/\tau$, and define the primal and dual residuals as follows

$$r_p = A\bar{x} + \bar{s} - b, \qquad r_d = P\bar{x} + A^\top \bar{z} + q$$

Likewise, we define the primal and dual objectives as

$$p = \tfrac{1}{2} \bar{x}^\top P \bar{x} + q^\top \bar{x}, \qquad d = -\tfrac{1}{2} \bar{x}^\top P \bar{x} - b^\top \bar{z}$$

Using the above definitions we can declare convergence if all three of the following hold, with tolerances measured relative to the problem data

$$\|r_p\| \leq \epsilon \,(1 + \|b\|), \qquad \|r_d\| \leq \epsilon \,(1 + \|q\|), \qquad |p - d| \leq \epsilon \,(1 + \min(|p|, |d|))$$

We specify a default tolerance as well as a weaker threshold when testing for "near optimality" in cases of early termination (e.g. lack of progress, timeout, iteration limit, etc.).
Infeasibility checks
When testing for infeasibility we do not normalize iterates, but rather work directly with the unscaled variables, since infeasibility corresponds to the case where $\tau \to 0$. We declare primal infeasibility if the following holds

$$b^\top z < 0 \quad \text{and} \quad \|A^\top z\| \leq \epsilon \left( -b^\top z \right)$$

Similarly, we declare dual infeasibility if the following holds

$$q^\top x < 0, \qquad \|Px\| \leq \epsilon \left( -q^\top x \right), \qquad \|Ax + s\| \leq \epsilon \left( -q^\top x \right)$$

for some $s \in \mathcal{K}$. We set default relative and absolute tolerances and allow for weaker thresholds to declare "near infeasibility" certificates in cases of early termination.
Implementation Details
Solving the KKT system
The main computational effort in the interior-point method is to solve the Karush-Kuhn-Tucker (KKT) system of the form

$$K u = d, \qquad K = \begin{bmatrix} P & A^\top \\ A & -H \end{bmatrix}$$

The scaling matrix, however, has the unfortunate property that it becomes ill-conditioned when the iterates are close to the boundary. As such one instead solves the regularized problem

$$(K + \Delta)\, u = d, \qquad \Delta = \begin{bmatrix} \epsilon I & 0 \\ 0 & -\epsilon I \end{bmatrix}$$

where $\Delta$ denotes the added diagonal part. Note that while solving the regularized system is more numerically stable, it is not the system that we actually aim to solve. The regularization is chosen as a small diagonal perturbation, positive on the first block and negative on the second, so that the regularized matrix is quasi-definite and therefore strongly factorizable.
Regularized Iterative Refinement
As described in [5], the regularization term can be removed by solving a sequence of systems of the form

$$(K + \Delta)\, x^{k+1} = b + \Delta x^{k}$$

The sequence converges towards the true solution provided that it exists. To show this, first notice that we can rewrite the iteration as

$$x^{k+1} = (K + \Delta)^{-1}\left( b + \Delta x^{k} \right)$$

Now assume that $x^\star$ is the solution to the equation $K x^\star = b$ (which is the equation we aim to solve). Now we perform the mathematician's favorite trick of adding 0 (in this case $\Delta x^\star - \Delta x^\star$), from which it follows that

$$x^\star = (K + \Delta)^{-1}\left( b + \Delta x^\star \right)$$

We now subtract the second expression from the first, and rewriting the result gives us

$$x^{k+1} - x^\star = (K + \Delta)^{-1} \Delta \left( x^{k} - x^\star \right)$$

The final step is to take norms on both sides, resulting in

$$\|x^{k+1} - x^\star\| \leq \left\| (K + \Delta)^{-1} \Delta \right\| \cdot \|x^{k} - x^\star\|$$

Thus, if $\left\| (K + \Delta)^{-1} \Delta \right\| < 1$ we have that $x^{k} \to x^\star$ for $k \to \infty$.
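Below is a minimal sketch of this refinement loop (dense `numpy` solves stand in for the sparse LDL factorization a real solver would compute once and reuse):

```python
import numpy as np

def solve_with_refinement(K, Delta, b, iters=10):
    """Solve K x = b using only solves against the regularized matrix K + Delta."""
    K_reg = K + Delta
    # in practice this would be one sparse LDL^T factorization, reused below
    x = np.linalg.solve(K_reg, b)
    for _ in range(iters):
        r = b - K @ x                      # residual of the *unregularized* system
        x = x + np.linalg.solve(K_reg, r)  # refinement step
    return x

rng = np.random.default_rng(1)
M = rng.standard_normal((6, 6))
K = M + M.T                                # symmetric indefinite test matrix
Delta = 1e-4 * np.diag([1, 1, 1, -1, -1, -1])
b = rng.standard_normal(6)

x = solve_with_refinement(K, Delta, b)
assert np.linalg.norm(K @ x - b) < 1e-10   # converged to the unregularized solution
```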
Matrix Equilibration
The equilibration in Clarabel is a modified version of Ruiz scaling [6], similar to what is done in the OSQP package [5]. It is based on scaling the constant part of the KKT system (i.e. everything except the primal-dual scaling matrices), as shown below

$$M = \begin{bmatrix} P & A^\top \\ A & 0 \end{bmatrix}$$

The aim is to find a diagonal scaling matrix $S$ such that

$$\hat{M} = S M S$$

is better conditioned. The above can be seen as a transformation of the input data to a conic optimization problem

$$\hat{P} = D P D, \qquad \hat{A} = E A D, \qquad \hat{q} = D q, \qquad \hat{b} = E b$$

where $S = \mathrm{blkdiag}(D, E)$ with diagonal $D$ and $E$.

The main idea of the equilibration procedure is to scale the rows (and columns) of the matrix so that they have equal norm. An efficient approximate solution to this problem is the so-called modified Ruiz scaling described in [5], based on the Ruiz scaling described in [6].
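A minimal sketch of the (unmodified) symmetric Ruiz iteration on a KKT-like matrix, where the scaling is repeatedly divided by the square roots of the current row norms; this is only meant to illustrate the mechanics, not Clarabel's exact modified variant:

```python
import numpy as np

def ruiz_equilibrate(M, iters=10):
    """Symmetric Ruiz equilibration: returns diagonal S with S @ M @ S balanced."""
    n = M.shape[0]
    s = np.ones(n)
    for _ in range(iters):
        Ms = M * np.outer(s, s)                            # current scaled matrix S M S
        norms = np.linalg.norm(Ms, ord=np.inf, axis=1)     # inf-norm of each row
        norms[norms == 0] = 1.0                            # guard against empty rows
        s = s / np.sqrt(norms)                             # divide by sqrt of row norms
    return np.diag(s)

rng = np.random.default_rng(2)
M = rng.standard_normal((5, 5)) * np.logspace(0, 5, 5)     # badly scaled columns
M = M + M.T
S = ruiz_equilibrate(M)
balanced = S @ M @ S
print(np.linalg.norm(balanced, ord=np.inf, axis=1))        # row norms approach 1
```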
Sparse Solvers
More often than not the main computational time in an interior-point method is spent solving the (regularized) KKT system. Luckily, the linear system is often sparse, enabling fast factorization algorithms. As such it is important to utilize an efficient implementation of a direct sparse solver if one wants the interior-point solver to be computationally efficient. In Clarabel it is possible to utilize the following sparse solvers
- (Rust) faer-rs. Multithreaded. Wasm compatible.
- (Rust and Julia) Built-in Quasi-Definite LDL (QDLDL) based on [7]. Single threaded.
- (Rust and Julia) PanuaPardiso (Requires license). Multithreaded.
- (Rust and Julia) oneMKL-Pardiso (Only Intel CPUs). Multithreaded.
- (Julia) cuDSS (CUDA Direct Sparse Solver). GPU-based.
- (Julia) MA57 (Requires license). Multithreaded.
- (Julia) CHOLMOD (Standard Julia Solver). Multithreaded.
Sparsity exploitation
Augmented sparsity
In numerical computations one often meets a matrix of the sparse-plus-low-rank form. The general form of this is the following

$$M = K + U U^\top - V V^\top$$

where $K$ is sparse and $U, V \in \mathbb{R}^{n \times k}$ with $k \ll n$. The bad thing about the above matrix is that forming the outer products (i.e. assembling the plus-low-rank part) will ruin the sparsity of $K$, and in turn $M$ will be a dense matrix. Given that our aim is to solve a system of the form $Mx = b$, let us look at the structure of $M$ by performing the product

$$Mx = Kx + U(U^\top x) - V(V^\top x)$$

The above, together with the auxiliary constraints $u = U^\top x$ and $v = -V^\top x$, means we can solve $Mx = b$ by solving the following equivalent system

$$\tilde{M} \begin{bmatrix} x \\ u \\ v \end{bmatrix} = \begin{bmatrix} b \\ 0 \\ 0 \end{bmatrix}, \qquad \tilde{M} = \begin{bmatrix} K & U & V \\ U^\top & -I & 0 \\ V^\top & 0 & I \end{bmatrix}$$

Note that while $\tilde{M}$ is sparse, it is larger. An important property of the augmented system is that if $K \succ 0$ then $\tilde{M}$ is (after a symmetric permutation grouping the blocks with positive and negative diagonals) quasi-definite and therefore strongly factorizable, meaning that for every permutation $\Pi$ one can find a unit lower-triangular $L$ and a diagonal $D$ so that $\Pi \tilde{M} \Pi^\top = L D L^\top$ [8].
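A small sketch of this trick using `scipy.sparse`, solving the augmented system and checking the result against the dense sparse-plus-low-rank matrix:

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

rng = np.random.default_rng(3)
n, k = 200, 2
K = sp.diags(rng.uniform(1.0, 2.0, n))          # sparse positive definite part
U = rng.standard_normal((n, k))
V = rng.standard_normal((n, k))
b = rng.standard_normal(n)

# augmented system: [[K, U, V], [U^T, -I, 0], [V^T, 0, I]]
Ik = sp.identity(k)
M_aug = sp.bmat([[K,   U,    V   ],
                 [U.T, -Ik,  None],
                 [V.T, None, Ik  ]], format="csc")
sol = spla.spsolve(M_aug, np.concatenate([b, np.zeros(2 * k)]))
x = sol[:n]

# compare against the (dense) sparse-plus-low-rank matrix we never formed
M_dense = K.toarray() + U @ U.T - V @ V.T
assert np.allclose(M_dense @ x, b)
```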
Second-order cones
The Clarabel approach
Clarabel uses the same approach as ECOS [9]. In short, they utilize that the structure of $H = W^\top W$ for a second-order cone is augmented-sparse, meaning that it can be split as follows

$$W^\top W = D + u u^\top - v v^\top$$

where $D$ is diagonal. We can generalize the above for all second-order cones by defining

$$U = \mathrm{blkdiag}(u_1, \ldots, u_p), \qquad V = \mathrm{blkdiag}(v_1, \ldots, v_p)$$

meaning that

$$H = D + U U^\top - V V^\top$$

with block-diagonal $D$. As such the KKT system of the form

$$\begin{bmatrix} P & A^\top \\ A & -H \end{bmatrix} \begin{bmatrix} \Delta x \\ \Delta z \end{bmatrix} = \begin{bmatrix} d_x \\ d_z \end{bmatrix}$$

can equivalently be solved by solving the corresponding augmented system, in which the low-rank factors $U$ and $V$ appear as extra sparse border blocks exactly as described in the previous section. The approach is simple as it leaves $A$ intact and does not scale the problem data. However, one has to deal with computing $u$ and $v$, which are not unique. In fact Clarabel and ECOS apply different approaches when calculating $u$ and $v$. A detail here is that one wants to compute $u$ and $v$ in a way for which the augmented system remains quasi-definite (and therefore strongly factorizable).
The CVXOPT approach
Another approach is that of CVXOPT. Here the diagonal-plus-low-rank structure of the scaling matrix of a second-order cone (and its inverse) is used, i.e. that

$$W = D + u u^\top, \qquad W^{-1} = \bar{D} + \bar{u} \bar{u}^\top$$

where the $D$ and $\bar{D}$ are diagonal. We can again bunch all the scaling matrices together by defining

$$U = \mathrm{blkdiag}(u_1, \ldots, u_p), \qquad \bar{U} = \mathrm{blkdiag}(\bar{u}_1, \ldots, \bar{u}_p)$$

so that

$$W = D + U U^\top, \qquad W^{-1} = \bar{D} + \bar{U} \bar{U}^\top$$

with block-diagonal $D$ and $\bar{D}$. Note that the $U$ and $D$ are defined as previously, but are different, since the $u$ and $D$ now come from the factorization of $W$ itself rather than of $W^\top W$. We now show how the rank-update and the known inverse can be used explicitly in order to preserve sparsity. The KKT system of interest using Nesterov-Todd scaling is the following

$$\begin{bmatrix} P & A^\top \\ A & -W^\top W \end{bmatrix} \begin{bmatrix} \Delta x \\ \Delta z \end{bmatrix} = \begin{bmatrix} d_x \\ d_z \end{bmatrix}$$

By scaling the $\Delta z$-variable and multiplying the last row with $W^{-\top}$ one finds that

$$\begin{bmatrix} P & \bar{A}^\top \\ \bar{A} & -I \end{bmatrix} \begin{bmatrix} \Delta x \\ W \Delta z \end{bmatrix} = \begin{bmatrix} d_x \\ W^{-\top} d_z \end{bmatrix}$$

For simplicity, we now define

$$\bar{A} = W^{-\top} A = (\bar{D} + \bar{U} \bar{U}^\top) A = \bar{D} A + \bar{U} (\bar{U}^\top A)$$

which is itself sparse-plus-low-rank. We can now introduce new variables for the low-rank terms, resulting in an augmented system whose blocks contain only the sparse product $\bar{D} A$ and the thin factors $\bar{U}$ and $\bar{U}^\top A$; using that $W$ (and hence $W^{-1}$) is symmetric, one can show that the resulting augmented system is also symmetric. Now the above preserves sparsity. For SOCs the low-rank factors consist of a single column per cone, so the augmented system can be reduced correspondingly.
Rotated Second-order cones
By rotating the rotated second-order cone back to the standard second-order cone form we can exploit sparsity similarly to the regular second-order cone. This rotation can be done by applying the following orthogonal transformation

$$T = \begin{bmatrix} \tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{2}} & 0 \\ \tfrac{1}{\sqrt{2}} & -\tfrac{1}{\sqrt{2}} & 0 \\ 0 & 0 & I \end{bmatrix}$$

which maps a point $(u, v, w)$ in the rotated second-order cone $\{ (u, v, w) : 2uv \geq \|w\|_2^2,\ u, v \geq 0 \}$ to the point $((u + v)/\sqrt{2},\ (u - v)/\sqrt{2},\ w)$ in the standard second-order cone.
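A quick numerical check of this rotation (assuming the $2uv \geq \|w\|_2^2$ convention for the rotated cone used above):

```python
import numpy as np

def rotate_to_soc(s):
    """Map (u, v, w) in the rotated SOC to (t, x) in the standard SOC."""
    u, v, w = s[0], s[1], s[2:]
    return np.concatenate(([(u + v) / np.sqrt(2), (u - v) / np.sqrt(2)], w))

s = np.array([2.0, 1.0, 1.5, 0.5])          # 2*u*v = 4 >= ||w||^2 = 2.5
assert 2 * s[0] * s[1] >= s[2:] @ s[2:]     # rotated-SOC membership
t_x = rotate_to_soc(s)
assert t_x[0] >= np.linalg.norm(t_x[1:])    # standard-SOC membership
```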
Power Cones
The conjugate barriers for certain power cones have Hessians that, similar to that of second-order cones, can be represented using augmented sparsity [10]. The intuition here is that e.g. the generalized power cone reduces to a (rotated) second-order cone when $\alpha = (\tfrac{1}{2}, \tfrac{1}{2})$. Given that the augmented sparsity is only available for the conjugate barrier, the following is only relevant in the case of dual scaling.

The generalized power cone is parametrized by $\alpha \in \mathbb{R}^n$ such that

$$\sum_{i=1}^{n} \alpha_i = 1, \qquad \alpha_i > 0$$

and is defined as

$$\mathcal{K}_{\mathrm{genpow}} = \left\{ (x, z) \in \mathbb{R}^n_+ \times \mathbb{R}^m : \prod_{i=1}^{n} x_i^{\alpha_i} \geq \|z\|_2 \right\}$$

with dual cone

$$\mathcal{K}_{\mathrm{genpow}}^* = \left\{ (x, z) \in \mathbb{R}^n_+ \times \mathbb{R}^m : \prod_{i=1}^{n} \left( \frac{x_i}{\alpha_i} \right)^{\alpha_i} \geq \|z\|_2 \right\}$$

The power mean cone is parametrized by $\alpha \in \mathbb{R}^n$ such that

$$\sum_{i=1}^{n} \alpha_i = 1, \qquad \alpha_i > 0$$

and is defined as

$$\mathcal{K}_{\mathrm{pm}} = \left\{ (x, t) \in \mathbb{R}^n_+ \times \mathbb{R} : \prod_{i=1}^{n} x_i^{\alpha_i} \geq t \right\}$$

with dual cone

$$\mathcal{K}_{\mathrm{pm}}^* = \left\{ (x, t) \in \mathbb{R}^n_+ \times \mathbb{R} : t \leq 0,\ \prod_{i=1}^{n} \left( \frac{x_i}{\alpha_i} \right)^{\alpha_i} \geq -t \right\}$$

The relative entropy cone is defined as

$$\mathcal{K}_{\mathrm{re}} = \mathrm{cl}\left\{ (u, v, w) \in \mathbb{R} \times \mathbb{R}^n_{++} \times \mathbb{R}^n_{++} : u \geq \sum_{i=1}^{n} w_i \log\frac{w_i}{v_i} \right\}$$

with dual cone

$$\mathcal{K}_{\mathrm{re}}^* = \left\{ (u, v, w) \in \mathbb{R}_+ \times \mathbb{R}^n_{++} \times \mathbb{R}^n : w_i \geq u \left( \log\frac{u}{v_i} - 1 \right) \right\}$$
Generalized Power Cones
As shown in [10] the Hessian of the conjugate barrier for the generalized power cone is augmented sparse with one positive and one negative rank-one term, i.e. that

$$\nabla^2 f^*(z) = D + p p^\top - q q^\top$$

where $D$ is diagonal and the vectors $p$ and $q$ have closed-form expressions in terms of $z$ and $\alpha$; see [10] for the explicit formulas.
Power mean cone
For the power mean cone it is shown in [10] that the Hessian of the conjugate barrier is augmented sparse with the same structure, i.e. that

$$\nabla^2 f^*(z) = D + p p^\top - q q^\top$$

where $D$ is diagonal and $p$ and $q$ again have closed-form expressions in terms of $z$ and $\alpha$; see [10] for the explicit formulas.
Relative Entropy Cone
For the relative entropy cone it is shown in [10] that the Hessian of the conjugate barrier is itself sparse, consisting of diagonal blocks plus sparse coupling between the $u$, $v$, and $w$ parts, so no augmented representation is needed; see [10] for the exact expression.
Semi-definite cones
We start by introducing the vectorization of a symmetric matrix $X \in \mathbb{S}^k$, defined below

$$\mathrm{svec}(X) = \left( X_{11},\ \sqrt{2} X_{21},\ \ldots,\ \sqrt{2} X_{k1},\ X_{22},\ \sqrt{2} X_{32},\ \ldots,\ X_{kk} \right)^\top$$

Note that the scaling of $\sqrt{2}$ on the off-diagonal elements is there in order to preserve inner products, i.e. that $\mathrm{svec}(X)^\top \mathrm{svec}(Y) = \langle X, Y \rangle = \mathrm{tr}(XY)$. We equivalently introduce the matrization of a vector (and denote it by $\mathrm{mat}(\cdot)$), which unsurprisingly has the property

$$\mathrm{mat}(\mathrm{svec}(X)) = X$$
Computing the dense scaling matrix
In CVXOPT the Nesterov-Todd scaling of a semidefinite cone is described by the product

$$W\, \mathrm{svec}(Z) = \mathrm{svec}\!\left( R^\top Z R \right)$$

for a matrix $R$ related to the scaling point. However, stating it like this gives no insight into the structure of $W$, which is needed when assembling the KKT system (to be precise, $W^\top W$ is needed for the KKT assembly). It turns out that the structure comes from the symmetric Kronecker product given by

$$(U \otimes_s V)\, \mathrm{svec}(X) = \tfrac{1}{2}\, \mathrm{svec}\!\left( V X U^\top + U X V^\top \right)$$

Using the above we can see that this means that the Nesterov-Todd scaling of the semidefinite cone is

$$W = R^\top \otimes_s R^\top$$

We're now almost ready to describe $W^\top W$, but first we need to introduce the following property of the symmetric Kronecker product [11]

$$(A \otimes_s A)(B \otimes_s B) = (AB) \otimes_s (AB)$$

Using the above it follows that

$$W^\top W = (R \otimes_s R)(R^\top \otimes_s R^\top) = (R R^\top) \otimes_s (R R^\top)$$

Now how do we actually compute the symmetric Kronecker product? First we need to introduce a mapping $Q \in \mathbb{R}^{k(k+1)/2 \times k^2}$ that works as follows

$$Q\, \mathrm{vec}(X) = \mathrm{svec}(X), \qquad Q Q^\top = I$$

for symmetric $X$, where $\mathrm{vec}(\cdot)$ denotes ordinary column stacking. Using $Q$ it is now possible to describe the symmetric Kronecker product as

$$A \otimes_s B = \tfrac{1}{2}\, Q \left( A \otimes B + B \otimes A \right) Q^\top$$

This definition intuitively makes sense. The $Q^\top$ acting first can be seen to lift $\mathrm{svec}(X)$ to $\mathrm{vec}(X)$, i.e. from a symmetric vector to a regular vector. We then apply the regular Kronecker products (symmetrized by the sum) and then pull the product back down to the $\mathrm{svec}$ form using $Q$. Note that in general $A \otimes_s B$ is not symmetric; instead, it represents a single half of an otherwise symmetric product. In the case where $A = B$, as is the case for NT scaling, the product is symmetric and in fact

$$A \otimes_s A = Q \left( A \otimes A \right) Q^\top$$
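The $Q$-based definition is easy to implement and test; the sketch below builds $Q$, forms $A \otimes_s B$, and verifies it against the defining action on $\mathrm{svec}(X)$ (all names are illustrative):

```python
import numpy as np

def svec(X):
    k = X.shape[0]
    return np.concatenate([
        np.concatenate(([X[j, j]], np.sqrt(2) * X[j + 1:, j])) for j in range(k)
    ])

def build_Q(k):
    """Q maps vec(X) (column stacking) to svec(X) for symmetric X; Q Q^T = I."""
    rows = []
    for j in range(k):
        for i in range(j, k):
            row = np.zeros(k * k)
            if i == j:
                row[j * k + i] = 1.0
            else:
                row[j * k + i] = row[i * k + j] = 1.0 / np.sqrt(2)
            rows.append(row)
    return np.array(rows)

def skron(A, B):
    """Symmetric Kronecker product via the Q mapping."""
    Q = build_Q(A.shape[0])
    return 0.5 * Q @ (np.kron(A, B) + np.kron(B, A)) @ Q.T

rng = np.random.default_rng(4)
k = 3
A, B = rng.standard_normal((k, k)), rng.standard_normal((k, k))
X = rng.standard_normal((k, k)); X = X + X.T

# defining property: (A (x)_s B) svec(X) = 1/2 svec(B X A^T + A X B^T)
lhs = skron(A, B) @ svec(X)
rhs = 0.5 * svec(B @ X @ A.T + A @ X @ B.T)
assert np.allclose(lhs, rhs)
```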
Bibliography
- [1] P. J. Goulart and Y. Chen, “Clarabel: An interior-point solver for conic programs with quadratic objectives.” [Online]. Available: https://arxiv.org/abs/2405.12762
- [2] L. Vandenberghe, “The CVXOPT linear and quadratic cone program solvers.” [Online]. Available: https://www.seas.ucla.edu/~vandenbe/publications/coneprog.pdf
- [3] L. Tuncel, "Generalization of Primal-Dual Interior-Point Methods to Convex Optimization Problems in Conic Form," Foundations of Computational Mathematics, no. 3, 2001, doi: 10.1007/s002080010009.
- [4] J. Dahl and E. D. Andersen, “A primal-dual interior-point algorithm for nonsymmetric exponential-cone optimization,” Mathematical Programming, 2022, doi: 10.1007/s10107-021-01631-4.
- [5] B. Stellato, G. Banjac, P. Goulart, A. Bemporad, and S. Boyd, “OSQP: an operator splitting solver for quadratic programs,” Mathematical Programming Computation, vol. 12, no. 4, 2020, doi: 10.1007/s12532-020-00179-2.
- [6] D. Ruiz, “A scaling algorithm to equilibrate both rows and columns norms in matrices.” [Online]. Available: https://cerfacs.fr/wp-content/uploads/2017/06/14_DanielRuiz.pdf
- [7] T. A. Davis, "Algorithm 849: A concise sparse Cholesky factorization package," ACM Transactions on Mathematical Software, 2005, doi: 10.1145/1114268.1114277.
- [8] R. J. Vanderbei, “Symmetric Quasidefinite Matrices,” SIAM Journal on Optimization, 1995, doi: 10.1137/0805005.
- [9] A. Domahidi, E. Chu, and S. Boyd, “ECOS: An SOCP solver for embedded systems,” European Control Conference, 2013, doi: 10.23919/ECC.2013.6669541.
- [10] Y. Chen and P. Goulart, “An Efficient IPM Implementation for A Class of Nonsymmetric Cones.” [Online]. Available: https://arxiv.org/abs/2305.12275
- [11] K. Schacke, “On the Kronecker Product.” [Online]. Available: https://www.math.uwaterloo.ca/~hwolkowi/henry/reports/kronthesisschaecke04.pdf