GLPK/Modeling tips

This page provides suggestions for formulating optimization problems in GLPK — be they expressed as MathProg statements plus any required GLPSOL options or coded up as custom programs which use GLPK functions either directly or via bindings.

Big M

The big M method is a method of formulating certain types of problem by introducing an artificial quantity $M$ and assigning it a sufficiently large constant value. Take, for instance, the following MathProg model whereby $x$ has to be either less than parameter $x1$ or greater than parameter $x2$ and also where $x1 < x2$:

param x1;
param x2;
param M;
var x;
var b, binary;
# constraint is active when b = 0
s.t. c1 : x <= x1 + b * M;
# constraint is active when b = 1
s.t. c2 : x >= x2 - (1 - b) * M;


$M$ should be chosen such that it is close to the smallest value that does not reduce the solution space. Selecting too large an $M$ in the circumstances may result in a low quality solution due to numerical imprecision. Further information on the big M method can be found in textbooks.

Identifying all solutions

From time to time, the question arises of identifying all feasible solutions or all optimal solutions to a given LP, MIP, or IP optimization problem. GLPK can only return one solution, which incidentally is guaranteed to be globally optimal by virtue of the feasible region being both convex and bounded.

This April 2011 thread (one of several) discusses the merits of adding solution enumeration features to GLPK. One reason for providing all optimal (or even close to optimal) solutions is to allow subsequent human intervention to select which solution to deploy, based on secondary criteria not readily amenable to formal modeling.

All optimal solutions

In some cases, the optimal solution is not unique.

GLPK cannot enumerate all the optimal solutions. Nevertheless, with the use of external coding, one can create an iterative method by which the current optimal solution is made infeasible by applying a spurious binary constraint and the modified problem then resubmitted. On termination, the various solutions can be collected for further analysis. This method is not deemed inelegant by any means. Khachiyan et al (2008)[1] and Provan (1994)[2] provide further general background. See also this January 2010 posting which suggests the use of symphony.

The N best solutions

A variation on this theme for IP and MIP problems is the requirement to identify the $N$ best solutions. One strategy involves keeping a dynamic pool of good solutions, discarding and replacing these as better solutions are revealed. The feasibility of this method has yet to be established but this May 2011 thread should offer insights.

All feasible solutions

In some circumstances, all the feasible solutions are sought.

GLPK cannot enumerate all the feasible solutions. A similar method to that described above can be used, but with the objective function set to null (or omitted). GLPK will naturally stop at the first feasible solution when the objective function is null.

An alternative is to use SCIP, which is also capable of counting the number of feasible solutions. The project homepage has more details. SCIP is released under the ZIB academic license, which precludes commercial use.

Close-to-zero values

Automatically generated problems can easily become badly scaled if the routine that generates them also creates coefficients that are numerically close-to-zero — say around 1.0e−10 in value — either as an artifact of floating point arithmetic, by poor design, or both. Equally, a solution, whether from a hand crafted or generated problem, can be hard to interpret when the results are peppered with close-to-zero values.

GLPK does not offer support for input rounding — the setting of any close-to-zero values to true zero on problem building. That task remains the responsibility of the client code. Neither does GLPK offer support for output rounding — the setting of close-to-zero values to true zero on reporting.[3]

The choice of a suitable tolerance at which to test and reset close-to-zero values is a decision for the client, but 1.0e−09 is sometimes suggested.

MathProg output

To omit close-to-zero valued variables when writing out model results with MathProg — in this case, from the matrix x[i,j] at a tolerance of 1.0e−03 — use the following statement as a template:

printf{i in I, j in J: abs(x[i,j]) >= 0.001} "...", x[i,j], ... ;


You can also invoke other conditionals to select or suppress output. And you can call display instead of printf.

Non-convex functions

A non-convex function — be it a constraint or an objective — cannot be represented directly using mixed-integer (linear) programming. But in some circumstances, a non-convex function may be captured satisfactorily using a piecewise linear approximation. Croxton et al (2002) describe three equivalent formulations.[4]

The general method uses binary variables to "switch in" the required piecewise segment and relies on special ordered set of type 2 (SOS2) constraints. The following document, glpk-sos2.pdf, explains in detail how to model non-convex piecewise linear functions in GLPK.

Nonlinear objective functions

Extremum terms

A nonlinear objective function in the form

$\mathrm{minimize}\ z = \max(x_1, x_2) + \max(x_3, x_4) + \dots$

can be modeled as an LP (the objective function is still convex) using GLPK MathProg as follows:

z =  max_x1_x2 + max_x3_x4 + ...

s.t. max_x1_x2 >= x1
max_x1_x2 >= x2
max_x3_x4 >= x3
max_x3_x4 >= x4
. . .


The more complicated case

$\mathrm{minimize}\ z = \max(x_1, x_2) + \max(x_3, x_4) - \min(x_5, x_6) - \dots$

can be tackled by substituting $+\max(-x_5, -y_6)$ for $-\min(x_5, y_6)$ as the two are equivalent. Using a similar idea, the case of maximization

$\mathrm{maximize}\ z = \min(x_1, x_2) + \min(x_3, x_4) - \max(x_5, x_6) - \dots$

can be similarly transformed by multiplying through by minus one (not shown). Refer to this April 2011 thread for more information.

SLP techniques

Successive linear programming (SLP) employs Taylor series expansions to approximate nonlinear objective functions. See this April 2011 thread for a discussion on SLP techniques and GLPK and, in particular, the management of GLPK problem objects and use of warm starts. Note also the C# example file t1.cs in the GLPK examples directory.

Special ordered sets

SOS1 : special ordered set of type 1

A special ordered set of type one (SOS1), sometimes notated $\mathrm{SOS1}(x_1, x_2, \ldots, x_n)$, means that, at most, only one of the variables can be non-zero.

If all the variables are binary, SOS1 is equivalent to, in MathProg:

x[1] + x[2] + ... + x[n] <= 1


If the variables are continuous, assuming that 0 <= x[j] <= u[j], where u[j] is an upper bound of x[j], SOS1 can be modeled in MathProg thus:

x[j] <= u[j] * z[j]  for j = 1,...,n,
z[1] + ... + z[n] <= 1,


where z[j] are auxiliary binary variables.

SOS2 : special ordered sets of type 2

As an example of SOS2 modeling of a piecewise linear function in mathprog, here is a simple example of creating a piecewise linear model of electricity pool price based on published price sensitivities in the Queensland region in the Australian National Electricity Market. Each half hour in the market, the price sensitivity is calculated for every half hour trading interval prior to the actual trading interval. For the period ending 2012-11-22 13:00 the following price sensitivities were published:

Demand offset (MW) Price
-500 $49.47 -200$51.54
-100 $51.97 +100$56.09
+200 $57.25 +500$155.01
+1000 \$450.34

To determine the price at any demand offset -500 MW <= x <= +1000 MW then the following GLPK mathprog model can be run:

#
#  SOS2 example
#
#  Piecewise linear approximation of 2012-11-22 13:00
#  QLD price sensitivities
#
#  Dr. H J Mackenzie
#  HARD software
#  hjm@hardsoftware.com
#

param n >= 0;

set NO_POINTS := 1..n;
set NO_SEGMENTS := 1..(n-1);

param x{NO_POINTS};
param y{NO_POINTS};

param x_value;

var z{i in NO_SEGMENTS} binary;
var s{i in NO_SEGMENTS} >= 0;
var y_value;

s.t. z_constraint :
1 = sum {i in NO_SEGMENTS} z[i];

s.t. s_constraint {i in NO_SEGMENTS} :
s[i] <= z[i];

s.t. x_constraint :
x_value = sum {i in NO_SEGMENTS} ((z[i] * x[i]) + ((x[i+1] - x[i]) * s[i]));

s.t. y_constraint :
y_value = sum {i in NO_SEGMENTS} ((z[i] * y[i]) + ((y[i+1] - y[i]) * s[i]));

maximize total : y_value;
solve;

printf "\nx value = %10.2f\n", x_value;
printf "y value = %10.2f\n\n", y_value;
printf "\n\n";

data;

param n := 7;

param : x y :=
1 -500  49.47
2 -200  51.54
3 -100  51.97
4  100  56.09
5  200  57.25
6  500  155.01
7 1000  450.34
;

param x_value := 260;

end;


Network programming problems

The term "network programming" derives from operations research. The underlying mathematics falls under graph theory and, in particular, that part devoted to digraphs.

One commonly encountered network programming task is the minimum cost flow problem (mincost) which seeks a set of network flows that minimize a given set of flow-dependent specific costs. The mincost problem can be used to model a surprising wide range of practical optimization questions.

GLPK offers specialist support for network programming through the DIMACS graph data format and various network algorithms.

DIMACS graph format

GLPK supports a variation of the DIMACS format for graph theory problems. The details are covered comprehensively in the official documentation and are therefore not repeated here. Additional background can be found on the interoperability page.

API calls

GLPK supports the out-of-kilter mincost algorithm through the function glp_mincost_okalg. Again consult the official documentation for more details.

In terms of relative complexity, one user reports (April 2011) that, in round terms, out-of-kilter runs about 5 times faster than simplex, for the same problem, and uses 2/3 the memory.

Version 4.49 of GLPK adds the relaxation method from Bertsekas and Tseng for solving minimum cost flow problems. On large instances, RELAX-IV should be 100–1000 times faster than the standard primal simplex method. This feature is a translation from the original Fortran code and is now licensed under GPLv3. The API call is through function glp_mincost_relax4. The algorithm itself was developed by Dimitri Bertsekas at the Massachusetts Institute of Technology.[5]

Neither glp_mincost_relax4 nor glp_mincost_okalg allow non-integer data — all flow lower bounds, arc capacities, and per-unit costs should be integer-valued (although they are passed to their respective functions as type double).

Minimum cost flow (MCF) problem benchmarks

Below are some benchmarks for a subset of Klingman's standard minimum cost flow instances (generated by glp_netgen and glp_netgen_prob).

Minimum cost flow benchmark times in seconds
Problem Nodes Arcs Optimum Primal iterations simplex time [s] OKALG time [s] RELAX-IV time [s]
101 5000 25536 6191726 17560 104.4 32.5 0.1
102 5000 25387 72337144 23633 152.3 49.5 0.2
103 5000 25355 218947553 28101 186.4 68.4 0.3
104 5000 25344 −19100371 16808 104.3 117.6 0.2
105 5000 25332 31192578 17239 107.4 28.0 0.2
106 5000 12870 4314276 11401 44.2 16.6 0.1
107 5000 37832 7393769 20315 171.0 48.5 0.2
108 5000 50309 8405738 22806 244.5 76.7 0.3
109 5000 75299 9190300 27267 411.9 110.7 0.5
110 5000 12825 8975048 11215 44.2 17.3 0.1

OKALG is the out-of-kilter algorithm.

See this posting for more details.

GLPSOL

Unfortunately, GLPSOL — the command-line interface to the GLPK solver — does not provide access to the network programming algorithms. The --mincost option currently only indicates that the data to be input is in DIMACS format. As of April 2013, GLPSOL solves a mincost problem by converting it to a linear program and then calling the simplex solver. The out-of-kilter and RELAX-IV algorithms are not available in this context.

Third-party initiatives

LEMON is an open source graph library written in C++ which, because it is part of the COIN-OR project, can communicate with the GLPK solver. The LEMON interface supports combinatorial optimization tasks related specifically to graphs and networks and can recognize an number common graph-based data structures. LEMON is release under the Boost free software license.

Sets and numbers

A set contains objects — such as numbers, strings, and other sets. A set cannot be said to hold numeric entries, but param's can. So the trick is to create a param array that translates the number objects in a set into their numeric value. Consider the following code:

model;
param tmax;
set T:= 1..tmax;
param tt{t in T} := t;

data;
param tmax := 720;
end;


Later on in the model section, use the construct:

tt[t]


to recover a genuine value — one that can be manipulated or printed as a floating point number.

References

1. Khachiyan, Leonid; Boros, Endre; Borys, Konrad; Elbassioni, Khaled M; Gurvich, Vladimir (2008). "Generating all vertices of a polyhedron is hard". Discrete and Computational Geometry 39: 174-190. doi:10.1007/s00454-008-9050-5. ISSN 0179-5376.
2. Provan, J Scott (1994). "Efficient enumeration of the vertices of polyhedra associated with network LP's". Mathematical Programming 63 (1-3): 47-64. doi:10.1007/BF01582058. ISSN 0025-5610.
3. While a call to the deprecated lpx_set_int_parm(lp, LPX_K_ROUND, 1) will still compile, it has no effect.
4. Croxton, Keely L; Gendron, Bernard; Magnanti, Thomas L (July 2002). A comparison of mixed-integer programming models for non-convex piecewise linear cost minimization problems — Operations Research Center Working Paper OR 363-02. Boston, USA: Operations Research Center, Massachusetts Institute of Technology.
5. Bertsekas, Dimitri P; Tseng, Paul (November 1994). RELAX-IV: a faster version of the RELAX code for solving minimum cost flow problems. Boston, USA: LIDS, Massachusetts Institute of Technology.