Optimization is an important concept in engineering. Finding any solution to a problem is not nearly as good as finding the one "optimal solution" to the problem. Optimization problems are typically reformatted so they become minimization problems, which are well-studied problems in the field of mathematics.
Typically, when optimizing a system, the costs and benefits of that system are arranged into a cost function. It is the engineers job then to minimize this cost function (and thereby minimize the cost of the system). It is worth noting at this point that the word "cost" can have multiple meanings, depending on the particular problem. For instance, cost can refer to the actual monetary cost of a system (number of computer units to host a website, amount of cable needed to connect Philadelphia and New York), the delay of the system (loading time for a website, transmission delay for a communication network), the reliability of the system (number of dropped calls in a cellphone network, average lifetime of a car transmission), or any other types of factors that reduce the effectiveness and efficiency of the system.
Because optimization typically becomes a mathematical minimization problem, we are going to discuss minimization here.
Minimization is the act of finding the numerically lowest point in a given function, or in a particular range of a given function. Students of mathematics and calculus may remember using the derivative of a function to find the maxima and minima of a function. If we have a function f(x), we can find the maxima, minima, or saddle-points (points where the function has zero slope, but is not a maxima or minima) by solving for x in the following equation:
In other words, we are looking for the roots of the derivative of the function f plus those points where f has a corner. Once we have the so called critical points of the function (if any), we can test them to see if they are relatively high (maxima), or relatively low (minima). Some words to remember in this context are:
- Global Minima
- A global minimum of a function is the lowest value of that function anywhere. If the domain of the function is restricted, say A < x < B, then the minima can also occur at the boundary, here A or B.
- Local Minima
- A local minimum of a function is the lowest value of that function within a small range. A value can thus be a local minimum even though there are smaller function values, but not in a small neighborhood.
Unconstrained Minimization refers to the minimization of the given function without having to worry about any other rules or caveats. Constrained Minimization, on the other hand, refers to minimization problems where other relations called constraints must be satisfied at the same time.
Beside the method above (where we take the derivative of the function and set that equal to zero), there are several numerical methods that we can use to find the minima of a function. For these methods there are useful computational tools such as Matlab.
The function has a local minima at a point x if the Hessian matrix H(x) is positive definite:
Where x is a vector of all the independant variables of the function. If x is a scalar variable, the hessian matrix reduces to the second derivative of the function f.
The Newton-Raphson Method of computing the minima of a function f uses an iterative computation. We can define the sequence:
As we repeat the above computation, plugging in consecutive values for n, our solution will converge on the true solution. However, this process will take infinitely many iterations to converge, but if an approximation of the true solution will suffices, you can stop after only few iterations, because the sequence converges rather quickly (quadratic).
Steepest Descent Method
The Newton-Raphson method can be tricky because it relies on the second derivative of the function f, and this can oftentimes be difficult (if not impossible) to accurately calculate. The Steepest Descent Method, however, does not require the second derivative, but it does require the selection of an appropriate scalar quantity ε, which cannot be chosen arbitrarily (but which can also not be calculated using a set formula). The Steepest Descent method is defined by the following iterative computation:
Where epsilon needs to be sufficiently small. If epsilon is too large, the iteration may diverge. If this happens, a new epsilon value needs to be chosen, and the process needs to be repeated.
Conjugate Gradient Method
Constrained Minimization' is the process of finding the minimum value of a function under a certain number of additional rules called constraints. For instance, we could say "Find the minium value of f(x), but g(x) must equal 10". These kinds of problems are more difficult, but the Khun-Tucker theorem, and also the Karush-Khun-Tucker theorem help to solve them.
There are two different types of constraints: equality constraints and inequality constraints. We will consider them individually, and then mixed constraints.
The Khun-Tucker Theorem is a method for minimizing a function f(x) under the equality constraint g(x). The theorem reads as follows:
Given the cost function f, and an equality constraint g in the following form:
Then we can convert this problem into an unconstrained minimization problem by constructing the Lagrangian function of f and g:
Where Λ is the lagrange multiplier, and < , > denotes the scalar product of the vector space Rn (where n is the number of equality constraints). We will discuss scalar products in more detail later. If we differentiate this equation with respect to x, we can find the minimum of this whole function L(x,Λ), and that will be the minimum of our function f.
This is a set of n+k equations with n+k unknown variables (n Λs and k xs).
Similar to the method above, let us say that we have a cost function f, and an inequality constraint in the following form:
Then we can take the Lagrangian of this again:
But we now must use the following three equations/ inequalities in determining our solution:
These last second equation can be interpreted in the following way:
- if , then
- if , then
Using these two additional equations/ inequalities, we can solve in a similar manner as above.
If we have a set of equality and inequality constraints
we can combine them into a single Lagrangian with two additional conditions:
Infinite Dimensional Minimization
The above methods work well if the variables involved in the analysis are finite-dimensional vectors, like those in the RN. However, when we are trying to minimize something that is more complex than a vector, i.e. a function we need the following concept. We consider functions that live in a subspace of L2(RN), which is an infinite-dimensional vector space. We will define the term functional as follows:
- A functional is a map that takes one or more functions as arguments, and which returns a scalar value.
Let us say that we consider functions x of time t (N=1). Suppose further we have a fixed function f in two variables. With that function, we can associate a cost functional J:
Where we are explicitly taking account of t in the definition of f. To minimize this function, like all minimization problems, we need to take the derivative of the function, and set the derivative to zero. However, we need slightly more sophisticated version of derivative, because x is a function. This is where the Gateaux Derivative enters the field.
We can define the Gateaux Derivative in terms of the following limit:
Which is similar to the classical definition of the derivative in the direction h. In plain words, we took the derivative of F with respect to x in the direction of h. h is an arbitrary function of time, in the same space as x (here we are talking about the space L2). Analog to the one-dimensional case a function is differentiable at x iff the above limit exists. We can use the Gateaux derivative to find the minimization of our functional above.
We will now use the Gateaux derivative, discussed above, to find the minimizer of the following types of function:
We thus have to find the solutions to the equation:
The solution is the Euler-Lagrange Equation:
The partial derivatives are done in an ordinary way ignoring the fact that x is a function of t. Solutions to this equation are either maxima, minima, or saddle points of the cost functional J.
Example: Shortest Distance
We've heard colloquially that the shortest distance between two points is a straight line. We can use the Euler-Lagrange equation to prove this rule.
If we have two points in R2, a, and b, we would like to find the minimum curve (x,y(x)) that joins these two points. Line element ds reads:
Our function that we are trying to minimize then is defined as:
We can take the Gateaux derivative of the function J and set it equal to zero to find the minimum function between these two points. Denoting the square root as f, we get
Knowing that the line element will be finite this boils down to the equation
with the well known solution