Mechanics considered using forces
In Newtonian mechanics, a mechanical system is always made up of point masses or rigid bodies, and these are subject to known forces. One must therefore specify the composition of the system and the nature of forces that act on the various bodies. Then one writes the equations of motion for the system. Here are some examples of how one describes mechanical systems in Newtonian mechanics (these examples are surely known to you from school-level physics).
- Example: a free mass point.
This is the most trivial of all mechanical systems: a mass point that does not interact with any other bodies and is subject to no forces. Introduce the coordinates $x, y, z$ to describe the position of the mass point. Since the force is always equal to zero, the equations of motion are $\ddot x = 0$, $\ddot y = 0$, $\ddot z = 0$. The general solution of these equations describes a linear motion with constant velocity: $x(t) = x_0 + v_{x0} t$, etc.
- Example: two point masses with springs attached to a motionless wall.
| \/\/\/ (m1) \/\/\/ (m2)   ----> x
Two masses can move along a line (the $x$ axis) without friction. The mass $m_1$ is attached to the wall by a spring, and the mass $m_2$ is attached to the mass $m_1$ by a spring. Both springs have spring constant $k$ and the unstretched length $\ell$.
To write the equations of motion, we first introduce the two coordinates $x_1, x_2$ (measured from the wall) and then consider the forces acting on the two masses. The force on the mass $m_1$ is the sum of the leftward-pointing force $F_1$ from the left spring and the rightward-pointing force $F_2$ from the right spring. The force on $m_2$ is a leftward-pointing $F_2$. By definition of a "spring" we have $F_1 = k(x_1 - \ell)$ and $F_2 = k(x_2 - x_1 - \ell)$. Therefore we write the equations for the accelerations of the two masses:
$$m_1 \ddot x_1 = -k(x_1 - \ell) + k(x_2 - x_1 - \ell),$$
$$m_2 \ddot x_2 = -k(x_2 - x_1 - \ell).$$
At this point we are finished describing the system; we now need to solve these equations for particular initial conditions and determine the actual motion of this system.
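To illustrate what "solving these equations for particular initial conditions" can look like in practice, here is a minimal Python sketch that integrates the two-mass system numerically. It assumes Hooke's-law springs with the wall at $x = 0$, i.e. spring forces $k(x_1 - \ell)$ and $k(x_2 - x_1 - \ell)$; the integrator (velocity Verlet), the function names and all numerical values are illustrative assumptions, not part of the text.

```python
# Sketch: integrate the two-mass, two-spring system with velocity-Verlet steps.
# The values of m1, m2, k, ell, dt and steps are illustrative assumptions.

def accelerations(x1, x2, m1, m2, k, ell):
    """Accelerations of the two masses; the wall is assumed to sit at x = 0."""
    f_left = k * (x1 - ell)         # left spring: pulls m1 toward the wall when stretched
    f_right = k * (x2 - x1 - ell)   # right spring: pulls m1 right and m2 left when stretched
    return (-f_left + f_right) / m1, -f_right / m2

def energy(x1, x2, v1, v2, m1, m2, k, ell):
    """Total energy (kinetic plus spring potential), useful as a sanity check."""
    kinetic = 0.5 * m1 * v1**2 + 0.5 * m2 * v2**2
    potential = 0.5 * k * (x1 - ell)**2 + 0.5 * k * (x2 - x1 - ell)**2
    return kinetic + potential

def simulate(x1, x2, v1, v2, m1=1.0, m2=1.0, k=1.0, ell=1.0, dt=1e-3, steps=10000):
    """Advance the initial state (x1, x2, v1, v2) by `steps` time steps of size dt."""
    a1, a2 = accelerations(x1, x2, m1, m2, k, ell)
    for _ in range(steps):
        v1 += 0.5 * dt * a1          # half kick
        v2 += 0.5 * dt * a2
        x1 += dt * v1                # drift
        x2 += dt * v2
        a1, a2 = accelerations(x1, x2, m1, m2, k, ell)
        v1 += 0.5 * dt * a1          # second half kick
        v2 += 0.5 * dt * a2
    return x1, x2, v1, v2

if __name__ == "__main__":
    # Start with the left spring stretched and both masses at rest.
    print(simulate(1.2, 2.0, 0.0, 0.0))
```

A quick check of such a sketch is that the total energy stays nearly constant along the computed motion, and that the equilibrium configuration $x_1 = \ell$, $x_2 = 2\ell$ does not move.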
Introducing the action principle
The Lagrangian description of a mechanical system is rather different. First, we do not ask for the evolution of the system given some initial conditions, but instead assume that the position of the system at two different time moments $t_1$ and $t_2$ is known and fixed. For convenience, let us collect all coordinates (such as $x, y, z$ or $x_1, x_2$ above) into one array of "generalized coordinates" and denote them by $q_i$. So the "boundary conditions" that we impose on the system are $q_i(t_1) = A_i$ and $q_i(t_2) = B_i$, where $A_i, B_i$ are fixed numbers. We now ask: how does the system move between the time moments $t_1$ and $t_2$? The Lagrangian description answers: during that time, the system must move in such a way as to give the minimum value to the integral $\int_{t_1}^{t_2} L(q_i, \dot q_i, t)\,dt$, where $L(q_i, \dot q_i, t)$ is a known function called the Lagrange function or Lagrangian. For example, the Lagrangian for a free mass point is
$$L = \frac{m}{2}\left(\dot x^2 + \dot y^2 + \dot z^2\right).$$
The Lagrangian for the above example with two masses attached to the wall (spring constant $k$, unstretched length $\ell$) is
$$L = \frac{m_1}{2}\dot x_1^2 + \frac{m_2}{2}\dot x_2^2 - \frac{k}{2}(x_1 - \ell)^2 - \frac{k}{2}(x_2 - x_1 - \ell)^2.$$
For instance, according to the Lagrangian description, the free point mass moves in such a way that the functions $x(t), y(t), z(t)$ give the minimum value to the integral $\int_{t_1}^{t_2} \frac{m}{2}\left(\dot x^2 + \dot y^2 + \dot z^2\right) dt$, where the values of $x, y, z$ at times $t_1$ and $t_2$ are fixed.
In principle, to find the minimum value of the integral one would have to evaluate that integral for each possible trajectory $q_i(t)$ and then choose the "optimal" trajectory for which this integral has the smallest value. (Of course, we shall learn and use a much more efficient mathematical approach to determine this "optimal" trajectory instead of trying every possible set of functions $q_i(t)$.) The value of the mentioned integral is called the action corresponding to a particular trajectory $q_i(t)$. Therefore the requirement that the integral should have the smallest value is often called "the principle of least action" or just the action principle.
At this point, we need to answer the pressing question:
- How can it be that the correct trajectory is found not by considering the forces but by requiring that some integral should have the minimum value? How does each point mass "know" that it needs to minimize some integral when it moves around?
The short answer is that the least action requirement is mathematically equivalent to the consideration of forces if the Lagrangian is chosen correctly. The condition that some integral has the minimum value (when the integral is correctly chosen) is mathematically the same as the Newtonian equations for the acceleration. The point masses perhaps "know" nothing about this integral. It is simply mathematically convenient to formulate the mechanical laws in one sentence rather than in many sentences. (We shall see another, more intuitive explanation below.)
Suppose that we understand how the requirement that an integral has the minimum value can be translated into equations for the acceleration. Obviously the form of the integral needs to be different for each mechanical system since the equations of motion are different. Then the second question presents itself:
- How can we find the Lagrange function corresponding to each mechanical system?
This is a more complicated problem and one needs to study many examples to gain a command of this approach. (In brief: the Lagrange function is the kinetic energy minus the potential energy.)
Before considering Lagrange functions, we shall look at how the mathematical requirement of "least action" can be equivalent to equations of motion such as given in the examples above.
Variation of a functional
A function is a map from numbers into numbers; a functional is a map from functions into numbers. An application of a functional to a function is usually denoted by square brackets, e.g. $S[x(t)]$.
Random examples of functionals, just to illustrate the concept:
In principle, a functional can be anything that assigns a number to any function. In practice, only some functionals are interesting and have applications in physics.
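In code, a functional is simply a function that takes another function as its argument and returns a number. The following Python sketch shows three such functionals; they are illustrative choices made here (not examples taken from the text), and the helper `integrate` is a hypothetical midpoint-rule routine written for this sketch.

```python
import math

def integrate(f, a, b, n=1000):
    """Composite midpoint rule: approximate the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def value_at_zero(f):
    """Functional A[f] = f(0): picks out a single value of the function."""
    return f(0.0)

def integral_of_square(f):
    """Functional B[f] = integral of f(t)^2 over 0 <= t <= 1 (approximated)."""
    return integrate(lambda t: f(t) ** 2, 0.0, 1.0)

def maximum_on_grid(f, n=1000):
    """Functional C[f] = largest value of f on [0, 1], sampled on a grid of n + 1 points."""
    return max(f(i / n) for i in range(n + 1))

if __name__ == "__main__":
    # Each call maps a whole function (here math.cos) to one number.
    print(value_at_zero(math.cos), integral_of_square(math.cos), maximum_on_grid(math.cos))
```

Each of these assigns one number to a whole function, which is all that the definition requires; the action integral is a functional of the second kind.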
Since the action integral maps trajectories into numbers, we can call it the action functional. The action principle is formulated as follows: the trajectory must be such that the action functional evaluated on this trajectory has the minimum value among all trajectories.
This may appear to be similar to the familiar condition for the mechanical equilibrium: the coordinates $x, y, z$ are such that the potential energy $V(x, y, z)$ has the minimum value. However, there is a crucial difference: when we minimize the potential energy, we vary the three numbers $x, y, z$ until we find the minimum value; but when we minimize a functional, we have to vary the whole function $q_i(t)$ until we find the minimum value of the functional.
The branch of mathematics known as calculus of variations studies the problem of minimizing (maximizing, extremizing) functionals. One needs to learn a little bit of variational calculus at this point. Let us begin by solving some easy minimization problems involving functions of many variables; this will prepare us for dealing with functionals which can be thought of as functions of infinitely many variables. You should try the examples yourself before looking at the solutions.
Example 1: Minimize the function $f(x, y) = x^2 + xy + y^2$ with respect to $x, y$.
Solution: Compute the partial derivatives of $f$ with respect to $x, y$. These derivatives must both be equal to zero. This can only happen if $x = 0$, $y = 0$.
Example 2: Minimize the function $f(x_1, \dots, x_n) = x_1^2 + x_1 x_2 + x_2^2 + x_2 x_3 + \dots + x_n^2$ with respect to all $x_j$.
Solution: Compute the partial derivatives of $f$ with respect to all $x_j$, where $j = 1, \dots, n$. These derivatives must all be equal to zero. This can only happen if all $x_j = 0$.
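Example 3: Minimize the function $f(x_0, \dots, x_n) = (x_1 - x_0)^2 + (x_2 - x_1)^2 + \dots + (x_n - x_{n-1})^2$ with respect to all $x_j$ subject to the restrictions $x_0 = 0$, $x_n = c$.
Solution: Compute the partial derivatives of $f$ with respect to $x_j$, where $j = 1, \dots, n-1$. These derivatives must all be equal to zero. This can only happen if $x_{j+1} - x_j = x_j - x_{j-1}$ for $j = 1, \dots, n-1$. The values $x_0 = 0$, $x_n = c$ are known, therefore we find $x_j = jc/n$.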
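The result of Example 3 can be checked numerically. The sketch below assumes that the function being minimized is the sum of squared differences of neighbouring values, $(x_1 - x_0)^2 + \dots + (x_n - x_{n-1})^2$, with the two end values held fixed. Setting the derivative with respect to one interior $x_j$ to zero gives $x_j = (x_{j-1} + x_{j+1})/2$, so the code simply repeats that update until the values settle. The function names and parameters are hypothetical.

```python
def relax(n, left, right, sweeps=2000):
    """Minimize the sum of squared neighbour differences with fixed end values.

    Each update sets x[j] to the value that minimizes the sum with all other
    entries held fixed, namely the average of its two neighbours.
    """
    x = [left] + [0.0] * (n - 1) + [right]
    for _ in range(sweeps):
        for j in range(1, n):
            x[j] = 0.5 * (x[j - 1] + x[j + 1])
    return x

def sum_of_squares(x):
    """The function being minimized: the sum of (x[j+1] - x[j])^2."""
    return sum((x[j + 1] - x[j]) ** 2 for j in range(len(x) - 1))

if __name__ == "__main__":
    x = relax(10, 0.0, 5.0)
    # The interior values end up equally spaced between the two end values.
    print([round(v, 6) for v in x], sum_of_squares(x))
```

The values come out equally spaced between the two end values, in agreement with the solution obtained from the partial derivatives.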
Let us now consider the problem of minimizing the functional $S[x] = \int_0^1 \dot x^2\,dt$ with respect to all functions $x(t)$ subject to the restrictions $x(0) = 0$, $x(1) = c$. We shall first perform the minimization in a more intuitive but approximate way, and then we shall see how the same task is handled more elegantly by the variational calculus.
Let us imagine that we are trying to minimize the integral $S[x]$ with respect to all functions $x(t)$ using a digital computer. The first problem is that we cannot represent "all functions" on a computer because we can only store finitely many values in an array within the computer memory. So we split the time interval $[0, 1]$ into a large number $n$ of discrete steps $[t_j, t_{j+1}]$, where the step size $\Delta t = t_{j+1} - t_j$ is small; in other words, $t_j = j\,\Delta t$ with $\Delta t = 1/n$. We can describe the function $x(t)$ by its values $x_j = x(t_j)$ at the points $t_j$, assuming that the function is a straight line between these points. The time moments $t_j$ will be kept fixed, and then the various values $x_j$ will correspond to various possible functions $x(t)$. (In this way we definitely will not describe all possible functions $x(t)$, but the class of functions we do describe is broad enough so that we get the correct results in the limit $n \to \infty$. Basically, any function can be sufficiently well approximated by one of these "piecewise-linear" functions when the step size is small enough.)
Since we have discretized the time and reduced our attention to piecewise-linear functions, we have
$$\dot x(t) = \frac{x_{j+1} - x_j}{\Delta t}$$
within each interval $[t_j, t_{j+1}]$. So we can express the integral $S[x]$ as the finite sum,
$$S[x] = \sum_{j=0}^{n-1} \frac{(x_{j+1} - x_j)^2}{\Delta t},$$
where we have defined for convenience $x_0 = 0$, $x_n = c$.
At this point we can perform the minimization of $S[x]$ quite easily. The functional $S[x]$ is now a function of $n - 1$ variables $x_1, \dots, x_{n-1}$, i.e. $S[x] = S(x_1, \dots, x_{n-1})$, so the minimum is achieved at the values $x_j$ where the derivatives of $S$ with respect to each $x_j$ are zero. This problem is now quite similar to Example 3 above, so the solution is $x_j = jc/n$. Now we recall that $x_j$ is the value of the unknown function $x(t)$ at the point $t_j = j/n$. Therefore the minimum of the functional is found at the values $x_j$ that correspond to the function $x(t) = c\,t$. As we increase the number $n$ of intervals, we still obtain the same function $x(t) = c\,t$, therefore the same function is obtained in the limit $n \to \infty$. We conclude that the function $x(t) = c\,t$ minimizes the functional $S[x]$ with the restrictions $x(0) = 0$, $x(1) = c$.
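The discretization can be tried out directly. The sketch below assumes that the functional is the integral of $\dot x^2$ over $0 \le t \le 1$ and evaluates its piecewise-linear approximation, the sum of $(x_{j+1} - x_j)^2 / \Delta t$, for a straight line and for a perturbed trajectory with the same end values. The end value 2.0, the shape of the perturbation and all names are illustrative assumptions.

```python
import math

C_END = 2.0  # assumed value of x at t = 1; x at t = 0 is taken to be 0

def discretized_action(x_values, total_time=1.0):
    """Sum of (x[j+1] - x[j])^2 / dt for a piecewise-linear trajectory."""
    n = len(x_values) - 1
    dt = total_time / n
    return sum((x_values[j + 1] - x_values[j]) ** 2 / dt for j in range(n))

def sample(func, n, total_time=1.0):
    """Values of func at the n + 1 equally spaced time moments t_j = j * dt."""
    return [func(j * total_time / n) for j in range(n + 1)]

def straight(t):
    """Straight line from x = 0 at t = 0 to x = C_END at t = 1."""
    return C_END * t

def wiggly(t):
    """Same end values as the straight line, plus a bump in between."""
    return C_END * t + 0.3 * math.sin(math.pi * t)

if __name__ == "__main__":
    for n in (10, 100, 1000):
        s_line = discretized_action(sample(straight, n))
        s_bump = discretized_action(sample(wiggly, n))
        print(n, s_line, s_bump)
```

For the straight line the sum equals the square of the end value for every $n$, while the perturbed trajectory always gives a larger value, as expected if the straight line is the minimizer.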
The above calculation has the advantage of being more intuitive and visual: it makes clear that minimization of a functional with respect to a function is quite similar to the minimization of a function with respect to a large number of variables in the limit of infinitely many such variables. However, the formalism of variational calculus provides a much more efficient computational procedure. Here is how one calculates the function $x(t)$ that minimizes $S[x]$.
Let us consider a very small change $\epsilon(t)$ in the function $x(t)$ and see how the functional $S[x]$ changes:
$$\delta S[x; \epsilon] \equiv S[x + \epsilon] - S[x].$$
(In many textbooks, the change in $x(t)$ is denoted by $\delta x(t)$, and generally the change of any quantity $Q$ is denoted by $\delta Q$. We chose to write $\epsilon(t)$ instead of $\delta x(t)$ for clarity.)
The functional $\delta S[x; \epsilon]$ is called the variation of the functional $S[x]$ with respect to the change $\epsilon(t)$ in the function $x(t)$. The variation is itself a functional depending on two functions, $x(t)$ and $\epsilon(t)$. When $\epsilon(t)$ is very small, we expect that the variation will be linear in $\epsilon(t)$, just like the variation in the value of a normal function is linear in the amount of change in the argument, e.g. $f(t + \epsilon) - f(t) \approx f'(t)\,\epsilon$ for small $\epsilon$. So we expect that the variation $\delta S[x; \epsilon]$ of the functional $S[x]$ will be a linear functional of $\epsilon(t)$. To understand what a linear functional looks like, consider a linear function $f(\epsilon_j)$ depending on several variables $\epsilon_j$, $j = 1, 2, \dots$. This function can always be written as
$$f(\epsilon_j) = \sum_j A_j \epsilon_j,$$
where $A_j$ are suitable constants. Since a functional is like a function of infinitely many variables, the index $j$ becomes a continuous variable $t$, the variables $\epsilon_j$ and the constants $A_j$ become functions $\epsilon(t)$, $A(t)$, while the sum over $j$ becomes an integral over $t$. Thus, a linear functional of $\epsilon(t)$ can be written as an integral,
$$\delta S[x; \epsilon] = \int A(t)\,\epsilon(t)\,dt,$$
where $A(t)$ is a suitable function. In the case of the usual function $f(t)$, the "suitable constant $A$" is the derivative $A = df(t)/dt$. By analogy we call $A(t)$ above the variational derivative of the functional $S[x]$ and denote it by $\delta S / \delta x(t)$.
A function has a minimum (or maximum, or extremum) at a point where its derivative vanishes. So a functional has a minimum (or maximum, or extremum) at the function where the functional derivative vanishes. We shall justify this statement below; for now let us compute the functional derivative of the functional $S[x] = \int_0^1 \dot x^2\,dt$.
Substituting $x(t) + \epsilon(t)$ instead of $x(t)$ into the functional, we get
$$\delta S[x; \epsilon] = \int_0^1 \left[(\dot x + \dot\epsilon)^2 - \dot x^2\right] dt = 2\int_0^1 \dot x\,\dot\epsilon\,dt + O(\epsilon^2),$$
where we are going to neglect terms quadratic in $\epsilon$ and so we didn't write them out. We now need to rewrite this integral so that no derivatives of $\epsilon$ appear there; so we integrate by parts and find
$$\delta S[x; \epsilon] = 2\,\dot x\,\epsilon\,\Big|_0^1 - 2\int_0^1 \ddot x\,\epsilon\,dt.$$
Since in our case the values $x(0)$, $x(1)$ are fixed, the function $\epsilon(t)$ must be such that $\epsilon(0) = \epsilon(1) = 0$, so the boundary terms vanish. The variational derivative is therefore
$$\frac{\delta S}{\delta x(t)} = -2\,\ddot x(t).$$
The functional $S[x]$ has an extremum when its variation under an arbitrary change $\epsilon(t)$ is second-order in $\epsilon(t)$. However, above we have obtained the variation as a first-order quantity, linear in $\epsilon(t)$; so this first-order quantity must vanish for $x(t)$ where the functional has an extremum. An integral such as $\int A(t)\,\epsilon(t)\,dt$ can vanish for arbitrary $\epsilon(t)$ only if the function $A(t)$ vanishes for all $t$. In our case, the "function $A(t)$," i.e. the variational derivative $\delta S / \delta x(t)$, is equal to $-2\,\ddot x(t)$. Therefore the function $x(t)$ on which the functional $S[x]$ has an extremum must satisfy $-2\,\ddot x = 0$ or more simply $\ddot x = 0$. This differential equation has the general solution $x(t) = a + bt$, and with the additional restrictions $x(0) = 0$, $x(1) = c$ we immediately get the solution $x(t) = c\,t$.
To summarize: the requirement that the functional $S[x]$ must have an extremum at the function $x(t)$ leads to a differential equation on the unknown function $x(t)$. This differential equation is found as
$$\frac{\delta S}{\delta x(t)} = 0.$$
The procedure is quite similar to finding an extremum of a function $f(t)$, where the point $t_0$ of the extremum is found from the equation $df(t_0)/dt = 0$.
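Suppose that we are now asked to minimize a functional whose integrand depends on both $x$ and $\dot x$, subject to fixed values of $x(t)$ at the end points $t_1$, $t_2$; in mechanics we shall mostly be dealing with functionals of this kind. We might try to discretize the function $x(t)$, as we did above, but this is difficult. Moreover, for a different functional everything will have to be computed anew. Rather than go through the above procedure again and again, let us now derive the formula for the functional derivative for all functionals of this form, namely
$$S[q_i] = \int_{t_1}^{t_2} L(q_i, \dot q_i, t)\,dt,$$
where $L(q_i, \dot q_i, t)$ is a given function of the coordinates $q_i$ and velocities $\dot q_i$ (assuming that there are $n$ coordinates, so $i = 1, \dots, n$). This function is called the Lagrange function or simply the Lagrangian.
We introduce the infinitesimal changes $\epsilon_i(t)$ into the functions $q_i(t)$ and express the variation of the functional first through $\epsilon_i$ and $\dot\epsilon_i$,
$$\delta S[q_i; \epsilon_i] = \int_{t_1}^{t_2} \sum_{i=1}^{n} \left( \frac{\partial L}{\partial q_i}\,\epsilon_i + \frac{\partial L}{\partial \dot q_i}\,\dot\epsilon_i \right) dt.$$
Then we integrate by parts, discard the boundary terms and obtain
$$\delta S[q_i; \epsilon_i] = \int_{t_1}^{t_2} \sum_{i=1}^{n} \left[ \frac{\partial L}{\partial q_i} - \frac{d}{dt}\frac{\partial L}{\partial \dot q_i} \right] \epsilon_i(t)\,dt.$$
Thus the variational derivatives can be written as
$$\frac{\delta S}{\delta q_i(t)} = \frac{\partial L}{\partial q_i} - \frac{d}{dt}\frac{\partial L}{\partial \dot q_i}.$$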
- Further reading: Notes on functionals by B. Svetitsky
Consider again the condition for a functional to have an extremum at $q_i(t)$: the first-order variation must vanish. We have derived the above formula for the variation $\delta S$. Since all $\epsilon_i(t)$ are completely arbitrary (subject only to the boundary conditions $\epsilon_i(t_1) = \epsilon_i(t_2) = 0$), the first-order variation vanishes only if the functions in square brackets all vanish at all $t$. Therefore we obtain the Euler-Lagrange equations
$$\frac{\partial L}{\partial q_i} - \frac{d}{dt}\frac{\partial L}{\partial \dot q_i} = 0.$$
These are the differential equations that express the mathematical requirement that the functional $S[q_i]$ has an extremum at the set of functions $q_i(t)$. There are as many equations as unknown functions $q_i(t)$, one equation for each $i$.
Note that the Euler-Lagrange equations involve partial derivatives of the Lagrangian with respect to coordinates and velocities. The derivatives with respect to velocities are sometimes written as $\partial L / \partial \dot q$, which might at first sight appear confusing. However, all that is meant by this notation is the derivative of the function $L(q, \dot q, t)$ with respect to its second argument.
The Euler-Lagrange equations also involve the derivative $d/dt$ with respect to the time. This is not a partial derivative with respect to $t$ but a total derivative. In other words, to compute $\frac{d}{dt}\frac{\partial L}{\partial \dot q}$, we need to substitute the functions $q(t)$ and $\dot q(t)$ into the expression $\partial L / \partial \dot q$, thus obtain a function of time only, and then take the derivative of this function with respect to time.
Remark: If the Lagrangian contains higher derivatives (e.g. the second derivative), the Euler-Lagrange formula is different. For example, if the Lagrangian is $L = L(q, \dot q, \ddot q, t)$, then the Euler-Lagrange equation is
$$\frac{\partial L}{\partial q} - \frac{d}{dt}\frac{\partial L}{\partial \dot q} + \frac{d^2}{dt^2}\frac{\partial L}{\partial \ddot q} = 0.$$
Note that this equation may be up to fourth-order in time derivatives! Usually, one does not encounter such Lagrangians in studies of classical mechanics because ordinary systems are described by Lagrangians containing only first-order derivatives.
Summary: In mechanics, one specifies a system by writing a Lagrangian and pointing out the unknown functions in it. From that, one derives the equations of motion using the Euler-Lagrange formula. You need to know that formula really well and to understand how to apply it. This comes only with practice.
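One way to gain confidence with the formula is to test it numerically. The sketch below takes a Lagrangian of one coordinate given as a Python function `lagrangian(q, qdot)` with no explicit time dependence, approximates the expression $\partial L/\partial q - \frac{d}{dt}\,\partial L/\partial \dot q$ along a trial trajectory by central differences, and checks whether it vanishes. The function name, the step size and the harmonic-oscillator test Lagrangian are assumptions made for illustration.

```python
import math

def euler_lagrange_residual(lagrangian, q, t, h=1e-3):
    """Approximate dL/dq - d/dt(dL/dqdot) along the trajectory q(t) at time t.

    All derivatives are central differences with step h, so the result is
    only accurate up to terms of order h^2.
    """
    def qdot(s):
        return (q(s + h) - q(s - h)) / (2 * h)

    def dL_dq(s):
        # Partial derivative with respect to the first argument, velocity held fixed.
        return (lagrangian(q(s) + h, qdot(s)) - lagrangian(q(s) - h, qdot(s))) / (2 * h)

    def dL_dqdot(s):
        # Partial derivative with respect to the second argument, coordinate held fixed.
        return (lagrangian(q(s), qdot(s) + h) - lagrangian(q(s), qdot(s) - h)) / (2 * h)

    # Total time derivative: dL/dqdot is first evaluated along the trajectory,
    # and the resulting function of time is then differentiated.
    total_derivative = (dL_dqdot(t + h) - dL_dqdot(t - h)) / (2 * h)
    return dL_dq(t) - total_derivative

def oscillator(q, qdot):
    """Assumed test Lagrangian: kinetic minus potential energy with m = k = 1."""
    return 0.5 * qdot ** 2 - 0.5 * q ** 2

if __name__ == "__main__":
    # cos(t) solves the oscillator equation of motion, t^2 does not.
    print(euler_lagrange_residual(oscillator, math.cos, 0.7))
    print(euler_lagrange_residual(oscillator, lambda t: t * t, 1.0))
```

For a trajectory that solves the equations of motion the residual is zero up to discretization error; for any other trajectory it is not. Note how the total time derivative is taken only after the trajectory has been substituted, as described above.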
How to choose the Lagrangian
The basic rule is that the Lagrangian is equal to the kinetic energy minus the potential energy. (Both should be measured in an inertial system of reference! In a non-inertial system, this rule may fail.)
It can be shown that this rule works for an arbitrary mechanical system made up of point masses, springs, ropes, frictionless rails, etc., regardless of how one introduces the generalized coordinates. We shall not study the proof of this statement, but instead go directly to the examples of Lagrangians for various systems.
Examples of Lagrangians
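- The Lagrangian for a free point mass moving along a straight line with coordinate $x$: $L = \frac{m}{2}\dot x^2$
- A point mass moving along a straight line with coordinate $x$, in a force field with potential energy $V(x)$: $L = \frac{m}{2}\dot x^2 - V(x)$
- A point mass moving in three-dimensional space with coordinates $x, y, z$, in a force field with potential energy $V(x, y, z)$: $L = \frac{m}{2}\left(\dot x^2 + \dot y^2 + \dot z^2\right) - V(x, y, z)$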
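- A point mass constrained to move along the circle $x^2 + z^2 = R^2$ in the gravitational field near the Earth (the $z$ axis is vertical). It is convenient to introduce the angle $\phi$ as the coordinate, with $x = R\cos\phi$, $z = R\sin\phi$. Then the potential energy is $V = mgz = mgR\sin\phi$, while the kinetic energy is $\frac{m}{2}\left(\dot x^2 + \dot z^2\right) = \frac{m}{2}R^2\dot\phi^2$. So the Lagrangian is $L = \frac{m}{2}R^2\dot\phi^2 - mgR\sin\phi$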
Note that we have written the Lagrangian (and therefore we can derive the equations of motion) without knowing the force needed to keep the mass moving along the circle. This shows the great conceptual advantage of the Lagrangian approach; in the traditional Newtonian approach, the first step would be to determine this force, which is initially unknown, from a system of equations involving an unknown acceleration of the point mass.
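- Two (equal) point masses connected by a spring with length $\ell$ (taking the motion to be along the $x$ axis, with coordinates $x_1, x_2$ and spring constant $k$): $L = \frac{m}{2}\left(\dot x_1^2 + \dot x_2^2\right) - \frac{k}{2}(x_2 - x_1 - \ell)^2$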
- A mathematical pendulum, i.e. a massless rigid stick of length $\ell$ with a point mass $m$ attached at the end, that can move only in the $xz$ plane in the gravitational field near the Earth (vertical axis $z$). As the coordinate, we choose the angle $\phi$ between the stick and the downward direction of the $z$ axis. The Lagrangian is $L = \frac{m}{2}\ell^2\dot\phi^2 + mg\ell\cos\phi$
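- A point mass sliding without friction along an inclined plane that makes an angle $\alpha$ with the horizontal, in the gravitational field of the Earth. As the coordinate, we choose $s$, where the $s$ axis is parallel to the incline. The height is then $z = s\sin\alpha$, so the potential energy is $V = mgs\sin\alpha$. The kinetic energy is computed as $\frac{m}{2}\left(\dot x^2 + \dot z^2\right) = \frac{m}{2}\left(\dot s^2\cos^2\alpha + \dot s^2\sin^2\alpha\right) = \frac{m}{2}\dot s^2$.
Hence, the Lagrangian is $L = \frac{m}{2}\dot s^2 - mgs\sin\alpha$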
Exercise: You should now determine the Euler-Lagrange equations that follow from each of the above Lagrangians and verify that these equations are the same as would be obtained from school-level Newtonian considerations for the respective physical systems. This should occupy you for at most an hour or two. Only then will you begin to appreciate the power of the Lagrangian approach.
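As a template for the exercise, here is one case worked out, for a point mass on a line in a potential $V(x)$. It assumes the Lagrangian is the kinetic energy minus the potential energy, as in the basic rule above; the other systems follow the same three steps.

```latex
% Assumed Lagrangian: kinetic minus potential energy for one coordinate x
L(x, \dot x) = \frac{m}{2}\,\dot x^2 - V(x)

% Step 1: partial derivatives with respect to the coordinate and the velocity
\frac{\partial L}{\partial x} = -\frac{dV}{dx},
\qquad
\frac{\partial L}{\partial \dot x} = m\,\dot x

% Step 2: total time derivative of the second expression
\frac{d}{dt}\frac{\partial L}{\partial \dot x} = m\,\ddot x

% Step 3: Euler-Lagrange equation
\frac{\partial L}{\partial x} - \frac{d}{dt}\frac{\partial L}{\partial \dot x} = 0
\quad\Longrightarrow\quad
m\,\ddot x = -\frac{dV}{dx}
```

The right-hand side $-dV/dx$ is the force, so this is Newton's second law for the point mass.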
For more examples of setting up Lagrangians for mechanical systems and for deriving the Euler-Lagrange equations, ask your physics teacher or look in any theoretical mechanics problem book. Much of the time, the Euler-Lagrange equations for some complicated system (say, a pendulum attached to the endpoint of another pendulum) would be too difficult to solve, but the point is to gain experience deriving them. Their derivation would be much less straightforward in the old Newtonian approach using forces.
If this is your first time looking at Lagrangians, you might still be asking yourself: how could the motion of a system be described by saying that some integral has the minimal value? Is it a purely formal mathematical trick, and if not, how can one get a more visually intuitive understanding? A partial answer is here.