Classical Mechanics/Constrained

What are constrained systems?

In many cases in mechanics, the motion of bodies is constrained in some way: for example, a massive bead may be constrained to move along a bent wire of certain shape; a massive cylinder may be rolling along a surface (but not sliding or flying around); or two masses may be connected by a rigid stick of fixed length.

In each of these cases, there are forces acting on the constrained bodies. In the above examples, the wire exerts a force on the bead, the surface exerts a friction force on the cylinder, and the stick pulls or pushes on the two masses. These forces may vary in time, and we do not know their magnitudes in advance. We know, however, that at every moment these forces are exactly such as to guarantee that the constraints hold. The bead would fly away if there were no forces acting on it, but the wire provides a force that keeps the bead in place. The two masses connected by a rigid stick experience a force from the stick that is exactly what is necessary to keep them at a constant distance from each other. (This is what it means for the stick to be "rigid".)

In the Newtonian approach to mechanics, these systems are treated by introducing variables $F_{1},F_{2},...$ representing the unknown forces, and solving the system of equations for the unknown forces and accelerations. This procedure might be complicated; moreover, we are not always interested in the magnitudes of these unknown forces.

In the Lagrangian approach, there are two straightforward ways to treat constrained systems:

The method of solving the constraints. In this method, we introduce the generalized coordinates in such a way that the constraints are automatically satisfied. For example, suppose a point mass is constrained to move along a circle of radius $R$ . We might describe this situation by saying that the Cartesian coordinates $x,y$ are constrained to satisfy the equation $x^{2}+y^{2}=R^{2}$ . Now we can introduce the angle $\phi$ as the "generalized coordinate" and express the Cartesian coordinates of the point mass as $x=R\cos \phi ,y=R\sin \phi$ . These coordinates solve the constraint for all $\phi$ . The power of the Lagrangian approach is that any generalized coordinates are good enough; so we can now directly write the Lagrangian in terms of the function $\phi (t)$ and forget about the fact that the system is constrained. We shall automatically obtain the correct equations of motion.
The method of Lagrange multipliers. In this method, we do not try to introduce new generalized coordinates that solve the constraints. (This may be difficult; not all algebraic equations can be solved!) Instead, we formulate the variational problem in the presence of constraints: the correct trajectory $q_{i}(t)$ is such that the action functional has an extremum while the constraint equations are satisfied. For example, if a point mass is constrained to move along a circle of radius $R$ , but is otherwise unforced, then the Lagrangian is $L={\frac {m}{2}}({\dot {x}}^{2}+{\dot {y}}^{2})$ and the constraint is $x^{2}+y^{2}=R^{2}$ . The correct trajectory $x(t),y(t)$ will be such that the integral $\int Ldt$ has the minimum value while the constraint holds at every $t$ . Thus we need to solve the problem of conditional minimization.
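
As a short illustration of the first method, here is the circle example worked out (a sketch that uses only the quantities already introduced above). With $x=R\cos \phi ,y=R\sin \phi$ the velocities are ${\dot {x}}=-R{\dot {\phi }}\sin \phi$ and ${\dot {y}}=R{\dot {\phi }}\cos \phi$ , so the Lagrangian of the otherwise unforced point mass becomes

L={\frac {m}{2}}({\dot {x}}^{2}+{\dot {y}}^{2})={\frac {m}{2}}R^{2}{\dot {\phi }}^{2}.

The Euler-Lagrange equation for the single function $\phi (t)$ is then

{\frac {d}{dt}}{\frac {\partial L}{\partial {\dot {\phi }}}}-{\frac {\partial L}{\partial \phi }}=mR^{2}{\ddot {\phi }}=0,

so the point mass moves around the circle with constant angular velocity, and the constraint never had to be imposed separately.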

Conditional minimization and Lagrange multipliers

A conditional minimization problem can be solved by the method of Lagrange multipliers. One takes a different, specially modified Lagrangian that describes the fact that the system is constrained. The modified Lagrangian is equal to the normal Lagrangian plus special terms containing Lagrange multipliers. Let us now explain this method.

For simplicity, consider the minimization of a function $F(x,y)$ with respect to variables $x,y$ , subject to the constraint that $G(x,y)=0$ .

First recall how the problem would be solved without the constraint: the minimum (or, more generally, an extremum) of $F$ would be the point $(x,y)$ where the partial derivatives of $F(x,y)$ vanish:

{\frac {\partial F(x,y)}{\partial x}}=0,\quad {\frac {\partial F(x,y)}{\partial y}}=0.

This would give a system of two equations that determines the two unknowns $x_{*},y_{*}$ .

With the constraint, the above system of equations will not give the correct answer because the solution $x_{*},y_{*}$ most probably will not satisfy the constraint: $G(x_{*},y_{*})\neq 0$ . Let us look at the problem geometrically. The constraint $G(x,y)=0$ determines a curve or several curves in the $(x,y)$ plane; we are looking for the point on that curve where the function $F$ has an extremum. Let us imagine the level lines of the function $F$ , i.e. the lines $F(x,y)=A$ for various values of the constant $A$ . The constraint curve $G(x,y)=0$ may go across the level lines of $F$ ; this means that the value of $F$ changes along the curve. It is clear from this geometric consideration that the extremum of $F$ along the constraint curve will be the point where the constraint curve is tangent to some level line of $F$ . A condition for two curves to be tangent is that their normal vectors are parallel. The normal vector to a curve $G(x,y)=0$ at a point $(x,y)$ has components $(\partial G(x,y)/\partial x,\partial G(x,y)/\partial y)$ . The normal vector to the level line $F(x,y)=A$ has components $(\partial F(x,y)/\partial x,\partial F(x,y)/\partial y)$ . These two vectors are parallel if there exists a number $\lambda$ such that

{\frac {\partial F(x,y)}{\partial x}}=\lambda {\frac {\partial G(x,y)}{\partial x}},\quad {\frac {\partial F(x,y)}{\partial y}}=\lambda {\frac {\partial G(x,y)}{\partial y}}.

This is, together with the constraint $G(x,y)=0$ , a system of three equations that determines the three unknowns $x,y,\lambda$ . In this way we can solve the conditional minimization problem.

Note that these equations are the same as the conditions for an extremum of the function $K(x,y,\lambda )\equiv F(x,y)-\lambda G(x,y)$ with respect to the three variables $x,y,\lambda$ without any constraints. Therefore, the conditional minimization problem is equivalent to an ordinary extremum problem for a different function, $K(x,y,\lambda )$ . This new function is built by subtracting from the original function $F$ the constraint function $G$ multiplied by an extra variable $\lambda$ . This variable is called the Lagrange multiplier.
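
To see this explicitly, one can write out the three partial derivatives of $K$ (a short check using only the definitions above):

{\frac {\partial K}{\partial x}}={\frac {\partial F}{\partial x}}-\lambda {\frac {\partial G}{\partial x}}=0,\quad {\frac {\partial K}{\partial y}}={\frac {\partial F}{\partial y}}-\lambda {\frac {\partial G}{\partial y}}=0,\quad {\frac {\partial K}{\partial \lambda }}=-G(x,y)=0.

The first two equations are the conditions that the normal vectors are parallel, and the third one is the constraint itself.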

Example of using Lagrange multipliers

Here is a worked example. Suppose we need to maximize the function $F(x,y)=5x+12y$ under the constraint $x^{2}+y^{2}=1$ .

First, we write the constraint in the form $G(x,y)=0$ . For instance, we may take $G(x,y)=1-x^{2}-y^{2}$ . (It does not matter how we choose the function $G$ , as long as the constraint is equivalent to the equation $G(x,y)=0$ .) Then we make a new function

K(x,y,\lambda )=F(x,y)-\lambda G(x,y)=5x+12y+\lambda (x^{2}+y^{2}-1).

Then we need to find an extremum of this function with respect to the three variables $x,y,\lambda$ . We obtain the system of equations:

5+2\lambda x=0,\quad 12+2\lambda y=0,\quad x^{2}+y^{2}-1=0.

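It is easy to solve these equations. There are two solutions, which differ by the overall sign of $x,y,\lambda$ ; the one that maximizes $F$ is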

x={\frac {5}{13}},\quad y={\frac {12}{13}},\quad \lambda =-{\frac {13}{2}}.

These are the required values of $x$ and $y$ . The value of the Lagrange multiplier $\lambda$ is useless for us now (but it will be useful when we apply this method to problems in mechanics!).
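
The result can be checked numerically. The following is a minimal sketch in Python (standard library only); the sampling of the circle by an angle and the number of samples are illustrative choices, not part of the method itself. It verifies that the quoted values satisfy the three equations and that no sampled point of the circle gives a larger value of $F$ .

```python
import math

def F(x, y):
    """The function to be maximized in the worked example."""
    return 5 * x + 12 * y

# Candidate solution of the Lagrange-multiplier equations from the text.
x_star, y_star, lam = 5 / 13, 12 / 13, -13 / 2

# Residuals of the three equations; all of them should vanish.
residuals = (
    5 + 2 * lam * x_star,
    12 + 2 * lam * y_star,
    x_star**2 + y_star**2 - 1,
)

# Brute-force comparison: sample F along the circle x = cos(t), y = sin(t).
samples = 100_000
best_on_circle = max(
    F(math.cos(2 * math.pi * k / samples), math.sin(2 * math.pi * k / samples))
    for k in range(samples)
)

print("residuals:", residuals)
print("F at candidate:", F(x_star, y_star), "best sampled value:", best_on_circle)
```

The brute-force scan is only a sanity check; it works here because the constraint curve is easy to parametrize, which is exactly the situation where the multiplier method is not strictly needed.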

General case

The general form of the constrained optimization problem is the following. We need to find an extremum (or all extrema) of a given function $F(x_{1},...,x_{n})$ , where ${\boldsymbol {x}}\equiv \{x_{1},...,x_{n}\}$ is an array of variables satisfying $m$ different constraints $G_{1}({\boldsymbol {x}})=0,...,G_{m}({\boldsymbol {x}})=0$ .

The geometric consideration that I showed you for the simple case (the example with functions $F(x,y),G(x,y)$ above) can be generalized to many dimensions and many constraints: one considers level surfaces of $F$ and surfaces given by the constraints. A constrained extremum will be at a point ${\boldsymbol {x}}$ if the level surface of $F$ is tangent to the constraint surface at that point. The constraint surface is an intersection of $m$ different surfaces $G_{j}=0$ , each having its own normal vector ${\boldsymbol {n}}_{j}=\partial G_{j}/\partial {\boldsymbol {x}}$ . It can be shown using elementary vector algebra (I omit the proof) that the normal vector $\partial F/\partial {\boldsymbol {x}}$ to the level surface of $F$ must be a linear combination of the $m$ normal vectors ${\boldsymbol {n}}_{j}$ . Therefore, the conditions for the constrained extremum to be located at a point ${\boldsymbol {x}}$ are that (1) ${\boldsymbol {x}}$ must satisfy all the constraints and (2) that there should exist $m$ numbers $\lambda _{1},...,\lambda _{m}$ such that

{\frac {\partial F}{\partial {\boldsymbol {x}}}}=\lambda _{1}{\frac {\partial G_{1}}{\partial {\boldsymbol {x}}}}+...+\lambda _{m}{\frac {\partial G_{m}}{\partial {\boldsymbol {x}}}}

It is easy to see that these conditions are equivalent to the conditions for an extremum of a new function

K({\boldsymbol {x}},{\boldsymbol {\lambda }})\equiv F({\boldsymbol {x}})-\lambda _{1}G_{1}({\boldsymbol {x}})-...-\lambda _{m}G_{m}({\boldsymbol {x}}),

with respect to $(m+n)$ variables $x_{1},...,x_{n},\lambda _{1},...,\lambda _{m}$ , without any constraints.

Let us then formulate the recipe to solve the problem of constrained optimization. We introduce an array ${\boldsymbol {\lambda }}$ of $m$ different Lagrange multipliers ${\boldsymbol {\lambda }}\equiv \{\lambda _{1},...,\lambda _{m}\}$ and build a new function

K({\boldsymbol {x}},{\boldsymbol {\lambda }})\equiv F({\boldsymbol {x}})-\lambda _{1}G_{1}({\boldsymbol {x}})-...-\lambda _{m}G_{m}({\boldsymbol {x}}).

We then find an extremum of this function with respect to the total set of $(m+n)$ variables $x_{1},...,x_{n},\lambda _{1},...,\lambda _{m}$ . In order to do that, we need to solve a system of $(m+n)$ equations:

{\frac {\partial K}{\partial x_{1}}}=0,...,{\frac {\partial K}{\partial x_{n}}}=0,{\frac {\partial K}{\partial \lambda _{1}}}=0,...,{\frac {\partial K}{\partial \lambda _{m}}}=0.

By solving these equations, we will obtain a set of values $x_{1},...,x_{n}$ which we are interested in. The values of the auxiliary variables $\lambda _{j}$ can be discarded.
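
The recipe can be sketched in Python for a small case with $n=3$ variables and $m=2$ constraints. The particular functions below ( $F=x_{1}^{2}+x_{2}^{2}+x_{3}^{2}$ , $G_{1}=x_{1}+x_{2}+x_{3}-1$ , $G_{2}=x_{1}-x_{2}$ ) are an illustrative assumption, chosen so that the $(m+n)$ equations are linear; the helper `solve_linear` is a hypothetical name for a plain Gauss-Jordan elimination written for this example. For a general $F$ and $G_{j}$ the equations are nonlinear and would need a different solver.

```python
from fractions import Fraction

def solve_linear(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination (exact fractions)."""
    n = len(b)
    # Augmented matrix [a | b] with Fraction entries.
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Find a row with a nonzero pivot in this column and swap it into place.
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        # Scale the pivot row so that the pivot equals 1.
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [row[n] for row in m]

# Unknowns are ordered as (x1, x2, x3, lambda1, lambda2), and
# K = F - lambda1*G1 - lambda2*G2 with the assumed F, G1, G2 from the lead-in.
matrix = [
    [2, 0, 0, -1, -1],  # dK/dx1 = 2*x1 - lambda1 - lambda2 = 0
    [0, 2, 0, -1, 1],   # dK/dx2 = 2*x2 - lambda1 + lambda2 = 0
    [0, 0, 2, -1, 0],   # dK/dx3 = 2*x3 - lambda1 = 0
    [1, 1, 1, 0, 0],    # dK/dlambda1 = 0  <=>  x1 + x2 + x3 = 1
    [1, -1, 0, 0, 0],   # dK/dlambda2 = 0  <=>  x1 - x2 = 0
]
rhs = [0, 0, 0, 1, 0]

solution = solve_linear(matrix, rhs)
x, lam = solution[:3], solution[3:]
print("x =", x, "lambda =", lam)
```

Note that the last $m$ equations simply reproduce the constraints, so the solution automatically lies on the constraint surface.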

Motion constrained to a surface

Let us now consider the Constrained Mechanical Problem Number One: A point mass is moving in a potential $V({\boldsymbol {x}})$ and, additionally, is constrained to move along a surface given by an equation $G({\boldsymbol {x}})=0$ . (This can be physically realized by e.g. a mass point sliding without friction on top of a curved surface.)

According to the Lagrangian approach, we must find an extremum of the action, $S=\int L({\boldsymbol {x}},{\dot {\boldsymbol {x}}})dt$ , under the condition that $G({\boldsymbol {x}}(t))=0$ for all times $t$ . We can apply the method of Lagrange multipliers. Note that we have, in effect, infinitely many constraints: one constraint for each moment of time $t$ . Therefore, we need to introduce a set of infinitely many Lagrange multipliers, one Lagrange multiplier for each $t$ . It is convenient to arrange this set of Lagrange multipliers into a function $\lambda (t)$ .

According to the method of Lagrange multipliers, we need to build a "modified action" which is equal to the old action minus the sum of all the constraints multiplied by their respective Lagrange multipliers. Since there is one constraint for each moment of time, this sum becomes an integral over $t$ . Therefore, the new action is

{\tilde {S}}[{\boldsymbol {x}}(t),\lambda (t)]=\int L({\boldsymbol {x}},{\dot {\boldsymbol {x}}})dt-\int \lambda (t)G({\boldsymbol {x}}(t))dt.

Solving the constrained optimization problem is equivalent to finding an extremum of the functional ${\tilde {S}}$ with respect to arbitrary ${\boldsymbol {x}}(t)$ and $\lambda (t)$ .

It should now be clear how to approach the "Constrained Problem Number One" in principle. What remains is some technical work:

Deriving the Euler-Lagrange equations from the modified action ${\tilde {S}}$ and solving them. This will be a system of equations for the unknown functions ${\boldsymbol {x}}(t)$ and $\lambda (t)$ .
Interpreting the function $\lambda (t)$ . It will turn out that $\lambda (t)$ is related to the force that is needed to keep the point mass moving only along the surface $G({\boldsymbol {x}})=0$ . So the Lagrange multipliers have a direct physical interpretation in this case. Namely, we shall show that the time-dependent force ${\boldsymbol {F}}(t)$ exerted by the surface is equal to $-\lambda (t)\partial G/\partial {\boldsymbol {x}}(t)$ .

Example

A massive bead is set on a wire curved in the vertical plane (coordinates $x,z$ ) as a plot of the function $z=Cx^{2}$ , where $C$ is a given constant. The only external force is the gravitational field of the Earth. We would like to determine the equation of motion for the position of the bead.

Choose the constraint function as $G(x,z)=z-Cx^{2}$ . Then the modified Lagrangian is

{\tilde {L}}={\frac {1}{2}}(m{\dot {x}}^{2}+m{\dot {z}}^{2})-mgz-\lambda (z-Cx^{2}).

The Euler-Lagrange equations are derived in the standard way:

Variation w.r.t. $x(t)$ gives: $m{\ddot {x}}(t)=\lambda (t)2Cx(t)$
Variation w.r.t. $z(t)$ gives: $m{\ddot {z}}(t)=-mg-\lambda (t)$
Variation w.r.t. $\lambda (t)$ gives: $0=z(t)-Cx^{2}(t)$

It is not easy to solve these equations by hand, but deriving them is straightforward and requires "no thinking", as physicists say. (That is, we simply follow general rules, and we do not need to make any special decisions or find special tricks for each particular situation.)
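
Although a closed-form solution is hard to obtain, the equations are easy to integrate numerically. The sketch below is one possible way to do it, not the only one. It first eliminates $z$ and $\lambda$ : the constraint gives ${\ddot {z}}=2C{\dot {x}}^{2}+2Cx{\ddot {x}}$ , and substituting this into the first two equations yields ${\ddot {x}}=-2Cx(g+2C{\dot {x}}^{2})/(1+4C^{2}x^{2})$ . This single equation is then integrated with the classical fourth-order Runge-Kutta scheme. The numerical values of $m$ , $g$ , $C$ , the initial conditions, and the time step are illustrative assumptions.

```python
def acceleration(x, vx, g, C):
    """x'' obtained by eliminating lambda and z from the three equations."""
    return -2 * C * x * (g + 2 * C * vx**2) / (1 + 4 * C**2 * x**2)

def energy(x, vx, m, g, C):
    """Kinetic plus potential energy, using z = C x^2 and z' = 2 C x x'."""
    return 0.5 * m * vx**2 * (1 + 4 * C**2 * x**2) + m * g * C * x**2

def rk4_step(x, vx, dt, g, C):
    """One classical Runge-Kutta step for x' = vx, vx' = acceleration(x, vx)."""
    k1x, k1v = vx, acceleration(x, vx, g, C)
    k2x = vx + 0.5 * dt * k1v
    k2v = acceleration(x + 0.5 * dt * k1x, vx + 0.5 * dt * k1v, g, C)
    k3x = vx + 0.5 * dt * k2v
    k3v = acceleration(x + 0.5 * dt * k2x, vx + 0.5 * dt * k2v, g, C)
    k4x = vx + dt * k3v
    k4v = acceleration(x + dt * k3x, vx + dt * k3v, g, C)
    x_new = x + dt * (k1x + 2 * k2x + 2 * k3x + k4x) / 6
    v_new = vx + dt * (k1v + 2 * k2v + 2 * k3v + k4v) / 6
    return x_new, v_new

# Illustrative parameters (assumed): unit mass, C = 1, bead released at rest at x = 1.
m, g, C = 1.0, 9.81, 1.0
x, vx = 1.0, 0.0
dt, steps = 1e-3, 5000

e0 = energy(x, vx, m, g, C)
for _ in range(steps):
    x, vx = rk4_step(x, vx, dt, g, C)
e1 = energy(x, vx, m, g, C)

print("final x, vx:", x, vx)
print("relative energy drift:", abs(e1 - e0) / e0)
```

The total energy is conserved by the exact motion, so a small relative drift is a convenient check that the integration is behaving sensibly.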

Exercise: There is an arbitrary choice in selecting the constraint function $G$ . For example, we could have selected $G(x,z)=2x^{2}-2z/C$ or even $G(x,z)=x-{\sqrt {z/C}}$ (on the branch $x\geq 0$ ), and the constraint line $G(x,z)=0$ remains the same. Show that the equations of motion also remain the same, up to a change in the definition of $\lambda (t)$ .

Lagrange multipliers and constraining forces

We see from the equation for $z(t)$ in the above example that $-\lambda (t)$ looks like the component of the normal force in the $z$ direction. So it is clear that the Lagrange multiplier $\lambda$ is somehow related to the unknown constraining force. We shall now derive this relationship in a more general case.

Consider again the Constrained Problem Number One. We solved it by using the modified Lagrangian ${\tilde {L}}=L-\lambda (t)G({\boldsymbol {x}})$ . The term $-\lambda (t)G({\boldsymbol {x}})$ looks like the contribution of an extra piece of potential energy, $\lambda (t)G({\boldsymbol {x}})$ , although it depends on time through the factor $\lambda (t)$ . So this is a somewhat weird kind of potential energy, but let us examine it in more detail. As long as the point mass remains within the constraint surface, we have $G({\boldsymbol {x}})=0$ and this "extra potential energy" is equal to zero. But if the point mass could move a little bit off the constraint surface, say in a direction given by a small vector $\delta {\boldsymbol {x}}$ , then this "extra potential energy" would change by

\delta V=\delta {\boldsymbol {x}}\cdot {\frac {\partial (\lambda G)}{\partial {\boldsymbol {x}}}}.

This looks like work done by a force. The force is directed orthogonally to the constraint surface and is equal to ${\boldsymbol {F}}=-\lambda \,\partial G/\partial {\boldsymbol {x}}$ . This is precisely the kind of force that we expect the constraining device to exert on the point mass.

Let us verify more formally that ${\boldsymbol {F}}$ is in fact the constraining force we were looking for. The Euler-Lagrange equations that follow from the Lagrangian ${\tilde {L}}$ are of the form

{\text{"mass}}\cdot {\text{acceleration"}}\equiv {\frac {d}{dt}}{\frac {\partial L}{\partial {\dot {\boldsymbol {x}}}}}={\frac {\partial L}{\partial {\boldsymbol {x}}}}-\lambda (t){\frac {\partial G}{\partial {\boldsymbol {x}}}}.

The term $\partial L/\partial {\boldsymbol {x}}$ describes the usual "free" forces due to potential energy in the original Lagrangian $L$ . Now it is clear that the term $-\lambda {\frac {\partial G}{\partial {\boldsymbol {x}}}}$ describes the additional forces due to constraints.
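
For the bead on the wire $z=Cx^{2}$ from the earlier example, this formula can be checked against elementary expectations. The sketch below assumes the closed form $\lambda =-m(g+2C{\dot {x}}^{2})/(1+4C^{2}x^{2})$ , which follows from the three Euler-Lagrange equations of that example after eliminating ${\ddot {x}}$ and ${\ddot {z}}$ ; the function names and numerical values are illustrative. It checks that the force from the wire balances gravity when the bead rests at the bottom, that it supplies the extra centripetal force $mv^{2}/R$ with $R=1/(2C)$ when the bead passes the bottom with speed $v$ , and that it is orthogonal to the wire.

```python
import math

def bead_multiplier(x, vx, m, g, C):
    """lambda for the bead on z = C x^2 (assumed closed form, see the lead-in)."""
    return -m * (g + 2 * C * vx**2) / (1 + 4 * C**2 * x**2)

def constraint_force(x, vx, m, g, C):
    """F = -lambda * dG/dx with G = z - C x^2, so dG/d(x, z) = (-2 C x, 1)."""
    lam = bead_multiplier(x, vx, m, g, C)
    return (2 * C * x * lam, -lam)

# Illustrative parameters (assumed).
m, g, C = 1.0, 9.81, 0.5

# Bead at rest at the bottom of the wire: the force should be (0, m g).
rest_force = constraint_force(0.0, 0.0, m, g, C)

# Bead passing the bottom with speed v: vertical force m g + m v^2 / R, R = 1/(2C).
v = 2.0
moving_force = constraint_force(0.0, v, m, g, C)

# Bead somewhere on the wire: force dotted with the tangent (1, 2 C x) should vanish.
x, vx = 0.7, 1.3
fx, fz = constraint_force(x, vx, m, g, C)
tangential_component = fx * 1.0 + fz * 2 * C * x

print(rest_force, moving_force, tangential_component)
```

The sign of $\lambda$ is negative here, so the force $-\lambda \,\partial G/\partial {\boldsymbol {x}}$ points upward, away from the wire and towards the bead, as expected for a supporting force.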

Exercise: There is, of course, an arbitrary choice in defining the constraint function $G$ . For instance, we may choose the constraint function as $17G$ or $G^{3}$ or some other function $f(G)$ instead of $G$ . Show that the constraining force does not depend on the choice of the function $G$ (because $\lambda$ would change appropriately for every different choice of $G$ !).

Exercise: Figure out how to compute the constraining force ${\boldsymbol {F}}$ if there are several independent constraints $G_{1}({\boldsymbol {x}})=0$ , ..., $G_{m}({\boldsymbol {x}})=0$ .

Constraints involving velocities

So far we considered only constraints that are expressed by functions of coordinates, such as $G(x,y)=0$ . This form of the constraints covers a wide range of applications. However, there exist important cases where physical constraints cannot be expressed in this way. For example, the motion of a massive ball that is rolling on a surface without sliding, or the motion of a skater who is sliding on ice, can be described only using complicated constraints that involve velocities and coordinates at the same time. Such constraints, which are not equivalent to a simple function of coordinates, are called nonintegrable or nonholonomic constraints, whereas the constraints of the type we considered are called integrable or holonomic.

One would think that nonholonomic constraints could be simply added to the Lagrangian with Lagrange multipliers. It turns out, however, that the result is not the correct equations of motion for the problem! The main reason is that velocities ${\dot {\boldsymbol {x}}}$ are not varied independently from the coordinates ${\boldsymbol {x}}$ , so the standard procedure involving the Lagrange multipliers is not the correct way to implement nonholonomic constraints. A special theory (based on the so-called Appell equation) was developed to derive the equations of motion for systems with nonholonomic constraints. However, this theory is beyond the scope of the minimal standard course.