# Calculus/Inverse function theorem, implicit function theorem


In this chapter, we prove the inverse function theorem (which asserts that if a continuously differentiable function has an invertible differential at a point, then it is locally invertible) and the implicit function theorem (which asserts that certain sets are, locally, the graphs of functions).

## Banach's fixed point theorem

Theorem:

Let ${\displaystyle (M,d)}$ be a complete metric space, and let ${\displaystyle f:M\to M}$ be a strict contraction; that is, there exists a constant ${\displaystyle 0\leq \lambda <1}$ such that

${\displaystyle \forall m,n\in M:d(f(m),f(n))\leq \lambda d(m,n)}$.

Then ${\displaystyle f}$ has a unique fixed point, which means that there is a unique ${\displaystyle x\in M}$ such that ${\displaystyle f(x)=x}$. Furthermore, if we start with a completely arbitrary point ${\displaystyle y\in M}$, then the sequence

${\displaystyle y,f(y),f(f(y)),f(f(f(y))),\ldots }$

converges to ${\displaystyle x}$.
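The iteration in the theorem is easy to try out. Below is a minimal Python sketch, assuming the complete metric space ${\displaystyle M=[0,1]}$ with ${\displaystyle d(a,b)=|a-b|}$ and the contraction ${\displaystyle f=\cos }$, which is our own choice of example: ${\displaystyle \cos }$ maps ${\displaystyle [0,1]}$ into ${\displaystyle [\cos 1,1]\subseteq [0,1]}$, and ${\displaystyle |\cos '|=|\sin |\leq \sin 1<1}$ there. The helper name `banach_iterate` and its stopping rule are illustrative assumptions.

```python
import math
from typing import Callable


def banach_iterate(f: Callable[[float], float], y: float,
                   tol: float = 1e-12, max_iter: int = 10_000) -> float:
    """Iterate y, f(y), f(f(y)), ... until two successive terms differ by less than tol."""
    z = y
    for _ in range(max_iter):
        z_next = f(z)
        if abs(z_next - z) < tol:
            return z_next
        z = z_next
    raise RuntimeError("no convergence within max_iter steps")


# cos is a strict contraction of [0, 1] with constant sin(1) < 1 (hypothetical example).
x = banach_iterate(math.cos, 0.0)
print(x)
```

Starting from ${\displaystyle 0}$ or from ${\displaystyle 1}$ gives the same limit, approximately 0.739085, in line with the uniqueness part of the theorem.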

Proof:

First, we prove uniqueness of the fixed point. Assume ${\displaystyle x,y}$ are both fixed points. Then

${\displaystyle d(x,y)=d(f(x),f(y))\leq \lambda d(x,y)\Rightarrow (1-\lambda )d(x,y)\leq 0}$.

Since ${\displaystyle 1-\lambda >0}$ and ${\displaystyle d(x,y)\geq 0}$, this implies ${\displaystyle d(x,y)=0\Rightarrow x=y}$.

Now we prove existence and simultaneously the claim about the convergence of the sequence ${\displaystyle y,f(y),f(f(y)),f(f(f(y))),\ldots }$. For notation, we thus set ${\displaystyle z_{0}:=y}$ and if ${\displaystyle z_{n}}$ is already defined, we set ${\displaystyle z_{n+1}=f(z_{n})}$. Then the sequence ${\displaystyle (z_{n})_{n\in \mathbb {N} }}$ is nothing else but the sequence ${\displaystyle y,f(y),f(f(y)),f(f(f(y))),\ldots }$.

Let ${\displaystyle n\geq 0}$. We claim that

${\displaystyle d(z_{n+1},z_{n})\leq \lambda ^{n}d(z_{1},z_{0})}$.

Indeed, this follows by induction on ${\displaystyle n}$. The case ${\displaystyle n=0}$ is trivial, and if the claim is true for ${\displaystyle n}$, then ${\displaystyle d(z_{n+2},z_{n+1})=d(f(z_{n+1}),f(z_{n}))\leq \lambda d(z_{n+1},z_{n})\leq \lambda \cdot \lambda ^{n}d(z_{1},z_{0})}$.

Hence, by the triangle inequality,

${\displaystyle {\begin{aligned}d(z_{n+m},z_{n})&\leq \sum _{j=n+1}^{n+m}d(z_{j},z_{j-1})\\&\leq \sum _{j=n+1}^{n+m}\lambda ^{j-1}d(z_{1},z_{0})\\&\leq \sum _{j=n+1}^{\infty }\lambda ^{j-1}d(z_{1},z_{0})\\&=d(z_{1},z_{0})\lambda ^{n}{\frac {1}{1-\lambda }}\end{aligned}}}$.

The right-hand side does not depend on ${\displaystyle m}$ and goes to zero as ${\displaystyle n\to \infty }$; hence ${\displaystyle (z_{n})_{n\in \mathbb {N} }}$ is a Cauchy sequence. As we are in a complete metric space, it converges to a limit ${\displaystyle x}$. This limit is furthermore a fixed point, as the continuity of ${\displaystyle f}$ (${\displaystyle f}$ is Lipschitz continuous with constant ${\displaystyle \lambda }$) implies

${\displaystyle x=\lim _{n\to \infty }z_{n}=\lim _{n\to \infty }f(z_{n-1})=f(\lim _{n\to \infty }z_{n-1})=f(x)}$.${\displaystyle \Box }$
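Letting ${\displaystyle m\to \infty }$ in the estimate for ${\displaystyle d(z_{n+m},z_{n})}$ yields the a priori bound ${\displaystyle d(x,z_{n})\leq {\frac {\lambda ^{n}}{1-\lambda }}d(z_{1},z_{0})}$, which tells in advance how many iterations suffice. The Python sketch below checks this bound numerically for a hypothetical example of our choosing, ${\displaystyle f=\cos }$ on ${\displaystyle [0,1]}$ with ${\displaystyle \lambda =\sin 1}$; the reference fixed point is itself computed by iteration, which is an assumption of the sketch rather than an exact value.

```python
import math

LAM = math.sin(1.0)  # on [0, 1]: |cos'(t)| = |sin t| <= sin 1 < 1


def fixed_point_of_cos(iterations: int = 200) -> float:
    """Reference fixed point, obtained by iterating long enough (an assumption)."""
    z = 0.5
    for _ in range(iterations):
        z = math.cos(z)
    return z


def bound_holds(z0: float, steps: int = 60) -> bool:
    """Check d(z_n, x) <= LAM**n / (1 - LAM) * d(z_1, z_0) for n = 0, ..., steps - 1."""
    x = fixed_point_of_cos()
    d10 = abs(math.cos(z0) - z0)   # d(z_1, z_0)
    z = z0                         # z holds z_n at the start of each loop pass
    for n in range(steps):
        if abs(z - x) > LAM ** n / (1 - LAM) * d10:
            return False
        z = math.cos(z)
    return True


print([bound_holds(z0) for z0 in (0.0, 0.3, 1.0)])
```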

A corollary to this important result is the following lemma, which shall be the main ingredient for the proof of the inverse function theorem:

Lemma:

Let ${\displaystyle g:{\overline {B_{r}(0)}}\to {\overline {B_{r}(0)}}}$ (${\displaystyle {\overline {B_{r}(0)}}\subset \mathbb {R} ^{n}}$ denoting the closed ball of radius ${\displaystyle r}$) be a function with ${\displaystyle g(0)=0}$ which is Lipschitz continuous with Lipschitz constant less than or equal to ${\displaystyle 1/2}$. Then the function

${\displaystyle f:{\overline {B_{r}(0)}}\to \mathbb {R} ^{n},f(x):=g(x)+x}$

is injective and ${\displaystyle B_{r/2}(0)\subseteq f(B_{r}(0))}$.

Proof:

First, we note that for ${\displaystyle y\in B_{r/2}(0)}$ the function

${\displaystyle h:{\overline {B_{r}(0)}}\to \mathbb {R} ^{n},h(z):=y-g(z)}$

is a strict contraction; this is due to

${\displaystyle \|y-g(z)-(y-g(z'))\|=\|g(z')-g(z)\|\leq {\frac {1}{2}}\|z-z'\|}$.

Furthermore, it maps ${\displaystyle {\overline {B_{r}(0)}}}$ to itself, since for ${\displaystyle z\in {\overline {B_{r}(0)}}}$

${\displaystyle \|y-g(z)\|\leq \|y\|+\|g(z)-g(0)\|\leq {\frac {r}{2}}+{\frac {1}{2}}\|z-0\|\leq r}$.

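Hence, the Banach fixed-point theorem is applicable to ${\displaystyle h}$; note that ${\displaystyle {\overline {B_{r}(0)}}}$, being a closed subset of ${\displaystyle \mathbb {R} ^{n}}$, is a complete metric space. Now ${\displaystyle x}$ being a fixed point of ${\displaystyle h}$ is equivalent to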

${\displaystyle f(x)=y}$,

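and thus ${\displaystyle B_{r/2}(0)\subseteq f(B_{r}(0))}$ follows from the existence of fixed points; here the fixed point ${\displaystyle x}$ indeed lies in the open ball, since ${\displaystyle \|x\|=\|y-g(x)\|\leq \|y\|+{\frac {1}{2}}\|x\|}$ gives ${\displaystyle \|x\|\leq 2\|y\|<r}$. Furthermore, if ${\displaystyle f(x)=f(x')}$, then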

${\displaystyle {\frac {1}{2}}\|x-x'\|\geq \|g(x)-g(x')\|=\|f(x)-x-(f(x')-x')\|=\|x-x'\|}$

and hence ${\displaystyle x=x'}$. This proves injectivity.${\displaystyle \Box }$
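The proof of the lemma is constructive: to solve ${\displaystyle f(x)=y}$ one iterates ${\displaystyle h(z)=y-g(z)}$. The following Python sketch does this in ${\displaystyle \mathbb {R} ^{2}}$ for a hypothetical ${\displaystyle g(x_{1},x_{2})=({\tfrac {1}{4}}\sin(x_{1}+x_{2}),{\tfrac {1}{4}}\sin(x_{1}-x_{2}))}$ chosen by us: ${\displaystyle g(0)=0}$, and every entry of its Jacobian has absolute value at most ${\displaystyle 1/4}$, so the Jacobian has Frobenius norm at most ${\displaystyle 1/2}$ and ${\displaystyle g}$ is Lipschitz with constant ${\displaystyle 1/2}$ on the convex ball. We take ${\displaystyle r=1}$; the function names and tolerances are assumptions.

```python
import math
from typing import Tuple

Vec = Tuple[float, float]


def g(x: Vec) -> Vec:
    """Hypothetical example with g(0) = 0 and Lipschitz constant <= 1/2."""
    a, b = x[0] + x[1], x[0] - x[1]
    return (0.25 * math.sin(a), 0.25 * math.sin(b))


def f(x: Vec) -> Vec:
    """f(x) = g(x) + x, as in the lemma."""
    gx = g(x)
    return (gx[0] + x[0], gx[1] + x[1])


def solve_f_equals_y(y: Vec, tol: float = 1e-13, max_iter: int = 1000) -> Vec:
    """Return the fixed point of h(z) = y - g(z), i.e. the x with f(x) = y."""
    z: Vec = (0.0, 0.0)
    for _ in range(max_iter):
        gz = g(z)
        z_next = (y[0] - gz[0], y[1] - gz[1])
        if math.dist(z_next, z) < tol:
            return z_next
        z = z_next
    raise RuntimeError("no convergence within max_iter steps")


y = (0.3, -0.2)            # ||y|| is about 0.36 < r/2 = 0.5
x = solve_f_equals_y(y)
print(x, f(x))
```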

## The inverse function theorem

Theorem:

Let ${\displaystyle f:\mathbb {R} ^{n}\to \mathbb {R} ^{n}}$ be a function which is continuously differentiable in a neighbourhood of a point ${\displaystyle x_{0}\in \mathbb {R} ^{n}}$ such that ${\displaystyle f'(x_{0})}$ is invertible. Then there exists an open set ${\displaystyle U\subseteq \mathbb {R} ^{n}}$ with ${\displaystyle x_{0}\in U}$ such that ${\displaystyle f(U)}$ is open and ${\displaystyle f|_{U}:U\to f(U)}$ is bijective, with an inverse ${\displaystyle f^{-1}:f(U)\to U}$ which is differentiable at ${\displaystyle f(x_{0})}$ and satisfies

${\displaystyle (f^{-1})'(f(x_{0}))=(f'(x_{0}))^{-1}}$.
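To see the formula ${\displaystyle (f^{-1})'(f(x_{0}))=(f'(x_{0}))^{-1}}$ at work, the Python sketch below uses the polar-coordinate map ${\displaystyle f(r,\theta )=(r\cos \theta ,r\sin \theta )}$, a hypothetical example of ours, at ${\displaystyle x_{0}=(2,0.5)}$. Near ${\displaystyle f(x_{0})}$ a local inverse is ${\displaystyle (u,v)\mapsto ({\sqrt {u^{2}+v^{2}}},\operatorname {atan2} (v,u))}$; we differentiate it by central finite differences (the step size is an assumption) and compare with the inverse of the exact Jacobian of ${\displaystyle f}$.

```python
import math


def f(x):
    """Polar coordinates: (r, theta) -> (r cos theta, r sin theta)."""
    r, theta = x
    return (r * math.cos(theta), r * math.sin(theta))


def df(x):
    """Exact Jacobian of f (rows = output components)."""
    r, theta = x
    return [[math.cos(theta), -r * math.sin(theta)],
            [math.sin(theta), r * math.cos(theta)]]


def f_inv(y):
    """Local inverse of f, valid near points with r > 0 and -pi < theta < pi."""
    u, v = y
    return (math.hypot(u, v), math.atan2(v, u))


def jacobian_fd(func, p, step=1e-6):
    """Central finite-difference Jacobian of func: R^2 -> R^2 at the point p."""
    jac = [[0.0, 0.0], [0.0, 0.0]]
    for j in range(2):
        plus, minus = list(p), list(p)
        plus[j] += step
        minus[j] -= step
        fp, fm = func(tuple(plus)), func(tuple(minus))
        for i in range(2):
            jac[i][j] = (fp[i] - fm[i]) / (2 * step)
    return jac


def inverse_2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


x0 = (2.0, 0.5)
lhs = jacobian_fd(f_inv, f(x0))   # (f^{-1})'(f(x0)), approximately
rhs = inverse_2x2(df(x0))         # (f'(x0))^{-1}
max_err = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
print(max_err)
```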

Proof:

We first reduce to the case ${\displaystyle x_{0}=0}$, ${\displaystyle f(x_{0})=0}$ and ${\displaystyle f'(x_{0})={\text{Id}}}$. Indeed, suppose the theorem holds for all such functions, and let now ${\displaystyle h}$ be an arbitrary function satisfying the hypotheses of the theorem at a point ${\displaystyle x_{0}}$. We set

${\displaystyle {\tilde {h}}(x):=h'(x_{0})^{-1}(h(x_{0}+x)-h(x_{0}))}$

and obtain that ${\displaystyle {\tilde {h}}}$ is continuously differentiable near ${\displaystyle 0}$ with ${\displaystyle {\tilde {h}}(0)=0}$ and differential ${\displaystyle {\text{Id}}}$ at ${\displaystyle 0}$: by the chain rule, ${\displaystyle {\tilde {h}}'(x)=h'(x_{0})^{-1}h'(x_{0}+x)}$, which is continuous near ${\displaystyle 0}$ and equals ${\displaystyle {\text{Id}}}$ at ${\displaystyle x=0}$, and ${\displaystyle {\tilde {h}}(0)=0}$ is seen by inserting ${\displaystyle x=0}$. Hence, the special case yields an inverse of ${\displaystyle {\tilde {h}}}$ that is differentiable at ${\displaystyle {\tilde {h}}(0)=0}$, and if we now set

${\displaystyle h^{-1}(y):={\tilde {h}}^{-1}(h'(x_{0})^{-1}(y-h(x_{0})))+x_{0}}$,

it can be seen that ${\displaystyle h^{-1}}$ is an inverse of ${\displaystyle h}$ near ${\displaystyle x_{0}}$ with all the required properties (which is a bit of a tedious exercise, but involves nothing more than the definitions and the chain rule).
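The reduction can be checked numerically. The Python sketch below takes the polar-coordinate map ${\displaystyle h(r,\theta )=(r\cos \theta ,r\sin \theta )}$ at ${\displaystyle x_{0}=(2,0.5)}$ as a hypothetical example, forms ${\displaystyle {\tilde {h}}(x)=h'(x_{0})^{-1}(h(x_{0}+x)-h(x_{0}))}$, inverts ${\displaystyle {\tilde {h}}}$ near ${\displaystyle 0}$ by the fixed-point iteration ${\displaystyle z\mapsto w-({\tilde {h}}(z)-z)}$ (the map of the preparatory lemma with ${\displaystyle g={\tilde {h}}-{\text{Id}}}$), and recovers ${\displaystyle h^{-1}(y)={\tilde {h}}^{-1}(h'(x_{0})^{-1}(y-h(x_{0})))+x_{0}}$. Names and tolerances are assumptions, and the iteration only converges for ${\displaystyle y}$ close to ${\displaystyle h(x_{0})}$.

```python
import math


def h(x):
    """Hypothetical example: polar coordinates (r, theta) -> (r cos theta, r sin theta)."""
    r, t = x
    return (r * math.cos(t), r * math.sin(t))


def dh(x):
    """Exact Jacobian of h (rows = output components)."""
    r, t = x
    return [[math.cos(t), -r * math.sin(t)], [math.sin(t), r * math.cos(t)]]


def solve2(m, v):
    """Solve the 2x2 linear system m z = v by Cramer's rule."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return ((d * v[0] - b * v[1]) / det, (a * v[1] - c * v[0]) / det)


X0 = (2.0, 0.5)
A = dh(X0)   # h'(x0)
H0 = h(X0)   # h(x0)


def h_tilde(x):
    """h~(x) = h'(x0)^{-1} (h(x0 + x) - h(x0)); satisfies h~(0) = 0 and h~'(0) = Id."""
    hx = h((X0[0] + x[0], X0[1] + x[1]))
    return solve2(A, (hx[0] - H0[0], hx[1] - H0[1]))


def h_tilde_inv(w, tol=1e-13, max_iter=200):
    """Solve h~(z) = w for small w by iterating z -> w - (h~(z) - z)."""
    z = (0.0, 0.0)
    for _ in range(max_iter):
        ht = h_tilde(z)
        z_next = (w[0] - (ht[0] - z[0]), w[1] - (ht[1] - z[1]))
        if math.dist(z_next, z) < tol:
            return z_next
        z = z_next
    raise RuntimeError("no convergence: w is probably too far from 0")


def h_inv(y):
    """h^{-1}(y) = h~^{-1}(h'(x0)^{-1}(y - h(x0))) + x0."""
    z = h_tilde_inv(solve2(A, (y[0] - H0[0], y[1] - H0[1])))
    return (z[0] + X0[0], z[1] + X0[1])


y = (H0[0] + 0.05, H0[1] - 0.03)
x = h_inv(y)
print(x, h(x))
```

For this particular ${\displaystyle h}$ the result can also be compared with the explicit inverse ${\displaystyle ({\sqrt {u^{2}+v^{2}}},\operatorname {atan2} (v,u))}$.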

Thus let ${\displaystyle f}$ be continuously differentiable in a neighbourhood of ${\displaystyle 0}$ with ${\displaystyle f(0)=0}$ and ${\displaystyle f'(0)={\text{Id}}}$. We define

${\displaystyle g(x):=f(x)-x}$.

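The differential of this function at ${\displaystyle 0}$ is zero (since taking the differential is linear and the differential of the function ${\displaystyle x\mapsto x}$ is the identity). Since the function ${\displaystyle g}$ is also continuously differentiable in a small neighbourhood of ${\displaystyle 0}$, we find ${\displaystyle \delta >0}$ such that

${\displaystyle \left|{\frac {\partial g_{k}}{\partial x_{j}}}(x)\right|<{\frac {1}{2n^{2}}}}$

for all ${\displaystyle j,k\in \{1,\ldots ,n\}}$ and ${\displaystyle x\in B_{\delta }(0)}$. Now let ${\displaystyle x,x'\in B_{\delta }(0)}$. Since the ball is convex, the segment from ${\displaystyle x'}$ to ${\displaystyle x}$ lies in ${\displaystyle B_{\delta }(0)}$, and the general mean-value theorem and Cauchy's inequality imply that for ${\displaystyle k\in \{1,\ldots ,n\}}$

${\displaystyle |g_{k}(x)-g_{k}(x')|=|\langle x-x',\nabla g_{k}(x'+t_{k}(x-x'))\rangle |\leq \|x-x'\|\,n{\frac {1}{2n^{2}}}}$

for suitable ${\displaystyle t_{k}\in [0,1]}$ (the norm of the gradient is at most the sum of the absolute values of its ${\displaystyle n}$ entries). Hence, by the triangle inequality,

${\displaystyle \|g(x)-g(x')\|\leq |g_{1}(x)-g_{1}(x')|+\cdots +|g_{n}(x)-g_{n}(x')|\leq {\frac {1}{2}}\|x-x'\|}$,

that is, ${\displaystyle g}$ is Lipschitz continuous on ${\displaystyle B_{\delta }(0)}$ with constant ${\displaystyle 1/2}$; since ${\displaystyle g(0)=f(0)-0=0}$, in particular ${\displaystyle \|g(x)\|\leq {\frac {1}{2}}\|x\|}$.

Now fix ${\displaystyle r:=\delta /2}$, so that ${\displaystyle {\overline {B_{r}(0)}}\subset B_{\delta }(0)}$ and ${\displaystyle g}$ maps ${\displaystyle {\overline {B_{r}(0)}}}$ to itself. Thus our preparatory lemma is applicable: ${\displaystyle f}$ is injective on ${\displaystyle {\overline {B_{r}(0)}}}$, and every point of ${\displaystyle B_{r/2}(0)}$ has a preimage in ${\displaystyle B_{r}(0)}$. Hence we may pick ${\displaystyle U:=B_{r}(0)\cap f^{-1}(B_{r/2}(0))}$, which is open due to the continuity of ${\displaystyle f}$, contains ${\displaystyle 0}$, and is mapped bijectively by ${\displaystyle f}$ onto the open set ${\displaystyle f(U)=B_{r/2}(0)}$.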
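The choice of ${\displaystyle \delta }$ and the resulting local inverse can be tried on a hypothetical example of ours in ${\displaystyle \mathbb {R} ^{2}}$, identified with the complex plane: ${\displaystyle f(z)=z+z^{2}}$, that is ${\displaystyle g(x_{1},x_{2})=(x_{1}^{2}-x_{2}^{2},2x_{1}x_{2})}$, so that ${\displaystyle f(0)=0}$ and ${\displaystyle f'(0)={\text{Id}}}$. The partial derivatives of ${\displaystyle g}$ are ${\displaystyle \pm 2x_{1}}$ and ${\displaystyle \pm 2x_{2}}$, hence bounded by ${\displaystyle 2\|x\|<1/(2n^{2})=1/8}$ whenever ${\displaystyle \|x\|<1/16}$, so ${\displaystyle \delta =1/16}$ works. The Python sketch below spot-checks the Lipschitz bound by random sampling (a check, not a proof) and solves ${\displaystyle f(z)=w}$ for one ${\displaystyle w\in B_{r/2}(0)}$ with ${\displaystyle r=\delta /2}$ by iterating ${\displaystyle z\mapsto w-g(z)}$; sample sizes and tolerances are assumptions.

```python
import random

DELTA = 1 / 16   # from |dg_k/dx_j(x)| <= 2|x| < 1/8 for |x| < 1/16
R = DELTA / 2    # radius of the closed ball on which the lemma is applied


def g(z: complex) -> complex:
    """g(x1, x2) = (x1^2 - x2^2, 2 x1 x2), written as z -> z^2 on C = R^2."""
    return z * z


def f(z: complex) -> complex:
    """Hypothetical example with f(0) = 0 and f'(0) = Id."""
    return z + g(z)


def random_point(rng: random.Random, radius: float) -> complex:
    """A random point with |z| <= radius (not uniformly distributed, irrelevant here)."""
    return complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) * radius / 2 ** 0.5


def lipschitz_spot_check(samples: int = 10_000, seed: int = 0) -> bool:
    """Check |g(z) - g(w)| <= |z - w| / 2 for random z, w in the closed ball of radius DELTA."""
    rng = random.Random(seed)
    for _ in range(samples):
        z, w = random_point(rng, DELTA), random_point(rng, DELTA)
        if abs(g(z) - g(w)) > 0.5 * abs(z - w):
            return False
    return True


def f_inverse(w: complex, tol: float = 1e-15, max_iter: int = 100) -> complex:
    """Solve f(z) = w for |w| < R/2 by iterating the map z -> w - g(z) of the lemma."""
    z = 0j
    for _ in range(max_iter):
        z_next = w - g(z)
        if abs(z_next - z) < tol:
            return z_next
        z = z_next
    raise RuntimeError("no convergence within max_iter steps")


w = 0.01 + 0.005j   # |w| is about 0.0112 < R/2 = 1/64
z = f_inverse(w)
print(lipschitz_spot_check(), z, abs(z) < R)
```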

Thus, the most important part of the theorem is already done. All that is left to do is to prove differentiability of ${\displaystyle f^{-1}}$ at ${\displaystyle 0=f(0)}$. We even prove the slightly stronger claim that the differential of ${\displaystyle f^{-1}}$ at ${\displaystyle 0}$ is the identity, although this would also follow from the chain rule once differentiability is proven.

Note now that the estimate ${\displaystyle \|g(x)\|\leq {\frac {1}{2}}\|x\|}$ implies the following bounds on ${\displaystyle f}$, valid for all ${\displaystyle x\in U}$:

${\displaystyle {\frac {1}{2}}\|x\|\leq \|f(x)\|\leq {\frac {3}{2}}\|x\|}$.

The second bound follows from

${\displaystyle \|f(x)\|\leq \|f(x)-x\|+\|x\|=\|g(x)\|+\|x\|\leq {\frac {3}{2}}\|x\|}$,

and the first bound follows from

${\displaystyle \|f(x)\|\geq |\|f(x)-x\|-\|x\||=\left|\|g(x)\|-\|x\|\right|\geq {\frac {1}{2}}\|x\|}$.

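Now for the differentiability at ${\displaystyle 0}$. By the first bound, ${\displaystyle \|f^{-1}(\mathbf {k} )\|\leq 2\|\mathbf {k} \|}$ for ${\displaystyle \mathbf {k} \in f(U)}$, so ${\displaystyle f^{-1}(\mathbf {k} )\to 0}$ as ${\displaystyle \mathbf {k} \to 0}$; conversely ${\displaystyle f(\mathbf {h} )\to 0}$ as ${\displaystyle \mathbf {h} \to 0}$, since ${\displaystyle f}$ is continuous with ${\displaystyle f(0)=0}$. Hence we may substitute ${\displaystyle f(\mathbf {h} )}$ for ${\displaystyle \mathbf {h} }$ in the limit, and using ${\displaystyle f^{-1}(0)=0}$ we obtain: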

${\displaystyle {\begin{aligned}\lim _{\mathbf {h} \to 0}{\frac {\|f^{-1}(\mathbf {h} )-f^{-1}(0)-\operatorname {Id} (\mathbf {h} -0)\|}{\|\mathbf {h} \|}}&=\lim _{\mathbf {h} \to 0}{\frac {\|f^{-1}(f(\mathbf {h} ))-f(\mathbf {h} )\|}{\|f(\mathbf {h} )\|}}\\&=\lim _{\mathbf {h} \to 0}{\frac {\|\mathbf {h} -f(\mathbf {h} )\|}{\|f(\mathbf {h} )\|}},\end{aligned}}}$

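where the last expression converges to zero due to the differentiability of ${\displaystyle f}$ at ${\displaystyle 0}$ with differential the identity (which gives ${\displaystyle \|\mathbf {h} -f(\mathbf {h} )\|/\|\mathbf {h} \|\to 0}$) and the sandwich criterion, since by the bounds above it lies between the expressions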

${\displaystyle {\frac {\|\mathbf {h} -f(\mathbf {h} )\|}{{\frac {3}{2}}\|\mathbf {h} \|}}}$

and

${\displaystyle {\frac {\|\mathbf {h} -f(\mathbf {h} )\|}{{\frac {1}{2}}\|\mathbf {h} \|}}}$.${\displaystyle \Box }$
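The bounds ${\displaystyle {\tfrac {1}{2}}\|x\|\leq \|f(x)\|\leq {\tfrac {3}{2}}\|x\|}$ and the vanishing difference quotient can be observed on a hypothetical example of ours, ${\displaystyle f(z)=z+z^{2}}$ on ${\displaystyle \mathbb {R} ^{2}}$ identified with the complex plane. Its local inverse near ${\displaystyle 0}$ has the closed form ${\displaystyle f^{-1}(w)=(-1+{\sqrt {1+4w}})/2}$ with the principal complex square root, which we assume valid for small ${\displaystyle |w|}$. The Python sketch below checks the two bounds on a grid with ${\displaystyle \|x\|\leq 1/32}$ and computes the quotient ${\displaystyle \|f^{-1}(\mathbf {h} )-\mathbf {h} \|/\|\mathbf {h} \|}$ for shrinking ${\displaystyle \mathbf {h} }$, which should tend to zero.

```python
import cmath
import math


def f(z: complex) -> complex:
    """Hypothetical example: f(0) = 0, f'(0) = Id, with g(z) = f(z) - z = z^2."""
    return z + z * z


def f_inverse(w: complex) -> complex:
    """Local inverse of f near 0 (principal branch of the complex square root)."""
    return (-1 + cmath.sqrt(1 + 4 * w)) / 2


def bounds_hold(radius: float = 1 / 32, n_radii: int = 50, n_angles: int = 64) -> bool:
    """Check |x|/2 <= |f(x)| <= 3|x|/2 on a polar grid inside the ball of the given radius."""
    for i in range(1, n_radii + 1):
        rho = radius * i / n_radii
        for k in range(n_angles):
            x = cmath.rect(rho, 2 * math.pi * k / n_angles)
            if not (0.5 * abs(x) <= abs(f(x)) <= 1.5 * abs(x)):
                return False
    return True


direction = cmath.rect(1.0, math.pi / 3)    # unit vector; |t * direction| = t
steps = [10.0 ** (-e) for e in range(1, 6)]
quotients = [abs(f_inverse(t * direction) - t * direction) / t for t in steps]
print(bounds_hold(), quotients)
```

For this example the quotient shrinks roughly like ${\displaystyle \|\mathbf {h} \|}$ itself.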

## The implicit function theorem

Theorem:

Let ${\displaystyle f:\mathbb {R} ^{n}\to \mathbb {R} }$ be a continuously differentiable function, and consider the set

${\displaystyle S:=\{(x_{1},\ldots ,x_{n})\in \mathbb {R} ^{n}|f(x_{1},\ldots ,x_{n})=0\}}$.

If we are given some ${\displaystyle y\in S}$ such that ${\displaystyle \partial _{n}f(y)\neq 0}$, then we find an open set ${\displaystyle U\subseteq \mathbb {R} ^{n-1}}$ with ${\displaystyle (y_{1},\ldots ,y_{n-1})\in U}$ and a function ${\displaystyle g:U\to \mathbb {R} }$ such that

${\displaystyle y_{n}=g(y_{1},\ldots ,y_{n-1})}$ and ${\displaystyle \{(z_{1},\ldots ,z_{n-1},g(z_{1},\ldots ,z_{n-1}))|(z_{1},\ldots ,z_{n-1})\in U\}\subseteq S}$,

where ${\displaystyle \{(z_{1},\ldots ,z_{n-1},g(z_{1},\ldots ,z_{n-1}))|(z_{1},\ldots ,z_{n-1})\in U\}}$ is open with respect to the subspace topology of ${\displaystyle S}$.

Furthermore, ${\displaystyle g}$ is a differentiable function.

Proof:

We define a new function

${\displaystyle F:\mathbb {R} ^{n}\to \mathbb {R} ^{n},F(x_{1},\ldots ,x_{n}):=(x_{1},\ldots ,x_{n-1},f(x_{1},\ldots ,x_{n}))}$.

The differential of this function looks like this:

${\displaystyle F'(x)={\begin{pmatrix}1&0&\cdots &&0\\0&1&&&\vdots \\\vdots &&\ddots &&\\0&\cdots &0&1&0\\\partial _{1}f(x)&&\cdots &&\partial _{n}f(x)\end{pmatrix}}}$

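The matrix ${\displaystyle F'(x)}$ is lower triangular with diagonal entries ${\displaystyle 1,\ldots ,1,\partial _{n}f(x)}$, so ${\displaystyle \det F'(x)=\partial _{n}f(x)}$; moreover, ${\displaystyle F}$ is continuously differentiable because ${\displaystyle f}$ is. Since we assumed that ${\displaystyle \partial _{n}f(y)\neq 0}$, ${\displaystyle F'(y)}$ is invertible, and hence the inverse function theorem implies the existence of a small open neighbourhood ${\displaystyle {\tilde {V}}\subseteq \mathbb {R} ^{n}}$ containing ${\displaystyle y}$ such that ${\displaystyle F}$ restricted to ${\displaystyle {\tilde {V}}}$ is invertible, with a differentiable inverse ${\displaystyle F^{-1}}$ defined on the open set ${\displaystyle {\tilde {U}}:=F({\tilde {V}})}$ containing ${\displaystyle F(y)}$. Now set first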

${\displaystyle U:=\{(x_{1},\ldots ,x_{n-1})|(x_{1},\ldots ,x_{n-1},0)\in {\tilde {U}}\}}$,

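which is open in ${\displaystyle \mathbb {R} ^{n-1}}$, being the preimage of the open set ${\displaystyle {\tilde {U}}}$ under the continuous map ${\displaystyle (x_{1},\ldots ,x_{n-1})\mapsto (x_{1},\ldots ,x_{n-1},0)}$, and contains ${\displaystyle (y_{1},\ldots ,y_{n-1})}$ because ${\displaystyle F(y)=(y_{1},\ldots ,y_{n-1},f(y))=(y_{1},\ldots ,y_{n-1},0)}$. Then set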

${\displaystyle g:U\to \mathbb {R} ,g(x_{1},\ldots ,x_{n-1}):=\pi _{n}(F^{-1}(x_{1},\ldots ,x_{n-1},0))}$,

the ${\displaystyle n}$-th component of ${\displaystyle F^{-1}(x_{1},\ldots ,x_{n-1},0)}$. We claim that ${\displaystyle g}$ has the desired properties.
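Computing ${\displaystyle g}$ amounts to solving ${\displaystyle F(x_{1},\ldots ,x_{n-1},t)=(x_{1},\ldots ,x_{n-1},0)}$ for ${\displaystyle t}$, because ${\displaystyle F}$ leaves the first ${\displaystyle n-1}$ coordinates unchanged. The Python sketch below does this for a hypothetical example of ours, the unit circle ${\displaystyle f(x_{1},x_{2})=x_{1}^{2}+x_{2}^{2}-1}$ near ${\displaystyle y=(0.6,0.8)}$, where ${\displaystyle \partial _{2}f(y)=1.6\neq 0}$. It uses the chord iteration ${\displaystyle t\mapsto t-f(x_{1},t)/\partial _{2}f(y)}$, our choice in the spirit of the normalisation used for the inverse function theorem (not the method of the text, which obtains ${\displaystyle F^{-1}}$ from that theorem), and compares the result with the explicit solution ${\displaystyle {\sqrt {1-x_{1}^{2}}}}$.

```python
import math


def f(x1: float, x2: float) -> float:
    """Hypothetical example: S = {f = 0} is the unit circle."""
    return x1 ** 2 + x2 ** 2 - 1


def d2f(x1: float, x2: float) -> float:
    """Partial derivative of f with respect to the last variable."""
    return 2 * x2


Y = (0.6, 0.8)        # a point of S with d_2 f(Y) = 1.6 != 0
SLOPE = d2f(*Y)


def g(x1: float, tol: float = 1e-14, max_iter: int = 200) -> float:
    """Last component of F^{-1}(x1, 0): solve f(x1, t) = 0 for t near Y[1].

    Uses the chord iteration t -> t - f(x1, t) / d_2 f(Y), an assumption of this sketch.
    """
    t = Y[1]
    for _ in range(max_iter):
        t_next = t - f(x1, t) / SLOPE
        if abs(t_next - t) < tol:
            return t_next
        t = t_next
    raise RuntimeError("no convergence: x1 is probably too far from Y[0]")


for x1 in (0.5, 0.6, 0.7):
    print(x1, g(x1), math.sqrt(1 - x1 ** 2))
```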

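Indeed, we first note that ${\displaystyle F^{-1}(x_{1},\ldots ,x_{n-1},0)=(x_{1},\ldots ,x_{n-1},g(x_{1},\ldots ,x_{n-1}))}$: since ${\displaystyle F}$ leaves the first ${\displaystyle n-1}$ components unchanged, the identity ${\displaystyle F(F^{-1}(x))=x}$ shows that the first ${\displaystyle n-1}$ components of ${\displaystyle F^{-1}(x_{1},\ldots ,x_{n-1},0)}$ are ${\displaystyle x_{1},\ldots ,x_{n-1}}$, and its last component is ${\displaystyle g(x_{1},\ldots ,x_{n-1})}$ by definition. In particular, ${\displaystyle y=F^{-1}(F(y))=F^{-1}(y_{1},\ldots ,y_{n-1},0)}$ yields ${\displaystyle y_{n}=g(y_{1},\ldots ,y_{n-1})}$. Now let ${\displaystyle (z_{1},\ldots ,z_{n-1})\in U}$. Then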

${\displaystyle {\begin{aligned}f(z_{1},\ldots ,z_{n-1},g(z_{1},\ldots ,z_{n-1}))&=(\pi _{n}\circ F)(F^{-1}(z_{1},\ldots ,z_{n-1},0))\\&=\pi _{n}((F\circ F^{-1})(z_{1},\ldots ,z_{n-1},0))=0\end{aligned}}}$.

Furthermore, the set

${\displaystyle \{(z_{1},\ldots ,z_{n-1},g(z_{1},\ldots ,z_{n-1}))|(z_{1},\ldots ,z_{n-1})\in U\}}$

is open with respect to the subspace topology on ${\displaystyle S}$. Indeed, we show

${\displaystyle \{(z_{1},\ldots ,z_{n-1},g(z_{1},\ldots ,z_{n-1}))|(z_{1},\ldots ,z_{n-1})\in U\}=S\cap {\tilde {V}}}$.

For ${\displaystyle \subseteq }$, we first note that the set on the left hand side is in ${\displaystyle S}$, since all points in it are mapped to zero by ${\displaystyle f}$. Further,

${\displaystyle F(z_{1},\ldots ,z_{n-1},g(z_{1},\ldots ,z_{n-1}))=(z_{1},\ldots ,z_{n-1},0)\in {\tilde {U}}}$

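and hence, applying ${\displaystyle F^{-1}}$, the point ${\displaystyle (z_{1},\ldots ,z_{n-1},g(z_{1},\ldots ,z_{n-1}))=F^{-1}(z_{1},\ldots ,z_{n-1},0)}$ lies in ${\displaystyle {\tilde {V}}}$, which completes ${\displaystyle \subseteq }$. For the other direction, let a point ${\displaystyle (x_{1},\ldots ,x_{n})}$ in ${\displaystyle S\cap {\tilde {V}}}$ be given, and apply ${\displaystyle F}$ to get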

${\displaystyle F((x_{1},\ldots ,x_{n}))=(x_{1},\ldots ,x_{n-1},0)\in {\tilde {U}}}$

and hence ${\displaystyle (x_{1},\ldots ,x_{n-1})\in U}$; further

${\displaystyle (x_{1},\ldots ,x_{n-1},g(x_{1},\ldots ,x_{n-1}))=(x_{1},\ldots ,x_{n})}$

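by applying ${\displaystyle F^{-1}}$ to both sides of the previous equation (recall that ${\displaystyle F^{-1}(x_{1},\ldots ,x_{n-1},0)=(x_{1},\ldots ,x_{n-1},g(x_{1},\ldots ,x_{n-1}))}$ and that ${\displaystyle (x_{1},\ldots ,x_{n})\in {\tilde {V}}}$).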

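Finally, ${\displaystyle g}$ is differentiable as a component of ${\displaystyle F^{-1}}$. Here one should note that the inverse function theorem as stated only gives differentiability of ${\displaystyle F^{-1}}$ at ${\displaystyle F(y)}$; but since ${\displaystyle \det F'(x)=\partial _{n}f(x)}$ depends continuously on ${\displaystyle x}$, we may choose ${\displaystyle {\tilde {V}}}$ so small that ${\displaystyle F'(x)}$ is invertible for all ${\displaystyle x\in {\tilde {V}}}$, and applying the theorem at each such ${\displaystyle x}$ shows that ${\displaystyle F^{-1}}$ is differentiable at every point of ${\displaystyle {\tilde {U}}}$.${\displaystyle \Box }$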

Informally, the above theorem states that near a point ${\displaystyle y}$ of the set ${\displaystyle \{x\in \mathbb {R} ^{n}|f(x)=0\}}$ with ${\displaystyle \partial _{n}f(y)\neq 0}$, one can choose the first ${\displaystyle n-1}$ coordinates as the variables of a function ${\displaystyle g}$ whose graph is precisely a local piece of that set.
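A worked example of our own choosing illustrates both the theorem and the role of the hypothesis ${\displaystyle \partial _{n}f(y)\neq 0}$. For the unit circle we have

${\displaystyle f(x_{1},x_{2})=x_{1}^{2}+x_{2}^{2}-1,\qquad \partial _{2}f(x_{1},x_{2})=2x_{2}}$.

At ${\displaystyle y=(0.6,0.8)}$ we have ${\displaystyle \partial _{2}f(y)=1.6\neq 0}$, and near ${\displaystyle y}$ the circle is the graph of ${\displaystyle g(x_{1})={\sqrt {1-x_{1}^{2}}}}$, in agreement with the theorem. At ${\displaystyle y=(1,0)}$, however, ${\displaystyle \partial _{2}f(y)=0}$, and no piece of ${\displaystyle S}$ around ${\displaystyle y}$ is a graph over ${\displaystyle x_{1}}$: for every small ${\displaystyle \varepsilon >0}$ the two distinct points

${\displaystyle \left(1-\varepsilon ,\ \pm {\sqrt {2\varepsilon -\varepsilon ^{2}}}\right)\in S}$

have the same first coordinate, as ${\displaystyle (1-\varepsilon )^{2}+2\varepsilon -\varepsilon ^{2}=1}$ shows.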