# Calculus/Derivatives of multivariate functions


## The matrix of a linear transformation

Theorem

A linear transformation ${\displaystyle L:\mathbb {R} ^{n}\to \mathbb {R} ^{m}}$ amounts to multiplication by a uniquely defined matrix; that is, there exists a unique matrix ${\displaystyle A\in \mathbb {R} ^{m\times n}}$ such that

${\displaystyle \forall {\vec {v}}\in \mathbb {R} ^{n}:L({\vec {v}})=A{\vec {v}}}$
Proof

We set the column vectors

${\displaystyle {\begin{pmatrix}a_{1,j}\\a_{2,j}\\\vdots \\a_{m,j}\end{pmatrix}}:=L({\vec {e}}_{j})}$

where ${\displaystyle \{{\vec {e}}_{1},\ldots ,{\vec {e}}_{n}\}}$ is the standard basis of ${\displaystyle \mathbb {R} ^{n}}$ . Then we define from this

${\displaystyle A:={\begin{pmatrix}a_{1,1}&\cdots &a_{1,n}\\\vdots &\ddots &\vdots \\a_{m,1}&\cdots &a_{m,n}\end{pmatrix}}\in \mathbb {R} ^{m\times n}}$

and note that for any vector ${\displaystyle {\vec {v}}=(v_{1},\ldots ,v_{n})^{t}}$ of ${\displaystyle \mathbb {R} ^{n}}$ we obtain

${\displaystyle A{\vec {v}}=A\left(\sum _{j=1}^{n}v_{j}{\vec {e}}_{j}\right)=\sum _{j=1}^{n}Av_{j}{\vec {e}}_{j}=\sum _{j=1}^{n}v_{j}L({\vec {e}}_{j})=L\left(\sum _{j=1}^{n}v_{j}{\vec {e}}_{j}\right)=L({\vec {v}})}$

Thus, we have shown existence. To prove uniqueness, suppose there were any other matrix ${\displaystyle B\in \mathbb {R} ^{m\times n}}$ with the property that ${\displaystyle \forall {\vec {v}}\in \mathbb {R} ^{n}:L({\vec {v}})=B{\vec {v}}}$ . Then in particular,

${\displaystyle B{\vec {e}}_{j}=L({\vec {e}}_{j})}$

which already implies that ${\displaystyle A=B}$ (since all the columns of both matrices are identical).${\displaystyle \Box }$
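The proof is constructive: the ${\displaystyle j}$-th column of ${\displaystyle A}$ is ${\displaystyle L({\vec {e}}_{j})}$ . A minimal numerical sketch of this construction (the rotation map below is an arbitrary example, not taken from the text):

```python
import numpy as np

# An example linear map L : R^2 -> R^2 (a rotation by 90 degrees),
# given only as a black-box function.
def L(v):
    x, y = v
    return np.array([-y, x])

# Column j of the matrix A is L applied to the j-th standard basis vector,
# exactly as in the proof above.
A = np.column_stack([L(e) for e in np.eye(2)])

# A now reproduces L on an arbitrary vector.
v = np.array([3.0, 5.0])
assert np.allclose(A @ v, L(v))
print(A)  # columns are L(e_1) and L(e_2)
```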

## How to generalise the derivative

It is not immediately obvious how one would generalise the derivative to higher dimensions. For, if we take the definition of the derivative at a point ${\displaystyle x_{0}}$

${\displaystyle \lim _{h\to 0}{\frac {f(x_{0}+h)-f(x_{0})}{h}}}$

and insert vectors for ${\displaystyle h}$ and ${\displaystyle x_{0}}$ , we would divide the whole thing by a vector. But this is not defined.

Hence, we shall rephrase the definition of the derivative a bit and cast it into a form that can be generalised to higher dimensions.

Theorem

Let ${\displaystyle f:\mathbb {R} \to \mathbb {R} }$ be a one-dimensional function and let ${\displaystyle x_{0}\in \mathbb {R} }$ . Then ${\displaystyle f}$ is differentiable at ${\displaystyle x_{0}}$ if and only if there exists a linear function ${\displaystyle l:\mathbb {R} \to \mathbb {R} }$ such that

${\displaystyle \lim _{h\to 0}{\frac {{\Big |}f(x_{0}+h)-{\big (}f(x_{0})+l(h){\big )}{\Big |}}{|h|}}=0}$

We note that according to the above, linear functions ${\displaystyle l:\mathbb {R} \to \mathbb {R} }$ are given by multiplication by a ${\displaystyle 1\times 1}$-matrix, that is, a scalar.

Proof

First assume that ${\displaystyle f}$ is differentiable at ${\displaystyle x_{0}}$. We set ${\displaystyle l(h):=f'(x_{0})\cdot h}$ and obtain

${\displaystyle {\frac {{\Big |}f(x_{0}+h)-{\big (}f(x_{0})+l(h){\big )}{\Big |}}{|h|}}=\left|{\frac {f(x_{0}+h)-f(x_{0})}{h}}-f'(x_{0})\right|}$

which converges to 0 due to the definition of ${\displaystyle f'(x_{0})}$ .

Assume now that we are given an ${\displaystyle l:\mathbb {R} \to \mathbb {R} }$ such that

${\displaystyle \lim _{h\to 0}{\frac {{\Big |}f(x_{0}+h)-{\big (}f(x_{0})+l(h){\big )}{\Big |}}{|h|}}=0}$

Let ${\displaystyle c}$ be the scalar associated to ${\displaystyle l}$ , so that ${\displaystyle l(h)=ch}$ . Then the same computation as above shows that ${\displaystyle \left|{\frac {f(x_{0}+h)-f(x_{0})}{h}}-c\right|\to 0}$ as ${\displaystyle h\to 0}$ ; that is, ${\displaystyle f}$ is differentiable at ${\displaystyle x_{0}}$ with ${\displaystyle f'(x_{0})=c}$ .${\displaystyle \Box }$

With the latter formulation of differentiability from the above theorem, we may readily generalize to higher dimensions, since division by the Euclidean norm of a vector is defined, and linear mappings are also defined in higher dimensions.
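Before passing to higher dimensions, the reformulated criterion is easy to observe numerically in one dimension (the function ${\displaystyle f=\exp }$ and the point ${\displaystyle x_{0}=1}$ below are arbitrary choices):

```python
import math

# One-dimensional check: with l(h) = f'(x0) * h, the quotient
# |f(x0 + h) - (f(x0) + l(h))| / |h| should tend to 0 as h -> 0.
f = math.exp          # example function; its derivative is also exp
x0 = 1.0
deriv = math.exp(x0)  # exact value of f'(x0)

quotients = []
for h in (1e-1, 1e-2, 1e-3, 1e-4):
    q = abs(f(x0 + h) - (f(x0) + deriv * h)) / abs(h)
    quotients.append(q)
    print(h, q)  # shrinks roughly like |h| * f''(x0) / 2
```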

Definition

A function ${\displaystyle f:\mathbb {R} ^{m}\to \mathbb {R} ^{n}}$ is called differentiable or totally differentiable at a point ${\displaystyle x_{0}\in \mathbb {R} ^{m}}$ if and only if there exists a linear function ${\displaystyle L:\mathbb {R} ^{m}\to \mathbb {R} ^{n}}$ such that

${\displaystyle \lim _{{\vec {h}}\to 0}{\frac {{\Big \|}f(x_{0}+{\vec {h}})-{\big (}f(x_{0})+L({\vec {h}}){\big )}{\Big \|}}{\|{\vec {h}}\|}}=0}$

We have already proven that this definition coincides with the usual one in the one-dimensional case (that is, ${\displaystyle m=n=1}$).

We have the following theorem:

Theorem

Let ${\displaystyle S\subseteq \mathbb {R} ^{m}}$ be a set, let ${\displaystyle x_{0}\in {\overset {\circ }{S}}}$ be an interior point of ${\displaystyle S}$ , and let ${\displaystyle f:S\to \mathbb {R} ^{n}}$ be a function differentiable at ${\displaystyle x_{0}}$ . Then the linear map ${\displaystyle L}$ such that

${\displaystyle \lim _{{\vec {h}}\to 0}{\frac {{\Big \|}f(x_{0}+{\vec {h}})-{\big (}f(x_{0})+L({\vec {h}}){\big )}{\Big \|}}{\|{\vec {h}}\|}}=0}$

is unique; that is, there exists only one such map ${\displaystyle L}$ .

Proof

Since ${\displaystyle x_{0}}$ is an interior point of ${\displaystyle S}$, we find ${\displaystyle r>0}$ such that ${\displaystyle B_{r}(x_{0})\subseteq S}$ . Let now ${\displaystyle K:\mathbb {R} ^{m}\to \mathbb {R} ^{n}}$ be any other linear mapping with the property that

${\displaystyle \lim _{{\vec {h}}\to 0}{\frac {{\Big \|}f(x_{0}+{\vec {h}})-{\big (}f(x_{0})+K({\vec {h}}){\big )}{\Big \|}}{\|{\vec {h}}\|}}=0}$

We note that for each vector ${\displaystyle {\vec {e}}_{j}}$ of the standard basis ${\displaystyle \{{\vec {e}}_{1},\ldots ,{\vec {e}}_{m}\}}$ of ${\displaystyle \mathbb {R} ^{m}}$ , the points ${\displaystyle x_{0}+\lambda {\vec {e}}_{j}}$ for ${\displaystyle 0<\lambda <r}$ are contained within ${\displaystyle B_{r}(x_{0})\subseteq S}$ . Hence, we obtain by the triangle inequality

${\displaystyle {\Big \|}L({\vec {e}}_{j})-K({\vec {e}}_{j}){\Big \|}={\frac {{\bigl \|}L(\lambda {\vec {e}}_{j})-K(\lambda {\vec {e}}_{j}){\bigr \|}}{\|\lambda {\vec {e}}_{j}\|}}\leq {\frac {{\Big \|}f(x_{0}+\lambda {\vec {e}}_{j})-{\big (}f(x_{0})+L(\lambda {\vec {e}}_{j}){\big )}{\Big \|}}{\|\lambda {\vec {e}}_{j}\|}}+{\frac {{\Big \|}f(x_{0}+\lambda {\vec {e}}_{j})-{\big (}f(x_{0})+K(\lambda {\vec {e}}_{j}){\big )}{\Big \|}}{\|\lambda {\vec {e}}_{j}\|}}}$

Taking ${\displaystyle \lambda \to 0}$ , we see that ${\displaystyle L({\vec {e}}_{j})=K({\vec {e}}_{j})}$ . Thus, ${\displaystyle L}$ and ${\displaystyle K}$ coincide on all basis vectors, and since every other vector can be expressed as a linear combination of those, by linearity of ${\displaystyle L}$ and ${\displaystyle K}$ we obtain ${\displaystyle L=K}$ .${\displaystyle \Box }$

Thus, the following definition is justified:

Definition

Let ${\displaystyle f:S\to \mathbb {R} ^{n}}$ be a function (where ${\displaystyle S\subseteq \mathbb {R} ^{m}}$ is a subset of ${\displaystyle \mathbb {R} ^{m}}$), and let ${\displaystyle x_{0}}$ be an interior point of ${\displaystyle S}$ such that ${\displaystyle f}$ is differentiable at ${\displaystyle x_{0}}$ . Then the unique linear function ${\displaystyle L}$ such that

${\displaystyle \lim _{{\vec {h}}\to 0}{\frac {{\Big \|}f(x_{0}+{\vec {h}})-{\big (}f(x_{0})+L({\vec {h}}){\big )}{\Big \|}}{\|{\vec {h}}\|}}=0}$

is called the differential of ${\displaystyle f}$ at ${\displaystyle x_{0}}$ and is denoted ${\displaystyle f'(x_{0}):=L}$ .

## Directional and partial derivatives

We shall first define directional derivatives.

Definition

Let ${\displaystyle f:\mathbb {R} ^{m}\to \mathbb {R} ^{n}}$ be a function, let ${\displaystyle x_{0}\in \mathbb {R} ^{m}}$ be a point, and let ${\displaystyle {\vec {v}}\in \mathbb {R} ^{m}}$ be a vector. If the limit

${\displaystyle \lim _{h\to 0}{\frac {f(x_{0}+h{\vec {v}})-f(x_{0})}{h}}}$

exists, it is called the directional derivative of ${\displaystyle f}$ at ${\displaystyle x_{0}}$ in direction ${\displaystyle {\vec {v}}}$ . We denote it by ${\displaystyle D_{\vec {v}}f(x_{0})}$ .
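As a sanity check, the limit in this definition can be approximated with a small step ${\displaystyle h}$ ; the scalar field below is an arbitrary illustrative example:

```python
import numpy as np

# An example scalar field f : R^2 -> R (an illustrative assumption).
def f(p):
    x, y = p
    return x**2 + 3*x*y

x0 = np.array([1.0, 2.0])
v = np.array([1.0, 1.0])

# Difference-quotient approximation of the directional derivative D_v f(x0).
h = 1e-6
D_v = (f(x0 + h*v) - f(x0)) / h

# Analytically, grad f = (2x + 3y, 3x), so D_v f(x0) = 8 + 3 = 11.
print(D_v)
```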

The following theorem relates directional derivatives and the differential of a totally differentiable function:

Theorem

Let ${\displaystyle f:\mathbb {R} ^{m}\to \mathbb {R} ^{n}}$ be a function that is totally differentiable at ${\displaystyle x_{0}}$, and let ${\displaystyle {\vec {v}}\in \mathbb {R} ^{m}\setminus \{0\}}$ be a nonzero vector. Then ${\displaystyle D_{\vec {v}}f(x_{0})}$ exists and is equal to ${\displaystyle f'(x_{0}){\vec {v}}}$ .

Proof

According to the definition of total differentiability, applied with ${\displaystyle {\vec {h}}:=h{\vec {v}}}$ (so that ${\displaystyle f'(x_{0})(h{\vec {v}})=h\,f'(x_{0}){\vec {v}}}$ and ${\displaystyle \|h{\vec {v}}\|=|h|\cdot \|{\vec {v}}\|}$),

${\displaystyle \lim _{h\to 0}\left\|{\frac {f(x_{0}+h{\vec {v}})-f(x_{0})}{|h|\cdot \|{\vec {v}}\|}}-{\frac {h\,f'(x_{0}){\vec {v}}}{|h|\cdot \|{\vec {v}}\|}}\right\|=0}$

Multiplying by the constant ${\displaystyle \|{\vec {v}}\|}$ yields

${\displaystyle \lim _{h\to 0}\left\|{\frac {f(x_{0}+h{\vec {v}})-f(x_{0})}{|h|}}-{\frac {h\,f'(x_{0}){\vec {v}}}{|h|}}\right\|=0}$

Noting that

${\displaystyle \left\|{\frac {f(x_{0}+h{\vec {v}})-f(x_{0})}{|h|}}-{\frac {h\,f'(x_{0}){\vec {v}}}{|h|}}\right\|=\left\|{\frac {f(x_{0}+h{\vec {v}})-f(x_{0})}{h}}-f'(x_{0}){\vec {v}}\right\|}$

the theorem follows.${\displaystyle \Box }$

A special case of directional derivatives are partial derivatives:

Definition

Let ${\displaystyle \{{\vec {e}}_{1},\ldots ,{\vec {e}}_{m}\}}$ be the standard basis of ${\displaystyle \mathbb {R} ^{m}}$ , let ${\displaystyle x_{0}\in \mathbb {R} ^{m}}$ and let ${\displaystyle f:\mathbb {R} ^{m}\to \mathbb {R} ^{n}}$ be a function such that the directional derivatives ${\displaystyle D_{{\vec {e}}_{j}}f(x_{0})}$ all exist. Then we set

${\displaystyle {\frac {\partial f}{\partial x_{j}}}(x_{0}):=D_{{\vec {e}}_{j}}f(x_{0})}$

and call it the partial derivative of ${\displaystyle f}$ in the direction of ${\displaystyle x_{j}}$ at ${\displaystyle x_{0}}$ .

In fact, by writing down the definition of ${\displaystyle D_{{\vec {e}}_{j}}f(x_{0})}$ , we see that the partial derivative in the direction of ${\displaystyle x_{j}}$ is simply the derivative of the function ${\displaystyle y\mapsto f(x_{0,1},\ldots ,x_{0,j-1},y,x_{0,j+1},\ldots ,x_{0,m})}$ in the variable ${\displaystyle y}$ at the point ${\displaystyle x_{0,j}}$ . That is, for instance, if

${\displaystyle f(x,y,z)=x^{2}+4z^{3}+3xy}$

then

${\displaystyle {\frac {\partial f}{\partial x}}=2x+3y\ ,\ {\frac {\partial f}{\partial y}}=3x\ ,\ {\frac {\partial f}{\partial z}}=12z^{2}}$

that is, when forming a partial derivative, we regard the other variables as constant and differentiate only with respect to the variable we are considering.
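The partial derivatives of the example above can be verified numerically with central differences (a sketch; the evaluation point and step size are arbitrary choices):

```python
# Central-difference check of the partial derivatives of
# f(x, y, z) = x^2 + 4 z^3 + 3 x y from the example above.
def f(x, y, z):
    return x**2 + 4*z**3 + 3*x*y

def partial(g, args, i, h=1e-6):
    """Approximate the partial derivative of g in its i-th argument."""
    a = list(args)
    a[i] += h
    upper = g(*a)
    a[i] -= 2 * h
    lower = g(*a)
    return (upper - lower) / (2 * h)

x, y, z = 1.0, 2.0, 3.0
assert abs(partial(f, (x, y, z), 0) - (2*x + 3*y)) < 1e-4  # df/dx = 2x + 3y
assert abs(partial(f, (x, y, z), 1) - 3*x) < 1e-4          # df/dy = 3x
assert abs(partial(f, (x, y, z), 2) - 12*z**2) < 1e-4      # df/dz = 12 z^2
```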

## The Jacobian matrix

From the above, we know that the differential ${\displaystyle f'(x_{0})}$ of a function has an associated matrix representing the linear map thus defined. Under a continuity condition on the partial derivatives, we can determine this matrix from the partial derivatives of the component functions.

Theorem

Let ${\displaystyle f:\mathbb {R} ^{m}\to \mathbb {R} ^{n}}$ be a function all of whose partial derivatives exist and are continuous on ${\displaystyle B_{r}(x_{0})}$ for a possibly very small, but positive ${\displaystyle r>0}$ . Then ${\displaystyle f}$ is totally differentiable at ${\displaystyle x_{0}}$ , and the differential of ${\displaystyle f}$ at ${\displaystyle x_{0}}$ is given by left multiplication by the matrix

${\displaystyle J_{f}(x_{0}):={\begin{pmatrix}{\dfrac {\partial f_{1}}{\partial x_{1}}}&\cdots &{\dfrac {\partial f_{1}}{\partial x_{m}}}\\\vdots &\ddots &\vdots \\{\dfrac {\partial f_{n}}{\partial x_{1}}}&\cdots &{\dfrac {\partial f_{n}}{\partial x_{m}}}\end{pmatrix}}}$

where ${\displaystyle f=(f_{1},\ldots ,f_{n})}$ .

The matrix ${\displaystyle J_{f}(x_{0})}$ is called the Jacobian matrix.

Proof
We estimate

${\displaystyle {\frac {{\Big \|}f(x_{0}+{\vec {h}})-{\big (}f(x_{0})+J_{f}(x_{0}){\vec {h}}{\big )}{\Big \|}}{\|{\vec {h}}\|}}={\frac {\left\|\displaystyle \sum _{j=1}^{n}f_{j}(x_{0}+{\vec {h}}){\vec {e}}_{j}-\sum _{j=1}^{n}\left(f_{j}(x_{0})+\sum _{k=1}^{m}h_{k}{\frac {\partial f_{j}}{\partial x_{k}}}(x_{0})\right){\vec {e}}_{j}\right\|}{\|{\vec {h}}\|}}\leq \sum _{j=1}^{n}{\frac {\left|f_{j}(x_{0}+{\vec {h}})-\left(f_{j}(x_{0})+\displaystyle \sum _{k=1}^{m}h_{k}{\frac {\partial f_{j}}{\partial x_{k}}}(x_{0})\right)\right|}{\|{\vec {h}}\|}}}$

We shall now prove that all summands of the last sum go to 0.

Indeed, let ${\displaystyle j\in \{1,\ldots ,n\}}$ . Writing again ${\displaystyle {\vec {h}}=(h_{1},\ldots ,h_{m})}$ , we obtain by the one-dimensional mean value theorem, first applied in the first variable, then in the second and so on, the succession of equations

${\displaystyle f_{j}(x_{0}+h_{1}{\vec {e}}_{1})-f_{j}(x_{0})=\overbrace {(x_{0,1}+h_{1}-x_{0,1})} ^{=h_{1}}{\frac {\partial f_{j}}{\partial x_{1}}}(x_{0}+t_{1}{\vec {e}}_{1})}$
${\displaystyle f_{j}(x_{0}+h_{1}{\vec {e}}_{1}+h_{2}{\vec {e}}_{2})-f_{j}(x_{0}+h_{1}{\vec {e}}_{1})=\overbrace {(x_{0,2}+h_{2}-x_{0,2})} ^{=h_{2}}{\frac {\partial f_{j}}{\partial x_{2}}}(x_{0}+h_{1}{\vec {e}}_{1}+t_{2}{\vec {e}}_{2})}$
${\displaystyle \vdots }$
${\displaystyle f_{j}(x_{0}+h_{1}{\vec {e}}_{1}+\cdots +h_{m}{\vec {e}}_{m})-f_{j}(x_{0}+h_{1}{\vec {e}}_{1}+\cdots +h_{m-1}{\vec {e}}_{m-1})=\overbrace {(x_{0,m}+h_{m}-x_{0,m})} ^{=h_{m}}{\frac {\partial f_{j}}{\partial x_{m}}}(x_{0}+h_{1}{\vec {e}}_{1}+\cdots +h_{m-1}{\vec {e}}_{m-1}+t_{m}{\vec {e}}_{m})}$

for suitably chosen ${\displaystyle t_{k}}$ between ${\displaystyle 0}$ and ${\displaystyle h_{k}}$ . We can now sum all these equations together to obtain

${\displaystyle f_{j}(x_{0}+{\vec {h}})-f_{j}(x_{0})=\sum _{k=1}^{m}h_{k}{\frac {\partial f_{j}}{\partial x_{k}}}\left(x_{0}+\sum _{l=1}^{k-1}h_{l}{\vec {e}}_{l}+t_{k}{\vec {e}}_{k}\right)}$

Let now ${\displaystyle \epsilon >0}$ . Using the continuity of the ${\displaystyle {\frac {\partial f_{j}}{\partial x_{k}}}}$ on ${\displaystyle B_{r}(x_{0})}$ , we may choose ${\displaystyle \delta _{k}>0}$ such that

${\displaystyle \left|{\frac {\partial f_{j}}{\partial x_{k}}}\left(x_{0}+\sum _{l=1}^{k-1}h_{l}{\vec {e}}_{l}+t_{k}{\vec {e}}_{k}\right)-{\frac {\partial f_{j}}{\partial x_{k}}}(x_{0})\right|<{\frac {\epsilon }{m}}}$

for ${\displaystyle |h_{k}|<\delta _{k}}$ , given that ${\displaystyle {\vec {h}}\in B_{r}(0)}$ (which we may assume as ${\displaystyle {\vec {h}}\to {\vec {0}}}$). Hence, we obtain

${\displaystyle {\frac {\left|f_{j}(x_{0}+{\vec {h}})-\left(f_{j}(x_{0})+\displaystyle \sum _{k=1}^{m}h_{k}{\frac {\partial f_{j}}{\partial x_{k}}}(x_{0})\right)\right|}{\|{\vec {h}}\|}}\leq {\frac {\|{\vec {h}}\|\cdot m\cdot {\frac {\epsilon }{m}}}{\|{\vec {h}}\|}}=\epsilon }$

Since ${\displaystyle \epsilon >0}$ was arbitrary, the theorem follows.${\displaystyle \Box }$
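The theorem can be illustrated numerically: assemble the Jacobian matrix from central-difference partial derivatives, then check that the error quotient from the definition of total differentiability shrinks as ${\displaystyle \|{\vec {h}}\|\to 0}$ (the map ${\displaystyle f}$ below is an arbitrary smooth example):

```python
import numpy as np

# An example smooth map f : R^2 -> R^2 (an illustrative assumption).
def f(p):
    x, y = p
    return np.array([x*y, x + np.sin(y)])

def jacobian(g, x0, h=1e-6):
    """Jacobian matrix from central differences, built column by column."""
    cols = [(g(x0 + h*e) - g(x0 - h*e)) / (2*h) for e in np.eye(len(x0))]
    return np.column_stack(cols)

x0 = np.array([1.0, 2.0])
J = jacobian(f, x0)

# Analytic Jacobian at (x, y): [[y, x], [1, cos y]].
assert np.allclose(J, [[2.0, 1.0], [1.0, np.cos(2.0)]], atol=1e-5)

# Error quotient from the definition of total differentiability:
# it shrinks (roughly linearly) as ||h|| -> 0.
quotients = []
for t in (1e-1, 1e-2, 1e-3):
    hvec = t * np.array([1.0, -1.0])
    q = np.linalg.norm(f(x0 + hvec) - (f(x0) + J @ hvec)) / np.linalg.norm(hvec)
    quotients.append(q)
    print(t, q)
```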

Corollary

If ${\displaystyle f:\mathbb {R} ^{m}\to \mathbb {R} ^{n}}$ is continuously differentiable at ${\displaystyle x_{0}\in \mathbb {R} ^{m}}$ and ${\displaystyle {\vec {v}}\in \mathbb {R} ^{m}\setminus \{0\}}$ , then

${\displaystyle D_{\vec {v}}f(x_{0})=\sum _{j=1}^{m}v_{j}{\frac {\partial f}{\partial x_{j}}}(x_{0})}$
Proof
${\displaystyle D_{\vec {v}}f(x_{0})=f'(x_{0})({\vec {v}})=J_{f}(x_{0}){\vec {v}}=\sum _{j=1}^{m}v_{j}{\frac {\partial f}{\partial x_{j}}}(x_{0})}$${\displaystyle \Box }$
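Numerically, the corollary says that a difference quotient along ${\displaystyle {\vec {v}}}$ should agree with ${\displaystyle \sum _{j=1}^{m}v_{j}{\frac {\partial f}{\partial x_{j}}}(x_{0})}$ ; a quick sketch using the example function from the section on partial derivatives (the point and direction are arbitrary choices):

```python
import numpy as np

# f(x, y, z) = x^2 + 4 z^3 + 3 x y, the example from the text.
def f(p):
    x, y, z = p
    return x**2 + 4*z**3 + 3*x*y

x0 = np.array([1.0, 2.0, 3.0])
v = np.array([1.0, -1.0, 2.0])  # an arbitrary direction

# The partial derivatives computed earlier: (2x + 3y, 3x, 12 z^2).
grad = np.array([2*x0[0] + 3*x0[1], 3*x0[0], 12*x0[2]**2])

# Difference-quotient approximation of D_v f(x0) ...
h = 1e-7
D_v = (f(x0 + h*v) - f(x0)) / h

# ... agrees with sum_j v_j * (df/dx_j)(x0).
print(D_v, grad @ v)
```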