# Commutative Algebra/The Cayley–Hamilton theorem and Nakayama's lemma

## Determinants within a commutative ring

We shall now derive the notion of a determinant in the setting of a commutative ring.

Definition 7.1 (Determinant):

Let ${\displaystyle R}$ be a commutative ring, and let ${\displaystyle n\in \mathbb {N} }$. A determinant is a function ${\displaystyle \det :R^{n\times n}\to R}$ satisfying the following three axioms:

1. ${\displaystyle \det I_{n}=1}$, where ${\displaystyle I_{n}}$ is the ${\displaystyle n\times n}$ identity matrix.
2. If ${\displaystyle A}$ is a matrix such that two adjacent columns are equal, then ${\displaystyle \det A=0}$.
3. For each ${\displaystyle j\in \{1,\ldots ,n\}}$ we have ${\displaystyle \det(\mathbf {a} _{1},\ldots ,\mathbf {a} _{j-1},\mathbf {a} _{j}+c\mathbf {b} _{j},\mathbf {a} _{j+1},\ldots ,\mathbf {a} _{n})=\det(\mathbf {a} _{1},\ldots ,\mathbf {a} _{j-1},\mathbf {a} _{j},\mathbf {a} _{j+1},\ldots ,\mathbf {a} _{n})+c\det(\mathbf {a} _{1},\ldots ,\mathbf {a} _{j-1},\mathbf {b} _{j},\mathbf {a} _{j+1},\ldots ,\mathbf {a} _{n})}$, where ${\displaystyle \mathbf {a} _{1},\ldots ,\mathbf {a} _{n},\mathbf {b} _{j}}$ are columns and ${\displaystyle c\in R}$.

We shall later see that there exists exactly one determinant.

Theorem 7.2 (Properties of a (the) determinant):

1. If ${\displaystyle A\in R^{n\times n}}$ has a column consisting entirely of zeroes, then ${\displaystyle \det A=0}$.
2. If ${\displaystyle A}$ is a matrix, and one adds a multiple of one column to an adjacent column, then ${\displaystyle \det A}$ does not change.
3. If two adjacent columns of ${\displaystyle A}$ are exchanged, then ${\displaystyle \det A}$ is multiplied by ${\displaystyle -1}$.
4. If any two columns of a matrix ${\displaystyle A}$ are exchanged, then ${\displaystyle \det A}$ is multiplied by ${\displaystyle -1}$.
5. If ${\displaystyle A}$ is a matrix, and one adds a multiple of one column to any other column, then ${\displaystyle \det A}$ does not change.
6. If ${\displaystyle A}$ is a matrix that has two equal columns, then ${\displaystyle \det A=0}$.
7. Let ${\displaystyle \sigma \in S_{n}}$ be a permutation, where ${\displaystyle S_{n}}$ is the ${\displaystyle n}$-th symmetric group. If ${\displaystyle A=(\mathbf {a} _{1},\ldots ,\mathbf {a} _{n})}$, then ${\displaystyle \det(\mathbf {a} _{\sigma (1)},\ldots ,\mathbf {a} _{\sigma (n)})=\operatorname {sgn} \sigma \det A}$.

Proofs:

1. Let ${\displaystyle A=(\mathbf {a} _{1},\ldots ,\mathbf {a} _{j-1},\mathbf {a} _{j},\mathbf {a} _{j+1},\ldots ,\mathbf {a} _{n})}$, where the ${\displaystyle j}$-th column ${\displaystyle \mathbf {a} _{j}}$ is the zero vector. Then by axiom 3 for the determinant, setting ${\displaystyle c=-1}$ and ${\displaystyle \mathbf {b} _{j}=\mathbf {a} _{j}}$,

${\displaystyle \det(\mathbf {a} _{1},\ldots ,\mathbf {a} _{j-1},\mathbf {a} _{j},\mathbf {a} _{j+1},\ldots ,\mathbf {a} _{n})=\det(\mathbf {a} _{1},\ldots ,\mathbf {a} _{j-1},\mathbf {a} _{j}-\mathbf {a} _{j},\mathbf {a} _{j+1},\ldots ,\mathbf {a} _{n})=\det(\mathbf {a} _{1},\ldots ,\mathbf {a} _{j-1},\mathbf {a} _{j},\mathbf {a} _{j+1},\ldots ,\mathbf {a} _{n})-\det(\mathbf {a} _{1},\ldots ,\mathbf {a} _{j-1},\mathbf {a} _{j},\mathbf {a} _{j+1},\ldots ,\mathbf {a} _{n})=0}$.

Alternatively, we may also set ${\displaystyle c=1}$ and ${\displaystyle \mathbf {b} _{j}=\mathbf {a} _{j}=\mathbf {0} }$ to obtain

${\displaystyle \det(\mathbf {a} _{1},\ldots ,\mathbf {a} _{j-1},\mathbf {a} _{j}+c\mathbf {b} _{j},\mathbf {a} _{j+1},\ldots ,\mathbf {a} _{n})=(1+c)\det(\mathbf {a} _{1},\ldots ,\mathbf {a} _{j-1},\mathbf {a} _{j},\mathbf {a} _{j+1},\ldots ,\mathbf {a} _{n})}$,

Since ${\displaystyle \mathbf {a} _{j}+c\mathbf {b} _{j}=\mathbf {a} _{j}}$, the left-hand side equals ${\displaystyle \det A}$; subtracting ${\displaystyle \det A}$ from both sides yields ${\displaystyle 0=c\det A=\det A}$.

These proofs correspond to the proofs that ${\displaystyle T0=0}$ for a linear map ${\displaystyle T}$ (in whatever context).

2. If we set ${\displaystyle \mathbf {b} _{j}=\mathbf {a} _{j+1}}$ or ${\displaystyle \mathbf {b} _{j}=\mathbf {a} _{j-1}}$ (depending on whether we add the right or the left neighbouring column to the current column), then axiom 3 gives us

${\displaystyle \det(\mathbf {a} _{1},\ldots ,\mathbf {a} _{j-1},\mathbf {a} _{j}+c\mathbf {b} _{j},\mathbf {a} _{j+1},\ldots ,\mathbf {a} _{n})=\det(\mathbf {a} _{1},\ldots ,\mathbf {a} _{j-1},\mathbf {a} _{j},\mathbf {a} _{j+1},\ldots ,\mathbf {a} _{n})+c\det(\mathbf {a} _{1},\ldots ,\mathbf {a} _{j-1},\mathbf {b} _{j},\mathbf {a} _{j+1},\ldots ,\mathbf {a} _{n})}$,

where the latter determinant is zero by axiom 2, since it has two equal adjacent columns.

3. Consider the two matrices ${\displaystyle A:=(\mathbf {a} _{1},\ldots ,\mathbf {a} _{j-1},\mathbf {a} _{j},\mathbf {a} _{j+1},\ldots ,\mathbf {a} _{n})}$ and ${\displaystyle B:=(\mathbf {a} _{1},\ldots ,\mathbf {a} _{j-1},\mathbf {a} _{j+1},\mathbf {a} _{j},\ldots ,\mathbf {a} _{n})}$. By 7.2, 2. and axiom 3 for determinants, we have

{\displaystyle {\begin{aligned}\det B&=\det(\mathbf {a} _{1},\ldots ,\mathbf {a} _{j-1},\mathbf {a} _{j+1}+\mathbf {a} _{j},\mathbf {a} _{j},\ldots ,\mathbf {a} _{n})\\&=\det(\mathbf {a} _{1},\ldots ,\mathbf {a} _{j-1},\mathbf {a} _{j+1}+\mathbf {a} _{j},-\mathbf {a} _{j+1},\ldots ,\mathbf {a} _{n})\\&=\det(\mathbf {a} _{1},\ldots ,\mathbf {a} _{j-1},\mathbf {a} _{j},-\mathbf {a} _{j+1},\ldots ,\mathbf {a} _{n})\\&=-\det A\end{aligned}}}.

4. We exchange the ${\displaystyle j}$-th and ${\displaystyle k}$-th column by first moving the ${\displaystyle j}$-th column successively to spot ${\displaystyle k}$ (using ${\displaystyle |j-k|}$ swaps) and the ${\displaystyle k}$-th column, which is now one step closer to the ${\displaystyle j}$-th spot, to spot ${\displaystyle j}$ using ${\displaystyle |j-k|-1}$ swaps. In total, we used an odd number of swaps, and all the other columns are in the same place since they moved once to the right and once to the left. Hence, 4. follows from applying 3. to each swap.

5. Let's say we want to add ${\displaystyle c\cdot \mathbf {a} _{k}}$ to the ${\displaystyle j}$-th column. Then we first use 4. to put the ${\displaystyle j}$-th column adjacent to ${\displaystyle \mathbf {a} _{k}}$, then use 2. to do the addition without change to the determinant, and then use 4. again to put the ${\displaystyle j}$-th column back to its place. In total, the only change our determinant has suffered was twice multiplication by ${\displaystyle -1}$, which cancels even in a general ring.

6. Let's say that the ${\displaystyle j}$-th column and the ${\displaystyle k}$-th column are equal, ${\displaystyle k\neq j}$. Then we subtract column ${\displaystyle j}$ from column ${\displaystyle k}$ (or, indeed, the other way round) without change to the determinant, obtain a matrix with a zero column and apply 1.

7. Split ${\displaystyle \sigma }$ into swaps, use 4. repeatedly and use further that ${\displaystyle \operatorname {sgn} }$ is a group homomorphism.${\displaystyle \Box }$

Note that we have only used axioms 2 and 3 in the preceding proofs.

The following lemma will allow us to prove the uniqueness of the determinant, and also the formula ${\displaystyle \det(AB)=\det A\det B}$.

Lemma 7.3:

Let ${\displaystyle A=(a_{i,j})_{1\leq i,j\leq n}}$ and ${\displaystyle B=(b_{i,j})_{1\leq i,j\leq n}}$ be two ${\displaystyle n\times n}$ matrices with entries in a commutative ring ${\displaystyle R}$. Then

${\displaystyle \det(AB)=\det A\sum _{\sigma \in S_{n}}\operatorname {sgn} (\sigma )b_{1,\sigma (1)}\cdots b_{n,\sigma (n)}}$.

Proof:

The ${\displaystyle k}$-th column of the matrix ${\displaystyle AB}$ is ${\displaystyle \sum _{\nu =1}^{n}b_{\nu ,k}\mathbf {a} _{\nu }}$. Hence, by axiom 3 for determinants and theorem 7.2, 7. and 6., we obtain, denoting ${\displaystyle AB=:C=(c_{i,j})_{1\leq i,j\leq n}=(\mathbf {c} _{1},\ldots ,\mathbf {c} _{n})}$:

{\displaystyle {\begin{aligned}\det(AB)&=\sum _{\nu _{1}=1}^{n}b_{\nu _{1},1}\det(\mathbf {a} _{\nu _{1}},\mathbf {c} _{2},\ldots ,\mathbf {c} _{n})\\&=\sum _{\nu _{1}=1}^{n}\sum _{\nu _{2}=1}^{n}b_{\nu _{1},1}b_{\nu _{2},2}\det(\mathbf {a} _{\nu _{1}},\mathbf {a} _{\nu _{2}},\mathbf {c} _{3},\ldots ,\mathbf {c} _{n})\\&=\cdots =\sum _{\nu _{1},\ldots ,\nu _{n}=1}^{n}b_{\nu _{1},1}\cdots b_{\nu _{n},n}\det(\mathbf {a} _{\nu _{1}},\ldots ,\mathbf {a} _{\nu _{n}})\\&=\det A\sum _{\sigma \in S_{n}}\operatorname {sgn} (\sigma )b_{1,\sigma (1)}\cdots b_{n,\sigma (n)}\end{aligned}}}${\displaystyle \Box }$

Theorem 7.4 (Uniqueness of the determinant):

For each commutative ring ${\displaystyle R}$ and each ${\displaystyle n\in \mathbb {N} }$, there is at most one determinant ${\displaystyle \det :R^{n\times n}\to R}$, and if it exists, it equals

${\displaystyle \det C=\sum _{\sigma \in S_{n}}\operatorname {sgn} (\sigma )c_{1,\sigma (1)}\cdots c_{n,\sigma (n)}}$.

Proof:

Let ${\displaystyle C\in R^{n\times n}}$ be an arbitrary matrix, and set ${\displaystyle A=I_{n}}$ and ${\displaystyle B=C}$ in lemma 7.3. Then we obtain by axiom 1 for determinants (the first time we use that axiom)

${\displaystyle \det C=\det(I_{n}C)=1\cdot \sum _{\sigma \in S_{n}}\operatorname {sgn} (\sigma )c_{1,\sigma (1)}\cdots c_{n,\sigma (n)}}$.${\displaystyle \Box }$
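To make the formula concrete, here is a small computational sketch (illustrative only, not part of the text): the Leibniz formula implemented over the integers, which form a commutative ring. The helper names `sign` and `det` are our own choices.

```python
from itertools import permutations

def sign(sigma):
    """Sign of a permutation sigma, given as a tuple of 0-based indices,
    computed by counting inversions."""
    inv = sum(sigma[i] > sigma[j]
              for i in range(len(sigma))
              for j in range(i + 1, len(sigma)))
    return -1 if inv % 2 else 1

def det(A):
    """Leibniz formula: sum over sigma in S_n of
    sgn(sigma) * a[0][sigma(0)] * ... * a[n-1][sigma(n-1)].
    Only ring operations (+ and *) are used, so the same code works
    verbatim over any commutative ring; here the entries are integers."""
    n = len(A)
    total = 0
    for sigma in permutations(range(n)):
        term = sign(sigma)
        for i in range(n):
            term *= A[i][sigma[i]]
        total += term
    return total
```

For instance, `det([[1, 2], [3, 4]])` gives ${\displaystyle 1\cdot 4-2\cdot 3=-2}$, the identity matrix gives ${\displaystyle 1}$ (axiom 1), and a matrix with two equal adjacent columns gives ${\displaystyle 0}$ (axiom 2).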

Theorem 7.5 (Multiplicativity of the determinant):

If ${\displaystyle \det }$ is a determinant, then

${\displaystyle \det(AB)=\det A\det B}$.

Proof:

From lemma 7.3 and theorem 7.4 we may infer

${\displaystyle \det(AB)=\det(A)\sum _{\sigma \in S_{n}}\operatorname {sgn} (\sigma )b_{1,\sigma (1)}\cdots b_{n,\sigma (n)}=\det(A)\det(B)}$.${\displaystyle \Box }$
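As a quick numerical sanity check of multiplicativity (a sketch over the integers; matrices are nested Python lists, and `det` and `matmul` are our own helper names):

```python
from itertools import permutations

def det(A):
    # Leibniz formula of theorem 7.4; uses only ring operations
    n = len(A)
    total = 0
    for sigma in permutations(range(n)):
        inv = sum(sigma[i] > sigma[j]
                  for i in range(n) for j in range(i + 1, n))
        term = -1 if inv % 2 else 1
        for i in range(n):
            term *= A[i][sigma[i]]
        total += term
    return total

def matmul(A, B):
    # (AB)_{i,j} = sum_k A_{i,k} * B_{k,j}
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[1, 2, 0], [3, 1, 4], [0, 2, 1]]
B = [[2, 0, 1], [1, 1, 0], [3, 2, 2]]
assert det(matmul(A, B)) == det(A) * det(B)
```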

Theorem 7.6 (Existence of the determinant):

Let ${\displaystyle R}$ be a commutative ring. Then

${\displaystyle \det(A):=\sum _{\sigma \in S_{n}}\operatorname {sgn} (\sigma )a_{1,\sigma (1)}\cdots a_{n,\sigma (n)}}$

is a determinant.

Proof:

First of all, ${\displaystyle I_{n}}$ has zero entries everywhere except on the diagonal. Hence, if ${\displaystyle I_{n}=(a_{i,j})_{1\leq i,j\leq n}}$, then ${\displaystyle a_{1,\sigma (1)}\cdots a_{n,\sigma (n)}}$ vanishes unless ${\displaystyle \sigma (1)=1,\ldots ,\sigma (n)=n}$, i.e. unless ${\displaystyle \sigma }$ is the identity. Hence ${\displaystyle \det(I_{n})=1}$.

Let now ${\displaystyle A}$ be a matrix whose ${\displaystyle j}$-th and ${\displaystyle j+1}$-th columns are equal. The function

${\displaystyle f:S_{n}\to S_{n},\quad f(\sigma ):=\tau \circ \sigma }$, where ${\displaystyle \tau \in S_{n}}$ is the transposition exchanging ${\displaystyle j}$ and ${\displaystyle j+1}$,

is bijective, since the inverse is given by ${\displaystyle f}$ itself (as ${\displaystyle \tau \circ \tau =\operatorname {id} }$). Furthermore, since ${\displaystyle f}$ amounts to composing ${\displaystyle \sigma }$ with an additional transposition, it is sign reversing. Hence, we have

{\displaystyle {\begin{aligned}\det(A)&=\sum _{\sigma \in S_{n}}\operatorname {sgn} (\sigma )a_{1,\sigma (1)}\cdots a_{n,\sigma (n)}\\&=\sum _{\operatorname {sgn} \sigma =1}a_{1,\sigma (1)}\cdots a_{n,\sigma (n)}-\sum _{\operatorname {sgn} \sigma =-1}a_{1,\sigma (1)}\cdots a_{n,\sigma (n)}\\&=\sum _{\operatorname {sgn} \sigma =1}a_{1,\sigma (1)}\cdots a_{n,\sigma (n)}-\sum _{\operatorname {sgn} \sigma =1}a_{1,f(\sigma )(1)}\cdots a_{n,f(\sigma )(n)}\end{aligned}}}.

Now since the ${\displaystyle j}$-th and ${\displaystyle (j+1)}$-th columns of ${\displaystyle A}$ are identical, we have ${\displaystyle a_{l,f(\sigma )(l)}=a_{l,\tau (\sigma (l))}=a_{l,\sigma (l)}}$ for all ${\displaystyle l}$, since ${\displaystyle \tau }$ only exchanges the two equal column indices ${\displaystyle j}$ and ${\displaystyle j+1}$. Hence ${\displaystyle \det A=0}$.

Linearity follows from the linearity of each summand:

${\displaystyle \sum _{\sigma \in S_{n}}\operatorname {sgn} (\sigma )a_{1,\sigma (1)}\cdots (a_{\sigma ^{-1}(j),j}+cb_{\sigma ^{-1}(j),j})\cdots a_{n,\sigma (n)}=\sum _{\sigma \in S_{n}}\operatorname {sgn} (\sigma )a_{1,\sigma (1)}\cdots a_{\sigma ^{-1}(j),j}\cdots a_{n,\sigma (n)}+c\sum _{\sigma \in S_{n}}\operatorname {sgn} (\sigma )a_{1,\sigma (1)}\cdots b_{\sigma ^{-1}(j),j}\cdots a_{n,\sigma (n)}}$.${\displaystyle \Box }$

Theorem 7.7:

The determinant of any matrix equals the determinant of the transpose of that matrix.

Proof:

Observe that inversion is a bijection on ${\displaystyle S_{n}}$ the inverse of which is given by inversion (${\displaystyle (\sigma ^{-1})^{-1}=\sigma }$). Further observe that ${\displaystyle \operatorname {sgn} (\sigma )=\operatorname {sgn} (\sigma ^{-1})}$, since we just apply all the transpositions in reverse order. Hence,

${\displaystyle \det A=\sum _{\sigma \in S_{n}}\operatorname {sgn} (\sigma )a_{1,\sigma (1)}\cdots a_{n,\sigma (n)}=\sum _{\sigma ^{-1}\in S_{n}}\operatorname {sgn} (\sigma ^{-1})a_{1,\sigma ^{-1}(1)}\cdots a_{n,\sigma ^{-1}(n)}=\sum _{\sigma \in S_{n}}\operatorname {sgn} (\sigma )a_{\sigma (1),1}\cdots a_{\sigma (n),n}=\det A^{t}}$.${\displaystyle \Box }$
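A quick empirical check of theorem 7.7 over the integers (a sketch; `det` and `transpose` are our own helper names):

```python
from itertools import permutations

def det(A):
    # Leibniz formula of theorem 7.4
    n = len(A)
    total = 0
    for sigma in permutations(range(n)):
        inv = sum(sigma[i] > sigma[j]
                  for i in range(n) for j in range(i + 1, n))
        term = -1 if inv % 2 else 1
        for i in range(n):
            term *= A[i][sigma[i]]
        total += term
    return total

def transpose(A):
    # rows become columns and vice versa
    return [list(row) for row in zip(*A)]

A = [[1, 4, 2], [0, 3, 5], [7, 1, 6]]
assert det(A) == det(transpose(A))
```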

Theorem 7.8 (column expansion):

Let ${\displaystyle A}$ be an ${\displaystyle n\times n}$ matrix over a commutative ring ${\displaystyle R}$. For ${\displaystyle 1\leq i,j\leq n}$ define ${\displaystyle A_{i,j}}$ to be the ${\displaystyle (n-1)\times (n-1)}$ matrix obtained by crossing out the ${\displaystyle i}$-th row and ${\displaystyle j}$-th column from ${\displaystyle A}$. Then for any ${\displaystyle k\in \{1,\ldots ,n\}}$ we have

${\displaystyle \det A=\sum _{\nu =1}^{n}(-1)^{\nu +k}a_{\nu ,k}\det A_{\nu ,k}}$.

Proof 1:

We prove the theorem from the formula for the determinant given by theorems 7.4 and 7.6.

Let ${\displaystyle k\in \{1,\ldots ,n\}}$ be fixed. For each ${\displaystyle \nu \in \{1,\ldots ,n\}}$, we define

${\displaystyle f:S_{n-1}\to S_{n},f(\sigma ):=m\mapsto {\begin{cases}k&m=\nu \\\sigma (m)&m<\nu \wedge \sigma (m)<k\\\sigma (m)+1&m<\nu \wedge \sigma (m)\geq k\\\sigma (m-1)&m>\nu \wedge \sigma (m-1)<k\\\sigma (m-1)+1&m>\nu \wedge \sigma (m-1)\geq k\end{cases}}}$.

For each ${\displaystyle \nu }$, this ${\displaystyle f}$ is a bijection from ${\displaystyle S_{n-1}}$ onto the set of permutations in ${\displaystyle S_{n}}$ mapping ${\displaystyle \nu }$ to ${\displaystyle k}$, and it satisfies ${\displaystyle \operatorname {sgn} (f(\sigma ))=(-1)^{\nu +k}\operatorname {sgn} (\sigma )}$. Hence

{\displaystyle {\begin{aligned}\sum _{\nu =1}^{n}(-1)^{\nu +k}a_{\nu ,k}\det A_{\nu ,k}&=\sum _{\nu =1}^{n}(-1)^{\nu +k}a_{\nu ,k}\sum _{\sigma \in S_{n-1}}\operatorname {sgn} (\sigma )a_{1,f(\sigma )(1)}\cdots a_{\nu -1,f(\sigma )(\nu -1)}a_{\nu +1,f(\sigma )(\nu +1)}\cdots a_{n,f(\sigma )(n)}\\&=\sum _{\sigma \in S_{n}}\operatorname {sgn} (\sigma )a_{1,\sigma (1)}\cdots a_{n,\sigma (n)}=\det A.\end{aligned}}}${\displaystyle \Box }$

Proof 2:

We note that all of the above derivations could have been done with rows instead of columns (which amounts to nothing more than exchanging ${\displaystyle a_{i,j}}$ with ${\displaystyle a_{j,i}}$ each time), and would have ended up with the same formula for the determinant since

${\displaystyle \sum _{\sigma \in S_{n}}\operatorname {sgn} (\sigma )a_{1,\sigma (1)}\cdots a_{n,\sigma (n)}=\sum _{\sigma ^{-1}\in S_{n}}\operatorname {sgn} (\sigma ^{-1})a_{1,\sigma ^{-1}(1)}\cdots a_{n,\sigma ^{-1}(n)}=\sum _{\sigma \in S_{n}}\operatorname {sgn} (\sigma )a_{\sigma (1),1}\cdots a_{\sigma (n),n}}$

as argued in theorem 7.7.

Hence, we prove that the function ${\displaystyle R^{n\times n}\to R}$ given by the formula ${\displaystyle \sum _{\nu =1}^{n}(-1)^{\nu +k}a_{\nu ,k}\det A_{\nu ,k}}$ satisfies 1 - 3 of 7.1 with rows instead of columns, and then apply theorem 7.4 with rows instead of columns.

1.

Set ${\displaystyle A=I_{n}}$ to obtain

${\displaystyle \sum _{\nu =1}^{n}a_{\nu ,k}(-1)^{\nu +k}\det A_{\nu ,k}=(-1)^{2k}a_{k,k}\det A_{k,k}=1\cdot 1=1}$.

2.

Let ${\displaystyle A}$ have two equal adjacent rows, the ${\displaystyle j}$-th and ${\displaystyle j+1}$-th, say. Then

${\displaystyle \sum _{\nu =1}^{n}a_{\nu ,k}(-1)^{\nu +k}\det A_{\nu ,k}=(-1)^{j+k}a_{j,k}\det A_{j,k}+(-1)^{j+1+k}a_{j+1,k}\det A_{j+1,k}=0}$,

since each of the ${\displaystyle A_{\nu ,k}}$ has two equal adjacent rows except for ${\displaystyle \nu =j}$ and ${\displaystyle \nu =j+1}$, which is why, by theorem 7.6 (in its row version), the determinant is zero in all those cases; further, ${\displaystyle a_{j,k}=a_{j+1,k}}$ and ${\displaystyle A_{j,k}=A_{j+1,k}}$, since in both cases we deleted "the same" row, so the two remaining terms cancel.

3.

Define ${\displaystyle B:=(b_{i,j})_{1\leq i,j\leq n}:=(\mathbf {a} _{1},\ldots ,\mathbf {a} _{j-1},\mathbf {a} _{j}+c\mathbf {b} _{j},\mathbf {a} _{j+1},\ldots ,\mathbf {a} _{n})^{t}}$, and for each ${\displaystyle \nu ,k\in \{1,\ldots ,n\}}$ define ${\displaystyle C_{\nu ,k}}$ as the matrix obtained by crossing out the ${\displaystyle \nu }$-th row and the ${\displaystyle k}$-th column from the matrix ${\displaystyle C:=(c_{i,j})_{1\leq i,j\leq n}:=(\mathbf {a} _{1},\ldots ,\mathbf {a} _{j-1},\mathbf {b} _{j},\mathbf {a} _{j+1},\ldots ,\mathbf {a} _{n})^{t}}$. Then by theorem 7.6 and axiom 3 for the determinant,

{\displaystyle {\begin{aligned}\sum _{\nu =1}^{n}b_{\nu ,k}(-1)^{\nu +k}\det B_{\nu ,k}&=\sum _{\nu =1}^{j-1}a_{\nu ,k}(-1)^{\nu +k}(\det A_{\nu ,k}+c\det C_{\nu ,k})+(-1)^{j+k}(a_{j,k}+cb_{j,k})\det A_{j,k}+\sum _{\nu =j+1}^{n}a_{\nu ,k}(-1)^{\nu +k}(\det A_{\nu ,k}+c\det C_{\nu ,k})\\&=\sum _{\nu =1}^{n}a_{\nu ,k}(-1)^{\nu +k}\det A_{\nu ,k}+c\sum _{\nu =1}^{n}c_{\nu ,k}(-1)^{\nu +k}\det C_{\nu ,k}\\&=\det A+c\det C\end{aligned}}}.

This establishes linearity in the rows.${\displaystyle \Box }$
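The column expansion of theorem 7.8 translates directly into a recursive determinant routine. Below is a sketch (our own naming; 0-based indices, so the sign becomes ${\displaystyle (-1)^{i+k}}$) that checks the expansion along every column against the Leibniz formula.

```python
from itertools import permutations

def det_leibniz(A):
    # Leibniz formula of theorem 7.4
    n = len(A)
    total = 0
    for sigma in permutations(range(n)):
        inv = sum(sigma[i] > sigma[j]
                  for i in range(n) for j in range(i + 1, n))
        term = -1 if inv % 2 else 1
        for i in range(n):
            term *= A[i][sigma[i]]
        total += term
    return total

def minor(A, i, j):
    # cross out row i and column j (0-based)
    return [row[:j] + row[j + 1:] for r, row in enumerate(A) if r != i]

def det_expand(A, k=0):
    # expansion along column k: det A = sum_i (-1)^(i+k) * a[i][k] * det A_{i,k}
    n = len(A)
    if n == 1:
        return A[0][0]
    return sum((-1) ** (i + k) * A[i][k] * det_expand(minor(A, i, k))
               for i in range(n))

A = [[1, 2, 0, 3], [4, 1, 1, 0], [2, 0, 5, 1], [0, 3, 2, 1]]
assert all(det_expand(A, k) == det_leibniz(A) for k in range(4))
```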

For the sake of completeness, we also note the following lemma:

Lemma 7.9:

Let ${\displaystyle A}$ be an invertible matrix. Then ${\displaystyle \det(A)}$ is invertible.

Proof:

Indeed, ${\displaystyle \det(A)^{-1}=\det(A^{-1})}$ due to the multiplicativity of the determinant.${\displaystyle \Box }$

The converse is also true and will be proven in the next subsection.

### Exercises

• Exercise 7.1.1: Argue that the determinant, seen as a map from the set of all square matrices to itself (where scalars are regarded as ${\displaystyle 1\times 1}$ matrices), is idempotent.

## Cramer's rule in the general case

Theorem 7.10 (Cramer's rule, solution of linear equations):

Let ${\displaystyle R}$ be a commutative ring, let ${\displaystyle A=(a_{i,j})_{1\leq i,j\leq n}}$ be a matrix with entries in ${\displaystyle R}$ and let ${\displaystyle \mathbf {b} =(b_{1},\ldots ,b_{n})^{t}}$ be a vector. If ${\displaystyle A}$ is invertible, the unique solution ${\displaystyle \mathbf {x} =(x_{1},\ldots ,x_{n})^{t}}$ of ${\displaystyle A\mathbf {x} =\mathbf {b} }$ is given by

${\displaystyle x_{j}={\frac {\det A_{j}}{\det A}}}$,

where ${\displaystyle A_{j}}$ is obtained by replacing the ${\displaystyle j}$-th column of ${\displaystyle A}$ by ${\displaystyle \mathbf {b} }$.

Proof 1:

Let ${\displaystyle j\in \{1,\ldots ,n\}}$ be arbitrary but fixed. The determinant of ${\displaystyle A}$ is linear in the ${\displaystyle j}$-th column, and hence gives rise to a linear map ${\displaystyle L_{j}:R^{n}\to R}$ mapping any vector to the determinant of ${\displaystyle A}$ with the ${\displaystyle j}$-th column replaced by that vector. If ${\displaystyle \mathbf {a} _{j}}$ is the ${\displaystyle j}$-th column of ${\displaystyle A}$, then ${\displaystyle L_{j}(\mathbf {a} _{j})=\det(A)}$. Furthermore, if we insert a different column ${\displaystyle \mathbf {a} _{k}}$, ${\displaystyle k\neq j}$, into ${\displaystyle L_{j}}$, we obtain zero, since we obtain the determinant of a matrix in which the column ${\displaystyle \mathbf {a} _{k}}$ appears twice. We now consider the system of equations

${\displaystyle {\begin{cases}a_{1,1}x_{1}+\cdots +a_{1,n}x_{n}&=b_{1}\\&\vdots \\a_{n,1}x_{1}+\cdots +a_{n,n}x_{n}&=b_{n},\end{cases}}}$

where ${\displaystyle (x_{1},\ldots ,x_{n})^{T}}$ is the unique solution of the system ${\displaystyle Ax=b}$, which exists and is given by ${\displaystyle A^{-1}b}$, since ${\displaystyle A}$ is invertible. Since ${\displaystyle L_{j}}$ is linear, we find a ${\displaystyle 1\times n}$ matrix ${\displaystyle (c_{1},\ldots ,c_{n})}$ such that for all ${\displaystyle \mathbf {v} \in R^{n}}$

${\displaystyle (c_{1},\ldots ,c_{n})\cdot \mathbf {v} =L_{j}(\mathbf {v} )}$;

in fact, due to theorem 7.8 (expansion along the ${\displaystyle j}$-th column), ${\displaystyle c_{k}=(-1)^{j+k}\det(A_{k,j})}$. We now add up the rows of the linear equation system above in the following way: we take ${\displaystyle c_{1}}$ times the first row, add ${\displaystyle c_{2}}$ times the second row, and so on. Since ${\displaystyle \sum _{k=1}^{n}c_{k}a_{k,m}=L_{j}(\mathbf {a} _{m})}$ equals ${\displaystyle \det(A)}$ for ${\displaystyle m=j}$ and zero otherwise, this yields the result

${\displaystyle \det(A)x_{j}=L_{j}(\mathbf {b} )}$.

Due to lemma 7.9, ${\displaystyle \det(A)}$ is invertible. Hence, we get

${\displaystyle x_{j}=(\det(A))^{-1}L_{j}(\mathbf {b} )=(\det(A))^{-1}\det(A_{j})}$

and hence the theorem.${\displaystyle \Box }$

Proof 2:

For all ${\displaystyle j\in \{1,\ldots ,n\}}$, we define the matrix

${\displaystyle X_{j}:={\begin{pmatrix}1&0&\cdots &0&x_{1}&0&\cdots &&0\\0&1&\cdots &0&x_{2}&0&\cdots &&0\\\vdots &&\ddots &&\vdots &&&&\vdots \\\vdots &&&&\vdots &\ddots &&&\vdots \\0&&\cdots &0&x_{n-1}&0&\cdots &1&0\\0&&\cdots &0&x_{n}&0&\cdots &0&1\end{pmatrix}};}$

this matrix is the identity matrix with the ${\displaystyle j}$-th column replaced by the vector ${\displaystyle (x_{1},\ldots ,x_{n})^{T}}$. By expanding along the ${\displaystyle j}$-th column, we find that the determinant of this matrix is given by ${\displaystyle \det(X_{j})=x_{j}}$.

We now note that if ${\displaystyle A=(\mathbf {a} _{1},\ldots ,\mathbf {a} _{n})}$, then ${\displaystyle X_{j}=A^{-1}(\mathbf {a} _{1},\ldots ,\mathbf {a} _{j-1},\mathbf {b} ,\mathbf {a} _{j+1},\ldots ,\mathbf {a} _{n})=A^{-1}A_{j}}$, since ${\displaystyle A^{-1}\mathbf {a} _{k}=\mathbf {e} _{k}}$ for ${\displaystyle k\neq j}$ and ${\displaystyle A^{-1}\mathbf {b} =(x_{1},\ldots ,x_{n})^{T}}$. Hence

${\displaystyle x_{j}=\det(A^{-1}A_{j})=\det(A^{-1})\det(A_{j})=\det(A)^{-1}\det(A_{j})}$,

where the last equality follows as in lemma 7.9.${\displaystyle \Box }$
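Cramer's rule translates into a tiny solver, sketched here over the rationals (where every nonzero determinant is a unit); `cramer_solve` is a hypothetical name of our own, and `Fraction` is from the standard library.

```python
from fractions import Fraction
from itertools import permutations

def det(A):
    # Leibniz formula of theorem 7.4
    n = len(A)
    total = 0
    for sigma in permutations(range(n)):
        inv = sum(sigma[i] > sigma[j]
                  for i in range(n) for j in range(i + 1, n))
        term = -1 if inv % 2 else 1
        for i in range(n):
            term *= A[i][sigma[i]]
        total += term
    return total

def cramer_solve(A, b):
    """x_j = det(A_j) / det(A), where A_j is A with column j replaced by b."""
    n = len(A)
    d = det(A)
    xs = []
    for j in range(n):
        Aj = [[b[i] if c == j else A[i][c] for c in range(n)]
              for i in range(n)]
        xs.append(Fraction(det(Aj), d))
    return xs

A = [[2, 1], [1, 3]]
b = [5, 10]
x = cramer_solve(A, b)   # [1, 3]
# verify A x = b
assert all(sum(A[i][j] * x[j] for j in range(2)) == b[i] for i in range(2))
```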

Theorem 7.11 (Cramer's rule, matrix inversion):

Let ${\displaystyle A}$ be an ${\displaystyle n\times n}$ matrix with entries in a commutative ring ${\displaystyle R}$. We recall that the cofactor matrix ${\displaystyle \operatorname {Cof} A}$ of ${\displaystyle A}$ is the matrix with ${\displaystyle (i,j)}$-th entry

${\displaystyle (-1)^{i+j}\det(A_{i,j})}$,

where ${\displaystyle A_{i,j}}$ is obtained from ${\displaystyle A}$ by crossing out the ${\displaystyle i}$-th row and ${\displaystyle j}$-th column. We further recall that the adjugate matrix ${\displaystyle \operatorname {adj} (A)}$ is given by

${\displaystyle \operatorname {adj} (A):=\operatorname {Cof} (A)^{\mathsf {T}}}$.

With this definition, we have

${\displaystyle \operatorname {adj} (A)A=\det(A)I_{n}}$.

In particular, if ${\displaystyle \det(A)}$ is a unit within ${\displaystyle R}$, then ${\displaystyle A}$ is invertible and

${\displaystyle A^{-1}={\frac {1}{\det(A)}}\operatorname {adj} (A)}$.

Proof:

For ${\displaystyle j\in \{1,\ldots ,n\}}$, we set ${\displaystyle \mathbf {b} _{j}:=e_{j}=(0,\ldots ,0,1,0,\ldots ,0)^{T}}$, where the one is at the ${\displaystyle j}$-th place. Further, we let ${\displaystyle L_{j}}$ be the linear function from proof 1 of theorem 7.10, and ${\displaystyle M_{j}}$ its ${\displaystyle 1\times n}$ matrix. Then ${\displaystyle \operatorname {adj} (A)}$ is given by

${\displaystyle \operatorname {adj} (A)={\begin{pmatrix}-M_{1}-\\\vdots \\-M_{n}-\end{pmatrix}}}$

due to theorem 7.8. Hence,

${\displaystyle \operatorname {adj} (A)A={\begin{pmatrix}-M_{1}A-\\\vdots \\-M_{n}A-\end{pmatrix}}={\begin{pmatrix}\det(A)&0&\cdots &0\\0&\det(A)&\ddots &\vdots \\\vdots &\ddots &\ddots &0\\0&\cdots &0&\det(A)\end{pmatrix}}=\det(A)I_{n},}$

where we used the properties of ${\displaystyle L_{j}}$ established in proof 1 of theorem 7.10.${\displaystyle \Box }$
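The identity ${\displaystyle \operatorname {adj} (A)A=\det(A)I_{n}}$ holds over any commutative ring, without any invertibility assumption. A sketch over the integers with a matrix whose determinant is not a unit (helper names are our own):

```python
from itertools import permutations

def det(A):
    # Leibniz formula of theorem 7.4
    n = len(A)
    total = 0
    for sigma in permutations(range(n)):
        inv = sum(sigma[i] > sigma[j]
                  for i in range(n) for j in range(i + 1, n))
        term = -1 if inv % 2 else 1
        for i in range(n):
            term *= A[i][sigma[i]]
        total += term
    return total

def minor(A, i, j):
    return [row[:j] + row[j + 1:] for r, row in enumerate(A) if r != i]

def adjugate(A):
    # transpose of the cofactor matrix: adj(A)[i][j] = (-1)^(i+j) * det A_{j,i}
    n = len(A)
    return [[(-1) ** (i + j) * det(minor(A, j, i)) for j in range(n)]
            for i in range(n)]

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[2, 4, 1], [0, 3, 5], [1, 1, 2]]
d = det(A)   # 19, not a unit in the integers
assert matmul(adjugate(A), A) == [[d if i == j else 0 for j in range(3)]
                                  for i in range(3)]
```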

## The theorems

Now we may finally apply the machinery we have set up to prove the following two fundamental theorems.

Theorem 7.12 (the Cayley–Hamilton theorem):

Let ${\displaystyle M}$ be a finitely generated ${\displaystyle R}$-module, let ${\displaystyle \phi :M\to M}$ be a module morphism and let ${\displaystyle I\leq R}$ be an ideal of ${\displaystyle R}$ such that ${\displaystyle \phi (M)\subseteq IM}$. Then there exist ${\displaystyle n\in \mathbb {N} }$ and ${\displaystyle a_{n-1},\ldots ,a_{1},a_{0}\in I}$ such that

${\displaystyle \phi ^{n}+a_{n-1}\phi ^{n-1}+\cdots +a_{1}\phi +a_{0}=0}$;

this equation is to be read as

${\displaystyle \forall m\in M:\phi ^{n}(m)+a_{n-1}\phi ^{n-1}(m)+\cdots +a_{1}\phi (m)+a_{0}m=0}$,

where ${\displaystyle \phi ^{n}(m)}$ means applying ${\displaystyle \phi }$ to ${\displaystyle m}$ ${\displaystyle n}$ times.

Note that the polynomial in ${\displaystyle \phi }$ is monic, that is, the leading coefficient is ${\displaystyle 1}$, the unit of the ring in question.

Proof: Assume that ${\displaystyle \{m_{1},\ldots ,m_{n}\}}$ is a generating set for ${\displaystyle M}$. Since ${\displaystyle \phi (M)\subseteq IM}$, we may write

${\displaystyle \phi (m_{j})=\sum _{k=1}^{n}b_{j,k}m_{k},~j\in \{1,\ldots ,n\}}$ (*),

where ${\displaystyle b_{j,k}\in I}$ for all ${\displaystyle j,k}$. We now define a new commutative ring as follows:

${\displaystyle {\tilde {R}}:=R[\phi ]:=\{r_{0}+r_{1}\phi +\cdots +r_{m}\phi ^{m}\mid m\in \mathbb {N} ,\,r_{0},\ldots ,r_{m}\in R\}}$,

where we regard each element ${\displaystyle r}$ of ${\displaystyle R}$ as the endomorphism ${\displaystyle m\mapsto rm}$ of ${\displaystyle M}$. That is, ${\displaystyle {\tilde {R}}}$ is the subring of the endomorphism ring of ${\displaystyle M}$ generated by ${\displaystyle \phi }$ and these scalar multiplications (multiplication being composition). Since ${\displaystyle \phi }$ is ${\displaystyle R}$-linear and commutes with its own powers, ${\displaystyle {\tilde {R}}}$ is commutative.

Now to every ${\displaystyle n\times n}$ matrix ${\displaystyle A}$ with entries in ${\displaystyle {\tilde {R}}}$ we may associate a function

${\displaystyle A(\cdot ):M^{n}\to M^{n},\quad A\left((x_{1},\ldots ,x_{n})^{T}\right):=\left(\sum _{k=1}^{n}a_{1,k}(x_{k}),\ldots ,\sum _{k=1}^{n}a_{n,k}(x_{k})\right)^{T}}$.

By exploiting the linearities of all functions involved, it is easy to see that for another ${\displaystyle n\times n}$ matrix with entries in ${\displaystyle {\tilde {R}}}$ called ${\displaystyle B}$, the associated function of ${\displaystyle AB}$ equals the composition of the associated functions of ${\displaystyle A}$ and ${\displaystyle B}$; that is, ${\displaystyle (AB)(x)=A(B(x))}$.

Now with this in mind, we may rewrite the system (*) as follows:

${\displaystyle A(\mathbf {x} )=\mathbf {0} ,\qquad \mathbf {x} :=(m_{1},\ldots ,m_{n})^{T}}$,

where ${\displaystyle A}$ has ${\displaystyle (j,k)}$-th entry ${\displaystyle \delta _{j,k}\phi -b_{j,k}\in {\tilde {R}}}$ (with ${\displaystyle \delta _{j,k}}$ the Kronecker delta). Now define ${\displaystyle B:=\operatorname {adj} (A)}$. From Cramer's rule (theorem 7.11) we obtain that

${\displaystyle BA=I_{n}\det(A)}$,

which is why

${\displaystyle (\det A(m_{1}),\ldots ,\det A(m_{n}))^{t}=(BA)(\mathbf {x} )=B(A(\mathbf {x} ))=B(\mathbf {0} )=\mathbf {0} }$, the zero vector.

Hence, ${\displaystyle \det A\in {\tilde {R}}}$ is the zero mapping, since it sends all generators to zero. Now further, as can be seen e.g. from the representation given in theorem 7.4, it has the form

${\displaystyle \phi ^{n}+a_{n-1}\phi ^{n-1}+\cdots +a_{1}\phi +a_{0}}$

for suitable ${\displaystyle a_{n-1},\ldots ,a_{0}\in I}$.${\displaystyle \Box }$
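In the classical special case ${\displaystyle M=R^{2}}$, ${\displaystyle \phi }$ given by a ${\displaystyle 2\times 2}$ matrix ${\displaystyle A}$ and ${\displaystyle I=R}$, theorem 7.12 says that ${\displaystyle A}$ satisfies its characteristic polynomial ${\displaystyle t^{2}-\operatorname {tr} (A)t+\det(A)}$. A quick integer check of this instance (illustrative sketch, names our own):

```python
def mat_mul2(X, Y):
    # product of two 2x2 matrices
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[1, 2], [3, 4]]
tr = A[0][0] + A[1][1]                        # trace: 5
d = A[0][0] * A[1][1] - A[0][1] * A[1][0]     # determinant: -2
A2 = mat_mul2(A, A)

# A^2 - tr(A)*A + det(A)*I should be the zero matrix
chi_of_A = [[A2[i][j] - tr * A[i][j] + (d if i == j else 0)
             for j in range(2)] for i in range(2)]
assert chi_of_A == [[0, 0], [0, 0]]
```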

Theorem 7.13 (Nakayama's lemma):

Let ${\displaystyle R}$ be a ring, ${\displaystyle M}$ a finitely generated ${\displaystyle R}$-module and ${\displaystyle I\leq R}$ an ideal such that ${\displaystyle IM=M}$. Then there exists an ${\displaystyle x\equiv 1\mod I}$ such that ${\displaystyle xM=0}$.

Proof:

Choose ${\displaystyle \phi =\operatorname {Id} _{M}}$ in theorem 7.12 (which is admissible since ${\displaystyle \phi (M)=M=IM}$) to obtain for all ${\displaystyle m\in M}$ that

${\displaystyle \phi ^{n}(m)+a_{n-1}\phi ^{n-1}(m)+\cdots +a_{1}\phi (m)+a_{0}m=(1+a_{n-1}+\cdots +a_{1}+a_{0})m=0}$

for suitable ${\displaystyle a_{n-1},\ldots ,a_{0}\in I}$, since every power of the identity is the identity. Setting ${\displaystyle x:=1+a_{n-1}+\cdots +a_{0}}$, we have ${\displaystyle x\equiv 1\mod I}$ and ${\displaystyle xM=0}$.${\displaystyle \Box }$
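A concrete instance (our own worked example, not from the text): take ${\displaystyle R=\mathbb {Z} }$, ${\displaystyle M=\mathbb {Z} /5\mathbb {Z} }$ and ${\displaystyle I=2\mathbb {Z} }$. Then ${\displaystyle IM=M}$ because ${\displaystyle 2}$ is invertible modulo ${\displaystyle 5}$, and ${\displaystyle x=5\equiv 1{\bmod {2}}}$ indeed satisfies ${\displaystyle xM=0}$:

```python
M = set(range(5))                  # Z/5Z, represented by residues 0..4
IM = {(2 * m) % 5 for m in M}      # IM for I = 2Z: all multiples of 2 mod 5
assert IM == M                     # IM = M, so Nakayama's lemma applies

x = 5                              # x = 5 is congruent to 1 mod I = 2Z
assert x % 2 == 1
assert {(x * m) % 5 for m in M} == {0}   # xM = 0
```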