Mathematical Proof and the Principles of Mathematics/Logic/Axioms and equality
We've now covered most of what is known as first order logic, or at least we've covered what we'll need for the rest of this book. There are second and higher order logics which we'll only describe briefly. Second order logic concerns predicates that takes first order predicates as arguments rather than the objects of the universe of discourse. For example a relation is called reflexive if
- For all ,
- is a reflexive relation
is then a second order predicate. Some care must be used in this direction to avoid the possibility of predicates being applied to themselves, otherwise self-referential statements such as the Liar's paradox discussed earlier may appear, contradictions may ensue, and the whole system may collapse. We'll see more on this later.
While we will, from time to time, make definitions in the form of second order predicates, but these should be thought of here as linguistic shortcuts which can be replaced by their definitions.
We've gone about as far as possible without the introduction of axioms, but we'll need them to go forward so it's time to talk about them in detail.
So far, using rules of inference alone, we've only proved tautologies, in other words statements which don't really say anything meaningful. They may work as exercises, but in terms of saying anything interesting about the universe they aren't useful. It was realized even before Euclid that, in order to avoid circular argument, some initial assumptions must be made, logical bootstraps as you might call them.
The process goes something like this:
- Make a list of assumptions about the universe of discourse. These assumptions are called axioms.
- Use logic to prove new and hopefully interesting statements. These statements are called theorems.
This list of axioms is sometimes called an axiom system. Where axioms come from is an interesting question. They might come philosophy, experience, or by abstracting properties of other systems.
Useful properties of axiom systems
To be useful, axioms preferably have the following properties:
- Consistency (there are some statements which are not theorems): This is a must-have feature, otherwise the system does not distinguish between theorems and non-theorems and is therefore useless. An equivalent way of saying this is that the statement is not a theorem, in other words it's not possible to derive a contradiction from the axioms.
- Completeness (every statement has either a proof or disproof): This is a should-have feature in that it's desirable to be able to decide whether a given statement is true or not. In other words there should be no unresolvable conjectures where neither a proof or counterexample is possible. Note that this is not the same as saying that the statement or not is a theorem since we may know or not is True without knowing which alternative it is. (This is somewhat controversial though; see the section on non-standard logic.) Note also that there many conjectures, some famous, for which we don't currently have a proof of or a proof of not . This only means that a proof has not been found though, not that a proof is impossible.
- Independence (none of the axioms can be proved from the others): This is partly for aesthetic and partly for practical reasons. Aesthetically, the system is cleaner and more elegant if there no extra, unneeded axioms. As for practicality, suppose you have some system which you're trying to model with the axioms. The first thing you would need to do is verify that the axioms do, in fact, hold in your system. Unnecessary axioms increase the work load for this with no added benefit.
- Aside: to show that an axiom is independent of the others, one finds a mathematical system in which all the axioms hold except for the given one. This shows that the given axiom cannot be proved from the others because if it could, it would hold in the given system.
- Uncontroversial (there is general agreement as to which axioms should be included): If this does not hold then the conclusions of the theory may be disputed.
- Power (they should be simple to describe, yet imply a plethora of rich mathematical results): This ultimately determines how useful the theory is. Sometimes the same set of axioms can be used to model different systems, and what you learn from the theory is applicable to both.
Given an axiomatic system, it would be comforting to be able to prove in some way that it's consistent and complete. Unfortunately, this was shown to impossible in the 1930's by Kurt Gödel, at least for any system usable for mathematics.
Using an axiom
We can now introduce another rule of inference:
- If is an axiom then deduce
In other words you can use any axiom in a proof with no justification other than it's an axiom.
Axioms of equality
Our first axioms have to do with equality. While this technically still falls under the subject of logic, we're now close enough to see mathematics on the horizon.
Note that we will never actually define the phrase equals since it's impossible to do so without being circular. We can list some axioms which tell us how equal objects behave, and in some sense this serves as a definition. We can also give an informal description of what equals means, which is meant to be a guide to intuition and to help make the axioms seem plausible, but this carries no logical weight and can't be used in a proof. We'll assume you already have an intuitive idea of what equals means so we'll skip that part in this case.
- Axiom E0: There is a relation '=', defined on the universe of discourse, written and read equals .
Relations are often written using infix notation instead of prefix notation . This axiom basically just reserves the '=' symbol as having a special meaning.
- Axiom E1 (Identity): For all , .
This seems self-evident given the intuitive concept of equality. Ayn Rand fans may recognize this as the line "A is A" from Atlas Shrugged. This is also called the Reflexive property.
- Axiom E2 (Substitution): If is a predicate, then for all and , and implies .
This isn't actually a single axiom but what is called an axiom schema. (The word 'schema' here is a very technical term which basically means recipe or blueprint; just replace 'schema' with 'recipe' every time you see it an you won't go far wrong.) You get a new axiom for every predicate, so it's a compact way of getting any number of axioms. We're starting to relax the logical formalism here in phrasing here as well. It's clear what the statement means so there's little need to adhere strictly to the phrasing and parentheses used in previous pages.
Properties of equality
We can now prove two of the most important properties of equality from the axioms.
- Theorem E3 (Symmetry): For all and , implies .
- Proof: Choose and and assume . Let be the predicate . Then by Axiom E1. Also, so implies . So which is the same as saying . We have implies for arbitrary and , therefore for all and , implies .
Note that we're going to be phasing out the tabular style of proof at least in the text, leaving it as an exercise for the reader to write out all the details.
- Theorem E4 (Transitivity): For all , and , and imply .
- Proof: This is left as an exercise with the following hint. Choose , and , and assume and be the predicate . (At this point we leave it to the reader to fill in the rest of the proof.)
In addition to axioms, mathematicians also use definitions as first principles. We can use logical connectives and previously defined statements and predicates to define new statements and predicates as we have been doing all along. But we also want to be able to define an object from the properties it has. But what do you need in order for such a definition to make sense? Let's consider a predicate and try to figure out what is needed for the statement
- Define to be the object for which is True.
to be a meaningful definition.
First, we need such an to exist. So we require the statement
- (Existence) For some ,
to be true. It makes sense to require this, otherwise we would have both and For all , not which would lead to a contradiction.
We need another requirement though. Suppose one person picks an to be 'the' object for which is True, and another person picks to be 'the' object for which is True. Now suppose there is some property of which is not shared by , in other words there is a predicate for which but not . This leads to a contradiction again since both and are supposed to be the same object, but is True for one and false for the other. So we need to guarantee that if and then there is no predicate for which but not . But axiom E2 gives us just that condition if . So we add another condition that the statement
- (Uniqueness) For all and , and imply
These two conditions, existence and uniqueness, seem to be enough. We can then say the expression
- The for which
is well-defined if has the existence and uniqueness properties. We can declare , or whatever other symbol we wish to use, as the name of object in our universe of discourse and take
to be an axiom. If does not have both the existence and uniqueness properties then we are certainly free to write down the expression, but it's doesn't have any meaning, in other words it's undefined. We'll introduce more compact notation in a bit, but first we'll generalize this idea.
Instead of a predicate, lets start with a relation . This time we want to know if
- Define to be the object for which is True.
to be a meaningful. This is starting to become the idea of a function, but we're reserving the word 'function' for a more complicated structure involving sets in the next chapter. We've used the word 'function' already, but as a guide to intuition rather than as a definition.
Following the same kind of reasoning as in the previous section, we say
- The for which
is well-defined if
- For some ,
- For all , and , and imply
We say a relation is a functional relation if it satisfies the second condition.
Similarly, for a relation in three variables, we say
- The for which
is well-defined if
- For some ,
- For all , , and , and imply
We can continue this for as many variables as are needed. The relation is a functional relation if it satisfies the second condition as before.
In the case where the expression "The for which " is well-defined, then write for this as a standard notation. We're overloading the parentheses notation here, but this shouldn't be a problem because you can tell what is meant by counting the parameters.
Going back to predicates, we take to mean the for which .
To summarize, if is a relation in at least one variable, we take to be the value of for which provided
- For some ,
- For all , , ... and , and imply .
If the two conditions are not met then the expression is undefined and does not represent anything meaningful. If is defined, then
is an axiom.
As an exercise, apply this to the equality relation. Namely, if is defined to be , then prove that for all , is defined, and .
Alternate approaches (optional)
The approach we've used here isn't the only one available. Another way is to simply declare that a function takes 0 or more objects as parameters and returns another object rather than a truth value as for relations. A function of 0 variables is then the same as a constant. In this scenario, you need an additional axiom for '=':
- If is any function, then for all and , and is defined imply is defined and .
Note that we don't want to assume has a value for all since that would unduly restrict the usefulness of the concept.
Another approach is the one taken by Bourbaki, following Hilbert's epsilon calculus though with different notation. In that scenario, the phrase
- Some for which
always defines an . In order to avoid the issue where different values might be picked by different people, it's assumed that the value is determined by the predicate in some way, so everyone who picks an will get the same answer. This requires the additional axiom:
- (For all , iff ) implies (Some for which )=(Some for which )
The requirement that everyone will get the same when choosing one for which seems like a big ask. In the case where has the existence and uniqueness properties, the phrase "Some for which " and "The for which " amount to the same thing.
We've chosen the approach we're using here because in future chapters we will come across situations where it's very easy to fooled by a false definition. In other words you may think you've defined a function, say, but really haven't defined anything because the logical requirements aren't met. The approach we're using spells out those requirements explicitly so once they've been checked we can move on with confidence that there are no hidden errors.