Using SPSS and PASW/Understanding the missing values dialog

Coding for Missing Values

There is a considerable literature on the treatment of missing data (see, for example, Allison (2001) for references[1]), and it is not my intention to deal with the methodological issues here. Rather, I want to explain the Missing Values dialog in a little more detail. Here is the dialog:

[Image: the Missing Values dialog]

Notice that you can give up to three discrete integer or string values, or a range, to stand for missing data. If you want to analyse the missing values in your data (for example, you may have data missing at random, missing completely at random, or missing not at random), you may wish to use more than one indicator; if you are only interested in eliminating these cases (or data points), a single indicator is enough. If your variable is numeric, the missing data code must also be numeric. Wherever possible, choose a missing value code that is logically impossible as a value for this variable. So, for example, you might give -1 for age. You may be tempted to give 999 as the missing code for age but, however unlikely it is that you will observe this value, it is better to choose the impossible over the improbable. For strings you can use any string, such as NA for not applicable or plain old missing.
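The effect of declaring a missing code can be illustrated outside SPSS with a short Python sketch. This is not SPSS syntax and not how SPSS is implemented; the age data, the function name valid_values and the constant MAX_DISCRETE_CODES are all hypothetical, chosen only to mirror the description above.

```python
from statistics import mean

# Hypothetical age column; -1 is the code chosen to stand for "missing".
ages = [34, 27, -1, 51, -1, 45]

# Mirrors the dialog's limit of three discrete missing values (assumed name).
MAX_DISCRETE_CODES = 3


def valid_values(values, missing_codes):
    """Return only the values that are not declared as missing codes."""
    if len(missing_codes) > MAX_DISCRETE_CODES:
        raise ValueError("at most three discrete missing codes are allowed")
    return [v for v in values if v not in missing_codes]


# With -1 declared as missing, the coded cases drop out of the analysis.
valid_ages = valid_values(ages, {-1})
mean_age = mean(valid_ages)

# If the code is never declared, it is treated as a real age and
# pulls the mean down.
naive_mean = mean(ages)
```

Because -1 can never be a real age, declaring it as missing cannot exclude a genuine observation by accident. That is the practical reason for preferring the impossible over the improbable.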

When you have completed the Missing Values dialog in Variable View, you must still enter the missing value codes into the data themselves. You can do this by hand, and if you only have a few instances this is a reasonable approach. First sort the data on the variable concerned so that the empty cells appear first; this makes your task easier.

You will have noticed that for numeric variables SPSS indicates missing data with a point (.) in the cell, while for string variables the cell is blank by default. By default, SPSS assigns all empty numeric cells the value system-missing, but it does not do the same for empty string cells. (And on reflection we see why: we recognise the difference between the value 'zero' and missing numeric data, but it is not possible to distinguish the null string from a blank.)

If you have a large number of missing numeric data points, you can recode the system-missing data using the Transform → Recode into Same Variables dialog. On the left-hand side of the dialog, choose system-missing as the old value, and then enter your missing code as the new value on the right-hand side.
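The logic of this recode can be sketched in a few lines of Python. This is only an illustration of what the dialog does to the column, under the assumption that None stands in for the system-missing value; the data and the names recode_system_missing and MISSING_CODE are hypothetical.

```python
# None stands in for the system-missing value (the "." shown in the cell).
ages = [34, None, 27, None, 51]

# Hypothetical missing code, as entered in the Missing Values dialog.
MISSING_CODE = -1


def recode_system_missing(values, code):
    """Replace every system-missing entry with the chosen missing code."""
    return [code if v is None else v for v in values]


recoded_ages = recode_system_missing(ages, MISSING_CODE)
```

Valid values pass through unchanged; only the empty cells receive the code.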

For string data you can recode in the same way, but instead of choosing system-missing as the old value, you type a space in the old value text box.
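The string case can be sketched in Python in the same spirit. This is a minimal illustration, not SPSS behaviour reproduced exactly: it assumes that an empty cell and a cell holding only spaces should both count as blank, and the data and the name recode_blank_strings are hypothetical.

```python
def recode_blank_strings(values, code):
    """Replace empty or whitespace-only strings with the chosen missing code."""
    # strip() removes surrounding spaces, so "" and "   " are treated alike.
    return [code if v.strip() == "" else v for v in values]


# Hypothetical string column with two blank cells.
answers = ["yes", "", "no", "   ", "yes"]
recoded_answers = recode_blank_strings(answers, "NA")
```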

Notes

  1. Allison, P. D. (2001). Missing data. Thousand Oaks, CA: Sage Publications.