# Statistical Analysis: an Introduction using R/R/Logical operations

When accessing elements of vectors, we saw how to use a simple logical expression involving the less than sign (`<`) to produce a logical vector, which could then be used to select elements less than a certain value. This type of logical operation is a very useful thing to be able to do. As well as `<`, there are a handful of other comparison operators. Here is the full set (see `?Comparison` for more details):
• `<` (less than) and `<=` (less than or equal to)
• `>` (greater than) and `>=` (greater than or equal to)
• `==` (equal to[1]) and `!=` (not equal to)
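
As a quick illustration of these operators, here is a minimal sketch using a small made-up vector (`x` is hypothetical and not part of the states data used elsewhere on this page). Each comparison returns a logical vector of the same length as `x`.

```r
x <- c(3, 8, 5, 10, 5)   # a small made-up vector, for illustration only

x < 5      # TRUE only for elements strictly less than 5
x <= 5     # also TRUE for elements exactly equal to 5
x > 5      # TRUE only for elements strictly greater than 5
x >= 5     # also TRUE for elements exactly equal to 5
x == 5     # TRUE where an element equals 5 (note the double equals sign)
x != 5     # TRUE where an element differs from 5

x[x != 5]  # as before, a logical vector can be used to select elements
```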

Even more flexibility can be gained by combining logical vectors using and, or, and not. For example, we might want to identify which US states have an area less than 10 000 or greater than 100 000 square miles, or to identify which have an area greater than 100 000 square miles and which also have a short name. The code below shows how logical vectors can be combined to do this, using the following R symbols:

• `&` ("and")
• `|` ("or")
• `!` ("not")
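
Before turning to the states data, the three operators can be tried on a toy example. This is only a sketch: the vector `temp` and the names `isCold` and `isHot` are made up for illustration.

```r
temp <- c(-5, 12, 25, 31, 18)   # made-up temperatures, for illustration only

isCold <- temp < 10             # logical vector: TRUE for cold values
isHot  <- temp > 30             # logical vector: TRUE for hot values

isCold | isHot                  # TRUE where a value is cold OR hot
!(isCold | isHot)               # NOT reverses this: TRUE for the values in between
temp >= 10 & temp <= 30         # the same selection, written directly with AND
temp[!isCold & !isHot]          # the in-between values themselves
```

All three operators work element by element, so the result always has the same length as the vectors being combined.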

When using logical vectors, the following functions are particularly useful, as illustrated below:

• `which()` identifies which elements of a logical vector are `TRUE`
• `sum()` can be used to give the number of elements of a logical vector which are `TRUE`. This is because `sum()` forces its input to be converted to numbers, and if `TRUE` and `FALSE` are converted to numbers, they take the values 1 and 0 respectively.
• `ifelse()` returns different values depending on whether each element of a logical vector is `TRUE` or `FALSE`. Specifically, a command such as `ifelse(aLogicalVector, vectorT, vectorF)` takes `aLogicalVector` and returns, for each element that is `TRUE`, the corresponding element from `vectorT`, and for each element that is `FALSE`, the corresponding element from `vectorF`. An extra elaboration is that if `vectorT` or `vectorF` is shorter than `aLogicalVector`, it is extended by duplication to the correct length.
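
A minimal sketch of these three functions on a tiny made-up vector may help before they are applied to the states data (the name `passed` is hypothetical and chosen only for this illustration):

```r
passed <- c(TRUE, FALSE, TRUE, TRUE, FALSE)   # a made-up logical vector

as.numeric(passed)               # TRUE becomes 1 and FALSE becomes 0 ...
sum(passed)                      # ... so the sum counts the TRUE elements
which(passed)                    # positions of the TRUE elements

ifelse(passed, "pass", "fail")   # both single strings are duplicated to length 5
ifelse(passed, LETTERS[1:5], "-")# full-length 2nd argument, duplicated 3rd argument
```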
###### Input:
```
### In these examples, we'll reuse the American states data, especially the state names
### To remind yourself of them, you might want to look at the vector "state.name"

nchar(state.name)       # nchar() returns the number of characters in strings of text ...
nchar(state.name) <= 6  #so this indicates which states have names of 6 letters or fewer
ShortName <- nchar(state.name) <= 6         #store this logical vector for future use
sum(ShortName)          #With a logical vector, sum() tells us how many are TRUE (11 here)
which(ShortName)        #These are the positions of the 11 elements which have short names
state.name[ShortName]   #Use the index operator [] on the original vector to get the names
state.abb[ShortName]    #Or even on other vectors (e.g. the 2 letter state abbreviations)

isSmall <- state.area < 10000  #Store a logical vector indicating states <10000 sq. miles
isHuge  <- state.area > 100000 #And another for states >100000 square miles in area
sum(isSmall)                   #there are 8 "small" states
sum(isHuge)                    #coincidentally, there are also 8 "huge" states

state.name[isSmall | isHuge]   # | means OR. So these are states which are small OR huge
state.name[isHuge & ShortName] # & means AND. So these are huge AND with a short name
state.name[isHuge & !ShortName]# ! means NOT. So these are huge and with a longer name

### Examples of ifelse() ###

ifelse(ShortName, state.name, state.abb) #mix short names with abbreviations for long ones
# (think of this as "*if* ShortName is TRUE then use state.name *else* use state.abb")

### Many functions in R increase input vectors to the correct size by duplication ###
ifelse(ShortName, state.name, "tooBIG")   #A silly example: the 3rd argument is duplicated
size <- ifelse(isSmall, "small", "large") #A more useful example, for both 2nd & 3rd args
size                                      #might be useful as an indicator variable?
ifelse(size=="large", ifelse(isHuge, "huge", "medium"), "small") #A more complex example
```
###### Result:
```
> ### In these examples, we'll reuse the American states data, especially the state names
> ### To remind yourself of them, you might want to look at the vector "state.name"
>
> nchar(state.name)       # nchar() returns the number of characters in strings of text ...
 [1]  7  6  7  8 10  8 11  8  7  7  6  5  8  7  4  6  8  9  5  8 13  8  9 11  8  7  8  6 13
[30] 10 10  8 14 12  4  8  6 12 12 14 12  9  5  4  7  8 10 13  9  7
> nchar(state.name) <= 6  #so this indicates which states have names of 6 letters or fewer
 [1] FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE
[15]  TRUE  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
[29] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE
[43]  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE
> ShortName <- nchar(state.name) <= 6         #store this logical vector for future use
> sum(ShortName)          #With a logical vector, sum() tells us how many are TRUE (11 here)
[1] 11
> which(ShortName)        #These are the positions of the 11 elements which have short names
 [1]  2 11 12 15 16 19 28 35 37 43 44
> state.name[ShortName]   #Use the index operator [] on the original vector to get the names
 [1] "Alaska" "Hawaii" "Idaho"  "Iowa"   "Kansas" "Maine"  "Nevada" "Ohio"   "Oregon"
[10] "Texas"  "Utah"
> state.abb[ShortName]    #Or even on other vectors (e.g. the 2 letter state abbreviations)
 [1] "AK" "HI" "ID" "IA" "KS" "ME" "NV" "OH" "OR" "TX" "UT"
>
> isSmall <- state.area < 10000  #Store a logical vector indicating states <10000 sq. miles
> isHuge  <- state.area > 100000 #And another for states >100000 square miles in area
> sum(isSmall)                   #there are 8 "small" states
[1] 8
> sum(isHuge)                    #coincidentally, there are also 8 "huge" states
[1] 8
>
> state.name[isSmall | isHuge]   # | means OR. So these are states which are small OR huge
 [1] "Alaska"        "Arizona"       "California"    "Colorado"      "Connecticut"
 [6] "Delaware"      "Hawaii"        "Massachusetts" "Montana"       "Nevada"
[11] "New Hampshire" "New Jersey"    "New Mexico"    "Rhode Island"  "Texas"
[16] "Vermont"
> state.name[isHuge & ShortName] # & means AND. So these are huge AND with a short name
[1] "Alaska" "Nevada" "Texas"
> state.name[isHuge & !ShortName]# ! means NOT. So these are huge and with a longer name
[1] "Arizona"    "California" "Colorado"   "Montana"    "New Mexico"
>
> ### Examples of ifelse() ###
>
> ifelse(ShortName, state.name, state.abb) #mix short names with abbreviations for long ones
 [1] "AL"     "Alaska" "AZ"     "AR"     "CA"     "CO"     "CT"     "DE"     "FL"
[10] "GA"     "Hawaii" "Idaho"  "IL"     "IN"     "Iowa"   "Kansas" "KY"     "LA"
[19] "Maine"  "MD"     "MA"     "MI"     "MN"     "MS"     "MO"     "MT"     "NE"
[28] "Nevada" "NH"     "NJ"     "NM"     "NY"     "NC"     "ND"     "Ohio"   "OK"
[37] "Oregon" "PA"     "RI"     "SC"     "SD"     "TN"     "Texas"  "Utah"   "VT"
[46] "VA"     "WA"     "WV"     "WI"     "WY"
> # (think of this as "*if* ShortName is TRUE then use state.name *else* use state.abb")
>
> ### Many functions in R increase input vectors to the correct size by duplication ###
> ifelse(ShortName, state.name, "tooBIG")   #A silly example: the 3rd argument is duplicated
 [1] "tooBIG" "Alaska" "tooBIG" "tooBIG" "tooBIG" "tooBIG" "tooBIG" "tooBIG" "tooBIG"
[10] "tooBIG" "Hawaii" "Idaho"  "tooBIG" "tooBIG" "Iowa"   "Kansas" "tooBIG" "tooBIG"
[19] "Maine"  "tooBIG" "tooBIG" "tooBIG" "tooBIG" "tooBIG" "tooBIG" "tooBIG" "tooBIG"
[28] "Nevada" "tooBIG" "tooBIG" "tooBIG" "tooBIG" "tooBIG" "tooBIG" "Ohio"   "tooBIG"
[37] "Oregon" "tooBIG" "tooBIG" "tooBIG" "tooBIG" "tooBIG" "Texas"  "Utah"   "tooBIG"
[46] "tooBIG" "tooBIG" "tooBIG" "tooBIG" "tooBIG"
> size <- ifelse(isSmall, "small", "large") #A more useful example, for both 2nd & 3rd args
> size                                      #might be useful as an indicator variable?
 [1] "large" "large" "large" "large" "large" "large" "small" "small" "large" "large"
[11] "small" "large" "large" "large" "large" "large" "large" "large" "large" "large"
[21] "small" "large" "large" "large" "large" "large" "large" "large" "small" "small"
[31] "large" "large" "large" "large" "large" "large" "large" "large" "small" "large"
[41] "large" "large" "large" "large" "small" "large" "large" "large" "large" "large"
> ifelse(size=="large", ifelse(isHuge, "huge", "medium"), "small") #A more complex example
 [1] "medium" "huge"   "huge"   "medium" "huge"   "huge"   "small"  "small"  "medium"
[10] "medium" "small"  "medium" "medium" "medium" "medium" "medium" "medium" "medium"
[19] "medium" "medium" "small"  "medium" "medium" "medium" "medium" "huge"   "medium"
[28] "huge"   "small"  "small"  "huge"   "medium" "medium" "medium" "medium" "medium"
[37] "medium" "medium" "small"  "medium" "medium" "medium" "huge"   "medium" "small"
[46] "medium" "medium" "medium" "medium" "medium"
```

If you have done any computer programming, you may be more used to dealing with logic in the context of "if" statements. While R also has an `if()` statement, it is less useful when dealing with vectors. For example, the following R expression

```
if (aVariable == 0) print("zero") else print("not zero")
```
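expects `aVariable` to be a single number: it outputs "zero" if this number is 0, or "not zero" if it is a number other than zero[2]. If `aVariable` is a vector of 2 values or more, older versions of R use only the first element and ignore everything else, while R 4.2.0 and later stop with an error[3]. There are also logical operators intended for single values rather than whole vectors: these are `&&` for AND and `||` for OR[4]. In older versions of R they ignore everything but the first element of a vector; from R 4.3.0 onwards, giving them a longer vector is an error.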
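
The contrast between `if()` and `ifelse()` can be sketched as follows. The vector `aVector` and the names `hasZero` and `hasBig` are made up for illustration, and `any()` and `all()` are base R functions not otherwise covered on this page: they are one common way to reduce a logical vector to the single `TRUE` or `FALSE` that `if()` expects.

```r
aVector <- c(0, 3, 0, 7)   # made-up values, for illustration only

# ifelse() works element by element and returns a vector of the same length
labels <- ifelse(aVector == 0, "zero", "not zero")
labels

# if() needs a single TRUE or FALSE, so first reduce the logical vector
# to one value, for example with any() or all()
if (any(aVector == 0)) print("at least one zero") else print("no zeros")
if (all(aVector == 0)) print("all zeros") else print("not all zeros")

# && and || are intended for single logical values such as these
hasZero <- any(aVector == 0)
hasBig  <- any(aVector > 5)
hasZero && hasBig
hasZero || hasBig
```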

## Notes

1. Note that, when using continuous (fractional) numbers, rounding error may mean that results of calculations are not exactly equal to each other, even if they seem as if they should be. For this reason, you should be careful when using `==` with continuous numbers. R provides the function `all.equal()` to help in this case.
2. But unlike `ifelse()`, it can't cope with `NA` values.
3. For this reason, using `==` in `if` statements may not be a good idea; see the Note in `?"=="` for details.
4. These are used particularly in more advanced computer programming in R; see `?"&&"` for details.
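
The first two notes can be illustrated with a short sketch. The values are made up, and wrapping `all.equal()` in `isTRUE()` is a common idiom rather than something required by the text above.

```r
a <- 0.1 + 0.2
b <- 0.3

a == b                      # FALSE on typical machines, because of rounding error
isTRUE(all.equal(a, b))     # TRUE: the two values are equal up to a small tolerance

# all.equal() returns either TRUE or a description of the difference,
# so wrap it in isTRUE() when a single logical value is needed
isTRUE(all.equal(1, 1.1))   # FALSE

# A comparison involving NA gives NA; ifelse() passes the NA through,
# whereas if() would stop with an error when given NA as its condition
ifelse(c(0, NA, 2) == 0, "zero", "not zero")
```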