Bourne Shell Scripting/Control flow
So far we've talked about basics and theory. We've covered the different shells available and how to get shell scripts running in the Bourne Shell. We've talked about the Unix environment and we've seen that you have variables that control the environment and that you can use to store values for your own use. What we haven't done yet, though, is actually do anything. We haven't made the system act, jump through hoops, fetch the newspaper or do the dishes.
In this chapter it's time to get serious. In this chapter we talk programming — how to write programs that make decisions and execute commands. In this chapter we talk about control flow and command execution.
What is the difference between a program launcher and a command shell? Why is Bourne Shell a tool that has commanded power and respect the world over for decades and not just a stupid little tool you use to start real programs? Because Bourne Shell is not just an environment that launches programs: Bourne Shell is a fully programmable environment with the power of a full programming language at its command. We've already seen in Environment that Bourne Shell has variables in memory. But Bourne Shell can do more than that: it can make decisions and repeat commands. Like any real programming language, Bourne Shell has control flow, the ability to steer the computer.
Test: evaluating conditions
Before we can make decisions in shell scripts, we need a way of evaluating conditions. We have to be able to check the state of certain affairs so that we can base our decisions on what we find.
Strangely enough the actual shell doesn't include any mechanism for this. There is a tool for exactly this purpose called test (and it was literally created for use in shell scripts), but nevertheless it is not strictly part of the shell (although most modern shells also provide it as a built-in command). The 'test' tool evaluates conditions and returns either true or false, depending on what it finds. It returns these values in the form of an exit status (in the $? shell variable): a zero for true and something else for false. The general form of the test command is
test condition
A test for the equality of two identical strings, for example, returns an exit status of zero. There is also a shorthand notation for 'test' which is usually more readable in scripts, namely square brackets:
[ condition ]
Note the spaces between the brackets and the actual condition; don't forget them in your own scripts. Every test written in the long form can be written in the shorthand form in exactly the same way.
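As a small sketch of both notations, the following compares two strings with 'test' and with the bracket shorthand, and captures the exit status from $? each time. The strings and variable names are made up for illustration.

```shell
# Long form: compare two identical strings.
test "Hello World" = "Hello World"
long_status=$?

# Shorthand form of the same test; note the spaces inside the brackets.
[ "Hello World" = "Hello World" ]
short_status=$?

# A test that fails returns a non-zero exit status.
[ "Hello World" = "Goodbye" ]
fail_status=$?

echo "long: $long_status, short: $short_status, failing: $fail_status"
```

The first two statuses are zero (true); the last one is non-zero (false).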
'Test' can evaluate a number of different kinds of conditions, to fit with the different kinds of tests that you're likely to want to carry out in a shell script. Most specific shells have added on to the basic set of available conditions, but Bourne Shell recognizes the following:
- -b file
- file exists and is a block special file
- -c file
- file exists and is a character special file
- -d file
- file exists and is a directory
- -f file
- file exists and is a regular data file
- -g file
- file exists and has its set-group-id bit set
- -k file
- file exists and has its sticky bit set
- -p file
- file exists and is a named pipe
- -r file
- file exists and is readable
- -s file
- file exists and its size is greater than zero
- -t [n]
- The open file descriptor with number n is a terminal device; n is optional, default 1
- -u file
- file exists and has its set-user-id bit set
- -w file
- file exists and is writable
- -x file
- file exists and is executable
- -n s
- s has non-zero length
- -z s
- s has zero length
- s0 = s1
- s0 and s1 are identical
- s0 != s1
- s0 and s1 are different
- s
- s is not null (often used to check that an environment variable has a value)
- n0 -eq n1
- n0 is equal to n1
- n0 -ge n1
- n0 is greater than or equal to n1
- n0 -gt n1
- n0 is strictly greater than n1
- n0 -le n1
- n0 is less than or equal to n1
- n0 -lt n1
- n0 is strictly less than n1
- n0 -ne n1
- n0 is not equal to n1
Finally, conditions can be combined and grouped:
- \( B \)
- Parentheses are used for grouping conditions (don't forget the backslashes). A grouped condition (B) is true if B is true.
- ! B
- Negation; is true if B is false.
- B0 -a B1
- And; is true if B0 and B1 are both true.
- B0 -o B1
- Or; is true if either B0 or B1 is true.
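To get a feel for the conditions listed above, here is a minimal sketch that tries out a handful of them. The scratch directory and file names are assumptions made for this demonstration only; every status collected should come out as zero.

```shell
# Hypothetical scratch directory and file, created only for this demonstration.
dir=${TMPDIR:-/tmp}/test-demo.$$
mkdir "$dir"
echo "some data" > "$dir/data.txt"

[ -d "$dir" ];           is_dir=$?     # directory exists
[ -s "$dir/data.txt" ];  non_empty=$?  # file exists and is not empty
[ 3 -lt 10 ];            less=$?       # numeric comparison
[ ! -f "$dir/missing" ]; negated=$?    # negation: there is no such file

# Combination: a regular file that is also readable.
[ -f "$dir/data.txt" -a -r "$dir/data.txt" ]; both=$?

# Grouping needs backslashes so that the shell hands the parentheses to test.
[ \( 1 -eq 2 -o 2 -eq 2 \) -a -n "text" ]; grouped=$?

echo "$is_dir $non_empty $less $negated $both $grouped"
rm -r "$dir"
```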
Okay, so now we know how to evaluate some conditions. Let's see how we can make use of this ability to do some programming.
All programming languages need two things: a form of decision making or conditional execution and a form of repetition or looping. We'll get to looping later; for now let's concentrate on conditional execution. Bourne Shell supports two forms of conditional execution, the if-statement and the case-statement.
The if-statement is the more general of the two. Its general form is
if command-list
then command-list
elif command-list
then command-list
...
else command-list
fi
This command is to be interpreted as follows:
- The command list following the if is executed.
- If the last command returns a status zero, the command list following the first then is executed and the statement terminates after completion of the last command in this list.
- If the last command returns a non-zero status, the command list following the first elif (if there is one) is executed.
- If the last command returns a status zero, the command list following the next then is executed and the statement terminates after completion of the last command in this list.
- If the last command returns a non-zero status, the command list following the next elif (if there is one) is executed and so on.
- If no command list following the if or an elif terminates in a zero status, the command list following the else (if there is one) is executed.
- The statement terminates. Its exit status is that of the last command executed in the then or else command list that was run, or zero if no such list was run.
It is interesting to note that the if-statement allows command lists everywhere, including in places where conditions are evaluated. This means that you can execute as many compound commands as you like before reaching a decision point. The only command that affects the outcome of the decision is the last one executed in the list.
In most cases though, for the sake of readability and maintainability, you will want to limit yourself to one command for a condition. In most cases this command will be a use of the 'test' tool.
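Here is a minimal sketch of an if-statement with one 'test' per condition. It takes a value and compares it to a small, fixed set of expected values; the function name and the messages are hypothetical.

```shell
# Hypothetical function: react to an answer given by the user.
classify() {
    if [ "$1" = "yes" ]
    then echo "You agreed."
    elif [ "$1" = "no" ]
    then echo "You declined."
    else echo "Please answer yes or no."
    fi
}

classify yes
classify no
classify maybe
```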
The case-statement is sort of a special form of the if-statement, specialized in the kind of test demonstrated in the last example: taking a value and comparing it to a fixed set of expected values or patterns. The case statement is used very frequently to evaluate command line arguments to scripts. For example, if you write a script that uses switches to identify command line arguments, you know that there are only a limited number of legal switches. The case-statement is an elegant alternative to a potentially messy if-statement in such a case.
The general form of the case statement is
case value in
pattern0 ) command-list-0 ;;
pattern1 ) command-list-1 ;;
...
esac
The value can be any value, including an environment variable. Each pattern is a shell pattern (the limited form of regular expression built from the metacharacters discussed later in this chapter) and the command list executed is the one for the first pattern that matches the value (so make sure a general pattern doesn't come before a more specific one that it overlaps). Each command list must end with a double semicolon. The exit status is that of the last command executed, or zero if no pattern matches.
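As a sketch of the case-statement at work on command-line switches, consider the following. The function name and the switches are invented for this example; note that the most general pattern comes last.

```shell
# Hypothetical function: describe a single command-line switch.
describe_switch() {
    case $1 in
        -v | --verbose ) echo "verbose mode" ;;
        -o* )            echo "output option" ;;
        -? )             echo "unknown switch" ;;
        * )              echo "not a switch" ;;
    esac
}

describe_switch -v
describe_switch -ofile.txt
describe_switch -x
describe_switch file.txt
```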
If versus case: what is the difference?
So what exactly is the difference between the if- and case-statements? And what is the point of having two statements that are so similar? Well, the technical difference is this: the case-statement works off of data available to the shell (like an environment variable), whereas the if-statement works off the exit status of a program or command. Since fixed values and environment variables depend on the shell but the exit status is a concept general to the Unix system, this means that the if-statement is more general than the case-statement.
Let's look at a slightly larger example, just to put the two together and compare:
Note that this is a shell script and that it uses positional variables to capture command-line arguments. The script starts with an if-statement to check that we have the right number of arguments – note the use of 'test' to see if the value of variable $2 is not null and the exit status of 'test' to determine how the if-statement proceeds. If there are enough arguments, we assume the first argument is a name and start building the sentence that is the result of our script. Otherwise we write an error message (to stderr, the place to write errors; read all about it in Files and streams) and exit the script with a non-zero return value. Note that this else statement has a command list with more than one command in it.
Assuming we got through the if-statement without trouble, we get to the case-statement. Here we check the value of variable $2, which should be a food preference. If that value is either fruit or something starting with veg, we add a claim to the script result that some person is a vegetarian. If the value was exactly meat, the person is a meat eater. Anything else, he is an omnivore. Note that in that last case pattern clause we have to use curly braces in the variable substitution; that's because we want to add a letter n directly onto the existing value of sentence, without a space in between.
Let's put the script in a file called 'preferences.sh' and look at the effect of some calls of this script:
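The following is one possible reconstruction of a script that fits the description above; the exact wording of the messages and the names used in the calls are assumptions. The sketch writes the script to a scratch directory with a here-document and then calls it a few times.

```shell
# Hypothetical scratch location for the script.
dir=${TMPDIR:-/tmp}/preferences-demo.$$
mkdir "$dir"

cat > "$dir/preferences.sh" <<'EOF'
#!/bin/sh
# Check that a second argument was given; if so, start the sentence.
if [ "$2" ]
then
    sentence="$1 is a"
else
    echo "Not enough arguments: need a name and a food preference" >&2
    exit 1
fi

# Finish the sentence depending on the food preference.
case $2 in
    fruit | veg* ) sentence="$sentence vegetarian" ;;
    meat )         sentence="$sentence meat eater" ;;
    * )            sentence="${sentence}n omnivore" ;;
esac

echo "$sentence"
EOF

sh "$dir/preferences.sh" Sally veggies
sh "$dir/preferences.sh" Joe meat
sh "$dir/preferences.sh" Allison broccoli
sh "$dir/preferences.sh" Nobody || echo "exit status: $?"
# The scratch directory is left in place so the script can be inspected.
```

The first three calls print "Sally is a vegetarian", "Joe is a meat eater" and "Allison is an omnivore". The last call has too few arguments, so it writes the error message to stderr and returns a non-zero exit status.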
In addition to conditional execution mechanisms every programming language needs a means of repetition, repeated execution of a set of commands. The Bourne Shell has several mechanisms for exactly that: the while-statement, the until-statement and the for-statement.
The while loop
The while-statement is the simplest and most straightforward form of repetition statement in Bourne shell. It is also the most general. Its general form is this:
while command-list1
do command-list2
done
The while-statement is interpreted as follows:
- Execute the commands in command list 1.
- If the exit status of the last command is non-zero, the statement terminates.
- Otherwise execute the commands in command list 2 and go back to step 1.
- When the statement terminates, its exit status is that of the last command executed in command list 2, or zero if command list 2 was never executed.
Much like the if-statement, you can use a full command list to control the while-statement and only the last command in that list actually controls the statement. But in reality you will probably want to limit yourself to one command and, as with the if-statement, you will usually use the 'test' program for that command.
The while-statement is commonly used to deal with situations where a script can have an indeterminate number of command-line arguments, by using the shift command and the special '$#' variable that indicates the number of command-line arguments:
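A minimal sketch of that idiom follows. The function name is hypothetical; inside a function the positional parameters are the function's arguments, but the loop works the same way on the command-line arguments of a script. The counter is kept with 'expr', whose output is assigned to a variable.

```shell
# Hypothetical function: print each argument with a running number.
number_args() {
    i=1
    while [ $# -gt 0 ]
    do
        echo "argument $i: $1"
        i=`expr $i + 1`   # store the output of expr in i
        shift             # drop $1; the remaining arguments move down one place
    done
}

number_args apple "banana split" cherry
```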
The until loop
The until-statement is also a repetition statement, but it is sort of the semantic opposite of the while-statement. The general form of the until-statement is
until command-list1
do command-list2
done
The interpretation of this statement is almost the same as that of the while-statement. The only difference is that the commands in command list 2 are executed as long as the last command of command list 1 returns a non-zero status. Or, to put it more simply: command list 2 is executed as long as the condition of the loop is not met.
Whereas while-statements are mostly used to establish some effect ("repeat until done"), until-statements are more commonly used to poll for the existence of some condition or to wait until some condition is met. For instance, assume some process is running that will write 10000 lines to a certain file. The following until-statement waits for the file to have grown to 10000 lines:
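A sketch of such a polling loop might look like this. The helper name, the scratch file and the background writer (a small awk program standing in for "some process") are all assumptions made for the demonstration.

```shell
# Hypothetical helper: wait until file $1 has at least $2 lines.
wait_for_lines() {
    until [ -f "$1" ] && [ `wc -l < "$1"` -ge "$2" ]
    do
        sleep 1
    done
}

# Stand-in for the writing process: writes 10000 lines after a short delay.
log=${TMPDIR:-/tmp}/until-demo.$$
( sleep 1
  awk 'BEGIN { for (i = 1; i <= 10000; i++) print "line " i }' > "$log" ) &

wait_for_lines "$log" 10000
wait                                   # collect the background process
lines=`wc -l < "$log" | tr -d ' '`
echo "the file now has $lines lines"
rm "$log"
```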
The for loop
In the section on Control flow, we discussed that the difference between if and case was that the first depended on command exit statuses whereas the second was closely linked to data available in the shell. That kind of pairing also exists for repetition statements: while and until use command exit statuses and for uses data explicitly available in the shell.
The for-statement loops over a fixed, finite set of values. Its general form is
for name in w1 w2 ...
do command-list
done
This statement executes the command list for each value named after the 'in'. Within the command list, the "current" value wi is available through the variable name. The value list must be separated from the 'do' by a semicolon or a newline. And the command list must be separated from the 'done' by a semicolon or a newline. So, for example:
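The following sketch shows the for-statement in both layouts, once with newlines and once with semicolons. The variable names and values are arbitrary.

```shell
# Add up a fixed list of numbers.
total=0
for n in 1 2 3 4
do
    total=`expr $total + $n`
done
echo "total: $total"

# The same kind of loop on one line, with semicolons in place of the newlines.
fruits=""
for fruit in apple banana cherry; do fruits="$fruits[$fruit]"; done
echo "$fruits"
```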
The for statement is used a lot to loop over command line arguments. For that reason the shell even has a shorthand notation for this use: if you leave off the 'in' and the values part, the statement loops over the positional parameters, one at a time (as if you had written in "$@"). For example:
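A minimal sketch of the shorthand, using a hypothetical function so that the example can be run as is. In a script the same loop would walk over the command-line arguments.

```shell
# Hypothetical function: no 'in' part, so the loop uses the arguments.
shout() {
    for word
    do
        echo "$word!"
    done
}

shout hello "big world"
```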
In the last section on Control Flow we discussed the major programming constructs and control flow statements offered by the Bourne Shell. However, there are lots of other syntactic constructs in the shell that allow you to control the way commands are executed and to embed commands in other commands. In this section we discuss some of the more important ones.
Earlier, we looked at the if-statement as a method of conditional execution. In addition to this expansive statement the Bourne Shell also offers a method of directly linking two commands together and making the execution of one of them conditional on the outcome (the exit status) of the other. This is useful for making quick, inline decisions on command execution. But you probably wouldn't want to use these constructs in a shell script or for longer command sequences, because they aren't the most readable.
You can join commands together using the && and || operators. These operators (which you might recognize as borrowed from the C programming language) are short circuiting operators: they make the execution of the second command dependent on the exit status of the first and so can allow you to avoid unnecessary command executions.
The && operator joins two commands together and only executes the second if the exit status of the first is zero (i.e. the first command "succeeds"). Consider the following example:
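A sketch of what such an example could look like (the scratch file name is an assumption): a file is created and, only if that worked, deleted again.

```shell
# Hypothetical scratch file name.
file=${TMPDIR:-/tmp}/and-demo.$$

# rm runs only if the file could be created.
echo "scratch" > "$file" && rm "$file"
status=$?
echo "exit status: $status"
```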
In this example the deletion would be pointless if the file creation fails (because the file system is read-only, say). Using the && operator prevents the deletion from being attempted if the file creation fails. A similar – and possibly more useful – example is this:
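One example along those lines, offered as a guess at the kind of use meant here, is to delete a file only if it actually exists. The file name is made up.

```shell
file=${TMPDIR:-/tmp}/cleanup-demo.$$
echo "old data" > "$file"

# Delete the file only if it exists as a regular file.
[ -f "$file" ] && rm "$file"

# Running the same line again attempts nothing, because the test now fails.
[ -f "$file" ] && rm "$file"
second_status=$?
echo "second attempt: $second_status"
```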
In contrast to the && operator, the || operator executes the second command only if the exit status of the first command is not zero (i.e. it fails). Consider the following example:
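As a sketch (with a hypothetical directory name), the || operator can be used to create a directory only when it is missing:

```shell
dir=${TMPDIR:-/tmp}/or-demo.$$

# First time: the test fails, so mkdir runs.
[ -d "$dir" ] || mkdir "$dir"

# Second time: the test succeeds, so mkdir is skipped.
[ -d "$dir" ] || mkdir "$dir"
status=$?

echo "exit status: $status"
rmdir "$dir"
```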
For both these operators the exit status of the joined commands is the exit status of the last command that actually got executed.
You can join multiple commands into one command list by joining them using the ; operator, like so:
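A small sketch with arbitrary variable names; the four commands on the one line are simply executed from left to right.

```shell
a=1; b=2; sum=`expr $a + $b`; echo "sum: $sum"
```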
There is no conditional execution here; all commands are executed, even if one of them fails.
When joining commands into a command list, you can group the commands together for clarity and some special handling. There are two ways of grouping commands: using curly braces and using parentheses.
Grouping using curly braces can be used to enhance clarity. Using them doesn't add any semantics to joining using semicolons or newlines, but you must insert an extra semicolon or newline after your command list. Spaces between the braces and your command list are required for the shell to recognize the grouping. Here's an example:
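A minimal sketch with made-up variable names. Because braces add no semantics, the assignments inside the group are still visible after it.

```shell
# The space after { and the semicolon before } are both required.
{ greeting="hello"; target="world"; }
echo "$greeting $target"
```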
Braces can also be used to group commands together in order to integrate them into pipelines and to redirect their input or output. In such a position the group behaves just as a single command or a function call would.
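The following sketch shows a brace group feeding a pipeline and a brace group with a single redirection. The file name is an assumption.

```shell
# The output of both echo commands flows through one pipe into sort.
sorted=`{ echo "pear"; echo "apple"; } | sort`
echo "$sorted"

# One redirection for the whole group.
out=${TMPDIR:-/tmp}/group-demo.$$
{ echo "line one"; echo "line two"; } > "$out"
lines=`wc -l < "$out" | tr -d ' '`
echo "the file has $lines lines"
rm "$out"
```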
The parentheses are far more interesting. When you group a command list with parentheses, it is executed... in a separate process. This means that whatever you do in the command list doesn't affect the environment in which you gave the command. Consider the example above again, with braces and parentheses:
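A sketch of the difference, using a change of working directory as the visible effect (the directories are arbitrary):

```shell
start=`pwd`

cd /
{ cd /tmp; }          # braces: the cd happens in the current shell
after_braces=`pwd`

cd /
( cd /tmp )           # parentheses: the cd happens in a separate process
after_parens=`pwd`

echo "after braces: $after_braces"
echo "after parentheses: $after_parens"
cd "$start"
```

After the brace group the working directory has changed; after the parenthesized group it is still '/'.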
Here's another one:
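The same effect can be sketched with a variable assignment (the variable names are made up):

```shell
color=red
{ color=green; }
after_braces=$color
echo "after braces: $after_braces"

color=red
( color=blue; echo "inside parentheses: $color" )
after_parens=$color
echo "after parentheses: $after_parens"
```

The assignment inside the braces sticks; the one inside the parentheses is lost when the separate process ends.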
In the chapter on Environment we talked about variable substitution. The Bourne Shell also supports command substitution. This is sort of like variable substitution, but instead of a variable being replaced by its value a command is replaced by its output. We saw an example of this earlier when discussing the while-statement, where we assigned the outcome of an arithmetic expression evaluation to an environment variable.
Command substitution is accomplished using either of two notations. The original Bourne Shell used grave accents (`command`), which is still generally supported by most shells. Later on the POSIX 1003.1 standard added the $( command ) notation. Consider the following examples:
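A few sketches of both notations follow. The variable names are arbitrary, and the $( ) form assumes a POSIX-compatible shell.

```shell
# Grave accents: the command is replaced by its output.
today=`date`
echo "It is now $today"

# The same idea in the $( ) notation.
here=$(pwd)
echo "We are in $here"

# The $( ) notation nests without any extra backslashes.
parent=$(basename "$(dirname /usr/local/bin)")
echo "$parent"
```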
Regular expressions and metacharacters
Usually, in the day-to-day tasks that you do with your shell, you will want to be explicit and exact about which files you want to operate on. After all, you want to delete a specific file and not a random one. And you want to send your network communications to the network device file and not to the keyboard.
But there are times, especially in scripting, when you will want to be able to operate on more than one file at a time. For instance, if you have written a script that makes a regular backup of all the files in your home directory whose names end in ".dat". If there are a lot of those files, or there are more being made each day with new names each time, then you do not want to have to name all those files explicitly in your backup script.
We have also seen another example of not wanting to be too explicit: in the section on the case-statement, there is an example where we claim that somebody is a vegetarian if he likes fruit or anything starting with "veg". We could have included all sorts of options there and been explicit (although there are an infinite number of words you can make that start with "veg"). But we used a pattern instead and saved ourselves a lot of time.
For exactly these cases the shell supports a (limited) form of regular expressions: patterns that allow you to say something like "I mean every string, every sequence of characters, that looks sort of like this". The shell allows you to use these regular expressions anywhere (although they don't always make sense — for example, it makes no sense to use a regular expression to say where you want to copy a file). That means in shell scripts, in the interactive shell, as part of the case-statement, to select files, general strings, anything.
In order to create regular expressions you use one or more metacharacters. Metacharacters are characters that have special meaning to the shell and are automatically recognized as part of regular expressions. The Bourne shell recognizes the following metacharacters:
- *
- Matches any string.
- ?
- Matches any single character.
- [characters]
- Matches any character enclosed in the square brackets.
- [!characters]
- Matches any character not enclosed in the square brackets.
- pat0|pat1
- Matches any string that matches pat0 or pat1 (only in case-statement patterns!)
Here are some examples of how you might use regular expressions in the shell:
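The sketch below creates a few empty files in a scratch directory (all names are assumptions) and shows which of them each pattern selects.

```shell
dir=${TMPDIR:-/tmp}/glob-demo.$$
start=`pwd`
mkdir "$dir" && cd "$dir"
touch a.dat b.dat c.txt file1 file2 file10

all_dat=`echo *.dat`       # a.dat b.dat
one_char=`echo file?`      # file1 file2 (file10 has two extra characters)
a_or_b=`echo [ab].*`       # a.dat b.dat
not_a=`echo [!a].*`        # b.dat c.txt

echo "$all_dat / $one_char / $a_or_b / $not_a"
cd "$start" && rm -r "$dir"
```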
When selecting files, the metacharacters match all files except files whose names start with a period ("."). Files that start with a period are either special files or are assumed to be configuration files. For that reason these files are semi-protected, in the sense that you cannot just pick them up with the metacharacters. In order to include these files when selecting with regular expressions, you must include the leading period explicitly. For example:
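A sketch of this behaviour in a scratch directory (the file names are assumptions):

```shell
dir=${TMPDIR:-/tmp}/dot-demo.$$
start=`pwd`
mkdir "$dir" && cd "$dir"
touch .profile .hidden visible

plain=`echo *`           # visible
with_dot=`echo .p*`      # .profile
hidden=`echo .[!.]*`     # .hidden .profile

echo "$plain / $with_dot / $hidden"
cd "$start" && rm -r "$dir"
```

The plain asterisk skips the period files; the patterns that start with an explicit period pick them up.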
The example above shows a listing of period files. In this example the listing includes '.profile', which is the user configuration file for the Bourne Shell. It also includes the special directories '.' (which means "the current directory") and '..' (which is the parent directory of the current directory). You can address these special directories like any other. So for instance
ls .
is the same semantically as just 'ls' and
cd ..
changes your working directory to the parent directory of the directory that was your working directory before.
When you introduce special characters like the metacharacters discussed in the previous section, you automatically get into situations when you really don't want those special characters evaluated. For example, assume that you have a file whose name includes an asterisk ('*'). How would you address that file? For example:
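The problem can be sketched as follows. The scratch directory and file names are assumptions, and the awkward file is created with single quotes, one of the quoting mechanisms explained below.

```shell
dir=${TMPDIR:-/tmp}/star-demo.$$
start=`pwd`
mkdir "$dir" && cd "$dir"
touch 'a*b' aab axb

# Unquoted, a*b is a pattern: it selects all three files, not just one.
set -- a*b               # note: this replaces the positional parameters
matches=$#
echo "a*b matched $matches files: $*"

cd "$start" && rm -r "$dir"
```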
Clearly what is needed is a way of temporarily turning metacharacters off. The Bourne Shell built-in quoting mechanisms do exactly that. In fact, they do a lot more than that. For instance, if you have a file name with spaces in it (so that the shell cannot tell the different words in the file name belong together) the quoting mechanisms will help you deal with that problem as well.
There are three quoting mechanisms in the Bourne Shell:
- backslash, for single character quoting.
- single quotes, to quote entire strings.
- double quotes, to quote entire strings but still allow for some special characters.
The simplest of these is the backslash, which quotes the character that immediately follows it. So, for example:
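A short sketch (the variable names are arbitrary):

```shell
star=$(echo \*)          # a literal asterisk, not a list of files
dollar=$(echo \$HOME)    # the text $HOME, not the value of the variable

echo "$star"
echo "$dollar"
```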
So the backslash basically disables special character interpretation for the duration of one character. Interestingly, the newline character is also considered a special character in this context, so you can use the backslash to split commands to the interpreter over multiple lines. Like so:
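A sketch of a command split over several lines:

```shell
# Each backslash quotes the newline that follows it, so this is one command.
message=$(echo one \
               two \
               three)
echo "$message"
```

The shell treats the three lines as a single echo command with three arguments.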
The backslash escape also works for file names with spaces:
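A sketch with a made-up file name in a scratch directory:

```shell
dir=${TMPDIR:-/tmp}/space-demo.$$
start=`pwd`
mkdir "$dir" && cd "$dir"

# The quoted spaces make this one file name instead of three words.
echo "some data" > my\ data\ file.txt
contents=$(cat my\ data\ file.txt)

set -- *                 # note: this replaces the positional parameters
file_count=$#
echo "$file_count file, containing: $contents"

cd "$start" && rm -r "$dir"
```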
But what if you want to pass a backslash to the shell? Well, think about it. Backslash disables interpretation of one character, so if you want to use a backslash for anything else... then '\\' will do it!
So we've seen that a backslash allows you to disable special character interpretation for a single character by quoting it. But what if you want to quote a lot of special characters all at once? As you've seen above with the filename with spaces, you can quote each special character separately, but that gets to be a drag really quickly. Usually it's quicker, easier and less error-prone simply to quote an entire string of characters in one go. To do exactly that you use single quotes. Two single quotes quote the entire string they surround, disabling interpretation of all special characters in that string — with the exception of the single quote (so that you can stop quoting as well). For example:
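A sketch of single quotes at work (the text is arbitrary):

```shell
# Nothing between the single quotes is interpreted by the shell.
echo 'Price: $5 * 2 (approx.) & more; see `pwd`'

# Single quotes also keep a file name with spaces together as one word.
set -- 'my data file.txt'
word_count=$#
echo "$word_count word"
```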
So let's try something. Let's assume that for some strange reason we would like to print three asterisks ("***"), then a space, then the current working directory, a space and three more asterisks. We know we can disable metacharacter interpretation with single quotes so this should be no biggy, right? And to make life easy, the built-in command 'pwd' prints the working directory, so this is really easy:
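A sketch of that first attempt, together with the $PWD variant:

```shell
attempt1=$(echo '*** `pwd` ***')
attempt2=$(echo '*** $PWD ***')

echo "$attempt1"
echo "$attempt2"
```

Both lines come out exactly as typed between the single quotes, grave accents and dollar sign included, with no directory in sight.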
So what went wrong? Well, the single quotes disable interpretation of all special characters. So the grave accents we used for the command substitution didn't work! Can we make it work a different way? Like by using the Path of Working Directory environment variable ($PWD)? Nope, the $-character won't work either.
This is a typical Goldilocks problem. We want to quote some special characters, but not all. We could use backslashes, but that doesn't do enough to be convenient (it's too cold). We can use single quotes, but that kills too many special characters (it's too hot). What we need is quoting that's juuuust riiiiight. More to the point, what we want (and more often than you think) is to disable all special character interpretation except variable and command substitution. Because this is a common desire the shell supports it through a separate quoting mechanism: the double quote. The double quote disables all special character interpretation except the grave accent (command substitution), the $ (variable substitution), the backslash (but only in front of one of these other characters) and the double quote (so you can stop quoting). So the solution to our problem above is:
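A sketch of the double-quoted version, with both kinds of substitution:

```shell
# The asterisks stay literal, but the command substitution is carried out.
banner="*** `pwd` ***"
echo "$banner"

# Variable substitution works inside double quotes as well.
echo "*** $PWD ***"
```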
By the way, we actually cheated a little bit above for educational purposes (hey, you try coming up with these examples); we could also have solved the problem like this:
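One way to do that, offered here as a guess at the alternative meant, is to quote only the asterisks and leave the command substitution outside the quotes:

```shell
# Single quotes around the asterisks only.
mixed=$(echo '***' `pwd` '***')
echo "$mixed"

# Or a backslash in front of every asterisk.
escaped=$(echo \*\*\* `pwd` \*\*\*)
echo "$escaped"
```

Because the output of pwd is unquoted here, this variant is only safe when the directory name contains no spaces or metacharacters.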