An Awk Primer/Awk Invocation and Operation


Awk is invoked as follows:

awk -Fch  -f program-file  variables  input-files

Each parameter given to Awk is optional.

Field Separator

The first option given to Awk, -F, lets you change the field separator. By default, each "word" or field in our data files is separated by whitespace (one or more spaces or tabs). The separator can be changed to any single character. Many files are tab-delimited, so each data field (such as metal, weight, country, description, etc.) is separated by a tab. Using a tab as the separator allows spaces to be included in a field. Other common field separators are colons and semicolons.

For example, the file "/etc/passwd" (on Unix or Linux systems) contains a list of all users, along with some data about each one. Each field is separated from the next by a colon. This example program prints each user's name and ID number:

awk -F:  '{ print $1, $3 }' /etc/passwd

Notice the colon used as a field separator. This program would not work without it.
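
The effect of the separator can be seen without touching the real "/etc/passwd". The following is a minimal sketch that assumes a POSIX shell; the two sample lines are made up for illustration and are not real accounts.

```shell
# Hypothetical sample lines in the /etc/passwd layout (not real accounts).
sample='root:x:0:0:root:/root:/bin/sh
alice:x:1000:1000:Alice:/home/alice:/bin/sh'

# With -F: each line is split into seven colon-separated fields.
with_sep=$(printf '%s\n' "$sample" | awk -F: '{ print $1, $3 }')

# Without -F: the default separator is whitespace, and these lines contain
# none, so Awk sees each whole line as one single field.
field_count=$(printf '%s\n' "$sample" | awk '{ print NF }')

printf '%s\n' "$with_sep"      # user name and ID number per line
printf '%s\n' "$field_count"   # number of fields per line without -F:
```

Printing NF (the number of fields) is a quick way to check whether Awk is splitting the input the way you expect.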

Program File

An Awk program has the general form:

BEGIN              { initializations }
search pattern 1   { program actions }
search pattern 2   { program actions }
...
END                { final actions }

Again, each part is optional if it is not needed.

If you type the Awk program into a separate file, use the -f option to tell Awk the location and name of that file. For larger, more complex programs, you will definitely want to use a program file. This allows you to put each statement on a separate line, making liberal use of spacing and indentation to improve readability. For short, simple programs, you can type the program directly on the command line.
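
As an illustration of the -f option, here is a small sketch that writes a program file and then runs it. The file name count.awk and the sample input are hypothetical, and mktemp is assumed to be available (it is on most Unix and Linux systems).

```shell
# Create a scratch directory and a small program file (hypothetical name).
tmpdir=$(mktemp -d)
cat > "$tmpdir/count.awk" <<'EOF'
BEGIN    { print "Counting gold coins" }   # runs once, before any input
/gold/   { n++ }                           # runs for each line matching "gold"
END      { print n, "gold coins" }         # runs once, after all input
EOF

# Hypothetical sample data is piped in; -f tells Awk where the program is.
result=$(printf 'gold 1\nsilver 10\ngold 0.5\n' | awk -f "$tmpdir/count.awk")
printf '%s\n' "$result"
rm -r "$tmpdir"
```

Keeping one pattern and action per line, as here, is what makes a program file easier to read than the same text squeezed onto a command line.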

If the Awk program is written on the command line, it should be enclosed in single quotes rather than double quotes, so that the shell does not interpret characters within the program as special shell characters. If such interpretation is desired, double quotes can be used instead; in that case, any special shell characters in the Awk program that the shell should not interpret must be preceded with a backslash. Please remember that the COMMAND.COM shell (for Windows and DOS) does not allow single quotes to be used in this way.
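
The difference between the two quoting styles can be sketched as follows, assuming a POSIX shell. The shell variable name metal and the input line are made up for the example.

```shell
# Single quotes: the shell passes $1 through untouched, so Awk sees its
# own field reference.
single=$(echo 'gold 1' | awk '{ print $1 }')

# Double quotes: the shell would expand $1 itself, so the dollar sign
# must be escaped with a backslash to reach Awk.
double=$(echo 'gold 1' | awk "{ print \$1 }")

# Double quotes with interpretation wanted: the shell variable
# (hypothetical name) is substituted into the program text before Awk runs.
metal=gold
interp=$(echo 'gold 1' | awk "/$metal/ { print \$2 }")

printf '%s %s %s\n' "$single" "$double" "$interp"
```

In the last command Awk actually receives the program text /gold/ { print $2 }, because the shell has already replaced $metal and removed the backslash.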

Variables

It is also possible to initialize Awk variables on the command line. This is mainly useful when the Awk program is stored in a file, or when it is an element in a shell script. In a program typed directly on the command line, any initial values can simply be written as part of the program text.

Consider the program example in the previous chapter to compute the value of a coin collection. The current prices for silver and gold were embedded in the program, which means that the program would have to be modified every time the price of either metal changed. It would be much simpler to specify the prices when the program is invoked.

The main part of the original program was written as:

/gold/    { num_gold++; wt_gold += $2 }
/silver/  { num_silver++; wt_silver += $2 } 
END {
    val_gold   = 485 * wt_gold
    val_silver = 16 * wt_silver
    ...

The prices of gold and silver could be specified by variables, say, pg and ps:

END {
    val_gold   = pg * wt_gold
    val_silver = ps * wt_silver
    ...

The program would be invoked with variable initializations in the command line as follows:

awk -f summary.awk pg=485 ps=16 coins.txt

This yields the same results as before. Notice that the variable initializations are written as pg=485 and ps=16, and not pg = 485 and ps = 16. With spaces, the shell would split each assignment into three separate arguments, and Awk would treat the pieces as names of input files rather than as an assignment.
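
A self-contained version of this idea is sketched below. It is not the original summary.awk: the file name value.awk, the two-column sample data, and the use of mktemp are all assumptions made so the example can run on its own.

```shell
# Hypothetical stand-ins for coins.txt and the program file.
tmpdir=$(mktemp -d)
printf 'gold 1\nsilver 10\ngold 0.5\n' > "$tmpdir/coins.txt"
cat > "$tmpdir/value.awk" <<'EOF'
/gold/   { wt_gold += $2 }      # add up the gold weights
/silver/ { wt_silver += $2 }    # add up the silver weights
END      { print pg * wt_gold, ps * wt_silver }
EOF

# pg and ps are assigned when Awk reaches those arguments, which happens
# before it opens coins.txt. Each assignment is one shell word, no spaces.
result=$(awk -f "$tmpdir/value.awk" pg=485 ps=16 "$tmpdir/coins.txt")
printf '%s\n' "$result"
rm -r "$tmpdir"
```

Changing a price now only means changing the command line; the program file itself stays untouched.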

Data File(s)

At the end of the command line comes the data file. This is the name of the file that Awk should process with your program, like "coins.txt" in our previous examples.

Multiple data files can also be specified. Awk will scan one after another and generate a continuous output from the contents of the multiple files, as if they were just one long file.
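
The "one long file" behaviour can be checked with a short sketch. The file names a.txt and b.txt and their contents are hypothetical, and mktemp is assumed to be available.

```shell
# Two hypothetical data files in a scratch directory.
tmpdir=$(mktemp -d)
printf 'gold 1\ngold 0.5\n' > "$tmpdir/a.txt"
printf 'silver 10\n' > "$tmpdir/b.txt"

# NR (the record number) keeps counting from one file into the next,
# so the END block sees the total number of lines across both files.
result=$(awk '{ print NR, $1 } END { print NR, "lines in total" }' \
    "$tmpdir/a.txt" "$tmpdir/b.txt")
printf '%s\n' "$result"
rm -r "$tmpdir"
```

The line from the second file is numbered 3, not 1, which shows that Awk does not start over at the file boundary.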

Practice

  1. If you haven't already, try running the program from "Field Separator" to list all users. See what happens without the -F:. (If you're not using Unix or Linux, sorry; it won't work.)
  2. Write an Awk program to convert "coins.txt" (from the previous chapters) into a tab-delimited file. This will require output "redirection", which varies depending on your system, but you should probably write something like > tabcoins.txt to send Awk's output to a new file instead of the screen.
  3. Now, rerun "summary.awk" with -F'\t'. The single quotes keep the shell from removing the backslash, so that Awk receives "\t" and interprets it as a tab rather than as the two characters "\" and "t". Fields can now contain spaces without harming the output. Try changing some metals to "pure gold" or "98% pure silver" to see that it works.
  4. Experiment with some of the other command-line options, like multiple input files.
  5. Write a program that acts as a simple calculator. It does not need an input file; let it receive input from the keyboard. The input should be two numbers with an operator (like + or -) in between, all separated by spaces. Match lines containing these patterns and output the result.
Hint for #5:

Your program should contain lines like this:

$2=="+" { print ($1 + $3) }


In the next chapter, you will be introduced to Awk's most notable feature: pattern matching.