An Awk Primer/Using Awk from the Command Line

From Wikibooks, open books for an open world
Jump to: navigation, search

The Awk programming language was designed to be simple but powerful. It allows a user to perform relatively sophisticated text-manipulation operations through Awk programs written on the command line.

For example, suppose I want to turn a document with single-spacing into a document with double-spacing. I could easily do that with the following Awk program:

   awk '{print ; print ""}' infile > outfile

Notice how single-quotes (' ') are used to allow using double-quotes (" ") within the Awk expression. This "hides" special characters from the shell. We could also do this as follows:

   awk "{print ; print \"\"}" infile > outfile 

-- but the single-quote method is simpler.

This program does what it supposed to, but it also doubles every blank line in the input file, which leaves a lot of empty space in the output. That's easy to fix, just tell Awk to print an extra blank line if the current line is not blank:

   awk '{print ; if (NF != 0) print ""}' infile > outfile
  • One of the problems with Awk is that it is ingenious enough to make a user want to tinker with it, and use it for tasks for which it isn't really appropriate. For example, we could use Awk to count the number of lines in a file:
   awk 'END {print NR}' infile

-- but this is dumb, because the "wc (word count)" utility gives the same answer with less bother: "Use the right tool for the job."

Awk is the right tool for slightly more complicated tasks. Once I had a file containing an email distribution list. The email addresses of various different groups were placed on consecutive lines in the file, with the different groups separated by blank lines. If I wanted to quickly and reliably determine how many people were on the distribution list, I couldn't use "wc", since, it counts blank lines, but Awk handled it easily:

   awk 'NF != 0 {++count} END {print count}' list
  • Another problem I ran into was determining the average size of a number of files. I was creating a set of bitmaps with a scanner and storing them on a disk. The disk started getting full and I was curious to know just how many more bitmaps I could store on the disk.

I could obtain the file sizes in bytes using "wc -c" or the "list" utility ("ls -l" or "ll"). A few tests showed that "ll" was faster. Since "ll"

lists the file size in the fifth field, all I had to do was sum up the fifth field and divide by NR. There was one slight problem, however: the first line of the output of "ll" listed the total number of sectors used, and had to be skipped.

No problem. I simply entered:

   ll | awk 'NR!=1 {s+=$5} END {print "Average: " s/(NR-1)}'

This gave me the average as about 40 KB per file.

  • Awk is useful for performing simple iterative computations for which a more sophisticated language like C might prove overkill. Consider the Fibonacci sequence:

   1 1 2 3 5 8 13 21 34 ...

Each element in the sequence is constructed by adding the two previous elements together, with the first two elements defined as both "1". It's a discrete formula for exponential growth. It is very easy to use Awk to generate this sequence:

   awk 'BEGIN {a=1;b=1; while(++x<=10){print a; t=a;a=a+b;b=t}; exit}'

This generates the following output data:

   1    2    3    5    8    13    21    34    55    89