An Awk Primer/Using Awk from the Command Line

From Wikibooks, open books for an open world
Jump to navigation Jump to search

The Awk programming language was designed to be simple but powerful. It allows a user to perform relatively sophisticated text-manipulation operations through Awk programs written on the command line.

For example, suppose I want to turn a document with single-spacing into a document with double-spacing. I could easily do that with the following Awk program:

   awk '{print ; print ""}' infile > outfile

Notice how single-quotes (' ') are used to allow using double-quotes (" ") within the Awk expression. This "hides" special characters from the shell. We could also do this as follows:

   awk "{print ; print \"\"}" infile > outfile 

—but the single-quote method is simpler.

This program does what it supposed to, but it also doubles every blank line in the input file, which leaves a lot of empty space in the output. That's easy to fix, just tell Awk to print an extra blank line if the current line is not blank:

   awk '{print ; if (NF != 0) print ""}' infile > outfile
  • One of the problems with Awk is that it is ingenious enough to make a user want to tinker with it, and use it for tasks for which it isn't really appropriate. For example, we could use Awk to count the number of lines in a file:
   awk 'END {print NR}' infile

—but this is dumb, because the "wc (word count)" utility gives the same answer with less bother: "Use the right tool for the job."

Awk is the right tool for slightly more complicated tasks. Once I had a file containing an email distribution list. The email addresses of various different groups were placed on consecutive lines in the file, with the different groups separated by blank lines. If I wanted to quickly and reliably determine how many people were on the distribution list, I couldn't use "wc", since, it counts blank lines, but Awk handled it easily:

   awk 'NF != 0 {++count} END {print count}' list
  • Another problem I ran into was determining the average size of a number of files. I was creating a set of bitmaps with a scanner and storing them on a disk. The disk started getting full and I was curious to know just how many more bitmaps I could store on the disk.

I could obtain the file sizes in bytes using "wc -c" or the "list" utility ("ls -l" or "ll"). A few tests showed that "ll" was faster. Since "ll"

lists the file size in the fifth field, all I had to do was sum up the fifth field and divide by NR. There was one slight problem, however: the first line of the output of "ll" listed the total number of sectors used, and had to be skipped.

No problem. I simply entered:

   ll | awk 'NR!=1 {s+=$5} END {print "Average: " s/(NR-1)}'

This gave me the average as about 40 KB per file.

  • Awk is useful for performing simple iterative computations for which a more sophisticated language like C might prove overkill. Consider the Fibonacci sequence:

   1 1 2 3 5 8 13 21 34 ...

Each element in the sequence is constructed by adding the two previous elements together, with the first two elements defined as both "1". It's a discrete formula for exponential growth. It is very easy to use Awk to generate this sequence:

   awk 'BEGIN {a=1;b=1; while(++x<=10){print a; t=a;a=a+b;b=t}; exit}'

This generates the following output data:

   1    2    3    5    8    13    21    34    55    89