An Awk Primer/Nawk

From Wikibooks, open books for an open world

The original version of Awk was developed in 1977. It was optimized for throwing together "one-liners" or short, quick-and-dirty programs. However, some users liked Awk so much that they used it for much more complicated tasks. To quote the language's authors: "Our first reaction to a program that didn't fit on one page was shock and amazement." Some users regarded Awk as their primary programming tool, and many had in fact learned programming using Awk.

After the authors got over their initial consternation, they decided to accept the fact, and enhance Awk to make it a better general-purpose programming tool. The new version of Awk was released in 1985. The new version is often, if not always, known as Nawk ("New Awk") to distinguish it from the old one.

  • Nawk incorporates several major improvements. The most important improvement is that users can define their own functions. For example, the following Nawk program implements the "signum" function:
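   # Print the sign of every field on every input line.
   { for (field = 1; field <= NF; ++field) print signum($field) }

   # Return -1, 0, or 1 according to the sign of n.
   function signum(n) {
      if (n < 0)
         return -1
      else if (n == 0)
         return 0
      else
         return 1
   }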

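Function declarations can be placed in a program wherever a match-action clause can. Scalar parameters are passed by value and are local to the function; arrays are passed by reference. Awk has no separate declaration for local variables: any variable that does not appear in the parameter list is global, so the convention is to declare locals as extra parameters that the caller never supplies.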
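
The extra-parameter convention for local variables can be sketched as follows. This is only an illustration run from the shell; the names "sumto", "i", and "acc" are made up for the example and are not part of Awk.

```shell
out=$(awk '
# sumto(n) adds 1..n. "i" and "acc" are extra parameters that the caller
# never passes, so they behave as local variables.
function sumto(n,    i, acc) {
    for (i = 1; i <= n; ++i)
        acc += i
    return acc
}
BEGIN {
    i = 99                  # a global variable that happens to be named i
    print sumto(4), i       # the global i is untouched by the function
}')
echo "$out"                 # prints: 10 99
```

The extra spaces in the parameter list are a common way of marking where the real parameters end and the locals begin; Awk itself does not require them.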

  • A second improvement is a new function, "getline", that allows input from files other than those specified on the command line at invocation, as well as input from pipes. "Getline" can be used in a number of ways:
   getline                     Loads $0 from current input.
   getline myvar               Loads "myvar" from current input.
   getline < "myfile"          Loads $0 from "myfile".
   getline myvar < "myfile"    Loads "myvar" from "myfile".
   "command" | getline         Loads $0 from output of "command".
   "command" | getline myvar   Loads "myvar" from output of "command".
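
As a rough sketch of two of these forms in use: getline returns 1 when it reads a record, 0 at end of input, and -1 on error, so loops normally test for a result greater than 0. The temporary file and the variable names ("file", "line", "n") are assumptions made for this example.

```shell
tmp=$(mktemp)
printf 'alpha\nbeta\n' > "$tmp"
out=$(awk -v file="$tmp" '
BEGIN {
    # Read every line of the file into "line" and count the lines.
    while ((getline line < file) > 0)
        n++
    # Read one line of command output into $0, which also splits the fields.
    "echo hello world" | getline
    print n, $2
}')
rm -f "$tmp"
echo "$out"                 # prints: 2 world
```

Writing the loop as "while (getline line < file)" without the comparison is risky, since a result of -1 for an unreadable file counts as true and the loop never ends.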

  • A related function, "close", allows a file to be closed so it can be read from the beginning again:
   close("myfile")
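
A small sketch of rereading a file after closing it, assuming a throwaway temporary file and made-up variable names ("first", "again"):

```shell
tmp=$(mktemp)
printf 'one\ntwo\n' > "$tmp"
out=$(awk -v file="$tmp" '
BEGIN {
    getline first < file    # reads the first line, "one"
    close(file)             # the next read starts at the top again
    getline again < file    # reads "one" a second time, not "two"
    print first, again
}')
rm -f "$tmp"
echo "$out"                 # prints: one one
```
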
  • A new function, "system", allows Awk programs to invoke system commands:
   system("rm myfile")
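
The value returned by "system" is the exit status of the command, which allows a program to check whether the command worked. A minimal sketch, using the standard "true" and "false" utilities as stand-ins for real commands:

```shell
out=$(awk '
BEGIN {
    ok  = system("true")    # a command that succeeds returns 0
    bad = system("false")   # a command that fails returns a nonzero status
    print ok, (bad != 0)
}')
echo "$out"                 # prints: 0 1
```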

  • Command-line parameters can be interpreted using two new predefined variables, ARGC and ARGV, a mechanism instantly familiar to C programmers. ARGC ("argument count") gives the number of command-line elements, and ARGV ("argument vector") is an array whose entries store the elements individually.
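
As an illustration, the following sketch lists its own arguments. ARGV[0] holds the name of the Awk command itself, so the arguments proper run from ARGV[1] to ARGV[ARGC-1]. The arguments "red" and "green" are arbitrary; since the program has only a BEGIN clause, Awk never tries to open them as files.

```shell
out=$(awk '
BEGIN {
    for (i = 1; i < ARGC; ++i)
        printf("%d=%s ", i, ARGV[i])
    print ARGC              # counts ARGV[0] as well as the two arguments
}' red green)
echo "$out"                 # prints: 1=red 2=green 3
```
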
  • There is a new conditional expression, known as "?:", which is used as follows:
   status = (condition == "green")? "go" : "stop"

This translates to:

   if (condition=="green") {status = "go"} else {status = "stop"}

This construct should also be familiar to C programmers.

  • There are new math functions, such as trig and random-number functions:
   sin(x)         Sine, with x in radians.
   cos(x)         Cosine, with x in radians.
   atan2(y,x)     Arctangent of y/x, in range -PI to PI.
   rand()         Random number, with 0 <= number < 1.
   srand(x)       Sets x as the random-number seed; uses the time of day if x is omitted.
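
A short sketch of how these functions are commonly combined: "atan2" to obtain PI, and "rand" scaled to give a random integer. The seed value 42 and the variable names are arbitrary choices for the example.

```shell
out=$(awk '
BEGIN {
    pi = atan2(0, -1)               # the angle of the point (-1, 0) is PI
    printf("%.4f %.1f\n", pi, cos(pi))
    srand(42)                       # fixed seed, so each run repeats
    die = int(rand() * 6) + 1       # random integer from 1 to 6
    print (die >= 1 && die <= 6)    # prints 1, since die is always in range
}')
echo "$out"
```

The first line of output is "3.1416 -1.0". The actual value of "die" is not shown because the random sequence differs from one Awk implementation to another.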

  • There are new string functions, such as match and substitution functions:
    • match(<target string>,<regular expression>)
      Search the target string for the regular expression; return 0 if there is no match, or the starting index of the matched text if there is one. Also sets the built-in variable RSTART to the starting index and the built-in variable RLENGTH to the length of the matched text (0 and -1, respectively, if there is no match).
    • sub(<regular expression>,<replacement string>)
      Search for first match of regular expression in $0 and substitute replacement string. This function returns the number of substitutions made, as do the other substitution functions.
    • sub(<regular expression>,<replacement string>,<target string>)
      Search for first match of regular expression in target string and substitute replacement string.
    • gsub(<regular expression>,<replacement string>)
      Search for all matches of regular expression in $0 and substitute replacement string.
    • gsub(<regular expression>,<replacement string>,<target string>)
      Search for all matches of regular expression in target string and substitute replacement string.
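
The following sketch exercises "match", "sub", and "gsub" on one string. The sample text and the variable names "s" and "n" are invented for the example.

```shell
out=$(awk '
BEGIN {
    s = "error 404: page not found"
    if (match(s, /[0-9]+/))         # sets RSTART and RLENGTH
        print RSTART, RLENGTH, substr(s, RSTART, RLENGTH)
    sub(/o/, "0", s)                # replaces the first "o" only
    print s
    n = gsub(/o/, "0", s)           # replaces every remaining "o"
    print n, s
}')
echo "$out"
```

This prints "7 3 404", then "err0r 404: page not found", then "2 err0r 404: page n0t f0und". Note that the target of "sub" and "gsub" is modified in place, so it has to be a variable or field rather than a constant.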

  • There is a mechanism for handling multidimensional arrays. For example, the following program creates and prints a matrix, and then prints the transposition of the matrix:
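   BEGIN {
      count = 1
      # Fill the 5x3 matrix row by row, printing each element as it is stored.
      for (row = 1; row <= 5; ++row) {
         for (col = 1; col <= 3; ++col) {
            printf("%4d", count)
            array[row,col] = count++
         }
         printf("\n")
      }
      printf("\n")
      # Print the transposition by swapping the order of the loops.
      for (col = 1; col <= 3; ++col) {
         for (row = 1; row <= 5; ++row)
            printf("%4d", array[row,col])
         printf("\n")
      }
      exit
   }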

This yields:

      1   2   3
      4   5   6
      7   8   9
     10  11  12
     13  14  15

      1   4   7  10  13
      2   5   8  11  14
      3   6   9  12  15
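
Underneath, a subscript such as "row,col" is stored as a single string, with the parts joined by the built-in variable SUBSEP. A brief sketch of testing for an element and recovering the parts of its subscript, using the made-up names "grid", "key", and "part":

```shell
out=$(awk '
BEGIN {
    grid[1,2] = "x"
    # Membership test for a two-part subscript.
    if ((1,2) in grid) print "found"
    # Each key is one string; split it on SUBSEP to get the parts back.
    for (key in grid) {
        split(key, part, SUBSEP)
        print part[1], part[2], grid[key]
    }
}')
echo "$out"
```

This prints "found" followed by "1 2 x".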

Nawk also includes a new "delete" statement, which deletes array elements:

   delete array[count]
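
A quick sketch of the effect, using the "in" operator to check which elements remain. The array name "seen" and its keys are arbitrary.

```shell
out=$(awk '
BEGIN {
    seen["a"] = 1
    seen["b"] = 1
    delete seen["a"]            # remove one element, leave the rest
    gone = ("a" in seen)        # 0: the element no longer exists
    kept = ("b" in seen)        # 1: other elements are unaffected
    print gone, kept
}')
echo "$out"                     # prints: 0 1
```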

  • Characters can be expressed as octal codes. "\033", for example, can be used to define an "escape" character.
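
For instance, the octal codes 101, 102, and 103 stand for the ASCII letters A, B, and C. A small sketch, with "esc" as a made-up variable name:

```shell
out=$(awk '
BEGIN {
    esc = "\033"                # the escape character, as a single character
    print length(esc)
    printf("\101\102\103\n")    # octal codes for A, B, C
}')
echo "$out"
```

This prints "1" and then "ABC".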

  • A new built-in variable, FNR, holds the record number within the current input file, and so restarts at 1 each time a new file is opened. NR, by contrast, counts every record read so far, regardless of how many files have contributed to the input. The behavior of FNR is otherwise identical to that of NR.
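
The difference shows up as soon as there is more than one input file. In this sketch the two temporary files and their contents are invented for the example:

```shell
a=$(mktemp); b=$(mktemp)
printf 'a1\na2\n' > "$a"
printf 'b1\n' > "$b"
# Print the running record number, the per-file record number, and the line.
out=$(awk '{ print NR, FNR, $0 }' "$a" "$b")
rm -f "$a" "$b"
echo "$out"
```

This prints "1 1 a1", "2 2 a2", and "3 1 b1": NR keeps counting, while FNR starts over at the second file.
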
  • While Nawk does have useful refinements, they are generally intended to support the development of complicated programs. My feeling is that Nawk represents overkill for all but the most dedicated Awk users, and in any case would require a substantial document of its own to do its capabilities justice. Those who would like to know more about Nawk are encouraged to read THE AWK PROGRAMMING LANGUAGE by Aho / Kernighan / Weinberger. This short, terse, detailed book outlines the capabilities of Nawk and provides sophisticated examples of its use.