Guide to Unix/Explanations/awk


The name 'awk' is derived from the names of the three people who originally developed it: Aho, Weinberger and Kernighan. It is a programming language built around pattern-action statements that transform input into output. awk reads its input (usually a file of data) one line at a time and tests each line against the given pattern. The action is applied to every line that matches, and the result forms the output. A line that does not match is ignored.

Each input line is divided into fields by a separator (by default, any run of spaces or tabs). Fields are referenced in the usual Unix style: $1 is field 1, $2 is field 2, and so on, while $0 means the entire input line. Patterns can be matched against individual fields as well as against the whole line.
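As a small illustration of field splitting, the sketch below feeds one made-up line to awk and prints individual fields. The sample text and the shell variable names (fields, colon) are assumptions chosen for demonstration; -F is the standard awk option for selecting a different field separator.

```shell
# A made-up input line, split on the default separator (runs of blanks).
fields=$(printf 'alpha   beta gamma\n' | awk '{ print $2; print $0 }')
printf '%s\n' "$fields"

# The same idea with a colon as the separator, selected with -F.
colon=$(printf 'root:x:0:0\n' | awk -F: '{ print $1, $3 }')
printf '%s\n' "$colon"
```

The first command prints the second field and then the untouched line, which shows that $0 keeps the original spacing.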

If no pattern is specified then all input lines are selected. If no action is specified, the default action is to print the entire line. Therefore, if you just want to print a subset of the input lines, you only need to supply a pattern that selects them, and awk will print each matching line unchanged.

However, you can also specify which fields are to be output, e.g. print $1.
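The two defaults can be seen side by side in the following sketch. The three input lines and the variable names are made up for illustration and are not part of the original text.

```shell
# Three made-up lines used as input for both commands below.
sample='1 one
2 two
3 three'

# Pattern only: the default action prints each matching line in full.
pattern_only=$(printf '%s\n' "$sample" | awk '$1 >= 2')

# Action only: with no pattern, the action runs on every line.
action_only=$(printf '%s\n' "$sample" | awk '{ print $2 }')

printf '%s\n' "$pattern_only" "$action_only"
```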

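A simple example, which prints the second and third fields of every line whose first field contains an "A" (the -F: option sets the field separator to a colon, because /etc/passwd is colon-separated):

awk -F: '$1 ~ /A/ { print $2 " " $3 }' /etc/passwd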

Program Structure

awk programs consist of a sequence of one or more pattern-action statements:

pattern   { action }
pattern   { action }
 :
 :

awk scans input lines of data and performs actions on those lines that match any of the specified patterns.
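A short sketch of a program with two pattern-action statements follows. Each input line is tested against every pattern in turn, so a line that matches both patterns triggers both actions. The input numbers and the messages are made up for illustration.

```shell
# Two pattern-action statements; each input line is tested against both.
multi=$(printf '1\n5\n9\n' | awk '
    $1 > 3 { print $1, "is greater than 3" }
    $1 < 7 { print $1, "is less than 7" }
')
printf '%s\n' "$multi"
```

Here the line containing 5 satisfies both patterns and therefore produces two lines of output.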


Running AWK

Here we call awk from a shell script awk1.sh:

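#!/bin/sh

# awk1.sh: print every line of the file named by the first argument.
# The "$1" after the closing quote is the shell's first argument (the file
# name), not an awk field.
awk '
    { print }
' "$1"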

There is no pattern, so every line fed into awk matches and the action is invoked, which prints every line of the file to the screen. Thus awk1.sh behaves much like cat.

To demonstrate, create the file numeric.dat with the contents:

1 one   i
2 two   ii
3 three iii
4 four  iv
5 five  v
6 six   vi
7 seven vii
8 eight viii
9 nine  ix
10 ten  x

Run awk1.sh on numeric.dat (don't forget to make the script executable):

./awk1.sh numeric.dat
1 one   i
2 two   ii
3 three iii
4 four  iv
5 five  v
6 six   vi
7 seven vii
8 eight viii
9 nine  ix
10 ten  x

(Notice the ./ prefix, which tells the shell to run the script from the current directory.)

Expressions

If the first field is equal to one then print the entire line:

#!/bin/sh
# awk1.sh
# Inside the quotes, $1 is awk's first field; outside, "$1" is the file name.
awk '
    $1 == 1 { print $0 }
' "$1"

Results in:

1 one   i

If the second field is equal to "two" then print the entire line:

$2 == "two" { print $0 }

Results in:

2 two   ii

If the first field is greater than 5 then print the third field:

$1 > 5 { print $3 }

Results in:

vi
vii
viii
ix
x
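Patterns may also contain arithmetic. As a hedged sketch that goes one step beyond the comparisons above, the following selects lines whose first field is even. The modulo test and the shell variable names are illustrative additions; the data is the numeric.dat sample, held in a variable so the snippet runs on its own.

```shell
# The sample data from numeric.dat, held in a shell variable for convenience.
numeric_data='1 one   i
2 two   ii
3 three iii
4 four  iv
5 five  v
6 six   vi
7 seven vii
8 eight viii
9 nine  ix
10 ten  x'

# Arithmetic in a pattern: select lines whose first field is even.
evens=$(printf '%s\n' "$numeric_data" | awk '$1 % 2 == 0 { print $1, $2 }')
printf '%s\n' "$evens"
```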

Regular Expressions

Print the input line if the pattern "ix" is matched anywhere in the line:

/ix/ { print $0 }

Results in:

6 six   vi
9 nine  ix

Print the input line if the pattern "ix" is matched in the third field:

$3 ~ /ix/ { print $0 }

Results in:

9 nine  ix

Print the input lines that do not contain the pattern "x":

$0 !~ /x/ { print }

Results in:

1 one   i
2 two   ii
3 three iii
4 four  iv
5 five  v
7 seven vii
8 eight viii
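The field-level match operators ~ and !~ can be combined with anchors such as ^. The sketch below is an illustrative addition that reuses the numeric.dat sample in a shell variable; the particular patterns and variable names are assumptions chosen for the example.

```shell
# The sample data from numeric.dat, held in a shell variable for convenience.
numeric_data='1 one   i
2 two   ii
3 three iii
4 four  iv
5 five  v
6 six   vi
7 seven vii
8 eight viii
9 nine  ix
10 ten  x'

# Second field begins with "t" (the ^ anchors the match to the field start).
starts_with_t=$(printf '%s\n' "$numeric_data" | awk '$2 ~ /^t/ { print $2 }')

# Third field does not contain an "i" anywhere.
no_i=$(printf '%s\n' "$numeric_data" | awk '$3 !~ /i/ { print $1 }')

printf '%s\n' "$starts_with_t" "$no_i"
```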

Compound expressions

Print lines where the third field matches the pattern "x" OR the first field is less than or equal to 3:

$3 ~ /x/ ||  $1 <= 3  { print $0 }

Results in:

1 one   i
2 two   ii
3 three iii
9 nine  ix
10 ten  x

Print lines where the third field matches the pattern "vi" AND the second field begins with the letter "s":

$3 ~ /vi/ && $2 ~ /^s/  { print $0 }

Results in:

6 six   vi
7 seven vii
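Compound expressions can also be grouped with parentheses and negated with the ! operator. The following is a minimal sketch on made-up input; the data and the variable name are assumptions for illustration only.

```shell
# Keep lines whose second field is "a" or "b", unless the first field exceeds 2.
grouped=$(printf '1 a\n2 b\n3 a\n4 c\n' | awk '
    ($2 == "a" || $2 == "b") && !($1 > 2) { print $0 }
')
printf '%s\n' "$grouped"
```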

Ranges

Print every line from the first line whose second field equals "three" through the next line whose third field equals "vii", inclusive:

$2 == "three", $3 == "vii" { print $0 }

Results in:

3 three iii
4 four  iv
5 five  v
6 six   vi
7 seven vii
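A range is not limited to a single stretch of the input: once the end pattern has matched, awk starts looking for the start pattern again. The sketch below shows this on made-up input; the marker words "start" and "end" are illustrative assumptions.

```shell
# The range /start/, /end/ opens twice in this input; lines outside are skipped.
ranges=$(printf 'a\nstart\nb\nend\nc\nstart\nd\nend\n' | awk '/start/, /end/')
printf '%s\n' "$ranges"
```

No action is given, so each line inside a range is printed in full; the lines "a" and "c" fall outside both ranges and are dropped.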

BEGIN and END

BEGIN is a special pattern whose action runs before the first input line is read. Similarly, the action for END runs after the last input line has been processed.

BEGIN { print "start at 3..." }
$2 == "three", $2 ~ /^e/ { print $1 }
END { print "...and end at eight" }

Results in:

start at 3...
3
4
5
6
7
8
...and end at eight
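A common use of END is to report a value accumulated while reading the input. The sketch below is an illustrative addition: the variable total is a made-up name, and NR is awk's built-in count of input lines read so far, which the text above has not introduced.

```shell
# BEGIN runs first, the middle statement runs on every line, END runs last.
summary=$(printf '10\n20\n30\n' | awk '
    BEGIN { print "summing" }
          { total += $1 }
    END   { print "lines:", NR; print "total:", total }
')
printf '%s\n' "$summary"
```

awk variables start out empty and are treated as zero in arithmetic, so total needs no explicit initialisation here.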