Guide to Unix/Explanations/awk
The name 'awk' is derived from the names of the three people who originally developed it - Aho, Weinberger and Kernighan. It is a programming language which uses a pattern-action expression that transforms the input to the output. It processes the input (usually a file of data), searching each line for the given pattern. Any line that matches the given pattern has the action applied to it and this constitutes the output. A line that does not match is ignored.
Each input line is divided into fields by a separator (by default, any run of spaces or tabs). Fields are referenced in the usual Unix style, $1 being field 1, $2 being field 2 and so on, and patterns can be matched against them. $0 means the entire input line.
If no pattern is specified then all input lines are selected. If no action is specified, the default action is to print the entire line. Therefore, if you just want to print a subset of the input, you only need to supply a pattern that selects the desired lines; awk will print them as found.
You can also choose which fields are output by naming them in the action, e.g. print $1.
A simple example (the -F: option sets the field separator to a colon, which is what /etc/passwd uses):
awk -F: '$1 ~ /A/ { print $2 " " $3 }' /etc/passwd
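As a minimal sketch of how a line is split into fields, the following pipes one made-up line of text into awk and prints individual fields and then the whole line. The sample text and the shell variable name out are illustrative only, and a POSIX-compatible awk is assumed to be on the PATH.

```shell
# Hypothetical one-line input; note the run of several blanks after "alpha".
out=$(printf 'alpha   beta gamma\n' | awk '{
    print $1            # first field
    print $3 " " $2     # third and second fields, joined by one space
    print $0            # the entire line, original spacing preserved
}')
echo "$out"
```

The run of blanks counts as a single separator, so "beta" is still field 2, while $0 keeps the line exactly as it was read.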
Program Structure
awk programs consist of a sequence of one or more pattern-action statements:
pattern { action }
pattern { action }
:
:
awk scans input lines of data and performs actions on those lines that match any of the specified patterns.
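The following sketch illustrates a program with two pattern-action statements. Every statement is tried against every line, so a line that matches both patterns triggers both actions. The three-line input and the labels "big:" and "t-word:" are invented for the illustration, and a POSIX-compatible awk is assumed.

```shell
# Hypothetical input: three lines of two fields each.
out=$(printf '1 one\n2 two\n3 three\n' | awk '
$1 >= 2   { print "big: " $2 }      # first statement
$2 ~ /^t/ { print "t-word: " $2 }   # second statement, tested independently
')
echo "$out"
```

The line "1 one" matches neither pattern and produces no output; the other two lines each produce two lines of output, in the order the statements appear in the program.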
Running AWK
For the purpose of this article we use the command nawk, which is an improved version of the original awk.
Here we call nawk from a shell script nawk1.sh:
#!/bin/bash
# nawk1.sh
nawk '
{ print }
' "$1"
There is no pattern, so every line fed into nawk is matched and the action is invoked, which results in every line of the file being printed on the screen. Thus nawk1.sh behaves similarly to cat.
To demonstrate, create the file numeric.dat with the contents:
1 one i
2 two ii
3 three iii
4 four iv
5 five v
6 six vi
7 seven vii
8 eight viii
9 nine ix
10 ten x
Run nawk1.sh on numeric.dat (don't forget to make the script executable):
./nawk1.sh numeric.dat
1 one i
2 two ii
3 three iii
4 four iv
5 five v
6 six vi
7 seven vii
8 eight viii
9 nine ix
10 ten x
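One possible way to set this up without an editor is a shell here-document. The sketch below creates numeric.dat and then runs the same print-everything program directly from the command line instead of through nawk1.sh. It uses awk in place of nawk, on the assumption that the installed awk is a POSIX-compatible implementation; the variable name out is illustrative.

```shell
# Create the sample data file with a here-document (one record per line).
cat > numeric.dat <<'EOF'
1 one i
2 two ii
3 three iii
4 four iv
5 five v
6 six vi
7 seven vii
8 eight viii
9 nine ix
10 ten x
EOF

# Same program as in nawk1.sh; "awk" stands in for nawk here (an assumption).
out=$(awk '{ print }' numeric.dat)
echo "$out"
```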
Expressions
If the first field is equal to 1 then print the entire line:
#!/bin/sh
# nawk1.sh
nawk '
$1 == 1 { print $0 }
' "$1"
Results in:
1 one i
If the second field is equal to "two" then print the entire line:
$2 == "two" { print $0 }
Results in:
2 two ii
If the first field is greater than 5 then print the third field:
$1 > 5 { print $3 }
Results in:
vi
vii
viii
ix
x
Regular Expressions
Print the input line if the pattern "ix" is matched anywhere in the line:
/ix/ { print $0 }
Results in:
6 six vi
9 nine ix
Print the input line if the pattern "ix" is matched in the third field:
$3 ~ /ix/ { print $0 }
Results in:
9 nine ix
Print the input lines that do not contain the pattern "x":
$0 !~ /x/ { print }
Results in:
1 one i
2 two ii
3 three iii
4 four iv
5 five v
7 seven vii
8 eight viii
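Regular expressions can also be anchored to the start (^) or end ($) of the text being matched. The sketch below applies both anchors to the third field of the sample data. It recreates numeric.dat so that it can be run on its own, assumes a POSIX-compatible awk, and uses the illustrative variable names starts_v and ends_v.

```shell
# Recreate the sample data used throughout this article.
printf '%s\n' '1 one i' '2 two ii' '3 three iii' '4 four iv' '5 five v' \
    '6 six vi' '7 seven vii' '8 eight viii' '9 nine ix' '10 ten x' > numeric.dat

# Third field begins with "v": v, vi, vii, viii.
starts_v=$(awk '$3 ~ /^v/ { print $1 }' numeric.dat)

# Third field ends with "v": iv, v.
ends_v=$(awk '$3 ~ /v$/ { print $1 }' numeric.dat)

echo "begins with v: $starts_v"
echo "ends with v: $ends_v"
```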
Compound expressions
Print lines where the third field matches the pattern "x" OR the first field is less than or equal to 3.
$3 ~ /x/ || $1 <= 3 { print $0 }
Results in:
1 one i
2 two ii
3 three iii
9 nine ix
10 ten x
Print lines where the third field matches the pattern "vi" AND the second field begins with the letter "s".
$3 ~ /vi/ && $2 ~ "^s" { print $0 }
Results in:
6 six vi
7 seven vii
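Alongside || and &&, a condition can be negated with !. As a small sketch, the following selects lines whose first field is greater than 3 and whose third field does not contain the letter "i". It recreates numeric.dat so that it is self-contained and assumes a POSIX-compatible awk; the variable name out is illustrative.

```shell
# Recreate the sample data used throughout this article.
printf '%s\n' '1 one i' '2 two ii' '3 three iii' '4 four iv' '5 five v' \
    '6 six vi' '7 seven vii' '8 eight viii' '9 nine ix' '10 ten x' > numeric.dat

# Parentheses make the scope of the negation explicit.
out=$(awk '$1 > 3 && !($3 ~ /i/) { print $0 }' numeric.dat)
echo "$out"
```

Of the lines numbered above 3, only "v" and "x" lack an "i" in the third field, so lines 5 and 10 are printed.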
Ranges
Print every line from the first line whose second field equals "three" through the next line whose third field equals "vii", inclusive:
$2 == "three", $3 == "vii" { print $0 }
Results in:
3 three iii
4 four iv
5 five v
6 six vi
7 seven vii
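A range whose end pattern never matches stays open until the end of the input. The sketch below shows this with an end condition chosen so that no line in the sample data satisfies it. It recreates numeric.dat so that it is self-contained and assumes a POSIX-compatible awk; the value 100 and the variable name out are illustrative.

```shell
# Recreate the sample data used throughout this article.
printf '%s\n' '1 one i' '2 two ii' '3 three iii' '4 four iv' '5 five v' \
    '6 six vi' '7 seven vii' '8 eight viii' '9 nine ix' '10 ten x' > numeric.dat

# The range opens at line 8; no line has a first field of 100,
# so the range runs through the last line of the file.
out=$(awk '$1 == 8, $1 == 100 { print $1 }' numeric.dat)
echo "$out"
```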
BEGIN and END
BEGIN is a special pattern which matches before the first input line. Similarly END matches after the last input line.
BEGIN { print "start at 3..." }
$2 == "three", /vii/ && $2 ~ /^e/ { print $1 }
END { print "...and end at eight" }
Results in:
start at 3...
3
4
5
6
7
8
...and end at eight
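A common use of BEGIN and END is to initialise a value before the input is read and report it afterwards. The sketch below adds up the first field of every line in the sample data. The awk variable name total and the output text are made up for the illustration; the block recreates numeric.dat so that it is self-contained and assumes a POSIX-compatible awk.

```shell
# Recreate the sample data used throughout this article.
printf '%s\n' '1 one i' '2 two ii' '3 three iii' '4 four iv' '5 five v' \
    '6 six vi' '7 seven vii' '8 eight viii' '9 nine ix' '10 ten x' > numeric.dat

out=$(awk '
BEGIN { total = 0 }                           # runs once, before any input is read
      { total = total + $1 }                  # no pattern: runs for every input line
END   { print "sum of first fields: " total } # runs once, after the last line
' numeric.dat)
echo "$out"
```

The first fields are the numbers 1 to 10, so the total reported is 55.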