An Awk Primer/Standard Functions


Below is a list of the standard Awk functions. Optional arguments are shown in square brackets.

Numerical functions

Numerical functions work with numbers. All of them return a number and have only numerical parameters, or no parameters at all.

  • int(x) returns x rounded towards zero. For example, int(-3.9) returns -3, while int(3.9) returns 3.
  • sqrt(x) returns the square root of x.
  • exp(x) returns e^x, the exponential of x.
  • log(x) returns the natural logarithm of x.
  • sin(x) returns the sine of x, where x is in radians.
  • cos(x) returns the cosine of x, where x is in radians.
  • atan2(y,x) works like the C function of the same name; see below for details.
  • rand() returns a pseudo-random number in the interval [0,1) (that is, at least 0 and less than 1). If the same program is run more than once, some implementations (e.g. GNU Awk 3.1.8) produce the same series of random numbers each time, while others (e.g. mawk 1.3.3) produce a different series on each run.
  • srand([x]) sets x as the seed for rand(). Without an argument, it uses the time of day as the seed. It returns the previous seed.
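The short program below exercises the numerical functions from a POSIX shell. It is a sketch that assumes a POSIX-compatible awk is available as `awk`; the seed 42 is an arbitrary choice. The actual random values differ between implementations, so the program only shows that re-seeding with the same value repeats the same sequence.

```shell
awk 'BEGIN {
    print int(3.9), int(-3.9)         # 3 -3  (truncation toward zero)
    print sqrt(16), exp(0), log(1)    # 4 1 0

    # rand() is in [0,1), so int(rand() * 6) + 1 is an integer from 1 to 6
    srand()                           # seed from the time of day
    print "die roll:", int(rand() * 6) + 1

    # Re-seeding with the same value repeats the same sequence
    srand(42); a = rand(); b = rand()
    srand(42); c = rand(); d = rand()
    print ((a == c && b == d) ? "same sequence" : "different sequence")
}'
```

The expression int(rand() * n) + 1 is a common way to get a random integer from 1 to n, relying on int() truncating and on rand() never returning 1.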

atan2

atan2(y,x) returns the angle \alpha, in radians, such that:

  • -\pi < \alpha \le \pi
  • x = \sqrt{x^2+y^2}\cos\alpha
  • y = \sqrt{x^2+y^2}\sin\alpha

The formulas are

\operatorname{atan2}(y, x) = \begin{cases}
\arctan\left(\frac{y}{x}\right) & \qquad x > 0 \\
\arctan\left(\frac{y}{x}\right) + \pi & \qquad y \ge 0,\ x < 0 \\
\arctan\left(\frac{y}{x}\right) - \pi & \qquad y < 0,\ x < 0 \\
\frac{\pi}{2} \operatorname{sgn} y & \qquad x = 0
\end{cases}
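A quick way to check these cases is to evaluate atan2() at a few points. This sketch assumes a POSIX awk; the idiom atan2(0, -1) yields π because y = 0 and x < 0 selects arctan(0) + π. Non-integer results are printed through the default OFMT ("%.6g").

```shell
awk 'BEGIN {
    pi = atan2(0, -1)                    # y = 0, x < 0 gives pi
    print pi                             # 3.14159
    print atan2(1, 0)                    # x = 0, y > 0 gives pi/2: 1.5708
    print atan2(1, 1), atan2(-1, -1)     # 0.785398 -2.35619
    print atan2(1, 1) * 180 / pi         # 45 (converted to degrees)
}'
```

Note that atan2(-1, -1) falls in the third case (y < 0, x < 0), so π is subtracted and the result is negative, keeping it within (-π, π].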

String functions

String functions work with strings. All of them have at least one string parameter, which sometimes can be omitted. For most of them, all parameters are strings and/or regular expressions.

Note that in Awk strings, characters are numbered from 1. For example, in the string "cat", the character number 1 is "c", the character number 2 is "a", the character number 3 is "t".

Below, s and t are strings and regexp is a regular expression.

  • length([s]) returns the number of characters in s (in $0 by default).
  • split(s, A [,regexp]) splits s into the array A, using regexp (FS by default) as the field separator, and returns the number of fields. If regexp is empty ("" or //), some implementations (e.g. gawk) split s into individual characters, while others (e.g. mawk 1.3.3) return an array of one element containing the whole string s.
  • sprintf(format [,expression, ..., expression]) formats the expressions according to format, in the same way as the C and C++ function sprintf, and returns the resulting string. See the Wikipedia article for more information.
  • gsub(regexp, s [,t]) substitutes s for every match of regexp in t ($0 by default) and returns the number of substitutions. Since t is modified in place, it must be a variable, a field, or an array element.
  • sub(regexp, s [,t]) works like gsub(), but substitutes only the first match of regexp in t ($0 by default). It returns 1 if a substitution was made and 0 otherwise.
    • In sub() and gsub(), & in the string s stands for the whole matched text. Use \& for a literal &. Note that \& must be typed as \\& in an Awk string literal, because the string literal itself consumes one level of backslash escapes.
  • index(s, t) returns the position of the first occurrence of t in s, or 0 if s does not contain t. For example, index("hahaha", "ah") returns 2, while index("hahaha", "foo") returns 0.
  • match(s, regexp) is like index(), but searches for a regular expression rather than a fixed string. It also sets RSTART to the return value and RLENGTH to the length of the matched substring; if there is no match, RSTART is 0 and RLENGTH is -1. If the match is an empty string, RSTART is the position of the character just after it (length(s)+1 if the match is at the end of s) and RLENGTH is 0.
  • tolower(s) returns a copy of s with all uppercase letters converted to lowercase.
  • toupper(s) returns a copy of s with all lowercase letters converted to uppercase.
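The program below is a sketch of the string functions above, assuming a POSIX-compatible awk available as `awk`; the input strings are made-up examples. It shows that split() numbers fields from 1, that gsub() and sub() change their third argument in place, and how & and \\& behave in the replacement.

```shell
awk 'BEGIN {
    # split() fills d[1], d[2], ... and returns the number of fields
    n = split("2024-05-17", d, "-")
    print n, d[1], d[3]                     # 3 2024 17

    # gsub() changes s in place and returns the number of substitutions;
    # & in the replacement stands for the matched text
    s = "cooperation"
    cnt = gsub(/o/, "[&]", s)
    print cnt, s                            # 3 c[o][o]perati[o]n

    # sub() replaces only the first match; "\\&" yields a literal &
    t = "salt and pepper"
    sub(/ and /, " \\& ", t)
    print t                                 # salt & pepper

    # match() returns the start position and sets RSTART and RLENGTH
    if (match("foo123bar", /[0-9]+/))
        print RSTART, RLENGTH               # 4 3

    print index("hahaha", "ah"), sprintf("%.2f and %03d", 3.14159, 7)  # 2 3.14 and 007
    print toupper("awk"), tolower("GNU Awk")                          # AWK gnu awk
}'
```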

GNU Awk extensions

String function

gensub(regexp, s, h [, t]) replaces the h-th match of regexp with s in the string t ($0 by default). For example, gensub(/o/, "O", 3, t) replaces the third "o" in t with "O".

  • Unlike sub() and gsub(), it returns the modified string and leaves t unchanged.
  • If h is a string starting with g or G, all matches are replaced.
  • As in sub() and gsub(), & in the string s stands for the whole matched text. Use \& for a literal &.
    • \& must be typed as \\& in order to avoid the backslash escape in Awk strings.
  • Unlike sub() and gsub(), \0 in the string s means the same as &, while \1 ... \9 refer to the first through ninth parenthesized subexpressions.
    • Similarly, \0 ... \9 must be typed as \\0 ... \\9 for the same reason.

Examples:

  • print(gensub(/o/, "O", 3, "cooperation")) prints cooperatiOn
  • print(gensub(/o/, "O", "g", "cooperation")) prints cOOperatiOn
  • print(gensub(/o+/, "(&)", "g", "cooperation")) prints c(oo)perati(o)n
  • print(gensub(/(o+)(p+)/, "<[\\1](\\2)>", "g", "I oppose any cooperation")) prints I <[o](pp)>ose any c<[oo](p)>eration
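One more example of parenthesized subexpressions, written as a complete shell command. This sketch assumes GNU Awk is installed as `gawk`, since gensub() is not available in other implementations; the date string is a made-up input. It reorders the parts of a date and confirms that the original variable is left unchanged.

```shell
gawk 'BEGIN {
    date = "2024-05-17"
    # \\1 .. \\3 refer to the parenthesized groups; "g" replaces every match
    print gensub(/([0-9]+)-([0-9]+)-([0-9]+)/, "\\3.\\2.\\1", "g", date)   # 17.05.2024
    print date                                   # unchanged: 2024-05-17
    # Replace only the second run of digits
    print gensub(/[0-9]+/, "MM", 2, date)        # 2024-MM-17
}'
```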

Array functions

Below, A and B are arrays.

  • length(A) returns the number of elements in A.
  • asort(A [,B]) - if B is not given, sorts the values of A, and the indices of A are replaced by sequential integers starting with 1. If B is given, copies A to B, then sorts B as above, while A remains unchanged. Returns the number of elements in A.
  • asorti(A [,B]) - if B is not given, discards the values of A and sorts its indices. The sorted indices become the new values, and sequential integers starting with 1 become the new indices. As with asort(), if B is given, copies A to B, then sorts B's indices as above, while A remains unchanged. Returns the number of elements in A.
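The sketch below shows the two-argument forms, which leave the source array intact. It assumes GNU Awk is installed as `gawk`; the array name fruit and its contents are hypothetical. Because the sorted results use the indices 1 to n, they can be walked with an ordinary counting loop.

```shell
gawk 'BEGIN {
    fruit["x"] = "cherry"; fruit["y"] = "apple"; fruit["z"] = "banana"

    n = asort(fruit, vals)            # vals[1..n]: values in sorted order
    line = vals[1]
    for (i = 2; i <= n; i++) line = line " " vals[i]
    print line                        # apple banana cherry

    n = asorti(fruit, keys)           # keys[1..n]: indices in sorted order
    line = keys[1]
    for (i = 2; i <= n; i++) line = line " " keys[i]
    print line                        # x y z

    print length(fruit), fruit["y"]   # 3 apple  (fruit is unchanged)
}'
```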

Other standard functions

GNU Awk also has:

  • time functions
  • bit manipulation functions
  • internationalization functions.

See the man page (man gawk) for more information.
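For a first taste of the time and bit manipulation functions, the sketch below uses and(), or(), xor(), lshift(), rshift(), strftime() and systime(). These names come from the gawk documentation rather than from this page, and the program assumes a reasonably recent gawk that accepts an optional UTC flag as the third argument of strftime(). The timestamp 0 (the Unix epoch) is chosen so that the formatted date does not depend on the local time zone.

```shell
gawk 'BEGIN {
    # Bit manipulation: 12 = 1100 in binary, 10 = 1010 in binary
    print and(12, 10), or(12, 10), xor(12, 10)   # 8 14 6
    print lshift(1, 4), rshift(256, 4)           # 16 16

    # Time: format a fixed timestamp (0 = the Unix epoch) in UTC
    print strftime("%Y-%m-%d %H:%M:%S", 0, 1)    # 1970-01-01 00:00:00
    print "seconds since the epoch:", systime()
}'
```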