What is a regular expression?
A regular expression is a notation for describing a string-matching pattern. Regular expressions make it possible to locate and modify strings that match a particular pattern within textual data, and they are widely used in utility programs and programming languages that manipulate text. Their power comes from the fact that a short pattern can describe a large set of strings.
Various software applications use regular expressions to locate, select or modify particular sections of text. For example, a regular expression could be used to:
- replace the word "snake" with the word "serpent" throughout an entire piece of text
- locate pieces of text containing the words "fox" and "sheep" on the same line
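The two tasks above can be sketched in a few lines. This is only an illustration, assuming Python's `re` module as the tool that interprets the patterns; the sample text and the variable names are made up for the example.

```python
import re

# Hypothetical sample text, one sentence per line.
text = (
    "The snake saw a fox.\n"
    "The fox chased a sheep.\n"
    "A sheep ignored the snake."
)

# Replace the whole word "snake" with "serpent" throughout the text.
# \b marks a word boundary, so a word such as "snakes" is left alone.
replaced = re.sub(r"\bsnake\b", "serpent", text)

# Keep only the lines that contain both "fox" and "sheep", in either order.
both = [
    line
    for line in text.splitlines()
    if re.search(r"\bfox\b", line) and re.search(r"\bsheep\b", line)
]

print(replaced)
print(both)
```

Other tools express the same ideas with slightly different syntax, so the patterns here should be read as one possible spelling rather than a universal one.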
Regular expression components
Regular expressions are made up of three types of components:
- anchors, which specify the position of the pattern in relation to a line of text.
- character sets, which match one or more characters in a single position.
- modifiers, which specify how many times a character set is repeated.
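As a rough illustration of how the three component types combine, the pattern below uses one of each. It assumes Python's `re` syntax, and the helper name `leading_digits` is hypothetical.

```python
import re

# ^      anchor: the match must begin at the start of the line
# [0-9]  character set: any single digit in this position
# +      modifier: the preceding character set repeats one or more times
pattern = re.compile(r"^[0-9]+")


def leading_digits(line):
    """Return the run of digits at the start of line, or None if there is none."""
    match = pattern.search(line)
    return match.group(0) if match else None


print(leading_digits("42 apples"))
print(leading_digits("apples 42"))
```

The second call finds nothing because the anchor restricts the match to the start of the line, even though the digits appear later in it.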
Syntax varies across application programs
The syntax of regular expressions varies across application programs. For example, the shell uses a limited form of pattern matching, sometimes called shell regular expressions, for filename substitution, whereas AWK uses a superset of the extended regular expression syntax.
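The difference between shell-style filename patterns and full regular expressions can be seen side by side. The sketch below uses Python's `fnmatch` module as a stand-in for the shell's filename matching (it is not the shell itself), and the file names are invented for the example.

```python
import fnmatch
import re

# Hypothetical file names.
names = ["notes.txt", "notes.txt.bak", "readme.md"]

# Shell-style pattern: "*" on its own means "any run of characters",
# and "." is an ordinary character.
shell_hits = [n for n in names if fnmatch.fnmatchcase(n, "*.txt")]

# Regular expression: "." means "any single character" and "*" repeats
# the item before it, so "any run of characters" is written ".*" and a
# literal dot has to be escaped as "\.".
regex_hits = [n for n in names if re.fullmatch(r".*\.txt", n)]

print(shell_hits)
print(regex_hits)
```

Both patterns select the same file, but the same symbols carry different meanings in each notation, which is why a pattern written for one tool often cannot be pasted into another unchanged.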
Regular expressions are supported by various software tools, including command line tools, plain text editors and programming languages. Most of these tools are available for various computing platforms, including Linux, Windows and Mac OS X. The tools use slightly different syntax styles. The notable categories are:
- Command line tools
- Plain text editors
- Programming languages
A regular expression can be thought of as a small computer program that finds or isolates a subset of a larger body of text. In the same way that an ordinary computer program needs a computer to execute it, a regular expression needs a software application to interpret it, that is, to give it meaning.
For example, a regular expression can be used to tell an editor to find the next occurrence of the word "Chapter" followed by several spaces and digits. Or you can use a regular expression to tell the UNIX grep command to show only those lines of a file that contain the word "Wiki" followed by either the word "Books" or the word-fragment "pedia". We will discuss the exact syntax of such regular expressions in the next chapter.
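As a preview, here is one way the two examples above might look. The sketch assumes Python's `re` syntax rather than that of any particular editor or of grep, treats "several spaces and digits" as one or more of each, and assumes "Wiki" is immediately followed by "Books" or "pedia". The sample text is invented.

```python
import re

# "Chapter", then one or more spaces, then one or more digits.
chapter = re.compile(r"Chapter +[0-9]+")

# "Wiki" followed directly by either "Books" or "pedia".
wiki = re.compile(r"Wiki(Books|pedia)")

# Hypothetical sample text.
text = "\n".join([
    "Preface",
    "Chapter  12 begins here",
    "See Wikipedia for details",
    "WikiBooks hosts this text",
    "Wikimedia is something else",
])

# Like an editor's "find next": the first occurrence in the text.
first_chapter = chapter.search(text)

# Like grep: only the lines that contain a match.
wiki_lines = [line for line in text.splitlines() if wiki.search(line)]

print(first_chapter.group(0))
print(wiki_lines)
```

The line mentioning "Wikimedia" is filtered out because neither alternative in the parentheses follows "Wiki" there.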