Regular Expressions/Introduction

From Wikibooks, the open-content textbooks collection


Editor's note

ATTENTION EDITORS: In general, please use the phrase "regular expression" instead of the abbreviation "regex" in this book. It sounds more professional, and it clearly refers to the subject of the book. -- User:Franl

A regular expression (regex) is a string of text written in a very concise language. Various software applications use regular expressions to match patterns in text. Using a regular expression and a tool that understands it, a user can simplify the search for a particular textual pattern. For example, you can use a regular expression to tell an editor to find the next occurrence of the word "Chapter" followed by one or more spaces followed by one or more digits. Or you can use a regular expression to tell the UNIX grep command to show only those lines of a file that contain the word "Wiki" followed by either the word "Books" or the word-fragment "pedia". We'll discuss the exact syntax of such regular expressions in the next chapter.
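As a preview, the two examples above can be sketched in Python using its standard re module. This is only an illustration under stated assumptions: the sample strings are made up, and the pattern spelling shown is the one Python accepts, which may differ slightly from what a given editor or grep expects.

```python
import re

# The word "Chapter", then one or more spaces, then one or more digits.
chapter_pattern = re.compile(r"Chapter +[0-9]+")

# The word "Wiki" followed by either "Books" or "pedia".
wiki_pattern = re.compile(r"Wiki(Books|pedia)")

# Hypothetical sample text, invented for this illustration.
text = "See Chapter  12 for details."
match = chapter_pattern.search(text)
chapter_hit = match.group(0) if match else None

# Keep only the lines that contain a match, roughly what grep does.
lines = ["Welcome to WikiBooks", "A wiki is a website", "Wikipedia is large"]
wiki_lines = [line for line in lines if wiki_pattern.search(line)]

print(chapter_hit)
print(wiki_lines)
```

The second pattern is case-sensitive, so the line containing only the lowercase word "wiki" is not selected.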

Regular expressions are a powerful text examination and manipulation tool. In a sense, a regular expression is a little computer program that finds or isolates a subset of a larger set of text. In the same way that an ordinary computer program needs a computer to execute it, a regular expression needs a software application to interpret it, that is, to give it meaning. There are a variety of software applications that implement regular expressions. Let's look at the notable ones.

Supporting Software

Regular expressions are supported by various software tools, including command line tools, plain text editors and programming languages. Most of these tools are available for various computing platforms, including Linux, Windows and Mac OS X.

The tools:

  • Command line tools
    • grep
    • egrep
    • sed
    • awk
  • Plain text editors
    • vi
    • Emacs
  • Programming languages
    • Java
    • JavaScript
    • .NET
    • Perl
    • Ruby
    • Tcl

The tools use slightly different styles of syntax. Perl's own syntax is the model for the Perl Compatible Regular Expressions (PCRE) library, which imitates it and is used by a number of other tools. TextPad uses POSIX-style regular expressions. We will cover the PCRE style as an introduction and later discuss the differences between the PCRE and POSIX styles.
