JavaScript/Regular expressions

From Wikibooks, open books for an open world

Overview

JavaScript supports regular expressions (regex for short) for finding matches within a string. As in other scripting languages, this allows searches that go beyond a simple letter-by-letter comparison, and it can even be used to parse strings that follow a particular format.

Unlike strings, regular expression literals are delimited by the slash (/) character, and may have single-letter options (modifiers) appended after the closing slash.

Regular expressions most commonly appear in conjunction with the string.match() and string.replace() methods.

At a glance, by example:

strArray = "Hello world!".match(/world/); // Singleton array; note the slashes
strArray = "Hello!".match(/l/g); // Matched strings are returned in a string array
"abc".match(/a(b)c/)[1] === "b" // Matched subgroup is the 2nd item (index 1)
str1 = "Hey there".replace(/Hey/g, "Hello");
str2 = "N/A".replace(/\//g, ","); // Slash is escaped with \
str3 = "Hello".replace(/l/g, "m").replace(/H/g, "L").replace(/o/g, "a"); // Chained replacements; result is "Lemma"
if (str3.match(/emma/)) { console.log("Yes"); }
if (str3.match("emma")) { console.log("Yes"); } // Quotes work as well
"abbc".replace(/(.)\1/g, "$1") === "abc" // Backreference

Compatibility

JavaScript's regular expression syntax follows the extended style, in which the grouping and alternation characters are written without a backslash. A pattern copied from JavaScript into another tool often works unchanged, but older programs that expect the basic syntax may interpret it differently.

  • In the search term, \1 is used to back reference a matched group, as in other implementations.
  • In the replacement string, $1 is substituted with a matched group in the search, instead of \1.
    • Example: "abbc".replace(/(.)\1/g, "$1") => "abc"
  • | is magic, \| is literal
  • ( is magic, \( is literal
  • The lookahead syntaxes (?=...) and (?!...) are available. The lookbehind syntaxes (?<=...) and (?<!...) were added in ECMAScript 2018 and are missing from older engines.
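
The points about backreferences and escaped characters in the list above can be checked with a few short expressions. This is only a minimal sketch; the variable names are chosen for illustration and are not part of any API.

```javascript
// \1 inside the pattern refers back to the first captured group.
const doubled = "abbc".match(/(.)\1/)[0]; // "bb"

// $1 inside the replacement string inserts the first captured group.
const collapsed = "abbc".replace(/(.)\1/g, "$1"); // "abc"

// Unescaped | means "or"; escaped \| matches a literal vertical bar.
const alternatives = "cat dog".match(/cat|dog/g); // ["cat", "dog"]
const literalBar = "a|b".replace(/\|/g, "-"); // "a-b"

// Unescaped ( opens a group; escaped \( matches a literal parenthesis.
const insideParens = "f(x)".match(/\((.)\)/)[1]; // "x"
```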

Examples

  • Matching
    • stringArray = "Hello world!".match(/world/); // Singleton array containing the match
    • stringArray = "Hello world!".match(/l/g); // Matched strings are returned in a string array
    • "abc".match(/a(b)c/)[1] => "b" // Matched subgroup is the second member (having the index "1") of the resulting array
  • Replacement
    • string = string.replace(/expression without quotation marks/g, "replacement");
    • string = string.replace(/escape the slash in this\/way/g, "replacement");
    • string = string.replace( ... ).replace( ... ).replace( ... );
  • Test
    • if (string.match(/regexp without quotation marks/)) { ... }

Modifiers

Single-letter modifiers:

g Global. With match(), all matches are returned in an array; with replace(), every match is replaced instead of only the first.
i Case-insensitive search
m Multiline. If the operand string has multiple lines, ^ and $ match the beginning and end of each line within the string, instead of matching the beginning and end of the whole string only:
"a\nb\nc".replace(/^b$/g,"d") === "a\nb\nc"
"a\nb\nc".replace(/^b$/gm,"d") === "a\nd\nc"

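The g and i modifiers can be illustrated in the same way. The following is a small sketch with made-up sample text, showing how each modifier changes the result of match() and replace().

```javascript
const text = "Cat, cat, CAT";

// Without g, match() returns only the first match (plus its captures).
const first = text.match(/cat/); // first[0] is "cat", first.index is 5

// With g, match() returns every match in a plain array.
const allLower = text.match(/cat/g); // ["cat"]

// With i, letter case is ignored.
const anyCase = text.match(/cat/gi); // ["Cat", "cat", "CAT"]

// With g, replace() changes every occurrence instead of only the first.
const replacedOnce = text.replace(/cat/i, "dog"); // "dog, cat, CAT"
const replacedAll = text.replace(/cat/gi, "dog"); // "dog, dog, dog"
```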
   

Operators

Operator Effect
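\b Matches a word boundary, that is, the position between a \w character and a \W character or an end of the string.
\w Matches an ASCII letter, a digit, or "_".
\W Negation of \w.
\s Matches a whitespace character (space, tab, newline, carriage return, form feed, and other Unicode whitespace).
\S Negation of \s.
\d Matches a digit (0 to 9).
\D Negation of \d.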
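
As a rough illustration of the operators in the table above, the following sketch applies each of them to a short sample string. The sample strings and variable names are invented for this example.

```javascript
const sample = "Room 42, floor_3";

// \d+ matches runs of digits.
const numbers = sample.match(/\d+/g); // ["42", "3"]

// \w+ matches runs of letters, digits and underscores.
const words = sample.match(/\w+/g); // ["Room", "42", "floor_3"]

// \s+ matches runs of whitespace; each run becomes a single dash here.
const dashed = "a \t b\nc".replace(/\s+/g, "-"); // "a-b-c"

// \b restricts a match to whole words: "cat" inside "concatenate" is skipped.
const wholeWord = "concatenate the cat".replace(/\bcat\b/g, "dog"); // "concatenate the dog"

// \D is the negation of \d: remove everything that is not a digit.
const digitsOnly = "+1 (555) 010-9999".replace(/\D/g, ""); // "15550109999"
```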


Function call

For complex operations, a function can process the matched substrings. The following code capitalizes every word that follows a non-word character (the first word is left alone because nothing precedes it). A plain replacement string cannot do this, as each letter to capitalize is a different character:

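var capitalize = function(matchobj) {
  // matchobj is one non-word character followed by a word, e.g. " be"
  // group1: the leading non-word character
  var group1 = matchobj.replace(/^(\W)[a-zA-Z]+$/g, "$1");
  // group2: the first letter (the * also allows one-letter words such as " a")
  var group2 = matchobj.replace(/^\W([a-zA-Z])[a-zA-Z]*$/g, "$1");
  // group3: the remaining letters, possibly none
  var group3 = matchobj.replace(/^\W[a-zA-Z]([a-zA-Z]*)$/g, "$1");
  return group1 + group2.toUpperCase() + group3;
};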

var classicText = "To be or not to be?";

var changedClassicText = classicText.replace(/\W[a-zA-Z]+/g, capitalize);

console.log(changedClassicText==="To Be Or Not To Be?");

The function is called once for each matched substring. Here is the signature of the function:

function (<matchedSubstring>[, <capture1>, ...<captureN>, <indexInText>, <entireText>]) {
 ...
 return <stringThatWillReplaceInText>;
}
  • The first parameter is the substring that matches the pattern.
  • The next parameters are the captures in the substrings. There are as many parameters as there are captures.
  • The next parameter is the index at which the matched substring begins, counted from the beginning of the text.
  • The last parameter is the entire text being searched.
  • The return value will be put in the text instead of the matching substring.
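
The parameters described above can be put to use directly. The sketch below is one possible way to capitalize words through captures instead of nested replacements; capitalizeWord and the other names are hypothetical and chosen only for this illustration.

```javascript
// Hypothetical helper: rebuilds a word from its two captured parts.
function capitalizeWord(match, firstLetter, rest, index, entireText) {
  // match       : the whole matched word, e.g. "be"
  // firstLetter : capture 1; rest : capture 2 (may be empty)
  // index       : position of the match within entireText (unused here)
  return firstLetter.toUpperCase() + rest;
}

const classic = "to be or not to be?";
const capitalized = classic.replace(/\b([a-zA-Z])([a-zA-Z]*)/g, capitalizeWord);
// capitalized is "To Be Or Not To Be?"

// With no captures in the pattern, the index comes right after the match.
const positions = [];
"a-b-c".replace(/-/g, function (match, index, entireText) {
  positions.push(index); // record where each "-" was found
  return match;          // leave the text unchanged
});
// positions is [1, 3]
```

Because the pattern starts with \b rather than \W, this version also capitalizes the first word of the text.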
