JavaScript/Regular expressions
Overview
JavaScript supports regular expressions (regex for short) for finding matches within a string. As in other scripting languages, this goes beyond a simple letter-by-letter match, and can even be used to parse strings that follow a known format.
Whereas string literals are delimited by quotes, regular expression literals are delimited by the slash (/) character, and may have option letters (flags) appended after the closing slash.
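As a minimal sketch of parsing, the following assumes dates written as YYYY-MM-DD. Both the format and the `parseIsoDate` helper are hypothetical and only for illustration. Each parenthesized group captures one field, and `match()` returns the fields at indexes 1 to 3, or `null` when the string does not fit the format.

```javascript
// Hypothetical helper: splits a "YYYY-MM-DD" string into numeric fields.
function parseIsoDate(text) {
  // ^ and $ anchor the pattern, so the whole string must match;
  // each (\d{n}) group captures one run of digits.
  const m = text.match(/^(\d{4})-(\d{2})-(\d{2})$/);
  if (m === null) {
    return null; // Not in the expected format
  }
  return { year: Number(m[1]), month: Number(m[2]), day: Number(m[3]) };
}

console.log(parseIsoDate("2024-05-17")); // { year: 2024, month: 5, day: 17 }
console.log(parseIsoDate("17/05/2024")); // null
```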
Regular expressions most commonly appear in conjunction with the string.match() and string.replace() methods.
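The short sketch below shows a literal with two flags appended after the closing slash; the standard `source` and `flags` properties expose the two parts. For comparison, it also builds a similar expression with the `RegExp` constructor, which takes the pattern and flags as strings. Inside a string, a slash needs no escaping, though backslashes must then be doubled (as in "\\d").

```javascript
// Literal form: pattern between slashes, flags "g" and "i" appended.
const literal = /n\/a/gi;
console.log(literal.source); // n\/a
console.log(literal.flags);  // gi

// Constructor form: pattern and flags given as strings.
// Inside a string literal, the slash needs no escape.
const built = new RegExp("n/a", "gi");
console.log("N/A or n/a".replace(built, "-")); // - or -
```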
At a glance, by example:
let strArray = "Hello world!".match(/world/); // Array with one item, "world"; note the slashes
strArray = "Hello!".match(/l/g); // With g, all matched strings are returned in an array: ["l", "l"]
"abc".match(/a(b)c/)[1] === "b"; // true: the captured group is the 2nd item (index 1)
const str1 = "Hey there".replace(/Hey/g, "Hello");
const str2 = "N/A".replace(/\//g, ","); // A literal slash is escaped with \
const str3 = "Hello".replace(/l/g, "m").replace(/H/g, "L").replace(/o/g, "a"); // Chained replacements: "Lemma"
if (str3.match(/emma/)) { console.log("Yes"); }
if (str3.match("emma")) { console.log("Yes"); } // A string argument is converted to a regular expression
"abbc".replace(/(.)\1/g, "$1") === "abc"; // true: backreference
Compatibility
JavaScript's regular expression syntax follows the "extended" convention, in which characters such as | and ( are special unless escaped with a backslash. Patterns copied between JavaScript and other tools often work unchanged, but older programs that use a different convention may interpret them differently. Notable points:
- In the search pattern, \1 is a backreference to the first captured group, as in other implementations.
- In the replacement string, $1 stands for the text captured by the first group (rather than \1).
- Example: "abbc".replace(/(.)\1/g, "$1") => "abc"
- | is magic, \| is literal
- ( is magic, \( is literal
- Lookahead, (?=...) and (?!...), is supported. Lookbehind, (?<=...) and (?<!...), is supported since ECMAScript 2018, but may be missing in older engines.
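A small sketch exercising each point of the list above; all calls are standard `String` and `RegExp` methods. The lookbehind line assumes an engine that implements ECMAScript 2018 or later.

```javascript
// Backreference \1 in the pattern, $1 in the replacement string.
console.log("abbc".replace(/(.)\1/g, "$1")); // abc

// Unescaped | and ( are special; escaped, they are literal.
console.log(/cat|dog/.test("hotdog"));   // true (alternation)
console.log(/cat\|dog/.test("cat|dog")); // true (literal "|")
console.log(/\(x\)/.test("f(x)"));       // true (literal parentheses)

// Lookahead: digits followed by "px", without consuming "px".
console.log("10px 20em".match(/\d+(?=px)/)[0]); // 10

// Lookbehind (ECMAScript 2018+): digits preceded by "$".
console.log("cost: $42".match(/(?<=\$)\d+/)[0]); // 42
```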
Examples
- Matching
- result = "Hello world!".match(/world/); // An array containing "world", or null if there is no match
- stringArray = "Hello world!".match(/l/g); // Matched strings are returned in a string array
- "abc".match(/a(b)c/)[1] => "b" // Matched subgroup is the second member (having the index "1") of the resulting array
- Replacement
- string = string.replace(/expression without quotation marks/g, "replacement");
- string = string.replace(/escape the slash in this\/way/g, "replacement");
- string = string.replace( ... ).replace( ... ).replace( ... );
- Test
- if (string.match(/regexp without quotation marks/)) { ... }
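When only a yes/no answer is needed, `RegExp.prototype.test()` returns a boolean directly, whereas `match()` returns an array or `null`, which the `if` treats as true or false. A short sketch comparing the two (the names `input` and `hasDigit` are just for illustration):

```javascript
const input = "room 42";

// match() returns an array (truthy) or null (falsy).
if (input.match(/\d/)) {
  console.log("match: contains a digit");
}

// test() returns true or false.
const hasDigit = /\d/.test(input);
console.log(hasDigit); // true
```

One caveat: a regex with the g flag remembers its position in `lastIndex` between `test()` calls, so reusing the same global regex for several tests can give surprising results.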
Modifiers
Single-letter modifiers, written after the closing slash and combinable, as in /abc/gi:

Modifier | Effect |
---|---|
g | Global. All matches are found instead of only the first: match() returns them in an array, and replace() replaces each of them. |
i | Case-insensitive search. |
m | Multiline. If the operand string has multiple lines, ^ and $ match the beginning and end of each line within the string, instead of matching the beginning and end of the whole string only. |
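A minimal sketch of the three modifiers on one two-line string; the expected outputs in the comments follow from the table above.

```javascript
const text = "Cat cat\ncat";

// Without g, only the first match is returned.
console.log(text.match(/cat/)[0]);   // cat
// g: every match is collected into an array.
console.log(text.match(/cat/g));     // [ 'cat', 'cat' ]
// i: case-insensitive, so "Cat" matches too.
console.log(text.match(/cat/gi));    // [ 'Cat', 'cat', 'cat' ]
// Without m, ^ only matches at the start of the whole string ("Cat").
console.log(text.match(/^cat/g));    // null
// m: ^ also matches at the start of the second line.
console.log(text.match(/^cat/gm));   // [ 'cat' ]
console.log(text.match(/^cat/gim));  // [ 'Cat', 'cat' ]
```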
Operators
Operator | Effect |
---|---|
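\b | Matches a word boundary: a position with a word character (\w) on one side and a non-word character, or the start or end of the string, on the other. It consumes no characters. |
\w | Matches a word character: an ASCII letter, a digit, or "_" (the same as [A-Za-z0-9_]). |
\W | Negation of \w. |
\s | Matches a whitespace character, such as space, tab, newline, carriage return, or form feed (other Unicode spaces are included too). |
\S | Negation of \s. |
\d | Matches a digit (0-9). |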
\D | Negation of \d. |
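The operators in the table can be combined freely; the sketch below applies each of them to one sample line (the sample text is arbitrary).

```javascript
const line = "Order #A_17 shipped\tin 3 days";

console.log(line.match(/\d+/g));        // [ '17', '3' ]  (\d: digits)
console.log(line.match(/\w+/g));        // [ 'Order', 'A_17', 'shipped', 'in', '3', 'days' ]
console.log(line.replace(/\s+/g, " ")); // Order #A_17 shipped in 3 days  (tab replaced)
console.log(/\bin\b/.test(line));       // true  ("in" as a whole word)
console.log(/\bship\b/.test(line));     // false ("ship" is only part of "shipped")
```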
Function call
For more complex operations, a function can be passed to replace() to compute the replacement for each matched substring. The following code capitalizes every word. A simple replacement string cannot do this, because each letter to capitalize is a different character with its own uppercase form:
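// Called once per match; matchobj is a non-letter followed by a word, e.g. " be".
const capitalize = function (matchobj) {
  // The leading non-word character (here a space)
  const group1 = matchobj.replace(/^(\W)[a-zA-Z]+$/, "$1");
  // The first letter of the word
  const group2 = matchobj.replace(/^\W([a-zA-Z])[a-zA-Z]*$/, "$1");
  // The rest of the word (empty for one-letter words)
  const group3 = matchobj.replace(/^\W[a-zA-Z]([a-zA-Z]*)$/, "$1");
  return group1 + group2.toUpperCase() + group3;
};
const classicText = "To be or not to be?";
// The first word has no \W character before it, so it is not matched;
// here it is already capitalized.
const changedClassicText = classicText.replace(/\W[a-zA-Z]+/g, capitalize);
console.log(changedClassicText === "To Be Or Not To Be?"); // true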
The function is called once for each match. Its signature is:
function (<matchedSubstring>[, <capture1>, ..., <captureN>, <indexInText>, <entireText>]) {
  ...
  return <stringThatWillReplaceInText>;
}
- The first parameter is the substring that matches the pattern.
- The next parameters are the captured groups, one parameter per capture group in the pattern.
- The next parameter is the index (offset) of the start of the matched substring, counted from the beginning of the text.
- The next parameter is the entire text being searched. If the pattern contains named capture groups, one more parameter follows: an object holding the named captures.
- The return value will be put in the text instead of the matching substring.
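With capture groups in the pattern, the function receives them as separate parameters, so the capitalization example above can be written without the extra replace() calls inside the function. The sketch below (parameter names are arbitrary) also records the offset and entire-text parameters to show what each call receives.

```javascript
const classic = "To be or not to be?";
const calls = []; // Records what each call of the function receives

const result = classic.replace(
  // Groups: the non-word character, the first letter, the rest of the word
  /(\W)([a-zA-Z])([a-zA-Z]*)/g,
  (match, separator, first, rest, offset, whole) => {
    calls.push({ match, offset, sameText: whole === classic });
    return separator + first.toUpperCase() + rest;
  }
);

console.log(result);       // To Be Or Not To Be?
console.log(calls[0]);     // { match: ' be', offset: 2, sameText: true }
console.log(calls.length); // 5
```

Using [a-zA-Z]* for the rest of the word lets one-letter words match as well.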
See also
- Regular Expressions - a Wikibook dedicated to regular expressions.
- Perl Regular Expressions Reference - a chapter devoted to regular expressions in a book about the Perl programming language.
External links
- JavaScript RegExp Object Reference at W3schools.com
- JavaScript RegExp Tester at regular-expressions.info
- Regular Expressions in Javascript at mozilla.org
- JavaScript RegExp Object at mozilla.org