Regular Expression Metacharacters
Metacharacter | Description |
---|---|
. | Matches any single character except \n. To include \n as well, use a pattern such as [\s\S]. |
^ | Matches the start position of the input string. It is a zero-width match that consumes no characters. Use \^ to match the ^ character itself. |
$ | Matches the end position of the input string. It is a zero-width match that consumes no characters. Use \$ to match the $ character itself. |
* | Matches the preceding character or sub-expression zero or more times. * is equivalent to {0,}. For example, \^*b can match b, ^b, ^^b, and so on. |
+ | Matches the preceding character or sub-expression one or more times, equivalent to {1,}. For example, a+b can match ab, aab, aaab, and so on. |
? | Matches the preceding character or sub-expression zero times or once, equivalent to {0,1}. For example, a[cd]? can match a, ac, and ad. When this character follows any other quantifier such as *, +, ?, {n}, {n,}, or {n,m}, the matching mode is non-greedy. Non-greedy mode matches the shortest possible string, whereas the default greedy mode matches the longest possible string. For example, in the string oooo, o+? matches only a single o, while o+ matches all four o characters. |
| | Performs a logical OR between two alternatives. For example, the regular expression (him|her) matches it belongs to him and it belongs to her, but cannot match it belongs to them. |
\ | Marks the next character as a special character, a literal character, a backreference, or an octal escape. For example, n matches the character n, \n matches the newline character, \\ matches \, and \( matches (. |
\w | Matches a letter, digit, or underscore (_). |
\W | Matches any character that is not a letter, digit, or underscore (_). |
\s | Matches any whitespace character, such as a space, tab character, or form feed. It is equivalent to [ \f\n\r\t\v]. |
\S | Matches any non-whitespace character and is equivalent to [^ \f\n\r\t\v]. |
\d | Matches any digit and is equivalent to [0-9]. |
\D | Matches any non-digit character and is equivalent to [^0-9]. |
\b | Matches a word boundary (the position between a word character and a non-word character, such as a space, or the start or end of the string) and does not consume any characters. For example, er\b matches er in never but does not match er in verb. |
\B | Matches a non-word boundary. For example, er\B matches er in verb but does not match er in never. |
\f | Matches a form feed and is equivalent to \x0c and \cL. |
\n | Matches a linefeed and is equivalent to \x0a and \cJ. |
\r | Matches a carriage return character and is equivalent to \x0d and \cM. |
\t | Matches a tab character and is equivalent to \x09 and \cI. |
\v | Matches a vertical tab character and is equivalent to \x0b and \cK. |
\cx | Matches the control character indicated by x. For example, \cM matches Control-M, that is, a carriage return character. The value of x must be in the range A-Z or a-z. Otherwise, c is treated as the literal character c. |
{n} | The value n is a non-negative integer. Matches the preceding character or sub-expression exactly n times. For example, o{2} does not match the o in Bob, but matches the two o characters in food. |
{n,} | The value n is a non-negative integer. Matches the preceding character or sub-expression at least n times. For example, o{2,} does not match the o in Bob but matches all the o characters in foooood. o{1,} is equivalent to o+, and o{0,} is equivalent to o*. |
{n,m} | The values n and m (n≤m) are non-negative integers, where n is the minimum number of matches and m is the maximum number of matches. For example, o{1,3} matches the first three o characters in fooooood, ba{1,3} can match ba, baa, or baaa, and o{0,1} is equivalent to o?. Note that no space can be inserted between the comma and the digits. |
x|y | Matches x or y. For example, z|food matches z or food, and (z|f)ood matches zood or food. |
[xyz] | A character set. Matches any one of the included characters. For example, [abc] matches a in plain. |
[^xyz] | A negated character set. Matches any one character that is not x, y, or z. For example, [^abc] matches p in plain. |
[a-z] | A character range. Matches any one character in the specified range. For example, [a-z] matches any lowercase letter from a to z. |
[^a-z] | A negated character range. Matches any one character that is not in the specified range. For example, [^a-z] matches any character that is not a lowercase letter from a to z. |
( ) | Defines the expression between ( and ) as a group and saves the characters that match the expression to a temporary area. A maximum of nine groups can be saved in a regular expression, and they can be referenced by the symbols \1 to \9. |
(pattern) | Matches pattern and captures the matched sub-expression. You can use the $0–$9 attributes to retrieve captured matches from the result match set. |
(?:pattern) | Matches pattern but does not capture the matched sub-expression. That is, it is a non-capturing match and does not store the match for future use. This is useful for combining parts of a pattern with the or character (|). For example, industr(?:y|ies) is a more concise expression than industry|industries. |
(?=pattern) | Positive lookahead (a forward positive pre-check). A non-capturing match that succeeds at any position where the text that follows matches pattern. The match is not captured for future use. For example, "Windows(?=95|98|NT|2000)" matches "Windows" in "Windows2000", but does not match "Windows" in "Windows3.1". A lookahead does not consume characters. That is, after a match occurs, the next search starts immediately after the last match, not after the characters that the lookahead examined. |
(?!pattern) | Negative lookahead (a forward negative pre-check). A non-capturing match that succeeds at any position where the text that follows does not match pattern. The match is not captured for future use. For example, "Windows(?!95|98|NT|2000)" matches "Windows" in "Windows3.1", but does not match "Windows" in "Windows2000". |
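
Several rows of the table can be checked directly. The following is a minimal sketch that assumes Python's built-in `re` module; the table does not name a regex engine, so some entries may behave differently in other flavors. The variable names are illustrative only.

```python
import re

# Greedy vs. non-greedy: o+ takes the longest run, o+? the shortest.
greedy = re.search(r"o+", "oooo").group()    # 'oooo'
lazy = re.search(r"o+?", "oooo").group()     # 'o'

# Bounded repetition: o{1,3} matches at most three o characters at a time.
bounded = re.search(r"o{1,3}", "fooooood").group()   # 'ooo'

# Word boundary: er\b matches at the end of "never" but not inside "verb".
at_boundary = re.search(r"er\b", "never") is not None   # True
inside_word = re.search(r"er\b", "verb") is not None    # False

# Non-capturing group combined with alternation.
plural = re.fullmatch(r"industr(?:y|ies)", "industries") is not None  # True

# Lookahead: the version suffix is checked but not consumed,
# so the match itself is only "Windows".
pos = re.search(r"Windows(?=95|98|NT|2000)", "Windows2000")
neg = re.search(r"Windows(?!95|98|NT|2000)", "Windows2000")   # None
neg31 = re.search(r"Windows(?!95|98|NT|2000)", "Windows3.1")
```

Here `pos.group()` and `neg31.group()` are both `'Windows'`, and `pos.end()` is 7, which shows that the characters examined by the lookahead are not part of the match.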
To match a special character literally, add \ before it. For example, to match the special characters ^, $, (), [], {}, ., ?, +, *, and |, use \^, \$, \(, \), \[, \], \{, \}, \., \?, \+, \*, and \|.
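
The effect of escaping can be sketched as follows, again assuming Python's `re` module rather than any engine named by this document. `re.escape` is a Python standard library helper that adds the backslashes automatically; other environments may not offer an equivalent.

```python
import re

# Unescaped, '.' matches any character and '+' is a quantifier,
# so the pattern accepts text that is not the literal "1.5+".
loose = re.fullmatch(r"1.5+", "1x55") is not None        # True

# Escaped, both characters are matched literally.
strict = re.compile(r"1\.5\+")
literal_ok = strict.fullmatch("1.5+") is not None        # True
literal_bad = strict.fullmatch("1x55") is not None       # False

# re.escape builds an escaped pattern from arbitrary literal text.
escaped = re.escape("(a|b)*")
round_trip = re.fullmatch(escaped, "(a|b)*") is not None  # True
```

Escaping by hand is fine for fixed patterns; a helper such as `re.escape` is mainly useful when the literal text comes from user input.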