Purpose
Return value
Syntax
=REGEXTEST(text,pattern,[case_sensitivity])
- text - The text value to test.
- pattern - The pattern to test for.
- case_sensitivity - [optional] 0 = Case sensitive, 1= Case-insensitive. Default is 0.
How to use
The REGEXTEST function provides a way to test values for matching text defined by a specific pattern. The result from REGEXTEST is TRUE or FALSE. To use REGEXTEST, provide text and a pattern. For example, the formulas below show how REGEXTEXT can be used to test the text in A1 for a number or an uppercase character:
=REGEXTEST(A1,"[0-9]") // test for a number
=REGEXTEST(A1,"[A-Z]") // test for an uppercase character
REGEXTEST returns TRUE or FALSE. If we use REGEXTEST to test for "a" in "apple", it returns TRUE:
=REGEXTEST("apple","a") // returns TRUE
If we use REGEXTEST to test "apple" for a single digit between 0-9, it returns FALSE:
=REGEXTEST("apple","[0-9]") // returns FALSE
About Regular Expressions
Regular expressions ("regex" for short) are a powerful tool for pattern matching and text manipulation. Regex patterns can be configured to match simple strings or very specific sequences like phone numbers, email addresses, dates, and other text that has an identifiable pattern. It's best to think of regex as a small language in and of itself, with many special codes that can be combined into flexible "patterns" to match different types of data, such as digits, letters, punctuation, and whitespace. While regular expressions can be intimidating to new users, they are widely used in many programming languages. Compared to the simple wildcards provided in older versions of Excel, regular expressions are a giant upgrade. They make it possible to match text in extremely specific ways that were simply not possible before now.
Basic examples
The worksheet below uses REGEXTEST to test the same string, "1apple23#z" with twelve different patterns. In each formula, the text string comes from column B, and the pattern comes from column D. The formula in cell F5, copied down, is:
=REGEXTEST(B5,D5)
Let's review the formulas one by one. Note that the formulas below use literal strings instead of cell references for readability. In each case, the text being tested remains the same; only the pattern varies.
=REGEXTEST("1apple23#z","apple") // returns TRUE
REGEXTEST returns TRUE because the literal string "apple" appears in the text. Note that REGEXTEST automatically performs a "contains" type search.
=REGEXTEST("1apple23#z","a") // returns TRUE
REGEXTEST returns TRUE because the literal string "a" appears in the text.
=REGEXTEST("1apple23#z","A") // returns FALSE
REGEXTEST returns FALSE because it is case-sensitive by default, and the text does not contain an uppercase "A."
=REGEXTEST("1apple23#z","^1") // returns TRUE
REGEXTEST returns TRUE because the text string "1apple23#z" begins with a "1". The caret (^) is a metacharacter that matches the beginning of a text string.
=REGEXTEST("1apple23#z","^a") // returns FALSE
REGEXTEST returns FALSE because the text does not begin with "a".
=REGEXTEST("1apple23#z","Z$") // returns FALSE
REGEXTEST returns FALSE because the text does not end with an uppercase "Z". The dollar ($) is a metacharacter that matches the end of a text string.
=REGEXTEST("1apple23#z","[zZ]$") // returns TRUE
REGEXTEST returns TRUE because the text ends with "z" and the pattern "[zZ]$" matches "z" or "Z" at the end of the text.
=REGEXTEST("1apple23#z","[A-Z]") // returns FALSE
REGEXTEST returns FALSE because the text contains no uppercase character between "A" and "Z".
=REGEXTEST("1apple23#z","[0-9]") // returns TRUE
REGEXTEST returns TRUE because the text contains the digit "1". The pattern "[0-9]" matches any single digit between 0 and 9 anywhere in the text string.
=REGEXTEST("1apple23#z","[0-9]{2}") // returns TRUE
REGEXTEST returns TRUE because the text contains a two-digit number where each digit is between 0-9. The curly-brace syntax "{n}" specifies how many instances of a character must be present in the input for a match. The pattern "[0-9]{2}" means exactly two digits, each between 0-9.
=REGEXTEST("1apple23#z","[0-9]{3}") // returns FALSE
REGEXTEST returns FALSE because the text does not contain a three-digit number, even though it does contain three digits.
=REGEXTEST("1apple23#z","[[:punct:]]") // returns TRUE
REGEXTEST returns TRUE because the text contains the punctuation "#". The pattern "[:punct:]" matches a punctuation character like [!"#$%&'()*+,\-./:;<=>?@[\]^_`{|}~] in the text.
Alternation in Regex (OR Logic)
One problem you'll often encounter in regex is how to test for this or that within the same pattern. Alternation in regex allows you to check for multiple possibilities by using the pipe (|) symbol, which works like a logical "OR". By grouping alternatives within parentheses, you can match one of several patterns in a test string. For example, in the worksheet below, we want to test addresses for one of four US states: MN, MT, ND, and SD. The formula in D5, copied down, looks like this:
=REGEXTEST(B5,"\b(MN|MT|ND|SD)\b")
The \b symbol is a word boundary to ensure that only standalone instances of these abbreviations are matched.
Regex terminology
Because Regex is essentially a mini-language, it has its own vocabulary. Here is a list of some important terminology:
- Pattern - The actual sequence of characters that defines the regex.
- Literal - Characters in a regex that match exactly themselves. For example, in the regex cat, the literals are c, a, and t.
- Metacharacter - Characters with special meanings in regex. For example, a period (.) matches any character, ^ matches the start of a string, $ matches the end of a string.
- Character Class - A set of characters enclosed in square brackets [] that matches any one of the characters inside. For example, [aeiou] matches any vowel.
- Quantifier - Specifies how many instances of a character, group, or character class must be present in the input for a match. For example, a* matches zero or more as, a+ matches one or more as, a{3} matches exactly three as.
- Escape Sequence - A way to handle metacharacters as literals by preceding them with a backslash \. For example, \. matches a literal period.
- Group - A part of a regex pattern enclosed in parentheses () that can be referred to later. For example, (abc) matches the exact sequence abc.
- Alternation - The pipe | character used to match one thing or another. For example, cat|dog matches cat or dog.
- Anchor - Special characters that match positions within a string rather than actual characters. For example, ^ matches the start of a string, $ matches the end of a string.
- Wildcard - The dot . character, which matches any single character except a newline character. For example, c.t matches cat, cot, cut, etc.
- Boundary - Special sequences that match positions between characters. For example, \b matches a word boundary, \B matches a non-word boundary.