To remove some of the natural complexity of text (strip punctuation, normalize case, remove extra spaces) you can use a formula based on the SUBSTITUTE function, with help from the TRIM and LOWER functions.
There may be times when you need to remove some of the variability of text before other processing. One example is when you want to count specific words inside larger text strings. Because older versions of Excel don't support regular expressions, you can't construct precise matches. For example, if you want to count how many times the word "fox" appears in a cell, you will also end up counting "foxes". You can look for "fox " (with a space), but that will fail with "fox," or "fox." One workaround is to simplify the text first with a formula in a helper column, then run counts on the simplified version. The example on this page shows one way to do this.
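To make the problem concrete, here is a minimal sketch of a naive substring count. It assumes the text is in B5 (the cell reference is an assumption for illustration) and compares the length of the text before and after every "fox" is removed:

```excel
=(LEN(B5)-LEN(SUBSTITUTE(B5,"fox","")))/LEN("fox")
```

With "The fox saw two foxes." in B5, this returns 2, because the "fox" inside "foxes" is counted as well. Since SUBSTITUTE is case-sensitive, "Fox" would not be counted at all, which is another reason to normalize case first.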
The formula shown in this example uses a series of nested SUBSTITUTE functions to strip out parentheses, hyphens, colons, semicolons, exclamation marks, commas, and periods. The process runs from the inside out, with each SUBSTITUTE replacing one character with a single space, then handing off to the next SUBSTITUTE. The innermost SUBSTITUTE replaces left parentheses, and the result is handed to the next SUBSTITUTE, which replaces right parentheses, and so on.
In the version below, line breaks have been added for readability, and to make it easier to edit replacements. Excel does not care about line breaks in formulas, so you can use the formula as-is.
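The following is a sketch consistent with the description above, not necessarily the exact formula from the worksheet. It assumes the source text is in B5 (an assumed cell reference) and replaces each of the eight listed characters with a single space:

```excel
=LOWER(TRIM(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
B5,
"("," "),
")"," "),
"-"," "),
":"," "),
";"," "),
"!"," "),
","," "),
"."," ")
))
```

Each find/replace pair sits on its own line, so adding or removing a character means adding or removing one SUBSTITUTE( line at the top and one matching pair at the bottom.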
After all substitutions are complete, the result is run through TRIM to normalize spaces, then the LOWER function to force all text to lowercase.
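As a small illustration of this last step on its own, using a hardcoded string rather than a cell reference:

```excel
=LOWER(TRIM("  The   Quick  Fox "))
```

This returns "the quick fox". TRIM removes leading and trailing spaces and collapses runs of spaces between words to a single space, which cleans up the extra spaces left behind when punctuation is replaced with a space.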
Note: You'll need to adjust the actual replacements to suit your data.
Adding a leading and trailing space
In some cases you may want to add a space character to the start and end of the cleaned text. For example, if you want to count words precisely, you may want to look for the word surrounded by spaces (i.e. search for " fox ", " map ") to avoid false matches. To add a leading and trailing space, just concatenate a space (" ") to the start and end:
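A minimal sketch, assuming the cleaned text already sits in a helper cell C5 (a hypothetical location):

```excel
=" "&C5&" "
```

The same concatenation can wrap the full cleaning formula instead of a cell reference. The padding should be the outermost step, since running TRIM on the padded result would strip the spaces again.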
The Excel SUBSTITUTE function replaces text in a given string by matching. For example =SUBSTITUTE("952-455-7865","-","") returns "9524557865"; the dash is stripped. SUBSTITUTE is case-sensitive and does not support wildcards.
The Excel LOWER function converts a text string to all lowercase letters. Numbers, punctuation, and spaces are not affected.