Explanation
In this example, the goal is to remove non-numeric characters from a text string with a formula. This is a tricky problem in Excel, partly because there is no built-in way to convert a text string to an array of characters. However, in the current version of Excel, you can generate an array of characters using the MID function with the SEQUENCE function. This is the approach used in the worksheet shown, where the formula in cell D5 looks like this:
=TEXTJOIN("",TRUE,IFERROR(MID(B5,SEQUENCE(LEN(B5)),1)+0,""))+0
Table of contents
- Creating an array of characters
- Testing for numeric values
- Removing non-numeric characters
- Creating the final numeric value
- A better formula?
- Excel 2019
- Removing numeric characters
Creating an array of characters
Working from the inside out, the first step in this problem is to create an array of characters from the text string in column B. This is done with the snippet of code below:
MID(B5,SEQUENCE(LEN(B5)),1)
First, the LEN function runs and returns a count of 10, since there are 10 characters in the text string "100 apples". This result is returned to the SEQUENCE function as the rows argument:
MID(B5,SEQUENCE(10),1)
Next, SEQUENCE generates a numeric array of the numbers 1-10, which is returned to the MID function as the start_num argument:
MID(B5,{1;2;3;4;5;6;7;8;9;10},1)
This is the current solution for creating an array of characters in an Excel formula. In this configuration, the MID function extracts the text in B5, one character at a time, and returns an array that looks like this:
{"1";"0";"0";" ";"a";"p";"p";"l";"e";"s"}
We now have an array that contains all characters in cell B5. The next step is to figure out which characters are numbers.
Testing for numeric values
Since we have an array of characters ready to go, you might think we can just pass them into the ISNUMBER function like this:
=ISNUMBER(array)
The problem though is that the numbers in the array (if any) are actually represented as text values like "1", "0", etc. If we try to use ISNUMBER like this, it will return FALSE for every character! One solution is to use a small hack to "force" Excel to convert numbers by adding zero. A math operation like this Excel to try to convert the character to a number. Adding zero to a non-numeric character like "a", will result in a #VALUE! error. However, adding zero to "1" will convert "1" to the number 1:
="a"+0 // returns #VALUE!
="1"+0 // returns 1
This is the trick used in the formula, where we add zero to the array of characters returned by the MID function:
{"1";"0";"0";" ";"a";"p";"p";"l";"e";"s"}+0
Because the array contains 10 characters, we get back an array of 10 results like this:
{1;0;0;#VALUE!;#VALUE!;#VALUE!;#VALUE!;#VALUE!;#VALUE!;#VALUE!}
Notice that only the first 3 characters have survived this operation (since they are numbers) and the remaining characters are now #VALUE! errors. This is the final piece we need to remove the non-numeric characters.
Removing non-numeric characters
The way we remove non-numeric characters in this formula is also tricky - we use the IFERROR function like this:
IFERROR(MID(B5,SEQUENCE(LEN(B5)),1)+0,"")
When you wrap a formula in IFERROR, you are essentially forcing another result when the formula returns an error. When the formula does not return an error, the result passes through IFERROR unchanged. The snippet uses this behavior to convert errors to an empty string (""). After the IFERROR function processes the array returned by MID, it returns an array like this:
{1;0;0;"";"";"";"";"";"";""}
Notice the #VALUE! errors are now gone, replaced by empty strings. At this point, the remaining task is to assemble the remaining numbers into a final numeric value.
Creating the final numeric value
The last step in this problem is to join the surviving numeric values into a single number. The tool we use to perform this step is the TEXTJOIN function, which is designed to concatenate values in a range or array. In this formula, the result from IFERROR is returned to TEXTJOIN as the text1 argument like this:
=TEXTJOIN("",TRUE,{1;0;0;"";"";"";"";"";"";""})
Notice that we provide delimiter as an empty string ("") because we don't want any extra characters in the final result and we supply TRUE for ignore_empty because we don't want to include the empty strings in the final result. In this configuration, TEXTJOIN returns the three numbers in a text string like this:
="100"
So close! But notice we again have a text value because TEXTJOIN performs concatenation, which always results in a text string. The final step is to again add zero to force Excel to convert the text to a number:
="100"+0 // returns 100
Note: if you prefer, you can use the VALUE function instead of adding a zero to convert numbers as text values into numeric values. Adding zero is just a shortcut.
A better formula?
After I finished documenting the formula above, upgrading it to use the SEQUENCE function, I realized that a better approach is probably to use the FILTER function with the LET function like this:
=LET(
chars,MID(B5,SEQUENCE(LEN(B5)),1),
TEXTJOIN("",1,FILTER(chars,ISNUMBER(chars+0)))+0
)
FILTER is a more natural solution because it is designed to filter out unwanted values. The catch though is that we need to use the character array created by MID + SEQUENCE more than once, which means we should introduce the LET function for efficiency. In the formula above, we store the result from MID in a variable named "chars", then we use that variable twice inside FILTER like this:
=FILTER(chars,ISNUMBER(chars+0))
Inside the include argument of FILTER we add zero to the array to force Excel to try to convert the characters to numbers. As explained above, chars + 0 will return an array like this:
{1;0;0;#VALUE!;#VALUE!;#VALUE!;#VALUE!;#VALUE!;#VALUE!;#VALUE!}
Then, the ISNUMBER function will return an array like this:
{TRUE;TRUE;TRUE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE}
The final result from FILTER is an array that contains just the three numbers:
{"1";"0";"0"}
We then join the numbers with TEXTJOIN and (again) force a numeric result by adding zero:
=TEXTJOIN("",1,{"1";"0";"0"})+0
="100"+0
=100
The final result is 100, the same as before. This formula is slightly more verbose than the original, but it is easier to adapt to filter characters in different ways. The original formula is shorter but more cryptic and works best for the intended task only.
Excel 2019
If you happen to be using Excel 2019, which provides the TEXTJOIN function but not the SEQUENCE function, you can use an alternative formula like this:
=TEXTJOIN("",TRUE,IFERROR(MID(B5,ROW(INDIRECT("1:"&LEN(B5))),1)+0,""))
Note: In Excel 2019 this is an array formula and must be entered with control + shift + enter.
The ROW + INDIRECT construction is another way in older versions of Excel to create a numeric array with a variable length:
=ROW(INDIRECT("1:"&LEN(B5))
=ROW(INDIRECT("1:"&10))
=ROW(INDIRECT("1:10"))
={1;2;3;4;5;6;7;8;9;10}
The resulting array is the same as that returned by the SEQUENCE function above. Note that INDIRECT is a volatile function that can cause performance problems so this approach should be avoided in later versions of Excel.
Strip numeric characters
To remove numeric characters from a text string use the formulas explained here.