## Explanation

In this example, the goal is to extract a date in a format like mm/dd/yy from a text string with a formula. The position of the date is not known, so the date must be located as a first step. This article explains two ways to solve this challenge:

- A "classic" formula based on the SEARCH function and the MID function that will work in any version of Excel.
- A modern formula based on the REGEXEXTRACT function, which is only available in the Beta channel of Excel 365.

### Classic formula

In the worksheet shown, we are using a "classic" formula to extract dates from the text strings in column B. The formula in cell D5 looks like this:

`=MID(B5,SEARCH("??/??/??",B5),8)+0`

At a high level, this formula uses the SEARCH function to locate the date and the MID function to extract the date. MID is designed to extract a given number of characters from the middle of a text string. SEARCH will return the position of a matching value as a number. Working from the inside out, SEARCH is configured like this:

`SEARCH("??/??/??",B5)`

Here, the find_text is given as "??/??/??", and the within_text is provided as B5. The SEARCH supports wildcards, and the "?" character means *any single character*. The pattern "??/??/??" means: *any two characters followed by a forward slash "/", followed by any two characters followed by a forward slash "/", followed by any two characters*.

The text in cell B5 is 57 characters long, and the date begins at character 37. The SEARCH function finds the date pattern and returns 37 as a result. This result lands inside the MID function as the *start_num*. Simplifying, we now have:

`=MID(B5,37,8)+0`

Inside the MID function, the text comes from cell B5, the *start_num* is 37, and *num_chars* is set to 8. We use 8 because a date in the format "mm/dd/yy" is eight characters. With this configuration, MID extracts 8 characters beginning at character 37, and we now have the date isolated, still in text format:

`="06/15/24"+0`

As a final step, we add zero. This is a simple hack to get a valid date from a text string. The math operation forces Excel's formula engine to try and convert the text "06/15/24" into a number. In this case, Excel recognizes that "06/15/24" as a date and performs the conversion. It then returns the serial number 45458, which is June 15, 2024, in Excel's date system:

`=45458+0`

Adding zero has no effect on the number, so the final result is 45458. The last step is to format the result using the date format "d-mmm-yyyy" which causes Excel to display the dates in column D as they appear. We can apply this formatting because we have converted text value into a valid Excel date. This format can be adjusted to display dates as desired.

Although this formula is fairly simple, it is not especially robust. For example, it will match the non-date "AA/BB/CC" and even "AAAA/BB/CCCC". It will also fail on dates in "mm/dd/yyyy" format since only the first two year digits of the year will be used, resulting in an incorrect year. If all dates use a 4-digit year, you can use the modified formula below:

`=MID(B5,SEARCH("??/??/????",B5),10)+0`

See below for a more robust formula based on regex.

*Notes: (1) If you only want a text value (not an actual date), omit adding a zero. **(2) This formula works because the SEARCH function supports wildcards, unlike the FIND function.*

### A modern formula based on regex

In the latest version of Excel, which offers the REGEXEXTRACT function, we can build a more robust formula because regex patterns are much more specific than Excel's primitive wildcards. In the worksheet below, we are using REGEXEXTRACT to extract dates with a formula like this:

`=REGEXEXTRACT(B5,"\b\d{1,2}/\d{1,2}/\d{2,4}\b")+0`

Inside REGEXEXTRACT, the *text* is given as B5. The regex *pattern* looks like this:

`"\b\d{1,2}/\d{1,2}/\d{2,4}\b"`

Regular expressions (regex) are a language used to match and extract text patterns. Briefly, this is how the pattern works:

- \b: Matches a word boundary
- \d{1,2}: Matches one or two digits for the month
- /: Matches the forward slash separator
- \d{1,2}: Matches one or two digits for the day
- /: Matches the second forward slash separator
- \d{2,4}: Matches 2-4 digits for the year
- \b: Matches another word boundary

You can see how the formula works in the worksheet below:

Notice we are now matching 4-digit years on rows 7 and 12, in addition to the other 2-digit years. Compared to the MID + SEARCH formula above, this formula does a better job of matching dates. It is more flexible in some ways but more restrictive in others. For example, it will match dates like "1/1/23", "01/01/23", and "5/25/2023", but it won't match a text string like "AA/BB/CC" or "1234/12/1234". However, note that the pattern *does not check* that the month is between 1-12 or that the day is valid for the given month. It also doesn't validate the year in any way. It would, for example, allow a 3-digit year, which Excel won't interpret correctly. Since this is regex, we can easily make the pattern more specific. The revised formula below will only allow a 2-digit year OR a 4-digit year:

`=REGEXEXTRACT(B5,"\b\d{1,2}/\d{1,2}/(\d{2}|\d{4})\b")+0`

In regex, there is always a way to tighten up edge cases at the cost of more complexity. However, even the initial regex formula above is much better than the traditional SEARCH + MID at preventing false matches. REGEXEXTRACT is a *huge upgrade* to Excel's tools for matching and extracting text.