I am trying to isolate an email address from a block of free field text (column name is TEXT).
There are many different variations of preceding and succeeding characters in the free text field, i.e.:
email me! [email protected]
e:[email protected] m:555-555-5555
[email protected] email
I've tried variations of INSTR() and SUBSTRING_INDEX() to first isolate the "@" (probably the one reliable constant in finding an email...) and extracting the characters to the left (up until a space or non-qualifying character like "-" or ":") and doing the same thing with the text following the @.
However - everything I've tried so far hasn't filtered out the noise to the level I need.
Obviously 100% accuracy isn't possible but would someone mind taking a crack at how I can structure my select statement?
There is no easy solution to do this within MySQL. However you can do this easily after you have retrieved it using regular expressions.
Here would be a an example of how to use it in your case: Regex example
If you want it to select all e-mail addresses from one string: Regex Example
You can use regex to extract the ones where it does contain an e-mail in MySQL but it still doesn't extract the group from the string. This has to be done outside MySQL
SELECT * FROM table
WHERE column RLIKE '\w*@\w*.\w*'
RLIKE is only for matching it, you can use REGEXP in the SELECT but it only returns 1 or 0 on whether it has found a match or not :s
If you do want to extract it in MySQL maybe this other stackoverflow post helps you out. But it seems like a lot of work instead of doing it outside MySQL
Now in MySQL 5 and 8 you can use REGEXP_SUBSTR to isolate just the email from a block of free text.
SELECT *, REGEXP_SUBSTR(`TEXT`, '([a-zA-Z0-9._%+\-]+)@([a-zA-Z0-9.-]+)\.([a-zA-Z]{2,4})') AS Emails FROM `mytable`;
If you want to get just the records with emails and remove duplicates ...
SELECT DISTINCT REGEXP_SUBSTR(`TEXT`, '([a-zA-Z0-9._%+\-]+)@([a-zA-Z0-9.-]+)\.([a-zA-Z]{2,4})') AS Emails FROM `mytable` WHERE `TEXT` REGEXP '([a-zA-Z0-9._%+\-]+)@([a-zA-Z0-9.-]+)\.([a-zA-Z]{2,4})';
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With