I want to extract 8-digit numbers from a paragraph. A number can contain an optional hyphen between any two adjacent digits and MUST start with 6 or 7, so the following should all match:
71234567
6-1234567
7-123-4567
61-23-45-67
7-1-2-3-4-5-6-7
...
I'd like to extract only the digits, so when matching 7-1-2-3-4-5-6-7, it returns only 71234567.
I tried to hardcode it like this:
[\b\D]([67]-?\d-?\d-?\d-?\d-?\d-?\d-?\d)[\b\D]
and then remove the hyphens manually afterwards, but it doesn't work.
You can't omit chars from a matched substring. You need to postprocess your matches.
Also, note that [\b\D] matches a backspace char or a non-digit char. [\b] does not match a word boundary.
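To make that concrete, here is a small sketch of how the pattern from the question behaves. The sample string is made up for illustration. Because each [\b\D] has to consume a real character, a number at the very start or end of the string cannot match, and two neighbouring numbers cannot share the single space between them:

```python
import re

# Inside a character class, \b is the backspace character (\x08),
# not a word boundary.
backspace_hits = re.findall(r'[\b]', 'a\x08b')
print(backspace_hits)
# => ['\x08']

# The pattern from the question: each [\b\D] must consume one character.
pattern = r'[\b\D]([67]-?\d-?\d-?\d-?\d-?\d-?\d-?\d)[\b\D]'
sample = '71234567 6-1234567 7-123-4567'  # made-up sample text

found = re.findall(pattern, sample)
print(found)
# => ['6-1234567']
```

The first number has no character in front of it, and the third has nothing after it (the space before it was already consumed by the previous match), so only the middle one is returned.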
You can use
numbers = [x.replace('-', '') for x in re.findall(r'\b[67](?:-?\d){7}\b', data)]
# or, if the number can be glued to a letter or underscore
numbers = [x.replace('-', '') for x in re.findall(r'(?<!\d)[67](?:-?\d){7}(?!\d)', data)]
See the regex demo. Details:
\b - a word boundary
(?<!\d) - a negative lookbehind that fails the match if there is a digit immediately to the left of the current location
[67] - 6 or 7
(?:-?\d){7} - seven occurrences of an optional - followed by a digit
(?!\d) - a negative lookahead that fails the match if there is a digit immediately to the right of the current location.
See the Python demo:
import re
data = '71234567 6-1234567 7-123-4567 61-23-45-67 7-1-2-3-4-5-6-7'
print([x.replace('-', '') for x in re.findall(r'\b[67](?:-?\d){7}\b', data)])
# => ['71234567', '61234567', '71234567', '61234567', '71234567']
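The difference between the two boundary styles only shows up when a number touches a letter or an underscore. The following is a rough sketch of that case, assuming a made-up sample string; the names WORD_BOUNDARY, DIGIT_BOUNDARY and extract_numbers are hypothetical helpers introduced just for this comparison:

```python
import re

# Hypothetical names, used only for this illustration.
WORD_BOUNDARY = re.compile(r'\b[67](?:-?\d){7}\b')
DIGIT_BOUNDARY = re.compile(r'(?<!\d)[67](?:-?\d){7}(?!\d)')

def extract_numbers(pattern, text):
    """Return every match of pattern in text with the hyphens stripped."""
    return [m.replace('-', '') for m in pattern.findall(text)]

# Made-up sample: numbers glued to '_', to letters, and to an extra digit.
sample = 'id_71234567 x6-1234567y 171234567'

strict = extract_numbers(WORD_BOUNDARY, sample)
loose = extract_numbers(DIGIT_BOUNDARY, sample)

print(strict)
# => []
print(loose)
# => ['71234567', '61234567']
```

With \b, the underscore and the letters count as word characters, so there is no boundary next to the number and nothing matches. The lookarounds only reject neighbouring digits, so the first two numbers are found, while 171234567 is still skipped because it is a 9-digit run.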