
Regex must be in length range and contain letters and numbers

I want to write a regex that will find substrings of length 10-15 where all characters are [A-Z0-9] and it must contain at least 1 letter and one number (spaces are ok but not other special characters). Some examples:

  • ABABABABAB12345 should match
  • ABAB1234ABA32 should match
  • ABA BA BABAB12345 should match
  • 1234567890987 should not match
  • ABCDEFGHIJK should not match
  • ABABAB%ABAB12?345 should not match

So far the best two candidates I have come up with are:

  1. (?![A-Z]{10,15}|[0-9]{10,15})[0-9A-Z]{10,15} - this fails because if the string has 10 consecutive numbers/letters it will not match, even though the 15-character string has a mix (e.g. ABABABABAB12345).
  2. (?=.*[0-9])(?=.*[A-Z])([A-Z0-9]+){10,15} - this fails because it will match 15 consecutive letters as long as there is a number later in the string (even though it is outside the match) and vice versa (e.g. 123456789098765 abcde will match 123456789098765).

(I need to do this in python and js)

Toby S asked Nov 19 '25 21:11


2 Answers

If each string is on its own line, then you can use start/end anchors to construct the regex:

  • ^(?=.*[0-9])(?=.*[A-Z])(?:\s*[A-Z0-9]\s*){10,15}$
    • ^ - start of line
      • (?=.*[0-9]) - lookahead, must contain a number
      • (?=.*[A-Z]) - lookahead, must contain a letter
      • (?: - start a non-capturing group
        • \s*[A-Z0-9]\s* - a letter or number with optional surrounding whitespace
      • ) - end non-capturing group
      • {10,15} - Pattern occurs 10 to 15 times
    • $ - end of line

See a live example here: https://regex101.com/r/eWX2Qo/1
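As a quick check in Python, here is a minimal sketch that runs the anchored pattern against the six examples from the question. The helper name is_valid is only illustrative, and the sketch assumes each candidate is passed in as its own string (for multi-line text you would add re.MULTILINE so that ^ and $ anchor at each line).

```python
import re

# Anchored pattern from the answer above; one candidate per string.
PATTERN = re.compile(r'^(?=.*[0-9])(?=.*[A-Z])(?:\s*[A-Z0-9]\s*){10,15}$')


def is_valid(candidate):
    """Return True if the whole candidate satisfies the pattern (hypothetical helper)."""
    return PATTERN.search(candidate) is not None


samples = [
    'ABABABABAB12345',    # mix of letters and digits
    'ABAB1234ABA32',      # mix, 13 characters
    'ABA BA BABAB12345',  # spaces are allowed and not counted
    '1234567890987',      # digits only
    'ABCDEFGHIJK',        # letters only
    'ABABAB%ABAB12?345',  # contains special characters
]

for s in samples:
    print(f'{s!r}: {is_valid(s)}')
```

The pattern uses only lookaheads, a non-capturing group and \s, so it should carry over to a JavaScript regex literal unchanged, though that is worth confirming in your own environment.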

flakes answered Nov 21 '25 10:11


This doesn't account for ABA BA BABAB12345, but this still might help.

Based on what you're trying to match, it looks like you want there to be a mix.

What you can do is use two lookaheads: one looking for a digit in the following 15 characters, and another looking for a letter in the same span. If both succeed, the pattern then matches a run of numbers and letters of length 10 to 15.

(?=.{0,14}\d)(?=.{0,14}[A-Z])[A-Z\d]{10,15}

https://regex101.com/r/qw1Q0S/1

(?=.{0,14}\d) at least one of characters 1 through 15 has to be a digit

(?=.{0,14}[A-Z]) at least one of characters 1 through 15 has to be a capital letter

[A-Z\d]{10,15} match 10 to 15 letters and numbers if the previous conditions are true
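To try the unanchored pattern on free text in Python, something like the sketch below may help. The function name find_codes and the sample strings are made up for illustration. The last sample shows a caveat worth knowing about: the lookaheads inspect a fixed 15-character window, so when the actual match is shorter than 15 characters the digit (or letter) can be found just outside it.

```python
import re

# Unanchored pattern from above: both lookaheads scan the next 15 characters.
PATTERN = re.compile(r'(?=.{0,14}\d)(?=.{0,14}[A-Z])[A-Z\d]{10,15}')


def find_codes(text):
    """Return every substring the pattern matches (hypothetical helper)."""
    return [m.group(0) for m in PATTERN.finditer(text)]


print(find_codes('ref ABAB1234ABA32 ok'))  # mixed run inside other text
print(find_codes('1234567890987'))         # digits only, nothing found
print(find_codes('ABCDEFGHIJK'))           # letters only, nothing found

# Caveat: the digit here sits outside the 10-letter run but inside the
# 15-character lookahead window, so the letters-only run is still returned.
print(find_codes('ABCDEFGHIJ 1234'))
```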

Edit with an improved answer:

To account for the spaces, you can tweak the above concept.

(?=(?:. *+){0,14}\d)(?=(?:. *+){0,14}[A-Z])(?:[A-Z\d] *){10,15}

In the first version, each lookahead matched .{0,14}. Here the . has been changed to (?:. *+), a non-capturing group that matches any character followed by 0 or more spaces.

So putting it together:

Lookahead 1:

(?=(?:. *+){0,14}\d)

This matches 0 to 14 characters, each of which may or may not be followed by spaces, which effectively ignores spaces. It also uses a possessive quantifier ( *+) when matching spaces to prevent the engine from backtracking once spaces are matched. The pattern would work without the + modifier, but it would more than double the steps taken to match on the example.

Lookahead 2:

(?=(?:. *+){0,14}[A-Z])

Same as lookahead 1, but now testing for a capital letter instead of a digit.

If lookahead 1 and lookahead 2 both match, then the engine will be left in a place where our matches can potentially be made.

Actual match:

(?:[A-Z\d] *){10,15}

This matches the capital letters and numbers, but now also 0 or more spaces. The only drawback is that a trailing space will be included in your match, although that's easily handled in post-processing.
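Here is a rough Python sketch of the space-tolerant pattern together with the post-processing step. It deliberately uses the plain  * quantifier instead of  *+ (which, as noted above, the pattern works without) so that it does not depend on possessive-quantifier support; that is fine for short inputs like these but gives up the backtracking protection. The name find_spaced_codes and the sample text are illustrative only.

```python
import re

# Space-tolerant pattern from above, with ' *' in place of the possessive ' *+'.
PATTERN = re.compile(
    r'(?=(?:. *){0,14}\d)'     # a digit within the next 15 non-skipped characters
    r'(?=(?:. *){0,14}[A-Z])'  # a capital letter within the same window
    r'(?:[A-Z\d] *){10,15}'    # 10 to 15 letters/digits, each optionally followed by spaces
)


def find_spaced_codes(text):
    """Return matches with the trailing spaces trimmed (hypothetical helper)."""
    return [m.group(0).rstrip(' ') for m in PATTERN.finditer(text)]


print(find_spaced_codes('id: ABA BA BABAB12345 end'))
print(find_spaced_codes('ABABAB%ABAB12?345'))
```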

Edit:

All whitespace (\r, \n, \t and the space character) can be accounted for by using \s instead of a literal space.

Depending on the amount of whitespace in the input, the possessive quantifier is necessary to prevent catastrophic backtracking. On a modified input with more whitespace, the version using possessive quantifiers completes in 22,332 steps, while the same input with a regular quantifier fails to match anything due to catastrophic backtracking.

It should be noted that the possessive quantifier *+ is not supported in JavaScript, and Python's built-in re module only supports it from Python 3.11 onward. On older versions it is available through Python's regex module:

>>> import regex
>>> pattern = r'(?=(?:.\s*+){0,14}\d)(?=(?:.\s*+){0,14}[A-Z])(?:[A-Z\d]\s*){10,15}'
>>> regex.search(pattern, 'AAAAAAAAAA\n2')
<regex.Match object; span=(0, 12), match='AAAAAAAAAA\n2'>
>>> 
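Where possessive quantifiers are not available, one commonly used workaround (not part of the answer above, so treat it as a swapped-in technique) is to emulate them with a capturing lookahead followed by a backreference: (?=(\s*))\1 captures the longest run of whitespace and then consumes exactly that run, leaving the engine nothing to give back. The sketch below applies that idea with the built-in re module; the group numbering is specific to this rewrite.

```python
import re

# Emulated possessive '\s*+': the lookahead captures the longest whitespace
# run, and the backreference then consumes exactly that run.
PATTERN = re.compile(
    r'(?=(?:.(?=(\s*))\1){0,14}\d)'     # group 1 backs the first lookahead
    r'(?=(?:.(?=(\s*))\2){0,14}[A-Z])'  # group 2 backs the second lookahead
    r'(?:[A-Z\d]\s*){10,15}'
)

m = PATTERN.search('AAAAAAAAAA\n2')
print(m)
```

The same lookahead-plus-backreference idiom is often used in JavaScript as well, since it relies only on features that engine has, but it is worth testing against your own inputs before depending on it.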

