This question was an epic failure, but here's the working solution. It's based on Gumbo's answer (Gumbo's was close to working so I chose it as the accepted answer):
r'(?=[a-zA-Z0-9\-]{4,25}$)^[a-zA-Z0-9]+(\-[a-zA-Z0-9]+)*$'
I'm using Python and I'm not trying to extract the value, but rather test to make sure it fits the pattern.
Valid:
spam123-spam-eggs-eggs1
spam123-eggs123
spam
1234
eggs123
Invalid:
eggs1-
-spam123
spam--spam
I just can't have a dash at the start or the end. There is a question on here that works in the opposite direction by getting the string value after the fact, but I simply need to test the value so that I can disallow it. Also, the string must be at least 4 and at most 25 characters long, and no two dashes can be adjacent.
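For anyone who wants to try the pattern from the update against the example strings, here is a minimal sketch. The names PATTERN, is_valid, valid and invalid are only illustrative and do not come from the original post.

```python
import re

# Pattern quoted in the update at the top of the question.
PATTERN = re.compile(r'(?=[a-zA-Z0-9\-]{4,25}$)^[a-zA-Z0-9]+(\-[a-zA-Z0-9]+)*$')

def is_valid(value):
    """Return True if value fits the pattern (hypothetical helper name)."""
    return PATTERN.match(value) is not None

valid = ["spam123-spam-eggs-eggs1", "spam123-eggs123", "spam", "1234", "eggs123"]
invalid = ["eggs1-", "-spam123", "spam--spam"]

for s in valid + invalid:
    print(f"{s!r}: {is_valid(s)}")
```

Note that `$` in Python also matches just before a trailing newline, so if that matters for your input, `re.fullmatch` or `\Z` may be a safer choice.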
Here's what I've come up with after some experimentation with lookbehind, etc:
# Nothing here
In a regular expression, if you have [a-z] then it matches any lowercase letter. [0-9] matches any digit. So if you have [a-z0-9], then it matches any lowercase letter or digit.
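In regular expressions, the hyphen ("-") has special meaning inside a character class: it indicates a range, so [0-9] matches any digit from 0 to 9. To match a literal hyphen inside a character class, escape it with a backslash ("\") or place it first or last in the class. Outside a character class, for example when matching the literal hyphens in a social security number, the hyphen is an ordinary character and needs no escaping.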
You also need to use \\ in a regex to match a literal "\" (backslash). Regex recognizes common escape sequences such as \n for newline, \t for tab, \r for carriage return, \nnn for an octal number of up to three digits, \xhh for a two-digit hex code, \uhhhh for a 4-digit Unicode code point, and \Uhhhhhhhh for an 8-digit Unicode code point.
?= is a positive lookahead, a type of zero-width assertion. It says that the current position must be followed by whatever is within the parentheses, but that part isn't consumed and doesn't become part of the match. For example, a lookahead such as (?=.*\d) means the position must be followed by zero or more characters and then a digit (but again, that part isn't consumed).
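To make the "zero-width" part concrete, here is a small sketch. It assumes the lookahead being described (zero or more characters, then a digit) is written as (?=.*\d). The variable names are made up for illustration.

```python
import re

# Lookahead: a digit must appear somewhere ahead; nothing is consumed by it.
has_digit_ahead = re.compile(r'(?=.*\d)[a-z]+')

m = has_digit_ahead.match("abc1")
# The lookahead inspected "abc1", but only [a-z]+ contributed to the match.
print(m.group(0))

# No digit anywhere ahead, so the lookahead fails and there is no match.
print(has_digit_ahead.match("abcd"))

# A bare lookahead succeeds without advancing: the match ends where it began.
zero_width = re.match(r'(?=spam)', "spam")
print(zero_width.end())
```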
For example, the regular expression "[A-Za-z]" specifies to match any single uppercase or lowercase letter. In the character set, a hyphen indicates a range of characters, for example [A-Z] will match any one capital letter.
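The range behaviour of the hyphen, and the usual ways of getting a literal hyphen into a character class, can be checked quickly in Python. This is only a sketch, and the variable names are illustrative.

```python
import re

# Inside [...], a hyphen between two characters denotes a range.
lower_or_digit = re.compile(r'[a-z0-9]')
range_results = {ch: bool(lower_or_digit.fullmatch(ch)) for ch in ("q", "7", "Q", "-")}
print(range_results)

# A literal hyphen in a class: escaped, placed first, or placed last.
literal_hyphen_classes = [r'[a-z\-]', r'[-a-z]', r'[a-z-]']
hyphen_results = [bool(re.fullmatch(c, "-")) for c in literal_hyphen_classes]
print(hyphen_results)

# Outside a class the hyphen is an ordinary character and needs no escape.
plain_hyphen = bool(re.fullmatch(r'a-z', "a-z"))
print(plain_hyphen)
```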
Try this regular expression:
^[a-zA-Z0-9]+(-[a-zA-Z0-9]+)*$
This regular expression only allows hyphens to separate sequences of one or more characters of [a-zA-Z0-9].
Edit: Following up on your comment: The expression (…)* allows the part inside the group to be repeated zero or more times. That means
a(bc)*
is the same as
a|abc|abcbc|abcbcbc|abcbcbcbc|…
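If you want to convince yourself of this, a quick check along these lines should do. It is a sketch, and the names group_repeat, candidates and results are just for illustration.

```python
import re

# a(bc)* : an "a" followed by zero or more repetitions of the group "bc".
group_repeat = re.compile(r'a(bc)*')

candidates = ["a", "abc", "abcbc", "abcb", "bc"]
# fullmatch requires the whole string to fit, so partial repeats are rejected.
results = {s: bool(group_repeat.fullmatch(s)) for s in candidates}
print(results)
```

The same idea applies to (-[a-zA-Z0-9]+)*: each repetition must be a hyphen followed by at least one alphanumeric character, which is what rules out trailing and doubled hyphens.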
Edit: Now that you changed the requirements: As you probably don’t want to restrict the length of each hyphen-separated part, you will need a look-ahead assertion to take the overall length into account:
(?=[a-zA-Z0-9-]{4,25}$)^[a-zA-Z0-9]+(-[a-zA-Z0-9]+)*$
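A rough sketch of how the length limits behave at the boundaries follows. The test strings and the name LENGTH_CHECKED are invented here for illustration.

```python
import re

# The lookahead checks the total length; the rest checks hyphen placement.
LENGTH_CHECKED = re.compile(r'(?=[a-zA-Z0-9-]{4,25}$)^[a-zA-Z0-9]+(-[a-zA-Z0-9]+)*$')

too_short = "a-b"                      # 3 characters
shortest = "a-bc"                      # 4 characters
longest = "a" * 12 + "-" + "b" * 12    # 25 characters
too_long = longest + "c"               # 26 characters

for s in (too_short, shortest, longest, too_long):
    print(len(s), bool(LENGTH_CHECKED.match(s)))
```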
The current regex is simple and fairly readable. Rather than making it long and complicated, have you considered applying the other constraints with normal Python string processing tools?
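import re
def fits_pattern(string):
    # Length and dash-placement rules are checked with plain string methods.
    if (4 <= len(string) <= 25 and
        "--" not in string and
        not string.startswith("-") and
        not string.endswith("-")):
        # fullmatch ensures every character is a letter, digit, or hyphen;
        # re.match with a single character class would test only the first one.
        return re.fullmatch(r"[a-zA-Z0-9\-]+", string)
    else:
        return None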
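One way to gain some confidence that the string-method approach and the look-ahead regex accept the same inputs is to compare them on a few samples. This is only a sketch: plain_check is a hypothetical variant that assumes the character test covers the whole string via re.fullmatch, and the sample list is made up.

```python
import re

REGEX = re.compile(r'(?=[a-zA-Z0-9-]{4,25}$)^[a-zA-Z0-9]+(-[a-zA-Z0-9]+)*$')

def plain_check(s):
    """String-method version of the same rules (hypothetical helper)."""
    return (4 <= len(s) <= 25
            and "--" not in s
            and not s.startswith("-")
            and not s.endswith("-")
            and re.fullmatch(r"[a-zA-Z0-9-]+", s) is not None)

samples = ["spam123-eggs123", "spam", "eggs1-", "-spam123",
           "spam--spam", "abc", "spam!"]

# True only if both approaches give the same verdict for every sample.
agreement = all(bool(REGEX.match(s)) == plain_check(s) for s in samples)
print(agreement)
```

Agreement on a handful of samples is not a proof of equivalence, of course, but it catches the obvious slips.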