
Regex to ignore specific characters

I am splitting text on non-alphanumeric characters and would like to keep specific characters, such as apostrophes, dashes/hyphens and commas, from being treated as separators.

I would like to build a regex for the following cases:

  1. non-alphanumeric characters, excluding apostrophes and hyphens
  2. non-alphanumeric characters, excluding commas, apostrophes and hyphens

This is what I have tried:

import re

def split_text(text):
    # \W matches any non-word character, so this also splits on ' and -
    my_text = re.split(r'\W', text)

    # The following don't work:
    #my_text = re.split('([A-Z]\w*)',text)
    #my_text = re.split("^[a-zA-Z0-9]+(-[a-zA-Z0-9]+)*$",text)

    return my_text

  • Case 1:
    • Sample Input: What's up? It's good to see you my-friend. "Hello" to-the world!.
    • Sample Output: ["What's", 'up', "It's", 'good', 'to', 'see', 'you', 'my-friend', 'Hello', 'to-the', 'world']
  • Case 2:
    • Sample Input: It means that, it's not good-to do such things.
    • Sample Output: ['It', 'means', 'that,', "it's", 'not', 'good-to', 'do', 'such', 'things']

Any ideas?

user3247054 asked Nov 22 '25 18:11



2 Answers

Is this what you want?

non-alphanumeric characters, excluding apostrophes and hyphens

my_text = re.split(r"[^\w'-]+",text)

non-alphanumeric characters, excluding commas, apostrophes and hyphens

my_text = re.split(r"[^\w',-]+",text)   # hyphen last, so it is a literal and not a range

The [] syntax defines a character class; a leading ^, as in [^...], "complements" (negates) it, so the class matches every character that is not listed.

See the documentation about that:

Characters that are not within a range can be matched by complementing the set. If the first character of the set is '^', all the characters that are not in the set will be matched. For example, [^5] will match any character except '5', and [^^] will match any character except '^'. ^ has no special meaning if it’s not the first character in the set.
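To make that concrete, here is a minimal sketch that runs both patterns on the sample inputs from the question. The `keep_commas` parameter is a name made up for this illustration, and the filtering step is an addition: `re.split` returns an empty string when the text ends with a separator, which the expected output in the question does not contain.

```python
import re

def split_text(text, keep_commas=False):
    # The hyphen is placed last in the class so it is read as a literal '-'.
    # keep_commas is a hypothetical flag for this sketch (case 2 in the question).
    pattern = r"[^\w',-]+" if keep_commas else r"[^\w'-]+"
    # Drop the empty strings re.split produces at leading/trailing separators.
    return [tok for tok in re.split(pattern, text) if tok]

case1 = split_text("""What's up? It's good to see you my-friend. "Hello" to-the world!.""")
case2 = split_text("It means that, it's not good-to do such things.", keep_commas=True)

print(case1)
print(case2)
```

Note that a hyphen or apostrophe standing on its own (for example in "a - b") would come back as a token too, since it is no longer a separator.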

zmo answered Nov 24 '25 07:11



You can use a negated character class for this:

my_text = re.split(r"[^\w'-]+",text)

or

my_text = re.split(r"[^\w,'-]+",text)   # also excludes commas
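
As a possible alternative (not part of the answer above), the same class can be used un-negated with `re.findall`, which collects the tokens directly instead of splitting on the separators, so no empty strings appear at the ends. This is only a sketch; `tokenize` and `keep_commas` are names invented for the example.

```python
import re

def tokenize(text, keep_commas=False):
    # Match runs of allowed characters rather than splitting on disallowed ones.
    # The hyphen is last in the class so it is a literal, not a range.
    pattern = r"[\w',-]+" if keep_commas else r"[\w'-]+"
    return re.findall(pattern, text)

words = tokenize("It means that, it's not good-to do such things.", keep_commas=True)
print(words)
```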
Tim Pietzcker answered Nov 24 '25 09:11



