We're trying to find a regular expression that lets us split sentences into words.
The immediate answer is to use \w, except that it doesn't split on _, which we need.
Then we tried [a-zA-Z0-9] (we'd like to allow digits inside words), but it splits on accented characters, which are fairly common in many languages.
So, ideally, what regexp should I use to split the following sentence into the following words:
"Je ne déguste pas d'asperges, car je n'aime pas ça"
["Je", "ne", "déguste", "pas", "d", "asperges", "car", "je", "n", "aime", "pas", "ça"]
STR = "Je ne déguste pas d'asperges, car je n'aime pas ça"
# Split on runs of whitespace, commas, and apostrophes
words = STR.split(/[\s,']+/)
for w in words
  print w, "\n"
end
The output is:
Je
ne
déguste
pas
d
asperges
car
je
n
aime
pas
ça
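The split above lists the separators explicitly, so any other punctuation (periods, semicolons, underscores) would stay attached to the words. As an alternative sketch, not part of the original answer, you could match the words themselves with Unicode property classes instead of listing what separates them. This assumes Ruby 1.9 or later with UTF-8 strings; the names WORD_PATTERN and split_words are hypothetical, chosen only for illustration.

```ruby
# Hypothetical helper (not from the original answer): collect runs of
# Unicode letters (\p{L}), combining marks (\p{M}), and digits (\p{N})
# instead of enumerating every possible separator.
WORD_PATTERN = /[\p{L}\p{M}\p{N}]+/

def split_words(text)
  # String#scan returns every non-overlapping match as an array of strings.
  text.scan(WORD_PATTERN)
end

sentence = "Je ne déguste pas d'asperges, car je n'aime pas ça"
split_words(sentence).each { |word| puts word }

# Underscores are not letters, marks, or digits, so they separate words too.
p split_words("snake_case_2 words")
```

Including \p{M} keeps accented letters intact even when the text stores accents as separate combining characters. Whether digits should count as part of a word is a choice; drop \p{N} from the class if they should not.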