
Regex for parsing tags from a string, Flickr style

I wonder if anyone can provide me with the regular expressions needed to parse a string like:

'foo bar "multiple word tag"'

into an array of tags like:

["foo","bar","multiple word tag"]

Thanks


2 Answers

In Ruby

scan(/\"([\w ]+)\"|(\w+)/).flatten.compact

E.g.

"foo bar \"multiple words\" party_like_1999".scan(/\"([\w ]+)\"|(\w+)/).flatten.compact
=> ["foo", "bar", "multiple words", "party_like_1999"]
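For readers not using Ruby, here is a rough Python port of the same pattern. This is a minimal sketch, not part of the original answer: the name parse_tags is hypothetical, and it assumes the same tag rules as the regex above (word characters, plus spaces inside double quotes).

```python
import re

# Same alternation as the Ruby regex: a quoted run of word characters
# and spaces, or a bare run of word characters.
TAG_PATTERN = re.compile(r'"([\w ]+)"|(\w+)')

def parse_tags(text):
    # findall yields one (quoted, bare) tuple per match; only one of the
    # two groups is non-empty, so `or` picks whichever one matched.
    return [quoted or bare for quoted, bare in TAG_PATTERN.findall(text)]

print(parse_tags('foo bar "multiple words" party_like_1999'))
# ['foo', 'bar', 'multiple words', 'party_like_1999']
```

The list comprehension plays the role of .flatten.compact: it drops the empty group from each match.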
eelco answered Nov 19 '25 09:11


You could implement a scanner to do this. For instance, in Python it'd look something like this:

import re
scanner = re.Scanner([
    (r"[a-zA-Z_]\w*", lambda s,t:t),       # regular tag
    (r"\".*?\"",      lambda s,t:t[1:-1]), # multi-word-tag
    (r"\s+",          None),               # whitespace not in multi-word-tag
    ])
tags, _ = scanner.scan('foo bar "multiple word tag"')
print(tags)
# ['foo', 'bar', 'multiple word tag']

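This technique is called lexical analysis: the scanner tries each pattern in turn at the current position and emits a token for the first one that matches.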
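One detail the snippet above discards is the second value returned by scanner.scan, which holds whatever input the scanner could not match (for example, a tag with an unterminated quote). Below is a hedged sketch of how that remainder might be checked. The function name parse_tags_strict and the choice to raise ValueError are assumptions for illustration, not part of the original answer. Note that re.Scanner is an undocumented class in the standard library.

```python
import re

# Same lexicon as above. re.Scanner is undocumented, so treat this as a sketch.
scanner = re.Scanner([
    (r"[a-zA-Z_]\w*", lambda s, t: t),        # regular tag
    (r"\".*?\"",      lambda s, t: t[1:-1]),  # multi-word tag, quotes stripped
    (r"\s+",          None),                  # whitespace between tags
])

def parse_tags_strict(text):
    # scan() returns (tokens, remainder); a non-empty remainder means the
    # scanner stopped early because no pattern matched at that position.
    tags, remainder = scanner.scan(text)
    if remainder:
        raise ValueError("could not tokenize input at: %r" % remainder)
    return tags

print(parse_tags_strict('foo bar "multiple word tag"'))
# ['foo', 'bar', 'multiple word tag']

try:
    parse_tags_strict('foo "unterminated')
except ValueError as exc:
    print(exc)
```

Whether to raise, or to silently return the tags parsed so far, depends on how forgiving the tag input box should be.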

Evan Fosmark answered Nov 19 '25 10:11


