I'm following along a tutorial (Ruby) that uses a regex to remove all html tags from a string:
product.description.gsub(/<.*?>/,'').
I don't know how to interpret the ?. Does it mean: "at least one of the previous"? In that case, wouldn't /<.+>/ have been more adequate?
In this case, the ? comes right after a quantifier, so it makes the * lazy.
1* - match as many 1s as possible.
1*? - match as few 1s as possible.
Here, when you have <a>text<b>some more text, <.*> will match <a>text<b>.
<.*?>, however, will match <a> and <b>.
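The difference can be checked directly in Ruby. This is a minimal sketch using the sample string from above; the variable names are only illustrative.

```ruby
html = "<a>text<b>some more text"

# Greedy: .* takes as much as it can, so a single match runs
# from the first "<" to the last ">".
greedy_matches = html.scan(/<.*>/)    # => ["<a>text<b>"]

# Lazy: .*? takes as little as it can, so each match ends at
# the first ">" after its "<".
lazy_matches = html.scan(/<.*?>/)     # => ["<a>", "<b>"]

# The same difference shows up when stripping with gsub.
greedy_stripped = html.gsub(/<.*>/, '')   # => "some more text"
lazy_stripped   = html.gsub(/<.*?>/, '')  # => "textsome more text"
```

Note that the greedy version also throws away the text between the two tags, which is why the tutorial uses the lazy form.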
See also: Laziness Instead of Greediness
Another important note: this regex can easily fail on valid HTML. It is better to use an HTML parser and get the text of your document from that.
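As a rough illustration of how the regex can go wrong, here are two inputs that are valid HTML but are not stripped cleanly. The sample strings are made up for this sketch.

```ruby
# A ">" inside an attribute value ends the lazy match too early,
# so part of the tag is left behind.
with_attr = '<a href="#" title="a > b">link</a>'
attr_result = with_attr.gsub(/<.*?>/, '')
# => " b\">link"

# By default . does not match a newline in Ruby, so a tag that
# spans two lines is not matched at all (only </p> is removed).
multiline = "<p\nclass='x'>hi</p>"
multiline_result = multiline.gsub(/<.*?>/, '')
# => "<p\nclass='x'>hi"
```

An HTML parser (Nokogiri is one commonly used gem for this) handles both cases, because it reads the tag structure instead of guessing where a tag ends.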