
I don't understand a regexp

I'm following along a tutorial (Ruby) that uses a regex to remove all html tags from a string:

product.description.gsub(/<.*?>/, '')

I don't know how to interpret the ?. Does it mean "at least one of the previous"? In that case, wouldn't /<.+>/ have been more appropriate?

Flavius Stef asked Jan 22 '26 00:01



1 Answer

In this case, the ? makes the * lazy (non-greedy).

1* - match as many 1s as possible.
1*? - match as few 1s as possible.

Here, when you have <a>text<b>some more text, <.*> will match <a>text<b>.
<.*?>, however, will match <a> and <b>.
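A minimal sketch of the difference, using the sample string from above. It also runs the /<.+>/ pattern from the question, which is still greedy and so behaves like /<.*>/ here. The variable names are only for illustration.

```ruby
html = "<a>text<b>some more text"

# Greedy: .* grabs as much as possible, so the match runs to the last ">".
greedy_matches = html.scan(/<.*>/)

# The pattern suggested in the question: + means "one or more", but it is
# still greedy, so it also runs to the last ">".
plus_matches = html.scan(/<.+>/)

# Lazy: .*? grabs as little as possible, so each match stops at the first ">".
lazy_matches = html.scan(/<.*?>/)

# The tutorial's gsub call, applied to the sample string.
stripped = html.gsub(/<.*?>/, '')

p greedy_matches  # ["<a>text<b>"]
p plus_matches    # ["<a>text<b>"]
p lazy_matches    # ["<a>", "<b>"]
p stripped        # "textsome more text"
```

So the ? after * does not mean "at least one"; it only changes how much the * is allowed to consume.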

See also: Laziness Instead of Greediness

Another important note: this regex can easily fail on valid HTML. It is better to use an HTML parser and extract the text of your document from it.
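One way the regex can go wrong, as a small sketch: the input below is a made-up example of valid HTML in which an attribute value contains a ">". The lazy match stops at that character, so part of the tag is left behind in the output.

```ruby
# Hypothetical input: valid HTML whose attribute value contains a ">".
html = '<a title="a > b">link</a>'

# The lazy pattern ends the first match at the ">" inside the attribute,
# so the rest of the opening tag survives as text.
stripped = html.gsub(/<.*?>/, '')

p stripped  # " b\">link"
```

The intended result was just "link"; a real HTML parser would know that the ">" sits inside a quoted attribute value.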

Kobi answered Jan 25 '26 02:01



