How do I remove everything beginning with '<' and ending with '>' from a string in C#? I know it can be done with regex, but I'm not very good with it.
Here is the tag pattern I quickly wrote for a recent small project:
string tagPattern = @"<[!--\W*?]*?[/]*?\w+.*?>";
I used it like this:
MatchCollection matches = Regex.Matches(input, tagPattern);
foreach (Match match in matches)
{
    input = input.Replace(match.Value, string.Empty);
}
It would likely need to be modified to correctly handle script or style tags.
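The match-and-replace loop above can probably be collapsed into a single Regex.Replace call. The sketch below is not the pattern from this answer: it assumes a simpler pattern, `<[^>]*>`, which matches a '<', any run of characters other than '>', and the closing '>'. The class and method names (TagStripper, StripTags) are hypothetical. Like the original pattern, it does not treat script or style contents specially, and it will cut a tag short if an attribute value contains '>'.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, named for illustration only.
public static class TagStripper
{
    // Assumed pattern: '<', then anything that is not '>', then '>'.
    private static readonly Regex TagRegex = new Regex("<[^>]*>");

    public static string StripTags(string input)
    {
        // Replace every match with the empty string in one pass.
        return TagRegex.Replace(input, string.Empty);
    }
}
```

Usage would look like TagStripper.StripTags("<p>Hello <b>world</b></p>"), which leaves only the text between the tags.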
Non-regex option (note that it still won't handle nested tags):
public static string StripHTML(string line)
{
    int beginStrip = line.IndexOf('<');
    while (beginStrip != -1)
    {
        int endStrip = line.IndexOf('>', beginStrip + 1);
        if (endStrip == -1)
        {
            // Unmatched '<': stop here and leave the rest of the text as-is.
            break;
        }
        // Remove the tag, including both angle brackets.
        line = line.Remove(beginStrip, endStrip - beginStrip + 1);
        // Look for the next '<' from where the removed tag started.
        beginStrip = line.IndexOf('<', beginStrip);
    }
    return line;
}
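The IndexOf/Remove loop above builds a new string for every tag and pairs each '<' with the first '>' that follows, which is why nested angle brackets trip it up. One possible alternative, offered only as a sketch, is a single pass that tracks bracket depth and copies characters to a StringBuilder while the depth is zero. The names DepthStripper and StripNested are hypothetical. Note one assumption that differs from the loop above: an unmatched '<' causes everything after it to be dropped.

```csharp
using System;
using System.Text;

// Hypothetical helper, named for illustration only.
public static class DepthStripper
{
    public static string StripNested(string line)
    {
        var result = new StringBuilder(line.Length);
        int depth = 0; // number of '<' currently open

        foreach (char c in line)
        {
            if (c == '<')
            {
                depth++;
            }
            else if (c == '>' && depth > 0)
            {
                depth--;
            }
            else if (depth == 0)
            {
                // Outside any tag: keep the character (including a stray '>').
                result.Append(c);
            }
        }
        return result.ToString();
    }
}
```

With this sketch, DepthStripper.StripNested("a<b<c>d>e") removes the whole nested span and keeps only the outer text.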