<table >
<tr>
<td colspan="2" style="height: 14px">
tdtext1
<a>hyperlinktext1</a>
</td>
</tr>
<tr>
<td>
tdtext2
</td>
<td>
<span>spantext1</span>
</td>
</tr>
</table>
This is my sample HTML. How can I write a regular expression in C# that matches the inner text of the td, span, and hyperlink (a) elements?
I cringe every time I hear the words regex and HTML in the same sentence. I would suggest checking out the HtmlAgilityPack on CodePlex, a very tolerant HTML parser that lets you run XPath queries against the parsed document. It's much cleaner, and the person who inherits your code will thank you!
EDIT
As per the comments below, here are some examples of how to get the InnerText of those tags. It is very simple.
// requires: using HtmlAgilityPack;
var doc = new HtmlDocument();
doc.LoadHtml("...your sample html...");

// all <td> tags in the document
foreach (HtmlNode td in doc.DocumentNode.SelectNodes("//td")) {
    Console.WriteLine(td.InnerText);
}

// all <span> tags in the document
foreach (HtmlNode span in doc.DocumentNode.SelectNodes("//span")) {
    Console.WriteLine(span.InnerText);
}

// all <a> tags in the document
foreach (HtmlNode a in doc.DocumentNode.SelectNodes("//a")) {
    Console.WriteLine(a.InnerText);
}
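The three loops above can also be folded into a single query. The following is only a sketch, assuming the HtmlAgilityPack NuGet package is referenced; `InnerTextDemo` and `ExtractInnerTexts` are hypothetical names made up for illustration. It also guards against `SelectNodes` returning null, which it does when nothing matches the XPath expression.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // assumption: HtmlAgilityPack package is installed

static class InnerTextDemo
{
    // Hypothetical helper: returns "tagname: inner text" for every td, span and a element.
    public static List<string> ExtractInnerTexts(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var results = new List<string>();

        // The XPath union operator (|) selects all three element types in one query.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//td | //span | //a");

        // SelectNodes returns null (not an empty collection) when nothing matches.
        if (nodes == null)
        {
            return results;
        }

        foreach (HtmlNode node in nodes)
        {
            results.Add(node.Name + ": " + node.InnerText.Trim());
        }

        return results;
    }

    static void Main()
    {
        // Shortened version of the sample markup from the question.
        const string html =
            "<table><tr><td colspan=\"2\">tdtext1 <a>hyperlinktext1</a></td></tr>" +
            "<tr><td>tdtext2</td><td><span>spantext1</span></td></tr></table>";

        foreach (string line in ExtractInnerTexts(html))
        {
            Console.WriteLine(line);
        }
    }
}
```

Note that InnerText includes the text of nested elements, so the first td also reports the text of the hyperlink it contains.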
static void Main(string[] args)
{
    //...
    // HtmlWeb downloads and parses the page itself, so no WebClient is needed
    HtmlDocument doc = new HtmlWeb().Load("http://www.freeclup.com");

    // print the inner text of every <span> on the page
    foreach (HtmlNode span in doc.DocumentNode.SelectNodes("//span"))
    {
        Console.WriteLine(span.InnerText);
    }

    Console.ReadKey();
}