I'm trying to parse some HTML using the AngleSharp library, which has been great so far. I've now stumbled upon a scenario where I'd like to parse the following piece of HTML:
<a name="someLink" href="#someLink">Link 1</a>
Some text that happens to be in between elements...
<b>Some stuff in bold</b>
Some more text
<br>
Of course, this piece of HTML has enclosing parent elements etc., but the resulting list of parsed elements for it contains only the anchor, the bold element, and the line break, effectively skipping the text in between the elements. How do I obtain this text? I would have thought AngleSharp would generate TextNodes for these parts.
Note that fetching the parent's complete TextContent isn't what I want to do, since I still actually need the structure of the elements to know what is what.
This behavior is actually what's expected by the DOM spec. You may not realize this, but you've answered your own question :)
Here's the part you seem to have slightly wrong: Element != Node. You asked for the elements, but you're looking for the nodes.
Tags like <a> end up as elements, whereas text nodes are... well... nodes, not elements. Every element is a node, but not every node is an element. By asking the API for the elements, you're telling it that you don't want the text nodes returned.
Let's do a simple demo.
var parser = new HtmlParser();
var doc = parser.Parse(@"<div id=""content"">
<a name=""someLink"" href=""#someLink"">Link 1</a>
Some text that happens to be in between elements...
<b>Some stuff in bold</b>
Some more text
<br>
</div>");
var content = doc.GetElementById("content");
Now, here's essentially what you've been doing:
foreach (var element in content.Children)
    Console.WriteLine(element.GetType().Name);
This outputs:
HtmlAnchorElement
HtmlBoldElement
HtmlBreakRowElement
Here's what you want instead:
foreach (var element in content.ChildNodes)
    Console.WriteLine(element.GetType().Name);
Now the output is:
TextNode
HtmlAnchorElement
TextNode
HtmlBoldElement
TextNode
HtmlBreakRowElement
TextNode
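Since you said you still need the structure to know what is what, here is a minimal sketch of walking ChildNodes and telling the two kinds of node apart. It assumes a newer AngleSharp release (0.10 or later), where Parse was renamed to ParseDocument and HtmlParser lives in the AngleSharp.Html.Parser namespace; on older versions keep using Parse as in the demo above. Skipping whitespace-only text nodes is my own choice for readability, not something the library does for you.

```csharp
using System;
using AngleSharp.Dom;
using AngleSharp.Html.Parser;

class Program
{
    static void Main()
    {
        // Assumption: AngleSharp 0.10+, where the method is ParseDocument.
        var parser = new HtmlParser();
        var doc = parser.ParseDocument(@"<div id=""content"">
<a name=""someLink"" href=""#someLink"">Link 1</a>
Some text that happens to be in between elements...
<b>Some stuff in bold</b>
Some more text
<br>
</div>");
        var content = doc.GetElementById("content");

        foreach (var node in content.ChildNodes)
        {
            if (node is IElement element)
            {
                // Element nodes: the tag name is available, so structure is preserved.
                Console.WriteLine($"element <{element.LocalName}>: {element.TextContent}");
            }
            else if (node.NodeType == NodeType.Text)
            {
                // Text nodes: trim and skip the ones that contain only whitespace.
                var text = node.TextContent.Trim();
                if (text.Length > 0)
                    Console.WriteLine($"text: {text}");
            }
        }
    }
}
```

Because the nodes come back in document order, you can tell which text sits between which elements without falling back on the parent's TextContent.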