Using the Jsoup library, I'm trying to get the content (text only) from an HTML string. There are two methods that can give me the content:
Jsoup.parse(htmlString).body().text()
Jsoup.parse(htmlString).text()
I know that the first method will return only the text of the body. What does the second method return? Which one is better for my use case?
Note: According to the documentation, the text method is used to set the text of the document's body.
Every Element has a text() method:
public java.lang.String text()
Gets the combined text of this element and all its children. Whitespace is normalized and trimmed.
All elements that can contain text nodes (nodes for which node.nodeName() returns #text) are supposed to be part of the body, except for the <title> tag. The <script> and <style> tags have child nodes with the node name #data instead, so their content is not included by text().
So a valid page will return the same text for document.body().text() and document.text() as long as there is no title tag in the head; otherwise document.text() will additionally contain the title text.
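As a minimal sketch of the difference described above, the following compares the two calls on a small page that has a title and a script in its head. The class name JsoupTextDemo, the helper methods bodyText and documentText, and the sample HTML are made up for illustration; it assumes Jsoup is on the classpath.

```java
import org.jsoup.Jsoup;

public class JsoupTextDemo {

    // Text of the <body> element only.
    static String bodyText(String html) {
        return Jsoup.parse(html).body().text();
    }

    // Text of the whole document, which also covers the <head>.
    static String documentText(String html) {
        return Jsoup.parse(html).text();
    }

    public static void main(String[] args) {
        // Hypothetical sample page: a title and a script in the head.
        String html = "<html><head><title>Page title</title>"
                + "<script>var ignored = 1;</script></head>"
                + "<body><p>Hello <b>world</b></p></body></html>";

        // Body text only: "Hello world"
        System.out.println(bodyText(html));

        // Also contains the title text; the script content is a
        // #data node, so it does not show up here.
        System.out.println(documentText(html));
    }
}
```

If only the visible page content is wanted, body().text() is the safer choice, since it never picks up the title.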
The second line includes the text from the entire HTML document including the head, title and body, whilst the first only includes text from the body.