How to select leaf tags of an html document using jsoup

Question

I am using jsoup to parse an HTML document. I need to extract all the leaf div elements, that is, div tags that contain no nested div tags. I used the following Java code to select them:

Elements bodyTag = document.select("div:not(div>div)");

Here is an example:

<div id="header">
    <div class="container">
        <div id="header-logo">
            <a href="/" title="mekay.com">
                <div id="logo">
                </div>
            </a>
        </div>
        <div id="header-banner">
            <div data-type="ad" data-publisher="lqm.j2ee.site" data-zone="ron">
            </div>
        </div>
    </div>
</div>

I need to extract only the following:

 <div id="logo">
 </div>
 <div data-type="ad" data-publisher="lqm.j2ee.site" data-zone="ron">
 </div>

Instead, the snippet above returns all the div tags. Could you please help me figure out what is wrong with this selector?

Buru · Accepted Answer

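This selector works. Here :has(div) tests whether an element contains a nested div and :not(...) excludes those elements, whereas div>div only tests whether a div's parent is a div: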

Elements innerMostDivs = doc.select("div:not(:has(div))");
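
For context, here is a minimal, self-contained sketch of how that selector might be used. The class name LeafDivs and the helper leafDivHtml are made up for illustration, and the HTML string is a shortened version of the example from the question. It assumes jsoup is on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class LeafDivs {

    // Hypothetical helper: returns the outer HTML of every div
    // that has no div anywhere among its descendants.
    static List<String> leafDivHtml(String html) {
        Document doc = Jsoup.parse(html);
        List<String> result = new ArrayList<>();
        for (Element div : doc.select("div:not(:has(div))")) {
            result.add(div.outerHtml());
        }
        return result;
    }

    public static void main(String[] args) {
        // Shortened version of the markup from the question.
        String html =
            "<div id=\"header\"><div class=\"container\">"
          + "<div id=\"header-logo\"><a href=\"/\"><div id=\"logo\"></div></a></div>"
          + "<div id=\"header-banner\"><div data-type=\"ad\"></div></div>"
          + "</div></div>";

        // Print each innermost div on its own line.
        for (String leaf : leafDivHtml(html)) {
            System.out.println(leaf);
        }
    }
}
```

Because :has(div) looks at all descendants, a div wrapped in another tag (such as the logo div inside the a tag) still counts as nested for its outer divs.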

Try it online

Add your HTML
Enter div:not(:has(div)) as the CSS query
Check the resulting elements
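
If the selector is hard to debug, roughly the same filter can be written as a plain loop. This is only a sketch of an alternative, not part of the original answer. The names LeafDivFilter and leafDivIds are hypothetical. It relies on the fact that Element.getElementsByTag includes the element itself in its results.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

public class LeafDivFilter {

    // Hypothetical helper: collects the id of every div that contains no other div.
    // Divs without an id attribute contribute an empty string.
    static List<String> leafDivIds(String html) {
        List<String> ids = new ArrayList<>();
        for (Element div : Jsoup.parse(html).getElementsByTag("div")) {
            // getElementsByTag searches this element and everything under it,
            // so a result of size 1 means the only div found is this one.
            if (div.getElementsByTag("div").size() == 1) {
                ids.add(div.id());
            }
        }
        return ids;
    }
}
```

The selector version is shorter. The loop can be easier to step through in a debugger when a query returns unexpected elements.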

Tags:

javascript

html

jsoup

mintra

1 Answer

Buru
