
How to select leaf tags of an html document using jsoup

I am using jsoup to parse an HTML document. I need to extract all the leaf div elements, that is, div tags that contain no nested div tags. I used the following Java code to select the div tags:

Elements bodyTag = document.select("div:not(div>div)"); 

Here is an example:

<div id="header">
     <div class="container">
         <div id="header-logo"> 
         <a href="/" title="mekay.com">
             <div id="logo">
             </div> </a>
        </div>
        <div id="header-banner">
            <div data-type="ad" data-publisher="lqm.j2ee.site" data-zone="ron">
            </div>
        </div>
     </div>
</div>

I need to extract only the following:

 <div id="logo">
 </div>
 <div data-type="ad" data-publisher="lqm.j2ee.site" data-zone="ron">
 </div>

Instead, the snippet above returns all the div tags. Could you please help me figure out what is wrong with this selector?

mintra asked Jan 01 '26 06:01


1 Answer

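This selector works. :has(div) matches any element that contains a div, so wrapping it in :not(...) keeps only the divs that have no div inside them: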

Elements innerMostDivs = doc.select("div:not(:has(div))");

To try it online:

  • add your HTML file
  • enter the CSS query div:not(:has(div))
  • check the resulting elements
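For a check outside the browser, here is a minimal sketch that runs the same selector against the markup from the question. It assumes jsoup is on the classpath; the class name LeafDivs, the helper leafDivs, and the HTML constant are hypothetical names chosen for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class LeafDivs {
    // Sample markup copied from the question (whitespace condensed).
    static final String HTML =
        "<div id=\"header\">"
      + "<div class=\"container\">"
      + "<div id=\"header-logo\">"
      + "<a href=\"/\" title=\"mekay.com\"><div id=\"logo\"></div></a>"
      + "</div>"
      + "<div id=\"header-banner\">"
      + "<div data-type=\"ad\" data-publisher=\"lqm.j2ee.site\" data-zone=\"ron\"></div>"
      + "</div>"
      + "</div>"
      + "</div>";

    // Hypothetical helper: returns every div that has no div descendant.
    static Elements leafDivs(String html) {
        Document doc = Jsoup.parse(html);
        return doc.select("div:not(:has(div))");
    }

    public static void main(String[] args) {
        // Print each innermost div so the result can be compared with the question.
        for (Element div : leafDivs(HTML)) {
            System.out.println(div.outerHtml());
        }
    }
}
```

With the sample markup this should select only the div with id "logo" and the div with data-type "ad", since every other div contains at least one nested div.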
Buru answered Jan 03 '26 12:01


