I'm looking for advice on selecting only the leaf elements of an HTML document, ideally with a pure XPath solution.
An HTML example:
<div>
  <div>
    <div>text div (leaf)</div>
    <p>text paragraph (leaf)</p>
  </div>
</div>
<p>text paragraph 2 (leaf)</p>
Code:
require 'nokogiri'
doc = Nokogiri::HTML.fragment("- the html above -")
result = doc.xpath("*[not(child::*)]")
#=> [#<Nokogiri::XML::Element:0x3febf50f9328 name="p" children=[#<Nokogiri::XML::Text:0x3febf519b718 "text paragraph 2 (leaf)">]>]
But this XPath only returns the last "p". What I want is a flattened result: every leaf element, at any depth.
Here are some related questions on Stack Overflow:
How to select all leaf nodes using XPath expression?
XPath - Get node with no child of specific type
Thanks
Your expression is relative, so it only tests the direct children of the context node. To find all element nodes, at any depth, that have no child elements, use:
//*[not(*)]
Example:
require 'nokogiri'
doc = Nokogiri::HTML.parse <<~HTML
<div>
  <div>
    <div>text div (leaf)</div>
    <p>text paragraph (leaf)</p>
  </div>
</div>
<p>text paragraph 2 (leaf)</p>
HTML
puts doc.xpath('//*[not(*)]').length
# prints: 3
doc.xpath('//*[not(*)]').each do |e|
  puts e.text
end
# prints:
# text div (leaf)
# text paragraph (leaf)
# text paragraph 2 (leaf)