How do I get a list of all parent tags in BeautifulSoup?

Let's say I have a structure like this:

<folder name="folder1">
     <folder name="folder2">
          <bookmark href="link.html">
     </folder>
</folder>

If I point to bookmark, what would be the command to just extract all of the folder lines? For example,

bookmarks = soup.findAll('bookmark')

then beautifulsoupcommand(bookmarks[0]) would return:

[<folder name="folder1">,<folder name="folder2">]

I'd also want to know when the ending tags hit too. Any ideas?

Thanks in advance!

FinDev asked Sep 15 '25 05:09


2 Answers

Here is my stab at it:

>>> from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3 (Python 2)
>>> html = """<folder name="folder1">
     <folder name="folder2">
          <bookmark href="link.html">
     </folder>
</folder>
"""
>>> soup = BeautifulSoup(html)
>>> bookmarks = soup.findAll('bookmark')
>>> # findAllPrevious walks backwards over every element that precedes the bookmark
>>> [p.get('name') for p in bookmarks[0].findAllPrevious(name='folder')]
[u'folder2', u'folder1']

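The key difference from @eumiro's answer is that I use findAllPrevious (find_all_previous in BeautifulSoup 4) instead of findParents (find_parents). When I tested @eumiro's solution, findParents returned only the first (immediate) parent, because the parent and grandparent have the same tag name. Note that findAllPrevious matches every folder that appears earlier in the document, not only the ancestors of the bookmark.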

>>> [p.get('name') for p in bookmarks[0].findParents('folder')]
[u'folder2']

>>> [p.get('name') for p in bookmarks[0].findParents()]
[u'folder2', None]

It does return both generations of parents if the parent and grandparent have different tag names.

>>> html = """<folder name="folder1">
     <folder_parent name="folder2">
          <bookmark href="link.html">
     </folder_parent>
</folder>
"""
>>> soup = BeautifulSoup(html)
>>> bookmarks = soup.findAll('bookmark')
>>> [p.get('name') for p in bookmarks[0].findParents()]
[u'folder2', u'folder1', None]
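
To make the distinction between "previous" and "parent" concrete, here is a minimal sketch. It assumes BeautifulSoup 4 (the bs4 package) with the built-in html.parser, which is a different setup from the session above, so results may not match it. The markup is hypothetical: it adds a sibling folder named "other" and closes the bookmark tags explicitly.

```python
from bs4 import BeautifulSoup  # assumption: BeautifulSoup 4 is installed

# Hypothetical markup: "other" is a sibling branch, not an ancestor of link.html.
html = """<folder name="folder1">
  <folder name="other"><bookmark href="a.html"></bookmark></folder>
  <folder name="folder2"><bookmark href="link.html"></bookmark></folder>
</folder>"""

soup = BeautifulSoup(html, "html.parser")
target = soup.find("bookmark", href="link.html")

# Every <folder> that starts earlier in the document, nearest first.
previous_names = [f.get("name") for f in target.find_all_previous("folder")]

# Only the <folder> tags that actually enclose the bookmark, innermost first.
parent_names = [f.get("name") for f in target.find_parents("folder")]

print(previous_names)  # ['folder2', 'other', 'folder1']
print(parent_names)    # ['folder2', 'folder1']
```

Under these assumptions the "previous" search also picks up the unrelated "other" folder, while the "parents" search returns only the enclosing folders.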
Manoj Govindan answered Sep 17 '25 19:09


bookmarks[0].findParents('folder') returns a list of all the folder nodes that enclose the bookmark, innermost first. You can then iterate over them and read each one's name attribute (for example, p['name']).
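
The iteration described here could look roughly like the sketch below. It assumes BeautifulSoup 4 (where findParents is spelled find_parents) and the built-in html.parser; folder_path is a hypothetical helper name, not part of the library, and the bookmark tag is closed explicitly.

```python
from bs4 import BeautifulSoup  # assumption: BeautifulSoup 4 is installed


def folder_path(tag):
    """Return the name="..." attributes of the enclosing <folder> tags, outermost first."""
    parents = tag.find_parents("folder")  # innermost first
    # p.name would be the tag name ("folder"); p.get("name") reads the attribute.
    return [p.get("name") for p in reversed(parents)]


html = """<folder name="folder1">
  <folder name="folder2">
    <bookmark href="link.html"></bookmark>
  </folder>
</folder>"""

soup = BeautifulSoup(html, "html.parser")
bookmark = soup.find("bookmark")

print(folder_path(bookmark))                               # ['folder1', 'folder2']
print([p.name for p in bookmark.find_parents("folder")])   # ['folder', 'folder']
```

The second print shows why the attribute lookup matters: p.name is the tag's own name, whereas the folder's name="..." value has to be read with p.get("name") or p["name"].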

eumiro answered Sep 17 '25 20:09