Using the Beautiful Soup module, how can I get the data of a div tag whose class name is feeditemcontent cxfeeditemcontent? Is it:
soup.class['feeditemcontent cxfeeditemcontent']
or:
soup.find_all('class')
This is the HTML source:
<div class="feeditemcontent cxfeeditemcontent">
    <div class="feeditembodyandfooter">
         <div class="feeditembody">
         <span>The actual data is some where here</span>
         </div>
     </div>
 </div> 
and this is the Python code:
 from BeautifulSoup import BeautifulSoup
 html_doc = open('home.jsp.html', 'r')
 soup = BeautifulSoup(html_doc)
 class="feeditemcontent cxfeeditemcontent"
Find an element by class using a CSS selector: alternatively, you can search for HTML tags by class name using a CSS selector with BeautifulSoup's select() method. A class selector matches any tag that carries the named class, even when the tag also has other CSS classes.
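As a rough illustration of the select() approach applied to the markup from the question, here is a minimal sketch. It assumes bs4 is installed and uses the built-in "html.parser"; the variable names are only for illustration.

```python
from bs4 import BeautifulSoup

html = """<div class="feeditemcontent cxfeeditemcontent">
    <div class="feeditembodyandfooter">
        <div class="feeditembody">
            <span>The actual data is some where here</span>
        </div>
    </div>
</div>"""

soup = BeautifulSoup(html, "html.parser")

# Chained class selectors: the div must carry both classes, in any order,
# and may carry additional classes as well.
matches = soup.select("div.feeditemcontent.cxfeeditemcontent")

# A descendant selector reaches the span nested inside the matched div.
span = soup.select_one("div.feeditemcontent.cxfeeditemcontent span")
span_text = span.get_text(strip=True)

print(len(matches))
print(span_text)
```

select() returns a list of tags, while select_one() returns the first match or None, so check for None before calling get_text() on real pages.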
To get the inner markup of a tag, use encode_contents or decode_contents. encode_contents returns an encoded bytestring; if you want a Python Unicode string, use decode_contents instead. See also the documentation on formatters: you'll most likely use either formatter="minimal" (the default) or formatter="html" (for HTML entities), unless you want to process the text manually in some way.
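The difference between the two methods and the two formatters can be sketched as follows. The sample markup here is made up for illustration (it is not from the question), and the snippet assumes bs4 with the built-in "html.parser".

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup containing a non-ASCII character and an ampersand.
markup = '<div class="feeditembody"><span>caf\u00e9 &amp; tea</span></div>'
soup = BeautifulSoup(markup, "html.parser")
div = soup.find("div", class_="feeditembody")

# Inner markup as a Python (Unicode) string, using the default "minimal" formatter.
inner_text = div.decode_contents()

# Inner markup as an encoded bytestring (UTF-8 by default).
inner_bytes = div.encode_contents()

# Same inner markup, but with characters converted to named HTML entities where possible.
inner_html = div.decode_contents(formatter="html")

print(inner_text)
print(inner_bytes)
print(inner_html)
```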
Beautiful Soup 4 treats the value of the "class" attribute as a list rather than a string, meaning jadkik94's solution can be simplified:
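from bs4 import BeautifulSoup
def match_class(target):
    # Build a filter that accepts tags carrying every class named in target.
    def do_match(tag):
        classes = tag.get('class', [])  # bs4 returns the class attribute as a list
        return all(c in classes for c in target)
    return do_match
# html is the markup string to parse (for example, the snippet from the question)
soup = BeautifulSoup(html, "html.parser")
print(soup.find_all(match_class(["feeditemcontent", "cxfeeditemcontent"])))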
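To see why the simplification works, it can help to look at what Beautiful Soup 4 actually stores for the class attribute. A minimal sketch, assuming bs4 and the built-in "html.parser", with a shortened version of the question's markup:

```python
from bs4 import BeautifulSoup

html = '<div class="feeditemcontent cxfeeditemcontent"><span>data</span></div>'
soup = BeautifulSoup(html, "html.parser")

# bs4 treats class as a multi-valued attribute and splits it on whitespace.
classes = soup.find("div")["class"]

# A tag without a class attribute has no "class" key, hence the default in get().
span_classes = soup.find("span").get("class", [])

print(classes)
print(span_classes)
```

Because classes is already a list, a membership test such as `"feeditemcontent" in classes` needs no manual splitting.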
Try this, maybe it's too much for this simple thing but it works:
def match_class(target):
    target = target.split()
    def do_match(tag):
        try:
            classes = dict(tag.attrs)["class"]
        except KeyError:
            classes = ""
        classes = classes.split()
        return all(c in classes for c in target)
    return do_match
html = """<div class="feeditemcontent cxfeeditemcontent">
<div class="feeditembodyandfooter">
<div class="feeditembody">
<span>The actual data is some where here</span>
</div>
</div>
</div>"""
from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup(html)
matches = soup.findAll(match_class("feeditemcontent cxfeeditemcontent"))
for m in matches:
    print m
    print "-"*10
matches = soup.findAll(match_class("feeditembody"))
for m in matches:
    print m
    print "-"*10
soup.findAll("div", class_="feeditemcontent cxfeeditemcontent")
So, if I want to get all div tags of class header (<div class="header">) from stackoverflow.com, an example with BeautifulSoup would be something like:
from bs4 import BeautifulSoup as bs
import requests 
url = "http://stackoverflow.com/"
html = requests.get(url).text
soup = bs(html, "html.parser")  # name the parser explicitly to avoid the bs4 warning
tags = soup.findAll("div", class_="header")
This is already covered in the bs4 documentation.
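One detail of class_ is worth checking when the attribute holds several classes. As far as the bs4 documentation describes it, a single class name matches any tag that has that class among others, while a string with several names is compared against the exact attribute value, so the order matters. A small sketch of that behaviour on the markup from the question, assuming bs4 and "html.parser":

```python
from bs4 import BeautifulSoup

html = """<div class="feeditemcontent cxfeeditemcontent">
    <div class="feeditembodyandfooter">
        <div class="feeditembody">
            <span>The actual data is some where here</span>
        </div>
    </div>
</div>"""
soup = BeautifulSoup(html, "html.parser")

# A single class name matches even though the div has a second class.
single = soup.find_all("div", class_="feeditemcontent")

# The full attribute value matches only when written exactly as in the markup.
exact = soup.find_all("div", class_="feeditemcontent cxfeeditemcontent")
swapped = soup.find_all("div", class_="cxfeeditemcontent feeditemcontent")

# Once the div is found, the data sits in the nested span.
data = exact[0].find("span").get_text()

print(len(single), len(exact), len(swapped))
print(data)
```

When the class order in the page is not guaranteed, a CSS selector or a filter function that tests each class separately is the more robust choice.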