I have XML files with a defined structure but a varying number of tags, like
file1.xml:
<document>
<subDoc>
<id>1</id>
<myId>1</myId>
</subDoc>
</document>
file2.xml:
<document>
<subDoc>
<id>2</id>
</subDoc>
</document>
Now I'd like to check whether the tag myId exists. So I did the following:
data = open("file1.xml",'r').read()
xml = BeautifulSoup(data)
hasAttrBs = xml.document.subdoc.has_attr('myID')
hasAttrPy = hasattr(xml.document.subdoc,'myID')
hasType = type(xml.document.subdoc.myid)
The result is for file1.xml:
hasAttrBs -> False
hasAttrPy -> True
hasType -> <class 'bs4.element.Tag'>
file2.xml:
hasAttrBs -> False
hasAttrPy -> True
hasType -> <type 'NoneType'>
Okay, <myId> is not an attribute of <subdoc>.
But how can I test whether a sub-tag exists?
//Edit: By the way, I'd rather not iterate through the whole subdoc, because that will be very slow. I hope to find a way to address that element directly.
Beautiful Soup's find_all() method returns a list of all the tags or strings that match the given criteria.
A Tag object in BeautifulSoup corresponds to an HTML or XML tag in the actual page or document. Tags have many attributes and methods; the two most important features of a tag are its name and its attributes.
BeautifulSoup is a Python package for parsing HTML and XML, including broken markup. It can use lxml, which is built on libxml2, as its underlying parser.
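To make the difference between a tag's attributes and its child tags concrete, here is a minimal sketch. The inline markup and the kind attribute are made up for illustration, and it assumes the built-in "html.parser", which lowercases tag names. It also suggests why hasattr() returned True for both files in the question: a Tag resolves unknown attribute names through find(), which returns None rather than raising AttributeError.

```python
from bs4 import BeautifulSoup

# Inline sample (an assumption for illustration); html.parser lowercases tag names.
markup = '<document><subDoc kind="demo"><id>1</id><myId>1</myId></subDoc></document>'
soup = BeautifulSoup(markup, "html.parser")
sub_doc = soup.document.subdoc

# has_attr() looks only at the attributes written on the tag itself.
has_kind_attribute = sub_doc.has_attr("kind")
has_myid_attribute = sub_doc.has_attr("myid")

# Child tags are looked up with find(); it returns None when nothing matches.
has_myid_child = sub_doc.find("myid") is not None
has_other_child = sub_doc.find("other") is not None

# hasattr() is not a reliable test: unknown names are resolved through find(),
# which yields None instead of raising AttributeError.
python_hasattr = hasattr(sub_doc, "other")

print(has_kind_attribute, has_myid_attribute, has_myid_child, has_other_child, python_hasattr)
```

So has_attr() answers "does this tag carry that attribute?", while find() answers "does this tag contain that child?".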
if tag.find('child_tag_name'):
    pass  # the child tag exists; handle it here
The simplest way to check whether a child tag exists is
childTag = xml.find('childTag')
if childTag is not None:
    pass  # do stuff
More specifically to OP's question:
If you don't know the structure of the XML doc, you can use the .find() method of the soup. Something like this:
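with open("file1.xml", 'r') as data, open("file2.xml", 'r') as data2:
    xml = BeautifulSoup(data.read(), "html.parser")
    xml2 = BeautifulSoup(data2.read(), "html.parser")
# HTML parsers lowercase tag names, so search for "myid", not "myId"
hasAttrBs = xml.find("myid")
hasAttrBs2 = xml2.find("myid")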
If you do know the structure, you can get the desired element by using the tag names as attributes, like this: xml.document.subdoc.myid. So the whole thing would go something like this:
with open("file1.xml", 'r') as data, open("file2.xml", 'r') as data2:
    xml = BeautifulSoup(data.read(), "html.parser")
    xml2 = BeautifulSoup(data2.read(), "html.parser")
hasAttrBs = xml.document.subdoc.myid
hasAttrBs2 = xml2.document.subdoc.myid
print(hasAttrBs)
print(hasAttrBs2)
Prints
<myid>1</myid>
None
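If only direct children should count, find() accepts recursive=False, which limits the search to the tag's immediate children instead of the whole subtree. The sketch below is one way to wrap that up. The helper name has_direct_child and the inline strings standing in for file1.xml and file2.xml are assumptions for illustration, and it again assumes "html.parser", so names are lowercased before the lookup.

```python
from bs4 import BeautifulSoup

# Inline stand-ins for file1.xml and file2.xml from the question.
FILE1 = "<document><subDoc><id>1</id><myId>1</myId></subDoc></document>"
FILE2 = "<document><subDoc><id>2</id></subDoc></document>"


def has_direct_child(markup, parent_name, child_name):
    """Return True if the first <parent_name> has a direct <child_name> child."""
    soup = BeautifulSoup(markup, "html.parser")
    # html.parser lowercases tag names, so lowercase the names we search for.
    parent = soup.find(parent_name.lower())
    if parent is None:
        return False
    # recursive=False restricts the search to direct children only.
    return parent.find(child_name.lower(), recursive=False) is not None


result1 = has_direct_child(FILE1, "subDoc", "myId")
result2 = has_direct_child(FILE2, "subDoc", "myId")
print(result1, result2)
```

Comparing against None keeps the intent explicit: find() returns a Tag on success and None otherwise.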
Here's an example that checks whether an h2 tag exists in an Instagram page. Hope you find it useful:
import requests
from bs4 import BeautifulSoup
instagram_url = 'https://www.instagram.com/p/BHijrYFgX2v/?taken-by=findingmero'
html_source = requests.get(instagram_url).text
soup = BeautifulSoup(html_source, "lxml")
# find() returns None when the page contains no h2 tag
if not soup.find('h2'):
    print("didn't find h2")
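You can handle it like this:
def has_child(parent, name):
    # Illustrative helper: HTML parsers lowercase tag names, so compare lowercased
    for child in parent.children:
        if (getattr(child, "name", None) or "").lower() == name.lower():
            return True
    return False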
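As an alternative to looping over the children by hand, a CSS selector can address the child directly. This is only a sketch: it assumes a Beautiful Soup version that provides select_one(), the built-in "html.parser" (hence the lowercase tag names in the selector), and an inline sample document.

```python
from bs4 import BeautifulSoup

# Inline sample standing in for file1.xml from the question.
markup = "<document><subDoc><id>1</id><myId>1</myId></subDoc></document>"
soup = BeautifulSoup(markup, "html.parser")

# "subdoc > myid" matches a <myid> that is a direct child of <subdoc>.
match = soup.select_one("subdoc > myid")
# select_one() returns None when no element matches the selector.
missing = soup.select_one("subdoc > other")

found = match is not None
print(found, missing)
```

The selector states both the parent and the child, so no explicit iteration is needed in user code.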