Python BeautifulSoup changes behaviour when a string is provided to find_all()

I need to find code tags that contain a certain string. find_all by tag name successfully finds all code tags, but when I add the string argument it weirdly won't find all of them. Here is my code:

from bs4 import BeautifulSoup
import re

text = """<!-- Data starts here -->
<code>LGEL 281220Z 33010G20KT CAVOK 32/11 Q1013</code><br/>
<br/><code>TAF LGEL 281100Z 2812/2912 34018G28KT 9999 FEW020 <br/>  BECMG 2816/2818 34015KT <br/>  TEMPO 2909/2912 34015G25KT</code><br/>
<hr width="65%"/>
<!-- Data ends here -->"""


soup = BeautifulSoup(text, 'html.parser')

info = soup.find_all("code")
value = soup.find_all('code',string = re.compile('LGEL'))

print(value)  # This does not find the second code tag
print(info)  # This finds all code tags successfully

1 Answer

You first have to extract() the br tags. The ones nested inside the second code tag split its text into separate pieces, which is why the string filter misses it. Then your code will work.

from bs4 import BeautifulSoup
import re

text = """<!-- Data starts here -->
<code>LGEL 281220Z 33010G20KT CAVOK 32/11 Q1013</code><br/>
<br/><code>TAF LGEL 281100Z 2812/2912 34018G28KT 9999 FEW020  <br/>  BECMG 2816/2818 34015KT  <br/>  TEMPO 2909/2912 34015G25KT</code><br/>
<hr width="65%"/>
<!-- Data ends here -->"""


soup = BeautifulSoup(text, 'html.parser')
for br in soup.find_all('br'):
    br.extract()

info = soup.find_all("code")
value = soup.find_all('code', string = re.compile('LGEL'))

print(value)  # Filtered by string; see OUTPUT below
print(info)  # This finds all code tags successfully

OUTPUT:

[<code>LGEL 281220Z 33010G20KT CAVOK 32/11 Q1013</code>, <code>TAF LGEL 281100Z 2812/2912 34018G28KT 9999 FEW020   BECMG 2816/2818 34015KT   TEMPO 2909/2912 34015G25KT</code>]
[<code>LGEL 281220Z 33010G20KT CAVOK 32/11 Q1013</code>, <code>TAF LGEL 281100Z 2812/2912 34018G28KT 9999 FEW020   BECMG 2816/2818 34015KT   TEMPO 2909/2912 34015G25KT</code>]
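
A possible explanation for the original behaviour, hedged because the details depend on the installed bs4 version: the string filter appears to be compared against each tag's .string, and .string is None whenever a tag has more than one child. The second code tag contains <br/> children, so it gets skipped. If removing tags is not desirable, or the match still fails afterwards, a function filter that checks get_text() avoids relying on .string at all. The following is only a minimal sketch; the helper name code_containing is made up for this example.

```python
import re

from bs4 import BeautifulSoup

text = """<!-- Data starts here -->
<code>LGEL 281220Z 33010G20KT CAVOK 32/11 Q1013</code><br/>
<br/><code>TAF LGEL 281100Z 2812/2912 34018G28KT 9999 FEW020 <br/>  BECMG 2816/2818 34015KT <br/>  TEMPO 2909/2912 34015G25KT</code><br/>
<hr width="65%"/>
<!-- Data ends here -->"""

soup = BeautifulSoup(text, "html.parser")
pattern = re.compile("LGEL")

# Diagnosis: the second <code> tag has text and <br/> children mixed,
# so its .string attribute is None.
codes = soup.find_all("code")
second_code = codes[1]
print(second_code.string)


def code_containing(tag):
    """Hypothetical helper: match <code> tags whose combined text fits the pattern."""
    return tag.name == "code" and pattern.search(tag.get_text()) is not None


# find_all accepts a function that takes a tag and returns True or False.
matches = soup.find_all(code_containing)
print(len(matches))
```

This leaves the parsed tree untouched, so the <br/> tags are still available afterwards if the line breaks in the TAF report matter.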
Maaz answered May 09 '26 13:05
