I need to find some code tags with string , while find_all by tagname will successfully find all code tags, when i use a string method it weardly wont find all code tags.Here is my code:
from bs4 import BeautifulSoup
import re
text = """<!-- Data starts here -->
<code>LGEL 281220Z 33010G20KT CAVOK 32/11 Q1013</code><br/>
<br/><code>TAF LGEL 281100Z 2812/2912 34018G28KT 9999 FEW020 <br/> BECMG 2816/2818 34015KT <br/> TEMPO 2909/2912 34015G25KT</code><br/>
<hr width="65%"/>
<!-- Data ends here -->"""
soup = BeautifulSoup(text, 'html.parser')
info = soup.find_all("code")
value = soup.find_all('code',string = re.compile('LGEL'))
print(value)#This will not find second code tag
print(info)#This finds all code tags successfully
You have to first extract() the br tags, they are breaking the html structure. Then your code will work.
from bs4 import BeautifulSoup
import re
text = """<!-- Data starts here -->
<code>LGEL 281220Z 33010G20KT CAVOK 32/11 Q1013</code><br/>
<br/><code>TAF LGEL 281100Z 2812/2912 34018G28KT 9999 FEW020 <br/> BECMG 2816/2818 34015KT <br/> TEMPO 2909/2912 34015G25KT</code><br/>
<hr width="65%"/>
<!-- Data ends here -->"""
soup = BeautifulSoup(text, 'html.parser')
for br in soup.find_all('br'):
br.extract()
info = soup.find_all("code")
value = soup.find_all('code', string = re.compile('LGEL'))
print(value)#This will not find second code tag
print(info)#This finds all code tags successfully
OUTPUT:
[<code>LGEL 281220Z 33010G20KT CAVOK 32/11 Q1013</code>, <code>TAF LGEL 281100Z 2812/2912 34018G28KT 9999 FEW020 BECMG 2816/2818 34015KT TEMPO 2909/2912 34015G25KT</code>]
[<code>LGEL 281220Z 33010G20KT CAVOK 32/11 Q1013</code>, <code>TAF LGEL 281100Z 2812/2912 34018G28KT 9999 FEW020 BECMG 2816/2818 34015KT TEMPO 2909/2912 34015G25KT</code>]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With