How to remove HTML tags from strings in Python using BeautifulSoup

Programming newbie here :)

I'd like to print the prices from a website using BeautifulSoup. This is my code:

#!/usr/bin/env python3
# Python 3: urllib2's urlopen now lives in urllib.request
from urllib.request import urlopen

from bs4 import BeautifulSoup, SoupStrainer

url = "Some retailer's url"
html = urlopen(url).read()
# Parse only the <span style="color:red;"> elements that hold the prices
product = SoupStrainer('span', {'style': 'color:red;'})
soup = BeautifulSoup(html, 'html.parser', parse_only=product)
print(soup.prettify())

and it prints the prices in the following format:

<span style="color:red;">
 180
</span>
<span style="color:red;">
 1250
</span>
<span style="color:red;">
 380
</span>

I tried print(soup.text.strip()), but it returned 1801250380, with all the prices run together.

Please help me print one price per line :)

Many thanks!

asked Jan 19 '26 06:01 by user3404005


2 Answers

Strip the text of each price span and join the results with newlines:

>>> print("\n".join(p.get_text(strip=True) for p in soup.find_all(product)))
180
1250
380
answered Jan 21 '26 03:01 by jfs
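A self-contained sketch of the per-span approach, for trying it without network access. SAMPLE_HTML and extract_prices are hypothetical names, and the markup is an assumed stand-in for the retailer's page (with one non-matching blue span to show the strainer filtering it out); it also assumes bs4 4.x with Python's built-in html.parser.

```python
from bs4 import BeautifulSoup, SoupStrainer

# Hypothetical stand-in for the retailer's page (the real URL is not known).
SAMPLE_HTML = """
<div>
  <span style="color:red;">180</span>
  <span style="color:red;">1250</span>
  <span style="color:blue;">999</span>
  <span style="color:red;">380</span>
</div>
"""


def extract_prices(html):
    """Return the text of every <span style="color:red;"> as a list of strings."""
    # Only tags matching the strainer are kept; the blue span is discarded at parse time.
    product = SoupStrainer("span", {"style": "color:red;"})
    soup = BeautifulSoup(html, "html.parser", parse_only=product)
    # get_text(strip=True) removes the whitespace around each price.
    return [span.get_text(strip=True) for span in soup.find_all("span")]


if __name__ == "__main__":
    # Prints 180, 1250 and 380 on separate lines instead of "1801250380".
    print("\n".join(extract_prices(SAMPLE_HTML)))
```

Tag.get_text() also accepts a separator argument, so when the soup holds nothing but the price spans, soup.get_text("\n", strip=True) should produce the same one-price-per-line string.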


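If you want the prices as integers rather than strings, convert each span's text with int(). Because soup was built with parse_only=product, it contains only the price spans, so find_all('span') returns exactly those: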

>>> [int(span.text) for span in soup.find_all('span')]
[180, 1250, 380]
answered Jan 21 '26 04:01 by Steinar Lima
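A self-contained sketch of the integer conversion, under the same assumptions: SAMPLE_HTML and extract_int_prices are hypothetical names, and the markup is an assumed stand-in for the real page, parsed with bs4 and html.parser. It also assumes each price is plain digits, possibly padded with whitespace, which int() accepts; text such as "1,250" or "$180" would raise ValueError and need cleaning first.

```python
from bs4 import BeautifulSoup, SoupStrainer

# Hypothetical stand-in for the retailer's page (the real URL is not known).
SAMPLE_HTML = """
<div>
  <span style="color:red;">180</span>
  <span style="color:red;">1250</span>
  <span style="color:red;">380</span>
</div>
"""


def extract_int_prices(html):
    """Return every <span style="color:red;"> price as an int."""
    product = SoupStrainer("span", {"style": "color:red;"})
    soup = BeautifulSoup(html, "html.parser", parse_only=product)
    # int() ignores leading/trailing whitespace, so "\n 180\n" becomes 180;
    # it raises ValueError on text such as "1,250".
    return [int(span.get_text()) for span in soup.find_all("span")]


if __name__ == "__main__":
    prices = extract_int_prices(SAMPLE_HTML)
    print(prices)       # [180, 1250, 380]
    print(sum(prices))  # 1810
```

Working with ints rather than strings lets you sort, sum, or compare the prices directly.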