I'm trying to show only the text inside the tag, for example:
<span class="listing-row__price ">$71,996</span>
I want to only show
"$71,996"
My code is:
import requests
from bs4 import BeautifulSoup
from csv import writer
response = requests.get('https://www.cars.com/for-sale/searchresults.action/?mdId=21811&mkId=20024&page=1&perPage=100&rd=99999&searchSource=PAGINATION&showMore=false&sort=relevance&stkTypId=28880&zc=11209')
soup = BeautifulSoup(response.text, 'html.parser')
cars = soup.find_all('span', attrs={'class': 'listing-row__price'})
print(cars)
How can I extract the text from the tags?
To get the text within the tags, there are a couple of approaches:
a) Use the .text attribute of the tag.
cars = soup.find_all('span', attrs={'class': 'listing-row__price'})
for tag in cars:
    print(tag.text.strip())
Output:
$71,996
$75,831
$71,412
$75,476
....
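To try option a) without hitting the site, you can parse a small inline snippet. The markup below is made up for illustration: the second price is hypothetical, and the whitespace around the first price is an assumption about how the real page pads its prices, which is why `.strip()` is applied. It also shows that `attrs={'class': 'listing-row__price'}` still matches `class="listing-row__price "` with its trailing space, because BeautifulSoup treats `class` as a multi-valued attribute.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup mimicking the listing page (first price padded with whitespace)
html = """
<div>
  <span class="listing-row__price ">
    $71,996
  </span>
  <span class="listing-row__price ">$75,831</span>
</div>
"""

soup = BeautifulSoup(html, 'html.parser')
# class is multi-valued, so 'listing-row__price' matches class="listing-row__price "
cars = soup.find_all('span', attrs={'class': 'listing-row__price'})

# .text returns the raw text (including the padding); .strip() removes it
prices = [tag.text.strip() for tag in cars]
for price in prices:
    print(price)
```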
b) Use get_text()
for tag in cars:
    print(tag.get_text().strip())
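`get_text()` also accepts a `strip` argument, so the separate `.strip()` call can be folded into it. This sketch uses a hypothetical one-tag snippet and compares the two forms. They agree here because the tag holds a single string; with several nested strings, `strip=True` strips each fragment before joining them, which can differ from stripping the joined result.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: whitespace around the price, as assumed for the real page
html = '<span class="listing-row__price ">\n  $71,996\n</span>'
tag = BeautifulSoup(html, 'html.parser').find('span')

# Strip the combined string after extracting it
manual = tag.get_text().strip()
# Let get_text strip each text fragment itself
built_in = tag.get_text(strip=True)

print(manual, built_in)
```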
c) If the tag contains only that string, you can also use any of these options: .string, .contents[0], next(tag.children), next(tag.strings), next(tag.stripped_strings). For example:
for tag in cars:
    print(tag.string.strip())  # or uncomment any of the lines below
    # print(tag.contents[0].strip())
    # print(next(tag.children).strip())
    # print(next(tag.strings).strip())
    # print(next(tag.stripped_strings))
Output:
$71,996
$75,831
$71,412
$75,476
$77,001
...
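To confirm that the five options in c) really are interchangeable when the tag holds a single string, this sketch runs all of them on one hypothetical `<span>` and collects the results. Each one reaches the same text node by a different route; only `stripped_strings` strips the text for you.

```python
from bs4 import BeautifulSoup

# Hypothetical sample tag whose only child is a string
html = '<span class="listing-row__price "> $71,996 </span>'
tag = BeautifulSoup(html, 'html.parser').span

# Each option returns the same single text node, reached differently
results = {
    'string': tag.string.strip(),
    'contents[0]': tag.contents[0].strip(),
    'children': next(tag.children).strip(),
    'strings': next(tag.strings).strip(),
    # stripped_strings already yields stripped text, so no .strip() is needed
    'stripped_strings': next(tag.stripped_strings),
}
print(results)
```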
Note:
.text and .string are not the same. If the tag contains more than one child (for example, text plus another element), .string returns None, while .text returns all of the text inside the tag, including the text of nested elements.
from bs4 import BeautifulSoup
html="""
<p>hello <b>there</b></p>
"""
soup = BeautifulSoup(html, 'html.parser')
p = soup.find('p')
print(p.string)
print(p.text)
Output:
None
hello there
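Building on the note above, `.string` still works on a nested element that has a single string child, and `get_text()` takes `separator` and `strip` arguments that control how the pieces of a mixed tag are joined. The sketch reuses the same `<p>hello <b>there</b></p>` snippet.

```python
from bs4 import BeautifulSoup

html = '<p>hello <b>there</b></p>'
p = BeautifulSoup(html, 'html.parser').find('p')

outer = p.string                                # None: <p> has two children
inner = p.b.string                              # <b> has exactly one string child
joined = p.get_text(separator='|')              # text pieces joined with '|'
clean = p.get_text(separator=' ', strip=True)   # each piece stripped, then joined with ' '

print(outer, inner, joined, clean, sep='\n')
```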
Alternatively, collect all the prices into a list in one line:
print([x.text.strip() for x in cars])
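An equivalent way to pick the tags is a CSS selector via `soup.select()`, shown here as an alternative to `find_all()` rather than what the answers used. The markup is a hypothetical stand-in for the downloaded page; `span.listing-row__price` matches any `<span>` whose class list includes `listing-row__price`.

```python
from bs4 import BeautifulSoup

# Hypothetical sample standing in for the downloaded page
html = """
<span class="listing-row__price ">$71,996</span>
<span class="listing-row__price ">$75,831</span>
<span class="other">not a price</span>
"""
soup = BeautifulSoup(html, 'html.parser')

# CSS selector: <span> elements with the listing-row__price class
prices = [tag.get_text(strip=True) for tag in soup.select('span.listing-row__price')]
print(prices)
```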
Actually, the request is not returning the page you expect. The response status code is 500 (Internal Server Error), which is an error reported by the server rather than a network issue, so you are not getting any listing data.
What you are missing is a User-Agent header, which you need to send along with the request.
import requests
import re  # regex library
from bs4 import BeautifulSoup
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36"
}
crawl_url = 'https://www.cars.com/for-sale/searchresults.action/?mdId=21811&mkId=20024&page=1&perPage=100&rd=99999&searchSource=PAGINATION&showMore=false&sort=relevance&stkTypId=28880&zc=11209'
response = requests.get(crawl_url, headers=headers)
soup = BeautifulSoup(response.text, 'html.parser')  # parse the new response
cars = soup.find_all('span', attrs={'class': 'listing-row__price'})
for car in cars:
    # remove all whitespace (spaces, newlines) from the price text
    print(re.sub(r'\s+', '', car.text))
Output:
$71,412
$75,476
$77,001
$77,822
$107,271
...
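To see what the User-Agent fix changes without contacting the server, this sketch builds (but does not send) the request with `requests.Request(...).prepare()` and inspects its headers. The User-Agent string and URL are copied from the answer; whether cars.com still serves this URL or still requires the header is an assumption. The `fetch` helper is hypothetical: it adds `raise_for_status()` so that a 500 fails loudly instead of silently producing an empty `find_all()` result.

```python
import requests

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36"
}
crawl_url = 'https://www.cars.com/for-sale/searchresults.action/?mdId=21811&mkId=20024&page=1&perPage=100&rd=99999&searchSource=PAGINATION&showMore=false&sort=relevance&stkTypId=28880&zc=11209'

# Build the request without sending it, to inspect what would be sent
prepared = requests.Request('GET', crawl_url, headers=headers).prepare()
print(prepared.headers['User-Agent'])

def fetch(url):
    """Hypothetical helper: send the request with the browser-like User-Agent."""
    response = requests.get(url, headers=headers, timeout=30)
    # Turn 4xx/5xx responses (such as the 500 above) into an HTTPError
    response.raise_for_status()
    return response.text
```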