Python append adding same data

Question

I'm trying to extract the stock price and the market cap data from a Korean website.

Here is my code:

import requests
from bs4 import BeautifulSoup
 
response = requests.get('http://finance.naver.com/sise/sise_market_sum.nhn?sosok=0&page=1')
html = response.text
soup = BeautifulSoup(html, 'html.parser')

table = soup.find('table', { 'class': 'type_2' })
data = []
for tr in table.find_all('tr'):
    tds = list(tr.find_all('td')) 

    for td in tds:
        if td.find('a'):
            company_name = td.find('a').text 
            price_now = tds[2].text
            market_cap = tds[5].text 
            data.append([company_name, price_now, market_cap])    

 
print(*data, sep="\n")

And this is the result I get. (Sorry for the Korean characters)

['삼성전자', '43,650', '100']

['', '43,650', '100']

['SK하이닉스', '69,800', '5,000']

['', '69,800', '5,000']

The second and fourth lines of the output should not be there. I only want the first and third lines. Where do lines two and four come from, and how do I get rid of them?

Mark White · Accepted Answer

My dear friend, I think the problem is that you need to check whether td.find('a').text actually has a value!

So I changed your code to this, and it works!

import requests
from bs4 import BeautifulSoup

response = requests.get(
    'http://finance.naver.com/sise/sise_market_sum.nhn?sosok=0&page=1')
html = response.text
soup = BeautifulSoup(html, 'html.parser')

table = soup.find('table', {'class': 'type_2'})
data = []
for tr in table.find_all('tr'):
    tds = list(tr.find_all('td'))

    for td in tds:
        # where the magic happens: skip a tags whose text is empty
        if td.find('a') and td.find('a').text:
            company_name = td.find('a').text
            price_now = tds[2].text
            market_cap = tds[5].text
            data.append([company_name, price_now, market_cap])

print(*data, sep="\n")
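
As a quick check of why the extra condition helps, here is a minimal sketch that parses two a tags like the ones found in each table row and looks at what .text returns for each. The markup is a shortened, assumed stand-in for the live page (the image path is made up), so treat it as an illustration rather than a test against the real site. The a tag that wraps only an image yields an empty string, which is falsy, so the added check skips it. The last line uses get_text(strip=True) as an optional hardening in case an a tag ever contains only whitespace.

```python
from bs4 import BeautifulSoup

# Assumed, shortened stand-in for one row of the live table.
row_html = """
<table><tr>
  <td><a href="/item/main.nhn?code=005930" class="tltle">삼성전자</a></td>
  <td><a href="/item/board.nhn?code=005930"><img src="ico.gif" alt="토론실"></a></td>
</tr></table>
"""

soup = BeautifulSoup(row_html, 'html.parser')
links = soup.find_all('a')

# .text of the image-only link is an empty string.
link_texts = [a.text for a in links]
print(link_texts)

# Empty strings are falsy, so a truthiness check drops the image-only link.
kept = [text for text in link_texts if text]

# Optional hardening: strip whitespace before testing for emptiness.
kept_stripped = [a.get_text(strip=True) for a in links if a.get_text(strip=True)]
print(kept, kept_stripped)
```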

jwoff · Answer

While I can't test it, the likely cause is that each row has two a tags, and your for loop and if statement append a row to data every time they find one. The first a tag holds the name of the company, but the second one has no text, hence the blank entry (td.find('a').text returns the text of whichever a tag it finds in that cell).

For reference, this is the a tag you want:

<a href="/item/main.nhn?code=005930" class="tltle">삼성전자</a>

This is what you're picking up the second time around:

<a href="/item/board.nhn?code=005930"><img src="https://ssl.pstatic.net/imgstock/images5/ico_debatebl2.gif" width="15" height="13" alt="토론실"></a>

Perhaps you can change your if statement to check that the class of the a tag is tltle (spelled that way in the tag above), so that you only enter the if block when you're looking at the a tag with the company name in it.
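
A minimal sketch of that idea, run against a hand-written sample instead of the live page. The sample_html string is hypothetical: it copies the two a tags quoted above into rows with enough cells for tds[2] and tds[5] to exist, and the filler cells and image path are made up. The class name tltle is taken from the quoted tag and could change if the site is updated. The sketch also looks up the name link once per row rather than once per cell, so each row can be appended at most once.

```python
from bs4 import BeautifulSoup

# Hypothetical sample shaped like the live table (filler cells are made up).
sample_html = """
<table class="type_2">
  <tr><th>N</th><th>Name</th></tr>
  <tr>
    <td>1</td>
    <td><a href="/item/main.nhn?code=005930" class="tltle">삼성전자</a></td>
    <td>43,650</td><td>-</td><td>-</td><td>100</td>
    <td><a href="/item/board.nhn?code=005930"><img src="ico.gif" alt="토론실"></a></td>
  </tr>
  <tr>
    <td>2</td>
    <td><a href="/item/main.nhn?code=000660" class="tltle">SK하이닉스</a></td>
    <td>69,800</td><td>-</td><td>-</td><td>5,000</td>
    <td><a href="/item/board.nhn?code=000660"><img src="ico.gif" alt="토론실"></a></td>
  </tr>
</table>
"""

soup = BeautifulSoup(sample_html, 'html.parser')
table = soup.find('table', {'class': 'type_2'})

data = []
for tr in table.find_all('tr'):
    # Only the company-name link carries class "tltle"; the board link does not.
    name_link = tr.find('a', class_='tltle')
    if name_link is None:
        continue  # header or spacer row without a company link
    tds = tr.find_all('td')
    data.append([name_link.text, tds[2].text, tds[5].text])

print(*data, sep="\n")
```

Against the real page, only the sample_html string would be replaced by response.text; the column indexes 2 and 5 are carried over unchanged from the question.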

I'm at work so I can't really test anything, but let me know if you have any questions later!

Tags:

python

beautifulsoup

K Lee
