Scrape a table class in Python

Question

I am trying to scrape http://emojipedia.org/emoji/, but I am not sure what the most efficient way to do so is. The data I want is inside the table with class="emoji_list". I would like to save the contents of each "td" in separate columns. The output should look like the following, where each line represents an emoji:

Col1_Link               Col2_emoji      Col3_Comment        Col4_UTF
"/emoji/%F0%9F%98%80/"       😀        Grinning Face         U+1F600

I have written the following code so far, but I am not sure what the best way to proceed is.

import requests
from bs4 import BeautifulSoup 
import urllib
import re    

url = "http://emojipedia.org/emoji/"
html = urllib.urlopen(url)
soup = BeautifulSoup(html)
soup.findAll('tr', limit=2)

Many thanks in advance for your help.

Padraic Cunningham · Accepted Answer

soup.findAll('tr', limit=2) won't do much, since it only returns the first two tr elements on the page. You need to find the table first, then iterate over all of its rows and extract what you want from the two tds in each tr:

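import requests
from bs4 import BeautifulSoup

url = "http://emojipedia.org/emoji/"
html = requests.get(url).content

# Name the parser explicitly so the result does not depend on what is installed.
soup = BeautifulSoup(html, "html.parser")
table = soup.select_one("table.emoji-list")

# Print the first five rows: link, emoji, description, code point.
for row in table.find_all("tr")[:5]:
    td1, td2 = row.find_all("td")
    # The first cell's text is "<emoji> <description>"; split once on whitespace.
    em, desc = td1.text.split(None, 1)
    print(td1.a["href"], em, desc, td2.text)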
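
Since the question asks for the values in separate columns, here is a minimal sketch of one way to collect the rows and write them out as CSV. It runs against an inline sample rather than the live page. The sample markup is an assumption modelled on what the loop above expects, and the names SAMPLE_HTML, extract_rows and rows_to_csv are made up for illustration. Rows that do not contain exactly two td cells (a header row, for example) are skipped rather than unpacked.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical markup, modelled on what the answer's loop expects;
# the real page may be structured differently.
SAMPLE_HTML = """
<table class="emoji-list">
  <tr><th>Emoji</th><th>Codepoints</th></tr>
  <tr>
    <td><a href="/emoji/%F0%9F%98%80/"><span class="emoji">😀</span> Grinning Face</a></td>
    <td>U+1F600</td>
  </tr>
  <tr>
    <td><a href="/emoji/%F0%9F%98%81/"><span class="emoji">😁</span> Grinning Face With Smiling Eyes</a></td>
    <td>U+1F601</td>
  </tr>
</table>
"""


def extract_rows(html):
    """Return a list of (link, emoji, comment, codepoint) tuples."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.select_one("table.emoji-list")
    rows = []
    for tr in table.find_all("tr"):
        cells = tr.find_all("td")
        if len(cells) != 2:
            continue  # skip header rows or rows with an unexpected shape
        first, second = cells
        # The first cell's text is "<emoji> <description>"; split once on whitespace.
        emoji, comment = first.get_text().split(None, 1)
        rows.append(
            (first.a["href"], emoji, comment.strip(), second.get_text(strip=True))
        )
    return rows


def rows_to_csv(rows):
    """Serialise the rows to CSV text using the column names from the question."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["Col1_Link", "Col2_emoji", "Col3_Comment", "Col4_UTF"])
    writer.writerows(rows)
    return buffer.getvalue()


rows = extract_rows(SAMPLE_HTML)
print(rows_to_csv(rows))
```

To write a file instead of a string, the same writer can be pointed at open("emojis.csv", "w", newline="", encoding="utf-8"); the file name is only an example.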

Alternatively, you can avoid the split by taking the emoji from the span and the description from the a tag's own text, excluding the text of its child tags, with find(text=True, recursive=False):

for row in table.find_all("tr"):
    td1, td2 = row.find_all("td")
    # span.text is the emoji; the a tag's direct text node is the description.
    print(td1.a["href"], td1.a.span.text, td1.a.find(text=True, recursive=False), td2.text)
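
To see what recursive=False changes, here is a small sketch on a single cell. The markup is assumed from the attribute access used above (an a tag containing a span followed by plain text) and may not match the real page exactly. It uses string=True, which is the name Beautiful Soup 4.4 and later give to the text=True argument.

```python
from bs4 import BeautifulSoup

# Hypothetical cell markup, assumed from the attribute access in the answer.
cell_html = (
    '<td><a href="/emoji/%F0%9F%98%80/">'
    '<span class="emoji">😀</span> Grinning Face</a></td>'
)

link = BeautifulSoup(cell_html, "html.parser").a

# The emoji lives inside the child <span>.
emoji = link.span.get_text()

# recursive=False limits the search to direct children of <a>,
# so the text inside <span> is skipped and only the description is returned.
description = link.find(string=True, recursive=False).strip()

# For comparison, get_text() walks all descendants and returns both parts.
full_text = link.get_text()

print(link["href"], emoji, description)
```

The direct text node keeps its leading space, which is why the sketch calls strip() on it.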

I would also stick with requests rather than urllib.
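
As a rough illustration of what requests gives you here, the sketch below wraps the download in a small helper with a timeout and an HTTP status check. The helper name fetch_html, its session parameter and the 10-second default are assumptions made for this example, not part of the original answer.

```python
import requests


def fetch_html(url, session=None, timeout=10):
    """Download a page and return its raw bytes, raising on HTTP errors."""
    # A Session reuses connections when several pages are fetched in a row.
    http = session or requests.Session()
    response = http.get(url, timeout=timeout)
    response.raise_for_status()  # turn 4xx/5xx responses into exceptions
    return response.content


# Example call (performs a real network request, so it is left commented out):
# html = fetch_html("http://emojipedia.org/emoji/")
```

The returned bytes can be passed straight to BeautifulSoup(html, "html.parser").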

Tags: python, beautifulsoup, python-requests, web-scraping

morfara
