
Scrape a table class in Python

I am trying to scrape http://emojipedia.org/emoji/, but I am not sure what the most efficient way to do so is. What I would like to scrape is found inside the table with class="emoji_list". I would like to save the contents of each "td" in separate columns. The output will look like the following, where each line represents one emoji:

Col1_Link               Col2_emoji      Col3_Comment        Col4_UTF
"/emoji/%F0%9F%98%80/"       😀        Grinning Face         U+1F600

I have written the following code so far, but I am not sure of the best way to proceed.

import requests
from bs4 import BeautifulSoup
import urllib
import re

url = "http://emojipedia.org/emoji/"
html = urllib.urlopen(url)  # Python 2 only; in Python 3 this would be urllib.request.urlopen(url)
soup = BeautifulSoup(html)
soup.findAll('tr', limit=2)

Many thanks in advance for your help.

asked Mar 21 '26 by morfara

1 Answer

soup.findAll('tr', limit=2) won't do much, since it just gets the first two trs on the page. You need to first find all the rows of the table, then extract what you want from the two tds inside each tr:

import requests
from bs4 import BeautifulSoup

url = "http://emojipedia.org/emoji/"
html = requests.get(url).content

soup = BeautifulSoup(html, "html.parser")
table = soup.select_one("table.emoji-list")

for row in table.find_all("tr")[:5]:
    td1, td2 = row.find_all("td")
    # The first cell holds the link, the emoji and its name;
    # split the emoji character from the name.
    em, desc = td1.text.split(None, 1)
    print(td1.a["href"], em, desc, td2.text)

Another way, which avoids splitting the text, would be to get the text from the a tag excluding the child text, using find(text=True, recursive=False):

for row in table.find_all("tr"):
    td1, td2 = row.find_all("td")
    # The span inside the link holds the emoji; find(text=True, recursive=False)
    # returns only the a tag's own text, i.e. the name without the emoji.
    print(td1.a["href"], td1.a.span.text, td1.a.find(text=True, recursive=False), td2.text)

Also, I would stick to using requests over urllib.
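
Since the question asks for the four columns in separate fields, here is a minimal sketch that writes them out with the standard csv module, building on the loop above and assuming the same table layout; the output filename emoji.csv is just a placeholder:

import csv

import requests
from bs4 import BeautifulSoup

url = "http://emojipedia.org/emoji/"
soup = BeautifulSoup(requests.get(url).content, "html.parser")
table = soup.select_one("table.emoji-list")

# "emoji.csv" is a hypothetical output file name.
with open("emoji.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["Col1_Link", "Col2_emoji", "Col3_Comment", "Col4_UTF"])
    for row in table.find_all("tr"):
        td1, td2 = row.find_all("td")
        em, desc = td1.text.split(None, 1)
        writer.writerow([td1.a["href"], em, desc.strip(), td2.text.strip()])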

answered Mar 24 '26 by Padraic Cunningham


