
Basic BeautifulSoup Wikipedia scrape

I'm trying to scrape a short, basic unordered list (<ul>) from Wikipedia. My end goal is to put it into a DataFrame. Where do I go from here?

In [28]: from bs4 import BeautifulSoup
         import requests
         from pandas import Series, DataFrame

In [29]: url = "https://en.wikipedia.org/wiki/National_Pro_Grid_League"

In [31]: result = requests.get(url)

In [32]: c = result.content

In [33]: soup = BeautifulSoup(c, "html.parser")

I can't seem to find any answers to this on Stack Overflow, so I would appreciate any advice.
This is the specific list I'm looking for:

Active teams
Baltimore Anthem (2015–present)
Boston Iron (2014–present)
DC Brawlers (2014–present)
Los Angeles Reign (2014–present)
Miami Surge (2014–present)
New York Rhinos (2014–present)
Phoenix Rise (2014–present)
San Francisco Fire (2014–present)
Spencer Timothy asked Dec 23 '25 00:12


1 Answer

First you'll want to find the correct part of the page. You can do this by finding the heading with id="Active_teams_at_league_closing" and then finding the next <ul> element from there.

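from bs4 import BeautifulSoup
import requests

url = "https://en.wikipedia.org/wiki/National_Pro_Grid_League"
r = requests.get(url)
# Name the parser explicitly to avoid bs4's "no parser specified" warning
soup = BeautifulSoup(r.content, "html.parser")

# Locate the section heading by its id, then take the first <ul> after it
heading = soup.find(id='Active_teams_at_league_closing')
teams = heading.find_next('ul')
# Iterate over the <li> items only; iterating over the <ul> directly also
# yields whitespace nodes, and .string is None for items that contain links
for team in teams.find_all('li'):
    print(team.get_text())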
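
Since the end goal is a DataFrame, here is a rough sketch of one way to get there. It runs against a small hard-coded HTML snippet instead of the live page, so the markup in SAMPLE_HTML, the function name teams_to_frame, and the column names "team" and "since" are all assumptions made for illustration. The real page's markup may differ, so treat this as a starting point rather than a finished solution.

```python
import re

import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical stand-in for the relevant part of the page (assumed markup),
# so the sketch runs without a network request.
SAMPLE_HTML = """
<h3><span id="Active_teams_at_league_closing">Active teams</span></h3>
<ul>
<li><a href="/wiki/Baltimore_Anthem">Baltimore Anthem</a> (2015–present)</li>
<li><a href="/wiki/Boston_Iron">Boston Iron</a> (2014–present)</li>
<li><a href="/wiki/DC_Brawlers">DC Brawlers</a> (2014–present)</li>
</ul>
"""


def teams_to_frame(html, heading_id="Active_teams_at_league_closing"):
    """Return a DataFrame with one row per <li> in the list after the heading."""
    soup = BeautifulSoup(html, "html.parser")
    heading = soup.find(id=heading_id)
    if heading is None:
        raise ValueError("no element with id %r found" % heading_id)
    team_list = heading.find_next("ul")
    if team_list is None:
        raise ValueError("no <ul> found after the heading")

    rows = []
    for li in team_list.find_all("li"):
        # Join the link text and the trailing "(2015–present)" text with a space.
        text = li.get_text(" ", strip=True)
        # Split "Name (YYYY..." into the name and the first four-digit year.
        match = re.match(r"^(.*?)\s*\((\d{4})", text)
        if match:
            rows.append((match.group(1), int(match.group(2))))
        else:
            # Keep the raw text if the item does not follow the assumed pattern.
            rows.append((text, None))
    return pd.DataFrame(rows, columns=["team", "since"])


df = teams_to_frame(SAMPLE_HTML)
print(df)
```

To use it on the real page, pass r.content from the request above instead of SAMPLE_HTML.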
wpercy answered Dec 24 '25 14:12


