Basic BeautifulSoup Wikipedia scrape

Question

I'm trying to scrape a short, basic unordered list (<ul>) from Wikipedia. My end goal is to put it into a DataFrame. Where do I go from here?

In [28]: from bs4 import BeautifulSoup
         import requests
         from pandas import Series, DataFrame

In [29]: url = "https://en.wikipedia.org/wiki/National_Pro_Grid_League"

In [31]: result = requests.get(url)

In [32]: c = result.content

In [33]: soup = BeautifulSoup(c, "html.parser")

I can't seem to find any answers to this on Stack Overflow, so I would appreciate any advice.
This is the specific list I'm looking for:

Active teams
Baltimore Anthem (2015–present)
Boston Iron (2014–present)
DC Brawlers (2014–present)
Los Angeles Reign (2014–present)
Miami Surge (2014–present)
New York Rhinos (2014–present)
Phoenix Rise (2014–present)
San Francisco Fire (2014–present)

wpercy · Accepted Answer

First you'll want to find the correct part of the page. You can do this by finding the heading with id="Active_teams_at_league_closing" and then finding the next <ul> element from there.

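from bs4 import BeautifulSoup
import requests

url = "https://en.wikipedia.org/wiki/National_Pro_Grid_League"
r = requests.get(url)
# Name the parser explicitly to avoid bs4's "no parser specified" warning
soup = BeautifulSoup(r.content, "html.parser")

# Locate the section heading by its id, then take the first <ul> after it
heading = soup.find(id='Active_teams_at_league_closing')
teams = heading.find_next('ul')
# Iterate over the <li> items only: looping over the <ul> itself also yields
# whitespace strings, and .string is None for items that contain nested tags
for team in teams.find_all('li'):
    print(team.get_text())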
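
Since the end goal in the question is a DataFrame, here is a minimal sketch of one way to get there from the same heading lookup. It runs against a small inline HTML sample (SAMPLE_HTML, an assumed stand-in for the real page so the example works offline; the live markup may differ). The helper name teams_to_frame, the column names team and joined, and the regular expression are all illustrative choices, not part of the original answer.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Assumed stand-in for the relevant part of the Wikipedia page;
# the real page's markup may be structured differently.
SAMPLE_HTML = """
<h2><span id="Active_teams_at_league_closing">Active teams</span></h2>
<ul>
<li><a href="/wiki/Baltimore_Anthem">Baltimore Anthem</a> (2015–present)</li>
<li>Boston Iron (2014–present)</li>
<li>DC Brawlers (2014–present)</li>
</ul>
"""


def teams_to_frame(html, heading_id="Active_teams_at_league_closing"):
    """Build a DataFrame from the first <ul> that follows the given heading."""
    soup = BeautifulSoup(html, "html.parser")
    heading = soup.find(id=heading_id)
    if heading is None:
        raise ValueError(f"no element with id={heading_id!r}")
    team_list = heading.find_next("ul")
    if team_list is None:
        raise ValueError("no <ul> found after the heading")

    # One text entry per list item, e.g. "Boston Iron (2014–present)"
    entries = [li.get_text().strip() for li in team_list.find_all("li")]

    # Split each entry into a team name and the four-digit year in parentheses
    pattern = r"^(?P<team>.*?)\s*\((?P<joined>\d{4})"
    return pd.Series(entries).str.extract(pattern)


df = teams_to_frame(SAMPLE_HTML)
print(df)
```

For the live page, the same function could be called with the downloaded HTML, for example teams_to_frame(requests.get(url).text), assuming the heading id is still present on the page.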

Tags: python, pandas, beautifulsoup, web-scraping

Spencer Timothy
