
Pulling contents of div tags with BeautifulSoup and creating a pandas DataFrame

date = '2017-08-04'
writer = pd.ExcelWriter('MLB Daily Data.xlsx')

url_4 = 'http://www.baseballpress.com/lineups/'+date
resp_4 = requests.get(url_4)
soup_4 = BeautifulSoup(resp_4.text, "lxml")
lineups = soup_4.findAll('div', attrs = {'class': 'players'},limit=None)

row_lineup = 0
for lineup in lineups:
    lineup1 = lineup.prettify()
    lineup2 = lineup1.replace('>'&&'<',',')
    df4 = pd.DataFrame(eval(lineup2))
    df4.to_excel(writer, sheet_name='Starting Lineups', startrow=row_lineups, startcol=0)   
    row_lineups = row_lineups + len(df.index) + 3
writer.save()

I am trying to get the starting lineups from the webpage, convert them into a pandas DataFrame, and then save that to an Excel file. I'm having trouble turning the scraped HTML into a DataFrame. I replaced the angle brackets with commas because I figured that would turn it into CSV format.

Brett Ford asked Jan 30 '26 20:01



1 Answer

This may get you moving in the right direction. Each row of the resulting DataFrame is one lineup:

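# Collect the text of every <a> tag inside each 'players' div;
# each inner list becomes one row of the DataFrame.
data = [[x.text for x in y.findAll('a')] for y in lineups]

df = pd.DataFrame(data)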
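
For anyone who wants to try the idea without hitting the site, here is a minimal, self-contained sketch. The `SAMPLE_HTML` markup and the player names are made up for illustration, and the assumption that each player name sits inside an `<a>` tag within a `players` div is not verified against the live page. The helper names `lineups_to_frame` and `save_lineups` are hypothetical as well.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical stand-in for the downloaded page; the real markup may differ.
SAMPLE_HTML = """
<div class="players">
  <div>1. <a href="#">Player A</a> (R) CF</div>
  <div>2. <a href="#">Player B</a> (L) SS</div>
</div>
<div class="players">
  <div>1. <a href="#">Player C</a> (R) 2B</div>
  <div>2. <a href="#">Player D</a> (S) 1B</div>
</div>
"""


def lineups_to_frame(html):
    """Build a DataFrame with one row per 'players' div and one column per linked name."""
    soup = BeautifulSoup(html, "html.parser")
    blocks = soup.find_all("div", class_="players")
    # Read the link text directly instead of string-replacing the HTML.
    rows = [[a.get_text(strip=True) for a in block.find_all("a")] for block in blocks]
    return pd.DataFrame(rows)


def save_lineups(df, path):
    """Write the frame to one sheet; the context manager saves the file on exit."""
    with pd.ExcelWriter(path) as writer:
        df.to_excel(writer, sheet_name="Starting Lineups", index=False)


df = lineups_to_frame(SAMPLE_HTML)
print(df)
```

Calling `save_lineups(df, 'MLB Daily Data.xlsx')` would then write the whole table in one go, so there is no need to track a start row per lineup. Writing `.xlsx` files requires an Excel engine such as openpyxl to be installed.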
DJK answered Feb 01 '26 09:02



