I am trying to download the leaderboard table for an individual Kaggle competition. I have tried both the Kaggle API and the 'Raw Data' download on the leaderboard page, but the resulting table is incomplete.
Specifically, the downloaded table is missing the '# of Entries' column and the member details (where a competition shows them).
I have also tried scraping the table (based on code available here), but the code does not find any table on the page:
from bs4 import BeautifulSoup
import requests
import pandas as pd
import re
# Site URL
url = "https://www.kaggle.com/c/jane-street-market-prediction/leaderboard"
# Make a GET request to fetch the raw HTML content
html_content = requests.get(url).text
# Parse HTML code for the entire site
soup = BeautifulSoup(html_content, "lxml")
#print(soup.prettify()) # print the parsed data of html
# The following line will generate a list of HTML content for each table
leaderboard = soup.find_all('table', attrs={"class": "competition-leaderboard__table"})
print("Number of tables on site:", len(leaderboard))
Would be great if someone could help out on this. Thanks in advance!
You can try the Meta Kaggle dataset. It includes files with team membership data and with the submissions made by each team in each competition.
P.S. Parsing competition web pages is indeed hard - I've spent hours trying to get info that way.
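As a rough illustration of the Meta Kaggle route, here is a minimal pandas sketch that rebuilds an entry count and a member list per team. The function name `leaderboard_details` is made up for this example, and the column names (`Slug`, `CompetitionId`, `TeamName`, `TeamId`, `UserId`, `UserName`) are assumptions about how the Competitions, Teams, TeamMemberships, Submissions and Users tables are laid out; check them against the CSV files you actually download. The submission count may also differ from the '# of Entries' shown on the site.

```python
import pandas as pd


def leaderboard_details(competitions, teams, memberships, submissions, users, slug):
    """Return one row per team with its entry count and member names.

    All column names used here are assumptions about the Meta Kaggle schema.
    """
    # Look up the numeric competition Id from its URL slug.
    comp_ids = competitions.loc[competitions["Slug"] == slug, "Id"]
    if comp_ids.empty:
        raise ValueError(f"No competition with slug {slug!r}")

    # Keep only the teams that took part in this competition.
    comp_teams = teams.loc[teams["CompetitionId"] == comp_ids.iloc[0], ["Id", "TeamName"]]
    comp_teams = comp_teams.rename(columns={"Id": "TeamId"})

    # Count submissions per team.
    entries = submissions.groupby("TeamId").size().rename("Entries").reset_index()

    # Attach user names to memberships and collapse them to one string per team.
    members = memberships[["TeamId", "UserId"]].merge(
        users[["Id", "UserName"]], left_on="UserId", right_on="Id"
    )
    members = (
        members.groupby("TeamId")["UserName"]
        .agg(lambda names: ", ".join(sorted(names)))
        .rename("Members")
        .reset_index()
    )

    # Left joins keep teams that have no submissions or no membership rows.
    result = comp_teams.merge(entries, on="TeamId", how="left")
    result = result.merge(members, on="TeamId", how="left")
    result["Entries"] = result["Entries"].fillna(0).astype(int)
    return result.sort_values("Entries", ascending=False).reset_index(drop=True)
```

To use it, load each CSV from the dataset with `pd.read_csv` (for example `pd.read_csv("Teams.csv")`, assuming that file name) and pass the frames in together with the competition slug from the URL, such as "jane-street-market-prediction".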