Downloading Kaggle zip files in R

Tags: r, zip, kaggle

I'm trying to download zip files directly from Kaggle from within my R code, but it isn't working. Here's what happens:

Consider the San Francisco Crime data set at https://www.kaggle.com/c/sf-crime/data

Take its first file, test.csv.zip: https://www.kaggle.com/c/sf-crime/download/test.csv.zip

I'm using R code:

download.file(url = 'https://www.kaggle.com/c/sf-crime/download/test.csv.zip', destfile = 'test.zip', method = 'curl')

Instead of the original 18.75 MB file, R downloads only a 183-byte file.

Session output:

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100   183  100   183    0     0    665      0 --:--:-- --:--:-- --:--:--   667

What am I doing wrong?

Thanks in advance, Rahul

Rahul asked Dec 01 '25 20:12


1 Answer

library(RCurl)

# Login page and the file to download (the download needs a logged-in session)
loginurl <- "https://www.kaggle.com/account/login"
dataurl  <- "https://www.kaggle.com/c/titanic/download/train.csv"

# Account credentials (posted as login form fields) and user agent
pars <- list(
  UserName = "[email protected]",
  Password = "-----"
)
agent <- "Mozilla/5.0"  # or any other user-agent string

# One curl handle shared by all requests, so the login cookies are reused
curl <- getCurlHandle()
curlSetOpt(cookiejar = "cookies.txt", useragent = agent, followlocation = TRUE, curl = curl)
# Alternatively, if you do not need to read the cookies afterwards:
# curlSetOpt(cookiejar = "", useragent = agent, followlocation = TRUE, curl = curl)

# Post the login form
welcome <- postForm(loginurl, .params = pars, curl = curl)

# Download a URL to a binary file using the logged-in handle
bdown <- function(url, file, curl) {
  f <- CFILE(file, mode = "wb")
  on.exit(close(f))  # close the file even if the transfer fails
  curlPerform(url = url, writedata = f@ref, noprogress = FALSE, curl = curl)
}

ret <- bdown(dataurl, "c:\\test.csv", curl)

# Release the handle once the download is done
rm(curl)
gc()

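FYI: use RCurl like a web client. Kaggle only serves the data files to a logged-in session, so the 183-byte file in the question is most likely a redirect to the login page rather than the archive. Log in first, keep the cookies on the same curl handle, and then request the file.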
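To tell whether a download such as the question's test.zip is a real archive or just a small login/redirect page, one option is to look at the first bytes of the file. The sketch below is only an illustration: is_zip_file is a hypothetical helper (not part of RCurl or base R), and the two temporary files are made-up stand-ins for a failed and a successful download. It checks for the standard ZIP local-file-header signature "PK\003\004", so it would not recognise an empty archive.

```r
# Hypothetical helper (not part of RCurl or base R): returns TRUE when the
# file begins with the ZIP local-file-header signature 0x50 0x4B 0x03 0x04.
is_zip_file <- function(path) {
  if (!file.exists(path)) return(FALSE)
  con <- file(path, open = "rb")
  on.exit(close(con))
  magic <- readBin(con, what = "raw", n = 4L)
  identical(magic, as.raw(c(0x50, 0x4b, 0x03, 0x04)))
}

# Stand-in for a failed download: a short HTML page saved under a .zip name.
html_path <- tempfile(fileext = ".zip")
writeLines("<html><body>Object moved</body></html>", html_path)

# Stand-in for a real archive: only the signature bytes plus padding,
# which is enough for the header check (it is not a valid zip file).
zip_path <- tempfile(fileext = ".zip")
writeBin(as.raw(c(0x50, 0x4b, 0x03, 0x04, 0x00, 0x00)), zip_path)

looks_like_html <- is_zip_file(html_path)
looks_like_zip  <- is_zip_file(zip_path)
print(c(html = looks_like_html, zip = looks_like_zip))
```

Running is_zip_file("test.zip") right after download.file() gives a quick signal before calling unzip(); if it returns FALSE, opening the file in a text editor usually shows what the server actually sent.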

suiwenfeng answered Dec 04 '25 13:12