How to read a file downloaded by selenium webdriver in python

I am using Selenium WebDriver in Python to download a CSV file from a site. The file is downloaded into the specified download directory. Here is an overview of my code:

fp = webdriver.FirefoxProfile()
fp.set_preference("browser.download.folderList", 2)
fp.set_preference("browser.download.manager.showWhenStarting", False)
fp.set_preference("browser.download.dir",'xx/yy')
fp.set_preference('browser.helperApps.neverAsk.saveToDisk', "text/plain, application/vnd.ms-excel, text/csv, text/comma-separated-values, application/octet-stream")
driver = webdriver.Firefox(fp)
driver.get('url')

I need to print the contents of this CSV to the terminal. Many similar files with random names will be downloaded into the same folder, so accessing the file by name won't work, as I don't know in advance what the name will be.

Keegan asked Dec 19 '25 08:12


2 Answers

You can get the last downloaded file from that location and then read the file:

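import os

path = "/path/to/folder"  # the download directory
# os.listdir returns bare names, so join them with the folder before calling getmtime
paths = [os.path.join(path, name) for name in os.listdir(path)]
time_sorted_list = sorted(paths, key=os.path.getmtime)
file_name = time_sorted_list[-1]  # most recently modified file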

You can then read from this file. This assumes that parallel processes are not writing multiple files into the folder at the same time.
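As a rough sketch of the whole idea (find the newest file, then print it), something like the following could work. The helper names `newest_file` and `print_csv` and the `*.csv` pattern are assumptions made for illustration, and the sketch assumes the download has already finished when it runs.

```python
import csv
import glob
import os


def newest_file(folder, pattern="*.csv"):
    """Return the most recently modified file in folder matching pattern, or None."""
    candidates = glob.glob(os.path.join(folder, pattern))
    if not candidates:
        return None
    return max(candidates, key=os.path.getmtime)


def print_csv(path):
    """Print each row of the CSV file at path as a list of strings."""
    with open(path, newline="") as handle:
        for row in csv.reader(handle):
            print(row)
```

For example, `print_csv(newest_file("xx/yy"))` would print the rows of the latest CSV in the download directory from the question, provided at least one CSV file is present.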

EDIT: I just saw the comment that multiple instances are downloading at once. As an alternative, you can use urllib to download the file directly from its URL:

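import urllib.request  # Python 3; in Python 2 the function was urllib.urlretrieve

# You can put a unique ID in the file name so that parallel downloads do not collide.
urllib.request.urlretrieve("http://www.example.com/yourfile.ext", "your-file-name.ext")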
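One possible way to give each download a unique name is sketched below. The helpers `unique_path` and `download` are hypothetical, and using a random UUID for the name is just one option. Note that this only works when the URL can be fetched without the browser's login session.

```python
import os
import uuid
from urllib.request import urlretrieve


def unique_path(folder, extension=".csv"):
    """Build a file path in folder with a random, collision-resistant name."""
    return os.path.join(folder, uuid.uuid4().hex + extension)


def download(url, folder):
    """Download url into folder under a unique name and return the saved path."""
    target = unique_path(folder)
    urlretrieve(url, target)
    return target
```

Because every call picks a fresh name, several instances can download into the same folder, and each one knows exactly which file it wrote.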
Vivek Singh answered Dec 21 '25 21:12


This answer was formed from a combination of previous Stack Overflow questions and answers, as well as comments on this post, so thank you everyone.

I combined Selenium WebDriver and the Python requests module for this solution. Essentially, I logged into the site using Selenium, copied the cookies from the WebDriver session, and then used requests.get(url, cookies=webdriver_cookies) to get the file.

Here's the gist of my solution:

import requests
from selenium import webdriver

fp = webdriver.FirefoxProfile()
fp.set_preference("browser.download.folderList", 2)
fp.set_preference("browser.download.manager.showWhenStarting", False)
fp.set_preference("browser.download.dir", 'xx/yy')
fp.set_preference('browser.helperApps.neverAsk.saveToDisk', "text/plain, application/vnd.ms-excel, text/csv, text/comma-separated-values, application/octet-stream")
driver = webdriver.Firefox(fp)

# selenium login code ...

# Copy the cookies from the browser session into a dict that requests accepts.
driver_cookies = driver.get_cookies()
cookies_copy = {}
for driver_cookie in driver_cookies:
    cookies_copy[driver_cookie["name"]] = driver_cookie["value"]
r = requests.get('url', cookies=cookies_copy)
print(r.text)
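If the rows are wanted as parsed values rather than raw text, the response body can be fed to the standard csv module. The sketch below uses only the standard library; the names `cookies_for_requests` and `print_csv_text` are made up for illustration, and it assumes the response body is plain comma-separated text.

```python
import csv
import io


def cookies_for_requests(driver_cookies):
    """Map Selenium's list of cookie dicts to the name -> value dict requests accepts."""
    return {cookie["name"]: cookie["value"] for cookie in driver_cookies}


def print_csv_text(text):
    """Parse CSV text (for example r.text), print each row, and return the rows."""
    rows = list(csv.reader(io.StringIO(text)))
    for row in rows:
        print(row)
    return rows
```

With these helpers, the last lines of the solution could become `r = requests.get('url', cookies=cookies_for_requests(driver.get_cookies()))` followed by `print_csv_text(r.text)`.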

I hope that this helps someone.

Keegan answered Dec 21 '25 21:12


