 

Downloading a file from a URL using Python

I want to download the file at the following URL using Python. I tried the code below, but it does not seem to work; I think the problem is with the file format. I would be glad if you could suggest modifications to my code, or new code that I could use for this purpose.

Link to the website

https://www.gov.uk/government/statistics/transport-use-during-the-coronavirus-covid-19-pandemic

URL of the file to download

https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/959864/COVID-19-transport-use-statistics.ods

My Code

from urllib import request


response = request.urlopen("https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/959864/COVID-19-transport-use-statistics.ods")
csv = response.read()


csvstr = str(csv).strip("b'")

lines = csvstr.split("\\n")
f = open("historical.csv", "w")
for line in lines:
   f.write(line + "\n")
f.close()

Basically, I only want to download the file. I have heard that BeautifulSoup can be used for this, but I don't have much experience with it. Any code that serves my purpose would be highly appreciated.

Thanks

python_coder_ asked Sep 20 '25 20:09



2 Answers

To download the file:

In [1]: import requests

In [2]: url = 'https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/959864/COVID-19-transport-use-statistics.ods'

In [3]: with open('COVID-19-transport-use-statistics.ods', 'wb') as out_file:
   ...:     content = requests.get(url, stream=True).content
   ...:     out_file.write(content)
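Note that `.content` reads the whole body into memory, so `stream=True` has no real effect in the snippet above. If the file were large, a chunked variant could look roughly like the sketch below. The helper names `save_response` and `download`, the 64 KiB chunk size and the 30 second timeout are my own assumptions, not part of requests; only `requests.get`, `raise_for_status` and `iter_content` are library calls.

```python
import requests


def save_response(response, destination, chunk_size=64 * 1024):
    """Write a streamed response body to disk chunk by chunk; return bytes written."""
    # Fail loudly on 4xx/5xx instead of silently saving an error page.
    response.raise_for_status()
    written = 0
    with open(destination, "wb") as out_file:
        for chunk in response.iter_content(chunk_size=chunk_size):
            out_file.write(chunk)
            written += len(chunk)
    return written


def download(url, destination, timeout=30):
    """Stream `url` to `destination` (hypothetical helper, not a requests API)."""
    with requests.get(url, stream=True, timeout=timeout) as response:
        return save_response(response, destination)
```

Calling `download(url, 'COVID-19-transport-use-statistics.ods')` should then produce the same file as the session above, without holding the whole download in memory at once.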

You can then read the file with pandas-ods-reader. Install it first:

pip install pandas-ods-reader

Then:

In [4]: from pandas_ods_reader import read_ods

In [5]: df = read_ods('COVID-19-transport-use-statistics.ods', 1)

In [6]: df
Out[6]: 
                   Department for Transport statistics  ...   unnamed.9
0    https://www.gov.uk/government/statistics/trans...  ...        None
1                                                 None  ...        None
2    Use of transport modes: Great Britain, since 1...  ...        None
3    Figures are percentages of an equivalent day o...  ...        None
4                                                 None  ...  Percentage
..                                                 ...  ...         ...
390                  Transport for London Tube and Bus  ...        None
391                               Buses (excl. London)  ...        None
392                                           Cycling   ...        None
393                                  Any other queries  ...        None
394                                    Media enquiries  ...        None

You can then save it as a CSV, if that is what you want, with df.to_csv('my_data.csv', index=False).

watch-this answered Sep 22 '25 11:09



I see that you are just trying to download the file, which is in .ods format. Saving it with a .csv extension won't convert it into a CSV file.
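To illustrate why the approach in the question breaks the file: an .ods file is a ZIP archive, and calling str() on the downloaded bytes produces their printable representation rather than the data itself. The sketch below does not touch the network; it builds a tiny stand-in archive (the helper names and the archive contents are made up for the demonstration) and applies the same str(...).strip("b'") step as the question.

```python
import io
import zipfile

ODS_MIMETYPE = b"application/vnd.oasis.opendocument.spreadsheet"


def make_fake_ods() -> bytes:
    """Build a tiny ZIP archive standing in for a real .ods file (made-up content)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("mimetype", ODS_MIMETYPE.decode("ascii"))
        archive.writestr("content.xml", "<office:document-content/>")
    return buffer.getvalue()


def looks_like_ods(data: bytes) -> bool:
    """Return True if the bytes are a ZIP archive carrying the ODS mimetype entry."""
    if not zipfile.is_zipfile(io.BytesIO(data)):
        return False
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        if "mimetype" not in archive.namelist():
            return False
        return archive.read("mimetype") == ODS_MIMETYPE


raw = make_fake_ods()
# Same transformation as in the question: bytes -> printable repr -> text.
mangled = str(raw).strip("b'").encode("utf-8")

print(looks_like_ods(raw))      # True
print(looks_like_ods(mangled))  # False
```

The binary bytes have to be written to disk unchanged, which is why the file must be opened in "wb" mode.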

The following code will download the file. I have used the requests library, which is a more convenient option than urllib.

import requests

file_url = "https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/959864/COVID-19-transport-use-statistics.ods"


file_data = requests.get(file_url).content
# open the file in binary write mode, because the downloaded data is bytes
with open("historical.ods", "wb") as file:
    file.write(file_data)

The output file can be opened in MS Excel.
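If you would rather stay with urllib, as in the question, a standard-library-only version is also possible. This is a minimal sketch; the function name `download_file` is my own, and it simply copies the response bytes to disk without decoding them.

```python
import shutil
from urllib import request


def download_file(url: str, destination: str) -> None:
    """Copy the raw bytes at `url` into `destination` without decoding them."""
    with request.urlopen(url) as response, open(destination, "wb") as out_file:
        # copyfileobj reads and writes in blocks, so the file is never
        # held in memory all at once.
        shutil.copyfileobj(response, out_file)


# Example call, using the names from the code above:
# download_file(file_url, "historical.ods")
```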

gsb22 answered Sep 22 '25 10:09
