I'm a total newbie; I'm just starting with web scraping as a hobby.
I want to scrape some data from a forum (the total number of posts, the total number of topics and the number of all users) from https://www.fly4free.pl/forum/
(Screenshot showing the data I want to scrape.)
After watching some tutorials, I came up with this code:
from bs4 import BeautifulSoup
import requests
import datetime
import csv

# Download the forum index page and parse the HTML
source = requests.get('https://www.fly4free.pl/forum/').text
soup = BeautifulSoup(source, 'lxml')

# Open the CSV file and write the header row
# (date and time, total posts, total topics, total users)
csv_file = open('4fly_forum.csv', 'w')
csv_writer = csv.writer(csv_file)
csv_writer.writerow(['Data i godzina', 'Wszystkich postów', 'Wszystkich tematów', 'Wszystkich użytkowników'])

# Current date and time as a string
czas = datetime.datetime.now()
czas = czas.strftime("%Y-%m-%d %H:%M:%S")
print(czas)

# The three numbers are the <strong> elements inside <p class="genmed">
dane = soup.find('p', class_='genmed')
posty = dane.find_all('strong')[0].text   # total posts
print(posty)
tematy = dane.find_all('strong')[1].text  # total topics
print(tematy)
user = dane.find_all('strong')[2].text    # total users
print(user)
print()

csv_writer.writerow([czas, posty, tematy, user])
csv_file.close()
I don't know how to make it run once a day, or how to append the new data to the file each day. Sorry if my questions seem basic to you pros ;), it's my first practice project.
Also, my resulting CSV file doesn't look nice; I would like the data to be neatly formatted into columns.
Any help and insight would be much appreciated. Thanks in advance, Dejvciu
You can use the schedule library in Python to do this. First, install it with:
pip install schedule
Then you can modify your code to run at intervals of your choice:
import schedule
import time

def scrape():
    # your web scraping code here
    print('web scraping')

schedule.every().day.at("10:30").do(scrape)  # change 10:30 to the time of your choice

while True:
    schedule.run_pending()
    time.sleep(1)
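This runs scrape() every day at 10:30, as long as the script itself keeps running: the while loop keeps the process alive and checks once per second whether a job is due. To make it run continuously, you can host it on a server or another always-on machine (free hosting options exist).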
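If you'd rather not install an extra package, similar daily behaviour can be approximated with the standard library alone. This is a swapped-in technique, not part of the schedule library: a minimal sketch that assumes local (naive) time and a process that stays running, with hypothetical helper names seconds_until and run_daily. Instead of waking up every second, it sleeps until the next 10:30.

```python
import datetime
import time


def seconds_until(hour, minute, now=None):
    """Return the number of seconds until the next hour:minute (local time)."""
    if now is None:
        now = datetime.datetime.now()
    target = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if target <= now:
        # Today's slot has already passed, so aim for the same time tomorrow.
        # Note: naive local time, so DST switch days may be off by an hour.
        target += datetime.timedelta(days=1)
    return (target - now).total_seconds()


def run_daily(job, hour, minute):
    """Call job() once a day at hour:minute. Blocks forever, like the schedule loop."""
    while True:
        time.sleep(seconds_until(hour, minute))
        job()
```

For example, run_daily(scrape, 10, 30) would take the place of the schedule.every() line and the while loop.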
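Here's how you can save the results to a CSV file in a nicely formatted way, with the field names (czas, posty, tematy and user) as column names. The file is opened in append mode, so each run adds one new row instead of overwriting the file: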
import csv
from os import path

# Check whether the file already exists, so the headers (fieldnames, i.e. column names)
# are written only once instead of every time the script runs
file_status = path.isfile('filename.csv')

# czas, posty, tematy and user are the values produced by the scraping code
with open('filename.csv', 'a', newline='') as csvfile:
    fieldnames = ['czas', 'posty', 'tematy', 'user']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    if not file_status:
        writer.writeheader()
    writer.writerow({'czas': czas, 'posty': posty, 'tematy': tematy, 'user': user})
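On the "nicely formatted into columns" point: if you open the file in Excel and everything ends up in one column, a likely cause (an assumption, since it depends on your regional settings) is that Excel expects ';' as the separator in locales such as Polish. The sketch below wraps the CSV code into a hypothetical append_stats function that uses delimiter=';' and encoding='utf-8-sig', which writes a byte-order mark so Excel recognises the file as UTF-8. It also writes the header when the file exists but is empty.

```python
import csv
import os


def append_stats(path, czas, posty, tematy, user):
    """Append one row of forum stats to path, writing the header only once.

    Assumes the file is opened in Excel with a ';' list separator (e.g. Polish locale).
    """
    fieldnames = ['czas', 'posty', 'tematy', 'user']
    # Write the header if the file is missing or empty.
    write_header = not os.path.isfile(path) or os.path.getsize(path) == 0
    # newline='' stops the csv module from producing blank lines on Windows.
    # 'utf-8-sig' writes the BOM only at the start of the file, not on later appends.
    with open(path, 'a', newline='', encoding='utf-8-sig') as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames, delimiter=';')
        if write_header:
            writer.writeheader()
        writer.writerow({'czas': czas, 'posty': posty, 'tematy': tematy, 'user': user})
```

You could call append_stats('4fly_forum.csv', czas, posty, tematy, user) at the end of scrape().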