
Not able to write output to csv bs4 python

I am trying to write the output of the following code to a CSV file. The data keeps getting overwritten, so the output file ends up containing only the last row that was scraped from the website.

from bs4 import BeautifulSoup
import urllib2
import csv
import re
import requests
for i in xrange(3179,7000):
    try:
        page = urllib2.urlopen("http://bvet.bytix.com/plus/trainer/default.aspx?id={}".format(i))
    except:
        continue
    else:
        soup = BeautifulSoup(page.read())
        for eachuniversity in soup.findAll('fieldset',{'id':'ctl00_step2'}):
            data = i, re.sub(r'\s+',' ',''.join(eachuniversity.findAll(text=True)).encode('utf-8')),'\n'
    print data  
    myfile = open("ttt.csv", 'wb')
    wr = csv.writer(myfile, quoting=csv.QUOTE_ALL)
    wr.writerow(data)

I am new to this and do not know where I am going wrong.

UPDATE

from bs4 import BeautifulSoup
import urllib2
import csv
import re
import requests
with open("BBB.csv", 'wb') as myfile:
    writer = csv.writer(myfile, quoting=csv.QUOTE_ALL)
    for i in xrange(3179,7000):
        try:
            page = urllib2.urlopen("http://bvet.bytix.com/plus/trainer/default.aspx?id={}".format(i))
        except Exception:
            continue
        else:
            soup = BeautifulSoup(page.read())
            for eachuniversity in soup.findAll('fieldset',{'id':'ctl00_step2'}):
                data = [i] + [re.sub('\s+', '', text).encode('utf8') for text in eachuniversity.find_all(text=True) if text.strip()]
                writer.writerow(data)
Venkateshwaran Selvaraj asked Jun 01 '26 18:06



1 Answer

Open the file once, before looping, and add data to it in the loop as you find it:

import csv
import re
import urllib2
from bs4 import BeautifulSoup

# Open the output file once; every row found in the loop is appended to it.
with open("ttt.csv", 'wb') as myfile:
    writer = csv.writer(myfile, quoting=csv.QUOTE_ALL)
    for i in xrange(3179,7000):
        try:
            page = urllib2.urlopen("http://bvet.bytix.com/plus/trainer/default.aspx?id={}".format(i))
        except urllib2.HTTPError:
            # Skip ids for which the server returns an error page.
            continue
        else:
            soup = BeautifulSoup(page.read(), from_encoding=page.info().getparam('charset'))
            for eachuniversity in soup.findAll('fieldset',{'id':'ctl00_step2'}):
                # One column per non-empty text node, with whitespace collapsed.
                data = [i] + [re.sub(r'\s+', ' ', text).strip().encode('utf8') for text in eachuniversity.find_all(text=True) if text.strip()]
                writer.writerow(data)

Every time you open a file in 'w' (write) mode, it is truncated: any previous data is removed to make room for the new data you are about to write. The trick, then, is to open the file only once, before the loop, and keep it open until everything has been written.
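To see the truncation in isolation, here is a minimal sketch, written for Python 3 rather than the Python 2 used in the question. The file names and the rows list are made up for illustration; nothing is fetched from the site.

```python
import os
import tempfile

rows = ["first", "second", "third"]   # hypothetical stand-ins for scraped rows
tmpdir = tempfile.mkdtemp()

# Reopening in "w" mode on every iteration truncates the file each time,
# so only the row written last survives.
reopened_path = os.path.join(tmpdir, "reopened.txt")
for row in rows:
    with open(reopened_path, "w") as f:
        f.write(row + "\n")

# Opening once, before the loop, keeps every row that was written.
opened_once_path = os.path.join(tmpdir, "opened_once.txt")
with open(opened_once_path, "w") as f:
    for row in rows:
        f.write(row + "\n")

with open(reopened_path) as f:
    reopened_lines = f.read().splitlines()
with open(opened_once_path) as f:
    opened_once_lines = f.read().splitlines()

print(reopened_lines)     # ['third']
print(opened_once_lines)  # ['first', 'second', 'third']
```

The first loop mirrors what the code in the question does; the second mirrors the fix.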

The with statement here closes the file for you when the outer for loop is done.
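A small Python 3 sketch of that behaviour, using a throwaway file (the path and the loop are only illustrative): the file object reports itself as open inside the with block and as closed as soon as the block exits.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.csv")  # hypothetical file name

with open(path, "w", newline="") as myfile:
    for i in range(3):
        myfile.write("{}\n".format(i))
    closed_inside = myfile.closed   # still open while the block is running

closed_after = myfile.closed        # closed automatically on leaving the block
print(closed_inside, closed_after)  # False True
```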

The from_encoding argument passes any character set given in the server headers to BeautifulSoup so it doesn't have to guess; for the given URL, BeautifulSoup actually guesses wrong if you don't add that keyword argument.
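Why the declared character set matters can be shown without any network access. The sketch below is Python 3 and builds a stand-in headers object by hand; the Content-Type value is an assumption chosen for the example, not what the site actually sends. In Python 3 the headers of a urllib response are an email.message.Message subclass, so get_param() plays the role that getparam() plays in the Python 2 code.

```python
from email.message import Message

# Hand-built stand-in for the headers of an HTTP response (assumed value).
headers = Message()
headers["Content-Type"] = "text/html; charset=iso-8859-1"

# get_param() reads a parameter of the Content-Type header by default.
charset = headers.get_param("charset")

# Bytes as a server using that charset would send them.
raw = "Stähli Monika".encode("iso-8859-1")

decoded_right = raw.decode(charset)                    # uses the declared charset
decoded_guess = raw.decode("utf-8", errors="replace")  # a wrong guess

print(charset)        # iso-8859-1
print(decoded_right)  # Stähli Monika
print(decoded_guess)  # the ä is replaced by U+FFFD
```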

I removed the newline you were adding to each row; the writer object returned by csv.writer() takes care of newlines for you. I also changed the blanket except: to except urllib2.HTTPError: to catch just the exceptions that are actually being thrown here, not everything.
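The newline handling is easy to check in memory. This Python 3 sketch writes two made-up rows to an io.StringIO buffer instead of a file and shows the line terminator the writer adds on its own. In Python 3 a real output file would be opened in text mode with newline='' rather than 'wb'.

```python
import csv
import io

# In-memory buffer standing in for the output file.
buffer = io.StringIO(newline="")
writer = csv.writer(buffer, quoting=csv.QUOTE_ALL)

# No '\n' element is appended to the rows; the writer terminates each line.
writer.writerow([3179, "Neue Suche starten", "Meisenweg 1"])
writer.writerow([3180, "Dogs Nature"])

output = buffer.getvalue()
print(repr(output))
# '"3179","Neue Suche starten","Meisenweg 1"\r\n"3180","Dogs Nature"\r\n'
```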

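The text is cleaned up so that each text entry goes into its own column: whitespace runs inside an entry are collapsed to a single space, leading and trailing whitespace is stripped, and entries that contain only whitespace are skipped.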
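The clean-up step can be tried on its own. In this Python 3 sketch, clean_texts is a hypothetical helper name and fragments is a made-up list standing in for the strings that find_all(text=True) would return.

```python
import re


def clean_texts(texts):
    """Collapse whitespace runs, strip each entry, and drop blank entries."""
    return [re.sub(r"\s+", " ", t).strip() for t in texts if t.strip()]


# Made-up text nodes, including whitespace-only ones.
fragments = ["\n  Neue Suche starten ", "\r\n", "Meisenweg\n 1", "   "]

row = [3179] + clean_texts(fragments)
print(row)  # [3179, 'Neue Suche starten', 'Meisenweg 1']
```

Each surviving entry becomes one element of the row, and therefore one column when the row is passed to writer.writerow().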

Running the full script against the site produces output like this:

"3179","Neue Suche starten","MoHaJa - die Schule für Hunde und Halter","Stähli Monika","Meisenweg 1","3303 Jegenstorf","Routenplaner","Ausbildung/Anerkennung: Triple-S Ausbildungszentrum für Mensch und Hund","Sprache: Deutsch","Tel.: +41 31 761 14 33","Handy: +41 79 760 41 69","[email protected]","www.hundeschule-mohaja.ch"
"3180","Neue Suche starten","Dogs Nature","Fernandez Salome-Nicole","Dorzematte 30","3313 Büren zum Hof","Routenplaner","Ausbildung/Anerkennung: Triple-S Ausbildungszentrum für Mensch und Hund","Sprache: Deutsch","Tel.: 079 658 71 71","[email protected]","www.dogsnature.ch"
"3181","Neue Suche starten","Gynny-Dog","Speiser Franziska","Wirtsgartenweg 27","4123 Allschwil","Routenplaner","Ausbildung/Anerkennung: Triple-S Ausbildungszentrum für Mensch und Hund","Sprache: Deutsch","Handy: 076 517 20 94","[email protected]","www.gynny-dog.ch"
"3183","Neue Suche starten","keep-natural","Mory Sandra","Beim Werkhof","4434 Hölstein","Routenplaner","Ausbildung/Anerkennung: Triple-S Ausbildungszentrum für Mensch und Hund","Sprache: Deutsch","Tel.: 079 296 00 65","[email protected]","www.keep-natural.ch"
"3184","Neue Suche starten","Küng Silvia","Eptingerstrasse 41","4448 Läufelfingen","Routenplaner","Ausbildung/Anerkennung: Triple-S Ausbildungszentrum für Mensch und Hund","Sprache: Deutsch","Tel.: 061 981 38 04","Handy: 079 415 83 57","[email protected]","www.different-dogs.ch"
Martijn Pieters answered Jun 03 '26 07:06
