Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python: How do I delete periods occurring alone in a CSV file?

I have a bunch of CSV files. In some of them, missing data are represented by empty cells, but in others there is a period. I want to loop over all my files, open them, delete any periods that occur alone, and then save and close the file.

I've read a bunch of other questions about doing whole-word-only searches using re.sub(). That is what I want to do (delete . when it occurs alone but not the . in 3.5), but I can't get the syntax right for a whole-word-only search where the whole word is a special character ('.'). Also, I'm worried those answers might be a little different in the case where a whole word can be distinguished by tab and newlines too. That is, does /b work in my CSV file case?

UPDATE: Here is a function I wound up writing after seeing the help below. Maybe it will be useful to someone else.

import csv, re

def clean(infile, outfile, chars):

''' 
Open a file, remove all specified special characters used to represent missing data, and save.\n\n
infile:\tAn input file path\n
outfile:\tAn output file path\n
chars:\tA list of strings representing missing values to get rid of
'''

in_temp = open(infile)
out_temp = open(outfile, 'wb')

csvin = csv.reader(in_temp)
csvout = csv.writer(out_temp)
for row in csvin:
    row = re.split('\t', row[0])
    for colno, col in enumerate(row):
        for char in chars:
            if col.strip() == char:
                row[colno] = ''
    csvout.writerow(row)

in_temp.close()
out_temp.close()
like image 798
David M Avatar asked Dec 31 '25 01:12

David M


2 Answers

Something like this should do the trick... This data wouldn't happen to be coming out of SAS would it - IIRC, that quite often used '.' as missing for numeric values.

import csv

with open('input.csv') as fin, open('output.csv', 'wb') as fout:
    csvin = csv.reader(fin)
    csvout = csv.writer(fout)
    for row in csvin:
        for colno, col in enumerate(row):
            if col.strip() == '.':
                row[colno] = ''
        csvout.writerow(row)
like image 114
Jon Clements Avatar answered Jan 02 '26 14:01

Jon Clements


Why not just use the csv module?

#!/usr/bin/env python

import csv

with open(somefile) as infile:
  r=csv.reader(infile)
  rows = []
  for row in csv:
    rows.append(['' if f == "." else f for f in row])
with open(newfile, 'w') as outfile:
  w=csv.writer(outfile)
  w.writelines(rows)
like image 21
kojiro Avatar answered Jan 02 '26 15:01

kojiro



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!