 

How to handle reading a semicolon-separated CSV file where some strings contain semicolons?

Tags:

python

string

csv

The problem I have can be illustrated with a couple of sample rows from my semicolon-separated CSV file:

4;1;"COFFEE; COMPANY";4
3;2;SALVATION ARMY;4

Notice that in one row, a string is in quotation marks and has a semicolon inside it. None of the columns in my input file have quotation marks around them except the ones containing semicolons.

These rows with quotation marks and semicolons are causing a problem: my code treats the semicolon inside the quotation marks as a delimiter. So when I read this row, it appears to have an extra field/column.
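To see where the extra field comes from, here is a minimal sketch comparing a plain `str.split(';')` (an assumption about what the question's unshown code does) with `csv.reader`, which honors its default `quotechar` of `'"'`. The names `SAMPLE`, `naive_split` and `csv_split` are illustrative only, and the rows are held in memory with `io.StringIO` instead of a file.

```python
import csv
import io

# The two sample rows from the question, kept in memory instead of a file.
SAMPLE = '4;1;"COFFEE; COMPANY";4\n3;2;SALVATION ARMY;4\n'

def naive_split(text):
    """Split every line on ';' with no awareness of quoting."""
    return [line.split(';') for line in text.splitlines()]

def csv_split(text):
    """Let csv.reader treat the double-quoted field as a single value."""
    return list(csv.reader(io.StringIO(text), delimiter=';'))

if __name__ == '__main__':
    for row in naive_split(SAMPLE):
        print(len(row), row)
    for row in csv_split(SAMPLE):
        print(len(row), row)
```

The naive split produces 5 fields for the first row (`'"COFFEE'` and `' COMPANY"'` end up in separate columns), while `csv.reader` produces the expected 4, with `'COFFEE; COMPANY'` as one field.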

The desired output would look like this, with no quotation marks around "coffee company" and no semicolon between 'coffee' and 'company':

4;1;COFFEE COMPANY;4
3;2;SALVATION ARMY;4

Actually, this column with "coffee company" is totally useless to me, so the final file could look like this too:

4;1;xxxxxxxxxxx;4
3;2;xxxxxxxxxxx;4

How can I get rid of just the semicolons inside this one particular column without getting rid of all the other semicolons?

TJE asked Dec 06 '25 04:12



1 Answer

The csv module makes it relatively easy to handle a situation like this:

# Contents of input_file.csv
# 4;1;"COFFEE; COMPANY";4
# 3;2;SALVATION ARMY;4

import csv
input_file = 'input_file.csv'  # Contents as shown in your question.

with open(input_file, 'r', newline='') as inp:
    # The default quotechar ('"') keeps "COFFEE; COMPANY" together as one field.
    for row in csv.reader(inp, delimiter=';'):
        row[2] = row[2].replace(';', '')  # Remove embedded ';' chars.
        # If you don't care about what's in the column, use the following instead:
        # row[2] = 'xyz'  # Value not needed.
        print(';'.join(row))

Printed output:

4;1;COFFEE COMPANY;4
3;2;SALVATION ARMY;4
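For experimenting without a file on disk, here is a self-contained sketch of the same approach that reads from an `io.StringIO` and covers both outcomes from the question: dropping the embedded `;` or replacing the column with a placeholder. The function `clean_column` and its `col` and `placeholder` parameters are hypothetical names for illustration; it also assumes the column to clean is at index 2, as in the sample rows.

```python
import csv
import io

def clean_column(text, col=2, placeholder=None):
    """Return the semicolon-separated text with column `col` cleaned.

    If `placeholder` is None, embedded ';' characters are removed from the
    column; otherwise the whole column is replaced by `placeholder`.
    """
    lines = []
    for row in csv.reader(io.StringIO(text), delimiter=';'):
        if not row:
            continue  # skip blank lines, which csv.reader yields as []
        if placeholder is None:
            row[col] = row[col].replace(';', '')  # keep the text, drop ';'
        else:
            row[col] = placeholder  # column content not needed at all
        lines.append(';'.join(row))
    return '\n'.join(lines)

if __name__ == '__main__':
    sample = '4;1;"COFFEE; COMPANY";4\n3;2;SALVATION ARMY;4\n'
    print(clean_column(sample))
    print(clean_column(sample, placeholder='xxxxxxxxxxx'))
```

The first call prints the `COFFEE COMPANY` version shown above; the second prints the `xxxxxxxxxxx` version from the question.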

Follow-on question: How do I write this data to a new CSV file?

import csv
input_file = 'input_file.csv'  # Contents as shown in your question.
output_file = 'output_file.csv'

with open(input_file, 'r', newline='') as inp, \
     open(output_file, 'w', newline='') as outp:
    writer = csv.writer(outp, delimiter=';')
    for row in csv.reader(inp, delimiter=';'):
        row[2] = row[2].replace(';', '')  # Remove embedded ';' chars.
        writer.writerow(row)
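A related detail worth knowing: `csv.writer` uses `csv.QUOTE_MINIMAL` by default, so any field that still contains the `;` delimiter is wrapped in quotes again on output, and the file stays readable by `csv.reader`. The sketch below (with a hypothetical `rewrite` function, writing to an `io.StringIO` instead of `output_file.csv`) shows both cases. It sets `lineterminator='\n'` only so the result is easy to compare; the module's default is `'\r\n'`.

```python
import csv
import io

SAMPLE = '4;1;"COFFEE; COMPANY";4\n3;2;SALVATION ARMY;4\n'

def rewrite(text, strip_semicolons=True):
    """Read semicolon-separated text and write it back out with csv.writer."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter=';', lineterminator='\n')
    for row in csv.reader(io.StringIO(text), delimiter=';'):
        if strip_semicolons:
            row[2] = row[2].replace(';', '')  # remove embedded ';' chars
        writer.writerow(row)
    return out.getvalue()

if __name__ == '__main__':
    # Semicolon removed: no field needs quoting.
    print(rewrite(SAMPLE), end='')
    # Semicolon kept: QUOTE_MINIMAL quotes that one field again.
    print(rewrite(SAMPLE, strip_semicolons=False), end='')
```

So if the goal is only a file that parses cleanly, keeping the semicolon and letting the writer quote the field also works; stripping it matters when the downstream tool splits on `;` without honoring quotes.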
martineau answered Dec 08 '25 16:12



