
Python: open a CSV document that has different types of separators

Tags:

python

pandas

csv

I have a txt document with the following structure:

1:0.84722,0.52855;0.65268,0.24792;0.66525,0.46562
2:0.84722,0.52855;0.65231,0.24513;0.66482,0.46548
3:0.84722,0.52855;0.65197,0.24387;0.66467,0.46537

The first number, followed by a colon, is the index, and I don't know how to specify it when I open the file; in fact, I would like to drop it. The data itself is separated by commas and semicolons, and I would like each number in its own column, regardless of whether the separator is a comma or a semicolon. How can I do this?

Cristina Dominguez Fernandez asked Dec 30 '25 16:12


2 Answers

Use the following to load the csv using pd.read_csv:

import pandas as pd

df = pd.read_csv("data.csv",  # the file path; change it to your filename
                 sep="[,;:]",  # a regex matching any of the three separators
                 engine="python",  # the Python engine is required for regex separators
                 usecols=range(1, 7),  # keep columns 1 to 6, i.e. skip column 0 (the index)
                 header=None  # the file has no header row
                 )
print(df)

Output

         1        2        3        4        5        6
0  0.84722  0.52855  0.65268  0.24792  0.66525  0.46562
1  0.84722  0.52855  0.65231  0.24513  0.66482  0.46548
2  0.84722  0.52855  0.65197  0.24387  0.66467  0.46537
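As a self-contained check of the call above, the sketch below feeds the three sample lines from the question through io.StringIO, which stands in for the real file (an assumption for demonstration only). It uses the same sep, engine, usecols and header arguments, then renumbers the kept columns from 0 so no trace of the index remains.

```python
import io

import pandas as pd

# The three sample lines from the question; io.StringIO stands in for the real file.
SAMPLE = """1:0.84722,0.52855;0.65268,0.24792;0.66525,0.46562
2:0.84722,0.52855;0.65231,0.24513;0.66482,0.46548
3:0.84722,0.52855;0.65197,0.24387;0.66467,0.46537
"""

df = pd.read_csv(
    io.StringIO(SAMPLE),
    sep="[,;:]",          # regex: split on comma, semicolon or colon
    engine="python",      # regex separators need the Python parsing engine
    usecols=range(1, 7),  # skip field 0 (the "N:" index), keep the six numbers
    header=None,          # the data has no header row
)

# Renumber the kept columns 0..5 instead of 1..6.
df.columns = range(df.shape[1])
print(df)
```

Renaming the columns is optional; it only matters if later code expects the first data column to be 0.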

Note
Once the file is loaded, I advise saving it as a proper CSV file (using to_csv).
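A minimal sketch of that round trip. Here df is rebuilt from a small literal (the first two sample rows) so the block runs on its own, and to_csv is called without a path so it returns the CSV text as a string; passing a filename, for example a hypothetical "data_clean.csv", would write it to disk instead.

```python
import io

import pandas as pd

# Stand-in for the frame loaded with read_csv (first two rows of the sample data).
df = pd.DataFrame(
    [[0.84722, 0.52855, 0.65268, 0.24792, 0.66525, 0.46562],
     [0.84722, 0.52855, 0.65231, 0.24513, 0.66482, 0.46548]]
)

# With no path argument, to_csv returns the CSV text; index=False drops the row labels.
csv_text = df.to_csv(index=False)
print(csv_text)

# The result is an ordinary comma-separated file, so the default C engine reads it
# without any regex separator.
df_back = pd.read_csv(io.StringIO(csv_text))
print(df_back)
```

Note that the column labels come back as the strings "0" to "5", because they were written as a header row.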

Dani Mesejo answered Jan 02 '26 06:01


As you are already using pandas.read_csv, simply have a look at its documentation for the argument sep:

Delimiter to use. If sep is None, the C engine cannot automatically detect the separator, but the Python parsing engine can, meaning the latter will be used and automatically detect the separator by Python’s builtin sniffer tool, csv.Sniffer. In addition, separators longer than 1 character and different from '\s+' will be interpreted as regular expressions and will also force the use of the Python parsing engine. Note that regex delimiters are prone to ignoring quoted data. Regex example: '\r\t'.

So in your case, simply calling pandas.read_csv(..., sep='[,;:]') should do the trick.
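With sep alone (no usecols), the number before the colon is still parsed, as column 0. A minimal sketch, again using io.StringIO in place of the real file as an assumption, that reads every field and then removes that column with DataFrame.drop:

```python
import io

import pandas as pd

# Two of the sample lines; io.StringIO stands in for the file on disk.
SAMPLE = """1:0.84722,0.52855;0.65268,0.24792;0.66525,0.46562
2:0.84722,0.52855;0.65231,0.24513;0.66482,0.46548
"""

# A multi-character sep is treated as a regex, which only the Python engine supports;
# passing engine="python" explicitly avoids pandas' fallback warning.
raw = pd.read_csv(io.StringIO(SAMPLE), sep="[,;:]", engine="python", header=None)
print(raw.shape)  # (2, 7): the index column plus six values

# Drop column 0 (the "N:" prefix) to keep only the data.
df = raw.drop(columns=0)
print(df)
```

Passing usecols, as in the other answer, avoids parsing the index column in the first place; dropping it afterwards gives the same six columns.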

ojdo answered Jan 02 '26 08:01


