
How can I parse a .txt with a delimiter that has multiple characters into a pandas df?

I have a large dataset that I'd like to analyse in Python with pandas. It's all contained in a .txt file, but the separator is +++$+++. How can I parse this?

import pandas as pd
df = pd.read_csv('filename.txt', sep='+++$+++', header=None)

These two lines bring up this error:

sre_constants.error: nothing to repeat
foxyblue asked Sep 06 '25 03:09


1 Answer

That's because a separator longer than one character is interpreted as a regular expression, as stated in http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html. In a regex, + means "one or more repetitions of the preceding character"; at the start of the pattern there is no preceding character, so there's "nothing to repeat".
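To see the same failure outside pandas, here is a minimal sketch using only the standard library's re module. The variable names are just for illustration, and the exact wording of the error message may vary slightly between Python versions.

```python
import re

SEP = "+++$+++"  # the separator from the question

# Compiling the raw separator as a regex fails: the leading '+' is a
# quantifier with nothing in front of it to repeat.
try:
    re.compile(SEP)
    error_message = None
except re.error as exc:
    error_message = str(exc)

# re.escape puts a backslash before each regex metacharacter, so the
# pattern matches the literal text '+++$+++'.
escaped = re.escape(SEP)
parts = re.split(escaped, "a+++$+++b+++$+++c")

print(error_message)
print(escaped)
print(parts)
```

Once escaped, the separator behaves like an ordinary literal string in the split.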

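I think escaping the symbols should work: both + and $ are regex metacharacters, so try sep=r'\+\+\+\$\+\+\+'. Passing engine='python' as well avoids the ParserWarning pandas emits when it falls back from the C engine for a regex separator.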
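A minimal sketch of that approach follows. The in-memory sample is made up to stand in for filename.txt, and it assumes the fields contain no quoted separators, since regex separators tend to ignore quoting.

```python
import io
import re

import pandas as pd

SEP = "+++$+++"

# Hypothetical two-line sample standing in for 'filename.txt'.
sample = io.StringIO(
    "L1+++$+++alice+++$+++hello\n"
    "L2+++$+++bob+++$+++world\n"
)

# Escape the separator so pandas treats it as literal text, and select
# the Python engine explicitly because regex separators need it.
df = pd.read_csv(sample, sep=re.escape(SEP), engine="python", header=None)

print(df)
```

For the real file, replace sample with the path 'filename.txt'. If the fields in the file are padded with spaces around the separator, one option is to wrap the escaped pattern in \s* on both sides.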

odradek answered Sep 07 '25 16:09