
Pandas parsing issue with missing zeros after the thousands separator in a CSV file

Given a CSV file with the following content:

actual; shouldbe
1,200;  1200
1,2;    1200
12;     12

I want to read in the content so that both columns have equal values. The problem is that the trailing zeros after the thousands separator are missing. Reading the file with

df = pd.read_csv(file, sep=';', thousands=',')

leads to

   actual  shouldbe
0    1200      1200
1      12      1200
2      12        12
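
For reference, a minimal reproduction of this behaviour might look like the sketch below. It assumes the file content is supplied as an inline string (the name `raw` is only illustrative) and that the stray spaces after the semicolons are dropped. With `thousands=','` the commas are simply removed before conversion, so `1,2` is read as `12`.

```python
from io import StringIO

import pandas as pd

# Inline stand-in for the CSV file from the question (assumption: no padding spaces)
raw = """actual;shouldbe
1,200;1200
1,2;1200
12;12"""

# thousands="," strips the commas, it does not restore missing digits
df = pd.read_csv(StringIO(raw), sep=";", thousands=",")
print(df)
```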

I hope the problem is clear. I have no idea how to clean my data, either in pandas or with any other Python or non-Python tool.

Corvince asked Dec 22 '25 23:12


1 Answer

I'm not sure it could be done without some data cleaning after loading:

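>>> import pandas as pd
>>> from io import StringIO
>>> s = """actual;shouldbe
... 1,200;1200
... 1,001,21;  1001210
... 1,2;   1200
... 12;   12"""
>>> df = pd.read_csv(StringIO(s), sep=";")
>>> # keep the first group as is, right-pad every later group to three digits with zeros
>>> df['result'] = df.actual.apply(lambda x: ''.join(k if i == 0 else k.ljust(3, '0') for i, k in enumerate(x.split(','))))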
>>> df
     actual  shouldbe   result
0     1,200      1200     1200
1  1,001,21   1001210  1001210
2       1,2      1200     1200
3        12        12       12
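
Note that `result` above holds strings. As a variation, the same padding could be applied while loading, so the column comes out as integers. The following is only a sketch: the helper name `pad_thousands` is hypothetical, and it assumes that zeros are only ever missing at the end of a group, as in the question (a value like `1,02` would be read as `1020`).

```python
from io import StringIO

import pandas as pd


def pad_thousands(text: str) -> int:
    """Right-pad every comma-separated group after the first to three digits."""
    first, *rest = text.strip().split(",")
    return int(first + "".join(group.ljust(3, "0") for group in rest))


raw = """actual;shouldbe
1,200;1200
1,001,21;1001210
1,2;1200
12;12"""

# converters passes each raw cell of the column to the function as a string
df = pd.read_csv(StringIO(raw), sep=";", converters={"actual": pad_thousands})
print(df)
```

With the sample data, `(df["actual"] == df["shouldbe"]).all()` can be used to check that both columns now agree.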
Roman Pekar answered Dec 24 '25 13:12


