I have a dataframe with 5 million rows. Say the dataframe looks like this:
>>> df = pd.DataFrame(data={"Random": "86 7639103627 96 32 1469476501".split()})
>>> df
       Random
0          86
1  7639103627
2          96
3          32
4  1469476501
Note that the Random column is stored as a string.
If the number in column Random has fewer than 9 digits, I want to add leading zeros to make it 9 digits. If the number has 9 or more digits, I want to add leading zeros to make it 20 digits.
What I have done is this:
for i in range(0, len(df['Random'])):
    if len(df['Random'][i]) < 9:
        df['Random'][i] = df['Random'][i].zfill(9)
    else:
        df['Random'][i] = df['Random'][i].zfill(20)
With more than 5 million rows, this loop takes a very long time: tqdm reported about 5 iterations per second, and the estimated time to completion was in days!
Is there an easier and faster way of performing this task?
Combine np.where with str.zfill; as an alternative, you can look at str.pad.
import numpy as np
df.Random = np.where(df.Random.str.len() < 9, df.Random.str.zfill(9), df.Random.str.zfill(20))
df
Out[9]:
                 Random
0             000000086
1  00000000007639103627
2             000000096
3             000000032
4  00000000001469476501
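The answer mentions str.pad as an alternative without showing it. Here is a minimal sketch of that variant, assuming the column contains only plain digit strings (no signs or whitespace); the variable name is_short is introduced here for illustration and is not from the original answer.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({"Random": "86 7639103627 96 32 1469476501".split()})

# Boolean mask: True where the string has fewer than 9 characters.
is_short = df["Random"].str.len() < 9

# str.pad with side="left" and fillchar="0" left-pads with zeros,
# which matches str.zfill for unsigned digit strings.
df["Random"] = np.where(
    is_short,
    df["Random"].str.pad(9, side="left", fillchar="0"),
    df["Random"].str.pad(20, side="left", fillchar="0"),
)

print(df["Random"].tolist())
```

Both branches are computed for every row and np.where picks one per row, so the work is done in vectorized string methods rather than in a Python-level loop with chained indexing. Strings already longer than 20 characters are left unchanged by either str.pad or str.zfill.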