pandas vectorized operation to get the length of string [duplicate]

I have a pandas dataframe.

df = pd.DataFrame(['Donald Dump','Make America Great Again!','Donald Shrimp'],
                   columns=['text'])

What I would like is another column in the DataFrame that holds the length of each string in the 'text' column.

For the example above, it would be:

                        text  text_length
0                Donald Dump           11
1  Make America Great Again!           25
2              Donald Shrimp           13

I know I can loop through the rows and get the lengths, but is there a way to vectorize this operation? I have a few million rows.

aerin asked Jan 25 '26 10:01


2 Answers

Use str.len:

print(df.text.str.len())
0    11
1    25
2    13
Name: text, dtype: int64

Sample:

import pandas as pd

df = pd.DataFrame(['Donald Dump','Make America Great Again!','Donald Shrimp'],
                   columns=['text'])
print(df)
                        text
0                Donald Dump
1  Make America Great Again!
2              Donald Shrimp

df['text_length'] = df.text.str.len()
print(df)
                        text  text_length
0                Donald Dump           11
1  Make America Great Again!           25
2              Donald Shrimp           13
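
One thing worth checking on a few million rows is how missing values behave. The sketch below is not from the original answer; it uses a made-up frame with one missing entry to show that str.len() leaves NaN for the missing row, while calling len on every element fails on it.

```python
import pandas as pd

# Hypothetical sample data with one missing entry (not from the question).
df = pd.DataFrame({'text': ['Donald Dump', None, 'Donald Shrimp']})

# .str.len() skips missing values and leaves NaN in their place,
# so the resulting column typically becomes float instead of int.
df['text_length'] = df['text'].str.len()

# apply(len) calls len() on every element, so it fails on the missing one.
try:
    df['text'].apply(len)
    apply_failed = False
except TypeError:
    apply_failed = True

print(df)
print('apply(len) raised TypeError:', apply_failed)
```

If the column can contain missing values, str.len() is the safer choice; fill or drop them first if an integer column is needed.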
jezrael answered Jan 26 '26 23:01


I think the easiest way is to use the apply method on the 'text' column (a Series). With this method you can run any Python function over each value.

You could do something like:

df['text_length'] = df['text'].apply(len)

to create a new column with the data you want.


Edit: After seeing @jezrael's answer I was curious and decided to time both approaches with timeit. I created a DataFrame full of lorem ipsum sentences (101000 rows), and the difference is quite small. I got:

In [59]: %timeit df['text_length'] = (df.text.str.len())
10 loops, best of 3: 20.6 ms per loop

In [60]: %timeit df['text_length'] = df['text'].apply(len)
100 loops, best of 3: 17.6 ms per loop
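
To repeat the comparison outside IPython, something like the following should work. It is only a rough sketch: the data is a made-up column of 100,000 identical sentences rather than the lorem ipsum set above, and the timings will differ by machine and pandas version.

```python
import timeit

import pandas as pd

# Hypothetical benchmark data (not the lorem ipsum frame used above).
df = pd.DataFrame({'text': ['Make America Great Again!'] * 100_000})

# Check that both approaches give the same lengths before timing them.
via_str = df['text'].str.len()
via_apply = df['text'].apply(len)
same_result = bool((via_str == via_apply).all())

# Time 10 calls of each approach and report the average per call.
runs = 10
t_str = timeit.timeit(lambda: df['text'].str.len(), number=runs)
t_apply = timeit.timeit(lambda: df['text'].apply(len), number=runs)

print('same result:', same_result)
print(f'str.len():  {t_str / runs * 1000:.1f} ms per call')
print(f'apply(len): {t_apply / runs * 1000:.1f} ms per call')
```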
pekapa answered Jan 27 '26 01:01


