I have a pandas dataframe as follows:
import numpy as np
import pandas as pd
df = pd.DataFrame([[1, 2], [np.nan, 1], ['test string1', 5]], columns=['A', 'B'])
df
              A  B
0             1  2
1           NaN  1
2  test string1  5
I am using pandas 0.20. What is the most efficient way to remove every row in which any column value has a length greater than 10?
len('test string1')
12
For the example above, I expect the following output:
df
              A  B
0             1  2
1           NaN  1
If filtering on column A only:
In [865]: df[~(df.A.str.len() > 10)]
Out[865]:
     A  B
0    1  2
1  NaN  1
If filtering on all columns:
In [866]: df[~df.applymap(lambda x: len(str(x)) > 10).any(axis=1)]
Out[866]:
     A  B
0    1  2
1  NaN  1
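For readers who want something they can paste and run, the all-columns approach above could be sketched as follows. The helper name drop_long_rows and its max_len parameter are made up for illustration; the sketch uses astype(str) with a per-column .str.len() instead of applymap, since applymap is deprecated in recent pandas releases.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2], [np.nan, 1], ["test string1", 5]], columns=["A", "B"])


def drop_long_rows(frame, max_len=10):
    """Hypothetical helper: drop rows where any cell's string form exceeds max_len."""
    # Convert every cell to its string form, then flag cells that are too long.
    too_long = frame.astype(str).apply(lambda col: col.str.len() > max_len)
    # Keep only the rows in which no cell was flagged.
    return frame[~too_long.any(axis=1)]


result = drop_long_rows(df)
print(result)
```

Note that the length is measured on the string form of each value, so a float such as 1.0 counts as three characters, not one.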
I had to cast each value to a string for Diego's answer to work:
df = df[df['A'].apply(lambda x: len(str(x)) <= 10)]
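A small sketch of why the cast matters, assuming the same example frame as in the question: column A holds an integer, a NaN and a string, and calling len() directly on the integer fails.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2], [np.nan, 1], ["test string1", 5]], columns=["A", "B"])

# Without the cast, len() is called on the integer 1 and raises TypeError.
try:
    df["A"].apply(lambda x: len(x) <= 10)
    raised = False
except TypeError:
    raised = True

# With the cast, every value is measured by the length of its string form.
kept = df[df["A"].apply(lambda x: len(str(x)) <= 10)]
print(raised)
print(kept)
```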