I have the following dataframe:
import pandas as pd
df = pd.DataFrame({'TX':['bob','tim','frank'],'IL':['fred','bob','tim'],'NE':['tim','joe','bob']})
I would like to isolate the strings that occur across all columns to generate a list. The expected result is:
output = ['tim','bob']
The only way I can think of to achieve this is with for loops, which I would like to avoid. Is there a built-in pandas function suited to this?
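You can count the values in each column, turn the counts into a boolean mask (True where the value occurs in that column, False where the count is missing), and then keep the rows that are True in every column with DataFrame.all: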
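# Series.value_counts is used because the top-level pd.value_counts is deprecated in recent pandas versions
m = df.apply(lambda col: col.value_counts()).notna()
print(m)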
          TX     IL     NE
bob     True   True   True
frank   True  False  False
fred   False   True  False
joe    False  False   True
tim     True   True   True
L = m.index[m.all(axis=1)].tolist()
print(L)
['bob', 'tim']
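Because the mask only tests whether a count exists, not how large it is, the same steps should still work when a value repeats within a column. A minimal sketch of that case, using made-up data (the frame `dup_df` and the names `counts`, `mask` and `shared` are illustrative, not from the question):

```python
import pandas as pd

# Hypothetical data: 'bob' appears twice in column TX.
dup_df = pd.DataFrame({
    'TX': ['bob', 'bob', 'tim'],
    'IL': ['fred', 'bob', 'tim'],
    'NE': ['tim', 'joe', 'bob'],
})

# One row per distinct value, one column per original column;
# a value that never occurs in a column gets NaN there.
counts = dup_df.apply(lambda col: col.value_counts())

# True where the value occurs at least once in that column.
mask = counts.notna()

# Keep the values that occur in every column.
shared = mask.index[mask.all(axis=1)].tolist()
print(sorted(shared))
```

Here the count for 'bob' in TX is 2 rather than 1, but notna() treats any count as "present", so the duplicate does not affect the result.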
You can achieve this with pandas.DataFrame.apply() and set.intersection(), like this:
cols_set = list(df.apply(lambda col: set(col.values)).values)
output = list(set.intersection(*cols_set))
The result is as follows (sets are unordered, so the order of the elements may vary):
>>> print(output)
['tim', 'bob']
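If a stable order matters, one option is to intersect the column sets and then filter the first column by that intersection, so the values come back in the order they first appear there. This is only a sketch under that assumption; the helper name `common_values` is hypothetical and not part of pandas:

```python
import pandas as pd

def common_values(frame):
    """Return values present in every column, ordered as in the first column."""
    # One set of non-missing values per column.
    col_sets = [set(frame[c].dropna()) for c in frame.columns]
    if not col_sets:
        # No columns: nothing can be shared.
        return []
    shared = set.intersection(*col_sets)
    # Walk the first column so the output order is deterministic.
    first = frame.iloc[:, 0]
    return [v for v in first.drop_duplicates() if v in shared]

df = pd.DataFrame({
    'TX': ['bob', 'tim', 'frank'],
    'IL': ['fred', 'bob', 'tim'],
    'NE': ['tim', 'joe', 'bob'],
})
print(common_values(df))
```

The set intersection still does the real work; the final list comprehension only fixes the ordering.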