
Pandas: find common values across columns

I have the following dataframe:

import pandas as pd

df = pd.DataFrame({'TX':['bob','tim','frank'],'IL':['fred','bob','tim'],'NE':['tim','joe','bob']})

I would like to isolate the strings that occur across all columns to generate a list. The expected result is:

output = ['tim','bob']

The only way I can think of to achieve this is with for loops, which I would like to avoid. Is there a built-in pandas function suited to this?

g_ret3 asked Nov 23 '25 14:11


2 Answers

You can count the values in each column to build a boolean mask (a value that is absent from a column gets a missing count), then use DataFrame.all to keep only the rows with no missing values:

# pd.Series.value_counts is used here because the top-level pd.value_counts is deprecated
m = df.apply(pd.Series.value_counts).notna()
print(m)
          TX     IL     NE
bob     True   True   True
frank   True  False  False
fred   False   True  False
joe    False  False   True
tim     True   True   True

# keep the index labels whose row is True in every column
L = m.index[m.all(axis=1)].tolist()
print(L)
['bob', 'tim']
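
For reference, here is a self-contained sketch of the same mask idea wrapped in a function. The helper name common_values is hypothetical (it is not part of pandas), and the sketch assumes the frame has no missing values to begin with, since NaN entries are skipped by value_counts.

```python
import pandas as pd

df = pd.DataFrame({
    'TX': ['bob', 'tim', 'frank'],
    'IL': ['fred', 'bob', 'tim'],
    'NE': ['tim', 'joe', 'bob'],
})


def common_values(frame):
    """Return the values that appear in every column (hypothetical helper)."""
    # Count occurrences per column; a value absent from a column becomes NaN.
    counts = frame.apply(pd.Series.value_counts)
    # A value is common to all columns when no column reports NaN for it.
    mask = counts.notna().all(axis=1)
    return counts.index[mask].tolist()


common = common_values(df)
print(common)
```

This still iterates over the columns internally through apply, so it avoids an explicit for loop in user code rather than avoiding iteration altogether.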
jezrael answered Nov 25 '25 02:11


You can achieve this with pandas.DataFrame.apply() and set.intersection():

cols_set = list(df.apply(lambda col: set(col.values)).values)
output = list(set.intersection(*cols_set))

The result is as follows (sets are unordered, so the order of the elements may differ):

>>> print(output)
['tim', 'bob']
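
If a stable ordering matters, or if apply with a set-returning function behaves differently on your pandas version (an assumption worth checking locally), a minimal sketch of the same set-intersection idea can build the sets with a comprehension and sort the result. The variable name col_sets is illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'TX': ['bob', 'tim', 'frank'],
    'IL': ['fred', 'bob', 'tim'],
    'NE': ['tim', 'joe', 'bob'],
})

# One set of unique values per column, built with a comprehension
# instead of DataFrame.apply.
col_sets = [set(df[col]) for col in df.columns]

# Intersect all column sets; sorted() makes the order deterministic.
output = sorted(set.intersection(*col_sets))
print(output)  # ['bob', 'tim']
```

Note that set.intersection(*col_sets) raises a TypeError when the frame has no columns, so an empty frame would need a guard.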
Jaroslav Bezděk answered Nov 25 '25 03:11
