Pandas: find common values across columns

Question

I have the following dataframe:

import pandas as pd

df = pd.DataFrame({'TX': ['bob', 'tim', 'frank'], 'IL': ['fred', 'bob', 'tim'], 'NE': ['tim', 'joe', 'bob']})

I would like to collect the strings that occur in every column into a list. The expected result is:

output = ['tim','bob']

The only way I can think of to achieve this is with for loops, which I would like to avoid. Is there a built-in pandas function suited to this?

jezrael · Accepted Answer

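You can count the values in each column to build a boolean mask (a value is present in a column when its count is not missing), then use DataFrame.all to keep the rows where the mask is True for every column: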

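# the top-level pd.value_counts is deprecated in recent pandas versions; use the Series method
m = df.apply(pd.Series.value_counts).notna()
print(m)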
          TX     IL     NE
bob     True   True   True
frank   True  False  False
fred   False   True  False
joe    False  False   True
tim     True   True   True

# keep the index labels whose row is True in every column
L = m.index[m.all(axis=1)].tolist()
print(L)
['bob', 'tim']
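
The same "present in every column" test can also be phrased as a count of distinct columns per value. The following is only a sketch of that variant, not part of the answer above; the names long, n_cols, and common are chosen here for illustration. It reshapes the frame to long format with DataFrame.melt and keeps the values that appear in as many distinct columns as the frame has.

```python
import pandas as pd

df = pd.DataFrame({'TX': ['bob', 'tim', 'frank'],
                   'IL': ['fred', 'bob', 'tim'],
                   'NE': ['tim', 'joe', 'bob']})

# Long format: one row per (column, value) pair.
long = df.melt(var_name='column', value_name='value')

# For each value, count the number of distinct columns it appears in.
n_cols = long.groupby('value')['column'].nunique()

# A value is common when it appears in every column of the frame.
common = n_cols.index[n_cols == df.shape[1]].tolist()
print(common)  # ['bob', 'tim']
```

Because groupby sorts its keys by default, the result comes back in alphabetical order, and duplicates within a single column do not affect the count.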

Jaroslav Bezděk · Answer

You can achieve this with pandas.DataFrame.apply() and set.intersection(), like this:

cols_set = list(df.apply(lambda col: set(col.values)).values)
output = list(set.intersection(*cols_set))

The result is as follows (sets are unordered, so the order of the elements may vary):

>>> print(output)
['tim', 'bob']
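
If a reproducible order matters, one option is to build the per-column sets with a comprehension and sort the intersection. This is a minimal sketch under that assumption rather than the answer's exact code; column_sets is a name introduced here, and the comprehension replaces the apply() call only to keep the example independent of how apply() wraps set-valued results.

```python
import pandas as pd

df = pd.DataFrame({'TX': ['bob', 'tim', 'frank'],
                   'IL': ['fred', 'bob', 'tim'],
                   'NE': ['tim', 'joe', 'bob']})

# One set of unique values per column.
column_sets = [set(df[name]) for name in df.columns]

# Intersect all sets, then sort so the output order is deterministic.
output = sorted(set.intersection(*column_sets))
print(output)  # ['bob', 'tim']
```

Note that set.intersection(*column_sets) needs at least one column; an empty frame would have to be handled separately.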

Tags: python, pandas, dataframe

Asked by g_ret3
