Pandas: Return count of matching values between two DataFrame variables

Question

This is a simple thing but I don't think it's been covered on SO or in the Pandas documentation.

The data

Two DataFrames containing data about cities. The DataFrame UK contains a variable for UK city names. It has been grouped so there are no duplicate city names. This is a truncated example of this city variable, (not the full DataFrame):

city
Hamilton
Edinburgh
Bury
...

The DataFrame US contains a corresponding variable for US city names. It does have duplicate city names, but these are not actually duplicate values as there are numerous cities with the same name. Again, a truncated example:

city
Hamilton
Hamilton
Edinburgh
Edinburgh
Edinburgh
Bury
Bury 
...

(Note: The variables aren't the same length in my data.)

The aim

Create a new variable in the UK DataFrame which is an integer count for each time a city name in the US city variable matches a city name in the UK city variable.

Below shows an example of the output I'm aiming for. Say for one UK city called Hamilton there are 2 cities called Hamilton in the US. For one UK city called Edinburgh there are 3 matches in the US. And 2 matches for Bury.

city       count
Hamilton       2
Edinburgh      3
Bury           2

Where I'm stuck

If I do this...

uk['count'] = uk['city'].isin(us['city']).astype(int)

the new variable is a binary 1 or 0 indicating that there is a match, which is halfway there. However I'm struggling with the Pandas syntax to return the count of matches. I've tried appending value_counts and variations of unique but these didn't work. Maybe I'm approaching this the wrong way.

unutbu · Accepted Answer

Use Series.value_counts to count the number of occurrences for each city in US['city'], and then use Series.map to apply those counts to corresponding values in UK['city']:

In [40]: US['city'].value_counts()
Out[40]: 
Edinburgh    3
Bury         2
Hamilton     2
Name: city, dtype: int64

In [41]: UK['count'] = UK['city'].map(US['city'].value_counts())

In [42]: UK
Out[42]: 
        city  count
0   Hamilton      2
1  Edinburgh      3
2       Bury      2

If there is a city in UK['city'] which is not in US['city'] then the above will assign a NaN value to the count column for that city.

Pandas: Return count of matching values between two DataFrame variables

Tags:

python

pandas

RDJ

1 Answers

unutbu

Recent Activity

Donate For Us

Pandas: Return count of matching values between two DataFrame variables

Tags:

python

pandas

RDJ

1 Answers

unutbu

Related questions

Recent Activity

Donate For Us