Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Build a Dict of Counts based on Two Dataframe Columns

I have a dataframe that looks like this:

    start   stop
0   1       2
1   3       4
2   2       1
3   4       3

I'm trying to build a dictionary with key= (start, stop) pairs from my list of tuples and the value= count of their occurrence, regardless of the order. In other words, (1,2) and (2,1) would both count as an occurrence of the pair (1,2) in the list of tuples.

Desired output: dict_count= {('1','2'):2, ('3','4'):2}

Here's my attempt:

my_list=[('1','2'),('3','4')]

for pair in my_list:
    count=0
    if ((df[df['start']]==pair[0] and df[df['end']]==pair[1]) or (df[df['start']]==pair[1]) and df[df['end']]==pair[0])::
        count+=1
    dict_count[pair]=count

However, this gives me a KeyError: KeyError: "['1' ...] not in index"

like image 474
Caerus Avatar asked Nov 17 '25 16:11

Caerus


2 Answers

Use collections.Counter:

>>> from collections import Counter
>>> Counter(map(tuple, np.sort(df[['start','stop']], axis=1)))
{(1, 2): 2, (3, 4): 2}

This does not modify your original DataFrame.

like image 168
cs95 Avatar answered Nov 19 '25 05:11

cs95


Using values + sort then we do groupby

df.values.sort()
df
  start stop
0   '1'  '2'
1   '3'  '4'
2   '1'  '2'
3   '3'  '4'
df.groupby(df.columns.tolist()).size()
start  stop
'1'    '2'     2
'3'    '4'     2
dtype: int64

If you need dict

df.groupby(df.columns.tolist()).size().to_dict()
{("'1'", "'2'"): 2, ("'3'", "'4'"): 2}

Update

df['orther']=1
df[['start','stop']]=np.sort(df[['start','stop']].values)
df.groupby(['start','stop']).size().to_dict()
{("'1'", "'2'"): 2, ("'3'", "'4'"): 2}
like image 40
BENY Avatar answered Nov 19 '25 05:11

BENY