I have a dataset of random words and names and I am trying to group all of the similar words and names. So given the dataframe below:
Name ID Value
0 James 1 10
1 James 2 2 142
2 Bike 3 1
3 Bicycle 4 1197
4 James Marsh 5 12
5 Ants 6 54
6 Job 7 6
7 Michael 8 80007
8 Arm 9 47
9 Mike K 10 9
10 Michael k 11 1
My pseudo code would be something like:
import pandas as pd
from fuzzywuzzy import fuzz
minratio = 95
for idx1, name1 in df['Name'].iteritems():
for idx2, name2 in df['Name'].iteritems():
ratio = fuzz.WRatio(name1, name2)
if ratio > minratio:
grouped = df.groupby(['Name', 'ID'])['Value']\
.agg(Total_Value='sum', Group_Size='count')
This would then give me the desired output:
print(grouped)
Name ID Total_Value Group_Size
0 James 1 164 3 # All James' grouped
2 Bike 3 1198 2 # Bike's and Bicycles grouped
5 Ants 6 54 1
6 Job 7 6 1
7 Michael 8 80017 3 # Mike's and Michael's grouped
8 Arm 9 47 1
Obviously this doesn't work, and honestly, I am not sure if this is even possible, but this is what I'm trying to accomplish. Any advice that could get me on the right track would be useful.
Using affinity propagation clustering (not perfect but maybe a starting point):
import pandas as pd
import numpy as np
import io
from fuzzywuzzy import fuzz
from scipy import spatial
import sklearn.cluster
s="""Name ID Value
0 James 1 10
1 James 2 2 142
2 Bike 3 1
3 Bicycle 4 1197
4 James Marsh 5 12
5 Ants 6 54
6 Job 7 6
7 Michael 8 80007
8 Arm 9 47
9 Mike K 10 9
10 Michael k 11 1"""
df = pd.read_csv(io.StringIO(s),sep='\s\s+',engine='python')
names = df.Name.values
sim = spatial.distance.pdist(names.reshape((-1,1)), lambda x,y: fuzz.WRatio(x,y))
affprop = sklearn.cluster.AffinityPropagation(affinity="precomputed", random_state=None)
affprop.fit(spatial.distance.squareform(sim))
res = df.groupby(affprop.labels_).agg(
Names=('Name',','.join),
First_ID=('ID','first'),
Total_Value=('Value','sum'),
Group_Size=('Value','count')
)
Result
Names First_ID Total_Value Group_Size
0 James,James 2,James Marsh,Ants,Arm 1 265 5
1 Bike,Bicycle 3 1198 2
2 Job 7 6 1
3 Michael,Mike K,Michael k 8 80017 3
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With