I would like to assign a pseudo-random ID to the rows of a DataFrame.
I can assign a unique ID using cat.codes or ngroup():
import pandas as pd
import random
import string
df1 = pd.DataFrame({'Name': ['John', 'Susie', 'Jack', 'Jill', 'John']})
df1['id'] = df1.groupby('Name').ngroup()
df1['idz'] = df1['Name'].astype('category').cat.codes
    Name  id  idz
0   John   2    2
1  Susie   3    3
2   Jack   0    0
3   Jill   1    1
4   John   2    2
I've also used a function from this post to create a random ID row by row:
def id_generator(size=6, chars=string.ascii_uppercase + string.digits):
    return ''.join(random.SystemRandom().choice(chars) for _ in range(size))
df1['random id'] = df1['idz'].apply(lambda x : id_generator(3))
    Name  id  idz random id
0   John   2    2       118  #<--- Check Here
1  Susie   3    3       KGZ
2   Jack   0    0       KMQ
3   Jill   1    1       T2L
4   John   2    2       Q3F  #<--- Check Here
But how do I combine the two so that John, in this small use case, receives the same ID in both rows? Because of the size of the data, I'd like to avoid a long loop of the form "if the ID is not used, assign it; if the name already has an ID, reuse the existing one".
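Use groupby + transform. The lambda is called once per group, and its scalar result is broadcast to every row in that group: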
df1['random id'] = df1.groupby('idz').idz.transform(lambda x : id_generator(3))
df1
Out[657]:
    Name  id  idz random id
0   John   2    2       35P
1  Susie   3    3       6UU
2   Jack   0    0       XGF
3   Jill   1    1       5LC
4   John   2    2       35P
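One caveat with the transform approach: it draws one ID per group, but nothing stops two different groups from drawing the same string by chance, especially with short IDs. If the IDs must also be distinct across names, one option is to build a name-to-ID dictionary first and then map it onto the column. The following is only a sketch under that assumption; `unique_id_map` is a hypothetical helper, not part of pandas or of the original post.

```python
import random
import string

import pandas as pd


def id_generator(size=6, chars=string.ascii_uppercase + string.digits):
    # Same generator as in the question.
    return ''.join(random.SystemRandom().choice(chars) for _ in range(size))


def unique_id_map(keys, size=3):
    """Hypothetical helper: one random ID per key, redrawing on collision."""
    used = set()
    mapping = {}
    for key in keys:
        new_id = id_generator(size)
        # Redraw until the ID has not been handed out yet.
        while new_id in used:
            new_id = id_generator(size)
        used.add(new_id)
        mapping[key] = new_id
    return mapping


df1 = pd.DataFrame({'Name': ['John', 'Susie', 'Jack', 'Jill', 'John']})

# Loop only over the distinct names, then look each row up in the dict.
id_map = unique_id_map(df1['Name'].unique())
df1['random id'] = df1['Name'].map(id_map)
```

The loop runs once per distinct name rather than once per row, so it should stay cheap even on a large frame. Note that the redraw loop assumes the ID space (36**size with the default characters) is comfortably larger than the number of distinct names; otherwise it will slow down or never finish.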