The following df is a summary of my hole dataset just to illustrate my problem.
The df shows the job application of each id and i want to know which combination of sector is more likely for an individual to apply?
df
id education area_job_application
1 Collage Construction
1 Collage Sales
1 Collage Administration
2 University Finance
2 University Sales
3 Collage Finance
3 Collage Sales
4 University Administration
4 University Sales
4 University Data analyst
5 University Administration
5 University Sales
answer
Construction Sales Administration Finance Data analyst
Contruction 1 1 1 0 0
Sales 1 5 3 1 1
Administration 1 3 3 0 1
Finance 0 2 0 2 0
Data analyst 0 1 1 0 1
This answer shows that administration and sales are the sector that more chances have to receive a postulation by the same id (this is the answer which i am looking). But i am also interesting for other combinations, i think that a mapheat will be very informative to illustrate this data.
Sector combination from the same sector are irrelevant (maybe in the diagonal from the answer matrix should be a 0, doesnt matter the value, i wont anaylse).
Use crosstab or groupby with size and unstack first and then DataFrame.dot by transpose DataFrame and last add reindex for custom order of index and columns:
#dynamic create order by unique values of column
L = df['area_job_application'].unique()
#df = pd.crosstab(df.id, df.area_job_application)
df = df.groupby(['id', 'area_job_application']).size().unstack(fill_value=0)
df = df.T.dot(df).rename_axis(None).rename_axis(None, axis=1).reindex(columns=L, index=L)
print (df)
Construction Sales Administration Finance Data analyst
Construction 1 1 1 0 0
Sales 1 5 3 2 1
Administration 1 3 3 0 1
Finance 0 2 0 2 0
Data analyst 0 1 1 0 1
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With