Data analysis with pandas

The following df is a summary of my whole dataset, just to illustrate my problem. The df shows the job applications of each id, and I want to know which combination of sectors an individual is most likely to apply to.

df
id    education   area_job_application
 1      Collage           Construction 
 1      Collage                  Sales
 1      Collage         Administration
 2   University                Finance
 2   University                  Sales
 3      Collage                Finance
 3      Collage                  Sales
 4   University         Administration   
 4   University                  Sales
 4   University           Data analyst
 5   University         Administration
 5   University                  Sales

answer

              Construction    Sales    Administration   Finance   Data analyst
Construction             1        1                 1         0             0
Sales                    1        5                 3         2             1
Administration           1        3                 3         0             1
Finance                  0        2                 0         2             0
Data analyst             0        1                 1         0             1

This answer shows that Administration and Sales are the pair of sectors most likely to receive applications from the same id (this is the answer I am looking for). But I am also interested in the other combinations, and I think a heatmap would be a very informative way to illustrate this data.

Combinations of a sector with itself are irrelevant (maybe the diagonal of the answer matrix should be 0; the value doesn't matter, as I won't analyse it).

Lucas Dresl asked Nov 28 '25 04:11



1 Answer

Use crosstab, or groupby with size and unstack, to build an id-by-sector count table first. Then multiply its transpose by the table itself with DataFrame.dot, and finally add reindex to set a custom order for the index and columns:

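import pandas as pd

# keep the sectors in their order of first appearance
L = df['area_job_application'].unique()

# alternative: df = pd.crosstab(df.id, df.area_job_application)
# one row per id, one column per sector, counting applications
df = df.groupby(['id', 'area_job_application']).size().unstack(fill_value=0)
# transposed counts times counts gives the sector-by-sector co-occurrence matrix
df = df.T.dot(df).rename_axis(None).rename_axis(None, axis=1).reindex(columns=L, index=L)
print(df)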
                Construction  Sales  Administration  Finance  Data analyst
Construction               1      1               1        0             0
Sales                      1      5               3        2             1
Administration             1      3               3        0             1
Finance                    0      2               0        2             0
Data analyst               0      1               1        0             1
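
Since the diagonal is irrelevant for the question, it can be set to 0 and the remaining pairs ranked. The following is only a minimal, self-contained sketch under a few assumptions: the sample data is retyped from the question, and the names indicator, co, pairs and ranked are illustrative and not part of the code above. The clip(upper=1) call is an added safeguard, assuming that an id listing the same sector twice should still count once.

```python
import numpy as np
import pandas as pd

# Sample data retyped from the question (education column omitted, it is not used).
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 3, 3, 4, 4, 4, 5, 5],
    "area_job_application": [
        "Construction", "Sales", "Administration",
        "Finance", "Sales",
        "Finance", "Sales",
        "Administration", "Sales", "Data analyst",
        "Administration", "Sales",
    ],
})

# Sectors in their order of first appearance.
order = df["area_job_application"].unique()

# One row per id, one 0/1 column per sector.
# Assumption: a repeated (id, sector) row should count only once.
indicator = pd.crosstab(df["id"], df["area_job_application"]).clip(upper=1)

# Sector-by-sector co-occurrence counts, in the custom order.
co = (indicator.T.dot(indicator)
      .rename_axis(None)
      .rename_axis(None, axis=1)
      .reindex(index=order, columns=order))

# Zero the diagonal on a copy of the underlying array.
arr = co.to_numpy().copy()
np.fill_diagonal(arr, 0)
pairs = pd.DataFrame(arr, index=co.index, columns=co.columns)

# Keep each unordered pair once (upper triangle) and rank by count.
upper = pairs.where(np.triu(np.ones(pairs.shape, dtype=bool), k=1))
ranked = upper.stack().dropna().sort_values(ascending=False)

print(pairs)
print(ranked.head())
```

The pairs frame keeps the square layout, so it can be passed to whatever heatmap function you prefer, while ranked lists each combination once with the most frequent pair first.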
jezrael answered Nov 30 '25 20:11



