Pandas: How to find percentage of group members type per subgroup?

(Data sample and attempts at the end of the question)

With a dataframe such as this:

    Type    Class   Area    Decision
0   A       1       North   Yes
1   B       1       North   Yes
2   C       2       South   No
3   A       3       South   No
4   B       3       South   No
5   C       1       South   No
6   A       2       North   Yes
7   B       3       South   Yes
8   B       1       North   No
9   C       1       East    No
10  C       2       West    Yes

How can I find what share of each type [A, B, C] belongs to each area [North, South, East, West]?

Desired output:

    North   South   East    West
A   0.66    0.33    0       0
B   0.5     0.5     0       0
C   0       0.5     0.25    0.25

My best attempt so far is:

df_attempt1 = df.groupby(['Area', 'Type'])['Type'].aggregate('count').unstack().T

Which returns:

Area  East  North  South  West
Type                          
A      NaN    2.0    1.0   NaN
B      NaN    2.0    2.0   NaN
C      1.0    NaN    2.0   1.0

And I guess I can build on that by calculating sums in the margins and appending 0 for missing observations, but I'd really appreciate suggestions for more elegant approaches.

Thank you for any suggestions!

Code:

import pandas as pd

df = pd.DataFrame(
    {
        "Type": {0: "A", 1: "B", 2: "C", 3: "A", 4: "B", 5: "C", 6: "A", 7: "B", 8: "B", 9: "C", 10: "C"},
        "Class": {0: 1, 1: 1, 2: 2, 3: 3, 4: 3, 5: 1, 6: 2, 7: 3, 8: 1, 9: 1, 10: 2},
        "Area": {0: "North", 1: "North", 2: "South", 3: "South", 4: "South", 5: "South", 6: "North", 7: "South", 8: "North", 9: "East", 10: "West"},
        "Decision": {0: "Yes", 1: "Yes", 2: "No", 3: "No", 4: "No", 5: "No", 6: "Yes", 7: "Yes", 8: "No", 9: "No", 10: "Yes"},
    }
)

dfg = df[['Area', 'Type']].groupby(['Area']).agg('count').unstack()

df_attempt1 = df.groupby(['Area', 'Type'])['Type'].aggregate('count').unstack().T
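For reference, the margin-sum idea mentioned in the question (fill missing combinations with 0, then divide by row totals) could be sketched roughly as follows. This is only an illustration: it rebuilds the sample frame with the two relevant columns, and the names `counts` and `shares` are made up here, not part of the original attempts.

```python
import pandas as pd

# Same Type/Area values as the sample data above (other columns omitted).
df = pd.DataFrame(
    {
        "Type": list("ABCABCABBCC"),
        "Area": ["North", "North", "South", "South", "South", "South",
                 "North", "South", "North", "East", "West"],
    }
)

# Count rows per (Type, Area) pair; absent combinations become 0 instead of NaN.
counts = df.groupby(["Type", "Area"]).size().unstack(fill_value=0)

# Divide every row by its own total so that each Type sums to 1.
shares = counts.div(counts.sum(axis=1), axis=0)
print(shares)
```

Using `size()` with `unstack(fill_value=0)` avoids the NaN cells and the final transpose seen in `df_attempt1`.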

281

asked Jan 28 '20 10:01

vestland

2 Answers

You can use pd.crosstab with normalize='index', which divides each row by its row total:

pd.crosstab(index=df['Type'], columns=df['Area'], normalize='index')

Output:

Area  East     North     South  West
Type                                
A     0.00  0.666667  0.333333  0.00
B     0.00  0.500000  0.500000  0.00
C     0.25  0.000000  0.500000  0.25
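As a quick sanity check, one might verify that each row sums to 1 and, if percentages are preferred over fractions, scale and round the result. The sketch below assumes the same sample data (only the two relevant columns are rebuilt); `row_totals` and `percent` are illustrative names.

```python
import pandas as pd

# Same Type/Area values as the question's sample data.
df = pd.DataFrame(
    {
        "Type": list("ABCABCABBCC"),
        "Area": ["North", "North", "South", "South", "South", "South",
                 "North", "South", "North", "East", "West"],
    }
)

# normalize='index' normalizes within each row (each Type).
shares = pd.crosstab(index=df["Type"], columns=df["Area"], normalize="index")

# Each Type's shares across all areas should add up to 1.
row_totals = shares.sum(axis=1)

# Optional: express the shares as percentages rounded to one decimal.
percent = (shares * 100).round(1)
print(percent)
```

Passing normalize='columns' instead would answer a different question: what share of each area is made up of each type.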

120

answered Sep 16 '22 15:09

Mykola Zotko

You were quite close already. The following should do the trick:

df.groupby('Type')['Area'].value_counts(normalize=True).unstack(fill_value=0)

Output:

Area    East    North       South       West
Type                
A       0.00    0.666667    0.333333    0.00
B       0.00    0.500000    0.500000    0.00
C       0.25    0.000000    0.500000    0.25

If the column order matters, you can reorder the dataframe by selecting its columns in the order you want.
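One possible way to get the column order from the question's desired output is `reindex` on the columns. This is a sketch under the assumption that the target order is North, South, East, West; `area_order` and `ordered` are illustrative names.

```python
import pandas as pd

# Same Type/Area values as the question's sample data.
df = pd.DataFrame(
    {
        "Type": list("ABCABCABBCC"),
        "Area": ["North", "North", "South", "South", "South", "South",
                 "North", "South", "North", "East", "West"],
    }
)

result = (
    df.groupby("Type")["Area"]
    .value_counts(normalize=True)
    .unstack(fill_value=0)
)

# Assumed target order, taken from the desired output in the question.
area_order = ["North", "South", "East", "West"]

# reindex also tolerates an area that is missing from the data,
# filling its column with 0 instead of raising a KeyError.
ordered = result.reindex(columns=area_order, fill_value=0)
print(ordered)
```

Plain selection with `result[area_order]` works as well, as long as every listed area actually appears in the data.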

answered Sep 17 '22 15:09

Lukas Thaler

Tags: python, pandas, dataframe, group-by