Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas: groupby index and then fill the dataframe using function

I have a dataframe which is like this:

df = pd.DataFrame({'pop1': ['1000', '2000','3000','4000'],
                   'pop2': ['2000', '3000','2000','2000']},
                  index=['female','female','male','male'])

How can I create a 2*2 DataFrame that gives the percent of the total population in a given year (the 2 columns) for a given gender (the two rows)?

like image 951
Candice Zhang Avatar asked Dec 06 '25 19:12

Candice Zhang


2 Answers

You need first cast string to int by astype, then groupby with aggregating sum and divide by div by sum. Last multiple 100:

df = df.astype(int)

a = df.groupby(level=0).sum()
print (a)
        pop1  pop2
female  3000  5000
male    7000  4000

b = df.sum()
print (b)
pop1    10000
pop2     9000
dtype: int64

print (a.div(b).mul(100))
        pop1       pop2
female  30.0  55.555556
male    70.0  44.444444

It is same as:

df = df.astype(int)
print (df.groupby(level=0).sum().div(df.sum()).mul(100))
        pop1       pop2
female  30.0  55.555556
male    70.0  44.444444
like image 64
jezrael Avatar answered Dec 08 '25 13:12

jezrael


Here is a one liner:

(df.astype(int) / df.astype(int).sum()).groupby(level=0).sum() * 100

It is a little prettier if you are already dealing with integers:

df = df.astype(int)
(df / df.sum()).groupby(level=0).sum() * 100

Put into words, after you convert the data into integers, you then divide each number by the total size of the relevant population, sum up those weights for each gender, and then multiply by 100 so the result looks like a percentage.

like image 29
Karnage Avatar answered Dec 08 '25 15:12

Karnage



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!