I have a dataframe which is like this:
df = pd.DataFrame({'pop1': ['1000', '2000','3000','4000'],
'pop2': ['2000', '3000','2000','2000']},
index=['female','female','male','male'])
How can I create a 2*2 DataFrame that gives the percent of the total population in a given year (the 2 columns) for a given gender (the two rows)?
You need first cast string to int by astype, then groupby with aggregating sum and divide by div by sum. Last multiple 100:
df = df.astype(int)
a = df.groupby(level=0).sum()
print (a)
pop1 pop2
female 3000 5000
male 7000 4000
b = df.sum()
print (b)
pop1 10000
pop2 9000
dtype: int64
print (a.div(b).mul(100))
pop1 pop2
female 30.0 55.555556
male 70.0 44.444444
It is same as:
df = df.astype(int)
print (df.groupby(level=0).sum().div(df.sum()).mul(100))
pop1 pop2
female 30.0 55.555556
male 70.0 44.444444
Here is a one liner:
(df.astype(int) / df.astype(int).sum()).groupby(level=0).sum() * 100
It is a little prettier if you are already dealing with integers:
df = df.astype(int)
(df / df.sum()).groupby(level=0).sum() * 100
Put into words, after you convert the data into integers, you then divide each number by the total size of the relevant population, sum up those weights for each gender, and then multiply by 100 so the result looks like a percentage.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With