I have a dataframe that looks like this:
id       status      year 
1        yes         2014
3        no          2013
2        yes         2014
4        no          2014
The actual dataframe is very large, with many ids and years. I am trying to make a new dataframe that has the percentages of 'yes' and 'no' statuses grouped by year.
I was thinking of grouping the dataframe by year, collecting the statuses for each year in a list, and then counting the 'yes' and 'no' values that way. Is there a more pythonic way to do this?
I would like for the end dataframe to look like this:
year      yes_count     no_count     ratio_yes_to_total
2013       0             1             0%
2014       2             1             67%
Basically, the pivot_table() function is a generalization of the pivot() function that allows aggregation of values, for example with len() as the aggregation function. pivot() only works, or makes sense, if you need to reshape a table and show values without any aggregation.
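A minimal sketch of that difference, assuming the sample data from the question is held in a DataFrame named df (the names counts and pivot_failed are only illustrative):

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [1, 3, 2, 4],
    "status": ["yes", "no", "yes", "no"],
    "year": [2014, 2013, 2014, 2014],
})

# pivot_table() aggregates: len counts the ids in each (year, status) cell,
# and fill_value=0 fills combinations that never occur.
counts = df.pivot_table(index="year", columns="status", values="id",
                        aggfunc=len, fill_value=0)

# pivot() only reshapes. The pair (2014, 'yes') occurs twice here,
# so pivot() cannot place both ids in one cell and raises ValueError.
try:
    df.pivot(index="year", columns="status", values="id")
    pivot_failed = False
except ValueError:
    pivot_failed = True
```

With this data, pivot() fails because a (year, status) pair repeats, while pivot_table() collapses the repeats into a count.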
What is the difference between pivot_table and groupby? The groupby method is generally enough for two-dimensional operations, while pivot_table is used for multi-dimensional grouping operations.
A pivot table in pandas is an excellent tool for summarizing one or more numeric variables based on two other categorical variables. Pivot tables are widely known from MS Excel. In Python, a pivot table of a pandas DataFrame can be created with pandas.pivot_table.
You can use the following basic syntax to convert a pandas DataFrame from a wide format to a long format: df = pd.melt(df, id_vars='col1', value_vars=['col2', 'col3', ...]). In this scenario, col1 is the column we use as an identifier, and col2, col3, etc. are the columns to unpivot.
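As a rough illustration of the melt syntax, here is a sketch on a small hypothetical wide table (the frame wide and the column names status and count are assumptions made for this example, not part of the question):

```python
import pandas as pd

# Hypothetical wide table: one row per year, one column per status.
wide = pd.DataFrame({
    "year": [2013, 2014],
    "yes": [0, 2],
    "no": [1, 1],
})

# 'year' is kept as the identifier column; 'yes' and 'no' are unpivoted
# into rows, with the old column name stored in 'status' and the cell
# value stored in 'count'.
long = pd.melt(wide, id_vars="year", value_vars=["yes", "no"],
               var_name="status", value_name="count")
```

Each (year, status) pair becomes its own row, so the two-row wide table turns into a four-row long table.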
I'd suggest grouping by year and status, counting, pivoting, and then creating an additional column of the ratio:
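# Count rows per (year, status), then pivot the status values into columns; missing combinations become 0
df2 = df.groupby(['year', 'status']).count().pivot_table(index="year", columns=["status"]).fillna(0)
# Drop the outer column level ('id') so the columns are just 'no' and 'yes'
df2.columns = df2.columns.get_level_values(1)
# Share of 'yes' among all rows for that year
df2['ratio'] = df2['yes'] / (df2['yes'] + df2['no'])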
Output
status   no  yes     ratio
year                      
2013    1.0  0.0  0.000000
2014    1.0  2.0  0.666667
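If the column names from the question are wanted, one possible variant uses pd.crosstab instead of groupby plus pivot_table. This is only a sketch, not the method above: it assumes both 'yes' and 'no' appear somewhere in the data, and ratio_pct is a hypothetical extra column added just to show the percentage formatting.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [1, 3, 2, 4],
    "status": ["yes", "no", "yes", "no"],
    "year": [2014, 2013, 2014, 2014],
})

# crosstab counts rows per (year, status); absent combinations are 0.
result = pd.crosstab(df["year"], df["status"])
result = result.rename(columns={"yes": "yes_count", "no": "no_count"})

# Fraction of 'yes' among all rows of each year.
total = result["yes_count"] + result["no_count"]
result["ratio_yes_to_total"] = result["yes_count"] / total

# Put the columns in the requested order and make 'year' a regular column.
result = result[["yes_count", "no_count", "ratio_yes_to_total"]].reset_index()
result.columns.name = None

# Hypothetical display column: the ratio as a rounded percentage string.
pct = (result["ratio_yes_to_total"] * 100).round().astype(int)
result["ratio_pct"] = pct.astype(str) + "%"
```

Keeping the numeric ratio alongside the formatted string means the value can still be sorted or filtered later.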