I'm trying to make a column in a dataframe depicting a group or bin that observation belongs to. The idea is to sort the dataframe according to some column, then develop another column denoting which bin that observation belongs to. If I want deciles, then I should be able to tell a function I want 10 equal (or close to equal) groups.
I tried the pandas qcut but that just gives a tuples of the the upper and lower limits of the bins. I would like just 1,2,3,4....etc. Take the following for example
import numpy as np
import pandas as pd
x = [1,2,3,4,5,6,7,8,5,45,64545,65,6456,564]
y = np.random.rand(len(x))
df_dict = {'x': x, 'y': y}
df = pd.DataFrame(df_dict)
This gives a df of 14 observations. How could I get groups of 5 equal bins?
The desired result would be the following:
x y group
0 1 0.926273 1
1 2 0.678101 1
2 3 0.636875 1
3 4 0.802590 2
4 5 0.494553 2
5 6 0.874876 2
6 7 0.607902 3
7 8 0.028737 3
8 5 0.493545 3
9 45 0.498140 4
10 64545 0.938377 4
11 65 0.613015 4
12 6456 0.288266 5
13 564 0.917817 5
Group by N rows, and find ngroup
df['group']=df.groupby(np.arange(len(df.index))//3,axis=0).ngroup()+1
x y group
0 1 0.548801 1
1 2 0.096620 1
2 3 0.713771 1
3 4 0.922987 2
4 5 0.283689 2
5 6 0.807755 2
6 7 0.592864 3
7 8 0.670315 3
8 5 0.034549 3
9 45 0.355274 4
10 64545 0.239373 4
11 65 0.156208 4
12 6456 0.419990 5
13 564 0.248278 5
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With