I want to draw a random sample in Python from the following df such that at least 65% of the rows in the resulting sample have color yellow and the cumulative sum of the selected quantities is less than or equal to 18.
Original Dataset:
Date        Id      color       qty
02-03-2018  A       red         5
03-03-2018  B       blue        2
03-03-2018  C       green       3
04-03-2018  D       yellow      4
04-03-2018  E       yellow      7
04-03-2018  G       yellow      6
04-03-2018  H       orange      8
05-03-2018  I       yellow      1
06-03-2018  J       yellow      5
I have the total-quantity condition covered, but I am stuck on how to integrate the percentage condition:
df2 = df1.sample(n=df1.shape[0])   # shuffle all rows
df3 = df2[df2.qty.cumsum() <= 18]  # keep the leading rows whose running qty total is <= 18
Required dataset:
Date        Id      color       qty
03-03-2018  B       blue        2
04-03-2018  D       yellow      4
04-03-2018  G       yellow      6
06-03-2018  J       yellow      5
Or something like this:
Date        Id      color       qty
02-03-2018  A       red         5
04-03-2018  D       yellow      4
04-03-2018  E       yellow      7
05-03-2018  I       yellow      1
Any help would be really appreciated!
Thanks in advance.
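Filter the rows with 'yellow' and draw a random sample that makes up at least 65% of your total sample size:
import math
import random
sample_size = 4  # assumed target sample size; the original answer leaves it undefined
yellow_size = random.randint(65, 100) / 100  # yellow fraction between 0.65 and 1.00
# sample() needs an integer n; rounding up keeps the yellow share at or above yellow_size
df_yellow = df3[df3['color'] == 'yellow'].sample(n=math.ceil(yellow_size * sample_size))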
Filter the rows with other colors and draw a random sample for the remainder of your sample size:
# the non-yellow rows fill whatever is left after the yellow rows
df_others = df3[df3['color'] != 'yellow'].sample(n=sample_size - len(df_yellow))
Combine them both and shuffle the rows.
df_sample = pd.concat([df_yellow, df_others]).sample(frac=1)
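To try the steps above end to end, here is a minimal sketch on the question's data. The helper name `stratified_sample`, the sample size of 4, and the `random_state` value are assumptions for illustration and are not part of the original answer. It enforces only the yellow share; the qty limit is not checked here.

```python
import math

import pandas as pd

# Data copied from the question.
df = pd.DataFrame({
    "Date": ["02-03-2018", "03-03-2018", "03-03-2018", "04-03-2018", "04-03-2018",
             "04-03-2018", "04-03-2018", "05-03-2018", "06-03-2018"],
    "Id": list("ABCDEGHIJ"),
    "color": ["red", "blue", "green", "yellow", "yellow",
              "yellow", "orange", "yellow", "yellow"],
    "qty": [5, 2, 3, 4, 7, 6, 8, 1, 5],
})


def stratified_sample(df, sample_size, yellow_frac=0.65, random_state=None):
    """Return sample_size shuffled rows, at least yellow_frac of them yellow."""
    # Rounding up guarantees the yellow share never drops below yellow_frac.
    n_yellow = math.ceil(yellow_frac * sample_size)
    yellow = df[df["color"] == "yellow"].sample(n=n_yellow, random_state=random_state)
    others = df[df["color"] != "yellow"].sample(
        n=sample_size - n_yellow, random_state=random_state
    )
    # Combine both parts and shuffle the row order.
    return pd.concat([yellow, others]).sample(frac=1, random_state=random_state)


sample = stratified_sample(df, sample_size=4, random_state=0)
print(sample)
```

Note that `sample()` raises a `ValueError` if a group has fewer rows than requested, so `sample_size` has to be small enough for both groups.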
UPDATE:
If you want to check for both conditions simultaneously, this could be one way to do it:
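import math
import random
# sample_size (the target number of rows) must be set before this loop runs
df_sample = df
while sum(df_sample['qty']) > 18:  # redraw until the qty total fits the limit
    yellow_size = random.randint(65, 100) / 100
    # sample() needs an integer n; rounding up keeps the yellow share at or above yellow_size
    df_yellow = df[df['color'] == 'yellow'].sample(n=math.ceil(yellow_size * sample_size))
    # the non-yellow rows fill the rest of the sample
    df_others = df[df['color'] != 'yellow'].sample(n=sample_size - len(df_yellow))
    df_sample = pd.concat([df_yellow, df_others]).sample(frac=1)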
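If the sample size does not need to be fixed, a variation is to keep the shuffle-and-cumsum step from the question and simply redraw until the yellow share is high enough. The following is only a sketch under that assumption; the function name `sample_with_constraints`, the `max_tries` cap, and the `seed` handling are hypothetical additions, not part of the original answer.

```python
import pandas as pd

# Data copied from the question.
df = pd.DataFrame({
    "Date": ["02-03-2018", "03-03-2018", "03-03-2018", "04-03-2018", "04-03-2018",
             "04-03-2018", "04-03-2018", "05-03-2018", "06-03-2018"],
    "Id": list("ABCDEGHIJ"),
    "color": ["red", "blue", "green", "yellow", "yellow",
              "yellow", "orange", "yellow", "yellow"],
    "qty": [5, 2, 3, 4, 7, 6, 8, 1, 5],
})


def sample_with_constraints(df, max_qty=18, min_yellow=0.65, max_tries=1000, seed=None):
    """Shuffle, keep rows while the running qty total <= max_qty, retry until
    the yellow share of the kept rows is at least min_yellow."""
    for attempt in range(max_tries):
        # A different but reproducible shuffle per attempt when a seed is given.
        state = None if seed is None else seed + attempt
        shuffled = df.sample(frac=1, random_state=state)
        # qty is positive, so the cumulative sum only grows and this keeps a prefix.
        picked = shuffled[shuffled["qty"].cumsum() <= max_qty]
        if len(picked) > 0 and (picked["color"] == "yellow").mean() >= min_yellow:
            return picked
    # Give up instead of looping forever when the constraints cannot be met.
    raise RuntimeError("no sample satisfied both conditions within max_tries")


result = sample_with_constraints(df, seed=0)
print(result)
```

The cap on retries is a design choice: a plain `while` loop never terminates if no combination of rows can satisfy both conditions.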