I want to select half of the dataframe given values in one of the columns. In other words, from the dataframe given below I need to extract half of rows given values in column Y:
DF:
id1 column Y value
9830 A 6
7609 A 0
9925 B 0
9922 B 5
9916 B 6
9917 B 8
9914 C 2
9914 C 7
9914 C 7
9914 C 2
9914 C 9
New data frame should look like this:
NEW DF:
id1 column Y value
9830 A 6
9925 B 0
9922 B 5
9914 C 2
9914 C 7
Also, it would be helpful to know solution for selecting random half of all rows datefram DF given column Y (e.g. not selecting first 50%).
Any help is appreciated. Thanks!
Assuming you want the first half of each group of rows with the same value for column Y where for odd number of rows we round down, we can use filter from dplyr:
library(dplyr)
df %>% group_by(`column Y`) %>% filter(row_number() <= floor(n()/2))
##Source: local data frame [5 x 3]
##Groups: column Y [3]
##
## id1 column Y laclen
## <int> <fctr> <int>
##1 9830 A 6
##2 9925 B 0
##3 9922 B 5
##4 9914 C 2
##5 9914 C 7
We first group_by column Y (note the back quotes since the column name contains space) and then use filter to keep only the rows for which the row_number is less than or equal to the total number of rows given by n() for the group divided by 2 (and rounded down with floor).
To select a random 50% of the rows in each group, use sample to generate the row numbers to keep and %in% to match those to keep:
set.seed(123)
result <- df %>% group_by(`column Y`) %>% filter(row_number() %in% sample(seq_len(n()),floor(n()/2)))
##Source: local data frame [5 x 3]
##Groups: column Y [3]
##
## id1 column Y laclen
## <int> <fctr> <int>
##1 9830 A 6
##2 9922 B 5
##3 9917 B 8
##4 9914 C 2
##5 9914 C 9
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With