Select half of the dataframe given values in one column

Question

I want to select half of the dataframe given values in one of the columns. In other words, from the dataframe given below I need to extract half of rows given values in column Y:

DF:
 id1  column Y   value
9830     A         6 
7609     A         0 
9925     B         0 
9922     B         5 
9916     B         6
9917     B         8 
9914     C         2
9914     C         7
9914     C         7
9914     C         2
9914     C         9

New data frame should look like this:

  NEW DF:
     id1  column Y   value
    9830     A         6 
    9925     B         0 
    9922     B         5 
    9914     C         2
    9914     C         7

Also, it would be helpful to know solution for selecting random half of all rows datefram DF given column Y (e.g. not selecting first 50%).

Any help is appreciated. Thanks!

aichao · Accepted Answer

Assuming you want the first half of each group of rows with the same value for column Y where for odd number of rows we round down, we can use filter from dplyr:

library(dplyr)
df %>% group_by(`column Y`) %>% filter(row_number() <= floor(n()/2))
##Source: local data frame [5 x 3]
##Groups: column Y [3]
##
##    id1 column Y laclen
##  <int>   <fctr>  <int>
##1  9830        A      6
##2  9925        B      0
##3  9922        B      5
##4  9914        C      2
##5  9914        C      7

We first group_by column Y (note the back quotes since the column name contains space) and then use filter to keep only the rows for which the row_number is less than or equal to the total number of rows given by n() for the group divided by 2 (and rounded down with floor).

To select a random 50% of the rows in each group, use sample to generate the row numbers to keep and %in% to match those to keep:

set.seed(123)
result <- df %>% group_by(`column Y`) %>% filter(row_number() %in% sample(seq_len(n()),floor(n()/2)))
##Source: local data frame [5 x 3]
##Groups: column Y [3]
##
##    id1 column Y laclen
##  <int>   <fctr>  <int>
##1  9830        A      6
##2  9922        B      5
##3  9917        B      8
##4  9914        C      2
##5  9914        C      9

Select half of the dataframe given values in one column

Tags:

select

r

subset

rows

Makaroni

1 Answers

aichao

Recent Activity

Donate For Us

Select half of the dataframe given values in one column

Tags:

select

r

subset

rows

Makaroni

1 Answers

aichao

Related questions

Recent Activity

Donate For Us