I would like to randomly select 1 case (so 1 row from a dataframe) from each group in R, but I cannot work out how to do it.
My data is structured in long format: 400 cases (rows) clustered within 250 groups (some groups contain only a single case, others 2, 3, 4, 5, or even 6). So what I would like to end up with is a dataframe containing 250 rows (with each row representing 1 randomly selected case from each of the 250 groups).
I have the idea that I should use the sample function for this, but I could not work out how to do it. Does anyone have any ideas?
Suppose your data frame X indicates group membership with a variable named "Group," as in this synthetic example:
G <- 8
set.seed(17)
X <- data.frame(Group=sort(sample.int(G, G, replace=TRUE)),
                Case=1:G)
Here is a printout of X:
  Group Case
1     2    1
2     2    2
3     2    3
4     4    4
5     4    5
6     5    6
7     7    7
8     8    8
Pick up the first instance of each value of "Group" using the duplicated function after randomly permuting the rows of X:
Y <- X[sample.int(nrow(X)), ]
Y[!duplicated(Y$Group), ]
  Group Case
8     8    8
1     2    1
4     4    4
6     5    6
7     7    7
A comparison to X indicates random cases in each group were selected. Repeat these last two steps to confirm this if you like.
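For data shaped like the question's (400 cases in 250 groups), the two steps can be wrapped in a small function and checked. This is only a sketch: the helper name pick_one_per_group, the data frame cases, and the way the synthetic groups are generated are all assumptions made for illustration, so group sizes here are not capped at 6. The second half shows one possible way to use per-group sampling instead, indexing with sample.int because sample() treats a single number n as 1:n, which would go wrong for groups of size one.

```r
# Hypothetical helper wrapping the shuffle-then-deduplicate steps;
# the name and arguments are illustrative only.
pick_one_per_group <- function(df, group_col) {
  shuffled <- df[sample.int(nrow(df)), , drop = FALSE]           # random row order
  shuffled[!duplicated(shuffled[[group_col]]), , drop = FALSE]   # first row seen per group
}

# Synthetic data (an assumption): every one of the 250 groups gets one case,
# and the remaining 150 cases are assigned to groups at random.
set.seed(17)
cases <- data.frame(Group = c(1:250, sample.int(250, 150, replace = TRUE)),
                    Case  = 1:400)

picked <- pick_one_per_group(cases, "Group")
nrow(picked)                  # 250, one row per group
anyDuplicated(picked$Group)   # 0, no group appears twice

# Alternative sketch: sample one row index within each group.
# i[sample.int(length(i), 1)] is safe even when a group has a single row.
rows <- vapply(split(seq_len(nrow(cases)), cases$Group),
               function(i) i[sample.int(length(i), 1)],
               integer(1))
picked2 <- cases[unname(rows), ]
```

Both versions give each case within a group the same chance of being chosen; the first needs only one shuffle of the whole data frame, while the second makes the per-group sampling explicit.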