
Keep duplicate values only if they are represented in first sampling period

I am trying to clean my data so that only duplicate values that have an observation in my first sampling period are kept. For instance, if my data frame looks like this:

    df <- data.frame(ID = c(1,1,1,2,2,2,3,3,4,4), period = c(1,2,3,1,2,3,2,3,1,3), mass = rnorm(10, 5, 2)) 

    df

       ID period     mass
    1   1      1 3.313674
    2   1      2 6.371979
    3   1      3 5.449435
    4   2      1 4.093022
    5   2      2 2.615782
    6   2      3 3.622842
    7   3      2 4.466666
    8   3      3 6.940979
    9   4      1 6.226222
    10  4      3 4.233397

I would like to keep only the duplicated observations for individuals that were measured during period 1. My new data frame would then look like this:

       ID period     mass
    1   1      1 3.313674
    2   1      2 6.371979
    3   1      3 5.449435
    4   2      1 4.093022
    5   2      2 2.615782
    6   2      3 3.622842
    9   4      1 6.226222
    10  4      3 4.233397

Following the suggestions on this page (Remove all unique rows), I tried the following command, but it keeps the observations for individual 3, which was not measured in period 1.

    subset(df, duplicated(ID) | duplicated(ID, fromLast = TRUE))

burkh1jj asked Nov 23 '25 16:11


1 Answer

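If you want a base R solution, the following should work as well. It collects the IDs that have an observation in period 1 and keeps every row belonging to one of those IDs. (The mass values differ from those in the question because rnorm() draws new random numbers on each run.)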

    > df_new <- df[df$ID %in% df$ID[df$period == 1], ]
    > df_new
       ID period     mass
    1   1      1 3.238832
    2   1      2 3.428847
    3   1      3 1.205347
    4   2      1 8.498452
    5   2      2 7.523085
    6   2      3 3.613678
    9   4      1 3.324095
    10  4      3 1.932733
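
Note that the one-liner above only checks that an ID appears in period 1. It would also keep an ID with a single observation, if that observation fell in period 1. If the "duplicated" requirement matters for your real data, a minimal sketch that combines both conditions might look like this. The seed value and the helper names `ids_in_p1` and `is_dup` are my own assumptions for illustration, not part of the original question.

```r
# Reproducible version of the example data (the seed is an arbitrary choice)
set.seed(42)
df <- data.frame(
  ID     = c(1, 1, 1, 2, 2, 2, 3, 3, 4, 4),
  period = c(1, 2, 3, 1, 2, 3, 2, 3, 1, 3),
  mass   = rnorm(10, 5, 2)
)

# IDs with at least one observation in period 1
ids_in_p1 <- unique(df$ID[df$period == 1])

# TRUE for rows whose ID occurs more than once anywhere in the data
is_dup <- duplicated(df$ID) | duplicated(df$ID, fromLast = TRUE)

# Keep only rows that satisfy both conditions
df_new <- df[df$ID %in% ids_in_p1 & is_dup, ]
df_new
```

For the example data this keeps the same rows as the one-liner (IDs 1, 2 and 4), since every ID measured in period 1 also has more than one observation.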

Nick Criswell answered Nov 26 '25 11:11
