How to specify and clean/delete the rows?

Question

For example, here is the dataframe. I want to delete the rows if the age difference in each ProjectNumber is less than 5. Within ProjectNumber = 1, the age difference are 6(56-50) so rows in ProjectNumber 1 is maintained. In ProjectNumber2, the age difference are 2(37-35), so 3 rows in ProjectNumber 2 will be deleted. How can I do that? There are thousands of Project Number in real data set so can't really do it mannually. The key problem is how to specify these rows and delete them. Thank you.

ProjectNumber	Age
1	50
1	52
1	53
1	55
1	56
2	35
2	36
2	37
3	40
3	41
3	42
3	43
3	45
3	46

r2evans · Accepted Answer

Base R variants:

aggregate then merge. This version is good if you have more than one "key" (ProjectNumber here).

keep <- subset(aggregate(Age ~ ProjectNumber, data = df1,
                         FUN = function(z) diff(range((z))) >= 5), Age)
keep
#   ProjectNumber  Age
# 1             1 TRUE
# 3             3 TRUE

merge(df1, keep[,1,drop=FALSE], by = "ProjectNumber")
#    ProjectNumber Age
# 1              1  50
# 2              1  52
# 3              1  53
# 4              1  55
# 5              1  56
# 9              3  40
# 10             3  41
# 11             3  42
# 12             3  43
# 13             3  45
# 14             3  46

aggregate then %in%:

### 'keep' from above
subset(df1, ProjectNumber %in% keep$ProjectNumber)

ave then simple subset:

keep <- ave(df1$Age, df1$ProjectNumber, 
            FUN = function(z) diff(range(z)) >= 5) > 0
keep
#  [1]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
df1[keep,]
#    ProjectNumber Age
# 1              1  50
# 2              1  52
# 3              1  53
# 4              1  55
# 5              1  56
# 9              3  40
# 10             3  41
# 11             3  42
# 12             3  43
# 13             3  45
# 14             3  46

TarJae · Answer

Akrun's answer is top. Here: cumbersome but leads to the goal:

df %>% 
  group_by(ProjectNumber) %>% 
  mutate(diff_Age = max(Age) - min(Age)) %>% 
  subset(!diff_Age < 5) %>% 
  select(-diff_Age)

Output:

   ProjectNumber   Age
           <int> <int>
 1             1    50
 2             1    52
 3             1    53
 4             1    55
 5             1    56
 6             3    40
 7             3    41
 8             3    42
 9             3    43
10             3    45
11             3    46

How to specify and clean/delete the rows?

Tags:

r

Couch Tomato

2 Answers

r2evans

TarJae

Recent Activity

Donate For Us

How to specify and clean/delete the rows?

Tags:

r

Couch Tomato

2 Answers

r2evans

TarJae

Related questions

Recent Activity

Donate For Us