I have a dataframe like this
ID <- c ("ABC_10","AZM_11","ABC_11","ABC_12",
"ABC_13","AZM_12","ABC_14","ABC_15",
"CZX_10","CZX_11","CZX_12","CZX_13",
"FIN_10","FIN_11","FIN_12","FIN_13",
"FNM_10","FNM_11","FXS_10","FXS_11")
Id.n <- c(345,380,339,361,
245,390,639,661,
545,580,539,261,
345,180,139,261,
1045,1580,39,161)
df <- data.frame(ID,Id.n)
I am trying to subset this dataframe with the following condition
Threshold of ID.n's for FXS - 100
Threshold of ID.n's for FIN - 200
Threshold of ID.n's for all Other ID's - 300
My desired output is
ID Id.n
ABC_10 345
AZM_11 380
ABC_11 339
ABC_12 361
AZM_12 390
ABC_14 639
ABC_15 661
CZX_10 545
CZX_11 580
CZX_12 539
FIN_10 345
FIN_13 261
FNM_10 1045
FNM_11 1580
FXS_11 161
I am trying to do it this way but just not getting it right.
df <- subset(df,ifelse(grepl("FXS",df$ID), df$ID.n > 100,))
Can someone point me in the right direction?
Using dplyr:
library(dplyr)
df2 <- df %>%
filter((grepl("FXS", ID) & Id.n > 100) |
(grepl("FIN", ID) & Id.n > 200) |
(!grepl("FXS|FIN", ID) & Id.n > 300))
df2
# ID Id.n
# ABC_10 345
# AZM_11 380
# ABC_11 339
# ABC_12 361
# AZM_12 390
# ABC_14 639
# ABC_15 661
# CZX_10 545
# CZX_11 580
# CZX_12 539
# FIN_10 345
# FIN_13 261
# FNM_10 1045
# FNM_11 1580
# FXS_11 161
This is simpler with sanitized data. Using data.table, that'd look like...
library(data.table)
setDT(df)
df[, c("x", "y") := tstrsplit(ID, "_")][, ID := NULL ]
xDT = data.table(x = unique(df$x))
xDT[, th := 300 ]
xDT[.(x = c("FXS", "FIN"), th = c(100, 200)), on=.(x), th := i.th ]
Then a non equi join works to filter:
df[xDT, on=.(x, Id.n > th)]
Id.n x y
1: 300 ABC 11
2: 300 ABC 10
3: 300 ABC 12
4: 300 ABC 14
5: 300 ABC 15
6: 300 AZM 11
7: 300 AZM 12
8: 300 CZX 12
9: 300 CZX 10
10: 300 CZX 11
11: 200 FIN 13
12: 200 FIN 10
13: 300 FNM 10
14: 300 FNM 11
15: 100 FXS 11
Regarding grepl here, I think it is a bad idea.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With