I have the following data frame:
forStack
AGE BMI time A B ID
1 59 23.8 0 (0,75] (4,14.9] 9000099
2 69 29.8 0 (96.4,100] (-Inf,0] 9000296
3 71 22.7 0 (75,89.3] (4,14.9] 9000622
4 56 32.4 0 (0,75] (14.9,68] 9000798
5 72 30.7 0 (0,75] (14.9,68] 9001104
6 75 23.5 0 (96.4,100] (0,4] 9001400
dput(forStack)
structure(list(AGE = c(59, 69, 71, 56, 72, 75), BMI = c(23.8,
29.8, 22.7, 32.4, 30.7, 23.5), time = c(0, 0, 0, 0, 0, 0), A = structure(c(2L,
5L, 3L, 2L, 2L, 5L), .Label = c("(-Inf,0]", "(0,75]", "(75,89.3]",
"(89.3,96.4]", "(96.4,100]", "(100, Inf]"), class = "factor"),
B = structure(c(3L, 1L, 3L, 4L, 4L, 2L), .Label = c("(-Inf,0]",
"(0,4]", "(4,14.9]", "(14.9,68]", "(68, Inf]"), class = "factor"),
ID = c(9000099, 9000296, 9000622, 9000798, 9001104, 9001400
)), .Names = c("AGE", "BMI", "time", "A", "B", "ID"), row.names = c(NA,
6L), class = "data.frame")
Variables A and B are factors with interval levels representing quartiles:
forStack$A
[1] (0,75] (96.4,100] (75,89.3] (0,75] (0,75] (96.4,100]
Levels: (-Inf,0] (0,75] (75,89.3] (89.3,96.4] (96.4,100] (100, Inf]
forStack$B
[1] (4,14.9] (-Inf,0] (4,14.9] (14.9,68] (14.9,68] (0,4]
Levels: (-Inf,0] (0,4] (4,14.9] (14.9,68] (68, Inf]
I would like to recode A and B into two-level variables as follows:
For A, the two highest levels, (96.4,100] and (100, Inf], should be recoded as 0, and all other levels as 1.
For B, the two lowest levels, (-Inf,0] and (0,4], should be recoded as 0, and all other levels as 1.
Thus, the data frame should look like this:
forStack
AGE BMI time A B ID
1 59 23.8 0 1 1 9000099
2 69 29.8 0 0 0 9000296
3 71 22.7 0 1 1 9000622
4 56 32.4 0 1 1 9000798
5 72 30.7 0 1 1 9001104
6 75 23.5 0 0 0 9001400
What is the most efficient way to do this? Thank you very much in advance.
Here's one approach:
within(forStack, {
A <- as.numeric(!A %in% tail(levels(A), 2))
B <- as.numeric(!B %in% head(levels(B), 2))
})
# AGE BMI time A B ID
# 1 59 23.8 0 1 1 9000099
# 2 69 29.8 0 0 0 9000296
# 3 71 22.7 0 1 1 9000622
# 4 56 32.4 0 1 1 9000798
# 5 72 30.7 0 1 1 9001104
# 6 75 23.5 0 0 0 9001400
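The basic idea is that head and tail both take an n argument that specifies how many values to return from the start or end of a vector or dataset. Applied to levels(A), tail(..., 2) grabs the last two levels, (96.4,100] and (100, Inf], and head(levels(B), 2) grabs the first two levels of B, (-Inf,0] and (0,4]. The %in% test is then negated with ! so that those levels become 0 and everything else becomes 1.
within is a convenient way to replace columns of a data.frame by name. Note that it returns a modified copy, so assign the result back (for example, forStack <- within(forStack, {...})) if you want to keep the change.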
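To make the explanation above concrete, here is a minimal sketch that rebuilds the example data, prints what head and tail pick out of the factor levels, and stores the result of within. The names A_levels, B_levels and recoded are only illustrative and are not part of the original answer.

```r
# Rebuild the example data from the question (same values as the dput output)
A_levels <- c("(-Inf,0]", "(0,75]", "(75,89.3]", "(89.3,96.4]",
              "(96.4,100]", "(100, Inf]")
B_levels <- c("(-Inf,0]", "(0,4]", "(4,14.9]", "(14.9,68]", "(68, Inf]")

forStack <- data.frame(
  AGE  = c(59, 69, 71, 56, 72, 75),
  BMI  = c(23.8, 29.8, 22.7, 32.4, 30.7, 23.5),
  time = c(0, 0, 0, 0, 0, 0),
  A    = factor(A_levels[c(2, 5, 3, 2, 2, 5)], levels = A_levels),
  B    = factor(B_levels[c(3, 1, 3, 4, 4, 2)], levels = B_levels),
  ID   = c(9000099, 9000296, 9000622, 9000798, 9001104, 9001400)
)

# The levels that will be mapped to 0
top_A    <- tail(levels(forStack$A), 2)  # last two levels of A
bottom_B <- head(levels(forStack$B), 2)  # first two levels of B
print(top_A)
print(bottom_B)

# within() returns a modified copy, so keep the result
recoded <- within(forStack, {
  A <- as.numeric(!A %in% tail(levels(A), 2))
  B <- as.numeric(!B %in% head(levels(B), 2))
})
print(recoded)
```

Because the result is stored in a separate object here, the original forStack is left untouched; assigning back to forStack instead would overwrite it.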
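Since you know the factor levels are stored in ascending order, you can also compare the underlying integer codes directly. Note that this version adds new columns Ar and Br rather than overwriting A and B: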
within(forStack, {
Ar <- (as.integer(A) < length(levels(A))-1)*1
Br <- (as.integer(B) > 2)*1
})
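As a rough check of the integer-code comparison, the sketch below applies it to the A and B factors from the question on their own. It uses nlevels(A), which is equivalent to length(levels(A)), and as.integer() in place of multiplying by 1; the final factor() call is an optional extra, assuming a true two-level factor is wanted rather than a numeric 0/1 column. The names A_levels, B_levels, Ar, Br and Af are illustrative.

```r
# The two factors from the question, with levels in ascending order
A_levels <- c("(-Inf,0]", "(0,75]", "(75,89.3]", "(89.3,96.4]",
              "(96.4,100]", "(100, Inf]")
B_levels <- c("(-Inf,0]", "(0,4]", "(4,14.9]", "(14.9,68]", "(68, Inf]")
A <- factor(A_levels[c(2, 5, 3, 2, 2, 5)], levels = A_levels)
B <- factor(B_levels[c(3, 1, 3, 4, 4, 2)], levels = B_levels)

# as.integer() on a factor gives the position of each value in levels()
# A: codes below nlevels(A) - 1 are everything except the top two levels
Ar <- as.integer(as.integer(A) < nlevels(A) - 1)
# B: codes above 2 are everything except the bottom two levels
Br <- as.integer(as.integer(B) > 2)

# Optional: turn the 0/1 codes into a two-level factor
Af <- factor(Ar, levels = c(0, 1))

print(Ar)
print(Br)
print(levels(Af))
```

This relies entirely on the level order, so it would give wrong results if the levels were ever reordered or dropped; the head/tail approach has the same dependency.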