
Recoding levels of factors

Tags: r

I have the following data frame:

forStack
  AGE  BMI time          A         B      ID
 1  59 23.8    0     (0,75]  (4,14.9] 9000099
 2  69 29.8    0 (96.4,100]  (-Inf,0] 9000296
 3  71 22.7    0  (75,89.3]  (4,14.9] 9000622
 4  56 32.4    0     (0,75] (14.9,68] 9000798
 5  72 30.7    0     (0,75] (14.9,68] 9001104
 6  75 23.5    0 (96.4,100]     (0,4] 9001400

dput(forStack)
structure(list(AGE = c(59, 69, 71, 56, 72, 75), BMI = c(23.8, 
29.8, 22.7, 32.4, 30.7, 23.5), time = c(0, 0, 0, 0, 0, 0), A = structure(c(2L, 
5L, 3L, 2L, 2L, 5L), .Label = c("(-Inf,0]", "(0,75]", "(75,89.3]", 
"(89.3,96.4]", "(96.4,100]", "(100, Inf]"), class = "factor"), 
B = structure(c(3L, 1L, 3L, 4L, 4L, 2L), .Label = c("(-Inf,0]", 
"(0,4]", "(4,14.9]", "(14.9,68]", "(68, Inf]"), class = "factor"), 
ID = c(9000099, 9000296, 9000622, 9000798, 9001104, 9001400
)), .Names = c("AGE", "BMI", "time", "A", "B", "ID"), row.names = c(NA, 
6L), class = "data.frame")

Variables A and B are factors representing quartiles:

   forStack$A
   [1] (0,75]     (96.4,100] (75,89.3]  (0,75]     (0,75]     (96.4,100]
   Levels: (-Inf,0] (0,75] (75,89.3] (89.3,96.4] (96.4,100] (100, Inf]

   forStack$B
   [1] (4,14.9]  (-Inf,0]  (4,14.9]  (14.9,68] (14.9,68] (0,4]    
   Levels: (-Inf,0] (0,4] (4,14.9] (14.9,68] (68, Inf]

I would like to recode A and B values to two-level factors as follows:

For A, the two highest factor levels, (96.4,100] and (100, Inf], should be recoded as 0, and all other levels as 1.

For B, the two lowest factor levels, (-Inf,0] and (0,4], should be recoded as 0, and all other levels as 1.

Thus, the dataframe should look like:

 forStack
  AGE  BMI time          A         B      ID
 1  59 23.8    0         1         1   9000099
 2  69 29.8    0         0         0   9000296
 3  71 22.7    0         1         1   9000622
 4  56 32.4    0         1         1   9000798
 5  72 30.7    0         1         1   9001104
 6  75 23.5    0         0         0   9001400

What is the most efficient way to do this? Thank you very much in advance.

DSSS asked Nov 23 '25 03:11


2 Answers

Here's one approach:

within(forStack, {
  A <- as.numeric(!A %in% tail(levels(A), 2))
  B <- as.numeric(!B %in% head(levels(B), 2))
})
#   AGE  BMI time A B      ID
# 1  59 23.8    0 1 1 9000099
# 2  69 29.8    0 0 0 9000296
# 3  71 22.7    0 1 1 9000622
# 4  56 32.4    0 1 1 9000798
# 5  72 30.7    0 1 1 9001104
# 6  75 23.5    0 0 0 9001400

The basic idea here is that head and tail both have an "n" argument that lets you specify how many values you want from the "head" and "tail" of your vector or dataset. That lets us easily grab (96.4,100] and (100, Inf] for vector A, and the relevant values for vector B.

within is a convenient way to dynamically replace the values in your data.frame.
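To see what each piece of that expression returns, here is a minimal sketch on a standalone factor. The vector `a` and the helper names `top2`, `in_top`, and `recoded` are made up for illustration; only the level labels are taken from column A.

```r
# Hypothetical factor that reuses the level labels of forStack$A
lv <- c("(-Inf,0]", "(0,75]", "(75,89.3]",
        "(89.3,96.4]", "(96.4,100]", "(100, Inf]")
a <- factor(c("(0,75]", "(96.4,100]", "(75,89.3]"), levels = lv)

top2    <- tail(levels(a), 2)   # the last two level labels, in level order
in_top  <- a %in% top2          # TRUE where the value is one of those levels
recoded <- as.numeric(!in_top)  # negate, then convert TRUE/FALSE to 1/0

print(top2)
print(recoded)
```

Note that within returns a modified copy of the data frame, so the result has to be assigned (for example forStack <- within(forStack, {...})) if the recoded columns should be kept.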

A5C1D2H2I1M1N2O1R2T1 answered Nov 25 '25 19:11


Since you know that the factor levels are in order, you can compare the underlying integer codes directly:

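within(forStack, {
    # 1 unless A is one of the two highest levels (integer codes n-1 and n)
    Ar <- (as.integer(A) < length(levels(A))-1)*1
    # 1 unless B is one of the two lowest levels (integer codes 1 and 2)
    Br <- (as.integer(B) > 2)*1
})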
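This works because as.integer on a factor returns each value's position in levels(). A small sketch of that idea, assuming a made-up vector `b` that reuses the level labels of column B (`codes` and `br` are illustrative names only):

```r
# Hypothetical factor that reuses the level labels of forStack$B
lv <- c("(-Inf,0]", "(0,4]", "(4,14.9]", "(14.9,68]", "(68, Inf]")
b <- factor(c("(4,14.9]", "(-Inf,0]", "(14.9,68]", "(0,4]"), levels = lv)

codes <- as.integer(b)       # position of each value within levels(b)
br    <- as.integer(codes > 2)  # 0 for the two lowest levels, 1 otherwise

print(codes)
print(br)
```

The comparison only gives the intended result when the levels are stored in ascending order, as they are in the dput output above. Unlike the first answer, this one writes to new columns (Ar, Br) and leaves A and B unchanged.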
mnel answered Nov 25 '25 21:11


