Recoding levels of factors

Question

I have the following data frame:

forStack
  AGE  BMI time          A         B      ID
 1  59 23.8    0     (0,75]  (4,14.9] 9000099
 2  69 29.8    0 (96.4,100]  (-Inf,0] 9000296
 3  71 22.7    0  (75,89.3]  (4,14.9] 9000622
 4  56 32.4    0     (0,75] (14.9,68] 9000798
 5  72 30.7    0     (0,75] (14.9,68] 9001104
 6  75 23.5    0 (96.4,100]     (0,4] 9001400

dput(forStack)
structure(list(AGE = c(59, 69, 71, 56, 72, 75), BMI = c(23.8, 
29.8, 22.7, 32.4, 30.7, 23.5), time = c(0, 0, 0, 0, 0, 0), A = structure(c(2L, 
5L, 3L, 2L, 2L, 5L), .Label = c("(-Inf,0]", "(0,75]", "(75,89.3]", 
"(89.3,96.4]", "(96.4,100]", "(100, Inf]"), class = "factor"), 
B = structure(c(3L, 1L, 3L, 4L, 4L, 2L), .Label = c("(-Inf,0]", 
"(0,4]", "(4,14.9]", "(14.9,68]", "(68, Inf]"), class = "factor"), 
ID = c(9000099, 9000296, 9000622, 9000798, 9001104, 9001400
)), .Names = c("AGE", "BMI", "time", "A", "B", "ID"), row.names = c(NA, 
6L), class = "data.frame")

Variables A and B are factors representing quartiles:

   forStack$A
   [1] (0,75]     (96.4,100] (75,89.3]  (0,75]     (0,75]     (96.4,100]
   Levels: (-Inf,0] (0,75] (75,89.3] (89.3,96.4] (96.4,100] (100, Inf]

   forStack$B
   [1] (4,14.9]  (-Inf,0]  (4,14.9]  (14.9,68] (14.9,68] (0,4]    
   Levels: (-Inf,0] (0,4] (4,14.9] (14.9,68] (68, Inf]

I would like to recode A and B as two-level variables, as follows:

For A, the two highest factor levels, (96.4,100] and (100, Inf], should be recoded as 0, and all other levels as 1.

For B, the two lowest factor levels, (-Inf,0] and (0,4], should be recoded as 0, and all other levels as 1.

Thus, the dataframe should look like:

 forStack
  AGE  BMI time          A         B      ID
 1  59 23.8    0         1         1   9000099
 2  69 29.8    0         0         0   9000296
 3  71 22.7    0         1         1   9000622
 4  56 32.4    0         1         1   9000798
 5  72 30.7    0         1         1   9001104
 6  75 23.5    0         0         0   9001400

What is the most efficient way to do it? Thank you very much in advance

A5C1D2H2I1M1N2O1R2T1 · Accepted Answer

Here's one approach:

within(forStack, {
  A <- as.numeric(!A %in% tail(levels(A), 2))
  B <- as.numeric(!B %in% head(levels(B), 2))
})
#   AGE  BMI time A B      ID
# 1  59 23.8    0 1 1 9000099
# 2  69 29.8    0 0 0 9000296
# 3  71 22.7    0 1 1 9000622
# 4  56 32.4    0 1 1 9000798
# 5  72 30.7    0 1 1 9001104
# 6  75 23.5    0 0 0 9001400

The basic idea here is that head and tail both have an "n" argument that lets you specify how many values you want from the "head" and "tail" of your vector or dataset. That lets us easily grab (96.4,100] and (100, Inf] for vector A, and the relevant values for vector B.
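To make the head/tail step concrete, here is a small sketch that rebuilds column A on its own (values and levels copied from the dput output in the question) and looks at the intermediate result. The names a_levels, top_two and A_recoded are only illustrative and do not appear in the answer.

```r
# Rebuild column A from the question; levels copied from the dput() output
a_levels <- c("(-Inf,0]", "(0,75]", "(75,89.3]",
              "(89.3,96.4]", "(96.4,100]", "(100, Inf]")
A <- factor(c("(0,75]", "(96.4,100]", "(75,89.3]",
              "(0,75]", "(0,75]", "(96.4,100]"),
            levels = a_levels)

# tail(x, 2) keeps the last two elements, here the two highest levels
top_two <- tail(levels(A), 2)
top_two
# [1] "(96.4,100]" "(100, Inf]"

# %in% binds tighter than !, so this reads as !(A %in% top_two)
A_recoded <- as.numeric(!A %in% top_two)
A_recoded
# [1] 1 0 1 1 1 0
```

The same pattern with head(levels(B), 2) picks out the two lowest levels of B.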

within is a convenient way to compute new values for the columns of your data.frame while referring to them by name.
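One thing worth keeping in mind is that within returns a modified copy and leaves its input alone, so the result has to be assigned if you want to keep it. A toy sketch (the data frame toy and its column grp are made up for illustration):

```r
# Hypothetical toy data, not from the question
toy <- data.frame(grp = factor(c("lo", "hi", "lo"), levels = c("lo", "hi")))

# within() evaluates the block with the columns in scope and returns a new data frame
toy_recoded <- within(toy, {
  grp <- as.numeric(grp == "hi")
})

is.factor(toy$grp)   # the original column is still a factor
# [1] TRUE
toy_recoded$grp      # the returned copy holds the recoded values
# [1] 0 1 0
```

For the question's data, that means writing forStack <- within(forStack, { ... }) to overwrite the original.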

mnel · Answer

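Since you know the factor levels are in increasing order, you can compare the underlying integer codes directly. Note that this adds new columns Ar and Br rather than overwriting A and B: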

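within(forStack, {
    # A has six levels: codes 1 to 4 become 1, the top two (codes 5 and 6) become 0
    Ar <- (as.integer(A) < nlevels(A) - 1) * 1
    # B: the two lowest levels (codes 1 and 2) become 0, the rest become 1
    Br <- (as.integer(B) > 2) * 1
})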
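This works because a factor stores each value as an integer position in its levels vector. A short sketch with column B rebuilt from the question's dput output (the names b_levels and codes are illustrative only):

```r
# Rebuild column B from the question; levels copied from the dput() output
b_levels <- c("(-Inf,0]", "(0,4]", "(4,14.9]", "(14.9,68]", "(68, Inf]")
B <- factor(c("(4,14.9]", "(-Inf,0]", "(4,14.9]",
              "(14.9,68]", "(14.9,68]", "(0,4]"),
            levels = b_levels)

# as.integer() on a factor gives the position of each value in levels(B)
codes <- as.integer(B)
codes
# [1] 3 1 3 4 4 2

# Codes 1 and 2 are the two lowest levels; multiplying by 1 turns TRUE/FALSE into 1/0
Br <- (codes > 2) * 1
Br
# [1] 1 0 1 1 1 0
```

The comparison only gives the intended result when the levels are stored in the intended order, which is the case here.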
