 

Winsorizing across all columns in a data frame (R) using `lapply`

I am trying to apply the Winsorize() function from the DescTools package to every column of a data frame using lapply. What I currently have is:

data$col1 <- Winsorize(data$col1)

This replaces the extreme values with quantile-based limits (by default the 5% and 95% quantiles), transforming the data below as follows:

> data$col1
 [1]   -0.06775798   **-0.55213508**   -0.12338265
 [4]    0.04928349    **0.47524313**    0.04782829
 [7]   -0.05070639 **-112.67126382**    0.12657896
[10]   -0.12886632

> Winsorize(data$col1)
 [1] -0.06775798 **-0.37884540** -0.12338265  0.04928349
 [5]  **0.26038103**  0.04782829 -0.05070639 **-0.37884540**
 [9]  0.12657896 -0.12886632
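For reference, the clipping that Winsorize() performs with its default settings can be sketched in base R. winsorize_base() is a hypothetical helper, not part of DescTools; it assumes the default 5%/95% limits and R's default (type 7) quantiles, and is applied here to the EQ.TA column of the data further down.

```r
# winsorize_base() is a hypothetical helper (not part of DescTools): it caps
# x at its own 5% and 95% quantiles, mirroring Winsorize()'s default limits.
winsorize_base <- function(x, probs = c(0.05, 0.95)) {
  # R's default (type 7) quantiles; names = FALSE returns a plain numeric vector
  lims <- quantile(x, probs = probs, na.rm = TRUE, names = FALSE)
  # raise values below the lower limit, lower values above the upper limit
  pmin(pmax(x, lims[1]), lims[2])
}

# EQ.TA column from the data in this question
eq_ta <- c(-0.0677579847115102, -0.552135083517749, -0.123382654164705,
           0.0492834931482554, 0.475243125304193, 0.0478282913638668,
           -0.050706389027946, -112.671263815473, 0.126578956975704,
           -0.128866322940619)
winsorize_base(eq_ta)
```

With only ten observations the 5% quantile still lies far out (about -62.22), so the outlier is pulled in but not all the way to the bulk of the data.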

I have a for loop that does this across all columns of the data.frame (col1, col2, col3, col4). However, I know lapply is a better option, so I am trying to rewrite it with lapply but cannot get it working. If anybody can point me in the right direction, it would be much appreciated.

The data:

data <- structure(list(EQ.TA = c(-0.0677579847115102, -0.552135083517749, 
-0.123382654164705, 0.0492834931482554, 0.475243125304193, 0.0478282913638668, 
-0.050706389027946, -112.671263815473, 0.126578956975704, -0.128866322940619
), NI.EQ = c(3.64670235329765, 1.66115713369585, 0.209424623633739, 
0.340430636358184, -0.248411254566261, -12.1709277350516, 1.06888235737433, 
0.0515582237132515, 0.177323118521857, 0.419879195374698), NI.TA = c(-0.24709320230217, 
-0.917183132749265, -0.0258393659113752, 0.0167776109344148, 
-0.118055740980805, -0.582114677880617, -0.0541991646381309, 
-5.80913022585296, 0.0224453753901758, -0.0541082879872031), 
    TL.TA = c(1.06775798471151, 1.55213508351775, 1.12338265416471, 
    0.950716506851745, 0.524756874695807, 0.952171708636133, 
    1.05070638902795, 113.671263815473, 0.873421043024296, 1.12886632294062
    )), .Names = c("EQ.TA", "NI.EQ", "NI.TA", "TL.TA"), row.names = c(NA, 
10L), class = "data.frame")
user113156 asked Oct 15 '25 09:10


2 Answers

You can lapply over the whole data.frame and assign the result back into data[], which keeps it a data.frame rather than turning it into a plain list:

library(DescTools)
data[] <- lapply(data, Winsorize)

data
#          EQ.TA       NI.EQ       NI.TA      TL.TA
#1   -0.06775798  2.75320700 -0.24709320  1.0677580
#2   -0.55213508  1.66115713 -0.91718313  1.5521351
#3   -0.12338265  0.20942462 -0.02583937  1.1233827
#4    0.04928349  0.34043064  0.01677761  0.9507165
#5    0.31834425 -0.24841125 -0.11805574  0.6816558
#6    0.04782829 -6.80579532 -0.58211468  0.9521717
#7   -0.05070639  1.06888236 -0.05419916  1.0507064
#8  -62.21765589  0.05155822 -3.60775403 63.2176559
#9    0.12657896  0.17732312  0.01989488  0.8734210
#10  -0.12886632  0.41987920 -0.05410829  1.1288663
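To see why the answer assigns into data[] rather than to data, here is a small base-R sketch comparing the two forms with a column-by-column for loop. The toy data frame and clip_0_10() are hypothetical stand-ins for the real data and Winsorize(), chosen so the example runs without DescTools.

```r
# Toy data frame (hypothetical values) and a stand-in for Winsorize():
# clip_0_10() simply caps every value to the range [0, 10].
df <- data.frame(a = c(1, 2, 100), b = c(-50, 0, 3))
clip_0_10 <- function(x) pmin(pmax(x, 0), 10)

# 1. Plain assignment of lapply() output gives a list, not a data frame
as_list <- lapply(df, clip_0_10)

# 2. Assigning into df[] keeps the data.frame class, names and row names
df_lapply <- df
df_lapply[] <- lapply(df_lapply, clip_0_10)

# 3. The equivalent for loop, column by column
df_loop <- df
for (col in names(df_loop)) {
  df_loop[[col]] <- clip_0_10(df_loop[[col]])
}
```

The lapply form and the loop produce the same data frame; the difference is only in how concisely the per-column step is written.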
Mike H. answered Oct 18 '25 05:10


I like the answer above. For a recent research project, however, I had a data frame with variables of different types, and I wanted to winsorize only the numeric variables at the 1% level using lapply while keeping NA values. Extending the answer above, the following might be a suitable approach:

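library(DescTools)
library(dplyr)  # needed for bind_cols()

# Winsorize numeric columns at the pct_level / (1 - pct_level) quantiles and
# leave non-numeric columns (characters, factors, ...) unchanged.
# probs/na.rm follow the Winsorize() interface this answer was written against;
# check ?Winsorize for the installed DescTools version.
wins_vars <- function(x, pct_level = 0.01) {
  if (is.numeric(x)) {
    Winsorize(x, probs = c(pct_level, 1 - pct_level), na.rm = TRUE)
  } else {
    x
  }
}

# bind_cols() recombines the list of columns (the result is a tibble)
df <- bind_cols(lapply(df, wins_vars))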
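If the Winsorize() arguments differ in your installed DescTools version, the same numeric-only, NA-preserving idea can be sketched in base R. winsorize_numeric() and the toy data frame are hypothetical; the helper uses R's default (type 7) quantiles computed on the non-missing values.

```r
# Hypothetical base-R helper: winsorize numeric columns only, keep NAs.
winsorize_numeric <- function(x, pct_level = 0.01) {
  if (!is.numeric(x)) return(x)                  # leave characters/factors as-is
  lims <- quantile(x, probs = c(pct_level, 1 - pct_level),
                   na.rm = TRUE, names = FALSE)  # limits ignore NA values
  pmin(pmax(x, lims[1]), lims[2])                # NA stays NA
}

# Toy mixed-type data frame (hypothetical values)
df <- data.frame(id = letters[1:5],
                 value = c(1, 2, NA, 4, 1000),
                 stringsAsFactors = FALSE)
df[] <- lapply(df, winsorize_numeric)
df
```

Here the id column passes through untouched, the NA in value is kept, and 1 and 1000 are moved to the 1% and 99% quantiles of the non-missing values.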

ToWii answered Oct 18 '25 05:10


