How to best normalize a data frame in R by column?

I have a dataset like the one below that I would like to normalize (0 to 1) by column.

What I have currently:

                2015 Value      2014 Value      2013 Value
China           500             400             450
Germany         890             760             700
Italy           240             210             200

What would be great to end up with:

                2015 Value      2015 Normed     2014 Value      2014 Normed     2013 Value      2013 Normed
China           500             0.5             400             0.5             450             0.5
Germany         890             1.0             760             1.0             700             1.0
Italy           240             0.0             210             0.0             200             0.0

After this step, I'd like to average each Normed column in a "total average".

I've tried a couple of things, but I don't see how to apply a function column by column and get a new column for each result. The lapply function seems to be the right track, but I'm not sure how best to use it. (I'm a newcomer to R trying to learn.)

I really appreciate your help. Sorry for the basic questions!

Natasha R. asked Dec 05 '25 09:12


1 Answer

We can use lapply to loop over the value columns and normalize each one. Map then cbinds each original column with its normalized counterpart so that the two alternate, and do.call with cbind.data.frame combines the resulting list elements into a single data.frame.

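# Min-max normalize every column except Country, rounded to 1 decimal place
lst <- lapply(df[-1], function(x) round((x-min(x))/(max(x)-min(x)), 1))

# Pair each original column with its normalized version, then bind all pairs to Country
res <- cbind(df[1], do.call(cbind.data.frame, Map(cbind, df[-1], lst)))
# rbind builds a 2-row matrix of names; it is read column by column, which interleaves them
names(res)[-1] <- rbind(names(df)[-1], sub("Value", "Norm", names(df)[-1]))
res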
#   Country 2015 Value 2015 Norm 2014 Value 2014 Norm 2013 Value 2013 Norm
#1   China        500       0.4        400       0.3        450       0.5
#2 Germany        890       1.0        760       1.0        700       1.0
#3   Italy        240       0.0        210       0.0        200       0.0
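
The question also asks for a "total average" of the normalized columns, which the code above does not compute. Here is a minimal sketch, assuming "total average" means the mean of the normalized values per country. The names normalize01, normed and total are hypothetical, not part of the original answer. The sketch averages the unrounded values, so it is not identical to averaging the rounded Norm columns in res.

```r
# Same example data as in the "data" block of the answer.
df <- data.frame(
  Country = c("China", "Germany", "Italy"),
  `2015 Value` = c(500L, 890L, 240L),
  `2014 Value` = c(400L, 760L, 210L),
  `2013 Value` = c(450L, 700L, 200L),
  check.names = FALSE
)

# Hypothetical helper: min-max scale a numeric vector to [0, 1].
# A constant column has max == min, so return NA instead of dividing by zero.
normalize01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  if (rng[1] == rng[2]) return(rep(NA_real_, length(x)))
  (x - rng[1]) / (rng[2] - rng[1])
}

# Normalize every value column at full precision (no rounding).
normed <- lapply(df[-1], normalize01)
names(normed) <- sub("Value", "Norm", names(normed))

# Assumption: "total average" = row-wise mean of the normalized columns.
total <- data.frame(
  Country = df$Country,
  `Total Average` = rowMeans(do.call(cbind, normed)),
  check.names = FALSE
)
total
```

If the intended "total average" is instead one number per year column, sapply(normed, mean) would give that; which of the two is wanted is not clear from the question.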

data

df <- structure(list(Country = c("China", "Germany", "Italy"), `2015 Value` = c(500L, 
890L, 240L), `2014 Value` = c(400L, 760L, 210L), `2013 Value` = c(450L, 
700L, 200L)), .Names = c("Country", "2015 Value", "2014 Value", 
"2013 Value"), class = "data.frame", row.names = c(NA, -3L))
akrun answered Dec 07 '25 16:12