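I have a data frame with customer information in rows and periods (months) in columns. I use this format for clustering purposes. I want to scale the values within each row, i.e. per customer. I can do it with the following code, but there are some problems; for example, customers whose values never change (A, B and C) come out as NaN, because their standard deviation is zero.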
Here is my sample data and code:
mydata
cust P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12 P13 P14 P15 P16 P17 P18 P19 P20
1 A 1 1.0 1 1.0 1 1.0 1 1.0 1 1.0 1 1.0 1 1.0 1 1.0 1 1.0 1 1.0
2 B 5 5.0 5 5.0 5 5.0 5 5.0 5 5.0 5 5.0 5 5.0 5 5.0 5 5.0 5 5.0
3 C 9 9.0 9 9.0 9 9.0 9 9.0 9 9.0 9 9.0 9 9.0 9 9.0 9 9.0 9 9.0
4 D 0 1.0 2 1.0 0 1.0 2 1.0 0 1.0 2 1.0 0 1.0 2 1.0 0 1.0 2 1.0
5 E 4 5.0 6 5.0 4 5.0 6 5.0 4 5.0 6 5.0 4 5.0 6 5.0 4 5.0 6 5.0
6 F 8 9.0 10 9.0 8 9.0 10 9.0 8 9.0 10 9.0 8 9.0 10 9.0 8 9.0 10 9.0
7 G 2 1.5 1 0.5 0 0.5 1 1.5 2 1.5 1 0.5 0 0.5 1 1.5 2 1.5 1 0.5
8 H 6 5.5 5 4.5 4 4.5 5 5.5 6 5.5 5 4.5 4 4.5 5 5.5 6 5.5 5 4.5
9 I 10 9.5 9 8.5 8 8.5 9 9.5 10 9.5 9 8.5 8 8.5 9 9.5 10 9.5 9 8.5
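For anyone who wants to reproduce this, here is a minimal sketch that rebuilds mydata from the printed values. The helper vectors tri and wave are illustrative names only and are not part of the original post.

```r
# Build the sample data shown above: one row per customer, periods P1..P20
periods = paste0("P", 1:20)

# Repeating patterns read off the printed rows (illustrative helper names)
tri = rep(c(0, 1, 2, 1), length.out = 20)                       # customer D
wave = rep(c(2, 1.5, 1, 0.5, 0, 0.5, 1, 1.5), length.out = 20)  # customer G

rows = rbind(
  A = rep(1, 20), B = rep(5, 20), C = rep(9, 20),  # constant customers
  D = tri, E = tri + 4, F = tri + 8,               # shifted copies of D
  G = wave, H = wave + 4, I = wave + 8             # shifted copies of G
)
colnames(rows) = periods

mydata = data.frame(cust = rownames(rows), rows, row.names = NULL)
print(mydata)
```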
The code I am using:
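library(dplyr)  # across() needs dplyr >= 1.0.0
library(tidyr)
# first transpose the data: one row per period, one column per customer
# (factor_key = TRUE keeps P1..P20 in their original order)
g_mydata = mydata %>% gather(period, value, -cust, factor_key = TRUE)
spr_mydata = g_mydata %>% spread(cust, value)
# then scale each customer's column, i.e. each row of the original data;
# as.numeric() drops the one-column matrix that scale() returns
sc_mydata = spr_mydata %>%
  mutate(across(-period, ~ as.numeric(scale(.x))))
# then transpose again back to original format
g_scdata = sc_mydata %>% gather(cust, value, -period)
scaled_data = g_scdata %>% spread(period, value)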
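Why A, B and C cause trouble: scale() computes (x - mean(x)) / sd(x), and for a constant customer both the numerator and the denominator are zero. A quick check with an illustrative constant vector:

```r
# A constant series, like customers A, B and C in mydata
x = rep(5, 20)

# Its standard deviation is zero ...
s = sd(x)

# ... so scale() divides 0 by 0 and every value becomes NaN
z = as.numeric(scale(x))
print(s)
print(z)
```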
Thanks for any help or suggestions.
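You could always try apply() on the original data. Applying scale() over the rows (MARGIN = 1) scales each customer, and because apply() returns one column per row, t() puts the customers back in rows:
sc_mydata = t(apply(mydata[, -1], 1, scale))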
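scale() itself works column by column, so you can also transpose the original data (customers become columns), call scale() directly and transpose back:
t(scale(t(mydata[, -1])))
Either way, constant customers (A, B and C) still come out as NaN, because their standard deviation is zero.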
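If NaN is not what you want for constant customers, one option (my assumption, not something scale() does for you) is a small wrapper that returns 0 for rows with zero spread. scale_or_zero is a hypothetical helper name, and the toy matrix m stands in for mydata[, -1]:

```r
# Hypothetical helper: z-score one customer's values, but map a constant
# series (sd == 0) to all zeros instead of NaN
scale_or_zero = function(v) {
  s = sd(v)
  if (s == 0) rep(0, length(v)) else (v - mean(v)) / s
}

# Toy stand-in for mydata[, -1]: customer A is constant, D oscillates
m = rbind(A = rep(1, 8), D = c(0, 1, 2, 1, 0, 1, 2, 1))

# apply() returns one column per row, so t() puts customers back in rows
scaled = t(apply(m, 1, scale_or_zero))
print(round(scaled, 4))
```

With the real data this would be t(apply(mydata[, -1], 1, scale_or_zero)).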
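Here is a dplyr way of doing it. Constant customers are set to 0 instead of NaN, and customers with the same scaled profile are then grouped into types (type_ID).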
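library(dplyr)
library(tidyr)

# long format: one row per customer and period
# (factor_key = TRUE keeps P1..P20 in their original order after spread())
long_data =
  mydata %>%
  gather(period, value, -cust, factor_key = TRUE)

# customers whose values vary (sd != 0) can be scaled
to_scale =
  long_data %>%
  group_by(cust) %>%
  summarize(sd = sd(value)) %>%
  filter(sd != 0) %>%
  select(-sd)

# constant customers would give NaN (0 / 0), so set them to 0 instead
flat =
  long_data %>%
  anti_join(to_scale, by = "cust") %>%
  mutate(value = 0)

# scale the remaining customers one at a time; signif(7) rounds away
# floating-point noise so identical profiles compare equal further down
wide_scale =
  long_data %>%
  semi_join(to_scale, by = "cust") %>%
  group_by(cust) %>%
  mutate(value = signif(as.numeric(scale(value)), 7)) %>%
  ungroup() %>%
  bind_rows(flat) %>%
  spread(period, value)

# one row per distinct scaled profile, numbered as a customer type
type =
  wide_scale %>%
  select(-cust) %>%
  distinct() %>%
  mutate(type_ID = row_number())

# map each customer to its type by joining on all period columns
customer__type =
  type %>%
  left_join(wide_scale, by = setdiff(names(type), "type_ID")) %>%
  select(type_ID, cust)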
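The same idea with the newer tidyr verbs pivot_longer() and pivot_wider(), which supersede gather() and spread(), might look like the sketch below. It assumes dplyr >= 1.0 (for across() and cur_group_id()) and uses a small toy frame instead of mydata. Constant customers are set to 0, and round(value, 7) plays the role of signif(7) above so that equal profiles compare equal when grouping.

```r
library(dplyr)
library(tidyr)

# Toy frame shaped like mydata: A is constant, D and E share a profile
toy = data.frame(
  cust = c("A", "D", "E"),
  P1 = c(1, 0, 4), P2 = c(1, 1, 5), P3 = c(1, 2, 6), P4 = c(1, 1, 5)
)

scaled = toy %>%
  pivot_longer(-cust, names_to = "period", values_to = "value") %>%
  group_by(cust) %>%
  # z-score each customer; constant customers become 0 instead of NaN
  mutate(value = if (sd(value) == 0) 0 else round(as.numeric(scale(value)), 7)) %>%
  ungroup() %>%
  pivot_wider(names_from = period, values_from = value)

# One ID per distinct scaled profile (grouping on every period column)
customer_type = scaled %>%
  group_by(across(-cust)) %>%
  mutate(type_ID = cur_group_id()) %>%
  ungroup() %>%
  select(type_ID, cust)
print(customer_type)
```

Note that cur_group_id() numbers the profiles in sorted order of the period values, not in order of first appearance.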