Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

creating a factor variable with dplyr?

Tags:

r

dplyr

Suppose I have a data frame that looks something like this:

df1=structure(list(Name = structure(1:6, .Label = c("N1", "N2", "N3", 
                                                    "N4", "N5", "N6", "N7"), class = "factor"), sector = structure(c(4L, 
                                                                                                                     4L, 4L, 3L, 3L, 2L), .Label = c("other stuff", "Private for-profit, 4-year or above", 
                                                                                                                                                     "Private not-for-profit, 4-year or above", "Public, 4-year or above"
                                                                                                                     ), class = "factor"), flagship = c(1, 0, 0, 0, 0, 0)), .Names = c("Name", 
                                                                                                                                                                                       "sector", "flagship"), row.names = c(NA, 6L), class = "data.frame")

I want to create a new factor variable, "Sector". I can do it in a long way with many lines of code, but I'm sure there is a more efficient way.

Right now this is what I'm doing:

df1$PublicFlag=0
df1$PublicFlag[df1$sector=="Public, 4-year or above" & df1$flagship==1]=1
df1$Public=0
df1$Public[df1$sector=="Public, 4-year or above" & df1$flagship==0]=1
df1$PrivateNP=0
df1$PrivateNP[df1$sector=="Private not-for-profit"]=1
df1$Private4P=0
df1$Private4P[df1$sector=="Private for-profit, 4-year or above"]=1

library(reshape)
df2 = melt(df1, id=c("Name", "sector", "flagship"))
df2 = df2[df2$value==1,c("Name", "sector", "flagship", "variable")]
library(plyr)
df2 = rename(df2, c("variable"="Sector"))

Thanks for the help!

like image 901
Ignacio Avatar asked Sep 06 '25 03:09

Ignacio


1 Answers

It's an old post, but I often stumble across it. That's why I want to give an up-to-date answer. Version 0.5.0 of dplyr introduced a lot of useful vector functions to solve this problem.

Avoiding ifelse-nesting (and thus keeping many, many kittens alive) with case_when():

df1 %>% 
  mutate(Sector = case_when(
        sector=="Public, 4-year or above" & flagship==1 ~ "PublicFlag",
        sector=="Public, 4-year or above" & flagship==0 ~ "Public",
        sector=="Private not-for-profit" ~ "PrivateNP",
        sector=="Private for-profit, 4-year or above" ~ "Private4P"),
    Sector = factor(Sector, levels=c("Public","PublicFlag","PrivateNP","Private4P"))
  )

Generation factor from character (or numeric) variable with recode_factor():

df1 %>%
    mutate(Sector = recode_factor(sector,
                               "Public, 4-year or above" = "Public",
                               "Private not-for-profit" = "PrivateNP",
                               "Private for-profit, 4-year or above" = "Private4P"))
like image 87
MarkusN Avatar answered Sep 08 '25 01:09

MarkusN