The Titanic dataset can be downloaded from Kaggle: kaggle.com/c/titanic/data. Use train.csv, or install the package 'titanic' and use its titanic_train dataset.
This works:
library(dplyr)
library(stringr)
titanic <- titanic %>%
  mutate(Cabin_Letter = ifelse(!is.na(Cabin), str_extract(Cabin, "[A-Z]+"), 'Unknown'))
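Wrapping the same expression in factor() does not work as intended: Cabin_Letter ends up as a character column instead of a factor.
titanic <- titanic %>%
  mutate(Cabin_Letter = factor(ifelse(!is.na(Cabin), str_extract(Cabin, "[A-Z]+"), 'Unknown')))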
It produces these warnings:
Warning messages:
1: In mutate_impl(.data, dots) : Unequal factor levels: coercing to character
2: In mutate_impl(.data, dots) : binding character and factor vector, coercing into character vector
3: In mutate_impl(.data, dots) : binding character and factor vector, coercing into character vector
4: In mutate_impl(.data, dots) : binding character and factor vector, coercing into character vector
5: In mutate_impl(.data, dots) : binding character and factor vector, coercing into character vector
6: In mutate_impl(.data, dots) : binding character and factor vector, coercing into character vector
7: In mutate_impl(.data, dots) : binding character and factor vector, coercing into character vector
How can I resolve this without adding an extra line such as:
titanic$Cabin_Letter <- factor(titanic$Cabin_Letter)
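This can happen if the data frame has been grouped with group_by() (a grouped_df). mutate() then evaluates factor() separately for each group, so each group gets its own set of levels, and when dplyr combines the pieces the mismatched factors are coerced to character. I just ran into it. My solution was to ungroup() the data frame first and then convert to a factor with as.factor().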
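A minimal sketch of that fix, kept inside a single pipe so no extra assignment line is needed. It assumes a small hypothetical data frame called passengers, grouped by an illustrative Pclass column, in place of titanic_train. Note that the mutate_impl warnings come from older dplyr releases; newer versions may combine per-group factor levels without warning, but calling ungroup() first still ensures as.factor() builds one level set from the whole column.

```r
library(dplyr)
library(stringr)

# Hypothetical toy data standing in for titanic_train: a few Cabin values,
# some missing, grouped by class the way an earlier group_by() would leave it.
passengers <- data.frame(
  Pclass = c(1, 1, 2, 3, 3),
  Cabin  = c("C85", "B42", NA, NA, "E101"),
  stringsAsFactors = FALSE
) %>%
  group_by(Pclass)

# ungroup() first so as.factor() sees the whole column at once and
# builds a single set of levels instead of one set per group.
passengers <- passengers %>%
  ungroup() %>%
  mutate(Cabin_Letter = as.factor(ifelse(!is.na(Cabin),
                                         str_extract(Cabin, "[A-Z]+"),
                                         "Unknown")))

# expected levels: B, C, E, Unknown
levels(passengers$Cabin_Letter)
```

If the grouping is needed later, it can be re-applied with group_by() after the mutate() step.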