I have a nested data.frame - df_nested, there one of column contains df:
df <- tibble(ID_Value = 1:8,
xyz001 = c("text4", NA, NA, NA, NA, NA, NA, "text2"),
xyz002 = c(NA, NA, NA, "text3", "text1", NA, NA, NA),
xyz003 = c(NA, "text1", NA, NA, "text2", NA, "text2", NA))
I want to find a way, how to mutate this df, on these requirements:
mutate(across(matches("\\d")my attempt:
df_nested <- df_nested %>%
mutate(df = map(data, ~.x %>%
mutate(across(matches("\\dd"), function (x) {
conditions (ifelse, case_when or other)
...}
Also, should we better use across(), or is vars() still a good way to do it as well?
Thank you in advance.
Expected Output
df <- tibble(ID_Value = 1:8,
xyz001 = c("text4", NA, NA, NA, NA, NA, NA, NA),
xyz002 = c(NA, NA, NA, "text3", NA, NA, NA, NA),
xyz003 = c(NA, NA, NA, NA, "text2", NA, "text2", NA))
factor type to specify the order you want.Consider this function
max_only <- function(x, lvls) {
fct <- droplevels(factor(x, lvls))
`[<-`(x, as.integer(fct) != length(levels(fct)), NA_character_)
}
You can then specify any order you want
> max_only(c("apple", "banana", NA_character_), c("banana", "apple"))
[1] "apple" NA NA
> max_only(c("apple", "banana", NA_character_), c("apple", "banana"))
[1] NA "banana" NA
df %>%
mutate(across(matches("\\d"), max_only, c("tier1", "tier2", "tier3", "tier4")))
Output (this one looks more like your expected output)
# A tibble: 8 x 4
ID_Value xyz001 xyz002 xyz003
<int> <chr> <chr> <chr>
1 1 tier4 NA NA
2 2 NA NA NA
3 3 NA NA NA
4 4 NA tier3 NA
5 5 NA NA tier2
6 6 NA NA NA
7 7 NA NA tier2
8 8 NA NA NA
df %>%
mutate(as.data.frame(t(apply(
across(matches("\\d")), 1L,
max_only, c("tier1", "tier2", "tier3", "tier4")
))))
Output
# A tibble: 8 x 4
ID_Value xyz001 xyz002 xyz003
<int> <chr> <chr> <chr>
1 1 tier4 NA NA
2 2 NA NA tier1
3 3 NA NA NA
4 4 NA tier3 NA
5 5 NA NA tier2
6 6 NA NA NA
7 7 NA NA tier2
8 8 tier2 NA NA
[<- is almost equivalent to x[...] <- y; x. If ... is a logical vector (i.e. TRUE/FALSE), then values in x indexed by TRUE will be replaced by y. For example,
> x <- c("a", "b" ,"c")
> `[<-`(x, c(FALSE, TRUE, TRUE), NA_character_)
[1] "a" NA NA
> x[c(FALSE, TRUE, TRUE)] <- NA_character_; x
[1] "a" NA NA
NA_character_ is the NA value of a character type.
as.integer(fct) != length(levels(fct)) returns a logical vector of the same length as fct. TRUE indexes positions where the values of fct are not the highest level, FALSE indexes the opposite, and NA indexes NAs. For example, assume that fct looks like this
> x <- c("apple", "banana", NA)
> fct <- droplevels(factor(x, c("apple", "banana", "pear")))
> fct
[1] apple banana <NA>
Levels: apple banana
Then, you can see that
> as.integer(fct) != length(levels(fct))
[1] TRUE FALSE NA
All together, it just means that we assign NA_character_s to values that are not equal to the highest level but keep NAs unchanged.
[<-(x, as.integer(fct) != length(levels(fct)), NA_character_)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With