I would like to use dplyr::mutate to add two named columns to a data frame simultaneously, with a single function call. Consider the following example:
library(dplyr)
n <- 1e2; M <- 1e3
variance <- 1
x <- rnorm(n*M, 0, variance)
s <- rep(1:M, each = n)
dat <- data.frame(s = s, x = x)
ci_studclt <- function(x, alpha = 0.05) {
  n <- length(x)
  S_n <- var(x)
  mean(x) + qt(c(alpha/2, 1 - alpha/2), df = n-1)*sqrt(S_n / n)
}
ci_studclt(x)
Trying something like the code below returns an error, since the function produces two values and they cannot be inserted into a single atomic column.
dat %>%
  group_by(s) %>%
  mutate(ci = ci_studclt(x, variance))
It seems one option is to insert a list column and then call unnest_wider, and that this is easier with data.table or in the specific case of splitting a string column into two new columns.
In my example, a confidence interval (lower and upper bound) comes out of a function, and I would like to add both bounds directly as new columns of dat, e.g. named ci_lower and ci_upper.
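For reference, the list-column route could look roughly like the sketch below. It assumes tidyr is available, uses a small toy data set rather than the full simulation, and adds a seed purely for reproducibility; the names toy and res are illustrative only.

```r
library(dplyr)
library(tidyr)

set.seed(1)  # assumption: seed added only so the sketch is reproducible

# Small toy data set (illustrative, not the full simulation)
toy <- data.frame(s = rep(1:3, each = 5), x = rnorm(15))

# Same Student-t interval as in the question
ci_studclt <- function(x, alpha = 0.05) {
  n <- length(x)
  mean(x) + qt(c(alpha / 2, 1 - alpha / 2), df = n - 1) * sqrt(var(x) / n)
}

res <- toy %>%
  group_by(s) %>%
  # Wrap the named length-2 result in a list so it fits into one cell per row;
  # the length-1 list is recycled to every row of the group
  mutate(ci = list(setNames(ci_studclt(x), c("ci_lower", "ci_upper")))) %>%
  ungroup() %>%
  # Spread the named elements of each list entry into separate columns
  unnest_wider(ci)

names(res)
```

This works, but it takes two steps (build the list column, then widen it), which is the overhead the question is trying to avoid.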
Is there a straightforward way of doing this with dplyr or do I need to insert the elements as a list column then unnest?
NB Keep in mind that the confidence interval values are a function of a group of simulated values x, grouped by s; the CI values should be constant within a group.
You can do this by having your function (or a wrapper function) return a data.frame. When you call it in mutate, don’t specify a column name, or else you’ll end up with a nested data.frame column. If you want to specify names for the new columns, you can pass them as a function argument, as in the example below.
library(dplyr)
n <- 1e2; M <- 1e3
variance <- 1
x <- rnorm(n*M, 0, variance)
s <- rep(1:M, each = n)
dat <- data.frame(s = s, x = x)
ci_studclt <- function(x, alpha = 0.05) {
  n <- length(x)
  S_n <- var(x)
  mean(x) + qt(c(alpha/2, 1 - alpha/2), df = n-1)*sqrt(S_n / n)
}
ci_wrapper <- function(x, alpha = 0.05, names_out = c("ci_lower", "ci_upper")) {
  ci <- ci_studclt(x, alpha = alpha)
  out <- data.frame(ci[[1]], ci[[2]])
  names(out) <- names_out
  out
}
# original code was ci_studclt(x, variance)
# but ci_studclt() doesn't take a variance argument, so I omitted
dat %>%
  group_by(s) %>%
  mutate(ci_wrapper(x))
output:
# A tibble: 100,000 x 4
# Groups: s [1,000]
       s       x ci_lower ci_upper
   <int>   <dbl>    <dbl>    <dbl>
 1     1  0.233    -0.223    0.139
 2     1  1.03     -0.223    0.139
 3     1  1.53     -0.223    0.139
 4     1  0.0150   -0.223    0.139
 5     1 -0.211    -0.223    0.139
 6     1 -1.13     -0.223    0.139
 7     1 -1.51     -0.223    0.139
 8     1  0.371    -0.223    0.139
 9     1  1.80     -0.223    0.139
10     1 -0.137    -0.223    0.139
# ... with 99,990 more rows
With specified column names:
dat %>%
  group_by(s) %>%
  mutate(ci_wrapper(x, names_out = c("ci.lo", "ci.hi")))
output:
# A tibble: 100,000 x 4
# Groups: s [1,000]
       s       x  ci.lo ci.hi
   <int>   <dbl>  <dbl> <dbl>
 1     1  0.233  -0.223 0.139
 2     1  1.03   -0.223 0.139
 3     1  1.53   -0.223 0.139
 4     1  0.0150 -0.223 0.139
 5     1 -0.211  -0.223 0.139
 6     1 -1.13   -0.223 0.139
 7     1 -1.51   -0.223 0.139
 8     1  0.371  -0.223 0.139
 9     1  1.80   -0.223 0.139
10     1 -0.137  -0.223 0.139
# ... with 99,990 more rows
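To illustrate the point about not naming the result, here is a minimal sketch contrasting the unnamed and named forms. It assumes dplyr 1.0.0 or later; the toy data and the helper range_df are hypothetical and stand in for dat and ci_wrapper.

```r
library(dplyr)

# Small toy data set (hypothetical, stands in for dat)
toy <- data.frame(g = rep(1:2, each = 3), x = c(1, 2, 3, 10, 20, 30))

# Hypothetical helper returning a one-row data frame with two columns
range_df <- function(x) data.frame(lo = min(x), hi = max(x))

# Unnamed result: the data frame's columns are added directly
unnamed <- toy %>%
  group_by(g) %>%
  mutate(range_df(x)) %>%
  ungroup()
names(unnamed)  # "g" "x" "lo" "hi"

# Named result: a single column that itself holds a data frame
named <- toy %>%
  group_by(g) %>%
  mutate(r = range_df(x)) %>%
  ungroup()
names(named)            # "g" "x" "r"
is.data.frame(named$r)  # TRUE
```

In both cases the one-row result is recycled to every row of its group, so the values are constant within each group.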