Add two columns simultaneously via mutate

Question

I would like to use dplyr::mutate to add two named columns to a data frame simultaneously, with a single function call. Consider the following example:

library(dplyr)

n <- 1e2; M <- 1e3
variance <- 1

x <- rnorm(n*M, 0, sqrt(variance))  # rnorm() takes a standard deviation, not a variance
s <- rep(1:M, each = n)

dat <- data.frame(s = s, x = x)

ci_studclt <- function(x, alpha = 0.05) {
  n <- length(x)
  S_n <- var(x)
  mean(x) + qt(c(alpha/2, 1 - alpha/2), df = n-1)*sqrt(S_n / n)
}

ci_studclt(x)

Something like the following returns an error, because the function produces two values per group, and these cannot be inserted into a single atomic column.

dat %>% 
  group_by(s) %>% 
  mutate(ci = ci_studclt(x, variance))

One option seems to be inserting a list column and then calling unnest_wider. It also appears that this is easier with data.table, or in the specific case of splitting a string column into two new columns.

In my example, a confidence interval (lower and upper bound) comes out of a function, and I would like to add both bounds directly to dat as new columns, e.g. named ci_lower and ci_upper.

Is there a straightforward way of doing this with dplyr, or do I need to insert the elements as a list column and then unnest?

NB Keep in mind that the confidence interval values are a function of a group of simulated values x, grouped by s; the CI values should be constant within a group.
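
For reference, the list-column route mentioned above could be sketched roughly as follows. This is only an illustration under assumptions: the toy data `dat_small` and the seed are hypothetical, and the interval is converted to a named list so that unnest_wider can pick up the column names.

```r
library(dplyr)
library(tidyr)

set.seed(1)  # assumed seed, only to make the sketch reproducible
# hypothetical toy data: 3 groups of 5 observations
dat_small <- data.frame(s = rep(1:3, each = 5), x = rnorm(15))

ci_studclt <- function(x, alpha = 0.05) {
  n <- length(x)
  S_n <- var(x)
  mean(x) + qt(c(alpha/2, 1 - alpha/2), df = n - 1) * sqrt(S_n / n)
}

res_list <- dat_small %>%
  group_by(s) %>%
  # wrap the two bounds in a length-1 list so they fit into one list column;
  # the length-1 list is recycled to every row of the group
  mutate(ci = list(as.list(setNames(ci_studclt(x), c("ci_lower", "ci_upper"))))) %>%
  ungroup() %>%
  # spread the named elements of the list column into separate columns
  unnest_wider(ci)

names(res_list)
```

This works, but it needs tidyr and an intermediate list column, which is the extra step the question is trying to avoid.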

zephryl · Accepted Answer

You can do this by having your function (or a wrapper function) return a data.frame. When you call it in mutate, don’t give the result a column name, or else you’ll end up with a nested data.frame column. If you want to specify names for the new columns, you can pass them as a function argument, as ci_wrapper() does below with names_out.
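
To make the naming caveat concrete, here is a small sketch on made-up data. The data frame `toy` and the helper `range_df()` are hypothetical and exist only for this illustration; as far as I know, splicing an unnamed data.frame result in mutate requires dplyr 1.0.0 or later.

```r
library(dplyr)

# hypothetical toy data: two groups of two rows each
toy <- data.frame(g = c(1, 1, 2, 2), x = c(1, 3, 10, 20))

# hypothetical helper returning a one-row data.frame with two columns
range_df <- function(x) data.frame(lo = min(x), hi = max(x))

# unnamed call: the returned columns are spliced into the result
spliced <- toy %>% group_by(g) %>% mutate(range_df(x))
names(spliced)  # "g" "x" "lo" "hi"

# named call: a single data.frame column called `rng` is created instead
nested <- toy %>% group_by(g) %>% mutate(rng = range_df(x))
names(nested)   # "g" "x" "rng"
```

In the named case the bounds are still there, but only as `nested$rng$lo` and `nested$rng$hi`, which is usually not what you want here.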


library(dplyr)

n <- 1e2; M <- 1e3
variance <- 1

x <- rnorm(n*M, 0, sqrt(variance))  # rnorm() takes a standard deviation, not a variance
s <- rep(1:M, each = n)

dat <- data.frame(s = s, x = x)

ci_studclt <- function(x, alpha = 0.05) {
  n <- length(x)
  S_n <- var(x)
  mean(x) + qt(c(alpha/2, 1 - alpha/2), df = n-1)*sqrt(S_n / n)
}

ci_wrapper <- function(x, alpha = 0.05, names_out = c("ci_lower", "ci_upper")) {
  ci <- ci_studclt(x, alpha = alpha)
  out <- data.frame(ci[[1]], ci[[2]])
  names(out) <- names_out
  out
}

# the original code was ci_studclt(x, variance), but ci_studclt() has no
# variance argument (the value would be passed as alpha), so I omitted it
dat %>% 
  group_by(s) %>% 
  mutate(ci_wrapper(x))

output:

# A tibble: 100,000 x 4
# Groups:   s [1,000]
       s       x ci_lower ci_upper
   <int>   <dbl>    <dbl>    <dbl>
 1     1  0.233    -0.223    0.139
 2     1  1.03     -0.223    0.139
 3     1  1.53     -0.223    0.139
 4     1  0.0150   -0.223    0.139
 5     1 -0.211    -0.223    0.139
 6     1 -1.13     -0.223    0.139
 7     1 -1.51     -0.223    0.139
 8     1  0.371    -0.223    0.139
 9     1  1.80     -0.223    0.139
10     1 -0.137    -0.223    0.139
# ... with 99,990 more rows

With specified column names:

dat %>% 
  group_by(s) %>% 
  mutate(ci_wrapper(x, names_out = c("ci.lo", "ci.hi")))

output:

# A tibble: 100,000 x 4
# Groups:   s [1,000]
       s       x  ci.lo ci.hi
   <int>   <dbl>  <dbl> <dbl>
 1     1  0.233  -0.223 0.139
 2     1  1.03   -0.223 0.139
 3     1  1.53   -0.223 0.139
 4     1  0.0150 -0.223 0.139
 5     1 -0.211  -0.223 0.139
 6     1 -1.13   -0.223 0.139
 7     1 -1.51   -0.223 0.139
 8     1  0.371  -0.223 0.139
 9     1  1.80   -0.223 0.139
10     1 -0.137  -0.223 0.139
# ... with 99,990 more rows
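
If you want to confirm that the bounds are constant within each group, as the question requires, a quick check along these lines may help. It is a sketch on smaller, hypothetical data (`dat_small`, with an assumed seed) that reuses ci_studclt() and ci_wrapper() from above.

```r
library(dplyr)

set.seed(42)  # assumed seed, only to make the sketch reproducible
# hypothetical smaller data set: 4 groups of 25 observations
dat_small <- data.frame(s = rep(1:4, each = 25), x = rnorm(100))

ci_studclt <- function(x, alpha = 0.05) {
  n <- length(x)
  S_n <- var(x)
  mean(x) + qt(c(alpha/2, 1 - alpha/2), df = n - 1) * sqrt(S_n / n)
}

ci_wrapper <- function(x, alpha = 0.05, names_out = c("ci_lower", "ci_upper")) {
  ci <- ci_studclt(x, alpha = alpha)
  out <- data.frame(ci[[1]], ci[[2]])
  names(out) <- names_out
  out
}

res <- dat_small %>%
  group_by(s) %>%
  mutate(ci_wrapper(x))

# count the distinct bound values per group; each count should be 1
check <- res %>%
  summarise(n_lower = n_distinct(ci_lower), n_upper = n_distinct(ci_upper))

all(check$n_lower == 1 & check$n_upper == 1)
```

The one-row data.frame returned by the wrapper is recycled across all rows of its group, which is why every group ends up with a single lower and a single upper value.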

Tags: r, dplyr

Anil

1 Answers

zephryl
