
Clearing all values in a dataset while retaining attributes using tidyverse

Tags: r, dplyr, tidyverse

I need to create a blank version of a dataset: all values cleared, but with the column names and, importantly, the classes of the variables preserved.

Here's some toy data with three variables, each of a different class:

library(dplyr)

df <- data.frame(x = rnorm(5),
                 y = factor(letters[5:1]),
                 z = c(1:2, NA, 4:5))

glimpse(df)

# Rows: 5
# Columns: 3
# $ x <dbl> -0.24530142, -0.05332072, 0.12387791, -0.26148671, -0.53779766
# $ y <fct> e, d, c, b, a
# $ z <int> 1, 2, NA, 4, 5

Now when I try to clear the values using mutate and across in dplyr...

df %>%
  mutate(across(everything(),
                ~ NA)) -> blankDF

blankDF

#    x  y  z
# 1 NA NA NA
# 2 NA NA NA
# 3 NA NA NA
# 4 NA NA NA
# 5 NA NA NA

That looks good, but glimpse() shows a problem:

glimpse(blankDF)

# Rows: 5
# Columns: 3
# $ x <lgl> NA, NA, NA, NA, NA
# $ y <lgl> NA, NA, NA, NA, NA
# $ z <lgl> NA, NA, NA, NA, NA

The classes of all three variables have been lost: every column is now logical, because a bare NA is a logical value.

How can I create the blank dataset while retaining the original column classes?

A tidyverse solution would be nice, but any solution is appreciated.

llewmills asked Feb 04 '26 04:02


2 Answers

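You can set every value to NA while keeping each column's type by calling na_if() inside across(). Here na_if(.x, .x) compares each column with itself, so every value matches and is replaced with an NA of that column's own type: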

library(dplyr)

glimpse(df)
#> Rows: 5
#> Columns: 3
#> $ x <dbl> -0.2006935, 1.3461746, -0.1433400, -0.8983886, -0.3190282
#> $ y <fct> e, d, c, b, a
#> $ z <int> 1, 2, NA, 4, 5

df_output = df %>% 
  mutate(across(everything(), ~ na_if(.x, .x)))

glimpse(df_output)
#> Rows: 5
#> Columns: 3
#> $ x <dbl> NA, NA, NA, NA, NA
#> $ y <fct> NA, NA, NA, NA, NA
#> $ z <int> NA, NA, NA, NA, NA

Created on 2023-07-07 with reprex v2.0.2
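
To confirm that the classes really survive, the same call can be wrapped in a small function and checked with stopifnot(). This is only a sketch: blank_like() is a hypothetical helper name I made up for illustration, not part of dplyr, and the toy data is rebuilt so the snippet runs on its own.

```r
library(dplyr)

df <- data.frame(x = rnorm(5),
                 y = factor(letters[5:1]),
                 z = c(1:2, NA, 4:5))

# blank_like() is a hypothetical helper name, not a dplyr function
blank_like <- function(data) {
  data %>%
    mutate(across(everything(), ~ na_if(.x, .x)))
}

df_output <- blank_like(df)

# Compare column classes and factor levels before and after blanking
stopifnot(
  identical(sapply(df_output, class), sapply(df, class)),
  identical(levels(df_output$y), levels(df$y)),
  all(is.na(df_output))
)
```

If any column had changed class or lost its factor levels, stopifnot() would raise an error instead of finishing silently.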

Quinten answered Feb 05 '26 21:02


Subsetting with row index 0 returns a data frame with no rows that still keeps the column structure. Indexing that empty data frame with rows 1 to n then returns n rows of NA, because none of those rows exist:

blankDF <- df[0, ][seq_len(nrow(df)), ]

str(blankDF)
# 'data.frame': 5 obs. of  3 variables:
# $ x: num  NA NA NA NA NA
# $ y: Factor w/ 5 levels "a","b","c","d",..: NA NA NA NA NA
# $ z: int  NA NA NA NA NA
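
One side effect worth checking: because the requested rows do not exist, the row names of the result may come out as NA, NA.1, NA.2 and so on rather than 1 to n. A minimal sketch of the same approach with the row names reset afterwards (the reset step is my addition, not part of the answer above):

```r
df <- data.frame(x = rnorm(5),
                 y = factor(letters[5:1]),
                 z = c(1:2, NA, 4:5))

# Zero-row copy keeps the structure; indexing rows 1..n fills them with NA
blankDF <- df[0, ][seq_len(nrow(df)), ]

# Reset row names to the default sequence 1..n
rownames(blankDF) <- NULL

# Classes, factor levels and row count should match the original
stopifnot(
  identical(sapply(blankDF, class), sapply(df, class)),
  identical(levels(blankDF$y), levels(df$y)),
  nrow(blankDF) == nrow(df),
  all(is.na(blankDF))
)
```

This version needs only base R, so it also works when dplyr is not loaded.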

zx8754 answered Feb 05 '26 23:02


