I am trying to create a list of data with NA values for each ID# in a dataframe so I can keep track of missing data. I have a dataframe where each row is an ID# and each column is a variable name. Each ID# may have different missing data so I want to condense this information down into a two column table. Currently, my table looks something like this:
ID Var1 Var2 Var3 Var4 Var5
1 10 T NA 2 NA
2 15 F 50 2 NA
3 12 NA 41 2 NA
4 NA NA NA 1 NA
5 NA F NA NA NA
...
I am hoping to get output that looks something like this:
ID Missing Variables
1 Var3, Var5
2 Var5
3 Var2, Var5
4 Var1, Var2, Var3, Var5
5 Var1, Var3, Var4, Var5
...
I'm confused how I would return to column name for each missing data entry. I know you can probably do something like this with a for loop but beyond that I get a little lost. Any help is much appreciated!
Here is a tidyverse
solution.
df <- read_table("
ID Var1 Var2 Var3 Var4 Var5
1 10 T NA 2 NA
2 15 F 50 2 NA
3 12 NA 41 2 NA
4 NA NA NA 1 NA
5 NA F NA NA NA", col_names = TRUE)
library(dplyr)
library(tidyr)
df %>%
mutate(across(starts_with("var"), is.na)) %>% # replace all NA with TRUE and else FALSE
pivot_longer(-ID, names_to = "var") %>% # pivot longer
filter(value) %>% # remove the FALSE rows
group_by(ID) %>% # group by the ID
summarise(`Missing Variables` = toString(var)) # convert the variable names to a string column
`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 5 x 2
ID `Missing Variables`
<dbl> <chr>
1 1 Var3, Var5
2 2 Var5
3 3 Var2, Var5
4 4 Var1, Var2, Var3, Var5
5 5 Var1, Var3, Var4, Var5
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With