I've got a tibble like below:
structure(list(id = 1:11, var1 = c("A", "C", "B", "B", "B", "A",
"B", "C", "C", "C", "B"), var2 = list(NULL, NULL, NULL, structure(list(
x = c(0, 1, 23, 3), y = c(0.75149005651474, 0.149892757181078,
0.695984086720273, 0.0247649133671075)), row.names = c(NA,
-4L), class = c("tbl_df", "tbl", "data.frame")), NULL, NULL,
NULL, NULL, NULL, NULL, NULL)), row.names = c(NA, -11L), class = c("tbl_df",
"tbl", "data.frame"))
I'd like to keep only the rows where var2 is not NULL, but a plain !is.null() doesn't work: df %>% filter(!is.null(var2)) returns the whole df. Why is that, and how can I filter out all the rows with NULL in the var2 column?
One possibility also involving purrr could be:
df %>%
filter(!map_lgl(var2, is.null))
id var1 var2
<int> <chr> <list>
1 4 B <tibble [4 × 2]>
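If you would rather not load purrr, the same per-element test can be sketched in base R with vapply(). This is only an illustration under assumptions: the data frame below is a shortened, made-up stand-in for the one in the question, and the name `kept` is hypothetical.

```r
# Toy stand-in for the question's data: only the 4th element of var2 is non-NULL.
df <- data.frame(id = 1:5, var1 = c("A", "C", "B", "B", "B"))
df$var2 <- list(NULL, NULL, NULL, data.frame(x = c(0, 1), y = c(0.5, 0.25)), NULL)

# vapply() calls is.null() once per list element and returns one logical per row.
is_missing <- vapply(df$var2, is.null, logical(1))

# Keep the rows whose var2 element is not NULL.
kept <- df[!is_missing, ]
```

Here `kept` should contain only the row with id 4, matching the map_lgl() result above.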
Because is.null() is not vectorized (it tests its whole argument at once rather than element by element), you can also make it run once per row:
df %>%
rowwise() %>%
filter(!is.null(var2))
The function drop_na() from tidyr (documented as "Drop rows containing missing values") will also work for NULL. You just have to be careful with the edge case where the column holds both NULL and NA values and you only want to drop the NULLs for some reason.
library(tidyr)
df %>%
drop_na(var2)
# id var1 var2
# <int> <chr> <list>
# 1 4 B <tibble[,2] [4 x 2]>
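!is.null() doesn't work because is.null() is not vectorized: it checks the object you pass in as a whole, not each of its elements. Your var2 is a list column, so it is a nested list (a list of lists): its fourth element is a tibble, and a tibble is a list because it is a data.frame. is.null() only looks at the top level of that nested list, which is not NULL, so it returns a single FALSE. Negated, that becomes a single TRUE, which filter() recycles to every row.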
#show that the tibble is a list:
> is.list(df$var2[[4]])
[1] TRUE
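To make the difference concrete, here is a small sketch in base R. The list `var2` is a made-up stand-in for the column in the question, and the names `whole` and `per_element` are only illustrative.

```r
# A list shaped like the var2 column: NULL everywhere except position 4.
var2 <- list(NULL, NULL, NULL, data.frame(x = 1:2), NULL)

# is.null() looks at the list itself, which exists, so this is one FALSE.
whole <- is.null(var2)

# Applying is.null() to each element gives one answer per row instead.
per_element <- vapply(var2, is.null, logical(1))
```

`whole` has length 1, so `!whole` is a single TRUE that keeps every row, whereas `!per_element` is TRUE only at position 4.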
You can try filtering on lengths(df$var2) > 0
> lengths(df$var2)
[1] 0 0 0 2 0 0 0 0 0 0 0
# each of the columns of the tibble in var2[[4]] is one element
# of the list contained in var2[[4]]. Thus var2[[4]] is a list of length two
> df %>% filter(lengths(var2) > 0)
# A tibble: 1 x 3
id var1 var2
<int> <chr> <list>
1 4 B <tibble [4 x 2]>
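One thing to keep in mind with the lengths() approach: it treats every zero-length element as empty, not only NULL. A minimal sketch, using a made-up list `x` rather than the data from the question:

```r
# Three elements: a NULL, an empty character vector, and a real value.
x <- list(NULL, character(0), 1:3)

# lengths() reports 0 for both NULL and character(0).
len <- lengths(x)

# is.null() applied per element flags only the actual NULL.
nul <- vapply(x, is.null, logical(1))
```

So `len > 0` would drop the first two elements, while `!nul` would drop only the first. For the data in the question both give the same result, since every empty element there is NULL.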