
Why do 'neutral' parentheses cause errors in R (or maybe the tidyverse)?

Tags:

r

dplyr

When I have large, complicated chunks of code, I'll often use more sets of parentheses than are required. My code could look like this:

library(tidyverse)
mtcars %>% 
  mutate(name = rownames(.)) %>% 
  filter((cyl == 4 & grepl("Toyota", name)) | 
           ((cyl == 6 | cyl == 8), grepl("Mazda", name)))

instead of this:

mtcars %>% 
  mutate(name = rownames(.)) %>% 
  filter(cyl == 4 & grepl("Toyota", name) | 
           (cyl == 6 | cyl == 8), grepl("Mazda", name))

What can I say? In my head, the protective parentheses help me see De Morgan's laws, PEMDAS, computational order, etc. more easily.

This liberal use of parentheses presents a problem. Even though I would think extra parentheses should be neutral in R and the tidyverse, they appear not to be. Look at the error I get in the code chunk below.

mtcars %>% filter((cyl == 4 & am == 1)) %>% .[1, 3]
# [1] 108
mtcars %>% filter((cyl == 4, am == 1)) %>% .[1, 3]
# Error: unexpected ',' in "mtcars %>% filter((cyl == 4,"

Why does the first example directly above work, while the second throws an error? I know the direct answer is "you're using too many parentheses", but why can't I? It just makes life easier, for me, in these logical operator nest nightmares. I am aware of the case_when() function, which does help in these types of circumstances. I still want to know why I can't use extra parentheses, though. Thank you.

asked Nov 17 '25 09:11 by Display name


2 Answers

The problem here is that you're using a "feature" of filter: conditions passed into ... and separated by commas are "automagically" joined with an &. But the parsing magic that makes that work isn't smart enough to see through a single set of parens.

So it sees just one condition, cyl == 4, am == 1, which isn't syntactically valid as a single boolean expression. This is one of the reasons why I rather dislike this feature and always write out the &'s.

I suspect that since this feature of filter is fairly complicated to implement, there's not much benefit in trying to get it to parse the conditions in a more recursive manner. The perhaps unsatisfying answer is that passing conditions separated by commas is fine if you are doing something very simple, but if your boolean condition is complicated you should be more explicit.
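
As a small illustration of the comma feature, here is a sketch that assumes dplyr is installed and uses the built-in mtcars data from the question. It filters once with comma-separated conditions and once with the & written out. Both calls should select the same rows, but only the explicit form can be wrapped in an extra set of parentheses.

```r
library(dplyr)

# Conditions separated by commas are combined with & by filter()
with_commas <- mtcars %>% filter(cyl == 4, am == 1)

# The same filter with the & written out; extra parentheses are harmless here
with_and <- mtcars %>% filter((cyl == 4 & am == 1))

# Compare the two results
identical(with_commas, with_and)
```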

answered Nov 19 '25 22:11 by joran


What MrFlick has commented is the main reason for your problem, but there is still some inconsistency in your example. We can use the rlang package to see what dplyr does with your input. By the way, this isn't really magic: it's just using R's standard features for non-standard evaluation (R does have to parse your code, after all).

First the expression that was given to filter without extra parentheses:

library(rlang)

exprs(cyl == 4 & grepl("Toyota", name) | 
        (cyl == 6 | cyl == 8), grepl("Mazda", name))
[[1]]
cyl == 4 & grepl("Toyota", name) | (cyl == 6 | cyl == 8)

[[2]]
grepl("Mazda", name)

So after joining everything with & you do get the expression that you want, but what's important to note is that you get 2 expressions because the comma separated them, and filter will parse each separately using standard R rules. You could write the expression like this for clarity:

exprs(cyl == 4 & grepl("Toyota", name) | (cyl == 6 | cyl == 8), 
      grepl("Mazda", name))
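
The same splitting can be seen without rlang. The sketch below uses only base R; capture_conditions() and combined are names made up for this illustration, and this is not meant to show how dplyr implements filter internally. It captures the unevaluated arguments, shows that the comma produced two of them, and joins them with & by hand.

```r
# Hypothetical helper: return the unevaluated arguments as a list of calls
capture_conditions <- function(...) {
  as.list(substitute(list(...)))[-1]
}

conds <- capture_conditions(cyl == 4 & am == 1, gear == 4)
length(conds)  # 2: the comma separated two arguments

# Join the captured conditions with & into a single call
combined <- Reduce(function(a, b) call("&", a, b), conds)
combined

# Evaluate the combined condition using the columns of mtcars
rows <- eval(combined, mtcars)
mtcars[rows, c("cyl", "am", "gear")]
```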

If you add the extra parentheses, filter would see a single expression with everything you wrote. If R tried to parse that expression, it would see (after removing the parentheses):

cyl == 4 & grepl("Toyota", name) | (cyl == 6 | cyl == 8), grepl("Mazda", name)

Which, like MrFlick said, is not valid in R.
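
This can be checked with base R alone, without dplyr or any data, because the failure happens while the code is being parsed. In the sketch below, parses() is a hypothetical helper written for this illustration: it returns TRUE when a string is syntactically valid R and FALSE otherwise.

```r
# Hypothetical helper: TRUE if the string parses as R code, FALSE otherwise
parses <- function(code) {
  tryCatch({
    parse(text = code)
    TRUE
  }, error = function(e) FALSE)
}

parses("(cyl == 4 & am == 1)")   # TRUE: one expression inside parentheses
parses("(cyl == 4, am == 1)")    # FALSE: a comma inside plain parentheses
parses("f(cyl == 4, am == 1)")   # TRUE: a comma separating function arguments
```

A comma is only meaningful to the parser as a separator between arguments (or subscripts), so it is accepted directly inside a function call but not inside a grouping pair of parentheses.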

answered Nov 20 '25 00:11 by Alexis


