I recently started writing my own functions to speed up standard, repetitive tasks while analyzing data with R.
At the moment I'm working on a function with three arguments and have run into a problem I haven't been able to solve. I would like to have an optional grouping argument. The function should check whether a grouping argument was supplied and then continue with either subfunction 1 or subfunction 2.
But I always get the error "object not found" if the grouping argument is not NA. How can I do this?
Edit: In my case the filter is usually used to keep or drop certain valid or invalid years. If a grouping argument is supplied, more steps follow in the pipe than if there is none.
require(tidyverse)
Data <- mpg
userfunction <- function(DF,Filter,Group) {
  without_group <- function(DF) {
    DF %>%
      count(year)
  }
  with_group <- function(DF) {
    DF %>%
      group_by({{Group}}) %>%
      count(year) %>%
      pivot_wider(names_from=year, values_from=n) %>%
      ungroup() %>%
      mutate(across(.cols=2:ncol(.),.fns=~replace_na(.x, 0))) %>%
      mutate(Mittelwert=round(rowMeans(.[,2:ncol(.)],na.rm=TRUE),2))
  }
  Obj <- DF %>%
    ungroup() %>%
    {if(Filter!=FALSE) filter(.,eval(rlang::parse_expr(Filter))) else filter(.,.$year==.$year)} %>%
    {if(is.na(Group)) without_group(.) else with_group(.)}
  return(Obj)
}
For NA it already works:
> Data %>%
+ userfunction(FALSE,NA)
# A tibble: 2 x 2
year n
<int> <int>
1 1999 117
2 2008 117
With a grouping argument it does not work:
> Data %>%
+ userfunction(FALSE,manufacturer)
Error in DF %>% ungroup() %>% { : object 'manufacturer' not found
Edit: What I would expect from the above function would be the following output:
> Data %>% userfunction_exp(FALSE,manufacturer)
# A tibble: 15 x 4
manufacturer `1999` `2008` Mittelwert
<chr> <dbl> <dbl> <dbl>
1 audi 9 9 9
2 chevrolet 7 12 9.5
3 dodge 16 21 18.5
4 ford 15 10 12.5
5 honda 5 4 4.5
6 hyundai 6 8 7
7 jeep 2 6 4
8 land rover 2 2 2
9 lincoln 2 1 1.5
10 mercury 2 2 2
11 nissan 6 7 6.5
12 pontiac 3 2 2.5
13 subaru 6 8 7
14 toyota 20 14 17
15 volkswagen 16 11 13.5
Data %>% userfunction_exp("cyl==4",manufacturer)
# A tibble: 9 x 4
manufacturer `1999` `2008` mean
<chr> <dbl> <dbl> <dbl>
1 audi 4 4 4
2 chevrolet 1 1 1
3 dodge 1 0 0.5
4 honda 5 4 4.5
5 hyundai 4 4 4
6 nissan 2 2 2
7 subaru 6 8 7
8 toyota 11 7 9
9 volkswagen 11 6 8.5
2021-04-01 14:55: edited to add some information and add some steps to the pipe for function with_group.
I don't know what the Filter argument is used for, so I'll leave it as it is for now.
group_by(A) %>% count(B) gives the same counts as count(A, B), so you can change your function to:
library(tidyverse)
userfunction <- function(DF,Filter,Group = NULL) {
  DF %>%
    count(year, {{Group}}) %>%
    pivot_wider(names_from=year, values_from=n)
}
Data %>% userfunction(FALSE)
# `1999` `2008`
# <int> <int>
#1 117 117
Data %>% userfunction(FALSE,manufacturer)
# A tibble: 15 x 3
# manufacturer `1999` `2008`
# <chr> <int> <int>
# 1 audi 9 9
# 2 chevrolet 7 12
# 3 dodge 16 21
# 4 ford 15 10
# 5 honda 5 4
# 6 hyundai 6 8
# 7 jeep 2 6
# 8 land rover 2 2
# 9 lincoln 2 1
#10 mercury 2 2
#11 nissan 6 7
#12 pontiac 3 2
#13 subaru 6 8
#14 toyota 20 14
#15 volkswagen 16 11
Note that I have set the default value of Group to NULL, so when you don't supply it, count() simply ignores that argument.
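The function above leaves out the filter and the row mean from the question. A minimal sketch of how they might be added back, assuming Filter is still passed as a string (or FALSE/NULL for no filter) and using the hypothetical name userfunction2, could look like this:

```r
library(tidyverse)

# Hypothetical extension of the answer's function; the name and the
# NULL defaults are assumptions, not part of the original code.
userfunction2 <- function(DF, Filter = NULL, Group = NULL) {
  DF <- ungroup(DF)

  # Apply the filter only if one was given ("cyl==4"); FALSE or NULL skips it
  if (!is.null(Filter) && !isFALSE(Filter)) {
    DF <- filter(DF, !!rlang::parse_expr(Filter))
  }

  # enquo() captures Group without evaluating it, so a bare column name
  # such as manufacturer never triggers "object not found"
  if (rlang::quo_is_null(rlang::enquo(Group))) {
    return(count(DF, year))
  }

  wide <- DF %>%
    count({{ Group }}, year) %>%
    pivot_wider(names_from = year, values_from = n, values_fill = 0L)

  # Row mean over every column except the grouping column
  year_cols <- select(wide, -{{ Group }})
  mutate(wide, Mittelwert = round(rowMeans(year_cols), 2))
}

mpg %>% userfunction2()
mpg %>% userfunction2("cyl==4", manufacturer)
```

The key difference from the code in the question is that Group is never evaluated directly (no is.na(Group)); it is only inspected as a captured expression or forwarded with {{ }}.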
Hi, this is a good question!
There are multiple ways to achieve this, as the previous answers pointed out. One way to do it in the tidyverse is tidy evaluation.
I'm omitting your filter step here (which you could explain in more detail):
my_summary <- function(df, grouping_var) {
  grp_var <- enquo(grouping_var) #capture group variable
  df %>% my_group_by(grp_var)
}
my_group_by <- function(df, grouping_var){
  # Check if group is supplied
  if(rlang::quo_is_missing(grouping_var)) {
    df %>% without_group()
  } else {
    df %>% with_group(grouping_var)
  }
}
without_group <- function(df) {
  # do whatever without group
  df %>%
    count(year)
}
with_group <- function(df, grouping_var) {
  # do whatever with group
  df %>%
    group_by(!!grouping_var) %>% #Note the !!
    count(year) %>%
    pivot_wider(names_from=year, values_from=n)
}
Without any argument, this gives you:
> mpg %>% my_summary()
# A tibble: 2 x 2
year n
<int> <int>
1 1999 117
2 2008 117
With a grouping variable passed in:
> mpg %>% my_summary(model)
# A tibble: 38 x 3
# Groups: model [38]
model `1999` `2008`
<chr> <int> <int>
1 4runner 4wd 4 2
2 a4 4 3
3 a4 quattro 4 4
4 a6 quattro 1 2
5 altima 2 4
6 c1500 suburban 2wd 1 4
7 camry 4 3
8 camry solara 4 3
9 caravan 2wd 6 5
10 civic 5 4
# ... with 28 more rows
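The filter step that was left out can probably be handled in the same style. The following is only a sketch: my_filter is a hypothetical helper, and it assumes the condition is passed as a bare expression (cyl == 4) rather than as a string as in the question.

```r
library(tidyverse)

# Hypothetical helper: apply an optional, unquoted filter expression
my_filter <- function(df, filter_expr) {
  flt <- enquo(filter_expr)        # capture the expression unevaluated
  if (rlang::quo_is_missing(flt)) {
    df                             # no filter supplied: return data as is
  } else {
    filter(df, !!flt)              # inject the captured condition
  }
}

filtered   <- mpg %>% my_filter(cyl == 4)
unfiltered <- mpg %>% my_filter()
```

The result can then be piped into my_summary(), for example mpg %>% my_filter(cyl == 4) %>% my_summary(manufacturer).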