
dplyr mutate: create column using first occurrence of another column

Tags: r, dplyr

I was wondering if there's a more elegant way to take a data frame, group by x to count how many times each x occurs in the dataset, and then add a column holding the value of y at the first occurrence of each x.

test <- data.frame(x = c("a", "b", "c", "d", 
                         "c", "b", "e", "f", "g"),
                   y = c(1,1,1,1,2,2,2,2,2)) 
  x y
1 a 1
2 b 1
3 c 1
4 d 1
5 c 2
6 b 2
7 e 2
8 f 2
9 g 2

Current Output

output <- test %>% 
  group_by(x) %>%
  summarise(count = n())
  x     count
  <fct> <int>
1 a         1
2 b         2
3 c         2
4 d         1
5 e         1
6 f         1
7 g         1

Desired Output

  x     count first_seen
  <fct> <int> <dbl>
1 a         1     1
2 b         2     1
3 c         2     1
4 d         1     1
5 e         1     2
6 f         1     2
7 g         1     2

I can filter the test data frame for the first occurrences and then use a left_join, but I was hoping there's a more elegant solution using mutate.

# filter for first occurrences of y
right <- test %>% 
  group_by(x) %>% 
  filter(y == min(y)) %>% 
  slice(1) %>%
  ungroup()

# bind to the output dataframe
left_join(output, right, by = "x")
asked Oct 20 '25 18:10

MayaGans


2 Answers

We can use first() after grouping by 'x' to create a new column, add that column as a second grouping variable in group_by(), and then get the count with n():

library(dplyr)
test %>% 
   group_by(x) %>%
   # `.add` replaces the deprecated `add` argument in dplyr >= 1.0.0
   group_by(first_seen = first(y), .add = TRUE) %>% 
   summarise(count = n())
# A tibble: 7 x 3
# Groups:   x [7]
#  x     first_seen count
#  <fct>      <dbl> <int>
#1 a              1     1
#2 b              1     2
#3 c              1     2
#4 d              1     1
#5 e              2     1
#6 f              2     1
#7 g              2     1
answered Oct 23 '25 08:10

akrun
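The result above is still grouped by x and lists first_seen before count. As a minimal sketch, assuming dplyr 1.0.0 or later (where summarise() accepts `.groups` and group_by() uses `.add`), the same approach can drop the grouping and reorder the columns to match the desired output. The name `result` is only illustrative.

```r
library(dplyr)

# Sample data from the question
test <- data.frame(
  x = c("a", "b", "c", "d", "c", "b", "e", "f", "g"),
  y = c(1, 1, 1, 1, 2, 2, 2, 2, 2)
)

result <- test %>%
  group_by(x) %>%
  # first(y) is evaluated within each x group, then added as a grouping column
  group_by(first_seen = first(y), .add = TRUE) %>%
  # .groups = "drop" returns an ungrouped tibble
  summarise(count = n(), .groups = "drop") %>%
  # reorder columns to match the desired output
  select(x, count, first_seen)

print(result)
```

Dropping the groups avoids surprises if further mutate() or summarise() calls are chained onto the result.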


Why not keep it simple? Both the count and the first value of y can be computed in a single summarise() call, for example:

test %>% 
  group_by(x) %>% 
  summarise(
    count = n(), 
    first_seen = first(y)
    )
#> # A tibble: 7 x 3
#>   x     count first_seen
#>   <chr> <int>      <dbl>
#> 1 a         1          1
#> 2 b         2          1
#> 3 c         2          1
#> 4 d         1          1
#> 5 e         1          2
#> 6 f         1          2
#> 7 g         1          2
answered Oct 23 '25 09:10

perlatex
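Since the question asks specifically about mutate, here is a hedged sketch of a mutate-based variant. It is not taken from either answer: it adds both columns to every row and then collapses to one row per x with distinct(). The name `result` is only illustrative.

```r
library(dplyr)

# Sample data from the question
test <- data.frame(
  x = c("a", "b", "c", "d", "c", "b", "e", "f", "g"),
  y = c(1, 1, 1, 1, 2, 2, 2, 2, 2)
)

result <- test %>%
  group_by(x) %>%
  # mutate keeps all rows, repeating the group-level values on each one
  mutate(count = n(), first_seen = first(y)) %>%
  ungroup() %>%
  # keep one row per x, with only the columns of interest
  distinct(x, count, first_seen)

print(result)
```

Note that first(y) depends on the row order of the data. If the rows might not be sorted by y, min(y), as used in the filter step of the question, gives an order-independent result.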


