while using dplyr i'm having trouble changing the last value my data frame. i want to group by user and tag and change the Time to 0 for the last value / row in the group.
     user_id     tag   Time
1  268096674       1    3
2  268096674       1    10
3  268096674       1    1
4  268096674       1    0
5  268096674       1    9999
6  268096674       2    0
7  268096674       2    9
8  268096674       2    500
9  268096674       3    0
10 268096674       3    1
...
Desired output:
     user_id     tag   Time
1  268096674       1    3
2  268096674       1    10
3  268096674       1    1
4  268096674       1    0
5  268096674       1    0
6  268096674       2    0
7  268096674       2    9
8  268096674       2    0
9  268096674       3    0
10 268096674       3    1
...
I've tried to do something like this, among others and can't figure it out:
df %>%
  group_by(user_id,tag) %>%
  mutate(tail(Time) <- 0)
I tried adding a row number as well, but couldn't quite put it all together. any help would be appreciated.
You can do that by using the function arrange from dplyr. 2. Use the dplyr filter function to get the first and the last row of each group.
Description. Most data operations are done on groups defined by variables. group_by() takes an existing tbl and converts it into a grouped tbl where operations are performed "by group". ungroup() removes grouping.
By using group_by() function from dplyr package we can perform group by on multiple columns or variables (two or more columns) and summarise on multiple columns for aggregations. Later, I will also explain how to apply summarise() on all columns and finally use multiple aggregation functions together.
table function, the base R function is almost 4 times and the dplyr function is 3 times slower!
Here's an option:
df %>%
  group_by(user_id, tag) %>%
  mutate(Time = c(Time[-n()], 0))
#Source: local data frame [10 x 3]
#Groups: user_id, tag
#
#     user_id tag Time
#1  268096674   1    3
#2  268096674   1   10
#3  268096674   1    1
#4  268096674   1    0
#5  268096674   1    0
#6  268096674   2    0
#7  268096674   2    9
#8  268096674   2    0
#9  268096674   3    0
#10 268096674   3    0
What I did here is: create a vector of the existing column "Time" with all elements except for the last one in the group, which has the index n() and add to that vector a 0 as last element using c() for concatenation.
Note that in my output the Time value in row 10 is also changed to 0 because it is considered the last entry of the group.
I would like to offer an alternative approach which will avoid copying the whole column (what both Time[-n()] and replace do) and allow modifying in place
library(data.table)
indx <- setDT(df)[, .I[.N], by = .(user_id, tag)]$V1 # finding the last incidences per group
df[indx, Time := 0L] # modifying in place
df
#       user_id tag Time
#  1: 268096674   1    3
#  2: 268096674   1   10
#  3: 268096674   1    1
#  4: 268096674   1    0
#  5: 268096674   1    0
#  6: 268096674   2    0
#  7: 268096674   2    9
#  8: 268096674   2    0
#  9: 268096674   3    0
# 10: 268096674   3    0
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With