My data frame has the customerID, product and date. I need to create a column that generates a transaction id everytime the customerID or the date changes.
My dataframe currently looks like:
I need the transactionID column to get populated using r
I have the data in a csv file. I am not able to create this column in excel as the file is too big and excel is unable to open the entire file.
Thank you for your help!
Depending on your responses to my comment, you may also be interested in .GRP from "data.table":
library(data.table)
## In case rows get out of order
DT <- as.data.table(df, keep.rownames = TRUE)
DT[, transactionID := .GRP, by = list(CustomerID, date)][]
rn CustomerID Product date transactionID
1: 1 23 abv 12-12-14 1
2: 2 23 gfs 12-12-14 1
3: 3 18 gra 12-12-14 2
4: 4 52 wer 14-12-14 3
5: 5 23 qwe 16-12-14 4
Assuming that the columns CustomerID and date are ordered,
indx <- as.character(interaction(df[c(1,3)]))
df$transactionID <- cumsum(c(TRUE,indx[-1]!=indx[-length(indx)]))
df$transactionID
#[1] 1 1 2 3 4
Or if the columns are not ordered, for example, suppose there is one more row (6th row) that duplicates the first row
indx1 <- c(indx, indx[1])
as.numeric(factor(indx1, levels=unique(indx1)))
#[1] 1 1 2 3 4 1
Or
match(indx1, unique(indx1))
#[1] 1 1 2 3 4 1
df <- structure(list(CustomerID = c(23L, 23L, 18L, 52L, 23L), Product =
c("abv", "gfs", "gra", "wer", "qwe"), date = c("12-12-14", "12-12-14",
"12-12-14", "14-12-14", "16-12-14")), .Names = c("CustomerID",
"Product", "date"), class = "data.frame", row.names = c(NA, -5L))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With