
Create indicator

I would like to create a numeric indicator for a data frame such that, for each unique value of one variable, it holds a sequence whose length is given by the corresponding value of another variable. For example:

frame <- data.frame(x = c("a", "a", "a", "b", "b"), y = c(3, 3, 3, 2, 2))
frame
  x y
1 a 3
2 a 3
3 a 3
4 b 2
5 b 2

The indicator, z, should look like this:

  x y z
1 a 3 1
2 a 3 2
3 a 3 3
4 b 2 1
5 b 2 2

Any and all help greatly appreciated. Thanks.

coding_heart asked Jan 18 '26 23:01


2 Answers

No ave?

frame$z <- with(frame, ave(y,x,FUN=seq_along) )
frame

#  x y z
#1 a 3 1
#2 a 3 2
#3 a 3 3
#4 b 2 1
#5 b 2 2
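
For anyone unfamiliar with ave: it splits its first argument by the grouping variable, applies FUN to each piece, and puts the results back in the original row order. A rough base R sketch of the same idea with split and unsplit follows; the name frame2 is only for illustration and is not part of the question.

```r
# Illustrative copy of the question's data
frame2 <- data.frame(x = c("a", "a", "a", "b", "b"), y = c(3, 3, 3, 2, 2))

# Split y by x, number the elements within each piece, then reassemble
pieces <- split(frame2$y, frame2$x)
frame2$z <- unsplit(lapply(pieces, seq_along), frame2$x)

# Check that this agrees with the ave() one-liner
all(frame2$z == ave(frame2$y, frame2$x, FUN = seq_along))
```

The ave call is shorter and does the same bookkeeping internally, so the split version is mainly useful for seeing what happens to each group.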

A data.table version could be something like below (thanks to @mnel):

library(data.table)
frame <- as.data.table(frame)
# .N is the number of rows in the current group
frame[, z := seq_len(.N), by = x]

My original thought was to use:

frame[,z := .SD[,.I], by=x]

where .SD refers to each subset of the data.table split by x, and .I gives row numbers. Used at the top level, .I covers the rows of the entire data.table; inside .SD[,.I] it covers only the rows of that subset, so the result is the row number within each group. As @mnel points out, though, this is inefficient compared to the other method, because the entire .SD needs to be loaded into memory for each group just to run this calculation.
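
To see the special symbols side by side, here is a small sketch that assumes the data.table package is installed. The table name dt is only for illustration.

```r
library(data.table)

# Illustrative copy of the question's data
dt <- data.table(x = c("a", "a", "a", "b", "b"), y = c(3, 3, 3, 2, 2))

dt[, .I]                        # row numbers across the whole table
dt[, .N, by = x]                # number of rows in each group
dt[, z := seq_len(.N), by = x]  # counter that restarts for each group
dt
```

Because seq_len(.N) only needs the group size, it avoids building .SD for every group.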

thelatemail answered Jan 20 '26 16:01


Another approach:

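# as.character() works whether x is a factor or a character vector; as.numeric() on a character column gives NA
frame$z <- unlist(lapply(rle(as.character(frame$x))$lengths, seq_len))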
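
One caveat with rle: it counts runs of consecutive equal values, so this approach assumes the rows of each group sit next to each other. The sketch below shows the difference on unsorted input. The helper run_counter and the vectors are made up for illustration, and as.character is used so the helper accepts either a factor or a character vector.

```r
x_sorted   <- c("a", "a", "a", "b", "b")
x_unsorted <- c("a", "b", "a", "b", "a")

# Hypothetical helper: number the elements within each run of equal values
run_counter <- function(x) {
  unlist(lapply(rle(as.character(x))$lengths, seq_len))
}

run_counter(x_sorted)    # 1 2 3 1 2
run_counter(x_unsorted)  # 1 1 1 1 1, because every run has length 1

# ave() groups by value rather than by run, so it handles unsorted input
ave(seq_along(x_unsorted), x_unsorted, FUN = seq_along)  # 1 1 2 2 3
```

If the data may not be sorted by the grouping variable, either sort it first or use a grouping-based method such as ave.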

Tyler Rinker answered Jan 20 '26 15:01