Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Assign cumulative identifier to unique sets for each ID

I have a dataframe which contains two columns, id column id and an observation column val with observations from a-g in random order.

data.frame(
  id = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L),
  val = c("a", "b", "c", "d", "e", "f", "a", "b", "b", "a", "c", "e", "d", "a", "b","c","f","g")
)

   id val
1   1   a
2   1   b
3   1   c
4   1   d
5   1   e
6   2   f
7   2   a
8   2   b
9   3   b
10  3   a
11  3   c
12  3   e
13  3   d
14  4   a
15  4   b
16  4   c
17  4   f
18  4   g

I want to assign an (cumulative) identifier valid for each unique set of observations (belonging to each id) given in column val, like so:

   id val valid
1   1   a     1
2   1   b     1
3   1   c     1
4   1   d     1
5   1   e     1
6   2   f     2
7   2   a     2
8   2   b     2
9   3   b     1
10  3   a     1
11  3   c     1
12  3   e     1
13  3   d     1
14  4   a     3
15  4   b     3
16  4   c     3
17  4   f     3
18  4   g     3
like image 406
maxxzi Avatar asked Nov 01 '25 11:11

maxxzi


1 Answers

You can try the base R option like below

u <- with(df, by(val, id, sort))

dfout <- merge(
    df,
    setNames(
        stack(setNames(match(u, unique(u)), names(u))),
        c("valid", "id")
    ),
    all = TRUE
)

and you will obtain

> dfout
   id val valid
1   1   a     1
2   1   b     1
3   1   c     1
4   1   d     1
5   1   e     1
6   2   f     2
7   2   a     2
8   2   b     2
9   3   b     1
10  3   a     1
11  3   c     1
12  3   e     1
13  3   d     1
14  4   a     3
15  4   b     3
16  4   c     3
17  4   f     3
18  4   g     3
like image 189
ThomasIsCoding Avatar answered Nov 03 '25 03:11

ThomasIsCoding