Ranking elements within a data.frame

Tags: dataframe, r

Let's say I have a data frame, like this:

df <- data.frame(
  variable = rep(letters[1:10], 2),
  y2 = 1:10,
  y1 = c(10, 9, 8, 7, 6, 5, 4, 2, 1, 3),
  stat = c(rep(letters[1], 10), rep(letters[2], 10))
)

Within each "stat" group, I would like to create three new columns: one giving the rank of y1, one giving the rank of y2, and one giving the change in rank between y1 and y2 (short for year 1 and year 2).

I've been tinkering with ddply, but I can't seem to get it to do what I want. Here's an example of what I've tried (which may also illustrate what I'm attempting to do):

ddply(df, .(stat), function(x) data.frame(
  df,
  y1rank = rank(x$x),
  y2rank = rank(x$y),
  change = rank(x$y) - rank(x$x)
))
Brandon Bertelsen asked Mar 06 '26 16:03



2 Answers

You can also use the new mutate function, which lets later columns refer to columns created earlier in the same call, so you don't have to recalculate the ranks:

library(plyr)  # provides ddply() and mutate()

ddply(df, .(stat), mutate,
    y1rank = rank(y1),
    y2rank = rank(y2),
    change = y2rank - y1rank  # reuses the two columns defined just above
)
hadley answered Mar 08 '26 08:03



Would this work for you?

library(plyr)  # provides ddply(); transform() is base R

ddply(df, .(stat), transform,
    y1rank = rank(y1),
    y2rank = rank(y2),
    change = rank(y2) - rank(y1)  # transform() cannot see y1rank/y2rank yet
)
crayola answered Mar 08 '26 09:03

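
For comparison, here is a minimal base R sketch of the same per-group ranking that does not need plyr. It is not what either answer above uses: it swaps in ave(), which applies a function within each level of a grouping variable and returns a vector aligned with the original rows. It assumes the df from the question and the default tie handling of rank() (ties get the average rank).

```r
# Same example data as in the question; y2 and y1 have length 10
# and are recycled to fill the 20 rows.
df <- data.frame(
  variable = rep(letters[1:10], 2),
  y2 = 1:10,
  y1 = c(10, 9, 8, 7, 6, 5, 4, 2, 1, 3),
  stat = c(rep(letters[1], 10), rep(letters[2], 10))
)

# Rank y1 and y2 separately within each level of stat.
df$y1rank <- ave(df$y1, df$stat, FUN = rank)
df$y2rank <- ave(df$y2, df$stat, FUN = rank)

# Change in rank from year 1 to year 2.
df$change <- df$y2rank - df$y1rank

head(df)
```

Because ave() keeps the original row order, the new columns can be assigned straight back onto df, whereas ddply returns a new data frame assembled group by group.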