 

Learning to understand plyr, ddply

I've been trying to understand what plyr does and how it works by experimenting with different variables and functions and seeing what results. So I'm looking more for an explanation of how plyr works than for specific fix-it answers. I've read the documentation, but my newbie brain still isn't getting it.

Some data and names:

mydf <- data.frame(c("a","a","b","b","c","c"),
                   c("e","e","e","e","e","e"),
                   c(1,2,3,10,20,30),
                   c(5,10,20,20,15,10))
colnames(mydf) <- c("Model", "Class", "Length", "Speed")
mydf

Question 1: Summarise versus Transform Syntax

So if I enter:

ddply(mydf, .(Model), summarise, sum = Length+Length)

I get:

  Model ..1
1     a   2
2     a   4
3     b   6
4     b  20
5     c  40
6     c  60

And if I enter:

ddply(mydf, .(Model), summarise, Length+Length)

I get the same result.

Now if I use transform:

ddply(mydf, .(Model), transform, sum = (Length+Length))

I get:

  Model Class Length Speed sum
1     a     e      1     5   2
2     a     e      2    10   4
3     b     e      3    20   6
4     b     e     10    20  20
5     c     e     20    15  40
6     c     e     30    10  60

But if I write it like the first summarise call:

ddply(mydf, .(Model), transform, (Length+Length))

I get:

  Model Class Length Speed
1     a     e      1     5
2     a     e      2    10
3     b     e      3    20
4     b     e     10    20
5     c     e     20    15
6     c     e     30    10

So why does adding "sum =" make a difference?

Question 2: Why don't these work?

ddply(mydf, .(Model), sum, Length+Length) #Error in function (i) : object 'Length' not found

ddply(mydf, .(Model), length, mydf$Length) #Error in .fun(piece, ...) : 2 arguments passed to 'length' which requires 1

These examples are mainly meant to show that I'm fundamentally misunderstanding something about how to use plyr.

Any answers or explanations are appreciated.

rsgmon asked Jul 06 '12 22:07



2 Answers

I find that when I'm having trouble "visualizing" how any of the functional tools in R work, the easiest thing to do is to drop into browser() for a single instance:

ddply(mydf, .(Model), function(x) browser() ) 

Then inspect x in real time and it should all make sense. You can then test your function on x, and if it works you're golden (barring other groups being different from your first x).
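A non-interactive variant of the same idea, offered as my own sketch rather than part of the answer above, is to have plyr hand you the pieces themselves. It assumes the plyr package is installed and rebuilds the question's data with the column names set inline. `dlply()` splits the data the same way `ddply()` does but returns a list with one data frame per Model, so each element is what `x` would be inside your function.

```r
library(plyr)

# The question's data, with column names given up front.
mydf <- data.frame(Model  = c("a", "a", "b", "b", "c", "c"),
                   Class  = c("e", "e", "e", "e", "e", "e"),
                   Length = c(1, 2, 3, 10, 20, 30),
                   Speed  = c(5, 10, 20, 20, 15, 10))

# One list element per Model; each element is the piece ddply() would pass.
pieces <- dlply(mydf, .(Model), function(x) x)
pieces[["b"]]

# Alternatively, print every piece as ddply() visits it and return it unchanged.
invisible(ddply(mydf, .(Model), function(x) {
  print(x)
  x
}))
```

Once a single piece looks the way you expect, you can try your real function on `pieces[["b"]]` directly before handing it to `ddply()`.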

Ari B. Friedman answered Sep 22 '22 05:09


The syntax is:

ddply(data.frame, variable(s), function, optional arguments) 

where the function is expected to return a data.frame. In your situation,

  • summarise (from plyr) creates a new data.frame containing the results of the expressions you provide as further arguments (...); any names you give those expressions become the column names.

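  • transform, a base R function, transforms each data.frame (the pieces produced by splitting on the variable(s)), adding new columns according to the expression(s) you provide as further arguments. These need to be named; that's just the way transform works. If you pass only an unnamed expression, transform returns the piece unchanged, which is why your second transform call gave back the original columns.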
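A minimal sketch of the two bullets, assuming plyr is installed and using the question's data (column names set inline). The automatic name for an unnamed summarise expression differs between plyr versions (the question shows `..1`), so that output is left unstated here; for transform, the point is that a lone unnamed expression adds nothing.

```r
library(plyr)

mydf <- data.frame(Model  = c("a", "a", "b", "b", "c", "c"),
                   Class  = c("e", "e", "e", "e", "e", "e"),
                   Length = c(1, 2, 3, 10, 20, 30),
                   Speed  = c(5, 10, 20, 20, 15, 10))

# summarise(): builds a new data frame from the expressions, named or not.
s_named   <- ddply(mydf, .(Model), summarise, sum = Length + Length)
s_unnamed <- ddply(mydf, .(Model), summarise, Length + Length)
names(s_named)    # "Model" "sum"
names(s_unnamed)  # "Model" plus an automatic, version-dependent name

# transform(): keeps the existing columns and adds the named expressions.
t_named   <- ddply(mydf, .(Model), transform, sum = Length + Length)
t_unnamed <- ddply(mydf, .(Model), transform, Length + Length)
names(t_named)    # "Model" "Class" "Length" "Speed" "sum"
names(t_unnamed)  # "Model" "Class" "Length" "Speed" (nothing added)
```

Both summarise calls compute the same values; only the column name changes. With transform, the name is what tells it which column to add.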

If you use functions other than subset, transform, mutate, with, within, or summarise, you'll need to make sure they return a data.frame (length and sum don't), or at the very least a vector of appropriate length for the output.
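To make Question 2 concrete: as the error message shows, `ddply()` calls the function as `.fun(piece, ...)`. So `length(piece, mydf$Length)` receives two arguments, and in `sum(piece, Length + Length)` the extra argument is evaluated in the environment you called `ddply()` from (your workspace), where there is no `Length`. Below is a sketch of a few workarounds, assuming plyr is installed; the column names `total` and `n` are just illustrative.

```r
library(plyr)

mydf <- data.frame(Model  = c("a", "a", "b", "b", "c", "c"),
                   Class  = c("e", "e", "e", "e", "e", "e"),
                   Length = c(1, 2, 3, 10, 20, 30),
                   Speed  = c(5, 10, 20, 20, 15, 10))

# 1. An anonymous function that takes the column from each piece.
#    A single number per group becomes a column that ddply() names V1.
r1 <- ddply(mydf, .(Model), function(x) sum(x$Length + x$Length))
r1

# 2. summarise() evaluates its expressions inside each piece.
r2 <- ddply(mydf, .(Model), summarise,
            total = sum(Length + Length),
            n     = length(Length))
r2

# 3. Returning a data.frame yourself also works.
r3 <- ddply(mydf, .(Model), function(x) data.frame(n = nrow(x)))
r3
```

Option 1 relies on ddply() accepting a length-one vector per piece; options 2 and 3 return data frames, which is the case the answer describes.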

baptiste answered Sep 21 '22 05:09