 

Actions to speed up R calculations

Tags: r, data.table, plyr

I'm asking this as a general/beginner question about R, not specific to the package I was using.

I have a dataframe with 3 million rows and 15 columns. I don't consider this a huge dataframe, but maybe I'm wrong.

I am running the following script, and it has been going for 2+ hours. I imagine there must be something I can do to speed this up.

Code:

ddply(orders, .(ClientID), NumOrders=len(OrderID))

This is not an overly intensive script, or again, I don't think it is.

In a database, you could add an index to a table to increase join speed. Is there a similar action in R I should be doing on import to make functions/packages run faster?

mikebmassey asked Jan 20 '26 23:01



1 Answer

Looks to me that you might want:

orders$NumOrders <- with(orders, ave(OrderID, ClientID, FUN = length))

(I'm not aware that a len() function exists in R; the base function is length().)
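
To make that concrete, here is a minimal sketch on made-up data. The toy `orders` values and the names `counts`, `dt` and `dt_counts` are my own assumptions, not taken from the question. It shows the per-row count with ave(), a one-row-per-client summary with table(), and, if the data.table package happens to be installed, a keyed version. As far as I know, setkey() is the closest thing R has to adding an index in a database, though I have not timed any of this on 3 million rows.

```r
# Toy stand-in for the real `orders` data frame (values are made up).
orders <- data.frame(
  ClientID = c(1, 1, 2, 3, 3, 3),
  OrderID  = 101:106
)

# Per-row count: every row receives the number of orders for its client,
# so the result has the same length as the input.
orders$NumOrders <- with(orders, ave(OrderID, ClientID, FUN = length))

# One row per client, which is what the ddply call seems to be aiming for.
counts <- as.data.frame(table(ClientID = orders$ClientID),
                        responseName = "NumOrders")

# Optional data.table version (skipped if the package is not installed).
# setkey() sorts the table by ClientID and marks it as keyed, which plays
# roughly the role of a database index for grouping and joins.
if (requireNamespace("data.table", quietly = TRUE)) {
  dt <- data.table::as.data.table(orders)
  data.table::setkey(dt, ClientID)
  dt_counts <- dt[, .(NumOrders = .N), by = ClientID]
}
```

If you would rather stay with plyr, I believe the call needs a summarising function and length() in place of len(), along the lines of ddply(orders, .(ClientID), summarise, NumOrders = length(OrderID)), but I would expect that to remain the slowest of these options on a table this size.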

IRTFM answered Jan 23 '26 14:01
