I'm asking this as a general/beginner question about R, not specific to the package I was using.
I have a data frame with 3 million rows and 15 columns. I don't consider this a huge data frame, but maybe I'm wrong.
I was running the following script and it has been running for 2+ hours - I imagine there must be something I can do to speed this up.
Code:
ddply(orders, .(ClientID), NumOrders=len(OrderID))
This is not an overly intensive computation - or at least I don't think it is.
In a database, you could add an index to a table to increase join speed. Is there a similar action in R I should be doing on import to make functions/packages run faster?
It looks to me like you want:
orders$NumOrders <- with(orders, ave(OrderID, ClientID, FUN = length))
(I'm not aware that len() function exists.)
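To show what that call produces, here is a minimal, self-contained sketch on a toy `orders` data frame (the column values below are made up for illustration): `ave()` computes the per-group count and broadcasts it back onto every row, with no external packages and no row-by-row looping, which is why it tends to be far faster than the `ddply()` call above.

```r
# Toy stand-in for the real 3M-row orders table (values are hypothetical)
orders <- data.frame(
  ClientID = c("a", "a", "b", "b", "b", "c"),
  OrderID  = 101:106
)

# Count orders per client and attach the count to every matching row
orders$NumOrders <- with(orders, ave(OrderID, ClientID, FUN = length))

orders$NumOrders
# 2 2 3 3 3 1  (client "a" has 2 orders, "b" has 3, "c" has 1)
```

On the indexing question: base R data frames have no direct equivalent of a database index, but vectorized grouped operations like `ave()` avoid the per-group function-call overhead that makes naive split-apply approaches slow on millions of rows.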