data frame (matrix) performance: memory layout

Tags:

r

I'm new to R. Assume the memory layout is the same for data frames and matrices.

In the following matrix

a=matrix(1:10000000,1000000,10)

it has 1M rows and 10 columns. Is the memory physically sequential by row or by column? That is, does the physical memory store [1,1],[2,1],[3,1],...,[1M,1],[1,2],... first, or [1,1],[1,2],...,[1,10],[2,1],...?
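For reference when reading the size estimate in the next paragraph, the actual footprint of this matrix can be checked directly. A minimal sketch: `1:10000000` is an integer sequence, so each element takes 4 bytes (a double matrix would take 8 per element), which puts this matrix at roughly 40 MB rather than 100 MB.

```r
# Build the matrix from the question. 1:10000000 is an integer sequence,
# so each element is stored as a 4-byte integer.
a <- matrix(1:10000000, 1000000, 10)

typeof(a)                           # "integer"
dim(a)                              # 1000000 10
format(object.size(a), units = "Mb")  # about 40 million bytes of data
```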

Suppose the matrix with its 10M elements takes about 100 MB and the L2 cache is 4 MB; the L2 cache then can't hold all 10M elements. If we process the data sequentially, the L2 cache miss ratio will be lower. In our case, we need to process the data row by row, reading several columns at the same time (say columns A, B and C) to compute some result. If the matrix stored the 10 items of row 1 first, then the 10 items of row 2, and so on, performance might be better.
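In R, the usual way to get sequential memory access for a "row by row, several columns" computation is to operate on whole columns at once: with column-major storage each `a[, j]` is one contiguous block. A minimal sketch, assuming a hypothetical per-row formula `A * B + C` on the first three columns and a small stand-in matrix:

```r
# Small stand-in for the question's matrix (column-major storage)
n <- 1000L
a <- matrix(as.numeric(1:(n * 10)), n, 10)

# Row-by-row loop: a[i, 1], a[i, 2], a[i, 3] are n elements apart in memory
row_loop <- numeric(n)
for (i in seq_len(n)) {
  row_loop[i] <- a[i, 1] * a[i, 2] + a[i, 3]
}

# Vectorized: each a[, j] is a contiguous column, read sequentially
vectorized <- a[, 1] * a[, 2] + a[, 3]

all.equal(row_loop, vectorized)  # TRUE
```

Both give the same result. In practice, much of the speed difference in R comes from avoiding the interpreted loop itself, with cache behaviour as a secondary factor.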

Is there any way to control the memory layout?

asked Oct 29 '25 10:10 by Daniel Wu


1 Answer

Matrices are stored column-wise:

> m=matrix(1:12,nrow=3)
> m
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
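Column-major storage means element `[i, j]` of a matrix with `nrow` rows sits at linear position `(j - 1) * nrow + i` of the underlying vector. A short check on the same 3 x 4 matrix (the variable names `i`, `j`, `pos` are just for illustration):

```r
m <- matrix(1:12, nrow = 3)

# Column-major: element [i, j] sits at position (j - 1) * nrow + i
i <- 2; j <- 3
pos <- (j - 1) * nrow(m) + i    # 8
m[i, j] == as.vector(m)[pos]    # TRUE
m[pos]                          # 8 (linear indexing uses the same order)
```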

Data frames are just pretty lists, and a list is stored as a vector of pointers to its elements. Each column of a data frame is therefore its own contiguous vector, but I'm not sure there is any guarantee that the columns sit next to each other in memory.
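This can be seen from R itself: a data frame has type `"list"`, each column is an ordinary atomic vector, and converting to a matrix copies the columns into a single column-major block. A small sketch with a made-up three-column data frame:

```r
df <- data.frame(A = c(1, 2, 3), B = c(4, 5, 6), C = c(7, 8, 9))

typeof(df)        # "list"
is.list(df)       # TRUE
length(df)        # 3 (one element per column)

# Each column is an ordinary atomic vector of its own
typeof(df$A)      # "double"

# as.matrix() copies the columns into one column-major block
m <- as.matrix(df)
as.vector(m)      # 1 2 3 4 5 6 7 8 9
```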

Read the "Writing R Extensions" manual for more information on how memory is handled. As far as I know, there's no way to control the memory layout. Don't worry about it until it becomes a problem.
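One workaround, not part of the answer above and offered only as a sketch: if rows really must be contiguous, store the transpose. `t()` returns a new matrix in which each original row becomes a column, so in the copy the elements of each original row are adjacent. Small example matrix assumed:

```r
a <- matrix(1:20, nrow = 5, ncol = 4)

# t() returns a new 4 x 5 matrix; row i of `a` becomes column i of `ta`,
# so in the copy the elements of each original row are adjacent
ta <- t(a)

identical(a[2, ], ta[, 2])   # TRUE
as.vector(ta)[1:4]           # 1 6 11 16 (row 1 of `a`)
```

Note that `t()` copies the whole matrix, so for a large matrix it temporarily doubles memory use; it is likely only worth it if the transposed copy is reused many times.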

answered Oct 31 '25 00:10 by Spacedman


