Here is the code,
int array[X][Y] = {0,};
// 1 way to access the data
for (int x = 0; x < X; x++)
  for(int y = 0; y < Y; y++)
    array[x][y] = compute();
// the other way to access the data
for (int y = 0; y < Y; y++)
  for (int x = 0; x < X; x++)
    array[x][y] = compute();
Is it true that the first way is more efficient than the second one since the CPU cache(L1, L2?) optimization? In other words, whether sequential access pattern is preferred even for RAM?
You'll understand this better if you draw a picture of your array in memory:
  Y ->
X xxxxx ...
| xxxxx
v xxxxx
  .
  .
The adress you access will grow linear in Y direction (345, 345+1, 345+2...), but jumps wildly in X direction if Y is large (345, 345+X, 345+X*2). As the cache loads blocks of memory, you'll jump out of them very soon with Y big enough, but will always be in the cache page when walking in Y direction until the cache has to be refreshed.
Also note that this effect can be more extreme when using dynamical allocation. Using the following program with full optimizations gives me the following output (times in seconds)
0.615000
9.878000
EDIT: Other interesting measures:
Replacing the array code with int array[X][Y]; will use stack memory which is limited, so you can't test much bigger X/Y values, but also very fast:
0.000000
0.000000
Using int array[X][Y]; as a global variable will use a block of heap memory and is slow again. So even without dynamical allocation, the first case is much better:
0.929000
8.944000
Using X=1500, Y=1500 shows that the effect is measurable even with smaller arrays:
0.008000
0.059000
EDIT2: Also note there are other possible optimizations to the code as jalf said in a comment to your question. Using this optimization indeed almost doubles the speed (0.453 seconds for X=Y=10000):
// an even faster way to access the array
for (int x = 0; x < X; x++) {
  int* arrayptr = array[x];
  for (int y = 0; y < Y; y++, arrayptr++)
    *arrayptr = x;
}
Code: (note you can also use this to measure your case where the difference shouldn't be that extreme except for big X and Y. Like others already said, measure this and you'll be enlightened).
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#define X 10000
#define Y 10000
int main() {
  int** array = new int*[X];
  for (int x = 0; x < X; x++) {
    array[x] = new int[Y];
  }
  double c = clock();  
  // 1 way to access the data
  for (int x = 0; x < X; x++)
    for(int y = 0; y < Y; y++)
      array[x][y] = x;
  printf("%f\n", (clock() - c) / CLOCKS_PER_SEC);
  c = clock();  
  // the other way to access the data
  for (int y = 0; y < Y; y++)
    for (int x = 0; x < X; x++)
      array[x][y] = x;
  printf("%f\n", (clock() - c) / CLOCKS_PER_SEC);
  for (int x = 0; x < X; x++) {
    delete(array[x]);
  }
  delete(array);
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With