Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Declaring a huge dynamic array with tiny cells [C++]

I have this project I'm working on. The following conditions apply

  1. In this project I need to create one huge array (hopefully I will be able to create one as big as ~7.13e+17, but this target is still up ahead.)
  2. Each cell inside the array can contain one of the three values: 0,1,2
  3. I'm using C++ as my language.

I tried using the normal dynamic array command

int * p;
int i;    
i=[size]; //This is calculated somewhere else.
p= new (nothrow) int[i];

But as far as I understand, this array makes an array with a possible maximum size of the maximum size of int. If I change my code and use the following code

long long * p;
long long i;    
i=[size]; //This is calculated somewhere else.
p= new (nothrow) long long [i];

Then each cell in the array is of type "long long" making the array very memory heavy. Is there any way to create an array using long long to determine the number of cells in the array and have every cell of size int?

Thanks a lot, Uriel.

EDIT: for further information.

  1. This problem is mainly theoretical, it is part of my Master's thesis. I would still like this program to work as best as it can.
  2. My current step is to make this work for an array with 2.56e+09 items, quick calculation shows we are talking about an array that's at least 0.6 Gigabytes, something my system should be able to cope with. Yet I cannot achieve this goal with my current coding solution even though the amount of space required is really at 4.5GB.
like image 632
Urielnam Avatar asked Jun 26 '26 20:06

Urielnam


2 Answers

Is there any way to create an array using long long to determine the number of cells in the array and have every cell of size int?

There is no reason the type of the array has to be the same as the type of the variable used to specify the size. So use long long for the variable that specifies the size and then int for the type of the array.

int * p;
long long i;    
i=[size]; //This is calculated somewhere else.
p= new (nothrow) int [i];

However, I am concerned when you say you need to create an array "as big as ~7.13e+17". I don't know whether you mean bytes or elements but either way that is incredibly huge for a straight array. That's getting into the realm of petabytes of data.

In a 32-bit program, that's simply not possible. In theory you could have an array up to a couple gigabytes (though in practice considerably less most times).

In a 64-bit program you could in theory allocate an array that large, as far as I know. However, I'm skeptical that most machines could actually handle it. Since this amount of data would far exceed the RAM in the machine, the operating system would be forced to push most of this array into a page file. But a petabyte sized page file would far exceed the harddrive space on most typical machines right now.

Either way, you will probably need to seriously consider a different scheme than just allocating that whole huge array at once.

like image 117
TheUndeadFish Avatar answered Jun 29 '26 10:06

TheUndeadFish


Since you want a to maximize packing density, you'd probably be best off using bit fields:

struct item_pack { 
    char a:2;
    char b:2:
    char c:2;
    char d:2;
};

Then you can create an array of these and use proxy objects to support reading and writing individual items -- with the proviso that there are some limitations on how much you can do with proxy objects, so you'll have to be a bit careful about how you try to use this. A bit of looking at some of the articles about vector<bool> should provide some reasonable hints -- most of its characteristics stem from this general type of implementation. Despite the shortcomings as a general purpose container, this can work within limits, and provides much tighter packing of information than most of the obvious alternatives.

like image 41
Jerry Coffin Avatar answered Jun 29 '26 10:06

Jerry Coffin