Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

python multidimensional boolean array?

it would contain at most 1000 x 1000 x 1000 elements, which is too big for python dictionary.

with dict, around 30 x 1000 x 1000 elements, on my machine it already consumed 2gb of memory and everything got stoned.

any modules that can handle 3-dimension array whose value would be only True/False? I check bitarray http://pypi.python.org/pypi/bitarray, which seems reasonable and coded in C, however it seems more like a bit-stream instead of an array, since it supports only 1 dimension.

like image 961
thkang Avatar asked Feb 21 '26 02:02

thkang


2 Answers

numpy is your friend:

import numpy as np
a = np.zeros((1000,1000,1000), dtype=bool)
a[1,10,100] = True

Has a memory footprint as little as possible.

EDIT:

If you really need you can also look at the defaultdict class container in the collections module, which doesn't store the values that are of the default value. But if it's not really a must, use numpy.

like image 66
EnricoGiampieri Avatar answered Feb 23 '26 03:02

EnricoGiampieri


numpy has already been suggested by EnricoGiampieri, and if you can use this, you should.

Otherwise, there are two choices:

A jagged array, as suggested by NPE, would be a list of list of bitarrays. This allows you to have jagged bounds—e.g., each row could be a different width, or even independently resizable:

bits3d = [[bitarray.bitarray(1000) for y in range(1000)] for x in range(1000)]
myvalue = bits3d[x][y][z]

Alternatively, as suggested by Xymostech, do your own indexing on a 1-D array:

bits3d = bitarray.bitarray(1000*1000*1000)
myvalue = bits3d[x + y*1000 + z*1000*1000]

Either way, you'd probably want to wrap this up in a class, so you can do this:

bits3d = BitArray(1000, 1000, 1000)
myvalue = bits3d[x, y, z]

That's as easy as:

class Jagged3DBitArray(object):
    def __init__(self, xsize, ysize, zsize):
        self.lll = [[bitarray(zsize) for y in range(ysize)] 
                    for x in range(xsize)]
    def __getitem__(self, key):
        x, y, z = key
        return self.lll[x][y][z]
    def __setitem__(self, key, value):
        x, y, z = key
        self.lll[x][y][z] = value

class Fixed3DBitArray(object):
    def __init__(self, xsize, ysize, zsize):
        self.xsize, self.ysize, self.zsize = xsize, ysize, zsize
        self.b = bitarray(xsize * ysize * zsize)
    def __getitem__(self, key):
        x, y, z = key
        return self.b[x + y * self.ysize + z * self.ysize * self.zsize]
    def __setitem__(self, key, value):
        x, y, z = key
        self.b[x + y * self.ysize + z * self.ysize * self.zsize] = value

Of course if you want more functionality (like slicing), you have to write a bit more.

The jagged array will use a bit more memory (after all, you have the overhead of 1M bitarray objects and 1K list objects), and may be a bit slower, but this usually won't make much difference.

The important deciding factor should be whether it's inherently an error for your data to have jagged rows. If so, use the second solution; if it might be useful to have jagged or resizable rows, use the former. (Keeping in mind that I'd use numpy over either solution, if at all possible.)

like image 44
abarnert Avatar answered Feb 23 '26 03:02

abarnert



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!