Python: linear __getitem__ for a pair of list of lists

Question

I currently have a class which stores a list of lists. The inner lists are not of the same length. I made the class subscriptable with the following code (possibly not the best way of doing this, and perhaps overly fancy).

class MyClass:
    def __init__(self):
        #
        self.instructions = []

        # for demo purposes
        self.instructions.append([0, 1, 2])
        self.instructions.append([3, 4, 5, 6])
        self.instructions.append([7, 8])

    def __getitem__(self, ind):
        if ind >= 0:
            iterator = self.instructions.__iter__()
            compare = int.__gt__
            inc = int.__add__
        else:
            iterator = reversed(self.instructions)
            compare = int.__le__
            inc = int.__sub__

        s = 0
        for tp in iterator:
            L = len(tp)
            if compare(inc(s, L), ind):
                return tp[ind-s]
            else:
                s = inc(s, L)
        else:
            raise IndexError('index out of range')

This works. For instance

>>> x = MyClass()
>>> x[5]
5
>>> x[-5]
4

Now, I need to modify the class so it now stores two list of lists. The two lists are instructions and annotations, and both have the same length. But len(instructions[i]) does not have to equal len(annotations[i]).

class NewClass:
    def __init__(self):
        #
        self.instructions = []
        self.annotations = []

        # for demo purposes
        self.instructions.append([0, 1, 2])
        self.instructions.append([5, 6, 7, 8])
        self.instructions.append([12, 13])
        
        self.annotations.append([3, 4])
        self.annotations.append([9, 10, 11])
        self.annotations.append([14, 15, 16])

    def __getitem__(self, ind):
        pass

I want to make this subscriptable, with the order of elements oscillating between the instructions sublists and the annotations sublists. The demo data indicates the subscripting order. I want

>>> y = NewClass()
>>> y[9]
9
>>> y[-4]
13

What's an efficient way of doing this?

I could write a solution where I alternatively iterate through the two sublists. But I feel like I am straying far from the correct solution. I am also looking for a non-for-loop solution as I am dealing with long lists.

Davis Herring · Accepted Answer

The standard balance between storage cost and runtime cost for random access into the (non-stored) concatenation of several arrays is to store a table of the offsets of the beginning of each list (i.e., the sum of the length of every list before it) and use binary search on that table:

import itertools
import bisect

class Index:
    def __init__(self,ll):
        self.ll = ll
        self.oo = list(itertools.accumulate(map(len,ll), initial=0))

    def __getitem__(self, i):
        if i < 0:
            i += self.oo[-1]
        j = bisect.bisect(self.oo,i)
        if not 0 < j <= len(self.ll):
            raise IndexError
        return self.ll[j-1][i-self.oo[j-1]]

    def __iter__(self):
        return itertools.chain.from_iterable(self.ll)

# Example:
i = Index(
  [[9,1,7],
   [3,0],
   [],
   [4,4,4,2]]
)
assert i[4]==0 and i[8]==2

The j-1 is because the initial 0 causes an i of 0 to be assigned the insertion point of 1. You can omit the ,initial=0 (and in fact the last element of self.oo as well) at the cost of more complicated code for the edge/error cases. __iter__ is provided because it is asymptotically faster than indexing with successive integers, each of which must be subjected to the binary search even though usually the same sublist will be found.

Obviously extending this to support the interleaving of two lists (of equal length) is trivial: sum the interleaved lengths and then use divmod(j-1,2) to obtain the index into a list and the selection between the two lists (respectively).

Python: linear getitem for a pair of list of lists

Tags:

python

class

Abdullah Khalid

1 Answers

Davis Herring

Recent Activity

Donate For Us