Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python speed up large nested array processing

Tags:

python

I'm wondering if there is a faster way to do this.

"""
Structure
 -data[]
  -data[0] 
    -data[number, number, number, number, number, number, number]
    - ... ect X 12000
  -data[1]
    -data[number, number, number, number, number, number, number]
    - ... ect X 12000
  -data[2]
    -data[number, number, number, number, number, number, number]
    - ... ect X 12000
  -data[3] 
    -data[number, number, number, number, number, number, number]
    - ... ect X 12000
x and y are the first two numbers in each data array.
"""

I need to scan each item in layers 1,2,3 against each item in the first layer (0) looking to see if they fall within a given search radius. This takes a while.

    for i in range (len(data[0])):

        x = data[0][i][0]
        y = data[0][i][1]

        for x in range (len(data[1])):
            x1 = data[1][x][0]
            y1 = data[1][x][1]

            if( math.pow((x1 -x),2) + math.pow((y1 - y),2) < somevalue):
                matches1.append(data[0][i])
                matches2.append(data[1][x])
                continue
            else:
                continue

Thanks for any assistance!

like image 549
Rankinstudio Avatar asked Mar 17 '26 23:03

Rankinstudio


2 Answers

First you should write more readable python code:

for x,y in data[0]:
    for x1, y1 in data[1]:
        if (x1 - x)**2 + (y1 - y)**2 < somevalue:
            matches1.append((x,y))
            matches2.append((x1,y1))

The you can vectorize the inner loop with numpy:

for x,y in data[0]:
    x1, y1 = data[1].T
    indices = (x1 - x)**2 + (y1 - y)**2 < somevalue
    matches.append(((x,y), data[1][indices]))
like image 165
Daniel Avatar answered Mar 19 '26 11:03

Daniel


For this specific problem scipy.spatial.KDTree or rather its Cython workalike scipy.spatial.cKDTree would appear to be taylor-made:

import numpy as np
from scipy.spatial import cKDTree

# create some random data
data = np.random.random((4, 12000, 7))

# in each record discard all but x and y
data_xy = data[..., :2]

# build trees
trees = [cKDTree(d) for d in data_xy]

somevalue = 0.001

# find all close pairs between reference layer and other layers
pairs = []
for tree in trees[1:]:
    pairs.append(trees[0].query_ball_tree(tree, np.sqrt(somevalue)))

This example takes less than a second. Please note that the output format is different to the one your script produces. For each of the three non-reference layers it is a list of lists, where the inner list at index k contains the indices of the points that are close to point k in the reference list.

like image 33
Paul Panzer Avatar answered Mar 19 '26 13:03

Paul Panzer