Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Efficient Algorithems to find position of vector in 2D array

I have an interesting problem but cannot find the relevant literature, so need algorithem suggestions or just the correct way to phrase the problem so I can find the literature.

I'm given a vector of floats of varying length coming from a 2D array. I know the orientation (i.e. which axis I'm looking in). I need to find the start index of the vector. Essentially, I'm cross-correlating a short vector with a long vector.

I have implemented a brute force approach but naturally it grows as O(n^2) where n is the dimension of the 2D array.

Q1) What is an efficient algorithem to tackel this problem?

Q2) How is this type of problem called (so I can find relevant papers)?

There is measurement error so it will never be an exact match.

Here the brute force approach, where I look for the minimum of the norm of two vectors:

import time
import numpy as np


def get_distance(a: np.ndarray, b: np.ndarray) -> float:
    """ Calculate the distance between two vectors. """
    return  np.linalg.norm(a - b)


def find_min_distance_subvector(array: np.ndarray, x: np.ndarray) -> tuple[int, float]:
    leng = len(x)
    min_index = 0
    min_distance = np.inf

    # Assuming we know the orientation of the vector
    for ii in range(0, len(array) - leng):
        # Extract a sub-vector of the same length as x, starting from index ii
        sub_vec = array[ii:ii + leng]
        dist = get_distance(sub_vec, x)
        if dist < min_distance:
            min_distance = dist
            min_index = ii
    return min_index, min_distance


def main():
    leng = 100
    size = 2000

    # Create the search map
    arr = np.random.random((size, size))

    # Pick a random sub-vector from the map
    index = np.random.randint(0, size - leng)
    x = arr[index:index + leng]

    start_time = time.time()
    min_index, min_metric = find_min_distance_subvector(arr, x)
    end_time = time.time()
    print(f"Minimum distance: {min_metric} at index {min_index}, correct index: {index}, "
          f"time taken: {end_time - start_time:.4f} seconds")


if __name__ == '__main__':
    main()

Thank you for your help

like image 617
Edgar H Avatar asked Dec 22 '25 22:12

Edgar H


1 Answers

Since you know the axis you're working on, let us reduce the problem to a single dimension: you want to find a subsequence matching some needle query="abcd" (or the closest thing to a match) inside a big heystack: corpus="azmpqrfksuxbabcdrtwerl". In other words, in a single dimension this problem can be reduced to a string search problem, with all the known algorithms for solving it.

However, if you're not certain your needle is indeed in the heystack and you want to choose instead some minimum distance subsequence instead of returning a False value (as your code might suggest), you have a different problem on your hands and I'm not aware of better than naive solutions for it.

like image 93
P.Diddy Avatar answered Dec 24 '25 10:12

P.Diddy



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!