
Create a type of pair tensor from cartesian product of 1D array with itself

Tags: python, numpy

I have an array of strings, like

arr = np.array(['A', 'B', 'B', 'B', 'C', 'C', 'C', 'C'])

I want to build a torch tensor that encodes the type of each pair in the Cartesian product of the array with itself. The result would be an 8x8 tensor where:

  • if the row label is 'A' and the column label is 'B', the value is, for example, 1;
  • if the row label is 'B' and the column label is 'C', the value is, for example, 2;
  • 0 otherwise (i.e. if the row and column labels are the same).

How can I achieve that?

So far, I have tried to do something as follows:

import numpy as np
import torch

arr = np.array(['A', 'B', 'B', 'B', 'C', 'C', 'C', 'C'])
_, arr = np.unique(arr, return_inverse=True)
arr = torch.tensor(arr)
arr += 1

out = torch.cartesian_prod(arr, arr).reshape(8, 8, 2)
out = out[:, :, 1] * out[:, :, 0]
torch.where(torch.sqrt(out) == torch.sqrt(out).int(), 0, out)

tensor([[0, 2, 2, 2, 3, 3, 3, 3],
        [2, 0, 0, 0, 6, 6, 6, 6],
        [2, 0, 0, 0, 6, 6, 6, 6],
        [2, 0, 0, 0, 6, 6, 6, 6],
        [3, 6, 6, 6, 0, 0, 0, 0],
        [3, 6, 6, 6, 0, 0, 0, 0],
        [3, 6, 6, 6, 0, 0, 0, 0],
        [3, 6, 6, 6, 0, 0, 0, 0]])

But it feels a bit clumsy (and the resulting values are arbitrary rather than 0, 1, 2, ...), so I would like to hear other ideas.

Xaume asked Sep 13 '25 08:09


1 Answer

Combine meshgrid and fancy indexing

Comments on the published solution

There are two flaws in the approach published in the original post.

First, the product of two unequal numbers can be a perfect square, so its square root can be an integer. For example, the square root of 1 * 4 is 2. As a result, zeros will be assigned to pairs such as (1, 4), (2, 8), (3, 12), ..., which was certainly not intended.
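As a quick check of this first flaw, here is a small sketch that lists pairs of unequal integers whose product is a perfect square. The range 1 to 12 and the helper name is_perfect_square are chosen purely for illustration.

```python
import math

def is_perfect_square(x: int) -> bool:
    """Return True if x is the square of an integer."""
    r = math.isqrt(x)
    return r * r == x

# Unequal pairs (a, b) with a < b whose product is a perfect square.
# The original test would wrongly map each of them to zero.
collisions = [
    (a, b)
    for a in range(1, 13)
    for b in range(a + 1, 13)
    if is_perfect_square(a * b)
]
print(collisions)
```

With only three distinct labels (codes 1 to 3) none of the products 2, 3, 6 is a perfect square, so the problem stays hidden in the example from the question; it shows up once a fourth label makes the pair (1, 4) possible.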

Second, some different pairs receive the same identifier. For example, the pairs (1, 6) and (2, 3) both get the identifier 6 in this scheme. To tell them apart, the way a pair is identified has to change.
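The second flaw can be checked in the same spirit. The sketch below groups unordered pairs of distinct codes by their product and keeps the products shared by more than one pair; the code range 1 to 6 is an assumption made for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Group unordered pairs of distinct codes 1..6 by their product
by_product = defaultdict(list)
for a, b in combinations(range(1, 7), 2):
    by_product[a * b].append((a, b))

# Keep only the products that identify more than one pair
ambiguous = {p: pairs for p, pairs in by_product.items() if len(pairs) > 1}
print(ambiguous)
```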

New solution

Let's create a matrix of identifiers for all possible pairs based on range(number_of_pairs).reshape(...):

import numpy as np

data = np.array(['A', 'B', 'B', 'B', 'C', 'C', 'C', 'C'])
unique, codes = np.unique(data, return_inverse=True)

n = len(unique)
pairs_id = np.arange(n*n).reshape(n, n)   # assign a unique id to each pair
pairs_id = np.triu(pairs_id, k=1)         # put 0 on the diagonal and below it
pairs_id = pairs_id + pairs_id.T          # mirror the upper triangular matrix

# Note: the range can safely start from 0, because the first element
# sits on the diagonal and is replaced with zero anyway, so no
# off-diagonal pair ever receives the id 0.

Now every pair of distinct unique values has its own identifier. The identifier does not depend on the order within the pair, and pairs in which both values are equal get the identifier zero.
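These properties can be verified directly. The following is a minimal sketch that rebuilds the matrix for a hypothetical n = 4 (not the n = 3 of the example data) and checks symmetry, the zero diagonal, and the uniqueness of the off-diagonal identifiers.

```python
import numpy as np

n = 4  # hypothetical number of distinct labels, chosen for illustration
pairs_id = np.arange(n * n).reshape(n, n)
pairs_id = np.triu(pairs_id, k=1)
pairs_id = pairs_id + pairs_id.T

# Ids of the n*(n-1)/2 unordered pairs of distinct labels
upper = pairs_id[np.triu_indices(n, k=1)]

is_symmetric = bool((pairs_id == pairs_id.T).all())     # order-independent
zero_diagonal = bool((np.diag(pairs_id) == 0).all())    # equal labels -> 0
ids_unique = len(np.unique(upper)) == len(upper)        # no collisions
no_zero_id = bool((upper != 0).all())                   # 0 is reserved
print(is_symmetric, zero_diagonal, ids_unique, no_zero_id)
```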

To get the desired result, we can use numpy.meshgrid in place of the Cartesian product, together with integer array indexing:

left, right = np.meshgrid(codes, codes)
answer = pairs_id[left, right]
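As a side note, the same lookup can probably be written without meshgrid by letting NumPy broadcast a column of codes against a row of codes. This is an alternative sketch, not part of the solution above; the names via_meshgrid and via_broadcast are introduced here for the comparison only.

```python
import numpy as np

data = np.array(['A', 'B', 'B', 'B', 'C', 'C', 'C', 'C'])
unique, codes = np.unique(data, return_inverse=True)

n = len(unique)
pairs_id = np.triu(np.arange(n * n).reshape(n, n), k=1)
pairs_id = pairs_id + pairs_id.T

# Lookup through meshgrid, as in the solution
left, right = np.meshgrid(codes, codes)
via_meshgrid = pairs_id[left, right]

# Lookup through broadcasting: an (8, 1) column against a (1, 8) row
via_broadcast = pairs_id[codes[:, None], codes[None, :]]

same = np.array_equal(via_meshgrid, via_broadcast)
print(same)
```

With the default indexing='xy', meshgrid makes left vary along the columns and right along the rows, which is the transpose of the broadcast version. Since pairs_id is symmetric, both give the same matrix.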

Summary code and output

import numpy as np

arr = np.array(['A', 'B', 'B', 'B', 'C', 'C', 'C', 'C'])
unique, codes = np.unique(arr, return_inverse=True)

n = len(unique)
pairs = np.arange(n*n).reshape(n, n)
pairs = np.triu(pairs, k=1)
pairs = pairs + pairs.T

left, right = np.meshgrid(codes, codes)
answer = pairs[left, right]

print(
    f'{unique = }',
    f'{codes = }',
    f'{pairs = }',
    f'{answer = }',
    sep='\n' + '\N{Horizontal Line Extension}'*30 + '\n'
)

Output:

unique = array(['A', 'B', 'C'], dtype='<U1')
⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯
codes = array([0, 1, 1, 1, 2, 2, 2, 2])
⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯
pairs = array([
       [0, 1, 2],
       [1, 0, 5],
       [2, 5, 0]])
⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯
answer = array([
       [0, 1, 1, 1, 2, 2, 2, 2],
       [1, 0, 0, 0, 5, 5, 5, 5],
       [1, 0, 0, 0, 5, 5, 5, 5],
       [1, 0, 0, 0, 5, 5, 5, 5],
       [2, 5, 5, 5, 0, 0, 0, 0],
       [2, 5, 5, 5, 0, 0, 0, 0],
       [2, 5, 5, 5, 0, 0, 0, 0],
       [2, 5, 5, 5, 0, 0, 0, 0]])
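
The identifiers above are unique but not consecutive (0, 1, 2, 5). If consecutive values 0, 1, 2, ... are preferred, as the question suggests, one possible extension is to renumber them with a second call to np.unique. This is a sketch under that assumption; the name compact is introduced here and is not part of the code above.

```python
import numpy as np

arr = np.array(['A', 'B', 'B', 'B', 'C', 'C', 'C', 'C'])
unique, codes = np.unique(arr, return_inverse=True)

n = len(unique)
pairs = np.triu(np.arange(n * n).reshape(n, n), k=1)
pairs = pairs + pairs.T

# Renumber the ids in ascending order so that they become 0, 1, 2, ...
_, compact = np.unique(pairs.ravel(), return_inverse=True)
compact = compact.reshape(pairs.shape)

answer = compact[codes[:, None], codes[None, :]]
print(compact)
```

If a torch tensor is needed, the resulting array can then be passed to torch.from_numpy.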

Vitalizzare answered Sep 14 '25 20:09