I have an array of strings, like
arr = np.array(['A', 'B', 'B', 'B', 'C', 'C', 'C', 'C'])
I want to create a torch tensor that encodes the type of each pair produced by the Cartesian product of the array with itself. The result would be an 8x8 tensor where:
if row == 'A' and col == 'B', then the value in the matrix is, for example, 1;
if row == 'B' and col == 'C', then the value in the matrix is, for example, 2;
the value is 0 otherwise (i.e. if row is the same as col).
How can I achieve that?
So far, I have tried to do something as follows:
import numpy as np
import torch

arr = np.array(['A', 'B', 'B', 'B', 'C', 'C', 'C', 'C'])
_, arr = np.unique(arr, return_inverse=True)
arr = torch.tensor(arr)
arr += 1
out = torch.cartesian_prod(arr, arr).reshape(8, 8, 2)
out = out[:, :, 1] * out[:, :, 0]
torch.where(torch.sqrt(out) == torch.sqrt(out).int(), 0, out)
tensor([[0, 2, 2, 2, 3, 3, 3, 3],
[2, 0, 0, 0, 6, 6, 6, 6],
[2, 0, 0, 0, 6, 6, 6, 6],
[2, 0, 0, 0, 6, 6, 6, 6],
[3, 6, 6, 6, 0, 0, 0, 0],
[3, 6, 6, 6, 0, 0, 0, 0],
[3, 6, 6, 6, 0, 0, 0, 0],
[3, 6, 6, 6, 0, 0, 0, 0]])
But it feels a bit clumsy (even the values are quite random and not 0, 1, 2...), so I wanted to hear other ideas.
There are two flaws in the approach published in the original post.
First, the square root of the product of two unequal numbers can be an integer. For example, the square root of 1 * 4 is 2, which is an integer. Therefore, zeros will be assigned to pairs like (1, 4), (2, 8), (3, 12), ..., which was certainly not intended.
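As a quick illustration of this first flaw, here is a minimal sketch that lists the unequal pairs the square-root test would zero out by mistake. It assumes 1-based integer codes from 1 to 12 (the upper bound is arbitrary, chosen only for the demonstration), and the name false_zeros is made up for this example.

```python
from math import isqrt

# Codes are 1-based, as in the question (arr += 1); the range 1..12 is an
# arbitrary choice for this demonstration.
codes = range(1, 13)

# Unequal pairs whose product is a perfect square would wrongly get 0.
false_zeros = [
    (a, b)
    for a in codes
    for b in codes
    if a < b and isqrt(a * b) ** 2 == a * b
]
print(false_zeros)
```

With only three unique labels, as in the question, the problem does not show up yet; it appears once there are at least four.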
Second, different pairs can receive the same identifier. For example, the identifier 6 is shared by the pairs (1, 6) and (2, 3) in this model. To tell them apart, the way pairs are identified has to change.
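The collision can be checked with a short sketch as well. It assumes six 1-based codes (again an arbitrary size for illustration) and groups the unequal pairs by their product; by_product and collisions are illustrative names.

```python
from collections import defaultdict

# Six 1-based codes; the size is an arbitrary choice for this demonstration.
codes = range(1, 7)

# Group every unordered pair of distinct codes by the product of its members.
by_product = defaultdict(list)
for a in codes:
    for b in codes:
        if a < b:
            by_product[a * b].append((a, b))

# Keep only the products that are shared by more than one pair.
collisions = {p: pairs for p, pairs in by_product.items() if len(pairs) > 1}
print(collisions)
```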
Let's create a matrix of identifiers for all possible pairs based on range(number_of_pairs).reshape(...):
import numpy as np
data = np.array(['A', 'B', 'B', 'B', 'C', 'C', 'C', 'C'])
unique, codes = np.unique(data, return_inverse=True)
n = len(unique)
pairs_id = np.arange(n*n).reshape(n, n)  # assign a unique id to each ordered pair
pairs_id = np.triu(pairs_id, k=1)  # put 0 on the diagonal and below it
pairs_id = pairs_id + pairs_id.T  # mirror the upper triangle onto the lower one
# Note: the range can safely start from 0, because id 0 belongs to the
# pair (0, 0) on the diagonal, which is replaced with zero anyway.
Now every unordered pair of distinct unique values has its own identifier: the identifier does not depend on the order within the pair, and it is zero for pairs in which both values are equal.
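The identifiers built this way are unique but not consecutive (for three labels they are 1, 2 and 5). If consecutive values matter, as the question hints, one possible variation is to number only the cells above the diagonal. This is a sketch of an alternative, not the construction used in this answer; n = 3 is assumed to match the three labels of the example.

```python
import numpy as np

n = 3  # number of unique labels, e.g. 'A', 'B', 'C' (assumed for this sketch)

# Row and column indices of the cells strictly above the diagonal.
rows, cols = np.triu_indices(n, k=1)

# Number those cells consecutively, starting from 1; everything else stays 0.
pairs_id = np.zeros((n, n), dtype=np.int64)
pairs_id[rows, cols] = np.arange(1, len(rows) + 1)

# Mirror the upper triangle so that the id ignores the order within a pair.
pairs_id = pairs_id + pairs_id.T
print(pairs_id)
```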
To get the desired answer, we can use numpy.meshgrid instead of a Cartesian product, together with integer array indexing:
left, right = np.meshgrid(codes, codes)
answer = pairs_id[left, right]
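The same lookup can also be written with broadcasting instead of meshgrid. The sketch below hard-codes the codes and the identifier matrix of the example so that it runs on its own, and checks that both forms agree. They agree here because the identifier matrix is symmetric; with a non-symmetric matrix the default meshgrid ordering would give the transposed result.

```python
import numpy as np

# Values taken from the example: codes of the labels and the symmetric id matrix.
codes = np.array([0, 1, 1, 1, 2, 2, 2, 2])
pairs_id = np.array([[0, 1, 2],
                     [1, 0, 5],
                     [2, 5, 0]])

# meshgrid version, as in the text
left, right = np.meshgrid(codes, codes)
via_meshgrid = pairs_id[left, right]

# broadcasting version: a column of row codes against a row of column codes
via_broadcast = pairs_id[codes[:, None], codes[None, :]]

same = np.array_equal(via_meshgrid, via_broadcast)
print(same)
```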
import numpy as np
arr = np.array(['A', 'B', 'B', 'B', 'C', 'C', 'C', 'C'])
unique, codes = np.unique(arr, return_inverse=True)
n = len(unique)
pairs = np.arange(n*n).reshape(n, n)
pairs = np.triu(pairs, k=1)
pairs = pairs + pairs.T
left, right = np.meshgrid(codes, codes)
answer = pairs[left, right]
print(
f'{unique = }',
f'{codes = }',
f'{pairs = }',
f'{answer = }',
sep='\n' + '\N{Horizontal Line Extension}'*30 + '\n'
)
Output:
unique = array(['A', 'B', 'C'], dtype='<U1')
⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯
codes = array([0, 1, 1, 1, 2, 2, 2, 2])
⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯
pairs = array([
[0, 1, 2],
[1, 0, 5],
[2, 5, 0]])
⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯
answer = array([
[0, 1, 1, 1, 2, 2, 2, 2],
[1, 0, 0, 0, 5, 5, 5, 5],
[1, 0, 0, 0, 5, 5, 5, 5],
[1, 0, 0, 0, 5, 5, 5, 5],
[2, 5, 5, 5, 0, 0, 0, 0],
[2, 5, 5, 5, 0, 0, 0, 0],
[2, 5, 5, 5, 0, 0, 0, 0],
[2, 5, 5, 5, 0, 0, 0, 0]])
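Since the question asks for a torch tensor, the same steps can presumably be carried out in PyTorch. The following is a sketch under that assumption, not part of the NumPy solution above: np.unique is still used to encode the strings (torch tensors cannot hold strings), and broadcasting replaces meshgrid.

```python
import numpy as np
import torch

arr = np.array(['A', 'B', 'B', 'B', 'C', 'C', 'C', 'C'])

# Encode the strings as integer codes with NumPy, then move the codes to torch.
_, codes = np.unique(arr, return_inverse=True)
codes = torch.as_tensor(codes, dtype=torch.long)
n = int(codes.max()) + 1  # number of unique labels

# Same construction as above: unique ids above the diagonal, mirrored below it.
pairs = torch.arange(n * n).reshape(n, n)
pairs = torch.triu(pairs, diagonal=1)
pairs = pairs + pairs.T

# Look up the id of every (row, column) pair of codes via broadcasting.
answer = pairs[codes[:, None], codes[None, :]]
print(answer)
```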