What's the most efficient way to convert a list of interactions such as this:
QWERT ASDF 12
QWERT ZXCV 15
QWERT HJKL 6
: : :
ASDF-XYZ HJKL-XYY 123
into an all vs all matrix representation such as this:
          QWERT  ASDF  ZXCV  ...  ASDF-XYZ
QWERT         0    12    15  ...         9
ASDF         12     0    45  ...        35
ZXCV         15    45     0  ...        24
:             :     :     :    :         :
ASDF-XYZ      9    35    24  ...         0
It could be anywhere from a few thousand up to several hundred thousand features, so speed does matter.
Edit: The input is a CSV file. Please note that the feature names are arbitrary (but unique) strings and that missing interactions should be represented as 0 in the output matrix. Made the example clearer.
You can use NumPy for this.
Let's say the input is:
points = [(1, 2, 12), (1, 3, 15), (1, 4, 6)]
Each point is (row, column, value): the first point sits at coordinates (1, 2) and its value is 12.
You can use the NumPy function add.at:
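import numpy as np

table = np.zeros((5, 5))
points = [(1, 2, 12), (1, 3, 15), (1, 4, 6)]
for row, col, value in points:
    # add.at accumulates in place, so repeated coordinates are summed
    np.add.at(table, (row, col), value)
# rotate the table 90 degrees counterclockwise for display
np.rot90(table)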
which leaves you with the output:
array([[ 0., 6., 0., 0., 0.],
[ 0., 15., 0., 0., 0.],
[ 0., 12., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.]])
You can pretty easily modify the code so it prints the headers too.
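Since the question uses arbitrary string names rather than integer coordinates, here is a rough sketch of one way to extend the add.at idea to that case. It is not part of the answer above: the function name interactions_to_matrix and the space-delimited sample are made up for illustration, and it assumes each pair appears only once in the input (add.at sums duplicates) and that the matrix should be symmetric, as in the question's example.

```python
import csv
import io

import numpy as np


def interactions_to_matrix(rows):
    """Build a symmetric all-vs-all matrix from (name_a, name_b, value) rows."""
    rows = list(rows)
    # Give each unique feature name an index in order of first appearance.
    index = {}
    for a, b, _ in rows:
        index.setdefault(a, len(index))
        index.setdefault(b, len(index))
    names = list(index)

    # Translate the names into integer index arrays once, up front.
    i = np.array([index[a] for a, _, _ in rows], dtype=np.intp)
    j = np.array([index[b] for _, b, _ in rows], dtype=np.intp)
    v = np.array([float(val) for _, _, val in rows])

    matrix = np.zeros((len(names), len(names)))
    # One vectorized call per direction; pairs never listed stay 0.
    np.add.at(matrix, (i, j), v)
    np.add.at(matrix, (j, i), v)
    return names, matrix


# Hypothetical sample standing in for the real CSV file (space-delimited here).
sample = io.StringIO("QWERT ASDF 12\nQWERT ZXCV 15\nQWERT HJKL 6\n")
names, matrix = interactions_to_matrix(csv.reader(sample, delimiter=" "))

# Print the matrix with a header row and a label on every line.
print("\t".join([""] + names))
for name, row in zip(names, matrix):
    print("\t".join([name] + [f"{x:g}" for x in row]))
```

Passing whole index arrays to add.at avoids one Python-level call per interaction. Keep in mind that a dense float64 matrix for 100,000 features needs about 80 GB, so at the upper end of the sizes mentioned in the question a sparse representation would probably be needed instead.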