Considering the example data below, what is the correct way to perform a matrix multiplication on data that is in Polars?
In []: matrix_1 = pl.DataFrame({"col_1":[1,2,3],"col_2":[4,5,6], "col_3":[7,8,9]})
In []: matrix_2 = pl.DataFrame({"col_1":[9,8,7],"col_2":[6,5,4], "col_3":[3,2,1]})
I've done the following using NumPy to perform the computation:
In []: np.matmul(matrix_1, matrix_2)
Out[]:
array([[ 30,  24,  18],
       [ 84,  69,  54],
       [138, 114,  90]])
In []: np.dot(matrix_1, matrix_2)
Out[]:
array([[ 30,  24,  18],
       [ 84,  69,  54],
       [138, 114,  90]])
I was just wondering if there's a native way to do it that avoids copies, because IRL I'm using much more data, and it would be great to have the ergonomics of not converting data in and out of NumPy.
P.S.: It would also be great to be able to use the @ operator (__matmul__), which, if I'm not mistaken, is not implemented in the Polars API.
The interoperability of Polars with NumPy is already pretty strong, as shown in the link @jqurious already posted in the comments.
You can also see that interoperability in the fact that you can pass Polars DataFrames directly as the inputs to np.dot.
It seems what you really want is a way to do the following and get back a DataFrame:
matrix_1.dot(matrix_2)
shape: (3, 3)
┌───────┬───────┬───────┐
│ col_1 ┆ col_2 ┆ col_3 │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═══════╪═══════╪═══════╡
│ 30 ┆ 84 ┆ 138 │
│ 24 ┆ 69 ┆ 114 │
│ 18 ┆ 54 ┆ 90 │
└───────┴───────┴───────┘
You can achieve this by writing a helper function and then monkey-patching it onto pl.DataFrame.
Just do:
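import polars as pl
import numpy as np

def dot(self, rightdf):
    # Multiply the two frames via NumPy and wrap the result back into a
    # DataFrame that reuses the right-hand frame's column names.
    return pl.from_numpy(np.dot(self, rightdf), schema=rightdf.columns)

# Attach the helper as a method on every pl.DataFrame.
pl.DataFrame.dot = dot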
After that, matrix_1, matrix_2, and any other DataFrame will have the dot method built in, as above.