I know I can turn a normal polars Series into a numpy array via .to_numpy():
import polars as pl
s = pl.Series("a", [1,2,3])
s.to_numpy()
# array([1, 2, 3])
However, that does not work with a list type. What would be the way to turn such a construct into a 2-D array?
And, more generally, is there a way to turn a Series of list[list[whatever]] into a 3-D array, and so on?
s = pl.Series("a", [[1,1,1],[1,2,3],[1,0,1]])
s.to_numpy()
# exceptions.ComputeError: 'to_numpy' not supported for dtype: List(Int64)
Desired output would be:
array([[1, 1, 1],
       [1, 2, 3],
       [1, 0, 1]])
Or, one step further:
s = pl.Series("a", [[[1,1],[1,2]],[[1,1],[1,1]]])
s.to_numpy()
# exceptions.ComputeError: 'to_numpy' not supported for dtype: List(Int64)
Desired output would be:
array([[[1, 1],
        [1, 2]],
       [[1, 1],
        [1, 1]]])
You could explode the series, then reshape the numpy array afterwards. That is probably the only way for now, given that the current ComputeError says the operation is unsupported in polars. The list dtype can have varying list lengths from row to row, which would break any computation like this, so it makes sense that it is not supported.
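To illustrate why varying lengths are a problem, here is a minimal sketch in plain Python (no polars involved; the helper names has_uniform_rows and flat_length are made up for this example). It shows that a ragged input can flatten to the same number of elements as a uniform one, so a shape computed from lengths alone cannot tell them apart:

```python
uniform = [[1, 1, 1], [1, 2, 3], [1, 0, 1]]
ragged = [[1, 2, 3, 4], [1, 2], [1, 2, 3]]

def has_uniform_rows(rows):
    """Return True when every inner list has the same length."""
    return len({len(row) for row in rows}) <= 1

def flat_length(rows):
    """Number of elements left after flattening one level (like explode)."""
    return sum(len(row) for row in rows)

# Both inputs flatten to 9 elements across 3 rows, so a length-based
# shape calculation would report 3 x 3 for either one.
print(flat_length(uniform), flat_length(ragged))  # 9 9
print(has_uniform_rows(uniform))  # True
print(has_uniform_rows(ragged))   # False
```

A check along these lines before reshaping is one way to avoid silently scrambling rows when the uniform-length assumption does not hold.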
That said, if you know your list column is of uniform length for every row, this operation can be written generally for any arbitrary nesting of the list type. It just involves keeping track of the changed dimensions with each explode, and then calculating the proper new dimensions:
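from itertools import pairwise  # Python 3.10+

import polars as pl

def multidimensional_to_numpy(s):
    # Record the series length before and after each explode.
    dimensions = [1, len(s)]
    while s.dtype == pl.List:
        s = s.explode()
        dimensions.append(len(s))
    # Each axis size is the ratio of consecutive lengths.
    dimensions = [p[1] // p[0] for p in pairwise(dimensions)]
    return s.to_numpy().reshape(dimensions)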
multidimensional_to_numpy(pl.Series("a", [1,2,3]))
array([1, 2, 3], dtype=int64)
multidimensional_to_numpy(pl.Series("a", [[1,1,1],[1,2,3],[1,0,1]]))
array([[1, 1, 1],
       [1, 2, 3],
       [1, 0, 1]], dtype=int64)
multidimensional_to_numpy(pl.Series("a", [[[1,1],[1,2]], [[1,1],[1,1]]]))
array([[[1, 1],
        [1, 2]],
       [[1, 1],
        [1, 1]]], dtype=int64)
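To make the dimension bookkeeping easier to follow, here is a rough sketch of the same arithmetic using plain Python lists in place of a polars Series. The names flatten_once and infer_shape are hypothetical helpers for this illustration, and flatten_once only stands in for what explode does to a list column; it assumes uniformly nested input:

```python
def flatten_once(values):
    """Mimic one explode step: remove one level of list nesting."""
    return [item for sub in values for item in sub]

def infer_shape(nested):
    """Infer the shape of a uniformly nested list from successive lengths."""
    lengths = [1, len(nested)]
    values = nested
    # Keep flattening while the elements are still lists.
    while values and isinstance(values[0], list):
        values = flatten_once(values)
        lengths.append(len(values))
    # Each dimension is the ratio of consecutive lengths.
    return tuple(b // a for a, b in zip(lengths, lengths[1:]))

print(infer_shape([1, 2, 3]))  # (3,)
print(infer_shape([[1, 1, 1], [1, 2, 3], [1, 0, 1]]))  # (3, 3)
print(infer_shape([[[1, 1], [1, 2]], [[1, 1], [1, 1]]]))  # (2, 2, 2)
```

For the 3-D input the recorded lengths are [1, 2, 4, 8], and the consecutive ratios 2, 2, 2 give the shape. zip over adjacent elements is used here instead of itertools.pairwise only so the sketch also runs on Python versions older than 3.10.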
Note that with the soon-to-be-released Array dtype, which guarantees same-length arrays throughout the column (and with the current arr namespace becoming list), this answer could be improved upon in due time (maybe direct to_numpy support there?). In particular, the dimension calculation above should be simplifiable to tracking the dtype.width for each inner Array dtype.