How to list a 2d array in a tabular form along with two 1d arrays from which it was generated?

I'm trying to calculate a 2d variable z = x + y, where x and y are 1d arrays of unequal lengths (say, the x- and y-coordinates of points on a spatial grid). I'd like to display the result row by row, with the values of x and y in the first two columns and the corresponding value of z, calculated from those x and y values, in the third. For x = [1, 2] and y = [3, 4, 5], it would look something like this:

x  y  z
1  3  4
1  4  5
1  5  6
2  3  5
2  4  6
2  5  7
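
Viewed as a 2d array, z is the outer sum of x and y, z[i, j] = x[i] + y[j]. A minimal NumPy sketch (np.add.outer is a standard ufunc method, but it is not something the question itself uses) showing that flattening z row by row reproduces the z column of the table above:

```python
import numpy as np

x = np.array([1, 2])
y = np.array([3, 4, 5])

# Outer sum: z[i, j] = x[i] + y[j], shape (len(x), len(y)) == (2, 3)
z = np.add.outer(x, y)
print(z)
# [[4 5 6]
#  [5 6 7]]

# Flattening in C (row-major) order varies y fastest, matching the table's z column
print(z.ravel())
# [4 5 6 5 6 7]
```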

The code below works (it uses lists here, but I will probably need NumPy arrays later):

import pandas as pd

x = [1, 2]
y = [3, 4, 5]
col1 = []
col2 = []
z = []
for i in range(len(x)):
    for j in range(len(y)):
        col1.append(x[i])
        col2.append(y[j])
        z.append(x[i]+y[j])

df = pd.DataFrame(zip(col1, col2, z), columns=["x", "y", "z"])
print(df)

Is there a better way to do this without the loop, using some combination of meshgrid, indices, flatten, v/hstack, and reshape? x and y will typically have around 100 elements each.
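
For reference, one such loop-free combination, sketched here rather than taken from the question: np.indices builds integer index grids, fancy indexing turns them into value grids, and np.column_stack assembles the flattened grids into the three columns. The names xx, yy, and table are illustrative.

```python
import numpy as np

x = np.array([1, 2])
y = np.array([3, 4, 5])

# Index grids: i[a, b] == a and j[a, b] == b, both of shape (len(x), len(y))
i, j = np.indices((len(x), len(y)))

# Fancy indexing turns the index grids into value grids
xx = x[i]  # each x value repeated across its row
yy = y[j]  # the full y array repeated in every row

# Flatten each grid and stack them side by side as the columns x, y, z
table = np.column_stack((xx.ravel(), yy.ravel(), (xx + yy).ravel()))
print(table)
# [[1 3 4]
#  [1 4 5]
#  [1 5 6]
#  [2 3 5]
#  [2 4 6]
#  [2 5 7]]
```

The result is a plain (len(x) * len(y), 3) array; passing it to pd.DataFrame(table, columns=["x", "y", "z"]) gives the tabular view.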

asked Dec 21 '25 06:12 by Schat17


2 Answers

Here is one way:

import numpy as np
import pandas as pd
x = np.asarray([1, 2])[:, np.newaxis]
y = np.asarray([3, 4, 5])
x, y = np.broadcast_arrays(x, y)
z = x + y
df = pd.DataFrame(zip(x.ravel(), y.ravel(), z.ravel()), columns=["x", "y", "z"])
print(df)
#    x  y  z
# 0  1  3  4
# 1  1  4  5
# 2  1  5  6
# 3  2  3  5
# 4  2  4  6
# 5  2  5  7
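
To see what the broadcasting step above does, here is a quick shape check (the names xb and yb are chosen for illustration): np.newaxis turns x into a (2, 1) column, and broadcasting it against the (3,) row y expands both to (2, 3).

```python
import numpy as np

x = np.asarray([1, 2])[:, np.newaxis]  # column vector, shape (2, 1)
y = np.asarray([3, 4, 5])              # 1d row, shape (3,)
print(x.shape, y.shape)                # (2, 1) (3,)

# Broadcasting aligns trailing axes: (2, 1) with (3,) -> (2, 3)
xb, yb = np.broadcast_arrays(x, y)
print(xb.shape, yb.shape)              # (2, 3) (2, 3)
print(xb.tolist())                     # [[1, 1, 1], [2, 2, 2]]
print(yb.tolist())                     # [[3, 4, 5], [3, 4, 5]]

# The sum broadcasts the same way even without broadcast_arrays
print((x + y).shape)                   # (2, 3)
```

broadcast_arrays is only needed so that the x and y columns can be flattened to the same length as z.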

But yes, you can also use meshgrid instead of orthogonal arrays plus explicit broadcasting. You can also use NumPy alone instead of pandas:

x = np.asarray([1, 2])
y = np.asarray([3, 4, 5])
x, y = np.meshgrid(x, y, indexing='ij')
z = x + y
print(np.stack((x.ravel(), y.ravel(), z.ravel())).T)
# [[1 3 4]
#  [1 4 5]
#  [1 5 6]
#  [2 3 5]
#  [2 4 6]
#  [2 5 7]]
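
The indexing='ij' argument is what keeps x as the slowly varying column, matching the row order in the question. A small comparison sketch against NumPy's default indexing='xy', which yields the same pairs in a different order:

```python
import numpy as np

x = np.asarray([1, 2])
y = np.asarray([3, 4, 5])

# indexing='ij': grids have shape (len(x), len(y)); ravel varies y fastest
xi, yi = np.meshgrid(x, y, indexing='ij')
print(xi.shape)                                       # (2, 3)
print(np.stack((xi.ravel(), yi.ravel())).T.tolist())
# [[1, 3], [1, 4], [1, 5], [2, 3], [2, 4], [2, 5]]

# Default indexing='xy': grids have shape (len(y), len(x)); ravel varies x fastest
xx, yx = np.meshgrid(x, y)
print(xx.shape)                                       # (3, 2)
print(np.stack((xx.ravel(), yx.ravel())).T.tolist())
# [[1, 3], [2, 3], [1, 4], [2, 4], [1, 5], [2, 5]]
```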
answered Dec 23 '25 20:12 by Matt Haberland


Not as efficient as low-level NumPy broadcasting, but you could use a cross merge (how='cross' requires pandas 1.2 or later):

import pandas as pd

x = [1, 2]
y = [3, 4, 5]

df = (pd.DataFrame({'x': x})
        .merge(pd.DataFrame({'y': y}), how='cross')
        .eval('z = x+y') # or .assign(z=lambda d: d['x']+d['y'])
     )

Alternatively, use MultiIndex.from_product, which scales well when you need the product of many arrays/lists:

df = (pd.MultiIndex.from_product([x, y], names=['x', 'y'])
        .to_frame(index=False)
        .eval('z = x+y')
     )

# or generate the combinations with itertools.product from the standard library
from itertools import product
df = (pd.DataFrame(product(x, y), columns=['x', 'y'])
        .eval('z = x+y')
     )

Output:

   x  y  z
0  1  3  4
1  1  4  5
2  1  5  6
3  2  3  5
4  2  4  6
5  2  5  7
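
The from_product variant extends to more than two inputs by simply adding lists. A sketch with a hypothetical third list w (not in the question), where z is taken as the sum of all three purely for illustration:

```python
import pandas as pd

x = [1, 2]
y = [3, 4, 5]
w = [10, 20]  # hypothetical third coordinate, for illustration only

# Every (x, y, w) combination, with the last list varying fastest
df = (pd.MultiIndex.from_product([x, y, w], names=['x', 'y', 'w'])
        .to_frame(index=False)
        .eval('z = x + y + w')
     )
print(len(df))  # 12 rows: 2 * 3 * 2
```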
answered Dec 23 '25 19:12 by mozway