I'm trying to calculate a 2D variable z = x + y, where x and y are 1D arrays of unequal length (say, x- and y-coordinate points on a spatial grid). I'd like to display the result row by row, with the values of x and y in the first two columns and the corresponding value of z in the third, something like the following for x = [1, 2] and y = [3, 4, 5]:
x y z
1 3 4
1 4 5
1 5 6
2 3 5
2 4 6
2 5 7
The code below works (using lists here, but will probably need numpy arrays later):
import pandas as pd
x = [1, 2]
y = [3, 4, 5]
col1 = []
col2 = []
z = []
for i in range(len(x)):
    for j in range(len(y)):
        col1.append(x[i])
        col2.append(y[j])
        z.append(x[i]+y[j])
df = pd.DataFrame(zip(col1, col2, z), columns=["x", "y", "z"])
print(df)
Just wondering: is there a better way to do this without the loop, using some combination of meshgrid, indices, flatten, v/hstack, and reshape? x and y will typically have around 100 elements each.
Here is one way:
import numpy as np
import pandas as pd
x = np.asarray([1, 2])[:, np.newaxis]
y = np.asarray([3, 4, 5])
x, y = np.broadcast_arrays(x, y)
z = x + y
df = pd.DataFrame(zip(x.ravel(), y.ravel(), z.ravel()), columns=["x", "y", "z"])
print(df)
# x y z
# 0 1 3 4
# 1 1 4 5
# 2 1 5 6
# 3 2 3 5
# 4 2 4 6
# 5 2 5 7
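To make the shapes in this approach explicit, here is a small sketch (the names xb and yb are just illustrative). The sum x + y broadcasts on its own; np.broadcast_arrays is only there so that x and y can be raveled to the same length as z.

```python
import numpy as np

x = np.asarray([1, 2])[:, np.newaxis]   # column vector, shape (2, 1)
y = np.asarray([3, 4, 5])               # shape (3,)

z = x + y                               # broadcasting gives shape (2, 3)
xb, yb = np.broadcast_arrays(x, y)      # views expanded to shape (2, 3), no data copied

print(x.shape, y.shape, z.shape)        # (2, 1) (3,) (2, 3)
print(xb.ravel())                       # [1 1 1 2 2 2]
print(yb.ravel())                       # [3 4 5 3 4 5]
```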
But yes, you can also use meshgrid instead of orthogonal arrays plus explicit broadcasting, and you can build the result with NumPy alone instead of pandas:
x = np.asarray([1, 2])
y = np.asarray([3, 4, 5])
x, y = np.meshgrid(x, y, indexing='ij')
z = x + y
print(np.stack((x.ravel(), y.ravel(), z.ravel())).T)
# [[1 3 4]
#  [1 4 5]
#  [1 5 6]
#  [2 3 5]
#  [2 4 6]
#  [2 5 7]]
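Since the question also asks about other combinations of NumPy functions, here is one more possible sketch. It is not taken from the answers in this thread: it builds the flattened columns directly with np.repeat and np.tile, and the names col_x and col_y are just illustrative.

```python
import numpy as np
import pandas as pd

x = np.asarray([1, 2])
y = np.asarray([3, 4, 5])

# Each x value is repeated once per y value; the whole y array is tiled once per x value.
col_x = np.repeat(x, len(y))   # [1 1 1 2 2 2]
col_y = np.tile(y, len(x))     # [3 4 5 3 4 5]

# Passing the columns as a dict avoids zipping the arrays row by row.
df = pd.DataFrame({"x": col_x, "y": col_y, "z": col_x + col_y})
print(df)
```

This gives the same row order as the loop in the question (x varies slowest).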
Not as efficient as low-level NumPy broadcasting, but you could use a cross-merge:
import pandas as pd
x = [1, 2]
y = [3, 4, 5]
df = (pd.DataFrame({'x': x})
        .merge(pd.DataFrame({'y': y}), how='cross')
        .eval('z = x+y')  # or .assign(z=lambda d: d['x']+d['y'])
     )
Alternative with MultiIndex.from_product if you have many combinations of arrays/lists:
df = (pd.MultiIndex.from_product([x, y], names=['x', 'y'])
        .to_frame(index=False)
        .eval('z = x+y')
     )
# or with itertools.product from the standard library
from itertools import product
df = (pd.DataFrame(product(x, y), columns=['x', 'y'])
        .eval('z = x+y')
     )
Output:
x y z
0 1 3 4
1 1 4 5
2 1 5 6
3 2 3 5
4 2 4 6
5 2 5 7
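To illustrate the "many combinations" case, here is a sketch of the same from_product pattern with a third list. The list w and the frame name df3 are made up for this example and are not part of the question.

```python
import pandas as pd

x = [1, 2]
y = [3, 4, 5]
w = [10, 20]   # hypothetical third axis, added only for illustration

# One row per (x, y, w) combination, with x varying slowest and w fastest.
df3 = (pd.MultiIndex.from_product([x, y, w], names=["x", "y", "w"])
         .to_frame(index=False)
         .assign(z=lambda d: d["x"] + d["y"] + d["w"]))

print(len(df3))     # 12 rows = 2 * 3 * 2
print(df3.head())
```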