I have a table in a cross-tab format, example below:
State Item # x1 x2 x3 y1 y2 y3 z1 z2 z3
CA 1 6 4 3 7 5 3 11 5 1
CA 2 7 3 1 15 10 5 4 2 1
FL 3 3 2 1 5 3 2 13 7 2
FL 4 9 4 2 16 14 12 14 5 4
I am trying to use the melt function to put the data in the following format:
State Item # x xvalue y yvalue z zvalue
CA 1 x1 6 y1 7 z1 11
CA 1 x2 4 y2 5 z2 5
CA 1 x3 3 y3 3 z3 1
CA 2 x1 7 y1 15 z1 4
CA 2 x2 3 y2 10 z2 2
CA 2 x3 1 y3 5 z3 1
I know how to use the melt function to do it for just one of the values, say x. But I don't know how to do it with y and z too. See my code below for doing it for just x. Is there a way I can adjust this to do it for y and z also? Or should I try having separate melt functions for x, y, and z, and then somehow combine them?
df_m = pd.melt(df, id_vars=['State', 'Item #'],
value_vars=['x1','x2','x3'],
var_name='x', value_name='xvalue')
I don't think so, but you could use two line solution:
values = [['x1','x2','x3'], ['y1', 'y2', 'y3'], ['z1', 'z2', 'z3']]
df_m = pd.concat([pd.melt(df, id_vars=['State', 'Item_#'], value_vars=val, var_name='var', value_name='value') for val in values])
The pd.concat function is a powerful (i.e. fast) way to stack DataFrames vertically.
Here is a version that does not use melt but works for any number of xyz 'groups'.
import pandas as pd
from io import StringIO
df = pd.read_csv(StringIO('''
State ItemN x1 x2 x3 y1 y2 y3 z1 z2 z3
CA 1 6 4 3 7 5 3 11 5 1
CA 2 7 3 1 15 10 5 4 2 1
FL 3 3 2 1 5 3 2 13 7 2
FL 4 9 4 2 16 14 12 14 5 4'''),
sep=r' +')
# prepare index
df = df.set_index(list(df.columns[:2]))
df.columns = pd.MultiIndex.from_tuples([(c[0], c) for c in df.columns])
# x y z
# x1 x2 x3 y1 y2 y3 z1 z2 z3
# State ItemN
# CA 1 6 4 3 7 5 3 11 5 1
# 2 7 3 1 15 10 5 4 2 1
# FL 3 3 2 1 5 3 2 13 7 2
# 4 9 4 2 16 14 12 14 5 4
# stack and concat each 'group'
df2 = pd.concat((
df[c].stack().reset_index(-1)
for c in df.columns.levels[0]),
axis=1)
# rename the columns
new_cols = [None for _ in range(df2.shape[1])]
new_cols[::2] = [c for c in df.columns.levels[0]]
new_cols[1::2] = [c + 'value' for c in df.columns.levels[0]]
df2.columns = new_cols
# x xvalue y yvalue z zvalue
# State ItemN
# CA 1 x1 6 y1 7 z1 11
# 1 x2 4 y2 5 z2 5
# 1 x3 3 y3 3 z3 1
# 2 x1 7 y1 15 z1 4
# 2 x2 3 y2 10 z2 2
# 2 x3 1 y3 5 z3 1
# FL 3 x1 3 y1 5 z1 13
# 3 x2 2 y2 3 z2 7
# 3 x3 1 y3 2 z3 2
# 4 x1 9 y1 16 z1 14
# 4 x2 4 y2 14 z2 5
# 4 x3 2 y3 12 z3 4
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With