How do you combine multiple columns into one staggered column? For example, if I have data:
  Column 1 Column 2
0        A        E
1        B        F
2        C        G
3        D        H
And I want it in the form:
Column 1
0 A
1 E
2 B
3 F
4 C
5 G
6 D
7 H
What is a good, vectorized, Pythonic way to do this? I could probably hack something together with df.apply(), but I suspect there is a better way. The application is merging multiple dimensions of time series data into a single stream for ML applications.
First stack the columns, then drop the resulting MultiIndex:
df.stack().reset_index(drop=True)
Out:
0 A
1 E
2 B
3 F
4 C
5 G
6 D
7 H
dtype: object
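As a self-contained check of the stack approach, here is a minimal sketch that rebuilds the question's sample frame (the variable names `df` and `stacked` are only illustrative) and confirms that the values come out interleaved row by row:

```python
import pandas as pd

# Sample frame from the question; 'Column 1' comes first so each row reads A, E / B, F / ...
df = pd.DataFrame({"Column 1": list("ABCD"), "Column 2": list("EFGH")})

# stack() moves the column labels into an inner index level, producing one value
# per (row, column) pair; reset_index(drop=True) replaces that MultiIndex with 0..7.
stacked = df.stack().reset_index(drop=True)

print(stacked.tolist())  # ['A', 'E', 'B', 'F', 'C', 'G', 'D', 'H']
```

The interleaving falls out of stack() walking each row across its columns before moving to the next row, so the column order of the frame determines the order within each pair.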
To get a dataframe:
pd.DataFrame(df.values.reshape(-1, 1), columns=['Column 1'])
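The reshape approach can be sketched end to end as follows. This assumes the frame's columns are ordered 'Column 1', 'Column 2'; the name `interleaved` is just for illustration, and `to_numpy()` is used here as the equivalent of `.values`:

```python
import pandas as pd

df = pd.DataFrame({"Column 1": list("ABCD"), "Column 2": list("EFGH")})

# The underlying 4x2 array is read in row-major order, so reshaping it to a
# single column (-1 lets NumPy infer the row count) yields A, E, B, F, ...
interleaved = pd.DataFrame(df.to_numpy().reshape(-1, 1), columns=["Column 1"])

print(interleaved["Column 1"].tolist())  # ['A', 'E', 'B', 'F', 'C', 'G', 'D', 'H']
```

Because this works on the raw array, the original index is discarded and the result gets a fresh 0..7 index.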

For a Series that answers the OP's question:
pd.Series(df.values.flatten(), name='Column 1')
For the Series used in the timing tests:
pd.Series(get_df(n).values.flatten(), name='Column 1')
code
import pandas as pd

def get_df(n=1):
    # 'Column 1' is listed first so the column order matches the question
    df = pd.DataFrame({'Column 1': {0: 'A', 1: 'B', 2: 'C', 3: 'D'},
                       'Column 2': {0: 'E', 1: 'F', 2: 'G', 3: 'H'}})
    return pd.concat([df for _ in range(n)])
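A rough way to reproduce this kind of comparison is sketched below. The helpers `make_frame`, `via_stack`, and `via_flatten` are hypothetical names introduced for this example, not part of the answer, and the numbers printed will depend on the machine and pandas version:

```python
import timeit

import pandas as pd


def make_frame(n=1):
    # Hypothetical stand-in for the answer's helper: the sample repeated n times
    base = pd.DataFrame({"Column 1": list("ABCD"), "Column 2": list("EFGH")})
    return pd.concat([base] * n, ignore_index=True)


def via_stack(df):
    # Stack the columns, then drop the MultiIndex
    return df.stack().reset_index(drop=True)


def via_flatten(df):
    # Flatten the underlying array in row-major order
    return pd.Series(df.to_numpy().flatten(), name="Column 1")


frame = make_frame(1000)

# Sanity check: both approaches should yield the same interleaved values
assert via_stack(frame).tolist() == via_flatten(frame).tolist()

runs = 20
for label, func in [("stack", via_stack), ("flatten", via_flatten)]:
    seconds = timeit.timeit(lambda: func(frame), number=runs)
    print(f"{label}: {seconds / runs:.6f} s per call")
```

Changing the argument to make_frame scales the input, which is one way to see how each approach behaves as the data grows.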
[Timing results for three input sizes: the given sample, the given sample * 10,000, and the given sample * 1,000,000]