I have a MultiIndex DataFrame with timestamps as the top-level index and some arbitrary parameters as the second level. You can use this snippet to reproduce some mock data:
import numpy as np
import pandas as pd
level0 = pd.period_range(start='2020-01-01-00:00:00', end='2020-01-01-00:01:00', freq='s')
level1 = ['foo', 'bar', 'foobar']
values = np.random.rand(len(level1)*len(level0))
idx = pd.MultiIndex.from_product([level0, level1], names=['time', 'level1'])
col = ['value']
df = pd.DataFrame(values, idx, col)
print(df)
Which creates the output:
                               value
time                level1
2020-01-01 00:00:00 foo     0.345507
                    bar     0.147654
                    foobar  0.617000
2020-01-01 00:00:01 foo     0.430975
                    bar     0.783075
...                              ...
2020-01-01 00:00:59 bar     0.027083
                    foobar  0.553220
2020-01-01 00:01:00 foo     0.253957
                    bar     0.569881
                    foobar  0.976768
Now I want to take every n-th second as data for my further calculations. My first approach was to use .iloc[::n], which works fine if the DataFrame is unstacked first. So the code df.unstack().iloc[::5].stack() produces exactly my desired output:
                               value
time                level1
2020-01-01 00:00:00 bar     0.147654
                    foo     0.345507
                    foobar  0.617000
2020-01-01 00:00:05 bar     0.083129
                    foo     0.591585
                    foobar  0.660372
2020-01-01 00:00:10 bar     0.460798
                    foo     0.308138
                    foobar  0.622412
...                              ...
2020-01-01 00:00:55 bar     0.700964
                    foo     0.556782
                    foobar  0.601582
2020-01-01 00:01:00 bar     0.569881
                    foo     0.253957
                    foobar  0.976768
However, unstacking and stacking gets hugely resource-intensive when the df becomes large. And I feel that there is a simple, elegant and 'cheap' solution that I just can't grasp.
Is there a solution to select every n-th timestamp that avoids unstacking the data?
EDIT: For anyone reading this: although jezrael's answer is a nice way to do it and taught me a few things about how to deal with the problem, it turns out that for my case (a month's worth of secondly data and ca. 300 "level1" entries) the unstack().iloc[::n].stack() method is faster and scales better.
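To check which approach is faster on your own data, a rough timing sketch along these lines may help. The sizes below (one hour of secondly data, 20 keys) and the helper names via_unstack and via_loc are assumptions made for illustration, far smaller than the case described in the edit, so the ranking may differ at scale:

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical benchmark sizes, chosen only so the sketch runs quickly.
times = pd.period_range('2020-01-01 00:00:00', periods=3600, freq='s')
keys = [f'key{i}' for i in range(20)]
idx = pd.MultiIndex.from_product([times, keys], names=['time', 'level1'])
df = pd.DataFrame(np.random.rand(len(idx)), index=idx, columns=['value'])
n = 5  # keep every n-th timestamp


def via_unstack():
    # Approach from the question: pivot level1 into columns, slice rows, pivot back.
    return df.unstack().iloc[::n].stack()


def via_loc():
    # Label-based selection on every n-th value of the first index level.
    return df.loc[df.index.levels[0][::n]]


for fn in (via_unstack, via_loc):
    seconds = timeit.timeit(fn, number=3) / 3
    print(f'{fn.__name__}: {seconds:.4f} s per call')
```

Both functions return the same rows, but the row order within each timestamp differs (unstack/stack sorts the second level), so call sort_index() before comparing results.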
Take every 5th value of the first index level by slicing, then pass the result to DataFrame.loc:
df1 = df.loc[df.index.levels[0][::5]]
print(df1)
                               value
time                level1
2020-01-01 00:00:00 foo     0.350853
                    bar     0.998113
                    foobar  0.077340
2020-01-01 00:00:05 foo     0.029292
                    bar     0.394105
                    foobar  0.375882
2020-01-01 00:00:10 foo     0.878306
                    bar     0.152500
                    foobar  0.299017
2020-01-01 00:00:15 foo     0.821039
                    bar     0.298991
                    foobar  0.496110
2020-01-01 00:00:20 foo     0.523729
                    bar     0.928747
                    foobar  0.902535
2020-01-01 00:00:25 foo     0.128496
                    bar     0.126517
                    foobar  0.802517
2020-01-01 00:00:30 foo     0.539462
                    bar     0.754518
                    foobar  0.243328
2020-01-01 00:00:35 foo     0.159771
                    bar     0.058187
                    foobar  0.508651
2020-01-01 00:00:40 foo     0.847019
                    bar     0.688022
                    foobar  0.368563
2020-01-01 00:00:45 foo     0.575865
                    bar     0.531090
                    foobar  0.756400
2020-01-01 00:00:50 foo     0.584265
                    bar     0.155704
                    foobar  0.734554
2020-01-01 00:00:55 foo     0.035894
                    bar     0.047678
                    foobar  0.746624
2020-01-01 00:01:00 foo     0.254821
                    bar     0.756032
                    foobar  0.694809
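A possible alternative, not part of the answer above, is to build a boolean mask from the integer codes of the first level instead of looking up labels. This is only a sketch: it assumes the first level is sorted (as it is for an index built with from_product), and the names clean_index, mask and df_mask are made up for illustration:

```python
import numpy as np
import pandas as pd

# Mock data shaped like the example in the question.
level0 = pd.period_range(start='2020-01-01 00:00:00', end='2020-01-01 00:01:00', freq='s')
level1 = ['foo', 'bar', 'foobar']
idx = pd.MultiIndex.from_product([level0, level1], names=['time', 'level1'])
df = pd.DataFrame(np.random.rand(len(idx)), index=idx, columns=['value'])

n = 5  # keep every n-th distinct timestamp

# codes[0] holds, for each row, the position of its timestamp within levels[0].
# remove_unused_levels() guards against stale level entries left over from
# earlier filtering, which would otherwise shift the positions.
clean_index = df.index.remove_unused_levels()
mask = clean_index.codes[0] % n == 0

# Boolean indexing keeps the original row order and needs no reshaping.
df_mask = df[mask]
print(df_mask)
```

Because the mask is a plain NumPy array with one entry per row, the original order of the second level (foo, bar, foobar) is preserved.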