I have time series data per row (with columns as time steps) and I'd like to left and right pad each row with 0s based on a conditional row value (i.e. 'Padding amount'). This is what I have:
Padding amount  T1   T2   T3
0               3    2.9  2.8
1               2.9  2.8  2.7
1               2.8  2.3  2.0
2               4.4  3.3  2.3
And this is what I'd like to produce:
Padding amount  T1   T2   T3   T4   T5
0               3    2.9  2.8  0    0    (--> padding = 0, so no change)
1               0    2.9  2.8  2.7  0    (--> shifted one to the right)
1               0    2.8  2.3  2.0  0
2               0    0    4.4  3.3  2.3  (--> shifted two to the right)
I see that Keras has sequence padding, but I'm not sure how it would work here, since all rows have the same number of entries. I'm looking at shift and np.roll, but I'm sure a solution for this already exists somewhere.
In numpy, you could construct an array of indices for the locations where you want to place your array elements.
Let's say you have
import numpy as np

padding = np.array([0, 1, 1, 2])
data = np.array([[3.0, 2.9, 2.8],
                 [2.9, 2.8, 2.7],
                 [2.8, 2.3, 2.0],
                 [4.4, 3.3, 2.3]])
M, N = data.shape
The output array would be
output = np.zeros((M, N + padding.max()))
You can make an index of where the data goes:
rows = np.arange(M)[:, None]
cols = padding[:, None] + np.arange(N)
Since the two index arrays broadcast together to the shape of the data, you can assign to the output directly:
output[rows, cols] = data
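Put together, the steps above might look like the following runnable sketch. The function name `shift_rows` is made up here for illustration, and the comments spell out the shapes of the two index arrays so the broadcasting is easier to follow.

```python
import numpy as np

def shift_rows(data, padding):
    """Shift each row of `data` right by the matching entry of `padding`,
    filling the vacated and trailing positions with zeros."""
    data = np.asarray(data, dtype=float)
    padding = np.asarray(padding, dtype=int)
    M, N = data.shape
    # Wide enough to hold the row with the largest shift
    output = np.zeros((M, N + padding.max()))
    rows = np.arange(M)[:, None]             # shape (M, 1)
    cols = padding[:, None] + np.arange(N)   # shape (M, N)
    # rows and cols broadcast to (M, N), the same shape as data
    output[rows, cols] = data
    return output

padding = np.array([0, 1, 1, 2])
data = np.array([[3.0, 2.9, 2.8],
                 [2.9, 2.8, 2.7],
                 [2.8, 2.3, 2.0],
                 [4.4, 3.3, 2.3]])
result = shift_rows(data, padding)
print(result)
```

With the sample data this gives a 4 x 5 array whose rows match the desired output in the question.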
Not sure how this applies to a DataFrame exactly, but you could probably construct a new one after operating on the values of the old one. Alternatively, you could probably implement all these operations equivalently directly in pandas.
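One possible way to do the "construct a new one" step is sketched below. It assumes the time-step columns are the ones whose names start with 'T' and that the padding column is called 'Padding amount', as in the question. The variable names (`time_cols`, `shifted`, `out`) are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Padding amount': [0, 1, 1, 2],
                   'T1': [3.0, 2.9, 2.8, 4.4],
                   'T2': [2.9, 2.8, 2.3, 3.3],
                   'T3': [2.8, 2.7, 2.0, 2.3]})

# Assumption: time-step columns are exactly those starting with 'T'
time_cols = [c for c in df.columns if c.startswith('T')]
padding = df['Padding amount'].to_numpy()
data = df[time_cols].to_numpy(dtype=float)

# Same index trick as above, applied to the DataFrame's values
M, N = data.shape
shifted = np.zeros((M, N + padding.max()))
shifted[np.arange(M)[:, None], padding[:, None] + np.arange(N)] = data

# Rebuild a DataFrame with columns T1..T(N + max padding)
new_cols = [f'T{i}' for i in range(1, shifted.shape[1] + 1)]
out = pd.concat([df[['Padding amount']],
                 pd.DataFrame(shifted, columns=new_cols, index=df.index)],
                axis=1)
print(out)
```

Building a fresh frame avoids assigning cell by cell, which is likely to matter once the number of rows gets large.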
This is one way of doing it. I've made the process flexible in terms of how many time periods/steps it can take:
import pandas as pd

# data (same values as in the question)
d = {'Padding amount': [0, 1, 1, 2],
     'T1': [3, 2.9, 2.8, 4.4],
     'T2': [2.9, 2.8, 2.3, 3.3],
     'T3': [2.8, 2.7, 2.0, 2.3]}
# create DF
df = pd.DataFrame(data=d)
# get max padding amount
maxPadd = df['Padding amount'].max()
# list of time period columns
timePeriodsCols = [c for c in df.columns.tolist() if c.startswith('T')]
# reverse list, so values are moved right-to-left and never overwritten
reverseList = timePeriodsCols[::-1]
# number of periods
noOfPeriods = len(timePeriodsCols)
# create new needed columns, filled with 0.0 so they stay numeric
for i in range(noOfPeriods + 1, noOfPeriods + 1 + maxPadd):
    df['T' + str(i)] = 0.0
# loop over records
for i, row in df.iterrows():
    # get padding amount
    padAmount = df.at[i, 'Padding amount']
    # if zero then do nothing
    if padAmount == 0:
        continue
    # else: move each value right by the padding amount and set old location to zero
    for col in reverseList:
        loc = df.columns.get_loc(col)
        df.at[i, df.columns[loc + padAmount]] = df.at[i, col]
        df.at[i, col] = 0
print(df)

   Padding amount   T1   T2   T3   T4   T5
0               0  3.0  2.9  2.8  0.0  0.0
1               1  0.0  2.9  2.8  2.7  0.0
2               1  0.0  2.8  2.3  2.0  0.0
3               2  0.0  0.0  4.4  3.3  2.3