Padding rows based on conditional

Question

I have time series data per row (with columns as time steps) and I'd like to left and right pad each row with 0s based on a conditional row value (i.e. 'Padding amount'). This is what I have:

Padding amount     T1     T2     T3
   0               3      2.9    2.8
   1               2.9    2.8    2.7
   1               2.8    2.3    2.0
   2               4.4    3.3    2.3

And this is what I'd like to produce:

Padding amount     T1     T2     T3     T4     T5
   0               3      2.9    2.8    0      0    (--> padding = 0, so no change)
   1               0      2.9    2.8    2.7    0    (--> shifted one to the right)
   1               0      2.8    2.3    2.0    0
   2               0      0      4.4    3.3    2.3  (--> shifted two to the right)

I see that Keras has sequence padding, but I'm not sure how it would work here, since all rows have the same number of entries. I'm looking at shift and np.roll, but I'm sure a solution for this already exists somewhere.

Mad Physicist · Accepted Answer

In numpy, you could construct an array of indices for the locations where you want to place your array elements.

Let's say you have

padding = np.array([0, 1, 1, 2])
data = np.array([[3.0, 2.9, 2.8],
                 [2.9, 2.8, 2.7],
                 [2.8, 2.3, 2.0],
                 [4.4, 3.3, 2.3]])
M, N = data.shape

The output array would be

output = np.zeros((M, N + padding.max()))

You can make an index of where the data goes:

rows = np.arange(M)[:, None]
cols = padding[:, None] + np.arange(N)
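
To see what these index arrays contain, here is a small check. It repeats the definitions above so that it runs on its own; M and N are hard-coded to the shape of the example data.

```python
import numpy as np

padding = np.array([0, 1, 1, 2])
M, N = 4, 3  # shape of the example data

rows = np.arange(M)[:, None]            # shape (M, 1): the row index of each row
cols = padding[:, None] + np.arange(N)  # shape (M, N): target column of each value

print(cols)
# [[0 1 2]
#  [1 2 3]
#  [1 2 3]
#  [2 3 4]]
print(np.broadcast(rows, cols).shape)   # (4, 3), the same shape as data
```

Each row of cols starts at that row's padding amount, so the values land shifted right by exactly that many columns.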

Since the index arrays broadcast to the shape of the data, you can assign into the output directly:

output[rows, cols] = data
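
Putting the steps together, a minimal sketch might look like the following. The function name pad_rows is only an illustrative choice, and the sketch assumes padding holds non-negative integers, one per row.

```python
import numpy as np

def pad_rows(data, padding):
    """Shift each row of `data` right by its entry in `padding`, filling with zeros.

    `pad_rows` is a hypothetical helper name; `padding` is assumed to hold
    non-negative integers, one per row of `data`.
    """
    data = np.asarray(data, dtype=float)
    padding = np.asarray(padding)
    M, N = data.shape
    # Wide enough for the most-shifted row
    output = np.zeros((M, N + padding.max()))
    rows = np.arange(M)[:, None]
    cols = padding[:, None] + np.arange(N)
    output[rows, cols] = data
    return output

padding = np.array([0, 1, 1, 2])
data = np.array([[3.0, 2.9, 2.8],
                 [2.9, 2.8, 2.7],
                 [2.8, 2.3, 2.0],
                 [4.4, 3.3, 2.3]])

output = pad_rows(data, padding)
print(output)  # rows match the desired table from the question
```

No Python-level loop is involved, so this should scale well to many rows.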

Not sure how this applies to a DataFrame exactly, but you could probably construct a new one after operating on the values of the old one. Alternatively, you could probably implement all these operations equivalently directly in pandas.
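
One possible way to carry this over to a DataFrame is sketched below. It assumes the column layout from the question ('Padding amount' plus time-step columns whose names start with 'T') and builds a new frame from the padded array. The variable names time_cols, new_cols and result are illustrative only.

```python
import numpy as np
import pandas as pd

# Assumed layout: the question's table as a DataFrame
df = pd.DataFrame({'Padding amount': [0, 1, 1, 2],
                   'T1': [3.0, 2.9, 2.8, 4.4],
                   'T2': [2.9, 2.8, 2.3, 3.3],
                   'T3': [2.8, 2.7, 2.0, 2.3]})

time_cols = [c for c in df.columns if c.startswith('T')]
padding = df['Padding amount'].to_numpy()
data = df[time_cols].to_numpy()
M, N = data.shape

# Same index-based assignment as in the NumPy version
output = np.zeros((M, N + padding.max()))
output[np.arange(M)[:, None], padding[:, None] + np.arange(N)] = data

# Rebuild a frame with T1..Tk columns next to the padding column
new_cols = [f'T{k}' for k in range(1, output.shape[1] + 1)]
result = pd.concat(
    [df[['Padding amount']],
     pd.DataFrame(output, columns=new_cols, index=df.index)],
    axis=1)
print(result)
```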

Mit · Answer

This is one way of doing it. I've made the process flexible in terms of how many time periods/steps it can take:

import pandas as pd

# data
d = {'Padding amount': [0, 1, 1, 2],
     'T1': [3, 2.9, 2.8, 4.4],
     'T2': [2.9, 2.8, 2.3, 3.3],
     'T3': [2.8, 2.7, 2.0, 2.3]}
# create DF
df = pd.DataFrame(data=d)
# get max padding amount
maxPadd = df['Padding amount'].max()
# list of time period columns
timePeriodsCols = [c for c in df.columns.tolist() if c.startswith('T')]
# reverse list, so values are moved right-to-left and nothing is overwritten
reverseList = timePeriodsCols[::-1]
# number of periods
noOfPeriods = len(timePeriodsCols)

# create the new columns, filled with 0.0 so they stay numeric
for i in range(noOfPeriods + 1, noOfPeriods + 1 + maxPadd):
    df['T' + str(i)] = 0.0

# loop over records
for i, row in df.iterrows():
    # get padding amount
    padAmount = df.at[i, 'Padding amount']
    # if zero then do nothing
    if padAmount == 0:
        continue
    # else: move each value right by the padding amount and set the old location to zero
    for col in reverseList:
        target = df.columns[df.columns.get_loc(col) + padAmount]
        df.at[i, target] = df.at[i, col]
        df.at[i, col] = 0

print(df)

   Padding amount   T1   T2   T3   T4   T5
0               0  3.0  2.9  2.8  0.0  0.0
1               1  0.0  2.9  2.8  2.7  0.0
2               1  0.0  2.8  2.3  2.0  0.0
3               2  0.0  0.0  4.4  3.3  2.3

Tags: python, pandas, numpy, keras

Ellio
