Here is a test example to show what I am trying to achieve. Here's a toy data frame:
df = pd.DataFrame(np.random.randn(10,7),index=range(1,11),columns=headers)
Which gives
Time A_x A_y A_z B_x B_y B_z
1 -0.075509 -0.123527 -0.547239 -0.453707 -0.969796 0.248761 1.369613
2 -0.206369 -0.112098 -1.122609 0.218538 -0.878985 0.566872 -1.048862
3 -0.194552 0.818276 -1.563931 0.097377 1.641384 -0.766217 -1.482096
4 0.502731 0.766515 -0.650482 -0.087203 -0.089075 0.443969 0.354747
5 1.411380 -2.419204 -0.882383 0.005204 -0.204358 -0.999242 -0.395236
6 1.036695 1.115630 0.081825 -1.038442 0.515798 -0.060016 2.669702
7 0.392943 0.226386 0.039879 0.732611 -0.073447 1.164285 1.034357
8 -1.253264 0.389148 0.158289 0.440282 -1.195860 0.872064 0.906377
9 -0.133580 -0.308314 -0.839347 -0.517989 0.652120 0.477232 -0.391767
10 0.623841 0.473552 0.059428 0.726088 -0.593291 -3.186297 -0.846863
What I want to do is simply to calculate the length of the vector for each header (A and B) in this case, for each index, and divide by the Time column. Hence, this function needs to be np.sqrt(A_x^2 + A_y^2 + A_z^2) and the same for B of course. I.e. I am looking to calculate the velocity for each row, but three columns contribute to one velocity result.
I have tried using df.groupby and df.filter to loop-over the columns but I cannot really get it to work, because I am not at all sure how I apply effectively the same function to chunks of the data-frame, all in one go (as apparently one is to avoid looping over rows). I have tried doing
df = df.apply(lambda x: np.sqrt(x.dot(x)), axis=1)
This works of course, but only if the input data frame has the right number of columns (3), if longer then the dot-product is calculated over the entire row, and not in chunks of three columns which is what I want (because this is turns corresponds to the tag coordinates, which are three dimensional).
So this is what I am eventually trying to get with the above example (the below arrays are just filled with random numbers, not the actual velocities which I am trying to calculate - just to show what sort of shape I trying to achieve):
Velocity_A Velocity_B
1 -0.975633 -2.669544
2 0.766405 -0.264904
3 0.425481 -0.429894
4 -0.437316 0.954006
5 1.073352 -1.475964
6 -0.647534 0.937035
7 0.082517 0.438112
8 -0.387111 -1.417930
9 -0.111011 1.068530
10 0.451979 -0.053333
My actual data is 50,000 x 36 (so there are 12 tags with x,y,z coordinates), and I want to calculate the velocity all in one go to avoid iterating (if at all possible). There is also a time column of the same length (50,000x1).
How do you do this?
Thanks, Astrid
A possible start.
Filtering out column names corresponding to a particular vector. For example
In [20]: filter(lambda x: x.startswith("A_"),df.columns)
Out[20]: ['A_x', 'A_y', 'A_z']
Sub selecting these columns from the DataFrame
In [22]: df[filter(lambda x: x.startswith("A_"),df.columns)]
Out[22]:
A_x A_y A_z
1 -0.123527 -0.547239 -0.453707
2 -0.112098 -1.122609 0.218538
3 0.818276 -1.563931 0.097377
4 0.766515 -0.650482 -0.087203
5 -2.419204 -0.882383 0.005204
6 1.115630 0.081825 -1.038442
7 0.226386 0.039879 0.732611
8 0.389148 0.158289 0.440282
9 -0.308314 -0.839347 -0.517989
10 0.473552 0.059428 0.726088
So, using this technique you can get chunks of 3 columns. For example.
column_initials = ["A","B"]
for column_initial in column_initials:
df["Velocity_"+column_initial]=df[filter(lambda x: x.startswith(column_initial+"_"),df.columns)].apply(lambda x: np.sqrt(x.dot(x)), axis=1)/df.Time
In [32]: df[['Velocity_A','Velocity_B']]
Out[32]:
Velocity_A Velocity_B
1 -9.555311 -22.467965
2 -5.568487 -7.177625
3 -9.086257 -12.030091
4 2.007230 1.144208
5 1.824531 0.775006
6 1.472305 2.623467
7 1.954044 3.967796
8 -0.485576 -1.384815
9 -7.736036 -6.722931
10 1.392823 5.369757
I do not get the same answer as yours. But, I borrowed your df.apply(lambda x: np.sqrt(x.dot(x)), axis=1) and assume it is correct.
Hope this helps.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With