I have a dataframe like this. I want to sum up special rows containing exact characters that matched my targets.
Ko_EC FPKM count
0 1.1.1.1 16.7 1
1 1.1.1.15 30.0 7
2 4.2.1.128 40.5 9
3 4.2.1.12 57.0 10
4 3.2.1.1 1.1.1.1 22.0 4
Here are my dataframe and my targets.
# coding=utf-8
import pandas as pd
import numpy as np
#########
classes = [('1.1.1.1', 16.7, 1),
('1.1.1.15', 30, 7),
('4.2.1.128', 40.5, 9),
('4.2.1.12', 57, 10),
('3.2.1.1 1.1.1.1', 22, 4)]
labels = ['Ko_EC','FPKM', 'count']
alls = pd.DataFrame.from_records(classes, columns=labels)
target_list = ['1.1.1.1', '4.2.1.128', '4.2.1.12; 1.1.1.15; 3.2.1.1', '1.1.1.15']
I want to sum up the alls['Ko_EC'] rows containing exact characters that matched the unique target_list.
Based on the answer to a previous question, I used this code:
result = pd.DataFrame()
for target in target_list:
mask = alls.apply(lambda x: any([cls in target for cls in x['Ko_EC'].split(' ')]),
axis=1)
target_sum = alls.loc[mask,["FPKM", 'count']].sum().reset_index().rename(columns={0:target}).iloc[:,1:]
result = pd.concat([result,target_sum], axis=1) #concat as result
result = result.T
result.columns = ['FPKM', 'Count']
It results like:
FPKM Count
1.1.1.1 38.7 5.0
4.2.1.128 97.5 19.0
4.2.1.12; 1.1.1.15; 3.2.1.1 125.7 22.0
1.1.1.15 68.7 12.0
What I want is
FPKM Count
1.1.1.1 38.7 5.0
4.2.1.128 40.5 9.0
4.2.1.12; 1.1.1.15; 3.2.1.1 109.0 21.0
1.1.1.15 30.0 7.0
Here, the 4.2.1.128 row FPKM Count only sum that of itself, and 4.2.1.12; 1.1.1.15; 3.2.1.1 rows sum that of 4.2.1.12, 1.1.1.15 and 3.2.1.1 1.1.1.1 in origin alls dataframe, but my code confused the characters 1.1.1.1/1.1.1.15 and 4.2.1.128/4.2.1.12, because of their similarity.
Could anyone tell me how to do this and why my code cannot work? Thanks!
for each target in target_list, create a regex pattern from target to find the all the rows which contains the target values then sum all such rows and yield the result
def func():
for target in target_list:
pat = r'\b(?:%s)\b' % target.replace('; ', '|')
mask = alls['Ko_EC'].str.contains(pat, na=False)
yield alls[mask].sum(numeric_only=True).to_frame(target).T
result = pd.concat(func())
FPKM count
1.1.1.1 38.7 5.0
4.2.1.128 40.5 9.0
4.2.1.12; 1.1.1.15; 3.2.1.1 109.0 21.0
1.1.1.15 30.0 7.0
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With