I have a pandas dataframe that looks like this:
**I SI weights**
1 3 0.3
2 4 0.2
1 3 0.5
1 5 0.5
I need to do this: given a value of I, consider each value of SI and add the total weight. At the end, I should have, for each realization, something like that:
I = 1 SI = 3 weight = 0.8
SI = 5 weight = 0.5
I = 2 SI = 4 weight = 0.2
This is easily achievable by calling groupby and sum:
name = ['I', 'SI','weight']
Location = 'Simulationsdata/prova.csv'
df = pd.read_csv(Location, names = name,sep='\t',encoding='latin1')
results = df.groupby(['I', 'real', 'SI']).weight.sum()
Now I want the weights to be normalized to one, so that it should be something like that:
I = 1 SI = 3 weight = 0.615
SI = 5 weight = 0.385
I = 2 SI = 4 weight = 1
I tried this:
for idx2, j in enumerate(results.index.get_level_values(1).unique()):
norm = [float(i)/sum(results.loc[j]) for i in results.loc[j]]
but when I try to plot, for each I, the distribution of the SI, I find that also the SI are normalized, and I do not want this to happen.
P.s. this question is related to this one but, since it involves another aspect of the problem, I tought that it would be better to ask it separately
You should be able to divide the weight column by its own sum:
# example data
df
I SI weight
0 1 3 0.3
1 2 4 0.2
2 1 3 0.5
3 1 5 0.5
# two-level groupby, with the result as a DataFrame instead of Series:
# df['col'] gives a Series, df[['col']] gives a DF
res = df.groupby(['I', 'SI'])[['weight']].sum()
res
weight
I SI
1 3 0.8
5 0.5
2 4 0.2
# Get the sum of weights for each value of I,
# which will serve as denominators in normalization
denom = res.groupby('I')['weight'].sum()
denom
I
1 1.3
2 0.2
Name: weight, dtype: float64
# Divide each result value by its index-matched
# denominator value
res.weight = res.weight / denom
res
weight
I SI
1 3 0.615385
5 0.384615
2 4 1.000000
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With