
Calculation of Active Share Measure

Tags:

python

pandas

Suppose I observe the weights for each stock in a set of portfolios. I would like to compute the "active share" measure. This is defined as active_share_{ij} = active_share_{ji} = \sum_{k=1}^N | \omega_i^k - \omega_j^k | where N is the number of all stocks that exist in the market, \omega_i^k is the weight of stock k in portfolio i, and the vertical bars denote the absolute value. Note that the sum runs over all stocks in the market: if, for example, portfolio i holds stock k while portfolio j does not, then \omega_j^k = 0 and the term evaluates to | \omega_i^k - 0 |.

In my data structure, I observe the three relevant pieces of information: portfolio, stock and weight. In my MWE, this looks like

  portfolio   stock  weight
0        P1  stock1     0.3
1        P1  stock2     0.6
2        P1  stock3     0.1
3        P2  stock2     0.1
4        P2  stock3     0.2
5        P2  stock4     0.3
6        P2  stock5     0.1
7        P2  stock8     0.3
8        P3  stock3     0.4
9        P3  stock4     0.6

I would like to produce

                         active_share
portfolio_i portfolio_j              
P1          P1                    0.0
            P2                    1.6
            P3                    1.8
P2          P1                    1.6
            P2                    0.0
            P3                    1.0
P3          P1                    1.8
            P2                    1.0
            P3                    0.0

Let's look at the calculation for the pair (P1, P2) as an example: |0.3-0| + |0.6-0.1| + |0.1-0.2| + |0-0.3| + |0-0.1| + |0-0.3| = 1.6
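The arithmetic above can be checked with a few lines of plain Python (the weight dicts below simply restate the MWE data for P1 and P2):

```python
# Weights for P1 and P2 from the MWE; a stock missing from a portfolio has weight 0.
w_p1 = {'stock1': 0.3, 'stock2': 0.6, 'stock3': 0.1}
w_p2 = {'stock2': 0.1, 'stock3': 0.2, 'stock4': 0.3, 'stock5': 0.1, 'stock8': 0.3}

# Sum over the union of stocks held by either portfolio.
stocks = set(w_p1) | set(w_p2)
active_share = sum(abs(w_p1.get(s, 0.0) - w_p2.get(s, 0.0)) for s in stocks)
print(round(active_share, 10))  # 1.6
```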

My few lines of code below produce the correct results. However, I am facing a very large dataset and need to improve the performance.

Do you see a way to optimize the computation? For example, by exploiting the fact that the relation is symmetric, i.e. the measure for (i, j) is the same as for (j, i)?

MWE:

import pandas as pd

# generate example data
data  = {
    'portfolio': ['P1','P1','P1','P2','P2','P2','P2','P2','P3','P3'], 
    'stock': ['stock1','stock2','stock3','stock2','stock3','stock4','stock5','stock8','stock3','stock4'], 
    'weight':[0.3, 0.6, 0.1, 0.1, 0.2, 0.3, 0.1, 0.3, 0.4, 0.6]}
df = pd.DataFrame(data)

# help dataframe
df['key'] = df.groupby('portfolio').ngroup()
portfolio_list = df['portfolio'].unique()
groups = df['key'].unique()
index = pd.MultiIndex.from_product([portfolio_list, groups], names=['portfolio','key'])
join = pd.DataFrame(index=index, columns=[]).reset_index()

# perform some sort of outer join
composite = df.merge(join, on='key', suffixes=('_i','_j'), how='inner')
composite = composite.loc[:,composite.columns!='key']
composite.rename(columns={'weight':'weight_i'}, inplace=True)

# identify whether counterparty has same holdings
composite = composite.merge(df, left_on=['portfolio_j','stock'], right_on=['portfolio','stock'], how='left', suffixes=('_i','_j'))
composite = composite.loc[:,~composite.columns.isin(['portfolio','key'])]
composite.rename(columns={'weight':'weight_j'}, inplace=True)

# compute the sum of absolute differences for overlapping portfolio
composite['abs_weight_difference'] = (composite['weight_i'] - composite['weight_j']).abs()
result = composite.groupby(['portfolio_i','portfolio_j'])['abs_weight_difference'].sum().to_frame('sum_overlap')

# compute the sum of weights of stocks that are in portfolio i but not in j
result['sum_unique_i'] = composite.loc[composite['weight_j'].isnull()].groupby(['portfolio_i','portfolio_j'])['weight_i'].sum()

# add sum of weights of stocks (1) in portfolio overlap, (2) distinct to portfolio i, (3) distinct to portfolio j
result = result.reset_index()
result = result.merge(result, left_on=['portfolio_i','portfolio_j'], right_on=['portfolio_j','portfolio_i'])
result = result.loc[:,['portfolio_i_x','portfolio_j_x','sum_overlap_x','sum_unique_i_x','sum_unique_i_y']]
result.columns = ['sum_unique_j' if col=='sum_unique_i_y' else col[:-2] for col in result.columns]

result = result.fillna(0)
result.set_index(['portfolio_i','portfolio_j'], inplace=True)
result = result.sum(axis=1).to_frame(name='active_share')

Edit: Currently, it takes 5.82 minutes on a small subset of my dataset (as a future reference point)

Jhonny asked Dec 02 '25 00:12

1 Answer

You can pivot the data first, then use broadcasting to compute the pairwise sums of absolute differences:

import numpy as np
import pandas as pd

a = df.pivot_table(index='portfolio',
                   columns='stock',
                   values='weight',
                   fill_value=0)

idx = a.index
a = a.to_numpy()

pd.DataFrame(np.abs(a[:, None] - a[None, :]).sum(axis=-1),
             index=idx, columns=idx)

Output:

portfolio   P1   P2   P3
portfolio               
P1         0.0  1.6  1.8
P2         1.6  0.0  1.0
P3         1.8  1.0  0.0

From there, you can stack to get the expected output in the question:

pd.DataFrame(np.abs(a[:, None] - a[None, :]).sum(axis=-1),
             index=idx, columns=idx).stack()

gives:

portfolio  portfolio
P1         P1           0.0
           P2           1.6
           P3           1.8
P2         P1           1.6
           P2           0.0
           P3           1.0
P3         P1           1.8
           P2           1.0
           P3           0.0
dtype: float64
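If you also want the `portfolio_i`/`portfolio_j` index names and the `active_share` column from the question's expected output, a `rename_axis` before stacking gets you there. This finishing step is my addition, not part of the answer above; the snippet is self-contained and repeats the pivot for clarity:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'portfolio': ['P1','P1','P1','P2','P2','P2','P2','P2','P3','P3'],
    'stock': ['stock1','stock2','stock3','stock2','stock3',
              'stock4','stock5','stock8','stock3','stock4'],
    'weight': [0.3, 0.6, 0.1, 0.1, 0.2, 0.3, 0.1, 0.3, 0.4, 0.6]})

a = df.pivot_table(index='portfolio', columns='stock',
                   values='weight', fill_value=0)
w = a.to_numpy()

res = pd.DataFrame(np.abs(w[:, None] - w[None, :]).sum(axis=-1),
                   index=a.index, columns=a.index)

# Rename the two axes before stacking so the MultiIndex levels match
# the question's layout, then promote the Series to a one-column frame.
out = (res.rename_axis(index='portfolio_i', columns='portfolio_j')
          .stack()
          .to_frame('active_share'))
```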

Update: since you have too many stocks/portfolios for full broadcasting, you can loop over stocks instead, which reduces the memory requirement:

a = df.pivot_table(index='portfolio',
                   columns='stock',
                   values='weight',
                   fill_value=0)

idx = a.index

ret = pd.DataFrame(0, index=idx, columns=idx)

for col in a:
    u = a[col].to_numpy()
    ret += np.abs(u-u[:,None])

Output:

portfolio   P1   P2   P3
portfolio               
P1         0.0  1.6  1.8
P2         1.6  0.0  1.0
P3         1.8  1.0  0.0
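On the symmetry question from the OP: since active_share(i, j) = active_share(j, i), only the upper triangle of the matrix needs to be computed. SciPy's `scipy.spatial.distance.pdist` with `metric='cityblock'` does exactly this. The NumPy-only sketch below (variable names are my own) achieves the same effect by filling one triangle row by row and mirroring it:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'portfolio': ['P1','P1','P1','P2','P2','P2','P2','P2','P3','P3'],
    'stock': ['stock1','stock2','stock3','stock2','stock3',
              'stock4','stock5','stock8','stock3','stock4'],
    'weight': [0.3, 0.6, 0.1, 0.1, 0.2, 0.3, 0.1, 0.3, 0.4, 0.6]})

a = df.pivot_table(index='portfolio', columns='stock',
                   values='weight', fill_value=0)
m = a.to_numpy()
n = m.shape[0]

out = np.zeros((n, n))
# Fill only the strict upper triangle (roughly half the pairwise work);
# the diagonal stays 0 by construction.
for i in range(n):
    out[i, i + 1:] = np.abs(m[i + 1:] - m[i]).sum(axis=1)
out += out.T  # mirror onto the lower triangle

result = pd.DataFrame(out, index=a.index, columns=a.index)
```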
Quang Hoang answered Dec 04 '25 14:12

