Suppose I observe the weights of each stock in a set of portfolios. I would like to compute the "active share" measure, defined as active_share_{ij} = active_share_{ji} = \sum_{k=1}^{N} | \omega^k_i - \omega^k_j |, where N is the number of all stocks that exist in the market, \omega^k_i is the weight of stock k in portfolio i, and the vertical bars denote the absolute value. Note that the sum runs over all stocks in the market: if, for example, portfolio i holds stock k while portfolio j does not, then \omega^k_j = 0 and the term evaluates to | \omega^k_i - 0 |.
In my data structure, I observe the three relevant pieces of information: portfolio, stock, and weight. In my MWE, this looks like:
portfolio stock weight
0 P1 stock1 0.3
1 P1 stock2 0.6
2 P1 stock3 0.1
3 P2 stock2 0.1
4 P2 stock3 0.2
5 P2 stock4 0.3
6 P2 stock5 0.1
7 P2 stock8 0.3
8 P3 stock3 0.4
9 P3 stock4 0.6
I would like to produce
active_share
portfolio_i portfolio_j
P1 P1 0.0
P2 1.6
P3 1.8
P2 P1 1.6
P2 0.0
P3 1.0
P3 P1 1.8
P2 1.0
P3 0.0
Let's look at the calculation of the pair (P1, P2) as an example: |0.3-0| + |0.6-0.1| + |0.1-0.2| + |0-0.3| + |0-0.1| + |0-0.3| = 1.6
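As a sanity check, here is the same calculation in plain Python (a minimal sketch; the two dicts are simply the P1 and P2 rows of the MWE data):
p1 = {'stock1': 0.3, 'stock2': 0.6, 'stock3': 0.1}
p2 = {'stock2': 0.1, 'stock3': 0.2, 'stock4': 0.3, 'stock5': 0.1, 'stock8': 0.3}
# sum |w^k_i - w^k_j| over the union of stocks; a missing stock counts as weight 0
universe = p1.keys() | p2.keys()
active_share = sum(abs(p1.get(k, 0) - p2.get(k, 0)) for k in universe)
print(round(active_share, 10))  # 1.6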
My few lines of code below produce the correct results. However, I am facing a very large dataset and need to improve performance.
Do you see a way to optimize the computation? For example, by exploiting the fact that the relation is symmetric, i.e. that the measure for (i, j) is the same as for (j, i)?
MWE:
import pandas as pd

# generate example data
data = {
    'portfolio': ['P1', 'P1', 'P1', 'P2', 'P2', 'P2', 'P2', 'P2', 'P3', 'P3'],
    'stock': ['stock1', 'stock2', 'stock3', 'stock2', 'stock3', 'stock4', 'stock5', 'stock8', 'stock3', 'stock4'],
    'weight': [0.3, 0.6, 0.1, 0.1, 0.2, 0.3, 0.1, 0.3, 0.4, 0.6]}
df = pd.DataFrame(data)
# helper dataframe: integer group key per portfolio
df['key'] = df.groupby('portfolio').ngroup()
portfolio_list = df['portfolio'].unique()
groups = df['key'].unique()
index = pd.MultiIndex.from_product([portfolio_list, groups], names=['portfolio','key'])
join = pd.DataFrame(index=index, columns=[]).reset_index()
# cross join: replicate each holding of portfolio i once per counterparty portfolio j
composite = df.merge(join, on='key', suffixes=('_i','_j'), how='inner')
composite = composite.loc[:,composite.columns!='key']
composite.rename(columns={'weight':'weight_i'}, inplace=True)
# look up the counterparty's weight for the same stock (NaN if portfolio j does not hold it)
composite = composite.merge(df, left_on=['portfolio_j','stock'], right_on=['portfolio','stock'], how='left', suffixes=('_i','_j'))
composite = composite.loc[:,~composite.columns.isin(['portfolio','key'])]
composite.rename(columns={'weight':'weight_j'}, inplace=True)
# compute the sum of absolute differences over the overlapping holdings
composite['abs_weight_difference'] = (composite['weight_i'] - composite['weight_j']).abs()
result = composite.groupby(['portfolio_i','portfolio_j'])['abs_weight_difference'].sum().to_frame('sum_overlap')
# compute the sum of weights of stocks that are in portfolio i but not in j
result['sum_unique_i'] = composite.loc[composite['weight_j'].isnull()].groupby(['portfolio_i','portfolio_j'])['weight_i'].sum()
# add sum of weights of stocks (1) in the portfolio overlap, (2) distinct to portfolio i, (3) distinct to portfolio j
result = result.reset_index()
result = result.merge(result, left_on=['portfolio_i','portfolio_j'], right_on=['portfolio_j','portfolio_i'])
result = result.loc[:,['portfolio_i_x','portfolio_j_x','sum_overlap_x','sum_unique_i_x','sum_unique_i_y']]
result.columns = ['sum_unique_j' if col=='sum_unique_i_y' else col[:-2] for col in result.columns]
result = result.fillna(0)
result.set_index(['portfolio_i','portfolio_j'], inplace=True)
result = result.sum(axis=1).to_frame(name='active_share')
Edit: currently, this takes 5.82 minutes on a small subset of my dataset (as a future reference point).
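A reference timing like this can be reproduced with a plain stopwatch around the pipeline (a sketch using only the standard library; place your own computation in the timed block):
import time
start = time.perf_counter()
# ... run the pairwise active-share computation here ...
print(f'elapsed: {time.perf_counter() - start:.2f} s')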
You can pivot the data first, then use broadcasting to compute the full pairwise matrix:
import numpy as np

# wide matrix: one row per portfolio, one column per stock, missing holdings filled with 0
a = df.pivot_table(index='portfolio',
                   columns='stock',
                   values='weight',
                   fill_value=0)
idx = a.index
a = a.to_numpy()
# broadcast to a (n_portfolios, n_portfolios, n_stocks) cube, then reduce over stocks
pd.DataFrame(np.abs(a[:, None] - a[None, :]).sum(axis=-1),
             index=idx, columns=idx)
Output:
portfolio P1 P2 P3
portfolio
P1 0.0 1.6 1.8
P2 1.6 0.0 1.0
P3 1.8 1.0 0.0
From there, you can stack to get the expected output in the question:
pd.DataFrame(np.abs(a[:,None] - a[None,:]).sum(axis=-1),
index=idx, columns=idx).stack()
gives:
portfolio portfolio
P1 P1 0.0
P2 1.6
P3 1.8
P2 P1 1.6
P2 0.0
P3 1.0
P3 P1 1.8
P2 1.0
P3 0.0
dtype: float64
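Since the measure is symmetric, you could also evaluate each unordered pair only once with SciPy's pairwise-distance helper: the 'cityblock' (L1) metric is exactly the sum of absolute weight differences. A sketch, assuming SciPy is available and a is the pivoted weight matrix from above:
from scipy.spatial.distance import pdist, squareform
# pdist evaluates each unordered pair (i, j) once, i.e. only the upper triangle
condensed = pdist(a, metric='cityblock')  # L1 distance = sum of |w^k_i - w^k_j|
result = pd.DataFrame(squareform(condensed), index=idx, columns=idx)  # mirror back to a full matrix
result.stack() then gives the same long format as above.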
Update: since you have too many stocks/portfolios for full broadcasting, you can instead loop over the stocks, which reduces the memory requirement:
a = df.pivot_table(index='portfolio',
                   columns='stock',
                   values='weight',
                   fill_value=0)
idx = a.index
# accumulate the pairwise |difference| matrix one stock at a time
ret = pd.DataFrame(0.0, index=idx, columns=idx)  # float init avoids dtype upcast on +=
for col in a:
    u = a[col].to_numpy()
    ret += np.abs(u - u[:, None])
Outcome:
portfolio P1 P2 P3
portfolio
P1 0.0 1.6 1.8
P2 1.6 0.0 1.0
P3 1.8 1.0 0.0
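If the one-column-at-a-time loop is too slow, a middle ground is to broadcast over chunks of stocks; the chunk size below is an arbitrary assumption to trade memory against Python-loop overhead:
vals = a.to_numpy()
ret = np.zeros((len(idx), len(idx)))
chunk = 256  # tune to your memory budget
for s in range(0, vals.shape[1], chunk):
    block = vals[:, s:s + chunk]
    # broadcast only over one chunk of stocks at a time, then reduce
    ret += np.abs(block[:, None, :] - block[None, :, :]).sum(axis=-1)
ret = pd.DataFrame(ret, index=idx, columns=idx)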