Python: weighted median algorithm with pandas

Tags: python, algorithm, pandas

I have a dataframe that looks like this:

Out[14]:
    impwealth  indweight
16     180000     34.200
21     384000     37.800
26     342000     39.715
30    1154000     44.375
31     421300     44.375
32    1210000     45.295
33    1062500     45.295
34    1878000     46.653
35     876000     46.653
36     925000     53.476

I want to calculate the weighted median of the column impwealth using the frequency weights in indweight. My pseudo code looks like this:

# Sort `impwealth` in ascending order
df.sort_values('impwealth', inplace=True)

# Find the 50th percentile weight, P
P = df['indweight'].sum() * 0.5

# Search for the first occurrence of `indweight` that is greater than P
i = df.loc[df['indweight'] > P, 'indweight'].last_valid_index()

# The value of `impwealth` associated with this index will be the weighted median
w_median = df.loc[i, 'impwealth']

This method seems clunky, and I'm not sure it's correct. I didn't find a built-in way to do this in the pandas reference. What is the best way to find the weighted median?

asked Sep 29 '14 14:09

svenkatesh

3 Answers

If you want to do this in pure pandas, here's one way. It does not interpolate: it returns the first value at which the cumulative weight reaches half of the total weight. (@svenkatesh, your pseudocode was missing the cumulative sum.)

df.sort_values('impwealth', inplace=True)
cumsum = df.indweight.cumsum()
cutoff = df.indweight.sum() / 2.0
median = df.impwealth[cumsum >= cutoff].iloc[0]

This gives a median of 925000.
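If you would rather not sort the dataframe in place, the same steps can be wrapped in a small function. This is only a sketch of the approach above: the function name `weighted_median` and its parameters are made up for illustration, and the dataframe is rebuilt from the sample in the question.

```python
import pandas as pd

def weighted_median(df, value_col, weight_col):
    """Weighted median without interpolation; leaves `df` unmodified.

    Hypothetical helper: returns the first value whose cumulative
    weight reaches half of the total weight.
    """
    ordered = df.sort_values(value_col)           # sorted copy, not in place
    cumulative = ordered[weight_col].cumsum()     # running total of weights
    cutoff = ordered[weight_col].sum() / 2.0      # half of the total weight
    return ordered.loc[cumulative >= cutoff, value_col].iloc[0]

# Sample data copied from the question
df = pd.DataFrame(
    {
        "impwealth": [180000, 384000, 342000, 1154000, 421300,
                      1210000, 1062500, 1878000, 876000, 925000],
        "indweight": [34.200, 37.800, 39.715, 44.375, 44.375,
                      45.295, 45.295, 46.653, 46.653, 53.476],
    },
    index=[16, 21, 26, 30, 31, 32, 33, 34, 35, 36],
)

print(weighted_median(df, "impwealth", "indweight"))  # 925000
```

Because `sort_values` returns a copy here, the original row order of `df` is preserved.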

answered Oct 22 '22 05:10

prooffreader

Have you tried the wquantiles package? I had never used it before, but it has a weighted median function that seems to give at least a reasonable answer (you'll probably want to double check that it's using the approach you expect).

In [12]: import weighted

In [13]: weighted.median(df['impwealth'], df['indweight'])
Out[13]: 914662.0859091772
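One way to double check which convention a weighted median uses is to run it on a tiny input with integer frequency weights, where the expected answers can be worked out by hand. The sketch below uses plain NumPy and made-up toy numbers. It compares three conventions: the ordinary median of the expanded data, the first value whose cumulative weight reaches half the total, and a midpoint-interpolation rule. Which of these a given package implements is something to verify, not assume.

```python
import numpy as np

values = np.array([10, 20, 30])   # toy data, already sorted
weights = np.array([1, 1, 2])     # integer frequency weights

# Convention 1: expand each value by its frequency, take the ordinary median.
expanded = np.repeat(values, weights)            # [10, 20, 30, 30]
expanded_median = np.median(expanded)            # averages the two middle values

# Convention 2: first value whose cumulative weight reaches half the total.
cumulative = np.cumsum(weights)                  # [1, 2, 4]
cutoff = weights.sum() / 2.0                     # 2.0
first_reaching = values[np.searchsorted(cumulative, cutoff)]

# Convention 3: interpolate between the midpoints of each weight interval.
positions = (cumulative - 0.5 * weights) / weights.sum()   # [0.125, 0.375, 0.75]
interpolated = np.interp(0.5, positions, values)

print(expanded_median)   # 25.0
print(first_reaching)    # 20
print(interpolated)      # about 23.33
```

The three results differ on the same input, so it is worth checking a library's output against a case like this before relying on it.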

answered Oct 22 '22 05:10

chrisb

You can use this solution from the question "Weighted percentile using numpy":

import numpy as np

def weighted_quantile(values, quantiles, sample_weight=None,
                      values_sorted=False, old_style=False):
    """ Very close to numpy.percentile, but supports weights.
    NOTE: quantiles should be in [0, 1]!
    :param values: numpy.array with data
    :param quantiles: array-like with many quantiles needed
    :param sample_weight: array-like of the same length as `values`
    :param values_sorted: bool, if True, then will avoid sorting of
        initial array
    :param old_style: if True, will correct output to be consistent
        with numpy.percentile.
    :return: numpy.array with computed quantiles.
    """
    values = np.array(values)
    quantiles = np.array(quantiles)
    if sample_weight is None:
        sample_weight = np.ones(len(values))
    sample_weight = np.array(sample_weight)
    assert np.all(quantiles >= 0) and np.all(quantiles <= 1), \
        'quantiles should be in [0, 1]'

    if not values_sorted:
        sorter = np.argsort(values)
        values = values[sorter]
        sample_weight = sample_weight[sorter]

    # Place each value at the midpoint of its cumulative weight interval
    weighted_quantiles = np.cumsum(sample_weight) - 0.5 * sample_weight
    if old_style:
        # To be consistent with numpy.percentile
        weighted_quantiles -= weighted_quantiles[0]
        weighted_quantiles /= weighted_quantiles[-1]
    else:
        weighted_quantiles /= np.sum(sample_weight)
    return np.interp(quantiles, weighted_quantiles, values)

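Call as weighted_quantile(df.impwealth, 0.5, df.indweight). The weights must be passed positionally here (or as sample_weight=df.indweight), because Python does not allow a positional argument after a keyword argument.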
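As a rough check on the sample data from the question, here is a compact, self-contained sketch of the default (non-`old_style`) branch only. The helper name `midpoint_weighted_quantile` is made up for this example. On this data it gives an interpolated median of roughly 914662.09, which is the same figure reported for the wquantiles package in the other answer.

```python
import numpy as np

def midpoint_weighted_quantile(values, q, weights):
    """Hypothetical compact version of the default branch of weighted_quantile."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(values)
    values, weights = values[order], weights[order]
    # Each value sits at the midpoint of its cumulative weight interval,
    # normalised by the total weight so positions lie in [0, 1].
    positions = (np.cumsum(weights) - 0.5 * weights) / weights.sum()
    return np.interp(q, positions, values)

# Sample data copied from the question
impwealth = [180000, 384000, 342000, 1154000, 421300,
             1210000, 1062500, 1878000, 876000, 925000]
indweight = [34.200, 37.800, 39.715, 44.375, 44.375,
             45.295, 45.295, 46.653, 46.653, 53.476]

median = midpoint_weighted_quantile(impwealth, 0.5, indweight)
print(round(float(median), 2))  # 914662.09
```

The result falls between 876000 and 925000 because the 0.5 quantile lies between the midpoint positions of those two rows, whereas the non-interpolating pandas approach returns 925000 itself.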

answered Oct 22 '22 05:10

Max Ghenis
