I confess that I am no expert in C++.
I am looking for a fast way to compute a weighted median, which Boost seems to provide, but I cannot get it to work.
#include <iostream>
#include <boost/accumulators/accumulators.hpp>
#include <boost/accumulators/statistics/stats.hpp>
#include <boost/accumulators/statistics/median.hpp>
#include <boost/accumulators/statistics/weighted_median.hpp>
using namespace boost::accumulators;    
int main()
{
  // Define an accumulator set
  accumulator_set<double, stats<tag::median > > acc1;
  accumulator_set<double, stats<tag::median >, float> acc2;
  // push in some data ...
  acc1(0.1);
  acc1(0.2);
  acc1(0.3);
  acc1(0.4);
  acc1(0.5);
  acc1(0.6);
  acc2(0.1, weight=0.);
  acc2(0.2, weight=0.);
  acc2(0.3, weight=0.);
  acc2(0.4, weight=1.);
  acc2(0.5, weight=1.);
  acc2(0.6, weight=1.);
  // Display the results ...
  std::cout << "         Median: " << median(acc1) << std::endl;
  std::cout << "Weighted Median: " << median(acc2) << std::endl;
  return 0;
}
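This produces the following output, which is clearly wrong: the first three values have zero weight, so the weighted median should not equal the unweighted one.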
         Median: 0.3
Weighted Median: 0.3
Am I doing something wrong? Any help will be greatly appreciated.
However, the weighted sum works correctly.
@glowcoder: The weighted sum works perfectly fine like this.
#include <iostream>
#include <boost/accumulators/accumulators.hpp>
#include <boost/accumulators/statistics/stats.hpp>
#include <boost/accumulators/statistics/sum.hpp>
#include <boost/accumulators/statistics/weighted_sum.hpp>
using namespace boost::accumulators;
int main()
{
  // Define an accumulator set
  accumulator_set<double, stats<tag::sum > > acc1;
  accumulator_set<double, stats<tag::sum >, float> acc2;
  // accumulator_set<double, stats<tag::median >, float> acc2;
  // push in some data ...
  acc1(0.1);
  acc1(0.2);
  acc1(0.3);
  acc1(0.4);
  acc1(0.5);
  acc1(0.6);
  acc2(0.1, weight=0.);
  acc2(0.2, weight=0.);
  acc2(0.3, weight=0.);
  acc2(0.4, weight=1.);
  acc2(0.5, weight=1.);
  acc2(0.6, weight=1.);
  // Display the results ...
  std::cout << "         Sum: " << sum(acc1) << std::endl;
  std::cout << "Weighted Sum: " << sum(acc2) << std::endl;
  return 0;
}
and the result is
         Sum: 2.1
Weighted Sum: 1.5
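The Boost function is not broken.
The problem is that you do not provide enough data for the P^2 estimator to work. It approximates the median from a small set of markers instead of storing the samples, so it needs many more than six points to converge. If you put a loop around your data input, such as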
for(int i=0;i<100000;i++){
  acc2(0.1, weight=0.);
  acc2(0.2, weight=0.);
  acc2(0.3, weight=0.);
  acc2(0.4, weight=1.);
  acc2(0.5, weight=1.);
  acc2(0.6, weight=1.);
}
you get the correct result of
Median: 0.3
Weighted Median: 0.5
Alternatively, you can specify
 accumulator_set<double, 
    stats<tag::weighted_median(with_p_square_cumulative_distribution) >, 
    double> acc2 ( p_square_cumulative_distribution_num_cells = 5 );
which gives Weighted Median: 0.55 even with only the 6 points from your question.
What is a weighted median supposed to mean? A median considers only the order of the items, not the content. A weight doesn't change the order (it can change the mean or the sum, though). If you used occurrence counts (natural integers) instead of floats, you could extend the definition of the median, but I don't think that's what you're trying to do here.
What about:
accumulator_set<double, stats<tag::weighted_median(with_weighted_density) >, float> acc2;