Downsampling continuous variable to uniform distribution

Question

Given a distribution, let's say a Gaussian:

import pandas as pd
import numpy as np

gaussian_distribution = np.random.normal(0, 1, 10_000)

This sample looks like this:

[image: plot of the sample]

What I want to do is resample this distribution to somehow get a uniform distribution, so that:

Pr(X) = Pr(X+W)

I am not worried about ending up with n < 10_000; I just want to remove the peak of the distribution.

I read something about interpolating a distribution for this, but I could not figure out how it works.

Myrl Marmarelis · Accepted Answer

I am not sure why you would want to do this, or why it is important to keep the original samples rather than drawing fresh samples from a uniform distribution whose bounds match your histogram's. But here is an approach, as you requested: take a histogram of sufficient granularity and subsample the points falling into each bin in inverse proportion to the bin height. You end up taking roughly the same number of points from each bin interval.

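import numpy as np

x = np.random.randn(10_000)
counts, bins = np.histogram(x, bins=10)
# the sparsest bin determines how many points are kept from every bin
n_per_bin = counts.min()
subsampled = []
for i in range(len(bins) - 1):
    if i == len(bins) - 2:
        # last bin is inclusive on both sides
        section = x[(x >= bins[i]) & (x <= bins[i + 1])]
    else:
        section = x[(x >= bins[i]) & (x < bins[i + 1])]
    # draw the same number of points from each bin, without replacement
    sub_section = np.random.choice(section, n_per_bin, replace=False)
    subsampled.extend(sub_section)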
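One way to sanity-check the result, as a sketch: re-bin the subsampled points with the same edges and confirm that every bin holds the same number of points. The seeded `rng` generator and the use of `np.digitize` to assign bins are my own assumptions for a compact, reproducible check; they are not part of the snippet above.

```python
import numpy as np

rng = np.random.default_rng(0)  # assumed seed, only for reproducibility
x = rng.standard_normal(10_000)
counts, bins = np.histogram(x, bins=10)
n_per_bin = counts.min()  # the sparsest bin sets the common height

# Assign each point to a bin; clip so that x.max() lands in the last bin.
bin_index = np.clip(np.digitize(x, bins) - 1, 0, len(counts) - 1)
subsampled = np.concatenate([
    rng.choice(x[bin_index == i], n_per_bin, replace=False)
    for i in range(len(counts))
])

# Re-bin with the same edges: each bin should now contain n_per_bin points.
new_counts, _ = np.histogram(subsampled, bins=bins)
print(new_counts)
```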

A limitation of this quick-and-dirty solution is that the sparsest bin dictates the height of the resulting uniform distribution. Consequently, using fewer bins makes the subsampled points less uniform but lets you retain more of them. You could also cut off the tails to remedy this.
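To illustrate the tail-trimming remark, here is a rough sketch that restricts the data to a central range before binning, so the sparsest bin is far less extreme. The helper name `subsample_uniform`, its `lo`/`hi` cutoffs, the choice of [-2, 2], and the seed are all illustrative assumptions, not something from the original answer.

```python
import numpy as np

def subsample_uniform(x, bins=10, lo=None, hi=None, rng=None):
    """Keep an equal number of points per bin (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x)
    # Optionally cut off the tails before binning.
    lo = x.min() if lo is None else lo
    hi = x.max() if hi is None else hi
    x = x[(x >= lo) & (x <= hi)]
    # Equal-width bin edges over [lo, hi]; the last bin includes hi.
    edges = np.linspace(lo, hi, bins + 1)
    idx = np.clip(np.digitize(x, edges) - 1, 0, bins - 1)
    counts = np.bincount(idx, minlength=bins)
    n_per_bin = counts.min()  # the sparsest bin sets the common height
    return np.concatenate([
        rng.choice(x[idx == i], n_per_bin, replace=False)
        for i in range(bins)
    ])

rng = np.random.default_rng(0)  # assumed seed, only for reproducibility
x = rng.standard_normal(10_000)
full = subsample_uniform(x, rng=rng)                      # tails included
trimmed = subsample_uniform(x, lo=-2.0, hi=2.0, rng=rng)  # tails cut off
print(len(full), len(trimmed))
```

With the tails included, the outermost bins contain only a handful of points, so very few samples survive. Trimming to the central range keeps far more of them, at the cost of discarding everything outside the cutoffs.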

Original: histogram of x

Subsampled: histogram of subsampled
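For comparison, the alternative mentioned at the start of this answer (drawing fresh values from a uniform distribution with the same bounds as the data) can be sketched as follows. Note that this produces new values rather than a subset of the original samples; the seed and variable names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # assumed seed, only for reproducibility
x = rng.standard_normal(10_000)

# Fresh uniform draws spanning the observed range of x.
uniform_draw = rng.uniform(x.min(), x.max(), size=x.size)
print(uniform_draw.min(), uniform_draw.max())
```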

Tags:

python

pandas

numpy

resampling

downsampling

Victor Maricato

1 Answer

Myrl Marmarelis
