Given a distribution, say a Gaussian:
import pandas as pd
import numpy as np
gaussian_distribution = np.random.normal(0,1,10_000)
This sample looks like this:

I want to resample this sample so that the result is approximately uniformly distributed, that is:
Pr(X) = Pr(X+W)
I am not worried about ending up with n < 10_000; I just want to remove the peak of the distribution.
I read something about interpolating a distribution on this one, but I could not figure out how this works.
I am not sure why you would want to do this, or why it is important to keep the original samples as opposed to resampling a uniform distribution with boundaries corresponding to your histogram's. But here is an approach, as you requested: take a histogram of sufficient granularity and resample the points falling into each bin inverse-proportionally to the bin height. You would end up taking an equal number (roughly) of points from each bin interval.
import numpy as np

x = np.random.randn(10_000)
counts, bins = np.histogram(x, bins=10)
subsampled = []
for i in range(len(bins)-1):
    if i == len(bins)-2:
        # last bin is inclusive on both sides
        section = x[(x>=bins[i]) & (x<=bins[i+1])]
    else:
        section = x[(x>=bins[i]) & (x<bins[i+1])]
    # draw as many points from every bin as the least-populated bin holds
    sub_section = np.random.choice(section, np.amin(counts), replace=False)
    subsampled.extend(sub_section)
A limitation of this quick & dirty solution is that the least-populated bin dictates the height of your resultant uniform distribution: every bin is cut down to that count. As a consequence, using fewer bins makes the subsampled points less uniform but lets you retain more of them. You could also cut off the tails to remedy this.
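If you want to try cutting off the tails, here is a minimal sketch of how that could look. The helper name `flatten_sample`, the 1%/99% quantile cut-offs, and the fixed seeds are assumptions made for illustration, not part of the code above. It also swaps the explicit loop over bin edges for `np.digitize`, which should assign points to bins the same way.

```python
import numpy as np

def flatten_sample(sample, n_bins=10, trim_quantiles=(0.01, 0.99), seed=0):
    """Hypothetical helper: trim the tails, then keep an equal number of
    points from each of n_bins equal-width bins."""
    rng = np.random.default_rng(seed)

    # Assumed cut-offs: drop everything outside the 1% and 99% quantiles.
    lo, hi = np.quantile(sample, trim_quantiles)
    trimmed = sample[(sample >= lo) & (sample <= hi)]

    # Equal-width bin edges over the trimmed range.
    edges = np.linspace(lo, hi, n_bins + 1)
    # Passing only the interior edges maps every point to 0..n_bins-1,
    # so the last bin is inclusive on both sides.
    bin_idx = np.digitize(trimmed, edges[1:-1])
    counts = np.bincount(bin_idx, minlength=n_bins)

    # The least-populated bin still sets the height, but after trimming
    # it is far less sparse than the extreme tail bins were.
    per_bin = counts.min()
    kept = [
        rng.choice(trimmed[bin_idx == b], size=per_bin, replace=False)
        for b in range(n_bins)
    ]
    return np.concatenate(kept)

sample = np.random.default_rng(42).normal(0, 1, 10_000)
flat = flatten_sample(sample)
print(len(flat), "of", len(sample), "points kept")
```

The trade-off is the same as before, only shifted: tighter quantiles keep more points per bin but cover a narrower range of values.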
Original: (histogram of x; image not preserved)

Subsampled: (histogram of subsampled; image not preserved)