I'm looking for a way to compute the kernel density estimate of a data set and plot it at arbitrary data points. Using the SciPy stats module, I came up with the following code:
import numpy as np
import scipy.stats as st
def get_pdf(data):
    a = np.array(data)
    ag = st.gaussian_kde(a)
    x = np.linspace(0, max(data), int(max(data) * 10))  # the number of samples must be an integer
    y = ag(x)
    return x, y
This gives the expected result, but performance is very poor when the data set is large.
I found fastkde, an implementation of fast kernel density estimation, but I could not figure out how to use it the same way I used the SciPy stats KDE.
Can someone give me some insight?
Thanks
You may be looking for something like this:
from fastkde.fastKDE import pdf
def get_pdf(data):
    y, x = pdf(data)
    return x, y
Note that, in general, fastKDE.pdf() returns pdf, axes (the PDF and the axes of the PDF, analogous to hist, bins for a histogram).
If there are multiple input variables, the axes variable is a list of the axes, with each axis corresponding to an input variable.
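To evaluate the density at arbitrary points rather than only on the grid that pdf() returns, one option is to interpolate linearly between the grid values. The following is a minimal sketch and not part of fastkde: pdf_at is a hypothetical helper, and it assumes x is a 1-D increasing axis with y holding the matching PDF values. Points outside the grid are treated as having zero density. A standard normal curve stands in for the fastKDE output so the snippet runs on its own.

```python
import numpy as np

def pdf_at(points, x, y):
    """Linearly interpolate a gridded PDF (x, y) at arbitrary points.

    Hypothetical helper: assumes x is 1-D and increasing.
    Points outside [x[0], x[-1]] are assigned a density of 0.
    """
    points = np.asarray(points, dtype=float)
    return np.interp(points, x, y, left=0.0, right=0.0)

# Stand-in for the grid returned by get_pdf(data): a standard normal density.
x = np.linspace(-5.0, 5.0, 1001)
y = np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)

# Evaluate at a few arbitrary points, including one outside the grid.
values = pdf_at([-1.0, 0.0, 0.5, 10.0], x, y)
print(values)
```

With real data, the stand-in grid would be replaced by x, y = get_pdf(data).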