I am trying to implement kernel density estimation. However, my code does not produce the answer it should. It is written in Julia, but the code should be self-explanatory.
Here is the algorithm:

    f_h(x) = 1/(n*h) * sum_{i=1}^{n} K( (x - X_i) / h )

where

    K(u) = 0.5   if |u| <= 1
    K(u) = 0     otherwise

So the algorithm tests whether the distance between x and an observation X_i, weighted by some constant factor (the bandwidth h), is less than one. If so, it adds 0.5 / (n * h) to that value, where n = number of observations.
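For concreteness, the formula evaluated at a single point x can be sketched like this (the helper name `uniform_kde_at` is mine, not from the question, and this uses current Julia syntax rather than the older style below):

```julia
# Uniform-kernel density estimate at a single point x:
# f_h(x) = 1/(n*h) * sum_i K((x - X_i)/h), with K(u) = 0.5 for |u| <= 1.
function uniform_kde_at(x, data, h)
    n = length(data)
    s = 0.0
    for Xi in data
        u = (x - Xi) / h
        s += abs(u) <= 1 ? 0.5 : 0.0   # the kernel K(u)
    end
    return s / (n * h)
end

# Example: three observations, bandwidth 1.0, evaluated at x = 1.0.
# Only 0.5 and 1.0 lie within h of x, so 2 hits:
uniform_kde_at(1.0, [0.5, 1.0, 3.0], 1.0)  # 2 * 0.5 / (3 * 1.0) = 1/3
```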
Here is my implementation:
#Kernel density function.
#Purpose: estimate the probability density function (pdf)
#of given observations
#@param data: observations for which the pdf should be estimated
#@return: returns an array with the estimated densities 
    function kernelDensity(data)

        #Uniform kernel function.
        #@param x: current x value
        #@param observation: x value of observation i
        #@param width: bandwidth
        #@return: returns 1 if the absolute distance from
        #x (current) to x (observation), weighted by the bandwidth,
        #is less than 1; otherwise returns 0.
        function uniformKernel(x, observation, width)
            u = ( x - observation ) / width
            abs(u) <= 1 ? 1 : 0
        end

        #number of observations in the data set
        n = length(data)

        #bandwidth (set arbitrarily to 0.1)
        h = 0.1

        #vector that stores the pdf
        res = zeros(n)

        #counter variable for the loop
        counter = 0

        #lower and upper limit of the x axis
        start = floor(minimum(data))
        stop = ceil(maximum(data))

        #main loop
        #@linspace: divides the space from start to stop into n
        #equally spaced points
        for x in linspace(start, stop, n)
            counter += 1
            for observation in data
                #count all observations for which the kernel
                #returns 1 and multiply by 0.5, because the
                #kernel uses the absolute difference, which can be
                #either positive or negative
                res[counter] += 0.5 * uniformKernel(x, observation, h)
            end
            #divide by n times h
            res[counter] /= n * h
        end
        #return results
        res
    end
#run function
#@rand: generates 10 uniform random numbers between 0 and 1
kernelDensity(rand(10))
and this is being returned:
> 0.0
> 1.5
> 2.5
> 1.0
> 1.5
> 1.0
> 0.0
> 0.5
> 0.5
> 0.0
the sum of which is 8.5 (the cumulative distribution function; it should be 1).
So there are two bugs: the estimated densities do not sum to 1, and the error grows with the number of observations. For example:
> kernelDensity(rand(1000))
> 953.53 
I believe that I implemented the formula 1:1, hence I really don't understand where the error is.
I'm not an expert on KDEs, so take all of this with a grain of salt, but a very similar (but much faster!) implementation of your code would be:
    function kernelDensity{T<:AbstractFloat}(data::Vector{T}, h::T)
        n = length(data)
        res = zeros(T, n)
        lb = minimum(data); ub = maximum(data)
        for (i, x) in enumerate(linspace(lb, ub, n))
            for obs in data
                res[i] += abs((obs - x)/h) <= 1. ? 0.5 : 0.
            end
            res[i] /= (n * h)
        end
        sum(res)
    end
If I'm not mistaken, the density estimate should integrate to 1, that is, we would expect kernelDensity(rand(100), 0.1)/100 to get at least close to 1. In the implementation above I'm getting there, give or take 5%, but then again we don't know that 0.1 is the optimal bandwidth (using h=0.135 instead I get within 0.1%), and the uniform kernel is known to be only about 93% "efficient".
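One way to check the "integrates to 1" claim numerically is to approximate the integral of the estimate with a Riemann sum over the evaluation grid. A sketch, in current Julia syntax (the function name and the grid size `m` are my choices, not from the answer):

```julia
# Approximate the integral of the uniform-kernel density estimate
# by summing the density over an evaluation grid times the spacing dx.
function kde_integral(data, h; m = 512)
    lb, ub = minimum(data) - h, maximum(data) + h  # pad so no mass is cut off
    xs = range(lb, stop = ub, length = m)
    dx = step(xs)
    n = length(data)
    total = 0.0
    for x in xs
        fx = sum((abs((x - Xi) / h) <= 1 ? 0.5 : 0.0) for Xi in data) / (n * h)
        total += fx * dx
    end
    return total   # should be close to 1 for any bandwidth h
end

kde_integral(rand(1000), 0.1)  # close to 1
```

Note that the grid is padded by h on each side; clipping it to [minimum(data), maximum(data)], as both implementations above do, cuts off kernel mass at the boundary, which is exactly the bias discussed below.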
In any case, there's a very good Kernel Density package in Julia available here, so you probably should just do Pkg.add("KernelDensity") instead of trying to code your own Epanechnikov kernel :)
To point out the mistake: you have n bins B_i of size 2h covering [0,1], and a random point X lands in an expected number of 2hn of those bins. You divide by 2nh, so each point contributes 2hn * 1/(2nh) = 1 to the sum in expectation.
For n points, the expected value of your function's sum is therefore n.
Actually, you have some bins of size < 2h (for example, if start = 0, half of the first bin lies outside [0,1]); factoring this in gives the bias.
Edit: Btw, the bias is easy to calculate if you assume that the bins have random locations in [0,1]. Then the bins are, on average, missing h/2 = 5% of their size.
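This expected value of roughly n (minus a few percent of boundary bias) is easy to check empirically. Below is a sketch in current Julia syntax; `buggy_kde_sum` is my name for a re-implementation of the question's loop:

```julia
# Re-implementation of the question's loop: evaluate on an n-point
# grid, count kernel hits, divide by n*h, and sum the results.
function buggy_kde_sum(data, h = 0.1)
    n = length(data)
    res = zeros(n)
    xs = range(floor(minimum(data)), stop = ceil(maximum(data)), length = n)
    for (i, x) in enumerate(xs)
        for Xi in data
            res[i] += abs((x - Xi) / h) <= 1 ? 0.5 : 0.0
        end
        res[i] /= n * h
    end
    return sum(res)
end

buggy_kde_sum(rand(1000))  # roughly 1000, minus a few percent of bias
```

With h = 0.1 this reproduces the asker's observation that rand(1000) yields a sum around 950 rather than 1.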