The Statistics package provides algorithms for computing, plotting and sampling from kernel density estimates. A kernel density estimate is a continuous probability distribution that approximates the distribution of the population from which a sample was drawn. It is constructed as a normalized sum of kernel functions, one centered at each data point.
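Take a sample x_1, ..., x_n, a kernel K and a bandwidth h > 0. In the usual textbook notation, the estimate can be written as below. The symbols h and K are standard notation assumed here, not names taken from Maple's documentation.

```latex
\hat{f}_h(x) = \frac{1}{n h} \sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right),
\qquad K(u) \ge 0, \qquad \int_{-\infty}^{\infty} K(u)\,du = 1 .
```

Each scaled kernel (1/h) K((x - x_i)/h) integrates to 1, so their average also integrates to 1. The estimate is therefore itself a probability density.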
The following is an example of Maple's kernel density estimation routines in action.
Consider the following bimodal data sample. It is hypothesized to be bimodal because the data appear to form two distinct clusters: one in the range -1.2 to -0.8 and one in the range 0.7 to 0.9.
By applying kernel density estimation, we can construct a smooth density function that approximates the distribution underlying the data. Since our data sample is relatively small, we can use the exact method of kernel density estimation, which returns a probability density function that can then be evaluated at specific points.
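The Python sketch below (not Maple code) gives a rough idea of what an exact estimate computes: it sums one kernel per data point every time the density is evaluated. Several details are hypothetical: the names `exact_kde` and `gaussian_kernel`, the choice of a Gaussian kernel, the bandwidth 0.2 and the sample values. They were chosen only to mimic two clusters like those described above. Treating "exact" as "no binning or interpolation" is our reading, not a description of Maple's implementation.

```python
import math
from typing import Callable, Sequence


def gaussian_kernel(u: float) -> float:
    """Standard normal density, used here as the kernel K(u)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def exact_kde(
    data: Sequence[float],
    bandwidth: float,
    kernel: Callable[[float], float] = gaussian_kernel,
) -> Callable[[float], float]:
    """Return x -> (1 / (n h)) * sum_i K((x - x_i) / h), recomputed at every call."""
    points = list(data)
    if not points:
        raise ValueError("data must be non-empty")
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    n = len(points)

    def pdf(x: float) -> float:
        # One scaled kernel per data point, then normalize so the total area is 1.
        return sum(kernel((x - xi) / bandwidth) for xi in points) / (n * bandwidth)

    return pdf


# Hypothetical two-cluster sample (illustrative values only).
sample = [-1.1, -1.0, -0.9, 0.75, 0.8, 0.85]
f = exact_kde(sample, bandwidth=0.2)
print(f(-1.0), f(0.0), f(0.8))
```

Each evaluation costs time proportional to the number of data points. That cost is acceptable for a small sample like this one.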
We can convert the kernel density estimate to a distribution using one of the standard RandomVariable constructors.
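Viewed as a distribution, a kernel density estimate is an equally weighted mixture of n copies of the kernel, each scaled by h and shifted to a data point. Suppose the kernel is symmetric about 0 with variance σ_K². Then the mixture has mean equal to the sample mean, and variance equal to the divide-by-n sample variance plus h²σ_K².

The helper below, with the hypothetical name `kde_mean_variance`, computes these two quantities in Python. It illustrates the mathematics and is not how Maple's RandomVariable constructors work internally.

```python
import statistics
from typing import Sequence, Tuple


def kde_mean_variance(
    data: Sequence[float], bandwidth: float, kernel_variance: float = 1.0
) -> Tuple[float, float]:
    """Mean and variance of a KDE viewed as an equally weighted mixture of kernels.

    Assumes a kernel symmetric about 0; kernel_variance is the variance of the
    unscaled kernel K (1.0 for the standard Gaussian kernel).
    """
    mean = statistics.fmean(data)
    # Population (divide-by-n) variance of the data points themselves.
    spread = statistics.pvariance(data, mu=mean)
    # Each scaled kernel contributes bandwidth**2 * kernel_variance of extra spread.
    return mean, spread + bandwidth ** 2 * kernel_variance


m, v = kde_mean_variance([0.0, 2.0], bandwidth=1.0)
```

For the two points 0 and 2 with a Gaussian kernel and h = 1, the mean is 1 and the variance is 1 + 1 = 2.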
This probability density function can also be plotted; here it is shown together with the corresponding cumulative distribution function.
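For the Gaussian kernel, the cumulative distribution function has a closed form: it is the average of the normal CDFs Φ((x - x_i)/h). The Python sketch below assumes the Gaussian kernel, and `kde_cdf` is a hypothetical name.

```python
import math
from typing import Callable, Sequence


def kde_cdf(data: Sequence[float], bandwidth: float) -> Callable[[float], float]:
    """CDF of a Gaussian-kernel KDE: the average of the shifted, scaled normal CDFs."""
    points = list(data)
    n = len(points)
    scale = bandwidth * math.sqrt(2.0)

    def cdf(x: float) -> float:
        # Standard normal CDF: Phi(z) = (1 + erf(z / sqrt(2))) / 2, with z = (x - xi) / h.
        return sum(0.5 * (1.0 + math.erf((x - xi) / scale)) for xi in points) / n

    return cdf


F = kde_cdf([-1.0, 1.0], bandwidth=0.3)
print(F(-1.0), F(0.0), F(1.0))
```

The data in this example are symmetric about 0, so the CDF equals 1/2 at 0, rising from 0 to 1 across the two clusters.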
The KernelDensitySample function quickly draws new values from the kernel density estimate of a data sample, producing data similar to the original sample.
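Sampling from a kernel density estimate does not require evaluating its density at all. Picking a data point at random and perturbing it with kernel-shaped noise draws exactly from the mixture the estimate defines.

The Python sketch below assumes a Gaussian kernel, and `kde_sample` is a hypothetical name. Whether KernelDensitySample uses this scheme internally is an assumption we have not verified.

```python
import random
from typing import List, Optional, Sequence


def kde_sample(
    data: Sequence[float],
    bandwidth: float,
    size: int,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Draw `size` values from the Gaussian-kernel KDE of `data`.

    Each draw picks a data point uniformly at random and adds N(0, bandwidth**2)
    noise, which samples exactly from the mixture that the estimate defines.
    """
    if rng is None:
        rng = random.Random()
    points = list(data)
    return [rng.choice(points) + rng.gauss(0.0, bandwidth) for _ in range(size)]


# Seeded generator so repeated runs give the same draws.
draws = kde_sample([-1.0, -0.9, 0.8, 0.85], bandwidth=0.1, size=1000, rng=random.Random(42))
```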
A kernel density estimate can be plotted directly using the KernelDensityPlot function. The following example shows how the choice of bandwidth affects the estimate.
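The effect of the bandwidth can also be checked numerically. The Python sketch below counts the local maxima of a Gaussian-kernel estimate on a grid for two bandwidths. The helper names, the two data points at ±1 and the grid are hypothetical choices made for illustration.

```python
import math
from typing import Callable, Sequence


def gaussian_kde(data: Sequence[float], h: float) -> Callable[[float], float]:
    """Gaussian-kernel density estimate with bandwidth h."""
    points = list(data)
    norm = len(points) * h * math.sqrt(2.0 * math.pi)
    return lambda x: sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in points) / norm


def count_modes(pdf: Callable[[float], float], lo: float, hi: float, steps: int = 2000) -> int:
    """Count strict local maxima of pdf on an evenly spaced grid over [lo, hi]."""
    xs = [lo + (hi - lo) * k / steps for k in range(steps + 1)]
    ys = [pdf(x) for x in xs]
    return sum(1 for k in range(1, steps) if ys[k - 1] < ys[k] > ys[k + 1])


data = [-1.0, 1.0]
for h in (0.5, 2.0):
    print(h, count_modes(gaussian_kde(data, h), -4.0, 4.0))
```

With two points at ±1, this Gaussian mixture is bimodal exactly when h < 1. The smaller bandwidth therefore shows two modes and the larger one merges them into a single mode. Too large a bandwidth can hide genuine structure, while too small a bandwidth tends to produce spurious bumps.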
In most cases, only a few hundred samples are needed to roughly approximate the original probability distribution with a kernel density estimate.
Available Kernels
Kernel density estimation requires a kernel function: a non-negative function that integrates to 1 and is centered at each data point. Five standard kernel functions are available for kernel density estimation.
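For reference, the textbook forms of the five kernels are listed below, each scaled to integrate to 1 and with K(u) = 0 outside the stated support. Maple may rescale these kernels (for example, to unit variance), so treat the constants as an assumption rather than as Maple's exact definitions.

```latex
\begin{aligned}
K_{\text{Gaussian}}(u)     &= \tfrac{1}{\sqrt{2\pi}}\, e^{-u^2/2}, && u \in \mathbb{R},\\
K_{\text{triangular}}(u)   &= 1 - |u|,                             && |u| \le 1,\\
K_{\text{rectangular}}(u)  &= \tfrac{1}{2},                        && |u| \le 1,\\
K_{\text{biweight}}(u)     &= \tfrac{15}{16}\,(1 - u^2)^2,         && |u| \le 1,\\
K_{\text{Epanechnikov}}(u) &= \tfrac{3}{4}\,(1 - u^2),             && |u| \le 1.
\end{aligned}
```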
2.1 Gaussian Kernel
The Gaussian kernel should be used with continuous data that is defined on the whole real line. It possesses the familiar bell shape and is based on the Gaussian probability density function.
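The Python sketch below illustrates two properties of the Gaussian kernel. First, it integrates to 1. Second, because it is positive on the whole real line, the estimate assigns some density everywhere, even far from the data. The helper names, data points and bandwidth are hypothetical.

```python
import math
from typing import Sequence


def gaussian_kernel(u: float) -> float:
    """Standard normal density; strictly positive for every real u."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde(data: Sequence[float], h: float, x: float) -> float:
    """Gaussian-kernel density estimate evaluated at x."""
    return sum(gaussian_kernel((x - xi) / h) for xi in data) / (len(data) * h)


# Riemann-sum check that the kernel integrates to 1 (tails beyond +/-10 are negligible).
step = 0.001
total = sum(gaussian_kernel(-10.0 + k * step) for k in range(20001)) * step

# Far from every data point, the estimate is tiny but still positive.
far_value = kde([0.0, 0.5], 1.0, 8.0)
```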
2.2 Triangular Kernel
The triangular kernel is a piecewise linear function related to the triangular distribution. The resulting kernel density estimate is continuous but generally has sharp corners, although it remains relatively smooth overall.
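The corners come from the kink of 1 - |u| at u = 0. The Python sketch below, assuming the textbook scaling on [-1, 1], compares the one-sided slopes of a two-point estimate at a data point. The helper names and data are hypothetical.

```python
from typing import Sequence


def triangular_kernel(u: float) -> float:
    """1 - |u| on [-1, 1], zero elsewhere (textbook scaling)."""
    return max(0.0, 1.0 - abs(u))


def kde(data: Sequence[float], h: float, x: float) -> float:
    """Triangular-kernel density estimate evaluated at x."""
    return sum(triangular_kernel((x - xi) / h) for xi in data) / (len(data) * h)


data, h, eps = [0.0, 0.5], 1.0, 1e-6
# One-sided difference quotients at the data point x = 0.
left_slope = (kde(data, h, 0.0) - kde(data, h, -eps)) / eps
right_slope = (kde(data, h, eps) - kde(data, h, 0.0)) / eps
```

Here the estimate is 0.75 + x just left of 0 and constant at 0.75 just right of it. The slope therefore jumps from 1 to 0, while the estimate itself stays continuous.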
2.3 Rectangular Kernel
The rectangular kernel is a piecewise function related to the uniform distribution. This kernel creates a kernel density estimate that resembles a staircase function.
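Because the rectangular kernel is constant on its support, the estimate only changes value where some x_i ± h is crossed, which gives the staircase shape. The Python sketch below assumes the textbook scaling (1/2 on [-1, 1]); the helper names and data are hypothetical.

```python
from typing import Sequence


def rectangular_kernel(u: float) -> float:
    """1/2 on [-1, 1], zero elsewhere (textbook scaling)."""
    return 0.5 if abs(u) <= 1.0 else 0.0


def kde(data: Sequence[float], h: float, x: float) -> float:
    """Rectangular-kernel density estimate evaluated at x."""
    return sum(rectangular_kernel((x - xi) / h) for xi in data) / (len(data) * h)


data, h = [0.0, 1.0], 1.0
# One evaluation point inside each step of the staircase.
values = [kde(data, h, x) for x in (-1.5, -0.5, 0.5, 1.5, 2.5)]
```

The steps have heights 0, 1/4, 1/2, 1/4 and 0. The jumps occur at -1, 0, 1 and 2, the points where some x_i ± h is crossed.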
2.4 Biweight Kernel
The biweight kernel is a smooth kernel that, unlike the Gaussian kernel, is defined on a finite interval. It should be used for bounded data that is smooth over the interval on which it is defined.
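In the textbook scaling (15/16)(1 - u²)², the biweight kernel vanishes outside [-1, 1]. Its derivative, -(15/4)u(1 - u²), is also 0 at u = ±1, so the kernel joins the zero region smoothly. The Python sketch below checks this numerically; the helper name is hypothetical and Maple's scaling may differ.

```python
def biweight_kernel(u: float) -> float:
    """(15/16)(1 - u^2)^2 on [-1, 1], zero elsewhere (textbook scaling)."""
    return 15.0 / 16.0 * (1.0 - u * u) ** 2 if abs(u) <= 1.0 else 0.0


# One-sided slope just inside the right boundary u = 1; it tends to 0 as eps shrinks.
eps = 1e-4
inner_slope = (biweight_kernel(1.0) - biweight_kernel(1.0 - eps)) / eps

# Riemann-sum check that the kernel integrates to 1 over its support.
step = 1e-4
total = sum(biweight_kernel(-1.0 + k * step) for k in range(20001)) * step
```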
2.5 Epanechnikov Kernel
The Epanechnikov kernel is the standard kernel for kernel density estimation. Among non-negative kernels, it minimizes the asymptotic mean integrated squared error, so it generally gives the closest match to the underlying probability density function. The kernel itself is a rounded function similar to the biweight, except that it is not differentiable at its boundaries.
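In the textbook scaling (3/4)(1 - u²), the Epanechnikov kernel has slope ∓3/2 at u = ±1 from inside its support, but slope 0 outside. It therefore has a corner at each boundary, whereas the biweight kernel's derivative vanishes there. The Python sketch below checks the right boundary numerically; the helper name is hypothetical and Maple's scaling may differ.

```python
def epanechnikov_kernel(u: float) -> float:
    """(3/4)(1 - u^2) on [-1, 1], zero elsewhere (textbook scaling)."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


eps = 1e-6
# Slope approaching u = 1 from inside the support (close to -1.5).
inner_slope = (epanechnikov_kernel(1.0) - epanechnikov_kernel(1.0 - eps)) / eps
# Slope just outside the support (exactly 0, since the kernel is 0 there).
outer_slope = (epanechnikov_kernel(1.0 + eps) - epanechnikov_kernel(1.0)) / eps
```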