Statistics[KernelDensity] - estimate the probability density function of a data set

Calling Sequence

KernelDensity(data, options)

Parameters

data - rtable; data sample
options - (optional) equation(s) of the form option=value where option is one of kernel, bandwidth, bins, left, right, ignore, weights, method, output, or eval; specify options for the KernelDensity function

Description

• The KernelDensity function performs kernel density estimation on a data set to approximate the probability density function from which the data could have been drawn.
• The first parameter, data, is the data set, which must be a one-dimensional rtable (for example, an Array or a Vector).

Options

The options argument can contain one or more of the options shown below.

• kernel='gaussian', biweight, epanechnikov, triangular, or rectangular -- The kernel used to build the estimate (by default 'gaussian'). The gaussian and biweight kernels are smooth: if either is used, the final estimate has no sharp edges. The epanechnikov kernel is often considered optimal, in the sense that it minimizes the asymptotic mean integrated squared error. The triangular and rectangular kernels are sharp: they introduce sharp edges into the final estimate.
• bandwidth=realcons -- A positive quantity that specifies the width of the kernel, that is, how strongly each data point affects distant portions of the probability density estimate. Each kernel is scaled so that its standard deviation equals the bandwidth.
• bins=posint -- The number of bins into which data points are grouped (512 by default). This value must be a power of 2. When method='discrete' is specified, it is also the size of the returned array. This option is ignored if method='exact'.
• left=realcons -- The lower boundary on valid data values. Any data values smaller than this value are discarded. By default, the algorithm determines the minimum value in the data set and extends the range below it by padding of width 3*bandwidth.
• right=realcons -- The upper boundary on valid data values. Any data values larger than this value are discarded. By default, the algorithm determines the maximum value in the data set and extends the range above it by padding of width 3*bandwidth.
• ignore=truefalse -- Specifies how non-numeric data is handled. If ignore is set to true, all non-numeric items in data are ignored.
• weights=rtable -- Vector of weights (one-dimensional rtable). If weights are given, the KernelDensity function scales each data point to have the given weight. The weights must be of type realcons, and the results are floating-point even if the problem is specified with exact values. The data array and the weights array must have the same number of elements.
• method='discrete', exact, piecewise, or none -- The method used to compute the kernel density estimate (by default 'discrete'). If this value is 'discrete', the procedure returns an array of size bins that contains the value of the kernel density estimate at equally spaced points along the range. If this value is 'exact', the function returns a procedure that can be called to obtain the value of the kernel density estimate at any real point. If this value is 'piecewise', the function returns a piecewise procedure that linearly interpolates the array returned by 'discrete'. If this value is 'none', no calculation is performed; instead, the procedure returns a range indicating the interval over which the computation would be done.
• output='value' or solutionmodule -- The requested form of the output. If 'value' is specified, KernelDensity returns the result of the computation. If 'solutionmodule' is specified, it returns the result within a module that also contains other resources relevant to the calculation.
• eval='false' or realcons -- If a point is specified, KernelDensity immediately evaluates the estimate at that point and returns the result.
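
The interaction of the kernel, bandwidth, and weights options with the 'exact' method can be illustrated outside Maple. The following Python sketch is not the KernelDensity implementation; the names unit_kernel and kde_exact are hypothetical, and the variance constants are the standard values for these kernels on the interval [-1, 1]. It assumes that "scaled such that the bandwidth is equal to the standard deviation" means each kernel is first normalized to unit standard deviation and then stretched by the bandwidth.

```python
import math

# (shape on [-1, 1], variance of that shape) for the compactly supported kernels.
_COMPACT = {
    "rectangular": (lambda u: 0.5, 1.0 / 3.0),
    "triangular": (lambda u: 1.0 - abs(u), 1.0 / 6.0),
    "epanechnikov": (lambda u: 0.75 * (1.0 - u * u), 1.0 / 5.0),
    "biweight": (lambda u: 15.0 / 16.0 * (1.0 - u * u) ** 2, 1.0 / 7.0),
}


def unit_kernel(name):
    """Return a density K(t) with mean 0 and standard deviation 1."""
    if name == "gaussian":
        return lambda t: math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)
    shape, variance = _COMPACT[name]
    s = math.sqrt(variance)

    def k(t):
        u = t * s  # map back onto the kernel's natural support [-1, 1]
        return s * shape(u) if abs(u) <= 1.0 else 0.0

    return k


def kde_exact(data, bandwidth, kernel="gaussian", weights=None):
    """Return a function that evaluates the weighted estimate at any real x."""
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    if weights is None:
        weights = [1.0] * len(data)
    if len(weights) != len(data):
        raise ValueError("data and weights must have the same length")
    k = unit_kernel(kernel)
    total = float(sum(weights))

    def estimate(x):
        s = sum(w * k((x - xi) / bandwidth) for xi, w in zip(data, weights))
        return s / (total * bandwidth)  # normalize so the estimate integrates to 1

    return estimate
```

Calling kde_exact(sample, 0.5, kernel="epanechnikov") returns a function of one real argument, which mirrors the behavior described for method='exact'. Each evaluation costs time proportional to the number of data points, so this approach suits occasional point evaluations rather than dense grids.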

Notes

• Kernel density estimation works by replacing each data point with a kernel function, centered at that point, that has an area of one. The kernels are then summed over all data points, and the sum is normalized so that the estimate is a probability density function.
• The discrete case of kernel density estimation employs a discrete Fourier transform to calculate the kernel sums as quickly as possible. However, this results in an estimating function that is periodic over the given range. Hence, estimates near the lower and upper boundaries of the range are often less precise than estimates at points well inside the range.
• Points that do not fall into the range left..right, and missing data, are not considered for this operation and are normally discarded. If ignore=false and the procedure encounters missing data, it issues a userinfo message.
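
The periodicity described in the notes above follows from how a Fourier-based sum works: multiplying two discrete transforms yields a circular convolution, so kernel mass that leaves one end of the range re-enters at the other. The Python sketch below illustrates this with a Gaussian kernel. It is an assumption-laden illustration, not Maple's algorithm: the names fft and kde_discrete are hypothetical, the binning rule (each point counted in the bin that contains it) is a simplification, and the default range is taken to be the data extremes padded by 3*bandwidth on each side.

```python
import cmath
import math


def fft(a, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(a) must be a power of 2."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], inverse)
    odd = fft(a[1::2], inverse)
    sign = 1.0 if inverse else -1.0
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out


def kde_discrete(data, bandwidth, bins=512, left=None, right=None):
    """Return `bins` density values on an equally spaced grid over left..right."""
    if bins < 1 or bins & (bins - 1):
        raise ValueError("bins must be a power of 2")
    if left is None:
        left = min(data) - 3.0 * bandwidth
    if right is None:
        right = max(data) + 3.0 * bandwidth
    kept = [x for x in data if left <= x <= right]  # out-of-range points are dropped
    if not kept:
        raise ValueError("no data points fall inside left..right")
    dx = (right - left) / bins
    counts = [0.0] * bins
    for x in kept:
        counts[min(int((x - left) / dx), bins - 1)] += 1.0
    # Gaussian kernel sampled by circular distance, so it wraps around the range.
    norm = bandwidth * math.sqrt(2.0 * math.pi)
    kern = [math.exp(-0.5 * (min(j, bins - j) * dx / bandwidth) ** 2) / norm
            for j in range(bins)]
    product = [c * k for c, k in zip(fft(counts), fft(kern))]
    conv = fft(product, inverse=True)
    return [v.real / (bins * len(kept)) for v in conv]
```

With left=0, right=8, and a single data point at 0.1, the last grid value of this sketch is close to the peak value even though it lies almost 8 units from the data point. That is the wrap-around effect, and it is why padding the range beyond the data extremes keeps boundary estimates more reliable.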