RousseeuwCrouxQn - Maple Help

Statistics

 RousseeuwCrouxQn
 compute Rousseeuw and Croux' Qn

Calling Sequence

 RousseeuwCrouxQn(A, ds_options)
 RousseeuwCrouxQn(X, rv_options)

Parameters

 A - data set or Matrix data set
 X - algebraic; random variable or distribution
 ds_options - (optional) equation(s) of the form option=value where option is one of correction, ignore, or weights; specify options for computing Rousseeuw and Croux' Qn statistic of a data set
 rv_options - (optional) equation of the form numeric=value; specifies options for computing Rousseeuw and Croux' Qn statistic of a random variable

Description

 • The RousseeuwCrouxQn function computes a robust measure of the dispersion of the specified data set or random variable, as introduced by Rousseeuw and Croux in [2].
 • This statistic, referred to as ${Q}_{n}$ in the remainder of this help page, is defined for a sorted data set ${A}_{1}\le {A}_{2}\le \dots \le {A}_{n}$ as:

${Q}_{n}=\mathrm{OrderStatistic}\left(\left[\mathrm{seq}\left(\mathrm{seq}\left({A}_{i}-{A}_{j},i=j+1..n\right),j=1..n-1\right)\right],k\right)$

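 where $k=\binom{\lfloor \frac{n}{2}\rfloor +1}{2}$. In other words, ${Q}_{n}$ is the $k$th smallest of the $\binom{n}{2}$ pairwise differences ${A}_{i}-{A}_{j}$ with $i>j$, all of which are nonnegative because the data set is sorted.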
 • ${Q}_{n}$ is a robust statistic: it has a high breakdown point (the proportion of arbitrarily large observations it can handle before giving an arbitrarily large result). The breakdown point of ${Q}_{n}$ is the maximum possible value, $\frac{1}{2}$.
 • ${Q}_{n}$ is a measure of dispersion, also called a measure of scale: if ${Q}_{n}\left(X\right)=a$, then for all real constants $\mathrm{\alpha }$ and $\mathrm{\beta }$, we have ${Q}_{n}\left(\mathrm{\alpha }X+\mathrm{\beta }\right)=\left|\mathrm{\alpha }\right|a$.
 • The first parameter can be a data set, a distribution (see Statistics[Distribution]), a random variable, or an algebraic expression involving random variables (see Statistics[RandomVariable]). For a data set $A$, RousseeuwCrouxQn computes ${Q}_{n}$ as defined above. For a distribution or random variable $X$, RousseeuwCrouxQn computes the asymptotic equivalent, that is, the value to which ${Q}_{n}$ converges as the size of a sample drawn from $X$ grows.
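The definition above can be checked by brute force. The following is a minimal sketch in Python rather than Maple, given purely as an illustration: the function name qn is hypothetical, no correction factor is applied, and nothing here is meant to describe how Maple computes the statistic internally. It sorts all pairwise absolute differences and picks the $k$th smallest.

```python
from itertools import combinations
from math import comb


def qn(data):
    """Return the k-th smallest pairwise absolute difference,
    with k = C(floor(n/2) + 1, 2).

    Brute-force sketch of the definition (O(n^2 log n)); illustrative only.
    """
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    k = comb(n // 2 + 1, 2)
    # |a - b| over unordered pairs equals A_i - A_j (i > j) on the sorted data.
    diffs = sorted(abs(a - b) for a, b in combinations(data, 2))
    return diffs[k - 1]  # order statistics are 1-based


s = [1, 5, 2, 2, 7, 4, 1, 6, 9]
print(qn(s))  # 2

# Scale equivariance: Qn(alpha*X + beta) = |alpha| * Qn(X)
alpha, beta = -3, 10
print(qn([alpha * x + beta for x in s]) == abs(alpha) * qn(s))  # True
```

For the nine-point sample, $k=\binom{5}{2}=10$, and the tenth smallest of the 36 pairwise differences is 2, which agrees with the value reported for the same data in the Examples section.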

Computation

 • By default, all computations involving random variables are performed symbolically (see option numeric below).
 • All computations involving data are performed in floating-point; therefore, all data provided must have type/realcons and all returned solutions are floating-point, even if the problem is specified with exact values.
 • For more information about computation in the Statistics package, see the Statistics[Computation] help page.

Data Set Options

 • The ds_options argument can contain one or more of the options shown below. More information for some options is available in the Statistics[DescriptiveStatistics] help page.
 • ignore=truefalse -- This option controls how missing data is handled by the RousseeuwCrouxQn command. Missing items are represented by undefined or Float(undefined). If ignore=false and A contains missing data, the RousseeuwCrouxQn command may return undefined. If ignore=true, all missing items in A are ignored. The default value is false.
 • weights=Vector -- Data weights. The number of elements in the weights array must be equal to the number of elements in the original data sample. By default all elements in A are assigned weight $1$.
 • correction=samplesize or correction=none -- In [2], Rousseeuw and Croux define a correction factor ${d}_{n}$ for finite sample size as:

${d}_{n}=\left\{\begin{array}{cc}0.399& n=2\\ 0.994& n=3\\ 0.512& n=4\\ 0.844& n=5\\ 0.611& n=6\\ 0.857& n=7\\ 0.669& n=8\\ 0.872& n=9\\ \frac{n}{n+1.4}& n>9\ \mathrm{and}\ n\ \mathrm{odd}\\ \frac{n}{n+3.8}& n>9\ \mathrm{and}\ n\ \mathrm{even}\end{array}\right.$

 If the option correction = samplesize is given, then the result is multiplied by this correction factor before it is returned. The default is correction = none, that is, no correction factor is applied.
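As a rough illustration of the table above, the following Python sketch (not Maple code; the name correction_factor is hypothetical) looks up the tabulated factor for small samples and falls back to the odd and even formulas for larger ones. The uncorrected value 2.0 is taken from the nine-point sample in the Examples section.

```python
# Small-sample factors tabulated on this page for n = 2..9.
_SMALL_N = {2: 0.399, 3: 0.994, 4: 0.512, 5: 0.844,
            6: 0.611, 7: 0.857, 8: 0.669, 9: 0.872}


def correction_factor(n):
    """Finite-sample correction factor for a sample of size n >= 2."""
    if n < 2:
        raise ValueError("need at least two observations")
    if n in _SMALL_N:
        return _SMALL_N[n]
    # For n > 9 the factor depends on the parity of n.
    return n / (n + 1.4) if n % 2 == 1 else n / (n + 3.8)


raw_qn = 2.0  # uncorrected Qn of the 9-point example sample
print(correction_factor(9) * raw_qn)  # 1.744
```

The product 0.872 × 2 = 1.744 matches the value returned with correction = samplesize in the Examples section.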

Random Variable Options

 • The rv_options argument can contain one or more of the options shown below. More information for some options is available in the Statistics[RandomVariables] help page.
 • numeric=truefalse -- By default, ${Q}_{n}$ is computed using exact arithmetic. To compute ${Q}_{n}$ numerically, specify the numeric or numeric = true option.

Examples

 > $\mathrm{with}\left(\mathrm{Statistics}\right):$

Compute ${Q}_{n}$ for a data sample.

 > $s≔⟨1,5,2,2,7,4,1,6,9⟩$
 ${s}{≔}\left[\begin{array}{c}{1}\\ {5}\\ {2}\\ {2}\\ {7}\\ {4}\\ {1}\\ {6}\\ {9}\end{array}\right]$ (1)
 > $\mathrm{RousseeuwCrouxQn}\left(s\right)$
 ${2.}$ (2)

Employ Rousseeuw and Croux's finite sample size correction.

 > $\mathrm{RousseeuwCrouxQn}\left(s,'\mathrm{correction}=\mathrm{samplesize}'\right)$
 ${1.74400000000000}$ (3)

Let's replace four of the values with very large values.

 > $t≔\mathrm{copy}\left(s\right):$
 > $t\left[1..4\right]≔{10}^{100}:$
 > $t$
 $\left[\begin{array}{c}{10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000}\\ {10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000}\\ {10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000}\\ {10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000}\\ {7}\\ {4}\\ {1}\\ {6}\\ {9}\end{array}\right]$ (4)
 > $\mathrm{RousseeuwCrouxQn}\left(t\right)$
 ${3.}$ (5)

The value of ${Q}_{n}$ stays bounded: four of the nine observations (about 44%) were replaced, which is below the breakdown point of $\frac{1}{2}$.

Compute ${Q}_{n}$ for a normal distribution.

 > $\mathrm{RousseeuwCrouxQn}\left('\mathrm{Normal}'\left(3,5\right),'\mathrm{numeric}'\right)$
 ${2.25312055012086}$ (6)

The symbolic result is an expression involving the inverse (see RootOf) of the error function (see erf). It evaluates to the same floating-point number.

 > $\mathrm{RousseeuwCrouxQn}\left('\mathrm{Normal}'\left(3,5\right)\right)$
 ${10}{}{\mathrm{RootOf}}{}\left({4}{}{\mathrm{erf}}{}\left({\mathrm{_Z}}\right){-}{1}\right)$ (7)
 > $\mathrm{evalf}\left(\%\right)$
 ${2.253120550}$ (8)
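The form of result (7) can be traced as follows. For independent $X$ and $X'$, both Normal with standard deviation $\sigma$, the difference $X-X'$ is Normal with mean 0 and variance $2{\sigma }^{2}$, so the probability that $\left|X-X'\right|\le q$ equals $\mathrm{erf}\left(\frac{q}{2\sigma }\right)$. Since $k$ divided by the number of pairs tends to $\frac{1}{4}$ as $n$ grows, the asymptotic value solves $\mathrm{erf}\left(\frac{q}{2\sigma }\right)=\frac{1}{4}$, giving $q=2\sigma z$ with $4\mathrm{erf}\left(z\right)-1=0$. With $\sigma =5$ the factor is 10. The Python sketch below (not Maple code; the helper name solve_erf_quarter is hypothetical) finds $z$ by plain bisection as a numerical cross-check.

```python
from math import erf


def solve_erf_quarter(tol=1e-14):
    """Bisection for the root z of 4*erf(z) - 1 = 0 on [0, 1]."""
    lo, hi = 0.0, 1.0  # 4*erf(0) - 1 < 0 and 4*erf(1) - 1 > 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if 4 * erf(mid) - 1 < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


sigma = 5
q = 2 * sigma * solve_erf_quarter()
print(round(q, 6))  # 2.253121
```

The location parameter 3 does not enter the calculation, which is consistent with ${Q}_{n}$ being unaffected by shifts.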

Generate a random sample of size 1000000 from the same distribution and compute the sample's ${Q}_{n}$.

 > $A≔\mathrm{Sample}\left('\mathrm{Normal}'\left(3,5\right),1000000\right):$
 > $\mathrm{RousseeuwCrouxQn}\left(A\right)$
 ${2.25266620862896}$ (9)

Consider the following Matrix data set.

 > $M≔\mathrm{Matrix}\left(\left[\left[3,1130,114694\right],\left[4,1527,127368\right],\left[3,907,88464\right],\left[2,878,96484\right],\left[4,995,128007\right]\right]\right)$
 ${M}{≔}\left[\begin{array}{ccc}{3}& {1130}& {114694}\\ {4}& {1527}& {127368}\\ {3}& {907}& {88464}\\ {2}& {878}& {96484}\\ {4}& {995}& {128007}\end{array}\right]$ (10)

We compute ${Q}_{n}$ for each of the columns.

 > $\mathrm{RousseeuwCrouxQn}\left(M\right)$
 $\left[\begin{array}{ccc}{1.}& {117.}& {12674.}\end{array}\right]$ (11)

References

 [1] Stuart, Alan, and Ord, Keith. Kendall's Advanced Theory of Statistics. 6th ed. London: Edward Arnold, 1998. Vol. 1: Distribution Theory.
 [2] Rousseeuw, Peter J., and Croux, Christophe. Alternatives to the Median Absolute Deviation. Journal of the American Statistical Association 88(424), 1993, pp. 1273-1283.

Compatibility

 • The Statistics[RousseeuwCrouxQn] command was introduced in Maple 18.
 • For more information on Maple 18 changes, see Updates in Maple 18.