Repeated Median Estimator


Statistics

RepeatedMedianEstimator

compute the repeated median estimator

Calling Sequence

RepeatedMedianEstimator(X, Y, v)
RepeatedMedianEstimator(XY, v)

Parameters

X	-	values of independent variable(s)
Y	-	values of dependent variable
XY	-	values of independent and dependent variables
v	-	algebraic expression in which to express the result

Description

•	The RepeatedMedianEstimator function computes a robust linear estimator from a collection of points in the plane. The estimator was first described by Siegel [2]; it is computed using an algorithm described by Matoušek, Mount, and Netanyahu [3].

•	The repeated median estimator is a robust estimator: it continues to perform well when some of the points are replaced by outliers. By contrast, least-squares linear regression is very sensitive to outliers. It is the most commonly used type of regression and the one implemented by LinearFit and NonlinearFit.

•	Conceptually, one obtains the repeated median estimator as follows. For each data point $P$, compute the slopes of the lines connecting $P$ to every other data point, and take the median of those slopes. Then take the median of these median slopes as $P$ ranges over all data points. This gives the slope $s$ of the repeated median estimator. If the data points are written as $(x_1, y_1), \dots, (x_n, y_n)$, the slope is $s = \operatorname{med}_{i} \operatorname{med}_{j \neq i} \frac{y_j - y_i}{x_j - x_i}$.

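•	Next, for each point $P = (x_i, y_i)$, one computes the intercept $y_i - s x_i$ of the line with slope $s$ that passes through $P$. The intercept of the repeated median estimator is the median of these intercepts, $b = \operatorname{med}_{i} (y_i - s x_i)$, and the estimator is the line $y = s x + b$.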

•	The repeated median estimator has a breakdown point of $\frac{1}{2}$. Suppose fewer than half of the data points in a sample are replaced by arbitrary values. The value of the repeated median estimator at any given point can then change only by a bounded amount.

•	In the first calling sequence, the parameter X is a Vector (or a k-by-1 Matrix, or a list) containing the values of the independent variable. The ith element is the independent value for the ith data point. The parameter Y is a Vector containing the k corresponding values of the dependent variable. Alternatively, both sets of values can be specified in a single k-by-2 Matrix, XY, by using the second calling sequence.

•	The returned value is an expression representing the estimator evaluated at v. If v is a variable name, you obtain the general expression for the estimator. If v is a number, you obtain the value of the estimator at that number.
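The two-stage computation described above can be sketched in a few lines of Python. This is a direct O(n^2 log n) evaluation of Siegel's definition. It is not the randomized algorithm of [3], and the function name repeated_median is hypothetical.

The sketch makes two assumptions. First, all x-values are distinct. Second, for an even number of values it takes the upper of the two middle values (statistics.median_high). That tie-breaking rule is an assumption of the sketch, since the description above does not specify one.

The usage lines also illustrate the breakdown point. Four of 11 points on the line y = 2x + 1 are replaced by gross outliers, and the sketch still returns that line.

```python
from statistics import median_high


def repeated_median(xs, ys):
    """Return (slope, intercept) of Siegel's repeated median line.

    Direct O(n^2 log n) sketch. Assumes distinct x-values so every pairwise
    slope is defined, and uses the upper median for even-sized sets (an
    assumption, not taken from the Maple documentation).
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of at least 2 points")
    # For each point P_i: median of the slopes from P_i to every other point.
    per_point = [
        median_high([(ys[j] - ys[i]) / (xs[j] - xs[i]) for j in range(n) if j != i])
        for i in range(n)
    ]
    # Slope s: median of the per-point median slopes.
    s = median_high(per_point)
    # Intercept b: median of the intercepts y_i - s * x_i.
    b = median_high([ys[i] - s * xs[i] for i in range(n)])
    return s, b


# Points on y = 2x + 1; integer data keeps the clean pairwise slopes exact.
xs = list(range(11))
ys = [2 * x + 1 for x in xs]
# Replace 4 of the 11 y-values with gross outliers.
for k, bad in zip([1, 4, 7, 9], [1000, -500, 250, 10**6]):
    ys[k] = bad

print(repeated_median(xs, ys))  # (2.0, 1.0)
```

Here 7 of the 11 points are clean. Each clean point has 6 exact slopes of 2 among its 10 pairwise slopes, and 7 of the 11 per-point medians equal 2. Both medians therefore land on the true values, however extreme the corrupted y-values are.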

Notes

•	The underlying computation is done in floating-point. Therefore, all data points must have type realcons, and the returned result is always floating-point, even if the problem is specified with exact values. For more information about numeric computation in the Statistics package, see the Statistics/Computation help page.

Examples

>	with(Statistics):

Suppose we have a set of points in the plane whose x-coordinates are drawn uniformly from the interval $[-1, 1]$. The y-coordinates are given by $y = 2.4 x + 0.9 + \mathrm{noise}$, where the noise comes from the Cauchy(0, 1) distribution. Because this distribution has very heavy tails, it typically generates serious outliers.

>	N := 1000;

$N ≔ 1000$

(1)

>	xvalues := Sample(Uniform(-1, 1), N);

$xvalues ≔ [0.629447372786358, 0.811583874151238, −0.746026367412988, 0.826751712278039, 0.264718492450819, −0.804919190001181, −0.443003562265903, 0.0937630384099677, 0.915013670868595, 0.929777070398553, −0.684773836644903, 0.941185563521231, 0.914333896485891, −0.0292487025543176, 0.600560937777600, −0.716227322745569, −0.156477434747450, 0.831471050378134, 0.584414659119109, 0.918984852785806, 0.311481398313174, −0.928576642851621, 0.698258611737554, 0.867986495515101, 0.357470309715547, 0.515480261156667, 0.486264936249832, −0.215545960931664, 0.310955780355113, −0.657626624376876, 0.412092176039218, −0.936334307245159, −0.446154030078220, −0.907657218737692, −0.805736437528305, 0.646915656654585, 0.389657245951634, −0.365801039878279, 0.900444097676710, −0.931107838994182, −0.122511280687204, −0.236883085813983, 0.531033576298005, 0.590399802274126, −0.626254790891243, −0.0204712084235379, −0.108827598578201, 0.292626020222529, 0.418729661716145, 0.509373363964722, −0.447949846002843, 0.359405353707350, 0.310196007947681, −0.674776529610739, −0.762004636883247, −0.00327189603571409, 0.919487917032162, −0.319228546667734, 0.170535501959555, −0.552376121017726, 0.502534118611306, −0.489809769081462, 0.0119141033302848, 0.398153445313372, 0.781806505071597, 0.918582850410889, 0.0944310599276061, −0.722751114342642, −0.701411988881885, −0.484983491752527, 0.681434511967325, −0.491435642056938, 0.628569652137633, −0.512950062550021, 0.858527246374456, −0.300032468030383, −0.606809499137584, −0.497832284047938, 0.232089352293278, −0.0534223021945415, −0.296680985874006, 0.661657255792582, 0.170528182305449, 0.0994472165822791, 0.834387327659620, −0.428321962359253, 0.514400458221443, 0.507458188556991, −0.239108306049287, 0.135643281450442, −0.848291420873873, −0.892099762666786, 0.0615951060179454, 0.558334460204022, 0.868021368458366, −0.740187583052540, 0.137647321744385, −0.0612187178835883, −0.976195860997517, −0.325754711202237, \dots, ⋯ 900 row vector 
entries not shown]$

(2)

>	yvalues := 2.4*xvalues + Sample(Cauchy(0, 1), N) +~ 0.9;

$yvalues ≔ [2.84954997993141, 2.35633770347317, 169.700364990167, 1.70319491115426, 2.05652936778591, −0.688078112485780, −0.534515203769601, −0.973747778770400, −10.1214458204039, 2.87737179493891, −2.28115957977184, 4.02000749189061, 2.66133636002989, 2.67854077159860, 3.24986044739278, −0.591075832043268, −1.09274794766349, 10.5087585734168, 1.39150700023547, 7.29927510971766, 0.463522909357217, −1.74973443280146, −1.00675127325916, 3.45597933364955, 0.182719241875987, −5.04834993420510, 2.91406140899646, −0.138960510707466, 2.20060977190904, −1.06175860365872, 2.31354957084230, −16.8203030707879, 3.32936772527244, 0.108084306187146, −0.0563119765609136, 3.96119389519444, 1.44969045186962, 0.409683690047485, 3.30380404684536, −1.23955552620817, −0.250576193804985, −0.680497267950874, 2.02071089818608, 1.16225770374808, 0.819825932642122, 25.7547703380145, −10.3424920784841, 1.71530371365245, −1.69617976503718, 3.52313810302508, 32.5292433415646, −2.99094601420271, 6.92006195688593, −19.3557331296477, −0.274840414047646, 2.13449165206531, 3.21489743294267, 2.80829827439304, 4.38052776134977, −0.00639879526670317, −0.0684036563797316, −1.50566482868770, −0.630045973402213, −5.90853952174534, −0.109857754936273, 3.48918116809614, 6.42622128016991, −1.32848226363211, −1.07296463093892, 21.6244740472985, 8.43774498857405, 0.342704628480285, 32.5016911886728, 0.784390662579479, 2.39506348984933, 0.741516798405378, −1.59711747092574, −1.04568896741367, 2.09468088899425, 0.859875307677620, −0.0983191566760852, 2.82344078120888, 2.31584846114760, 1.40841277884102, 3.06739610388102, 0.141893352024681, 2.17187393569548, −1.69150270801205, 1.15390670556561, 118.536666807695, −1.62947152991569, 10.2410660710996, 0.522214580110322, 4.94520795949801, 2.83933446201815, −1.15649277613884, −0.000317166380492728, −1.66221739254356, −2.12985661566211, 0.981153459844459, \dots, ⋯ 900 row vector entries not shown]$

(3)

>	points := PointPlot(yvalues, 'xcoords' = xvalues, 'color' = 'green'): points;

We shrink the view a little, so that we can see more of what's going on.

>	points := plots:-display(points, 'view' = [-1 .. 1, -3 .. 5]): points;

Now we would like to recover the model $y = 2.4 x + 0.9$ from the data. Standard (least-squares) linear regression gives an unsatisfactory fit:

>	leastsquares := Fit(a*x + b, xvalues, yvalues, x);

$leastsquares ≔ - 5.35146588080964 x - 6.16980901266835$

(4)

>	plots:-display(points, plot(leastsquares, x = -1 .. 1, 'thickness' = 3));

The repeated median estimator, however, is hardly affected by the outliers. Its slope and intercept are close to the true values 2.4 and 0.9:

>	repeatedmedian := RepeatedMedianEstimator(xvalues, yvalues, x);

$repeatedmedian ≔ 0.958872442473601 + 2.42820627384362 x$

(5)

>	plots:-display(points, plot([leastsquares, repeatedmedian], x = -1 .. 1, 'thickness' = 3, 'legend' = ["least squares", "repeated median"]));

In the following example, we have one outlier in the data.

>	xydata := [[0, 11], [1, 0], [2, 8], [3, 9], [4, 8], [5, 4], [6, 4], [7, 3], [8, 4], [9, 0], [10, -1]];

$xydata ≔ [[0, 11], [1, 0], [2, 8], [3, 9], [4, 8], [5, 4], [6, 4], [7, 3], [8, 4], [9, 0], [10, −1]]$

(6)

>	points := PointPlot(xydata[.., 2], 'xcoords' = xydata[.., 1], 'color' = 'green');

Once again, we compare the least squares and repeated median estimators.

>	leastsquares := Fit(a*x + b, xydata, x);

$leastsquares ≔ - 0.799999999999999 x + 8.54545454545454$

(7)

>	repeatedmedian := RepeatedMedianEstimator(xydata, x);

$repeatedmedian ≔ 10. - 1. x$

(8)

>	plots:-display(points, plot([leastsquares, repeatedmedian], x = 0 .. 10, 'legend' = ["least squares", "repeated median"]));

In this case, the difference is less dramatic than in the first example. It is clear, though, that the outlier has very little influence on the repeated median estimator but a noticeable influence on the least-squares fit.
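The second example can be cross-checked outside Maple with a similar direct sketch, shown here together with an ordinary least-squares fit. The helper is again hypothetical and is not Maple's implementation.

The sketch uses the upper-median convention. With it, the sketch returns slope -1 and intercept 10 for xydata, which agrees with the 10 - x reported above. Using the averaging median (statistics.median) instead gives a somewhat different slope for this small data set. The agreement is therefore a consistency check under that assumption, not a description of how RepeatedMedianEstimator resolves ties.

```python
from statistics import median_high


def repeated_median(points):
    """Siegel's repeated median line for (x, y) pairs with distinct x-values.

    Returns (slope, intercept). Even-sized medians use the upper middle
    value, an assumption of this sketch.
    """
    # For each point: median of the slopes to all the other points.
    per_point = [
        median_high([(yj - yi) / (xj - xi)
                     for j, (xj, yj) in enumerate(points) if j != i])
        for i, (xi, yi) in enumerate(points)
    ]
    s = median_high(per_point)                       # median of the medians
    b = median_high([y - s * x for x, y in points])  # median intercept
    return s, b


def least_squares(points):
    """Ordinary least-squares line, for comparison; returns (slope, intercept)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, my - slope * mx


xydata = [[0, 11], [1, 0], [2, 8], [3, 9], [4, 8], [5, 4],
          [6, 4], [7, 3], [8, 4], [9, 0], [10, -1]]

print(repeated_median(xydata))  # (-1.0, 10.0)
print(least_squares(xydata))    # approximately (-0.8, 8.5454...)
```

The least-squares line matches the Fit result above (slope -0.8, intercept 94/11). The point (0, 11) lies off the line followed by most of the data, and it pulls that fit toward a shallower slope.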

References

[1] Stuart, Alan, and Ord, Keith. Kendall's Advanced Theory of Statistics. 6th ed. London: Edward Arnold, 1998. Vol. 1: Distribution Theory.

[2] Siegel, Andrew F. Robust Regression Using Repeated Medians. Biometrika 69 (1), 1982, pp. 242-244.

[3] Matoušek, Jiří, Mount, David, and Netanyahu, Nathan. Efficient Randomized Algorithms for the Repeated Median Line Estimator. Algorithmica 20 (2), 1998, pp. 136-150.

Compatibility

•	The Statistics[RepeatedMedianEstimator] command was introduced in Maple 2015.

•	For more information on Maple 2015 changes, see Updates in Maple 2015.
