Statistics and Data Analysis - New Features in Maple 2019 - Maplesoft

What's New in Maple 2019

Statistics and Data Analysis



Least Trimmed Squares Regression 

The LeastTrimmedSquares command computes least trimmed squares regression for some data. 

> with(Statistics):

In this example, we have 1000 data points. There is a single independent variable, x, with values uniformly distributed between 0 and 10. The dependent variable is a linear function of the independent variable plus additive noise, y = 5 x + 10 + noise, where the noise comes from a probability distribution known for severe outliers: the Cauchy distribution, with location parameter 0 and scale parameter 1.

> x := Sample(Uniform(0, 10), 1000):
 

> noise := Sample(Cauchy(0, 1), 1000):
 

> y := 5*x + noise +~ 10:
 

Here we see all data points: 

> pp := PointPlot(y, xcoords = x, size = [0.7, 0.5]);

(point plot of all 1000 data points)

Linear least squares regression will be severely affected by the outliers. 

> ls_regression_result := Fit(a*X + b, x, y, X);

ls_regression_result := 3.44970682383807 X + 10.6816568681413
 

> ls_deviation_from_model := (coeff(ls_regression_result, X, 1) - 5)^2 + (coeff(ls_regression_result, X, 0) - 10)^2;

ls_deviation_from_model := 2.86806501793850
 

Least trimmed squares regression gets much closer to the true line without noise. 

> lts_regression_result := LeastTrimmedSquares(x, y, [X]);

lts_regression_result := 5.03537530551575 X + 9.82475419561272
 

> lts_deviation_from_model := (coeff(lts_regression_result, X, 1) - 5)^2 + (coeff(lts_regression_result, X, 0) - 10)^2;

lts_deviation_from_model := 0.0319625041956780
 

The result is even better if we include 900 out of the 1000 points, instead of the default of a little over 500. 

> lts_900_regression_result := LeastTrimmedSquares(x, y, [X], include = 900);

lts_900_regression_result := 5.00862730339998 X + 10.0156318668695
 

> lts_900_deviation_from_model := (coeff(lts_900_regression_result, X, 1) - 5)^2 + (coeff(lts_900_regression_result, X, 0) - 10)^2;

lts_900_deviation_from_model := 0.000318785625780919
 

The other robust regression method, implemented in the RepeatedMedianEstimator procedure, also gets a good result. 

> rme_regression_result := RepeatedMedianEstimator(x, y, X);

rme_regression_result := 10.0306661686300 + 5.00564125476873 X
 

> rme_deviation_from_model := (coeff(rme_regression_result, X, 1) - 5)^2 + (coeff(rme_regression_result, X, 0) - 10)^2;

rme_deviation_from_model := 0.000972237653807886
 

In order to visualize these results, we show the same point plot as before, now including the four regression lines. The three lines from the robust methods cannot be distinguished from one another, but the least squares line is clearly off. We zoom in on the vertical range that contains most of the points.

> plots[display](pp, plot([ls_regression_result, lts_regression_result, lts_900_regression_result, rme_regression_result], X = 0 .. 10, legend = ["least squares", "least trimmed squares", "least trimmed squares (900 points)", "repeated median"]));

(point plot with the four regression lines overlaid; the three robust lines coincide)
 

 

Correlogram 

The Correlogram command computes the autocorrelations of a data set and displays them as a column plot. Dashed lines indicate the lower and upper 95% confidence bands for the normal distribution N(0, 1/L), where L is the size of the sample X, and a caption reports how many of the displayed columns lie outside the bands of plus or minus 2, 3, and 4 standard deviations, respectively. AutoCorrelationPlot is an alias for the Correlogram command.

> Correlogram(Import(...));

(correlogram of the imported data set)
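Since the data set imported above is not shown here, the following is a minimal self-contained sketch: it applies Correlogram to a sample drawn from the standard normal distribution, whose autocorrelations should almost all fall within the confidence bands.

> with(Statistics):
> S := Sample(Normal(0, 1), 200):
> Correlogram(S);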
 

Detrend 

The Detrend command removes any trend from a set of data. 

> restart:
 

> with(Statistics):
 

For example, specify some data: 

> data := Matrix([[0, 1.8], [1, 0.7], [2.5, 2.8], [4, 4.2], [6.2, 3]]);
 

Fit a linear model to the data: 

> lm := LinearFit(b*t + a, data, t);

lm := 1.49598376946009 + 0.366429281218947 t
 

From a plot of the data together with the linear model, it can be observed that there is some upward trend. The Detrend command removes any trend from the data.

> detrend_data := Detrend(data);

(the detrended data, returned as a Matrix)
 

This can be observed in the following plot: 

> plots:-display([ScatterPlot(data, color = blue), ScatterPlot(detrend_data, color = red), plot(lm, t = 0 .. 6.5)]);

(plot of the original data, the detrended data, and the fitted linear model)
 

Detrend has also been added as an option to several routines in SignalProcessing including SignalPlot, Periodogram, and Spectrogram. 

Difference 

The Difference command computes lagged differences between elements in a data set. 

> with(Statistics):
 

Define some data: 

> x := <seq(i^2, i = 1 .. 10)>;

x := [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
 

> Difference(x);

[3, 5, 7, 9, 11, 13, 15, 17, 19]
 

More Updates 

Display of Data Structures 

The default display of data structures such as rtables (matrices, vectors, and arrays), as well as data frames and data series, has been changed. Previously, a summary was shown for such a data structure; now the display shows a header of the data (by default, the first 10 rows and 10 columns) together with the size of the data structure.

> Matrix(20, 20, rand(1 .. 10));

(the first 10 rows and 10 columns of the 20 x 20 matrix are shown, along with its dimensions)
 

 

DataFrames and DataSeries 

Several commands have been updated to support DataFrames and DataSeries, including remove, select and selectremove. Other new commands such as Detrend and Difference also support DataFrames and DataSeries. 
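As a brief sketch of the new DataSeries support (the series below is made up for illustration), select keeps only the entries that satisfy a predicate:

> with(Statistics):
> ds := DataSeries([2, 5, 8, 11], labels = [a, b, c, d]):
> select(x -> x > 4, ds);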

Biplot 

The Biplot command has a new option, components, which specifies the principal components used in the biplot. 
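As an illustrative sketch (the random data matrix here is an assumption, not taken from the original worksheet), the new option could be used to plot the first and third principal components instead of the default first two:

> with(Statistics):
> A := Sample(Normal(0, 1), [20, 4]):
> Biplot(A, components = [1, 3]);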

DataSummary, FivePointSummary, FrequencyTable 

The DataSummary, FivePointSummary, and FrequencyTable commands have a new option, tableweights, which specifies the relative column widths in the displayed embedded table. 
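For instance (a hypothetical call, with weights chosen purely for illustration), the first column of the embedded table could be made twice as wide as the second:

> with(Statistics):
> FivePointSummary(Sample(Normal(0, 1), 100), tableweights = [2, 1]);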

 


