Statistics - Maple Programming Help


Statistics

 Scale
 center and/or scale a set of data

Calling Sequence

 Scale( V )
 Scale( V, options )

Parameters

 V       - list, Matrix, Vector, DataFrame, or DataSeries; the numeric data to center and/or scale
 options - (optional) equation(s) of the form option=value

Options

 • center  : truefalse, numeric, procedure, or list; controls whether the data are centered. The default, true, uses the Statistics[Mean] command to compute the center, which is then subtracted from each element. If center=false, no center is subtracted. If a numeric value is given, that value is subtracted from each element. If a procedure is given, the value returned by the procedure is used as the center. If a list is given whose first element is a procedure, the remaining elements of the list are passed to the procedure as additional arguments.
 • scale   : truefalse, numeric, procedure, or list; controls whether the data are scaled. The default, true, uses the Statistics[StandardDeviation] command to compute the scale, and each (centered) element is divided by it. If scale=false, the data are not scaled, so their standard deviation is unchanged. If a numeric value is given, each element is divided by that value. If a procedure is given, the value returned by the procedure is used as the scale. If a list is given whose first element is a procedure, the remaining elements of the list are passed to the procedure as additional arguments.
 • ignore  : truefalse; controls how missing data are handled. Missing items are represented by undefined or Float(undefined). If ignore=false (the default) and V contains missing data, the Scale command returns undefined. If ignore=true, missing items are ignored when the default center and scale are computed, and they remain undefined in the result. The ignore option is not passed to any procedure given for the center or scale option.
Description

 • The Scale command centers and/or scales a collection of numeric data. By default, it subtracts the mean from each element and divides the result by the standard deviation.
 • If V is a Matrix, the Scale command scales each column of the Matrix separately.
 • Arbitrary numeric values can be used for the center and the scale factor.
 • Either centering or scaling can be turned off, so that the command returns data that are only scaled or only centered, respectively.
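The default transformation and the option values described above can be sketched outside Maple as a cross-check of the arithmetic. The Python function below, `scale_sketch`, is hypothetical and is not part of Maple. It makes four assumptions. The center value is subtracted first and the result is then divided by the scale value. Both values are computed from the original data. The default scale is the sample standard deviation (divisor n - 1), which agrees with the outputs in the Examples section. Only flat lists are handled, so Matrix input, the list form of the options, and the ignore option are not modeled.

```python
import statistics
from typing import Callable, Sequence, Union

# An option may be True (use the default statistic), False (skip this step),
# a number (use it directly), or a callable (use the value it returns).
Option = Union[bool, float, Callable[[Sequence[float]], float]]


def _resolve(option: Option, data: Sequence[float],
             default: Callable[[Sequence[float]], float],
             neutral: float) -> float:
    """Turn a center/scale option into a single number."""
    if option is True:
        return default(data)
    if option is False:
        return neutral  # 0 for the center, 1 for the scale: a no-op
    if callable(option):
        return option(data)
    return float(option)


def scale_sketch(data: Sequence[float], center: Option = True,
                 scale: Option = True) -> list[float]:
    """Hypothetical sketch of Scale on a flat list: (x - c) / s for each x."""
    c = _resolve(center, data, statistics.mean, 0.0)   # default: mean
    s = _resolve(scale, data, statistics.stdev, 1.0)   # default: sample SD
    return [(x - c) / s for x in data]


grades = [34, 55, 61, 75, 80, 91]
print(scale_sketch(grades))                # standard scores
print(scale_sketch(grades, scale=False))   # centered only
print(scale_sketch(grades, scale=2))       # centered, then divided by 2
```

Passing callables works the same way. For example, `scale_sketch(grades2, center=statistics.median, scale=statistics.variance)` mirrors the Median/Variance call in the Examples section.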

Examples

 > with(Statistics):

One application of the Scale command is computing standard scores (z-scores) for a list of values. A standard score measures how far a given element lies from the sample mean, in units of standard deviation. It is computed by subtracting the mean of the data from each element and dividing the result by the standard deviation of the data. This produces a new set of values with mean 0 and standard deviation 1. Suppose we have a list of grades on a final examination with a highest possible grade of 100. We can determine how each grade ranks relative to the sample mean and estimate what percentage of scores are higher or lower than a particular grade.

 > Grades := [34, 55, 61, 75, 80, 91];
 ${\mathrm{Grades}}{≔}\left[{34}{,}{55}{,}{61}{,}{75}{,}{80}{,}{91}\right]$ (1)

Suppose we want to compare a grade of 75 with the other grades. Computing the standard scores with the Scale command gives:

 > Scale(Grades);
 $\left[\begin{array}{c}-1.5719549837837186\\ -0.5403595256756533\\ -0.2456179662162059\\ 0.4421123391891708\\ 0.6877303054053767\\ 1.2280898310810304\end{array}\right]$ (2)

A grade of 75 corresponds to a standard score of 0.44. This means the grade is slightly less than half a standard deviation above the sample mean.
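The score of 0.44 can be checked by hand with the standard-score formula. This page does not state that the sample standard deviation (divisor n - 1) is used; that is inferred from the printed values:

$$
z_i = \frac{x_i - \bar{x}}{s}, \qquad
\bar{x} = \frac{1}{n}\sum_{j=1}^{n} x_j, \qquad
s = \sqrt{\frac{1}{n-1}\sum_{j=1}^{n}\left(x_j - \bar{x}\right)^2}
$$

$$
\begin{aligned}
\bar{x} &= \frac{34+55+61+75+80+91}{6} = \frac{396}{6} = 66,\\
s &= \sqrt{\frac{(-32)^2+(-11)^2+(-5)^2+9^2+14^2+25^2}{5}} = \sqrt{\frac{2072}{5}} \approx 20.357,\\
z_{75} &= \frac{75-66}{20.357} \approx 0.442.
\end{aligned}
$$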

If the grades are assumed to be approximately normally distributed, the percentage of grades below this standard score can be estimated from a standard normal probability table:

 > Student:-Statistics:-ProbabilityTable('Normal', 0.442112339189170811);
 ${0.670796042104607}$ (3)

From the above, about 67% of the grades are expected to fall below a grade of 75. Conversely, a grade of 75 is in the top 33%.
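The table lookup above amounts to evaluating the standard normal cumulative distribution function, $\Phi(z) = \tfrac{1}{2}\left(1 + \operatorname{erf}(z/\sqrt{2})\right)$. The following Python sketch uses only the standard library's `math.erf` to reproduce the probability. The helper name `normal_cdf` is hypothetical, and this is a cross-check rather than a description of how ProbabilityTable is implemented.

```python
import math


def normal_cdf(z: float) -> float:
    """P(Z <= z) for a standard normal random variable Z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


z_75 = 0.4421123391891708       # standard score of 75 in Grades, from (2)
below = normal_cdf(z_75)        # expected fraction of grades below 75
print(f"below 75: {below:.1%}, above 75: {1 - below:.1%}")
```

Calling `normal_cdf(-0.789882505875832)` in the same way gives the corresponding probability for the second examination.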

Standard scores can also be used to compare data sets from different distributions. Suppose we again want to see how a score of 75 ranks, this time in a set of grades from another examination with a highest possible grade of 120:

 > Grades2 := [60, 75, 90, 100, 111, 115];
 ${\mathrm{Grades2}}{≔}\left[{60}{,}{75}{,}{90}{,}{100}{,}{111}{,}{115}\right]$ (4)

The two lists of grades can be put into a grade matrix:

 > GradeMatrix := Matrix([Grades, Grades2])^%T;
 $\left[\begin{array}{rr}34& 60\\ 55& 75\\ 61& 90\\ 75& 100\\ 80& 111\\ 91& 115\end{array}\right]$ (5)

Applying Scale to a Matrix standardizes each column separately. The result is two new columns of data, each with mean 0 and standard deviation 1:

 > Scale(GradeMatrix);
 $\left[\begin{array}{cc}-1.5719549837837186& -1.4937382041810303\\ -0.5403595256756533& -0.7898825058758328\\ -0.2456179662162059& -0.08602680757063563\\ 0.4421123391891708& 0.3832103246328291\\ 0.6877303054053767& 0.8993711700566411\\ 1.2280898310810304& 1.0870660229380267\end{array}\right]$ (6)
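Each column of the Matrix is standardized with its own mean and standard deviation, so the result above can be cross-checked column by column. The following Python sketch uses only the standard library. The list-of-row-tuples layout and the name `scale_columns` are illustrative choices. The sample standard deviation is assumed because it agrees with the values above.

```python
import statistics

# Rows of the grade Matrix: (first examination, second examination).
grade_matrix = [(34, 60), (55, 75), (61, 90), (75, 100), (80, 111), (91, 115)]


def scale_columns(rows):
    """Standardize each column with its own mean and sample standard deviation."""
    columns = list(zip(*rows))                  # transpose: rows -> columns
    scaled = []
    for col in columns:
        m, s = statistics.mean(col), statistics.stdev(col)
        scaled.append([(x - m) / s for x in col])
    return [list(row) for row in zip(*scaled)]  # transpose back to rows


for row in scale_columns(grade_matrix):
    print(row)
```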

For the second examination, the grade of 75 has a standard score of -0.79, which corresponds to a cumulative probability of:

 > Student:-Statistics:-ProbabilityTable('Normal', -0.789882505875832);
 ${0.214798194477153}$ (7)

In the first set of grades, a score of 75 is in the top 33% of the grades. In the second, a score of 75 is in the bottom 21%.

The Scale command can also be used to center the data without scaling it:

 > Scale(Grades, scale = false);
 $\left[\begin{array}{c}-32.0\\ -11.0\\ -5.0\\ 9.0\\ 14.0\\ 25.0\end{array}\right]$ (8)

It can also center the data and then divide by a given numeric factor instead of the standard deviation:

 > Scale(Grades, scale = 2);
 $\left[\begin{array}{c}-16.0\\ -5.5\\ -2.5\\ 4.5\\ 7.0\\ 12.5\end{array}\right]$ (9)

The following plot illustrates the effect of different options of the Scale command. To show a greater contrast between the transformed data sets, the grades are expressed as fractions of the maximum grade (decimals between 0 and 1):

 > PercentGrades := Grades / 100.0;
 ${\mathrm{PercentGrades}}{≔}\left[{0.3400000000}{,}{0.5500000000}{,}{0.6100000000}{,}{0.7500000000}{,}{0.8000000000}{,}{0.9100000000}\right]$ (10)
 > dataplot(Matrix([Vector(PercentGrades), Scale(PercentGrades, center = false), Scale(PercentGrades), Scale(PercentGrades, scale = false)]), 'axes' = boxed, 'legend' = ["Data", "Scaled Data", "Scaled and Centered Data", "Centered Data"], 'color' = ["Red", "DarkBlue", "RoyalBlue", "Orange"], 'axes' = framed, 'axis[1]' = [color = white]);

The red line shows the original data, and the orange line shows the data minus the center (the mean). The dark blue line shows the data divided by the scale (the standard deviation). The light blue (RoyalBlue) line shows the data minus the center, then divided by the scale.
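The four curves described above can also be computed directly. This Python sketch uses illustrative names. It assumes, following the default options, that the center is the mean and the scale is the sample standard deviation. The standard deviation of PercentGrades is only about 0.2, so dividing by it without centering produces the largest values.

```python
import statistics

percent_grades = [g / 100.0 for g in [34, 55, 61, 75, 80, 91]]
m = statistics.mean(percent_grades)    # center used by the default options
s = statistics.stdev(percent_grades)   # scale used by the default options

series = {
    "Data": percent_grades,
    "Scaled Data": [x / s for x in percent_grades],                     # center=false
    "Scaled and Centered Data": [(x - m) / s for x in percent_grades],  # defaults
    "Centered Data": [x - m for x in percent_grades],                   # scale=false
}
for name, values in series.items():
    print(f"{name:<26}", [round(v, 3) for v in values])
```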

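Other procedures, including other commands from the Statistics package, can be used to determine the center or scale. For example, the following uses the median as the center and the variance as the scale: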

 > Scale(Grades2, center = Median, scale = Variance);
 $\left[\begin{array}{c}-0.07706422018348622\\ -0.0440366972477064\\ -0.011009174311926606\\ 0.011009174311926606\\ 0.035229357798165134\\ 0.0440366972477064\end{array}\right]$ (11)
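The same computation can be done in Python, with `statistics.median` and the sample variance `statistics.variance` standing in for Maple's Median and Variance. The output above suggests that Variance uses the n - 1 divisor, but this page does not state it:

```python
import statistics

grades2 = [60, 75, 90, 100, 111, 115]
center = statistics.median(grades2)    # (90 + 100) / 2 = 95.0
scale = statistics.variance(grades2)   # sample variance (divisor n - 1)
result = [(x - center) / scale for x in grades2]
print(result)
```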

Additional arguments can be passed to the procedure used for the center or scale option by giving the procedure as the first element of a list, followed by the arguments. Here the center is a weighted mean with weights W:

 > W := [10, 5, 10, 5, 10, 20]:
 > Scale(Grades2, center = [Mean, weights = W]);
 $\left[\begin{array}{c}-1.708805223107618\\ -1.0049495248024205\\ -0.3010938264972234\\ 0.16814330570624136\\ 0.6843041511300534\\ 0.8719990040114389\end{array}\right]$ (12)
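Only the center is weighted in this example, so the scale is still the ordinary standard deviation of Grades2. The following Python cross-check makes two assumptions. First, Mean with weights=W computes the weighted mean sum(w*x)/sum(w). Second, the scale is the sample standard deviation.

```python
import statistics

grades2 = [60, 75, 90, 100, 111, 115]
w = [10, 5, 10, 5, 10, 20]

# Assumed weighted mean: sum(w*x) / sum(w) = 5785 / 60
weighted_mean = sum(wi * xi for wi, xi in zip(w, grades2)) / sum(w)
sd = statistics.stdev(grades2)  # the scale itself is not weighted
result = [(x - weighted_mean) / sd for x in grades2]
print(result)
```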

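If the data contain undefined values, these can optionally be ignored using the ignore option. The undefined entries are left out when the center and scale are computed, and they remain undefined in the result: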

 > Grades3 := [34, 55, 61, Float(undefined), 75, 80, 91];
 ${\mathrm{Grades3}}{≔}\left[{34}{,}{55}{,}{61}{,}{Float}{}\left({\mathrm{undefined}}\right){,}{75}{,}{80}{,}{91}\right]$ (13)
 > Scale(Grades3, ignore = true);
 $\left[\begin{array}{c}-1.5719549837837186\\ -0.5403595256756533\\ -0.2456179662162059\\ \mathrm{HFloat}{}\left(\mathrm{undefined}\right)\\ 0.4421123391891708\\ 0.6877303054053767\\ 1.2280898310810304\end{array}\right]$ (14)
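With ignore=true, the missing entry does not influence the center or scale. The remaining values therefore match the standard scores of the original Grades list, and the missing position stays undefined. The following Python sketch shows that behavior, using float('nan') as a stand-in for Float(undefined). The function name `scale_ignoring_missing` is hypothetical.

```python
import math
import statistics

grades3 = [34, 55, 61, float("nan"), 75, 80, 91]


def scale_ignoring_missing(data):
    """Hypothetical: standardize using only non-missing values; keep NaN in place."""
    present = [x for x in data if not math.isnan(x)]
    m, s = statistics.mean(present), statistics.stdev(present)
    return [x if math.isnan(x) else (x - m) / s for x in data]


print(scale_ignoring_missing(grades3))
```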

If a procedure is given for the center or scale option, the ignore option is not passed to it. To have such a procedure ignore missing data, include ignore=true in the list of arguments for that procedure. Here the center is the mean and the scale is the variance, both computed with missing data ignored:

 > Scale(Grades3, center = [Mean, ignore = true], scale = [Variance, ignore = true]);
 $\left[\begin{array}{c}-0.0772200772200772\\ -0.026544401544401547\\ -0.01206563706563707\\ \mathrm{HFloat}{}\left(\mathrm{undefined}\right)\\ 0.02171814671814673\\ 0.0337837837837838\\ 0.060328185328185346\end{array}\right]$ (15)

Compatibility

 • The Statistics[Scale] command was introduced in Maple 2015.