One application of the Scale command is computing standard scores (z-scores) for a list of values. A standard score measures how far a given element lies from the sample mean, in units of standard deviations. Standard scores are computed by first subtracting the mean of the data set from each element and then dividing the result by the standard deviation of the data set, which yields a new set of values with mean 0 and standard deviation 1. Given a list of grades on a final examination with a highest possible grade of 100, we can determine how each grade ranks relative to the sample mean, and what percentage of scores are higher or lower than a particular grade.
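The computation described above can be sketched outside Maple. The following Python snippet is only an illustration of the arithmetic, not the Scale command itself; the function name `standardize` and the grade values are hypothetical, and the sample (n - 1) standard deviation is assumed.

```python
from statistics import mean, stdev

def standardize(values):
    """Return z-scores: subtract the sample mean, divide by the sample standard deviation."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 denominator), an assumption
    return [(v - m) / s for v in values]

# Hypothetical exam grades out of 100 (not the data used on this page)
grades = [52, 61, 68, 75, 79, 84, 90, 95]
z_scores = standardize(grades)

# The standardized values have mean 0 and standard deviation 1 (up to rounding)
print(round(mean(z_scores), 10), round(stdev(z_scores), 10))
```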
Suppose we want to compare a grade of 75 with the other grades. Computing the standard scores using the Scale command, we see that:
A grade of 75 corresponds to a standardized score of 0.44, which means that the grade is approximately 1/2 standard deviation above the sample mean.
Assuming the grades are approximately normally distributed, the percentage of grades that are lower than this standardized score can be found from a standard normal distribution probability table.
From the above, 67% of the grades are lower than a grade of 75, or conversely, a grade of 75 is in the top 33% of the grades.
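The table lookup can also be done numerically with the cumulative distribution function of the standard normal distribution. This is a minimal Python sketch rather than Maple code; it uses the standardized score of 0.44 quoted above and assumes the grades are approximately normal.

```python
from statistics import NormalDist

z = 0.44                     # standardized score of the grade of 75, from the text
below = NormalDist().cdf(z)  # fraction of a standard normal population below z
above = 1 - below

print(f"{below:.0%} below, {above:.0%} above")  # prints "67% below, 33% above"
```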
Standard scores can also be used to compare two sets of data from different distributions. Suppose we again want to investigate how a score of 75 ranks, this time in a set of grades from another examination with a highest possible grade of 120:
The two lists of grades can be put into a grade matrix:
Applying Scale to a matrix standardizes each column separately, resulting in two new columns of data, each with mean 0 and standard deviation 1:
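Column-wise standardization can be illustrated as follows. This Python sketch is not the Scale command; `standardize_columns` and the two columns of grades are hypothetical, and the sample standard deviation is assumed.

```python
from statistics import mean, stdev

def standardize_columns(rows):
    """Standardize each column of a row-major matrix independently."""
    scaled_columns = []
    for col in zip(*rows):  # iterate over columns
        m, s = mean(col), stdev(col)
        scaled_columns.append([(v - m) / s for v in col])
    # Transpose back to rows
    return [list(row) for row in zip(*scaled_columns)]

# Hypothetical grade matrix: column 1 is out of 100, column 2 is out of 120
grade_matrix = [
    [52, 70],
    [61, 75],
    [68, 88],
    [75, 96],
    [84, 104],
    [90, 115],
]
scaled_matrix = standardize_columns(grade_matrix)
```

Because each column is divided by its own standard deviation, the two examinations end up on a common scale even though their maximum grades differ.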
In the second set of grades, the grade of 75 has a standard score of -0.79, which corresponds to a probability value of:
In the first set of grades, a score of 75 is in the top 33% of the grades. In the second, a score of 75 is in the bottom 21%.
The Scale command can also be used to center the data without scaling it:
Or to scale the data by a numeric factor without centering it:
The following plot illustrates the differences between applying various options to the Scale command. In order to show a greater contrast between transformed data, we will use the grades as decimal percentages:
The red line shows the original data and the orange line shows the data minus the center value. The dark blue line shows the data divided by the scale value (the standard deviation) and the light blue line shows the data minus the center then divided by the scale value.
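The four curves described above correspond to four combinations of centering and scaling. The sketch below reproduces them numerically in Python; it is not Maple code, the helper `rescale` and its flags are hypothetical, and the data are made-up grades expressed as decimal fractions.

```python
from statistics import mean, stdev

def rescale(values, use_center=True, use_scale=True):
    """Optionally subtract the mean and/or divide by the sample standard deviation."""
    c = mean(values) if use_center else 0.0
    s = stdev(values) if use_scale else 1.0
    return [(v - c) / s for v in values]

# Hypothetical grades as decimal fractions of the maximum grade
data = [0.52, 0.61, 0.68, 0.75, 0.79, 0.84, 0.90, 0.95]

original = rescale(data, use_center=False, use_scale=False)  # unchanged data
centered = rescale(data, use_center=True, use_scale=False)   # data minus the center
scaled_only = rescale(data, use_center=False, use_scale=True)  # data divided by the scale
standardized = rescale(data)                                 # centered, then scaled
```

Centering shifts the values without changing their spread, while scaling changes the spread without moving the values relative to one another.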
It is possible to use another procedure or command from Statistics to determine the center or scale.
It is also possible to send arguments to the procedures used for the center or scale options by entering the procedure in a list:
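The idea of supplying a procedure, or a list consisting of a procedure followed by its arguments, can be sketched in Python as below. This mirrors the concept only and does not use Maple syntax; `resolve`, `scale_with`, and `trimmed_mean` are hypothetical names, and the grades are made up.

```python
from statistics import median, stdev

def resolve(spec, values):
    """Evaluate a statistic given a callable or a list [callable, arg1, ...]."""
    if callable(spec):
        return spec(values)
    func, *args = spec
    return func(values, *args)

def scale_with(values, center, scale):
    """Subtract the resolved center and divide by the resolved scale."""
    c = resolve(center, values)
    s = resolve(scale, values)
    return [(v - c) / s for v in values]

def trimmed_mean(values, drop):
    """Hypothetical helper: mean after dropping `drop` values from each end."""
    kept = sorted(values)[drop:len(values) - drop]
    return sum(kept) / len(kept)

grades = [52, 61, 68, 75, 79, 84, 90, 95]  # hypothetical data

by_median = scale_with(grades, center=median, scale=stdev)
by_trimmed = scale_with(grades, center=[trimmed_mean, 1], scale=stdev)
```

In the second call, the extra argument `1` travels with the procedure inside the list, which is the same pattern the list form of the center and scale options follows.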
If the collection of data has any undefined values, these can optionally be ignored using the ignore option:
If the center or scale options are called with a procedure, the ignore option is not passed to the procedure. To pass ignore to another procedure, it can be given in the list of options for the procedure:
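The distinction can be sketched in Python, using NaN as a stand-in for an undefined value. This is an illustration under assumptions, not the behavior of the Scale command: `scale_data`, `drop_nan`, and `nan_median` are hypothetical, and the data are made up. The built-in defaults honor the `ignore` flag, whereas a user-supplied center receives the raw data and must be given its own flag.

```python
import math
from functools import partial
from statistics import mean, median, stdev

def drop_nan(values):
    """Return the values with NaN entries removed."""
    return [v for v in values if not math.isnan(v)]

def scale_data(values, center=None, scale=None, ignore=False):
    """Default statistics honor `ignore`; custom callables get the raw data."""
    clean = drop_nan(values) if ignore else values
    c = mean(clean) if center is None else center(values)
    s = stdev(clean) if scale is None else scale(values)
    return [(v - c) / s for v in values]  # NaN entries stay NaN in the output

def nan_median(values, ignore=False):
    """Hypothetical custom center with its own ignore flag."""
    return median(drop_nan(values) if ignore else values)

grades = [52.0, 61.0, float("nan"), 75.0, 84.0, 90.0]  # hypothetical data

default_scaled = scale_data(grades, ignore=True)
# The custom center needs ignore passed to it explicitly
custom_scaled = scale_data(grades, center=partial(nan_median, ignore=True), ignore=True)
```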