|
Calling Sequence
|
|
SimilarityScore( essay1, essay2 )
SimilarityScore( essays )
|
|
Parameters
|
|
essay1, essay2, essays - strings, or lists of strings: the essays to be compared
|
binarycount - (optional) truefalse
|
methods - (optional) list of procedures
|
tobaseline - (optional) truefalse
|
filter - (optional) procedure or literal name lemma, stem, or none
|
symmetric - (optional) truefalse
|
|
|
|
|
Description
|
|
•
|
The SimilarityScore command compares the word use in two or more essays and returns a matrix of scores in which entry [i,j] rates the similarity of essay i and essay j. A score of 0 indicates no overlap between the two essays. A score of 1 indicates complete overlap between the two essays (which does not necessarily mean the essays are identical).
|
•
|
If essay1 and essay2 are lists or arrays of essays, each essay in the first list is compared with each essay in the second list. If essay2 is not given, the essays in the essay1 list are compared with one another.
|
•
|
Many different methods are available for computing the word-use similarity score. The methods option selects a custom procedure, or one or more of the built-in procedures: CosineCoefficient, JaccardCoefficient, DiceCoefficient, or their binary counterparts.
|
•
|
When set to true, the binarycount option forces the word count vectors for the compared essays to contain only 1s and 0s, indicating the presence or absence of each word rather than its number of occurrences.
|
•
|
The filter option specifies the definition of a "word". The default is filter=lemma, which reduces the set of words based on their meaning. Other options are filter=stem, which applies a stemming algorithm, and filter=none. Custom filters can also be applied, such as filter=StringTools:-LowerCase.
|
•
|
When comparing multiple essays, the tobaseline option controls the pool of words used in each comparison. When tobaseline = true, the pool consists of all words from all essays. When tobaseline = false, the pool consists of just the words found in the two essays being compared at each step.
|
•
|
The symmetric option can be set to true as an optimization if you know that f(essay[i],essay[j]) = f(essay[j],essay[i]) in all cases, where f is the comparison method. Specifying this option computes each symmetric pair of results only once and stores the result in both locations, [i,j] and [j,i].
|
•
|
This function is part of the EssayTools package, so it can be used in the short form SimilarityScore(..) only after executing the command with(EssayTools). However, it can always be accessed through the long form of the command by using EssayTools[SimilarityScore](..).
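The word count vectors, the binarycount option, and the three named coefficients described above can be illustrated outside Maple. The following Python sketch is not the EssayTools implementation: the helper names (word_counts, to_vectors, cosine, jaccard, dice) are hypothetical, the tokenizer is a crude stand-in for the filter step, and the count-based Jaccard and Dice formulas use one common generalization (sum of minima over sum of maxima), which is an assumption about how such coefficients are usually extended to counts.

```python
import math
import re
from collections import Counter


def word_counts(essay):
    """Count lower-cased words; a crude stand-in for the filter step."""
    return Counter(re.findall(r"[a-z']+", essay.lower()))


def to_vectors(essay_a, essay_b, binarycount=False):
    """Aligned count vectors over the words found in either essay."""
    ca, cb = word_counts(essay_a), word_counts(essay_b)
    vocab = sorted(set(ca) | set(cb))
    a = [ca[w] for w in vocab]
    b = [cb[w] for w in vocab]
    if binarycount:
        # Keep only presence (1) or absence (0) of each word.
        a = [min(x, 1) for x in a]
        b = [min(y, 1) for y in b]
    return a, b


def cosine(a, b):
    """Cosine of the angle between the two count vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return dot / norm if norm else 0.0


def jaccard(a, b):
    """Assumed generalization to counts: sum of minima over sum of maxima."""
    top = sum(min(x, y) for x, y in zip(a, b))
    bottom = sum(max(x, y) for x, y in zip(a, b))
    return top / bottom if bottom else 0.0


def dice(a, b):
    """Assumed generalization to counts: twice the shared mass over the total."""
    top = 2 * sum(min(x, y) for x, y in zip(a, b))
    bottom = sum(a) + sum(b)
    return top / bottom if bottom else 0.0


first = "the cat sat on the mat"
second = "the cat sat on the mat the cat"

counts = to_vectors(first, second)
binary = to_vectors(first, second, binarycount=True)
print(cosine(*counts), jaccard(*counts), dice(*counts))
print(cosine(*binary), jaccard(*binary), dice(*binary))
print(cosine(*to_vectors("red apples", "blue sky")))
```

The two sample essays use the same set of words with different frequencies, so all three binary scores come out as 1 even though the essays are not identical, while the count-based scores fall below 1. The two essays with no words in common score 0.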
|
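The saving offered by the symmetric option can be sketched in the same way. This Python example is only an illustration under stated assumptions: score_matrix and jaccard_words are hypothetical names, and the choice to evaluate the pairs with i <= j (diagonal included) is an assumption, not a statement about which half EssayTools computes.

```python
def jaccard_words(essay_a, essay_b):
    """Binary Jaccard score on the sets of lower-cased words."""
    a, b = set(essay_a.lower().split()), set(essay_b.lower().split())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def score_matrix(essays, score, symmetric=False):
    """Return (matrix, number_of_score_calls) for all pairs of essays."""
    n = len(essays)
    matrix = [[0.0] * n for _ in range(n)]
    calls = 0
    for i in range(n):
        # Assumption: with symmetric=True only pairs with i <= j are evaluated.
        for j in range(i if symmetric else 0, n):
            matrix[i][j] = score(essays[i], essays[j])
            calls += 1
            if symmetric:
                matrix[j][i] = matrix[i][j]  # mirror into the other triangle
    return matrix, calls


essays = ["red apples", "green apples", "blue sky"]
full, full_calls = score_matrix(essays, jaccard_words)
half, half_calls = score_matrix(essays, jaccard_words, symmetric=True)
print(full == half, full_calls, half_calls)
```

For a symmetric scoring function both calls produce the same matrix, but the symmetric run evaluates the score n(n+1)/2 times instead of n^2 times. If the scoring function is not symmetric, mirroring would store wrong values, which is why the option should be set only when the symmetry is known to hold.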
|
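The effect of tobaseline on the pool of words can also be shown with a small Python sketch. The names words and comparison_vocabulary are hypothetical, and splitting on whitespace is a simplification of the filter step. The sketch only shows which words enter the pool for one comparison, not how EssayTools scores them.

```python
def words(essay):
    """Set of lower-cased, whitespace-separated words in an essay."""
    return set(essay.lower().split())


def comparison_vocabulary(essays, i, j, tobaseline):
    """Word pool used when comparing essays[i] with essays[j]."""
    if tobaseline:
        # Pool every word from every essay in the collection.
        pool = set().union(*(words(e) for e in essays))
    else:
        # Pool only the words of the two essays being compared.
        pool = words(essays[i]) | words(essays[j])
    return sorted(pool)


essays = ["red apples", "green apples", "blue sky"]
pairwise = comparison_vocabulary(essays, 0, 1, tobaseline=False)
pooled = comparison_vocabulary(essays, 0, 1, tobaseline=True)
print(pairwise)
print(pooled)
```

With the pooled vocabulary, the vectors for the first two essays gain entries for words that appear only in the third essay. Those entries are zero in both vectors, so they matter only for methods that take shared absences into account.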
|
|
|
Compatibility
|
|
•
|
The EssayTools[SimilarityScore] command was introduced in Maple 17.
|
|
|
|