StringTools[Stem] - attempt to derive an English word stem using Porter's algorithm
Calling Sequence
Stem( word )
Parameters
word - Maple string; an English word
Description
• The Stem(word) command attempts to reduce the English word word to its stem, currently using Porter's algorithm. See the publication listed under References at the bottom of this help page.
• Porter's algorithm strips recognizable suffixes from a word until only its stem remains. It does not guarantee that a minimal stem is produced, and it does not remove prefixes. The algorithm is suitable only for English words.
• The input string word can be at most 127 characters long, which is sufficient for any English word.
• One typical use of a stemmer is in a spelling checker. A word list, or dictionary, is prepared by applying the stemmer to each of its members, producing a list of stemmed dictionary words. This reduces the size of the dictionary that must be searched, since a typical word list contains many words that share a stem. An input word is checked by comparing its stem against the stemmed dictionary: the word is considered valid if its stem belongs to that dictionary.
• A second typical use of a stemmer is in building indices of document collections for searching. For example, an index of a Web site can be prepared by stemming all the significant words that occur in the titles of its pages. A search query consisting of several words is then processed with the same stemmer, and the documents indexed under the stemmed search terms are returned.
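The spelling-checker use described above can be sketched roughly as follows. This is a minimal illustration, not a complete checker: the word list dictionary and the helper IsValid are hypothetical names introduced here, and a real dictionary would be far larger. Results are not shown.

```maple
# Hypothetical word list (an assumption for illustration only).
dictionary := ["connect", "connected", "connecting", "connection",
               "relate", "related", "relating"]:

# Stem every entry; converting to a set drops duplicate stems,
# which is where the reduction in dictionary size comes from.
stemmed := convert(map(StringTools:-Stem, dictionary), set):

# IsValid is an illustrative helper, not part of StringTools.
# A word is accepted if its stem occurs in the stemmed dictionary.
IsValid := w -> member(StringTools:-Stem(w), stemmed):

IsValid("connections");
IsValid("xyzzy");
```

Because only stems are compared, a checker built this way accepts any input whose stem matches a dictionary stem, including some forms that are not real words. That is the price of the smaller dictionary.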
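The indexing use can be sketched in a similar way. The page names, titles, the table idx, and the procedure Lookup are all assumptions made for this illustration; the sketch also assumes that titles are lowercase and that words are separated by single spaces, and it does not filter out insignificant words.

```maple
# Hypothetical pages, given as name = title (illustrative data only).
pages := ["intro.html" = "connecting devices",
          "faq.html"   = "connection problems",
          "news.html"  = "related products"]:

# Build the index: each stem maps to the set of pages whose title contains it.
idx := table():
for p in pages do
    for w in StringTools:-Split(rhs(p)) do
        s := StringTools:-Stem(w);
        if assigned(idx[s]) then
            idx[s] := idx[s] union {lhs(p)};
        else
            idx[s] := {lhs(p)};
        end if;
    end do;
end do:

# Lookup is an illustrative helper: it stems each query word with the
# same stemmer and collects the pages indexed under those stems.
Lookup := proc(query::string)
    local w, s, hits;
    hits := {};
    for w in StringTools:-Split(query) do
        s := StringTools:-Stem(w);
        if assigned(idx[s]) then
            hits := hits union idx[s];
        end if;
    end do;
    return hits;
end proc:

Lookup("connected");
```

The important design point is that the query and the indexed titles pass through the same stemmer, so different inflections of a word land on the same index key.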
Examples
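A minimal sketch of calling the command, assuming the StringTools package is available. The input words are chosen here for illustration, and the outputs are deliberately not shown.

```maple
with(StringTools):

# Inflected forms of one word; Porter's paper uses this family as its
# motivating example of words that share a stem.
Stem("connected");
Stem("connecting");
Stem("connections");

# Only suffixes are stripped; a prefix such as "un" is left in place.
Stem("unhappiness");

# Stem takes a single string, so map it over a list of words.
map(Stem, ["running", "runs", "run"]);
```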
References
Porter, M. F. "An algorithm for suffix stripping." Program, Vol. 14, No. 3 (July 1980): 130-137.