This is a demonstration of stemming and lemmatization for the 17 languages supported by the NLTK 2.0.4 stem package.
Stemming is a process of removing and replacing word suffixes to arrive at a common root form of the word.
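To make the idea of removing and replacing suffixes concrete, here is a minimal sketch of a toy stemmer. The rule table, the toy_stem function, and the min_stem_len parameter are all hypothetical names invented for illustration; this is not the Porter, Lancaster, or Snowball algorithm, which use far more careful rules.

```python
# Hypothetical toy rules: (suffix to strip, text to put in its place).
# Rules are tried in order, so longer suffixes come first.
TOY_RULES = [
    ("ies", "y"),   # replace: ponies -> pony
    ("ing", ""),    # remove:  walking -> walk
    ("ed", ""),     # remove:  walked -> walk
    ("s", ""),      # remove:  walks -> walk
]


def toy_stem(word, min_stem_len=3):
    """Apply the first matching suffix rule, keeping at least min_stem_len letters."""
    word = word.lower()
    for suffix, replacement in TOY_RULES:
        stem = word[: len(word) - len(suffix)]
        if word.endswith(suffix) and len(stem) >= min_stem_len:
            return stem + replacement
    return word  # no rule applied


for w in ["ponies", "walking", "walked", "walks", "is", "running"]:
    print(w, "->", toy_stem(w))
```

Note that this sketch turns "running" into "runn", because it knows nothing about doubled consonants. Handling cases like that is part of what the real stemming algorithms add.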
For stemming English words with NLTK, you can choose between the PorterStemmer and the LancasterStemmer. The Porter stemming algorithm, originally published in 1979, is the oldest stemming algorithm supported in NLTK. The Lancaster stemming algorithm is much newer, published in 1990, and can be more aggressive than the Porter stemming algorithm.
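The two English stemmers can be compared side by side. The following is a minimal sketch, assuming a recent NLTK release running on Python 3 (rather than the 2.0.4 release this page describes); the sample words are arbitrary choices for illustration.

```python
from nltk.stem import LancasterStemmer, PorterStemmer

porter = PorterStemmer()
lancaster = LancasterStemmer()

# Arbitrary sample words, chosen only to show where the two stemmers may differ.
words = ["running", "maximum", "presumably", "provision"]

for word in words:
    # Both classes expose the same stem(word) method.
    print(f"{word:12} porter={porter.stem(word):12} lancaster={lancaster.stem(word)}")
```

Neither stemmer needs any corpus download, so the comparison should run with a plain NLTK install.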
Stemming for Portuguese is available in NLTK with the RSLPStemmer and also with the SnowballStemmer. Arabic stemming is supported with the ISRIStemmer.
Snowball is actually a language for creating stemmers, and it was added to NLTK in version 2.0b9 as the SnowballStemmer class. The languages that the NLTK Snowball stemmer currently supports are listed in the class's languages attribute.
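The supported languages can also be listed programmatically. This is a minimal sketch, assuming a recent NLTK release on Python 3, where the exact set of languages may differ from the one in NLTK 2.0.4; the example words are arbitrary.

```python
from nltk.stem import SnowballStemmer

# The class attribute holds the language names accepted by the constructor.
print(", ".join(SnowballStemmer.languages))

# Pick a language by name, then stem words with the same stem() method
# used by the other NLTK stemmers.
german = SnowballStemmer("german")
print(german.stem("Autobahnen"))

english = SnowballStemmer("english")
print(english.stem("running"))
```

The language is passed as a lowercase name, so checking it against SnowballStemmer.languages first is a simple way to validate user input.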