This is a demonstration of stemming and lemmatization for the 18 languages supported by the NLTK 3.9.1 stem package.
Stemming is a process of removing and replacing word suffixes to arrive at a common root form of the word.
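To make the idea of removing and replacing suffixes concrete, here is a minimal sketch of a toy stemmer. The rule table and the `toy_stem` function are hypothetical and exist only for illustration; they are not any of the NLTK algorithms, which use much larger rule sets with extra conditions.

```python
# Hypothetical suffix rules, for illustration only. Each pair is
# (suffix to remove, text to put in its place).
SUFFIX_RULES = [
    ("sses", "ss"),  # caresses -> caress
    ("ies", "i"),    # ponies   -> poni
    ("ing", ""),     # walking  -> walk
    ("ed", ""),      # walked   -> walk
    ("s", ""),       # walks    -> walk
]


def toy_stem(word: str) -> str:
    """Apply the first matching rule, provided at least 3 letters remain."""
    word = word.lower()
    for suffix, replacement in SUFFIX_RULES:
        remaining = len(word) - len(suffix)
        if word.endswith(suffix) and remaining >= 3:
            return word[:remaining] + replacement
    return word


for w in ["walks", "walked", "walking", "ponies"]:
    print(w, "->", toy_stem(w))
```

The three forms of "walk" collapse to the same root, while "ponies" becomes "poni", which shows that a stem does not have to be a dictionary word.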
For stemming English words with NLTK, you can choose between the PorterStemmer and the LancasterStemmer. The Porter Stemming Algorithm is the oldest stemming algorithm supported in NLTK; it was developed in 1979 and published in 1980. The Lancaster Stemming Algorithm is newer, published in 1990, and can be more aggressive than the Porter stemming algorithm.
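The difference between the two English stemmers is easiest to see side by side. The sketch below assumes NLTK is installed; the word list is an arbitrary choice for illustration.

```python
from nltk.stem import LancasterStemmer, PorterStemmer

porter = PorterStemmer()
lancaster = LancasterStemmer()

# Print each word with the stem produced by both algorithms.
for word in ["running", "maximum", "presumably", "provision"]:
    print(f"{word:12} porter={porter.stem(word):12} lancaster={lancaster.stem(word)}")

# The Lancaster rules strip "-um" from "maximum", giving "maxim";
# Porter has no rule for that suffix and leaves the word unchanged.
```

Neither stemmer needs any downloaded corpus data, so this runs on a bare NLTK install.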
The WordNet Lemmatizer uses the WordNet Database to look up lemmas. Lemmas differ from stems in that a lemma is a canonical form of the word, while a stem may not be a real word.
Stemming for Portuguese is available in NLTK with the RSLPStemmer and also with the SnowballStemmer. Arabic stemming is supported with the ISRIStemmer.
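Snowball is actually a language for creating stemmers, and was added to NLTK in version 2.0b9 as the SnowballStemmer class. The NLTK Snowball stemmer currently supports the following languages: Arabic, Danish, Dutch, English, Finnish, French, German, Hungarian, Italian, Norwegian, Portuguese, Romanian, Russian, Spanish, and Swedish.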
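The language names accepted by the installed NLTK version can be read from the `SnowballStemmer.languages` attribute. A minimal sketch, assuming NLTK is installed; the German sample word comes from the NLTK documentation.

```python
from nltk.stem import SnowballStemmer

# Tuple of lowercase language names the class accepts. It also contains
# "porter", which selects the original Porter algorithm.
print(SnowballStemmer.languages)

# Create a stemmer by passing one of those names.
german = SnowballStemmer("german")
print(german.stem("Autobahnen"))  # 'autobahn'
```

Passing a name that is not in `SnowballStemmer.languages` raises a `ValueError`.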
If you'd like to use this through an API, please see the Stemming API Docs. For higher limits and premium API access, sign up for the Text-Processing RapidAPI.