Answers to Frequently Asked Questions
- Why would the sentiment analysis return incorrect results?
The sentiment analyzer is composed of two classifiers trained on movie reviews. If your text is not similar to movie reviews, it is less likely to make a correct guess. There are also some quirks in the data, such as “the Bourne Bias” (thanks to stuartrobinson for coining this phrase), which heavily weights the words “Matt Damon” towards the `pos` label. This is not yet an industrial-strength, enterprise-grade sentiment analysis tool, but I plan to improve it in the future. For more details on how it’s implemented, see the following articles:
- What stemmer should I use?
For all languages other than `english` and `arabic`, use the `snowball` stemmer. Except for `portuguese`, which also supports `rslp`, you don’t have another choice. For `arabic`, you must use the `isri` stemmer. With `english`, you have a few options:
- porter: the default choice. It’s consistent, though it can be too aggressive.
- lancaster: also a good choice, though it is generally more aggressive than `porter`
- wordnet: if you want lemmatization instead of stemming, choose `wordnet`
If you’re still not sure, try out the demo with some test data to see which one you like more.
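The selection rules above can be sketched as a small lookup. This is only an illustration of the FAQ's guidance: the function names `stemmer_choices` and `pick_stemmer` are hypothetical and are not part of any API, and the sketch assumes language and stemmer names are passed as lowercase strings like the ones quoted above.

```python
from typing import List, Optional

# Hypothetical helper names; only the stemmer/language names come from the FAQ.
ENGLISH_STEMMERS = ["porter", "lancaster", "wordnet"]


def stemmer_choices(language: str) -> List[str]:
    """Return the stemmer names available for a language, default first."""
    language = language.strip().lower()
    if language == "english":
        return list(ENGLISH_STEMMERS)
    if language == "arabic":
        return ["isri"]
    if language == "portuguese":
        return ["snowball", "rslp"]
    # Every other language falls back to snowball. This sketch does not
    # check whether the language itself is actually supported.
    return ["snowball"]


def pick_stemmer(language: str, preferred: Optional[str] = None) -> str:
    """Use the preferred stemmer if the language offers it, else the default."""
    choices = stemmer_choices(language)
    if preferred is not None and preferred.lower() in choices:
        return preferred.lower()
    return choices[0]


print(pick_stemmer("english"))               # porter
print(pick_stemmer("portuguese", "rslp"))    # rslp
print(pick_stemmer("spanish", "lancaster"))  # snowball
```

Falling back to the first entry keeps the behavior predictable when a caller asks for a stemmer the language does not offer, which mirrors the "you don't have another choice" rule.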
- Why can’t the chunker find any named entities?
NLTK’s default chunker is a chunker first and a named entity recognizer second. It was not designed for NER the way other services have been. The phrase extraction API uses other NER chunkers as well, but these have only been trained on small data sets. Think of the entities it finds as a bonus, not the main point. More accurate named entity extractors may be provided in the future.
- Can I do tagging/chunking in other languages?
The following languages support both tagging and chunking/NER:
Dutch
English
Portuguese
Spanish
And the following languages only support tagging:
Bangla
Catalan
Chinese
Hindi
Marathi
Polish
Telugu
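If you want to check support before sending a request, the two lists above can be encoded as a simple table. This is a minimal sketch: the names `CAPABILITIES` and `supports`, the task labels `"tag"` and `"chunk"`, and the lowercase language keys are all assumptions made for illustration, not values taken from the API.

```python
from typing import Dict, FrozenSet

# Languages copied from the two lists above; lowercase keys are an assumption.
TAG_AND_CHUNK = ("dutch", "english", "portuguese", "spanish")
TAG_ONLY = ("bangla", "catalan", "chinese", "hindi", "marathi", "polish", "telugu")

# Map each language to the set of tasks it supports.
CAPABILITIES: Dict[str, FrozenSet[str]] = {}
for lang in TAG_AND_CHUNK:
    CAPABILITIES[lang] = frozenset({"tag", "chunk"})
for lang in TAG_ONLY:
    CAPABILITIES[lang] = frozenset({"tag"})


def supports(language: str, task: str) -> bool:
    """Return True if the language is listed as supporting the task."""
    return task in CAPABILITIES.get(language.strip().lower(), frozenset())


print(supports("Dutch", "chunk"))  # True
print(supports("Hindi", "chunk"))  # False
```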
- How can I process more text than your limits allow?
You can use the Text-Processing RapidAPI. It lets you sign up for a higher-limit plan that meets your needs.
- Can I train my own tagger/chunker/classifier?
Please see the train-classifier documentation (https://nltk-trainer.readthedocs.io/en/latest/train_classifier.html) for how to train your own classifiers with nltk-trainer (https://github.com/japerk/nltk-trainer).