
Answers to Frequently Asked Questions

Why would the sentiment analysis return incorrect results?

The sentiment analyzer is composed of two classifiers trained on movie reviews. If your text is not similar to movie reviews, it is less likely to be classified correctly. There are also some quirks in the data, such as “the Bourne Bias” (thanks to stuartrobinson for coining this phrase), which heavily weights the words “Matt Damon” towards the pos label. This is not yet an industrial-strength, enterprise-grade sentiment analysis tool, but I plan to improve it in the future. For more details on how it’s implemented, see the following articles:

What stemmer should I use?

For all languages other than English and Arabic, use the snowball stemmer. Portuguese also supports rslp; for every other language, snowball is the only choice. For Arabic, you must use the isri stemmer. With English, you have a few options:

porter: the default choice - it’s consistent, though it can be too aggressive
lancaster: also a good choice, though it is generally more aggressive than porter
wordnet: if you want lemmatization instead of stemming, choose wordnet

If you’re still not sure, try out the demo with some test data to see which one you like more.
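
The advice above can be summarized as a small lookup. This is only a sketch: the function name `stemmer_options` and its return format are hypothetical and not part of this service or of NLTK. It also assumes the language you pass in is one the snowball stemmer actually supports.

```python
# Hypothetical helper that encodes the stemmer advice from this FAQ.
# The name and return format are illustrative assumptions, not a real API.

ENGLISH_OPTIONS = ("porter", "lancaster", "wordnet")


def stemmer_options(language):
    """Return the stemmer names suggested for a language, default first."""
    lang = language.strip().lower()
    if lang == "english":
        # porter is the default; wordnet is a lemmatizer rather than a stemmer
        return list(ENGLISH_OPTIONS)
    if lang == "arabic":
        # isri is the only choice for Arabic
        return ["isri"]
    if lang == "portuguese":
        # Portuguese is the one language that supports both snowball and rslp
        return ["snowball", "rslp"]
    # Every other language: snowball only (assuming snowball supports it)
    return ["snowball"]


if __name__ == "__main__":
    for name in ("English", "Arabic", "Portuguese", "Dutch"):
        print(name, "->", ", ".join(stemmer_options(name)))
```

The first item in each returned list is the suggested starting point. For English, the choice among the three still depends on how aggressive you want the stemming to be.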

Why can’t the chunker find any named entities?

NLTK’s default chunker is a chunker first and a named entity recognizer second. It was not designed for NER the way other services have been. The phrase extraction API uses other NER chunkers as well, but these have only been trained on small data sets. Think of the entities it finds as a bonus, not the main point. More accurate named entity extractors may be provided in the future.

Can I do tagging/chunking in other languages?

The following languages support both tagging and chunking/NER:

  • Dutch
  • English
  • Portuguese
  • Spanish

And the following languages only support tagging:

  • Bangla
  • Catalan
  • Chinese
  • Hindi
  • Marathi
  • Polish
  • Telugu

How can I process more text than your limits allow?

You can use the Mashape Text-Processing API. This allows you to sign up for a higher-limit plan that meets your needs.

Can I train my own tagger/chunker/classifier?

Please fill out this survey to let me know what your needs are.