The sentiment analyzer is composed of two classifiers trained on movie reviews. If your text is not similar to movie reviews, it’s less likely to make a correct guess. There are also some quirks of the data, such as “the Bourne Bias” (thanks to stuartrobinson for coining the phrase), which strongly weights the words “Matt Damon” toward the pos label. This is not yet an industrial-strength, enterprise-grade sentiment analysis tool, but I plan to improve it in the future. For more details on how it’s implemented, see the following articles:
For all languages other than English and Arabic, use the Snowball stemmer; apart from Portuguese, which also supports RSLP, you don’t have another choice. For Arabic, you must use the ISRI stemmer. With English, you have a few options:
| Stemmer | Notes |
|---|---|
| porter | the default choice; it’s consistent, though it can be too aggressive |
| lancaster | also a good choice, though it is more aggressive than porter |
| wordnet | if you want lemmatization instead of stemming, choose wordnet |
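The selection rules above fit in a small lookup function. This is a minimal sketch, not the demo’s actual code: the names `stemmer_options` and `default_stemmer` are hypothetical, and treating the first entry of each list as the default (Snowball for Portuguese, porter for English) is an assumption.

```python
from __future__ import annotations

# Hypothetical helper, NOT the demo's implementation: it only encodes the
# stemmer choices described in the text. The first entry is assumed to be
# the default.
ENGLISH_OPTIONS = ["porter", "lancaster", "wordnet"]


def stemmer_options(language: str) -> list[str]:
    """Return the stemmer names usable for `language`, default first."""
    lang = language.strip().lower()
    if lang == "english":
        return list(ENGLISH_OPTIONS)  # copy so callers can't mutate the constant
    if lang == "arabic":
        return ["isri"]  # ISRI is the only choice for Arabic
    if lang == "portuguese":
        return ["snowball", "rslp"]  # Portuguese also supports RSLP
    # Every other language gets Snowball only (no validation of the
    # language name is done in this sketch).
    return ["snowball"]


def default_stemmer(language: str) -> str:
    """Return the assumed default stemmer for `language`."""
    return stemmer_options(language)[0]


if __name__ == "__main__":
    for lang in ["English", "Arabic", "Portuguese", "German"]:
        print(lang, stemmer_options(lang), "default:", default_stemmer(lang))
```

If you want to reproduce these choices with NLTK, the names presumably correspond to `PorterStemmer`, `LancasterStemmer`, `SnowballStemmer`, `RSLPStemmer`, `ISRIStemmer`, and `WordNetLemmatizer` in `nltk.stem`, though this page does not confirm how the demo wires them up. `RSLPStemmer` and `WordNetLemmatizer` also need their NLTK data packages downloaded before use.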
If you’re still not sure, try out the demo with some test data to see which one you like more.
The following languages support both tagging and chunking/NER:
And the following languages only support tagging: