This is a demonstration of NLTK part-of-speech taggers and NLTK chunkers using NLTK 3.9.1.
These taggers can assign part-of-speech tags to each word in your text.
They can also identify certain phrases/chunks and named entities.
The default part-of-speech tagger is a classifier-based tagger trained on the Penn Treebank corpus, which consists largely of Wall Street Journal news articles. That means the tagger is more likely to be correct on text that looks like a news article, and less accurate on text that doesn't.
Similarly, spaCy taggers have been trained on written text such as news, blogs and media in the following languages:
All the other taggers have been trained on part-of-speech tagged NLTK corpora using train_tagger.py from nltk-trainer.
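To give a feel for what training a tagger on tagged sentences involves, here is a minimal sketch using NLTK's `UnigramTagger` with a `DefaultTagger` backoff. It is not what train_tagger.py does internally. The tiny training set is hand-made for illustration, whereas a real run would read sentences from a tagged corpus.

```python
from nltk.tag import DefaultTagger, UnigramTagger

# Hypothetical, hand-made training data: sentences of (word, tag) pairs.
train_sents = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
    [("a", "DT"), ("dog", "NN"), ("sleeps", "VBZ")],
]

# Words the unigram model has not learned fall back to the tag "NN".
backoff = DefaultTagger("NN")
tagger = UnigramTagger(train_sents, backoff=backoff)

# "loudly" never appears in the training data, so the backoff tags it.
result = tagger.tag(["a", "cat", "barks", "loudly"])
print(result)
```

The backoff chain is the design choice worth noting: a tagger trained on a small corpus will meet many unseen words, and a fallback keeps it from returning `None` tags for them.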
These NLTK taggers cover the following languages:
The default chunker is a classifier-based chunker trained on the ACE corpus. This means it recognizes noun phrases and named entities, such as locations, names, and organizations. It only works well with an English tagger, and works best with the default tagger.
All other chunkers have been trained on chunked or parsed NLTK corpora using train_chunker.py from nltk-trainer.
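As a simplified picture of training a chunker from chunked sentences, the sketch below learns a mapping from part-of-speech tags to IOB chunk tags with a `UnigramTagger`. This is a much simpler technique than a classifier-based chunker and is not taken from train_chunker.py. The two training sentences and the `chunk` helper are hypothetical.

```python
from nltk import Tree
from nltk.chunk import conlltags2tree, tagstr2tree, tree2conlltags
from nltk.tag import DefaultTagger, UnigramTagger

# Hypothetical, hand-made chunked sentences; brackets mark NP chunks.
train_trees = [
    tagstr2tree("[ the/DT dog/NN ] barked/VBD"),
    tagstr2tree("[ the/DT little/JJ cat/NN ] sat/VBD on/IN [ the/DT mat/NN ]"),
]

# Convert each tree to (pos, iob) pairs, e.g. ("DT", "B-NP").
train_data = [
    [(pos, iob) for _, pos, iob in tree2conlltags(tree)]
    for tree in train_trees
]

# Unseen part-of-speech tags fall back to "O" (outside any chunk).
iob_tagger = UnigramTagger(train_data, backoff=DefaultTagger("O"))


def chunk(tagged_sentence):
    """Chunk a list of (word, pos) pairs into a tree of NP chunks."""
    pos_tags = [pos for _, pos in tagged_sentence]
    iob_tags = [iob for _, iob in iob_tagger.tag(pos_tags)]
    conll = [(w, p, i) for (w, p), i in zip(tagged_sentence, iob_tags)]
    return conlltags2tree(conll)


sentence = [("a", "DT"), ("big", "JJ"), ("dog", "NN"),
            ("chased", "VBD"), ("the", "DT"), ("cat", "NN")]
tree = chunk(sentence)
print(tree)
```

Because the model only looks at part-of-speech tags, its output depends entirely on the tagger that produced them, which is one reason a chunker works best with the tagger it was trained alongside.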
These NLTK chunkers cover the following languages:
If you'd like to use this through an API, please see the API docs for Tagging & Chunking and Phrase Extraction & Named Entity Recognition. For higher limits and premium API access, sign up for the Text-Processing RapidAPI.