Tagging, Chunking & Named Entity Recognition with NLTK

This is a demonstration of NLTK part-of-speech taggers and NLTK chunkers, running on NLTK 2.0.4. The taggers assign a part-of-speech tag to each word in your text. The chunkers identify certain phrases (chunks) and named entities.

Tag and Chunk Text
  • Enter up to 50000 characters

How Part of Speech Tagging, Phrase Chunking, and NER Work

Trained Part of Speech Taggers

The default part-of-speech tagger is a classifier-based tagger trained on the Penn Treebank corpus. The Penn Treebank corpus is composed of news articles from the Wall Street Journal. That means the tagger is more likely to be correct on text that looks like a news article, and less accurate on text that doesn't.
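To illustrate why the training corpus matters, here is a minimal sketch of a trained tagger in plain Python. It is not the classifier-based tagger used by this demo: it is a toy most-frequent-tag (unigram) tagger, and the training sentences, the function name train_unigram_tagger, and the default_tag fallback are all assumptions made up for the example.

```python
from collections import Counter, defaultdict

def train_unigram_tagger(tagged_sentences, default_tag="NN"):
    """Learn the most frequent tag for each word seen in training (toy sketch)."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    model = {word: tags.most_common(1)[0][0] for word, tags in counts.items()}

    def tag(words):
        # Words never seen in training fall back to default_tag,
        # which is where accuracy drops on out-of-domain text.
        return [(w, model.get(w.lower(), default_tag)) for w in words]

    return tag

# Hypothetical, hand-tagged "news-like" training data (Penn Treebank tag names).
train = [
    [("The", "DT"), ("company", "NN"), ("reported", "VBD"), ("profits", "NNS")],
    [("The", "DT"), ("bank", "NN"), ("raised", "VBD"), ("rates", "NNS")],
]

tagger = train_unigram_tagger(train)
news_like = tagger(["The", "bank", "reported", "profits"])  # every word was seen in training
chatty = tagger(["lol", "gonna", "grab", "profits"])        # mostly unseen words
```

On the news-like input every word gets the tag it had in training. On the chatty input the three unseen words all fall back to the default tag, so a verb like "grab" is mislabeled as a noun. A real classifier-based tagger uses context and word-shape features to do better on unseen words, but the same dependence on the training domain applies.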

All the other taggers have been trained on part-of-speech tagged NLTK corpora using train_tagger.py from nltk-trainer. These NLTK taggers cover the following languages:

  • Bangla
  • Catalan
  • Chinese
  • Dutch
  • English
  • Hindi
  • Marathi
  • Polish
  • Portuguese
  • Spanish
  • Telugu

Trained Phrase Chunkers and Named Entity Recognizers

The default chunker is a classifier-based chunker trained on the ACE corpus. This means it recognizes noun phrases and named entities, such as locations, names, organizations, and more. Because the chunker relies on the part-of-speech tags it is given, it will only work well with an English tagger, and will work best with the default tagger.
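Chunkers of this kind are commonly described as labeling each tagged token with an IOB tag (B- begins a chunk, I- continues it, O is outside any chunk), after which the labels are grouped into phrases. The sketch below shows only that grouping step in plain Python. The function name iob_to_chunks, the sample sentence, and its labels are assumptions for illustration, not output from this demo.

```python
def iob_to_chunks(iob_tokens):
    """Group (word, pos, iob) triples into (label, [words]) chunks."""
    chunks, current_label, current_words = [], None, []
    for word, _pos, iob in iob_tokens:
        starts_new = iob.startswith("B-") or (
            iob.startswith("I-") and iob[2:] != current_label
        )
        if starts_new:
            # Close any open chunk, then start a new one with this word.
            if current_words:
                chunks.append((current_label, current_words))
            current_label, current_words = iob[2:], [word]
        elif iob.startswith("I-"):
            # Continue the chunk that is currently open.
            current_words.append(word)
        else:
            # "O": the token is outside any chunk, so close the open one.
            if current_words:
                chunks.append((current_label, current_words))
            current_label, current_words = None, []
    if current_words:
        chunks.append((current_label, current_words))
    return chunks

# Hypothetical tagged and IOB-labeled sentence.
sentence = [
    ("Barack", "NNP", "B-PERSON"),
    ("Obama", "NNP", "I-PERSON"),
    ("visited", "VBD", "O"),
    ("New", "NNP", "B-GPE"),
    ("York", "NNP", "I-GPE"),
    (".", ".", "O"),
]

entities = iob_to_chunks(sentence)
```

Here entities holds one PERSON chunk and one GPE chunk. Because the part-of-speech tag is an input to the chunker, a tagger for the wrong language or with a different tagset would feed it tags it never saw in training, which is why pairing matters.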

All other chunkers have been trained on chunked or parsed NLTK corpora using train_chunker.py from nltk-trainer. These NLTK chunkers cover the following languages:

  • Dutch
  • English
  • Portuguese
  • Spanish

Natural Language Tagging and Phrase Extraction APIs

If you'd like to use this through an API, please see the API docs for Tagging & Chunking and Phrase Extraction & Named Entity Recognition. For higher limits and premium API access, sign up for the Mashape Text-Processing API.

Natural Language Processing Services

  • Want to download/purchase any of these models?
  • Need a custom model, trained on a public or custom corpus?
  • Want help creating or bootstrapping a custom corpus?

If you answered yes to any of these questions, please fill out this Natural Language Processing Services Survey.

