Part-of-Speech Tagging and Chunking

To tag and chunk text, send an HTTP POST request to http://text-processing.com/api/tag/ with form-encoded data containing the text you want to tag. The response is a JSON object whose text attribute contains the tagged text. Here are some examples using curl:

$ curl -d "text=hello world" http://text-processing.com/api/tag/
{
        "text": "(S hello/NN world/NN)"
}

IOB tags output:

$ curl -d "text=hello world&output=iob" http://text-processing.com/api/tag/
{
        "text": "hello NN O\nworld NN O"
}
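
The IOB output is easier to work with once it is split into fields. The following is a minimal Python sketch, assuming every line holds exactly three space-separated fields (word, part-of-speech tag, IOB tag) as in the sample response above; parse_iob is a hypothetical helper name, not part of the API.

```python
import json

def parse_iob(tagged_text):
    """Split IOB output into (word, pos_tag, iob_tag) tuples."""
    rows = []
    for line in tagged_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split from the right so the last two fields are always the tags.
        word, pos_tag, iob_tag = line.rsplit(" ", 2)
        rows.append((word, pos_tag, iob_tag))
    return rows

# Response body copied from the curl example above.
response_body = '{"text": "hello NN O\\nworld NN O"}'
rows = parse_iob(json.loads(response_body)["text"])
print(rows)
```

A line with fewer than three fields raises ValueError here, which is probably the safest behavior for a format that has only been inferred from one sample.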

A named entity recognition example:

$ curl -d "text=California is nice" http://text-processing.com/api/tag/
{
        "text": "(S (GPE California/NNP) is/VBZ nice/JJ)"
}
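
To pull chunks or named entities out of the s-expression, the text attribute can be parsed into a small tree. This is a sketch using only the Python standard library; parse_sexpr and chunks are hypothetical helper names, and the parser assumes well-formed input in which words never contain parentheses or whitespace.

```python
import re

def parse_sexpr(text):
    """Parse '(S (GPE California/NNP) is/VBZ)' into nested tuples.

    Sub-trees become (label, [children]); leaves become (word, tag).
    """
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def parse_node():
        nonlocal pos
        pos += 1                    # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse_node())
            else:
                # Split on the last "/" so words containing "/" survive.
                word, _, tag = tokens[pos].rpartition("/")
                children.append((word, tag))
                pos += 1
        pos += 1                    # skip ")"
        return (label, children)

    return parse_node()

def chunks(tree):
    """Return (label, [words]) for each sub-tree directly under the root."""
    _, children = tree
    return [
        (child[0], [w for w, t in child[1] if isinstance(t, str)])
        for child in children
        if isinstance(child[1], list)
    ]

tree = parse_sexpr("(S (GPE California/NNP) is/VBZ nice/JJ)")
print(chunks(tree))
```

If NLTK is already installed, nltk.Tree.fromstring is an alternative for reading this bracketed format.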

You can try out the tagging and chunking demo to get a feel for the results, but it does not show all the output formats available in the API.
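
The same requests can be made from a program. Below is a minimal Python sketch using only the standard library; build_request and tag are hypothetical helper names, and the 10-second timeout is an arbitrary choice rather than anything the API specifies.

```python
import json
import urllib.parse
import urllib.request

API_URL = "http://text-processing.com/api/tag/"

def build_request(text, language=None, output=None):
    """Build the form-encoded POST request without sending it."""
    fields = {"text": text}
    if language is not None:
        fields["language"] = language
    if output is not None:
        fields["output"] = output
    data = urllib.parse.urlencode(fields).encode("utf-8")
    return urllib.request.Request(API_URL, data=data, method="POST")

def tag(text, language=None, output=None, timeout=10):
    """Send the request and return the text attribute of the JSON response."""
    request = build_request(text, language, output)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["text"]

# Inspect the request body without touching the network.
request = build_request("hello world", output="iob")
print(request.data)
```

Calling tag("hello world") performs a live request; going by the first curl example, it should return the s-expression for the two words.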

Parameters

text:

Required - the text you want to tag. It must not exceed 2,000 characters.

language:

Optional; defaults to english, which also uses a phrase chunker. Three languages besides english support phrase chunking and/or named entity recognition:

  • dutch
  • portuguese
  • spanish

For these 4 languages, the default output is sexpr. There are 7 other language options that do only part-of-speech tagging, and their default output is tagged:

  • bangla
  • catalan
  • chinese
  • hindi
  • marathi
  • polish
  • telugu

output:

Optional, the default depends on your language choice. The tagged (and chunked) text can be returned in one of the following output formats:

  • tagged produces part-of-speech tagged text, ignoring any phrase chunks or named entities
  • sexpr produces s-expressions to represent parse trees that may include sub-trees for phrases and/or named entities
  • iob produces IOB tags for each word, but only works if the language supports phrase chunking
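
The parameter rules above can be checked on the client before a request is sent. The sketch below is in Python; the constant and function names are hypothetical, and two points are assumptions rather than documented behavior: that the length limit counts characters as Python's len does, and that an unrecognized output value is rejected.

```python
CHUNKING_LANGUAGES = {"english", "dutch", "portuguese", "spanish"}
TAGGING_ONLY_LANGUAGES = {
    "bangla", "catalan", "chinese", "hindi", "marathi", "polish", "telugu",
}
OUTPUT_FORMATS = {"tagged", "sexpr", "iob"}
MAX_TEXT_LENGTH = 2000

def default_output(language="english"):
    """Return the documented default output format for a language."""
    if language in CHUNKING_LANGUAGES:
        return "sexpr"
    if language in TAGGING_ONLY_LANGUAGES:
        return "tagged"
    raise ValueError(f"unsupported language: {language!r}")

def check_params(text, language="english", output=None):
    """Return a list of problems; an empty list means the request looks valid."""
    problems = []
    if not text:
        problems.append("text is required")
    elif len(text) > MAX_TEXT_LENGTH:
        problems.append("text exceeds 2,000 characters")
    if language not in CHUNKING_LANGUAGES | TAGGING_ONLY_LANGUAGES:
        problems.append("unknown language")
    elif output == "iob" and language not in CHUNKING_LANGUAGES:
        problems.append("iob output needs a language with phrase chunking")
    # Assumption: the API rejects output values outside the three listed.
    if output is not None and output not in OUTPUT_FORMATS:
        problems.append("unknown output format")
    return problems

print(check_params("hello world", language="hindi", output="iob"))
```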

Return Value

On success, a 200 OK response will be returned containing a JSON object that looks like this:

{
        "text": "tagged text"
}

Errors

A 400 Bad Request response will be returned under the following conditions:

  • output=iob but language does not support phrase chunking or NER
  • no value for text is provided
  • text exceeds 2,000 characters
  • an incorrect language is specified

A 503 Throttled response will be returned if you exceed the daily request limit. Sign up for the Mashape Text-Processing API to get a plan with a higher limit.
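
One way a client might map these status codes onto distinct outcomes is sketched below in Python. The exception classes and interpret_response are hypothetical names, and the sketch makes no assumption about the format of the error body; it simply passes it along.

```python
import json

class BadRequest(Exception):
    """The API rejected the parameters (HTTP 400)."""

class Throttled(Exception):
    """The daily request limit was exceeded (HTTP 503)."""

def interpret_response(status, body):
    """Return the tagged text for a 200 response, otherwise raise."""
    if status == 200:
        return json.loads(body)["text"]
    if status == 400:
        raise BadRequest(body)
    if status == 503:
        raise Throttled(body)
    raise RuntimeError(f"unexpected HTTP status {status}")

print(interpret_response(200, '{"text": "(S hello/NN world/NN)"}'))
```

With urllib.request, non-2xx responses surface as urllib.error.HTTPError, whose code attribute carries the status and whose read() method returns the body, so both can be handed to a function like this one.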