Part-of-Speech Tagging and Chunking

To tag and chunk text, send an HTTP POST to http://text-processing.com/api/tag/ with form-encoded data containing the text you want to tag. You’ll get back a JSON object whose text attribute contains the tagged text. Here are some examples using curl:

$ curl -d "text=hello world" http://text-processing.com/api/tag/
{
        "text": "(S hello/NN world/NN)"
}

IOB tag output:

$ curl -d "text=hello world&output=iob" http://text-processing.com/api/tag/
{
        "text": "hello NN O\nworld NN O"
}

A named entity recognition example:

$ curl -d "text=California is nice" http://text-processing.com/api/tag/
{
        "text": "(S (GPE California/NNP) is/VBZ nice/JJ)"
}

You can try out the tagging and chunking demo to get a feel for the results, but it does not show all the output formats available in the API.
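
The curl calls above can also be made from a script. The following is a minimal sketch using only the Python standard library; the function names build_tag_request and tag_text are hypothetical, and the 10 second timeout is an arbitrary choice rather than anything the API requires.

```python
import json
import urllib.parse
import urllib.request

API_URL = "http://text-processing.com/api/tag/"


def build_tag_request(text, language=None, output=None, tagger=None):
    """Build (but do not send) a form-encoded POST request for the tag API."""
    fields = {"text": text}
    # Optional parameters are only sent when given, so the API defaults apply.
    if language is not None:
        fields["language"] = language
    if output is not None:
        fields["output"] = output
    if tagger is not None:
        fields["tagger"] = tagger
    body = urllib.parse.urlencode(fields).encode("utf-8")
    # Passing data makes urllib send the body as form-encoded POST data.
    return urllib.request.Request(API_URL, data=body, method="POST")


def tag_text(text, **params):
    """Send the request and return the "text" attribute of the JSON response."""
    request = build_tag_request(text, **params)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))["text"]
```

Calling tag_text("hello world", output="iob") would correspond to the second curl example. Building the request separately from sending it makes the encoded body easy to inspect without touching the network.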

Parameters

text:

Required - the text you want to tag. It must not exceed 2,000 characters.

language:

Optional; defaults to english, which also uses a phrase chunker. Three other languages support phrase chunking and/or named entity recognition:

  • dutch

  • portuguese

  • spanish

For these four languages, the default output is sexpr. Other language options do only part-of-speech tagging, and their default output is tagged:

  • french

  • german

  • greek

  • italian
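
The relationship between language and default output described above can be written down as a small lookup. This is only an illustrative sketch built from the two lists in this section; the helper name default_output is hypothetical and the API itself applies these defaults server-side.

```python
# Languages listed above as supporting phrase chunking and/or NER.
CHUNKING_LANGUAGES = {"english", "dutch", "portuguese", "spanish"}
# Languages listed above as doing part-of-speech tagging only.
TAGGING_ONLY_LANGUAGES = {"french", "german", "greek", "italian"}


def default_output(language="english"):
    """Return the output format the docs describe as the default for a language."""
    if language in CHUNKING_LANGUAGES:
        return "sexpr"
    if language in TAGGING_ONLY_LANGUAGES:
        return "tagged"
    raise ValueError(f"language not listed in the docs: {language!r}")
```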

tagger:

Optional. If you also give a language value, the tagger must be compatible with it, as outlined below:

  • default: english tagger/chunker

  • binary: english tagger/chunker

  • ieer: english tagger/chunker

  • timit: english tagger/chunker

  • conll2002_ned: dutch tagger/chunker

  • conll2002_esp: spanish tagger/chunker

  • mac_morpho: portuguese tagger/chunker

  • spacy/en_core_web_sm: spacy english tagger

  • spacy/de_core_news_sm: spacy german tagger

  • spacy/fr_core_news_sm: spacy french tagger

  • spacy/es_core_news_sm: spacy spanish tagger

  • spacy/pt_core_news_sm: spacy portuguese tagger

  • spacy/it_core_news_sm: spacy italian tagger

  • spacy/nl_core_news_sm: spacy dutch tagger

  • spacy/el_core_news_sm: spacy greek tagger
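
A client may want to check tagger and language compatibility before sending a request. The sketch below copies the list above into a dictionary; the names TAGGER_LANGUAGES and is_compatible are hypothetical, and the check is an assumption about what "compatible" means here (the tagger's language equals the language value).

```python
# Tagger value -> language, transcribed from the list above.
TAGGER_LANGUAGES = {
    "default": "english",
    "binary": "english",
    "ieer": "english",
    "timit": "english",
    "conll2002_ned": "dutch",
    "conll2002_esp": "spanish",
    "mac_morpho": "portuguese",
    "spacy/en_core_web_sm": "english",
    "spacy/de_core_news_sm": "german",
    "spacy/fr_core_news_sm": "french",
    "spacy/es_core_news_sm": "spanish",
    "spacy/pt_core_news_sm": "portuguese",
    "spacy/it_core_news_sm": "italian",
    "spacy/nl_core_news_sm": "dutch",
    "spacy/el_core_news_sm": "greek",
}


def is_compatible(tagger, language=None):
    """Return True if the tagger can be combined with the given language value."""
    if tagger not in TAGGER_LANGUAGES:
        raise ValueError(f"tagger not listed in the docs: {tagger!r}")
    # With no language value there is nothing to conflict with.
    return language is None or TAGGER_LANGUAGES[tagger] == language
```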

output:

Optional, the default depends on your language choice. The tagged (and chunked) text can be returned in one of the following output formats:

  • tagged produces part-of-speech tagged text, ignoring any phrase chunks or named entities

  • sexpr produces s-expressions to represent parse trees that may include sub-trees for phrases and/or named entities

  • iob produces IOB tags for each word, but only works if the language supports phrase chunking
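
The iob format is the easiest of the three to post-process. As a rough sketch, assuming every line has the shape "word POS-tag IOB-tag" separated by single spaces, as in the curl example earlier, the text attribute can be split into triples like this (parse_iob is a hypothetical helper name):

```python
def parse_iob(text):
    """Split IOB output into (word, pos_tag, iob_tag) triples, one per line."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split from the right so the last two fields are always the tags.
        word, pos_tag, iob_tag = line.rsplit(" ", 2)
        rows.append((word, pos_tag, iob_tag))
    return rows
```

A line with fewer than three fields raises ValueError, which seems preferable to silently returning partial rows.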

Return Value

On success, a 200 OK response will be returned containing a JSON object that looks like this:

{
        "text": "tagged text"
}
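
When the output format is sexpr, the text attribute holds a bracketed tree such as the California example earlier. The following sketch decodes a response body and turns that string into nested Python tuples. It assumes the shape seen in the examples (parenthesized sub-trees with a label, and word/TAG leaves) and that words never contain parentheses; parse_sexpr is a hypothetical helper, not part of the API.

```python
import json
import re

# A token is an opening paren, a closing paren, or a run of other characters.
TOKEN = re.compile(r"\(|\)|[^\s()]+")


def parse_sexpr(text):
    """Parse "(S (GPE California/NNP) is/VBZ)" into (label, [children]) tuples.

    Leaves become (word, tag) pairs; sub-trees become (label, list) pairs.
    """
    tokens = TOKEN.findall(text)
    pos = 0

    def parse_node():
        nonlocal pos
        pos += 1  # consume "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse_node())
            else:
                # Split on the last slash so words containing "/" stay intact.
                word, _, tag = tokens[pos].rpartition("/")
                children.append((word, tag))
                pos += 1
        pos += 1  # consume ")"
        return (label, children)

    return parse_node()


def tree_from_response(body):
    """Decode a JSON response body and parse its "text" attribute."""
    return parse_sexpr(json.loads(body)["text"])
```

Malformed input (for example unbalanced parentheses) raises IndexError in this sketch; a production parser would want a clearer error.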

Errors

A 400 Bad Request response will be returned under the following conditions:

  • output=iob but language does not support phrase chunking or NER

  • no value for text is provided

  • text exceeds 2,000 characters

  • an incorrect language is specified

A 503 Throttled response will be returned if you exceed the daily request limit. Sign up for the Text-Processing RapidAPI to get a higher limit plan.
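
Most of the 400 conditions can be caught before a request is sent, which also avoids spending the daily request quota on calls that will fail. The sketch below mirrors the conditions listed above; the helper names are hypothetical, the language sets are copied from the Parameters section, and it assumes the 2,000 character limit matches Python's len() on the string, which the API does not confirm.

```python
MAX_TEXT_LENGTH = 2000
CHUNKING_LANGUAGES = {"english", "dutch", "portuguese", "spanish"}
KNOWN_LANGUAGES = CHUNKING_LANGUAGES | {"french", "german", "greek", "italian"}


def request_problems(text, language="english", output=None):
    """Return the documented 400 conditions that a request would trigger."""
    problems = []
    if not text:
        problems.append("no value for text is provided")
    elif len(text) > MAX_TEXT_LENGTH:
        problems.append("text exceeds 2,000 characters")
    if language not in KNOWN_LANGUAGES:
        problems.append("an incorrect language is specified")
    elif output == "iob" and language not in CHUNKING_LANGUAGES:
        problems.append("output=iob but language does not support phrase chunking or NER")
    return problems


def describe_status(status):
    """Map the status codes documented above to a short label."""
    labels = {200: "ok", 400: "bad request", 503: "throttled"}
    return labels.get(status, "unexpected")
```

An empty list from request_problems does not guarantee success, since the server remains the authority on what it accepts.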