Part-of-Speech Tagging and Chunking

To tag and chunk text, make an HTTP POST request to http://text-processing.com/api/tag/ with form-encoded data containing the text you want to tag. You’ll get back a JSON object whose text attribute contains the tagged text. Here are some examples using curl:

$ curl -d "text=hello world" http://text-processing.com/api/tag/
{
        "text": "(S hello/NN world/NN)"
}

IOB tags output:

$ curl -d "text=hello world&output=iob" http://text-processing.com/api/tag/
{
        "text": "hello NN O\nworld NN O"
}
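The iob output above puts one token per line, with the word, its part-of-speech tag, and its chunk tag separated by spaces. A minimal Python sketch for splitting it into records follows. The names IobToken and parse_iob are hypothetical, and the sketch assumes every line has exactly three whitespace-separated fields, as in the example response.

```python
from typing import List, NamedTuple


class IobToken(NamedTuple):
    """One line of iob output (hypothetical helper type, not part of the API)."""
    word: str
    pos: str    # part-of-speech tag, e.g. "NN"
    chunk: str  # IOB chunk tag, e.g. "O"


def parse_iob(text: str) -> List[IobToken]:
    """Split the "text" attribute of an iob response into one record per line."""
    tokens = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Assumption: exactly three whitespace-separated fields per line.
        word, pos, chunk = line.split()
        tokens.append(IobToken(word, pos, chunk))
    return tokens


# The response text from the curl example above.
for token in parse_iob("hello NN O\nworld NN O"):
    print(token.word, token.pos, token.chunk)
```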

A named entity recognition example:

$ curl -d "text=California is nice" http://text-processing.com/api/tag/
{
        "text": "(S (GPE California/NNP) is/VBZ nice/JJ)"
}
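The s-expression output nests phrases and named entities in parentheses, with each word written as word/TAG. A rough Python sketch of reading it into nested tuples follows. The functions parse_sexpr, leaves, and chunks are hypothetical helpers, and the sketch assumes that words never contain parentheses or whitespace.

```python
import re


def parse_sexpr(text):
    """Parse sexpr output into (label, [children]) subtrees and (word, tag) leaves."""
    # Tokens are "(", ")", or any run of characters without spaces or parentheses.
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def parse_node():
        nonlocal pos
        if tokens[pos] == "(":
            label = tokens[pos + 1]
            pos += 2
            children = []
            while tokens[pos] != ")":
                children.append(parse_node())
            pos += 1  # consume the closing ")"
            return (label, children)
        # Leaf of the form word/TAG; split on the last slash so a word
        # that itself contains "/" keeps its text.
        word, _, tag = tokens[pos].rpartition("/")
        pos += 1
        return (word, tag)

    return parse_node()


def leaves(node):
    """Return every (word, tag) leaf under a node, left to right."""
    if isinstance(node[1], list):  # subtree: (label, [children])
        return [leaf for child in node[1] for leaf in leaves(child)]
    return [node]


def chunks(tree):
    """List (label, phrase) for each subtree directly under the root."""
    return [
        (child[0], " ".join(word for word, _ in leaves(child)))
        for child in tree[1]
        if isinstance(child[1], list)
    ]


tree = parse_sexpr("(S (GPE California/NNP) is/VBZ nice/JJ)")
print(chunks(tree))  # [('GPE', 'California')]
```

If you already use NLTK, its Tree.fromstring method reads the same bracketed format, leaving each word/TAG leaf as a plain string.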

You can try out the tagging and chunking demo to get a feel for the results, but it does not show all the output formats available in the API.
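The curl calls above can be approximated in Python with the standard library alone. This is a sketch rather than an official client: build_request and tag are hypothetical names, and only the URL, the form fields, and the text attribute of the response come from this page.

```python
import json
import urllib.parse
import urllib.request

API_URL = "http://text-processing.com/api/tag/"


def build_request(text, **params):
    """Build the form-encoded POST request without sending it.

    Extra keyword arguments (for example output="iob") become form fields.
    """
    fields = {"text": text, **params}
    body = urllib.parse.urlencode(fields).encode("utf-8")
    # When a request carries data, urllib sends it with a
    # form-encoded Content-Type header by default.
    return urllib.request.Request(API_URL, data=body, method="POST")


def tag(text, **params):
    """Send the request and return the "text" attribute of the JSON response."""
    with urllib.request.urlopen(build_request(text, **params)) as response:
        return json.load(response)["text"]


# Inspect the request body without touching the network.
print(build_request("hello world", output="iob").data)
```

Calling tag("hello world", output="iob") performs the same request as the second curl example and counts against your daily request limit.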

Parameters

text:

Required - the text you want to tag. It must not exceed 2,000 characters.

language:

Optional. Defaults to english, which also uses a phrase chunker. Three other languages support phrase chunking and/or named entity recognition:

dutch
portuguese
spanish

For these 4 languages, the default output is sexpr. There are other language options that do only part-of-speech tagging, and their default output is tagged:

french
german
greek
italian
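The language lists above determine the default output format. A small Python lookup makes the rule explicit; the function default_output and the two set names are hypothetical, while the language values and formats are the ones listed here.

```python
# Languages with phrase chunking and/or named entity recognition.
CHUNKING_LANGUAGES = {"english", "dutch", "portuguese", "spanish"}
# Languages that only support part-of-speech tagging.
TAGGING_ONLY_LANGUAGES = {"french", "german", "greek", "italian"}


def default_output(language="english"):
    """Return the output format the API uses when output is not given."""
    if language in CHUNKING_LANGUAGES:
        return "sexpr"
    if language in TAGGING_ONLY_LANGUAGES:
        return "tagged"
    raise ValueError(f"unsupported language: {language!r}")


print(default_output("spanish"))  # sexpr
print(default_output("german"))   # tagged
```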

tagger:

Optional. If you also give a language value, the tagger must be compatible with it, as outlined below:

default - english tagger/chunker
binary - english tagger/chunker
ieer - english tagger/chunker
timit - english tagger/chunker
conll2002_ned - dutch tagger/chunker
conll2002_esp - spanish tagger/chunker
mac_morpho - portuguese tagger/chunker
spacy/en_core_web_sm - spacy english tagger
spacy/de_core_news_sm - spacy german tagger
spacy/fr_core_news_sm - spacy french tagger
spacy/es_core_news_sm - spacy spanish tagger
spacy/pt_core_news_sm - spacy portuguese tagger
spacy/it_core_news_sm - spacy italian tagger
spacy/nl_core_news_sm - spacy dutch tagger
spacy/el_core_news_sm - spacy greek tagger
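To catch an incompatible tagger and language pair before sending a request, the list above can be encoded as a lookup table. This Python sketch uses the hypothetical names TAGGER_LANGUAGES and is_compatible; the pairs themselves are copied from the list.

```python
# Tagger name -> language it supports, copied from the list above.
TAGGER_LANGUAGES = {
    "default": "english",
    "binary": "english",
    "ieer": "english",
    "timit": "english",
    "conll2002_ned": "dutch",
    "conll2002_esp": "spanish",
    "mac_morpho": "portuguese",
    "spacy/en_core_web_sm": "english",
    "spacy/de_core_news_sm": "german",
    "spacy/fr_core_news_sm": "french",
    "spacy/es_core_news_sm": "spanish",
    "spacy/pt_core_news_sm": "portuguese",
    "spacy/it_core_news_sm": "italian",
    "spacy/nl_core_news_sm": "dutch",
    "spacy/el_core_news_sm": "greek",
}


def is_compatible(tagger, language=None):
    """Check a tagger against an optional language value."""
    if tagger not in TAGGER_LANGUAGES:
        raise ValueError(f"unknown tagger: {tagger!r}")
    # With no language given there is nothing to conflict with.
    return language is None or TAGGER_LANGUAGES[tagger] == language


print(is_compatible("conll2002_ned", "dutch"))  # True
print(is_compatible("mac_morpho", "spanish"))   # False
```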

output:

Optional, the default depends on your language choice. The tagged (and chunked) text can be returned in one of the following output formats:

tagged - produces part-of-speech tagged text, ignoring any phrase chunks or named entities
sexpr - produces s-expressions to represent parse trees that may include sub-trees for phrases and/or named entities
iob - produces IOB tags for each word, but only works if the language supports phrase chunking or named entity recognition

Return Value

On success, a 200 OK response will be returned containing a JSON object that looks like this:

{
        "text": "tagged text"
}

Errors

A 400 Bad Request response will be returned under the following conditions:

output=iob but language does not support phrase chunking or NER
no value for text is provided
text exceeds 2,000 characters
an incorrect language is specified

A 503 Throttled response will be returned if you exceed the daily request limit. Sign up for the Text-Processing RapidAPI to get a plan with a higher limit.
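Because every request counts toward the daily limit, it can help to check the 400 conditions locally before calling the API. The Python sketch below mirrors the four conditions listed above. The function request_problems is a hypothetical helper, and it assumes the 2,000 character limit is measured with len() on the text; the server may count differently.

```python
MAX_TEXT_LENGTH = 2000  # limit stated for the text parameter

# Languages that support phrase chunking and/or named entity recognition.
CHUNKING_LANGUAGES = {"english", "dutch", "portuguese", "spanish"}
ALL_LANGUAGES = CHUNKING_LANGUAGES | {"french", "german", "greek", "italian"}


def request_problems(text, language="english", output=None):
    """Return the documented 400 conditions that a request would trigger."""
    problems = []
    if not text:
        problems.append("no value for text is provided")
    elif len(text) > MAX_TEXT_LENGTH:
        problems.append("text exceeds 2,000 characters")
    if language not in ALL_LANGUAGES:
        problems.append("an incorrect language is specified")
    elif output == "iob" and language not in CHUNKING_LANGUAGES:
        problems.append(
            "output=iob but language does not support phrase chunking or NER"
        )
    return problems


print(request_problems("bonjour le monde", language="french", output="iob"))
```

An empty list means none of the documented conditions apply; the API can still return 503 if you are over the daily limit.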