Part-of-Speech Tagging and Chunking
To tag and chunk text, send an HTTP POST to http://text-processing.com/api/tag/
with form-encoded data containing the text
you want to tag. You’ll get back a JSON object response whose text
attribute contains the tagged text. Here are some examples of how to do it using curl:
$ curl -d "text=hello world" http://text-processing.com/api/tag/
{
"text": "(S hello/NN world/NN)"
}
IOB tags output:
$ curl -d "text=hello world&output=iob" http://text-processing.com/api/tag/
{
"text": "hello NN O\nworld NN O"
}
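The IOB output is plain text with one token per line. As a minimal sketch in Python, assuming each line holds exactly three whitespace-separated fields (word, part-of-speech tag, IOB tag) as in the response above, the text attribute could be split into tuples like this. The name parse_iob is hypothetical and not part of the API.

```python
def parse_iob(iob_text):
    """Split IOB output into (word, pos_tag, iob_tag) tuples.

    Assumption: every non-empty line has three whitespace-separated
    fields, as in the "hello NN O" example. Other layouts raise ValueError.
    """
    rows = []
    for line in iob_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split from the right so the last two fields are always the tags.
        word, pos_tag, iob_tag = line.rsplit(None, 2)
        rows.append((word, pos_tag, iob_tag))
    return rows
```

Applied to the text attribute from the example above, this would yield one tuple each for hello and world.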
A named entity recognition example:
$ curl -d "text=California is nice" http://text-processing.com/api/tag/
{
"text": "(S (GPE California/NNP) is/VBZ nice/JJ)"
}
You can try out the tagging and chunking demo to get a feel for the results, but it does not show all the output formats available in the API.
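The same request can be made from code. The following is a sketch in Python using only the standard library, not an official client: the helper names build_tag_request and tag are hypothetical, and the 10 second timeout is an arbitrary choice. It assumes the endpoint accepts exactly the form-encoded fields shown in the curl examples.

```python
import json
import urllib.parse
import urllib.request

API_URL = "http://text-processing.com/api/tag/"


def build_tag_request(text, **params):
    """Build (but do not send) a form-encoded POST request.

    Extra keyword arguments such as output="iob" are sent as
    additional form fields alongside text.
    """
    fields = {"text": text, **params}
    data = urllib.parse.urlencode(fields).encode("utf-8")
    return urllib.request.Request(API_URL, data=data, method="POST")


def tag(text, **params):
    """Send the request and return the text attribute of the JSON response."""
    request = build_tag_request(text, **params)
    with urllib.request.urlopen(request, timeout=10) as response:
        body = response.read().decode("utf-8")
    return json.loads(body)["text"]
```

Calling tag("hello world", output="iob") corresponds to the second curl example above. Building the request separately from sending it makes it possible to inspect the encoded form data without touching the network.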
Parameters
- text:
Required - the text you want to tag. It must not exceed 2,000 characters.
- language:
Optional, defaults to english, which also uses a phrase chunker. There are 3 languages other than english that support phrase chunking and/or named entity recognition:
  - dutch
  - portuguese
  - spanish
For these 4 languages, the default output is sexpr. There are other language options that do only part-of-speech tagging, and their default output is tagged:
  - french
  - german
  - greek
  - italian
- tagger:
Optional. If you give a language value, then the tagger must be compatible with it, as outlined below:
  - default: english tagger/chunker
  - binary: english tagger/chunker
  - ieer: english tagger/chunker
  - timit: english tagger/chunker
  - conll2002_ned: dutch tagger/chunker
  - conll2002_esp: spanish tagger/chunker
  - mac_morpho: portuguese tagger/chunker
  - spacy/en_core_web_sm: spacy english tagger
  - spacy/de_core_news_sm: spacy german tagger
  - spacy/fr_core_news_sm: spacy french tagger
  - spacy/es_core_news_sm: spacy spanish tagger
  - spacy/pt_core_news_sm: spacy portuguese tagger
  - spacy/it_core_news_sm: spacy italian tagger
  - spacy/nl_core_news_sm: spacy dutch tagger
  - spacy/el_core_news_sm: spacy greek tagger
- output:
Optional, the default depends on your language choice. The tagged (and chunked) text can be returned in one of the following output formats:
  - tagged: produces part-of-speech tagged text, ignoring any phrase chunks or named entities
  - sexpr: produces s-expressions to represent parse trees that may include sub-trees for phrases and/or named entities
  - iob: produces IOB tags for each word, but only works if the language supports phrase chunking
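The parameter rules above can be checked on the client before sending a request. The sketch below is one possible way to do that in Python; the names default_output and find_problems are hypothetical, and the checks only mirror what this page states. It assumes that "compatible" means the tagger's language equals the language value, and that the 2,000 character limit corresponds to Python's len(). The server remains the authority on what it accepts.

```python
MAX_TEXT_LENGTH = 2000

# Languages that support phrase chunking and/or NER (default output: sexpr).
CHUNKING_LANGUAGES = {"english", "dutch", "portuguese", "spanish"}
# Languages with part-of-speech tagging only (default output: tagged).
TAGGING_ONLY_LANGUAGES = {"french", "german", "greek", "italian"}

# Tagger name -> language, taken from the tagger list above.
TAGGER_LANGUAGE = {
    "default": "english", "binary": "english",
    "ieer": "english", "timit": "english",
    "conll2002_ned": "dutch", "conll2002_esp": "spanish",
    "mac_morpho": "portuguese",
    "spacy/en_core_web_sm": "english", "spacy/de_core_news_sm": "german",
    "spacy/fr_core_news_sm": "french", "spacy/es_core_news_sm": "spanish",
    "spacy/pt_core_news_sm": "portuguese", "spacy/it_core_news_sm": "italian",
    "spacy/nl_core_news_sm": "dutch", "spacy/el_core_news_sm": "greek",
}


def default_output(language="english"):
    """Return the documented default output format for a language."""
    if language in CHUNKING_LANGUAGES:
        return "sexpr"
    if language in TAGGING_ONLY_LANGUAGES:
        return "tagged"
    raise ValueError(f"unknown language: {language!r}")


def find_problems(text, language=None, tagger=None, output=None):
    """Return a list of human-readable problems; empty means none found."""
    problems = []
    if not text:
        problems.append("text is required")
    elif len(text) > MAX_TEXT_LENGTH:
        problems.append("text exceeds 2,000 characters")

    effective_language = language or "english"
    if effective_language not in CHUNKING_LANGUAGES | TAGGING_ONLY_LANGUAGES:
        problems.append(f"unknown language: {effective_language!r}")

    if tagger is not None:
        if tagger not in TAGGER_LANGUAGE:
            problems.append(f"unknown tagger: {tagger!r}")
        elif language is not None and TAGGER_LANGUAGE[tagger] != language:
            # Only checked when a language value is actually given.
            problems.append(f"tagger {tagger!r} does not match {language!r}")

    if output is not None:
        if output not in {"tagged", "sexpr", "iob"}:
            problems.append(f"unknown output format: {output!r}")
        elif output == "iob" and effective_language in TAGGING_ONLY_LANGUAGES:
            problems.append("output=iob needs a language with phrase chunking")
    return problems
```

Returning a list rather than raising on the first problem lets a caller report every issue at once.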
Return Value
On success, a 200 OK response will be returned containing a JSON object that looks like this:
{
"text": "tagged text"
}
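When the output format is sexpr, the text attribute holds a bracketed tree such as "(S (GPE California/NNP) is/VBZ nice/JJ)". The following Python sketch turns such a response body into nested data. The names parse_sexpr and parse_response are hypothetical, and the parser assumes well-formed input in which words and tags contain no parentheses or whitespace. It is an illustration, not the parser the service itself uses.

```python
import json
import re

# A token is an opening paren, a closing paren, or a run of other characters.
TOKEN_RE = re.compile(r"\(|\)|[^\s()]+")


def parse_sexpr(sexpr):
    """Parse a bracketed tree into {"label": ..., "children": [...]}.

    Leaves become (word, tag) tuples; sub-trees become nested dicts.
    Unbalanced input raises IndexError, since no recovery is attempted.
    """
    tokens = TOKEN_RE.findall(sexpr)
    if not tokens or tokens[0] != "(":
        raise ValueError("expected an s-expression starting with '('")
    pos = 0

    def parse_node():
        nonlocal pos
        pos += 1                      # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse_node())
            else:
                # Split on the last "/" so words containing "/" survive.
                word, sep, tag = tokens[pos].rpartition("/")
                if not sep:           # no tag present on this token
                    word, tag = tokens[pos], None
                children.append((word, tag))
                pos += 1
        pos += 1                      # skip ")"
        return {"label": label, "children": children}

    return parse_node()


def parse_response(body):
    """Decode the JSON response body and parse its text attribute."""
    return parse_sexpr(json.loads(body)["text"])
```

For the California example, the root node has the label S, and its first child is a GPE sub-tree containing the single leaf ("California", "NNP").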
Errors
A 400 Bad Request response will be returned under the following conditions:
- output=iob but language does not support phrase chunking or NER
- no value for text is provided
- text exceeds 2,000 characters
- an incorrect language is specified
A 503 Throttled response will be returned if you exceed the daily request limit. Sign up for the Text-Processing RapidAPI to get a plan with a higher limit.
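A client can map these status codes to readable messages. The sketch below assumes the request is made with Python's urllib, which raises urllib.error.HTTPError for 4xx and 5xx responses. The names describe_status and call_with_error_handling are hypothetical, and send stands for any zero-argument callable that performs the request.

```python
import urllib.error


def describe_status(code):
    """Translate a status code into a message based on the rules above."""
    if code == 200:
        return "success: the response body is a JSON object with a text attribute"
    if code == 400:
        return ("bad request: text is missing or longer than 2,000 characters, "
                "language is incorrect, or output=iob was used with a language "
                "that does not support phrase chunking or NER")
    if code == 503:
        return "throttled: the daily request limit was exceeded"
    return f"unexpected HTTP status {code}"


def call_with_error_handling(send):
    """Run send() and convert HTTP errors into RuntimeError with a clear message.

    send is a hypothetical zero-argument callable that performs the request.
    """
    try:
        return send()
    except urllib.error.HTTPError as err:
        raise RuntimeError(describe_status(err.code)) from err
```

Since a 503 here signals a daily limit rather than a brief outage, retrying immediately is unlikely to help. Surfacing the message to the caller is probably the safer default.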