Phrase Extraction & Named Entity Recognition

To extract phrases & named entities from text, do an HTTP POST to http://text-processing.com/api/phrases/ with form-encoded data containing the text you want to analyze. You’ll get back a JSON object containing lists of phrases and/or named entities.

Here’s an example of how to do it using curl:

$ curl -d "text=California is nice" http://text-processing.com/api/phrases/
{
        "NP": ["California"],
        "GPE": ["California"],
        "VP": ["is"],
        "LOCATION": ["California"]
}

You can try out the tagging and chunking demo to get a feel for the results and the kinds of phrases that can be extracted. English phrase extraction combines the results from four different phrase & named entity chunkers: the default named entity chunker, a treebank-trained noun phrase chunker, a conll2000-trained phrase chunker, and an ieer-trained named entity chunker.

An example for the Spanish language is:

$ curl -d "language=spanish&text=San Francisco, California" http://text-processing.com/api/phrases/
{
        "LOC": ["San Francisco"]
}
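
The same POST can be made from Python. The following is a minimal sketch using only the standard library; the names build_request and extract_phrases are hypothetical helpers chosen for illustration and are not part of the API.

```python
import json
import urllib.parse
import urllib.request

API_URL = "http://text-processing.com/api/phrases/"


def build_request(text, language="english"):
    """Build a form-encoded POST request for the phrases endpoint.

    Hypothetical helper: only the URL and the `text` and `language`
    field names come from the API description.
    """
    body = urllib.parse.urlencode({"text": text, "language": language})
    return urllib.request.Request(API_URL, data=body.encode("utf-8"), method="POST")


def extract_phrases(text, language="english", timeout=10):
    """Send the request and decode the JSON object in the response body."""
    request = build_request(text, language)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling extract_phrases("San Francisco, California", language="spanish") should then return the parsed JSON object as a Python dict, assuming the service is reachable.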

Parameters

text:

Required - the text you want to process. It must not exceed 1,000 characters.

language:

Optional, defaults to english. There are 3 other language choices for phrase extraction:

  • dutch uses a tagger & named entity chunker trained on the conll2002 corpus
  • portuguese uses a tagger & phrase chunker trained on the floresta corpus
  • spanish uses a tagger & named entity chunker trained on the conll2002 corpus
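
Checking the parameters before sending a request avoids a wasted call. This is a rough sketch: check_params is a hypothetical helper, and it assumes the 1,000 character limit corresponds to the length of a Python string, which may differ from how the server counts.

```python
MAX_TEXT_CHARS = 1000
LANGUAGES = ("english", "dutch", "portuguese", "spanish")


def check_params(text, language="english"):
    """Return the form fields, or raise ValueError for documented bad input.

    Assumption: the limit is compared against len(text); the server may
    count bytes or characters differently.
    """
    if not text:
        raise ValueError("text is required")
    if len(text) > MAX_TEXT_CHARS:
        raise ValueError("text must not exceed %d characters" % MAX_TEXT_CHARS)
    if language not in LANGUAGES:
        raise ValueError("unsupported language: %r" % language)
    return {"text": text, "language": language}
```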

Return Value

On success, a 200 OK response will be returned containing a JSON object that looks like this:

{
        "NP": ["noun phrase"]
}

The object will have a key for each type of phrase or named entity, and the value for that key will be a list of strings, one for each phrase of that type. If no phrases or named entities could be extracted, you’ll get an empty object.
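
Because the same phrase can appear under several keys, it can be handy to invert the object so that each phrase maps to its labels. The sketch below does this for the California response shown earlier; labels_by_phrase is a hypothetical helper, not something the API provides.

```python
import json
from collections import defaultdict


def labels_by_phrase(result):
    """Invert {label: [phrases]} into {phrase: sorted list of labels}."""
    index = defaultdict(set)
    for label, phrases in result.items():
        for phrase in phrases:
            index[phrase].add(label)
    return {phrase: sorted(labels) for phrase, labels in index.items()}


# Response body copied from the curl example for "California is nice".
sample = json.loads(
    '{"NP": ["California"], "GPE": ["California"],'
    ' "VP": ["is"], "LOCATION": ["California"]}'
)
print(labels_by_phrase(sample))
# {'California': ['GPE', 'LOCATION', 'NP'], 'is': ['VP']}
```

An empty object, returned when nothing could be extracted, simply yields an empty dict.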

Errors

A 400 Bad Request response will be returned under the following conditions:

  • no value for text is provided
  • text exceeds 1,000 characters
  • an incorrect language is specified

A 503 Throttled response will be returned if you exceed the daily request limit. Sign up for the Mashape Text-Processing API to get a higher limit plan.
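
A client will usually want to tell these two error cases apart. One possible approach, sketched here with the standard library, is to translate the status codes into exceptions. The names BadRequest, Throttled, raise_for_status and open_checked are hypothetical, and the opener argument exists only so the logic can be exercised without a network call.

```python
import urllib.error
import urllib.request


class BadRequest(Exception):
    """The API rejected the parameters (HTTP 400)."""


class Throttled(Exception):
    """The daily request limit was exceeded (HTTP 503)."""


def raise_for_status(code):
    """Raise for the two documented error codes; do nothing otherwise."""
    if code == 400:
        raise BadRequest("missing text, text too long, or incorrect language")
    if code == 503:
        raise Throttled("daily request limit exceeded")


def open_checked(request, opener=urllib.request.urlopen):
    """Open `request`, converting documented HTTP errors into exceptions."""
    try:
        return opener(request)
    except urllib.error.HTTPError as err:
        raise_for_status(err.code)
        raise  # any other status is re-raised unchanged
```

Catching Throttled separately lets a caller stop sending requests for the day, while BadRequest points to input that needs fixing before a retry.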