
Stemming

To stem text, send an HTTP POST to http://text-processing.com/api/stem/ with form-encoded data containing the text you want to stem. The response is a JSON object whose text attribute contains the stemmed text. Here are some examples using curl:

$ curl -d "text=processing" http://text-processing.com/api/stem/
{
        "text": "process"
}

To use a stemmer other than the default porter stemmer, pass the stemmer parameter. This example uses wordnet:

$ curl -d "text=processing&stemmer=wordnet" http://text-processing.com/api/stem/
{
        "text": "processing"
}

Using the snowball stemmer with spanish:

$ curl -d "text=correr&stemmer=snowball&language=spanish" http://text-processing.com/api/stem/
{
        "text": "corr"
}

Specifying only the language. For portuguese, this selects the snowball stemmer by default:

$ curl -d "text=correr&language=portuguese" http://text-processing.com/api/stem/
{
        "text": "corr"
}
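The curl examples above can be reproduced from Python with only the standard library. This is a minimal sketch, not an official client: the helper names `build_stem_request` and `stem` are hypothetical, and optional parameters are sent only when given so that the API applies its own defaults otherwise.

```python
import json
import urllib.parse
import urllib.request

STEM_URL = "http://text-processing.com/api/stem/"


def build_stem_request(text, stemmer=None, language=None):
    """Build a form-encoded POST request for the stem endpoint (hypothetical helper)."""
    fields = {"text": text}
    if stemmer is not None:
        fields["stemmer"] = stemmer
    if language is not None:
        fields["language"] = language
    body = urllib.parse.urlencode(fields).encode("utf-8")
    # When data is supplied, urllib sends it as
    # application/x-www-form-urlencoded unless told otherwise.
    return urllib.request.Request(STEM_URL, data=body, method="POST")


def stem(text, stemmer=None, language=None, timeout=10):
    """Send the request and return the "text" attribute of the JSON reply."""
    request = build_stem_request(text, stemmer=stemmer, language=language)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["text"]
```

Based on the curl examples, `stem("correr", stemmer="snowball", language="spanish")` should return "corr", though the result depends on the live service.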

Try out the stemming demo to get a feel for the results.

Parameters

text:

Required: the text you want to stem. It must not exceed 60,000 characters.
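To avoid a round trip that ends in 400 Bad Request, a client could check the text locally before sending it. This sketch assumes the server counts characters the way Python's `len()` counts code points, which the documentation does not state; `check_text` is a hypothetical helper.

```python
MAX_TEXT_LENGTH = 60_000  # documented limit for the text parameter


def check_text(text):
    """Raise ValueError for text the API would reject; otherwise return it.

    Mirrors the documented rules: text is required and must not
    exceed 60,000 characters (counted here as Python code points).
    """
    if not text:
        raise ValueError("text is required")
    if len(text) > MAX_TEXT_LENGTH:
        raise ValueError(
            f"text has {len(text)} characters; the limit is {MAX_TEXT_LENGTH}"
        )
    return text
```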

language:

The default language is english, unless a non-english stemmer is given. When a stemmer is given, the language must be one that stemmer supports. The following languages are currently supported:

  • arabic
  • english
  • danish
  • dutch
  • finnish
  • french
  • german
  • hungarian
  • italian
  • norwegian
  • portuguese
  • romanian
  • russian
  • spanish
  • swedish

The snowball stemmer is the default stemmer for all languages except english and arabic, which default to porter and isri respectively.
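The default-stemmer rule above can be written as a small lookup. This is a sketch that mirrors the documented behaviour rather than the server's actual code; `default_stemmer` is a hypothetical name, and the language set is copied from the list above.

```python
SUPPORTED_LANGUAGES = frozenset({
    "arabic", "english", "danish", "dutch", "finnish", "french", "german",
    "hungarian", "italian", "norwegian", "portuguese", "romanian",
    "russian", "spanish", "swedish",
})

# Languages whose default stemmer is not snowball, per the documentation.
NON_SNOWBALL_DEFAULTS = {"english": "porter", "arabic": "isri"}


def default_stemmer(language="english"):
    """Return the stemmer documented as the default for a language."""
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {language}")
    return NON_SNOWBALL_DEFAULTS.get(language, "snowball")
```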

stemmer:

The stemmer parameter supports the following values:

porter

The default stemmer. It can be used with any language, but the language defaults to english.

lancaster

A Lancaster stemmer. It can be used with any language, but the language defaults to english.

wordnet

Lemmatization using WordNet. Supports english only.

rslp

A portuguese stemmer.

isri

An arabic stemmer.

snowball

A stemmer that supports the following languages:

  • danish
  • dutch
  • english
  • finnish
  • french
  • german
  • hungarian
  • italian
  • norwegian
  • porter
  • portuguese
  • romanian
  • russian
  • spanish
  • swedish

If you give both a stemmer and a language, the stemmer must support that language. Both porter and lancaster can be used with any language, while wordnet, rslp, and isri are limited to their respective languages. The snowball stemmer currently supports 14 languages and is the default stemmer for all of them except english, which defaults to porter.
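The compatibility rule can be checked on the client before sending a request. In this sketch `None` stands for "any language", and the snowball set holds the 14 languages counted in the text (it leaves out the porter entry from the snowball list). The table and `is_compatible` are hypothetical, derived only from this page.

```python
SNOWBALL_LANGUAGES = frozenset({
    "danish", "dutch", "english", "finnish", "french", "german",
    "hungarian", "italian", "norwegian", "portuguese", "romanian",
    "russian", "spanish", "swedish",
})

# Which languages each stemmer accepts; None means any language.
STEMMER_LANGUAGES = {
    "porter": None,
    "lancaster": None,
    "wordnet": frozenset({"english"}),
    "rslp": frozenset({"portuguese"}),
    "isri": frozenset({"arabic"}),
    "snowball": SNOWBALL_LANGUAGES,
}


def is_compatible(stemmer, language):
    """Return True if the documented rules allow this stemmer/language pair."""
    if stemmer not in STEMMER_LANGUAGES:
        raise ValueError(f"unknown stemmer: {stemmer}")
    allowed = STEMMER_LANGUAGES[stemmer]
    return allowed is None or language in allowed
```

A pair for which `is_compatible` returns False is one the API documents as a 400 Bad Request.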

Return Value

On success, a 200 OK response will be returned containing a JSON object that looks like this:

{
        "text": "stemmed text"
}

Errors

A 400 Bad Request response will be returned under the following conditions:

  • the language is not compatible with the stemmer
  • no value for text is provided
  • text exceeds 60,000 characters

A 503 Throttled response will be returned if you exceed the daily request limit. Sign up for the Mashape Text-Processing API to get a plan with a higher limit.
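A client can translate the documented error codes into exceptions. This is a sketch: `ThrottledError`, `error_for_status`, and `send_stem_request` are hypothetical names, and `send_stem_request` accepts any prepared `urllib.request.Request` for the endpoint. Only `urllib.error.HTTPError` and its `code` attribute come from the standard library.

```python
import json
import urllib.error
import urllib.request


class ThrottledError(Exception):
    """Hypothetical exception for the documented 503 (daily limit) response."""


def error_for_status(code):
    """Map a documented error status to an exception instance, or None."""
    if code == 400:
        # Incompatible language/stemmer, missing text, or text over 60,000 characters.
        return ValueError("400 Bad Request: check text, language and stemmer")
    if code == 503:
        return ThrottledError("503: daily request limit exceeded")
    return None


def send_stem_request(request, timeout=10):
    """Send a prepared urllib Request and return the stemmed text."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return json.loads(response.read())["text"]
    except urllib.error.HTTPError as err:
        mapped = error_for_status(err.code)
        if mapped is None:
            raise  # not a documented error code: re-raise unchanged
        raise mapped from err
```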