Intent Recognition

After your voice command has been transcribed by the speech to text system, the next step is to recognize your intent. The end result is a JSON event with information about the intent.

The following table summarizes the trade-offs of using each intent recognizer:

System      Ideal Sentence Count   Training Speed   Recognition Speed   Flexibility
fsticuffs   1M+                    very fast        very fast           ignores unknown words
fuzzywuzzy  12-100                 fast             fast                fuzzy string matching
adapt       100-1K                 moderate         fast                ignores unknown words
rasaNLU     1K-100K                very slow        moderate            handles unseen words
flair       1K-100K                very slow        moderate            handles unseen words

Fsticuffs

Uses OpenFST to recognize only those sentences that were trained. While less flexible than the other intent recognizers, fsticuffs can be trained and perform recognition over millions of sentences in milliseconds. If you only plan to recognize voice commands from your training set (and not unseen ones via text chat), fsticuffs is the best choice.

Add to your profile:

"intent": {
  "system": "fsticuffs",
  "fsticuffs": {
    "intent_fst": "intent.fst",
    "ignore_unknown_words": true,
    "fuzzy": true
  }
}

By default, fuzzy matching is enabled (fuzzy is true). This allows fsticuffs to be less strict when matching text, skipping over any words in stop_words.txt and handling repeated words gracefully. Words must still appear in the correct order according to sentences.ini, but additional words will not cause a recognition failure.

When ignore_unknown_words is true, any word outside of sentences.ini is simply ignored. This allows a lot more sentences to be accepted, but may cause unexpected results when used with arbitrary input from text chat.
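For illustration, the combined effect of these two settings can be sketched in a few lines of Python. This is a toy approximation only, not the real OpenFST matcher, and the word lists are hypothetical:

# Toy sketch of the fuzzy / ignore_unknown_words behavior (not the actual fsticuffs implementation).
STOP_WORDS = {"please", "would", "you"}              # hypothetical contents of stop_words.txt
KNOWN_WORDS = {"turn", "on", "off", "the", "light"}  # words that appear in sentences.ini

def normalize(text, fuzzy=True, ignore_unknown_words=True):
    """Drop stop words and out-of-vocabulary words before strict, in-order matching."""
    kept = []
    for word in text.split():
        if fuzzy and word in STOP_WORDS:
            continue  # fuzzy matching skips stop words
        if ignore_unknown_words and word not in KNOWN_WORDS:
            continue  # unknown words are simply ignored
        kept.append(word)
    return " ".join(kept)

# "would you please turn on the light" reduces to "turn on the light",
# which matches a trained sentence because the remaining words stay in order.
print(normalize("would you please turn on the light"))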

See rhasspy.intent.FsticuffsRecognizer for details.

Fuzzywuzzy

Finds the closest matching intent by using the Levenshtein distance between the text and all of the training sentences you provided. Works best when you have a small number of sentences (dozens to hundreds) and need some resiliency to spelling errors (i.e., from text chat).
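The underlying idea can be sketched with the fuzzywuzzy Python package itself. The training sentences below are hypothetical, and Rhasspy's recognizer additionally maps the best match back to its intent and tagged entities:

from fuzzywuzzy import process

# Hypothetical training sentences.
training_sentences = [
    "turn on the living room lamp",
    "turn off the living room lamp",
    "what time is it",
]

# Find the training sentence with the smallest edit distance to the input.
best_match, score = process.extractOne("turn on living room lamp", training_sentences)
print(best_match, score)  # a misspelled or shortened input still finds the closest sentence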

Add to your profile:

"intent": {
  "system": "fuzzywuzzy",
  "fuzzywuzzy": {
    "examples_json": "intent_examples.json"
  }
}

See rhasspy.intent.FuzzyWuzzyRecognizer for details.

Mycroft Adapt

Recognizes intents using Mycroft Adapt. Works best when you have a medium number of sentences (hundreds to thousands) and need to be able to recognize sentences not seen during training (no new words, though).

Add to your profile:

"intent": {
  "system": "adapt",
  "adapt": {
      "stop_words": "stop_words.txt"
  }
}

The intent.adapt.stop_words text file contains words that should be ignored (i.e., cannot be "required" or "optional").
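For reference, a minimal standalone Adapt example looks roughly like this. The keywords and intent name are hypothetical; Rhasspy builds the parsers for you from sentences.ini:

from adapt.engine import IntentDeterminationEngine
from adapt.intent import IntentBuilder

engine = IntentDeterminationEngine()

# Register keywords (entities) that intents can require or treat as optional.
for word in ("turn", "switch"):
    engine.register_entity(word, "TurnVerb")
for word in ("light", "lamp"):
    engine.register_entity(word, "Device")

# An intent that requires one keyword from each group.
engine.register_intent_parser(
    IntentBuilder("TurnOnIntent").require("TurnVerb").require("Device").build()
)

for result in engine.determine_intent("please turn on the lamp"):
    if result.get("confidence", 0) > 0:
        print(result["intent_type"], result["confidence"])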

See rhasspy.intent.AdaptIntentRecognizer for details.

Flair

Recognizes intents using the flair NLP framework. Works best when you have a large number of sentences (thousands to hundreds of thousands) and need to handle sentences and words not seen during training.

Add to your profile:

"intent": {
  "system": "flair",
  "flair": {
      "data_dir": "flair_data",
      "max_epochs": 25,
      "do_sampling": true,
      "num_samples": 10000
  }
}

By default, the flair recognizer will generate 10,000 random sentences (num_samples) from each intent in your sentences.ini file. If you set do_sampling to false, Rhasspy will generate all possible sentences and use them as training data. This will produce the most accurate models, but may take a long time depending on the complexity of your grammars.

A flair TextClassifier will be trained to classify unseen sentences by intent, and a SequenceTagger will be trained for each intent that has at least one tag. During recognition, sentences are first classified by intent and then run through the appropriate SequenceTagger model to determine slots/entities.
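At recognition time this amounts to something like the following flair calls. The model paths under data_dir are hypothetical, and the exact load/label API may differ between flair versions:

from flair.data import Sentence
from flair.models import SequenceTagger, TextClassifier

sentence = Sentence("set the bedroom light to red")

# Step 1: classify the sentence by intent.
classifier = TextClassifier.load("flair_data/classification/final-model.pt")  # hypothetical path
classifier.predict(sentence)
intent_name = sentence.labels[0].value

# Step 2: run the tagger trained for that intent to extract slots/entities.
tagger = SequenceTagger.load(f"flair_data/{intent_name}/final-model.pt")  # hypothetical path
tagger.predict(sentence)
print(intent_name, sentence.to_tagged_string())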

See rhasspy.intent.FlairRecognizer for details.

RasaNLU

Recognizes intents remotely using a Rasa NLU server, which you must install somewhere that Rhasspy can access. Works well when you have a large number of sentences (thousands to hundreds of thousands) and need to handle sentences and words not seen during training. Rasa 1.0 or higher is required.

Add to your profile:

"intent": {
  "system": "rasa",
  "rasa": {
    "examples_markdown": "intent_examples.md",
    "project_name": "rhasspy",
    "url": "http://localhost:5005/"
  }
}
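You can check that Rhasspy can reach your Rasa NLU server by parsing a sentence directly against Rasa's HTTP API (Rasa 1.x exposes a /model/parse endpoint; the sentence below is just an example):

import requests

response = requests.post(
    "http://localhost:5005/model/parse",
    json={"text": "turn on the living room lamp"},
)
result = response.json()
print(result["intent"], result.get("entities", []))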

See rhasspy.intent.RasaIntentRecognizer for details.

Remote HTTP Server

Uses a remote Rhasspy server to do intent recognition. POSTs the text to an HTTP endpoint and receives the intent as JSON.

Add to your profile:

"intent": {
  "system": "remote",
  "remote": {
    "url": "http://my-server:12101/api/text-to-intent"
  }
}
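This is the same endpoint you can call yourself for testing. A minimal sketch, assuming the endpoint accepts the raw transcription as the request body and returns the intent as JSON:

import requests

response = requests.post(
    "http://my-server:12101/api/text-to-intent",
    data="turn on the living room lamp",
)
print(response.json())  # JSON event describing the recognized intent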

See rhasspy.intent.RemoteRecognizer for details.

Home Assistant Conversation

Sends transcriptions from speech to text to Home Assistant's conversation API. If the response contains speech, Rhasspy can optionally speak it.

Add to your profile:

"intent": {
  "system": "conversation",
  "conversation": {
    "handle_speech": true
  }
}

When handle_speech is true, Rhasspy will forward the returned speech to your text to speech system.

The settings from your profile's home_assistant section are automatically used (URL, access token, etc.).

Because Home Assistant will already handle your intent (probably using an intent script), Rhasspy will always generate an empty intent with this recognizer.
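Under the hood this is roughly equivalent to posting the transcription to Home Assistant yourself. A sketch only, with an example URL and placeholder token standing in for your home_assistant settings:

import requests

response = requests.post(
    "http://hassio.local:8123/api/conversation/process",  # from home_assistant.url (example)
    headers={"Authorization": "Bearer YOUR_LONG_LIVED_TOKEN"},
    json={"text": "turn on the living room lamp"},
)
print(response.json())  # may contain speech for Rhasspy to speak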

See rhasspy.intent.HomeAssistantConversationRecognizer for details.

MQTT/Hermes

Publishes intent recognitions/failures to hermes/intent/<INTENT_NAME> or hermes/nlu/intentNotRecognized (Hermes protocol).

This is enabled by default and controlled by the mqtt.publish_intents setting in your profile.
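For example, you can watch recognized intents with a small MQTT client (sketch, assuming a broker on localhost and the default port):

import json
import paho.mqtt.client as mqtt

def on_connect(client, userdata, flags, rc):
    client.subscribe("hermes/intent/#")
    client.subscribe("hermes/nlu/intentNotRecognized")

def on_message(client, userdata, message):
    print(message.topic, json.loads(message.payload))

client = mqtt.Client()
client.on_connect = on_connect
client.on_message = on_message
client.connect("localhost", 1883)
client.loop_forever()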

Command

Recognizes intents from text using a custom external program.

Add to your profile:

"intent": {
  "system": "command",
  "command": {
    "program": "/path/to/program",
    "arguments": []
  }
}

Instead of using one of Rhasspy's built-in systems, such as fuzzywuzzy or Rasa NLU, you can call a custom program to do intent recognition from text.

When a voice command is successfully transcribed, your program will be called with the text transcription provided on standard in. Your program should return JSON on standard out, something like:

{
  "intent": {
    "name": "ChangeLightColor",
    "confidence": 1.0
  },
  "entities": [
    { "entity": "name",
      "value": "bedroom light" },
    { "entity": "color",
      "value": "red" }
  ],
  "text": "set the bedroom light to red"
}

The following environment variables are available to your program:

  • $RHASSPY_BASE_DIR - path to the directory where Rhasspy is running from
  • $RHASSPY_PROFILE - name of the current profile (e.g., "en")
  • $RHASSPY_PROFILE_DIR - directory of the current profile (where profile.json is)
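A minimal Python version of such a program might look like this (the keyword rule is purely illustrative):

#!/usr/bin/env python3
import json
import sys

def main():
    text = sys.stdin.read().strip()  # transcription arrives on standard in

    # Purely illustrative rule: look for a known color in the sentence.
    intent = {"name": "", "confidence": 0.0}
    entities = []
    for color in ("red", "green", "blue"):
        if color in text.split():
            intent = {"name": "ChangeLightColor", "confidence": 1.0}
            entities.append({"entity": "color", "value": color})

    # Intent JSON goes to standard out for Rhasspy to consume.
    json.dump({"intent": intent, "entities": entities, "text": text}, sys.stdout)

if __name__ == "__main__":
    main()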

See text2intent.sh for an example program.

If your intent recognition system requires some special training, you should also override Rhasspy's intent training system.

See rhasspy.intent.CommandRecognizer for details.

Dummy

Disables intent recognition.

Add to your profile:

"intent": {
  "system": "dummy"
}

See rhasspy.intent.DummyRecognizer for details.