# Reference

## Supported Languages
The table below lists which components are compatible with Rhasspy's supported languages.
| Category | Name | Offline? | ca | cs | de | fr | el | en | es | hi | it | nl | pl | pt | ru | sv | vi | zh |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Wake Word | raven | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| | pocketsphinx | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | | | | | |
| | precise | ✓ | • | • | • | • | • | ✓ | • | • | • | • | • | • | • | • | • | • |
| | porcupine | ✓ | | | | | | ✓ | | | | | | | | | | |
| | snowboy | requires account | • | • | • | • | • | ✓ | • | • | • | • | • | • | • | • | • | • |
| Speech to Text | pocketsphinx | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | | | | |
| | kaldi | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | | | | | | | | | |
| | deepspeech | ✓ | ✓ | ✓ | ✓ | ✓ | | | | | | | | | | | | |
| Intent Recognition | fsticuffs | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| | fuzzywuzzy | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| | adapt | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| | flair | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | | | | | | | |
| | rasaNLU | needs extra software | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Text to Speech | espeak | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| | flite | ✓ | ✓ | ✓ | | | | | | | | | | | | | | |
| | picotts | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | | | | | | | | | | | |
| | nanotts | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | | | | | | | | | | | |
| | marytts | needs extra software | ✓ | ✓ | ✓ | ✓ | ✓ | | | | | | | | | | | |
| | opentts | needs extra software | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| | larynx | ✓ | ✓ | ✓ | ✓ | | | | | | | | | | | | | |
| | wavenet | | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | | | |

• - yes, but requires training/customization
## MQTT API

Rhasspy implements a superset of the Hermes protocol in rhasspy-hermes for the following components:

- Audio Server
- Automated Speech Recognition
- Dialogue Manager
- Grapheme to Phoneme
- Hotword Detection
- Intent Handling
- Natural Language Understanding
- Text to Speech
### Audio Server

Messages for audio input and audio output.

- `hermes/audioServer/<siteId>/audioFrame` (binary)
    - Chunk of WAV audio data for site
    - `wav_bytes: bytes` - WAV audio chunk (message payload)
    - `siteId: string` - Hermes site ID (part of topic)
- `hermes/audioServer/<siteId>/<sessionId>/audioSessionFrame` (binary)
    - Chunk of WAV audio data for session
    - `wav_bytes: bytes` - WAV audio chunk (message payload)
    - `siteId: string` - Hermes site ID (part of topic)
    - `sessionId: string` - session ID (part of topic)
- `hermes/audioServer/<siteId>/playBytes/<requestId>` (JSON)
    - Play WAV data
    - `wav_bytes: bytes` - WAV data to play (message payload)
    - `requestId: string` - unique ID for request (part of topic)
    - `siteId: string` - Hermes site ID (part of topic)
    - Response(s): `hermes/audioServer/<siteId>/playFinished`
- `hermes/audioServer/<siteId>/playFinished` (JSON)
    - Indicates that audio has finished playing
    - Response to `hermes/audioServer/<siteId>/playBytes/<requestId>`
    - `siteId: string` - Hermes site ID (part of topic)
    - `id: string = ""` - `requestId` from request message
- `hermes/audioServer/toggleOff` (JSON)
    - Disable audio output
    - `siteId: string = "default"` - Hermes site ID
- `hermes/audioServer/toggleOn` (JSON)
    - Enable audio output
    - `siteId: string = "default"` - Hermes site ID
- `hermes/error/audioServer/play` (JSON, Rhasspy only)
    - Sent when an error occurs in the audio output system
    - `error: string` - description of the error
    - `context: string? = null` - system-defined context of the error
    - `siteId: string = "default"` - Hermes site ID
    - `sessionId: string? = null` - current session ID
- `hermes/error/audioServer/record` (JSON, Rhasspy only)
    - Sent when an error occurs in the audio input system
    - `error: string` - description of the error
    - `context: string? = null` - system-defined context of the error
    - `siteId: string = "default"` - Hermes site ID
    - `sessionId: string? = null` - current session ID
- `rhasspy/audioServer/getDevices` (JSON, Rhasspy only)
    - Request available input or output audio devices
    - `modes: [string]` - list of modes ("input" or "output")
    - `id: string? = null` - unique ID returned in response
    - `siteId: string = "default"` - Hermes site ID
    - `test: bool = false` - if true, test input devices
- `rhasspy/audioServer/devices` (JSON, Rhasspy only)
    - Response to `rhasspy/audioServer/getDevices`
    - `devices: [object]` - list of available devices:
        - `mode: string` - "input" or "output"
        - `id: string` - unique device ID
        - `name: string? = null` - human readable name for device
        - `description: string? = null` - detailed description of device
        - `working: boolean? = null` - true/false if test succeeded or not, null if not tested
    - `id: string? = null` - unique ID from request
    - `siteId: string = "default"` - Hermes site ID
- `rhasspy/audioServer/setVolume` (JSON, Rhasspy only)
    - Set the volume at one or more sites
    - `volume: float` - volume level to set (0 = off, 1 = full volume)
    - `siteId: string = "default"` - Hermes site ID
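As a quick illustration of the `playBytes` topic layout, the sketch below builds the topic string and a minimal silent WAV payload in Python. The helper names are illustrative (not part of Rhasspy), and the commented publish call assumes a paho-mqtt client connected to the same broker as Rhasspy:

```python
import io
import wave


def play_bytes_topic(site_id: str, request_id: str) -> str:
    """Build the Hermes topic for playing a WAV at a given site."""
    return f"hermes/audioServer/{site_id}/playBytes/{request_id}"


def silence_wav(seconds: float = 0.5, rate: int = 16000) -> bytes:
    """Generate a 16-bit mono WAV of silence to use as a binary payload."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)   # mono
        w.setsampwidth(2)   # 16-bit samples
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * int(rate * seconds))
    return buf.getvalue()


# With a real broker connection you would publish the raw WAV bytes, e.g.:
# client.publish(play_bytes_topic("default", "req-1"), silence_wav())
```

Rhasspy would answer on `hermes/audioServer/default/playFinished` with `id = "req-1"` once playback completes.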
### Automated Speech Recognition

Messages for speech to text.

- `hermes/asr/toggleOn` (JSON)
    - Enables ASR system
    - `siteId: string = "default"` - Hermes site ID
    - `reason: string = ""` - reason for toggle on
- `hermes/asr/toggleOff` (JSON)
    - Disables ASR system
    - `siteId: string = "default"` - Hermes site ID
    - `reason: string = ""` - reason for toggle off
- `hermes/asr/startListening` (JSON)
    - Tell ASR system to start recording/transcribing
    - `siteId: string = "default"` - Hermes site ID
    - `sessionId: string? = null` - current session ID
    - `stopOnSilence: bool = true` - detect silence and automatically end voice command (Rhasspy only)
    - `sendAudioCaptured: bool = false` - send `audioCaptured` after stop listening (Rhasspy only)
    - `wakewordId: string? = null` - ID of wake word that triggered session (Rhasspy only)
- `hermes/asr/stopListening` (JSON)
    - Tell ASR system to stop recording
    - Emits `textCaptured` if silence was not detected earlier
    - `siteId: string = "default"` - Hermes site ID
    - `sessionId: string = ""` - current session ID
- `hermes/asr/textCaptured` (JSON)
    - Successful transcription, sent either when silence is detected or on `stopListening`
    - `text: string` - transcription text
    - `likelihood: float` - confidence from ASR system
    - `seconds: float` - transcription time in seconds
    - `siteId: string = "default"` - Hermes site ID
    - `sessionId: string? = null` - current session ID
    - `wakewordId: string? = null` - ID of wake word that triggered session (Rhasspy only)
    - `asrTokens: [[object]]? = null` - details of individual tokens (words) in captured text (also see ASR confidence; note that the list is two levels deep):
        - `value: string` - text of the token
        - `confidence: float` - confidence score of token (0-1, 1 is most confident)
        - `rangeStart: int` - start index of token in input (0-based)
        - `rangeEnd: int` - end index of token in input (0-based)
        - `time: object` - structured time of when token was detected:
            - `start: float` - start time in seconds (relative to start of utterance)
            - `end: float` - end time in seconds (relative to start of utterance)
- `hermes/error/asr` (JSON)
    - Sent when an error occurs in the ASR system
    - `error: string` - description of the error
    - `context: string? = null` - system-defined context of the error
    - `siteId: string = "default"` - Hermes site ID
    - `sessionId: string? = null` - current session ID
- `rhasspy/asr/<siteId>/train` (JSON, Rhasspy only)
    - Instructs the ASR system to re-train
    - `graph_path: str` - path to intent graph from rhasspy-nlu, encoded as a pickle and gzipped
    - `id: string? = null` - unique ID for request (copied to `trainSuccess`)
    - `graph_format: string? = null` - format of the graph (not used)
    - `siteId: string` - Hermes site ID (part of topic)
    - Response(s): `rhasspy/asr/<siteId>/trainSuccess`
- `rhasspy/asr/<siteId>/trainSuccess` (JSON, Rhasspy only)
    - Indicates that training was successful
    - `id: string? = null` - unique ID from request (copied from `train`)
    - `siteId: string` - Hermes site ID (part of topic)
    - Response to `rhasspy/asr/<siteId>/train`
- `rhasspy/asr/<siteId>/<sessionId>/audioCaptured` (binary, Rhasspy only)
    - WAV audio data captured by ASR session
    - `siteId: string` - Hermes site ID (part of topic)
    - `sessionId: string` - current session ID (part of topic)
    - Only sent if `sendAudioCaptured = true` in `startListening`
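The `startListening` fields above can be sketched as a small payload builder. The helper name is illustrative; the field names follow the message spec:

```python
import json


def start_listening(site_id="default", session_id=None,
                    stop_on_silence=True, send_audio_captured=False,
                    wakeword_id=None):
    """Build a JSON payload for hermes/asr/startListening."""
    return json.dumps({
        "siteId": site_id,
        "sessionId": session_id,
        "stopOnSilence": stop_on_silence,          # Rhasspy only
        "sendAudioCaptured": send_audio_captured,  # Rhasspy only
        "wakewordId": wakeword_id,                 # Rhasspy only
    })


# e.g. client.publish("hermes/asr/startListening", start_listening(session_id="s1"))
```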
### Dialogue Manager

Messages for managing dialogue sessions. These can be initiated by a hotword `detected` message (or `/api/listen-for-command`), or manually with a `startSession` message (or `/api/start-recording`).

- `hermes/dialogueManager/startSession` (JSON)
    - Starts a new dialogue session (done automatically on hotword `detected`)
    - `init: object` - JSON object with one of two forms:
        - Action:
            - `type: string = "action"` - required
            - `canBeEnqueued: bool` - true if session can be queued if there is already one (required)
            - `text: string? = null` - sentence to speak using text to speech
            - `intentFilter: [string]? = null` - valid intent names (`null` means all)
            - `sendIntentNotRecognized: bool = false` - send `hermes/dialogueManager/intentNotRecognized` if intent recognition fails
        - Notification:
            - `type: string = "notification"` - required
            - `text: string` - sentence to speak using text to speech (required)
    - `siteId: string = "default"` - Hermes site ID
    - `customData: string? = null` - user-defined data passed to subsequent session messages
    - Response(s): `hermes/dialogueManager/sessionStarted`
- `hermes/dialogueManager/sessionStarted` (JSON)
    - Indicates a session has started
    - `sessionId: string` - current session ID
    - `siteId: string = "default"` - Hermes site ID
    - `customData: string? = null` - user-defined data (copied from `startSession`)
    - Response to `hermes/dialogueManager/startSession`
- `hermes/dialogueManager/sessionQueued` (JSON)
    - Indicates a session has been queued (only when `init.canBeEnqueued = true` in `startSession`)
    - `sessionId: string` - current session ID
    - `siteId: string = "default"` - Hermes site ID
    - `customData: string? = null` - user-defined data (copied from `startSession`)
    - Response to `hermes/dialogueManager/startSession`
- `hermes/dialogueManager/continueSession` (JSON)
    - Requests that a session be continued after an `intent` has been recognized
    - `sessionId: string` - current session ID (required)
    - `customData: string? = null` - user-defined data (overrides session `customData` if not null)
    - `text: string? = null` - sentence to speak using text to speech
    - `intentFilter: [string]? = null` - valid intent names (`null` means all)
    - `sendIntentNotRecognized: bool = false` - send `hermes/dialogueManager/intentNotRecognized` if intent recognition fails
- `hermes/dialogueManager/endSession` (JSON)
    - Requests that a session be terminated nominally
    - `sessionId: string` - current session ID (required)
    - `text: string? = null` - sentence to speak using text to speech
    - `customData: string? = null` - user-defined data (overrides session `customData` if not null)
- `hermes/dialogueManager/sessionEnded` (JSON)
    - Indicates a session has terminated
    - `termination: string` - reason for termination (required), one of:
        - `nominal`
        - `abortedByUser`
        - `intentNotRecognized`
        - `timeout`
        - `error`
    - `sessionId: string` - current session ID
    - `siteId: string = "default"` - Hermes site ID
    - `customData: string? = null` - user-defined data (copied from `startSession`)
    - Response to `hermes/dialogueManager/endSession` or other reasons for a session termination
- `hermes/dialogueManager/intentNotRecognized` (JSON)
    - Sent when intent recognition fails during a session (only when `init.sendIntentNotRecognized = true` in `startSession`)
    - `sessionId: string` - current session ID
    - `input: string? = null` - input to NLU system
    - `siteId: string = "default"` - Hermes site ID
    - `customData: string? = null` - user-defined data (copied from `startSession`)
- `hermes/dialogueManager/configure` (JSON)
    - Sets the default intent filter for all subsequent dialogue sessions
    - `intents: [object]` - intents to enable/disable (empty for all intents):
        - `intentId: string` - name of intent
        - `enable: bool` - true if intent should be eligible for recognition
    - `siteId: string = "default"` - Hermes site ID
- `hermes/error/dialogueManager` (JSON, Rhasspy only)
    - Sent when an error occurs in the dialogue manager system
    - `error: string` - description of the error
    - `context: string? = null` - system-defined context of the error
    - `siteId: string = "default"` - Hermes site ID
    - `sessionId: string? = null` - current session ID
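For example, a `startSession` message with a `notification` init (speak a sentence, no follow-up listening) could be built like this. The helper name is illustrative; the field names follow the spec above:

```python
import json


def start_notification_session(text, site_id="default", custom_data=None):
    """Build a JSON payload for hermes/dialogueManager/startSession
    using the 'notification' init form."""
    return json.dumps({
        "init": {
            "type": "notification",  # required
            "text": text,            # sentence to speak (required)
        },
        "siteId": site_id,
        "customData": custom_data,
    })


# e.g. client.publish("hermes/dialogueManager/startSession",
#                     start_notification_session("The front door is open"))
```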
### Grapheme to Phoneme

Messages for looking up word pronunciations. See also the `/api/lookup` HTTP endpoint.

Words are usually looked up from a phonetic dictionary included with the ASR system. The current speech to text services handle these messages.

- `rhasspy/g2p/pronounce` (JSON, Rhasspy only)
    - Requests phonetic pronunciations of words
    - `words: [string]` - words to pronounce (required)
    - `id: string? = null` - unique ID for request (copied to `phonemes`)
    - `numGuesses: int = 5` - number of guesses if not in dictionary
    - `siteId: string = "default"` - Hermes site ID
    - `sessionId: string? = null` - current session ID
    - Response(s): `rhasspy/g2p/phonemes`
- `rhasspy/g2p/phonemes` (JSON, Rhasspy only)
    - Phonetic pronunciations of words, either from a dictionary or a grapheme-to-phoneme model
    - `wordPhonemes: [object]` - phonetic pronunciations (required), keyed by word; values are:
        - `phonemes: [string]` - phonemes for word (key)
        - `guessed: bool? = null` - true if pronunciation was guessed with the g2p model, false if it came from the dictionary
    - `id: string? = null` - unique ID for request (copied from `pronounce`)
    - `siteId: string = "default"` - Hermes site ID
    - `sessionId: string? = null` - current session ID
    - Response to `rhasspy/g2p/pronounce`
- `rhasspy/error/g2p` (JSON, Rhasspy only)
    - Sent when an error occurs in the G2P system
    - `error: string` - description of the error
    - `context: string? = null` - system-defined context of the error
    - `siteId: string = "default"` - Hermes site ID
    - `sessionId: string? = null` - current session ID
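A `pronounce` request payload might be assembled like so (helper name is illustrative; field names follow the spec above):

```python
import json


def pronounce_request(words, num_guesses=5, request_id=None, site_id="default"):
    """Build a JSON payload for rhasspy/g2p/pronounce."""
    return json.dumps({
        "words": list(words),        # required
        "numGuesses": num_guesses,   # guesses when a word is not in the dictionary
        "id": request_id,            # echoed back in the phonemes response
        "siteId": site_id,
    })


# e.g. client.publish("rhasspy/g2p/pronounce", pronounce_request(["rhasspy"]))
```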
### Hotword Detection

Messages for wake word detection. See also the `/api/listen-for-wake` HTTP endpoint and the `/api/events/wake` Websocket endpoint.

- `hermes/hotword/toggleOn` (JSON)
    - Enables hotword detection
    - `siteId: string = "default"` - Hermes site ID
    - `reason: string = ""` - reason for toggle on
- `hermes/hotword/toggleOff` (JSON)
    - Disables hotword detection
    - `siteId: string = "default"` - Hermes site ID
    - `reason: string = ""` - reason for toggle off
- `hermes/hotword/<wakewordId>/detected` (JSON)
    - Indicates a hotword was successfully detected
    - `wakewordId: string` - wake word ID (part of topic)
    - `modelId: string` - ID of wake word model used (service specific)
    - `modelVersion: string = ""` - version of wake word model used (service specific)
    - `modelType: string = "personal"` - type of wake word model used (service specific)
    - `currentSensitivity: float = 1.0` - sensitivity of wake word detection (service specific)
    - `siteId: string = "default"` - Hermes site ID
    - `sessionId: string? = null` - current session ID (Rhasspy only)
    - `sendAudioCaptured: bool? = null` - if not null, copied to `asr/startListening` message in dialogue manager
- `hermes/error/hotword` (JSON, Rhasspy only)
    - Sent when an error occurs in the hotword system
    - `error: string` - description of the error
    - `context: string? = null` - system-defined context of the error
    - `siteId: string = "default"` - Hermes site ID
    - `sessionId: string? = null` - current session ID
- `rhasspy/hotword/getHotwords` (JSON, Rhasspy only)
    - Request available hotwords
    - `id: string? = null` - unique ID for response
    - `siteId: string = "default"` - Hermes site ID
- `rhasspy/hotword/hotwords` (JSON, Rhasspy only)
    - Response to `rhasspy/hotword/getHotwords`
    - `models: [object]` - list of available hotwords:
        - `modelId: string` - unique ID of hotword model
        - `modelWords: string` - words used to activate hotword
        - `modelVersion: string = ""` - version of hotword model
        - `modelType: string = "personal"` - "universal" or "personal"
    - `id: string? = null` - unique ID from request
    - `siteId: string = "default"` - Hermes site ID
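Since the wake word ID is carried in the topic rather than the payload, a subscriber to `hermes/hotword/+/detected` has to parse it out of the topic string. A minimal sketch (helper name is illustrative):

```python
def parse_detected_topic(topic):
    """Return the wakewordId from a hermes/hotword/<wakewordId>/detected
    topic, or None if the topic does not match that pattern."""
    parts = topic.split("/")
    if (len(parts) == 4 and parts[0] == "hermes"
            and parts[1] == "hotword" and parts[3] == "detected"):
        return parts[2]
    return None


# In a paho-mqtt on_message callback you might use it like:
# wakeword_id = parse_detected_topic(message.topic)
```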
### Intent Handling

Messages for intent handling.

- `rhasspy/handle/toggleOn` (JSON, Rhasspy only)
    - Enables intent handling
    - `siteId: string = "default"` - Hermes site ID
- `rhasspy/handle/toggleOff` (JSON, Rhasspy only)
    - Disables intent handling
    - `siteId: string = "default"` - Hermes site ID
### Natural Language Understanding

- `hermes/nlu/query` (JSON)
    - Request an intent to be recognized from text
    - `input: string` - text to recognize intent from (required)
    - `intentFilter: [string]? = null` - valid intent names (`null` means all)
    - `id: string? = null` - unique ID for request (copied to response messages)
    - `siteId: string = "default"` - Hermes site ID
    - `sessionId: string? = null` - current session ID
    - `asrConfidence: float? = null` - confidence from ASR system for input text
    - Response(s): `hermes/intent/<intentName>` or `hermes/nlu/intentNotRecognized`
- `hermes/intent/<intentName>` (JSON)
    - Sent when an intent was successfully recognized
    - `input: string` - text from query (required)
    - `intent: object` - details of recognized intent (required):
        - `intentName: string` - name of intent (required)
        - `confidenceScore: float` - confidence from NLU system for this intent (required)
    - `slots: [object] = []` - details of named entities, list of:
        - `entity: string` - name of entity (required)
        - `slotName: string` - name of slot (required)
        - `confidence: float` - confidence from NLU system for this slot (required)
        - `rawValue: string` - entity value without substitutions (required)
        - `value: object` - entity value with substitutions (required):
            - `value: any` - entity value
        - `range: object = null` - indexes of entity value in text:
            - `start: int` - start index
            - `end: int` - end index (exclusive)
    - `id: string = ""` - unique ID for request (copied from `query`)
    - `siteId: string = "default"` - Hermes site ID
    - `sessionId: string = ""` - current session ID
    - `customData: string = ""` - user-defined data (copied from `startSession`)
    - `asrTokens: [[object]]? = null` - tokens from transcription:
        - `value: string` - token value
        - `confidence: float` - confidence in token
        - `range_start: int` - start of token in input
        - `range_end: int` - end of token in input (exclusive)
    - `asrConfidence: float? = null` - confidence from ASR system for input text
    - Response to `hermes/nlu/query`
- `hermes/nlu/intentNotRecognized` (JSON)
    - Sent when intent recognition fails
    - `input: string` - text from query (required)
    - `id: string? = null` - unique ID for request (copied from `query`)
    - `siteId: string = "default"` - Hermes site ID
    - `sessionId: string? = null` - current session ID
    - Response to `hermes/nlu/query`
- `hermes/error/nlu` (JSON)
    - Sent when an error occurs in the NLU system
    - `error: string` - description of the error
    - `context: string? = null` - system-defined context of the error
    - `siteId: string = "default"` - Hermes site ID
    - `sessionId: string? = null` - current session ID
- `rhasspy/nlu/<siteId>/train` (JSON, Rhasspy only)
    - Instructs the NLU system to re-train
    - `graph_path: str` - path to intent graph from rhasspy-nlu, encoded as a pickle and gzipped
    - `id: string? = null` - unique ID for request (copied to `trainSuccess`)
    - `graph_format: string? = null` - format of the graph (unused)
    - `siteId: string` - Hermes site ID (part of topic)
    - Response(s): `rhasspy/nlu/<siteId>/trainSuccess`
- `rhasspy/nlu/<siteId>/trainSuccess` (JSON, Rhasspy only)
    - Indicates that training was successful
    - `siteId: string` - Hermes site ID (part of topic)
    - `id: string? = null` - unique ID from request (copied from `train`)
    - Response to `rhasspy/nlu/<siteId>/train`
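A `hermes/nlu/query` payload can be assembled as follows (helper name is illustrative; field names follow the spec above):

```python
import json


def nlu_query(input_text, intent_filter=None, request_id=None,
              site_id="default", session_id=None):
    """Build a JSON payload for hermes/nlu/query."""
    return json.dumps({
        "input": input_text,           # required
        "intentFilter": intent_filter,  # null means all intents are eligible
        "id": request_id,              # copied to the response message
        "siteId": site_id,
        "sessionId": session_id,
    })


# e.g. client.publish("hermes/nlu/query",
#                     nlu_query("turn on the lamp", intent_filter=["ChangeLightState"]))
```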
### Text to Speech

- `hermes/tts/say` (JSON)
    - Generate spoken audio for a sentence using the configured text to speech system
    - Automatically sends `playBytes` with `playBytes.requestId = say.id`
    - `text: string` - sentence to speak (required)
    - `lang: string? = null` - override language for TTS system
    - `id: string? = null` - unique ID for request (copied to `sayFinished`)
    - `volume: float? = null` - volume level to speak with (0 = off, 1 = full volume)
    - `siteId: string = "default"` - Hermes site ID
    - `sessionId: string? = null` - current session ID
    - Response(s): `hermes/tts/sayFinished` (JSON)
- `hermes/tts/sayFinished` (JSON)
    - Indicates that the text to speech system has finished speaking
    - `id: string? = null` - unique ID for request (copied from `say`)
    - `siteId: string = "default"` - Hermes site ID
    - Response to `hermes/tts/say`
- `hermes/error/tts` (JSON, Rhasspy only)
    - Sent when an error occurs in the text to speech system
    - `error: string` - description of the error
    - `context: string? = null` - system-defined context of the error
    - `siteId: string = "default"` - Hermes site ID
    - `sessionId: string? = null` - current session ID
- `rhasspy/tts/getVoices` (JSON, Rhasspy only)
    - Request available text to speech voices
    - `id: string? = null` - unique ID provided in response
    - `siteId: string = "default"` - Hermes site ID
- `rhasspy/tts/voices` (JSON, Rhasspy only)
    - Response to `rhasspy/tts/getVoices`
    - `voices: list[object]` - available voices:
        - `voiceId: string` - unique ID for voice
        - `description: string? = null` - human readable description of voice
    - `id: string? = null` - unique ID from request
    - `siteId: string = "default"` - Hermes site ID
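A `tts/say` payload builder might look like this (helper name is illustrative; field names follow the spec above):

```python
import json


def tts_say(text, site_id="default", say_id=None, volume=None, lang=None):
    """Build a JSON payload for hermes/tts/say."""
    return json.dumps({
        "text": text,      # required: sentence to speak
        "lang": lang,      # optional language override
        "id": say_id,      # copied to sayFinished and used as playBytes.requestId
        "volume": volume,  # 0 = off, 1 = full volume
        "siteId": site_id,
    })


# e.g. client.publish("hermes/tts/say", tts_say("Hello from Rhasspy", say_id="say-1"))
```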
## HTTP API

Rhasspy's HTTP endpoints are documented below. You can also visit `/api/` in your Rhasspy server (note the final slash) to try out each endpoint.

Application authors may want to use the rhasspy-client, which provides a high-level interface to a remote Rhasspy server.

### Endpoints
- `/api/custom-words`
    - GET custom word dictionary as plain text, or POST to overwrite it
    - See `custom_words.txt` in your profile directory
- `/api/backup-profile`
    - GET a zip file with relevant profile files (sentences, slots, etc.)
- `/api/evaluate`
    - POST archive with WAV/JSON files for batch testing
    - Returns JSON report
    - Every file `foo.wav` should have a `foo.json` with a recognized intent
    - Archive must be in a format supported by `shutil.unpack_archive`
- `/api/download-profile`
    - POST to have Rhasspy download missing profile artifacts
- `/api/handle-intent`
    - POST Hermes intent as JSON to handle
- `/api/listen-for-command`
    - POST to wake Rhasspy up and start listening for a voice command
    - Returns intent JSON when command is finished
    - `?nohass=true` - stop Rhasspy from handling the intent
    - `?timeout=<seconds>` - override default command timeout
    - `?entity=<entity>&value=<value>` - set custom entities/values in recognized intent
- `/api/listen-for-wake`
    - POST "on" to have Rhasspy listen for a wake word
    - POST "off" to disable wake word
    - `?siteId=site1,site2,...` to apply to specific site(s)
- `/api/lookup`
    - POST word as plain text to look up or guess pronunciation
    - `?n=<number>` - return at most `n` guessed pronunciations
- `/api/microphones`
    - GET list of available microphones
- `/api/mqtt/<TOPIC>`
    - POST JSON payload to `/api/mqtt/your/full/topic`
        - Payload will be published to `your/full/topic` on the MQTT broker
    - GET next MQTT message on `TOPIC` as JSON
        - Subscribes to `your/full/topic` with a request to `/api/mqtt/your/full/topic`
        - Escape wildcard `#` as `%23` and `+` as `%2B`
- `/api/phonemes`
    - GET example phonemes from speech recognizer for your profile
    - See `phoneme_examples.txt` in your profile directory
- `/api/play-recording`
    - POST to play last recorded voice command
    - GET to download WAV data from last recorded voice command
- `/api/play-wav`
    - POST to play WAV data
    - Make sure to set `Content-Type` to `audio/wav`
    - `?siteId=site1,site2,...` to apply to specific site(s)
- `/api/profile`
    - GET the JSON for your profile, or POST to overwrite it
    - `?layers=profile` to only see settings different from `defaults.json`
    - See `profile.json` in your profile directory
- `/api/restart`
    - Restart Rhasspy server
- `/api/sentences`
    - GET voice command templates or POST to overwrite them
    - Set `Accept: application/json` to GET JSON with all sentence files
    - Set `Content-Type: application/json` to POST JSON with sentences for multiple files
    - See `sentences.ini` and the `intents` directory in your profile
- `/api/set-volume`
    - POST to set volume at one or more sites
    - Body text is volume level (0 = off, 1 = full volume)
    - `?siteId=site1,site2,...` to apply to specific site(s)
- `/api/slots`
    - GET slot values as JSON or POST to add to/overwrite them
    - `?overwrite_all=true` to clear slots in JSON before writing
- `/api/speakers`
    - GET list of available audio output devices
- `/api/speech-to-intent`
    - POST a WAV file and have Rhasspy process it as a voice command
    - Returns intent JSON when command is finished
    - `?nohass=true` - stop Rhasspy from handling the intent
    - `?entity=<entity>&value=<value>` - set custom entity/value in recognized intent
- `/api/speech-to-text`
    - POST a WAV file and have Rhasspy return the text transcription
    - Set `Accept: application/json` to receive JSON with more details
    - `?noheader=true` - send raw 16-bit 16 kHz mono audio without a WAV header
- `/api/start-recording`
    - POST to have Rhasspy start recording a voice command
- `/api/stop-recording`
    - POST to have Rhasspy stop recording and process recorded data as a voice command
    - Returns intent JSON when command has been processed
    - `?nohass=true` - stop Rhasspy from handling the intent
    - `?entity=<entity>&value=<value>` - set custom entity/value in recognized intent
- `/api/test-microphones`
    - GET list of available microphones and whether they're working
- `/api/text-to-intent`
    - POST text and have Rhasspy process it as a command
    - Returns intent JSON when command has been processed
    - `?nohass=true` - stop Rhasspy from handling the intent
    - `?entity=<entity>&value=<value>` - set custom entity/value in recognized intent
- `/api/text-to-speech`
    - POST text and have Rhasspy speak it
    - `?voice=<voice>` - override default TTS voice
    - `?language=<language>` - override default TTS language or locale
    - `?repeat=true` - have Rhasspy repeat the last sentence it spoke
    - `?volume=<volume>` - volume level to speak at (0 = off, 1 = full volume)
    - `?siteId=site1,site2,...` to apply to specific site(s)
- `/api/train`
    - POST to re-train your profile
- `/api/tts-voices`
    - GET JSON object with available text to speech voices
- `/api/wake-words`
    - GET JSON object with available wake words
- `/api/unknown-words`
    - GET words that Rhasspy doesn't know in your sentences
    - See `unknown_words.txt` in your profile directory
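As a sketch of calling these endpoints from code, the helper below builds the `/api/text-to-intent` URL with the optional `nohass` flag. The helper name is illustrative, and the commented `requests` call assumes a Rhasspy server on its default port 12101:

```python
def text_to_intent_url(base_url, nohass=False):
    """Build the URL for POSTing text to Rhasspy's /api/text-to-intent."""
    url = base_url.rstrip("/") + "/api/text-to-intent"
    if nohass:
        url += "?nohass=true"  # recognize only; do not forward to intent handling
    return url


# import requests
# intent = requests.post(
#     text_to_intent_url("http://localhost:12101", nohass=True),
#     data="turn on the living room lamp",
# ).json()
```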
## Websocket API

- `/api/events/intent`
    - Emits JSON-encoded intents after each NLU query
- `/api/events/text`
    - Emits JSON-encoded transcriptions after each ASR transcription
- `/api/events/wake`
    - Emits JSON-encoded detections after each wake word detection
- `/api/mqtt`
    - Allows you to subscribe to, receive, and publish JSON-encoded MQTT messages
    - Send `{ "type": "subscribe", "topic": "your/full/topic" }` to subscribe
    - Send `{ "type": "publish", "topic": "your/full/topic", "payload": { ... } }` to publish
    - Listen for subscribed topics, receiving `{ "topic": "...", "payload": { ... } }` for each message
- `/api/mqtt/<TOPIC>`
    - Subscribes to messages from `TOPIC`
    - Receive `{ "topic": "...", "payload": { ... } }` for each message
    - Escape wildcard `#` as `%23` and `+` as `%2B`
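The subscribe/publish message shapes for the `/api/mqtt` websocket can be generated like this (helper names are illustrative; the message shapes follow the list above):

```python
import json


def ws_subscribe(topic):
    """Build a subscription message for Rhasspy's /api/mqtt websocket."""
    return json.dumps({"type": "subscribe", "topic": topic})


def ws_publish(topic, payload):
    """Build a publish message for Rhasspy's /api/mqtt websocket."""
    return json.dumps({"type": "publish", "topic": topic, "payload": payload})


# With a websocket client connected to ws://<rhasspy>/api/mqtt you might send:
# ws.send(ws_subscribe("hermes/intent/#"))
# ws.send(ws_publish("hermes/tts/say", {"text": "hello"}))
```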
Profile Settings
All available profile sections and settings are listed below:
home_assistant
- how to communicate with Home Assistant/Hass.iourl
- Base URL of Home Assistant server (no/api
)access_token
- long-lived access token for Home Assistant (Hass.io token is used automatically)api_password
- Password, if you have that enabled (deprecated)pem_file
- Full path to your PEM certificate filekey_file
- Full path to your key file (if separate, optional)event_type_format
- Python format string used to create event type from intent type ({0}
)
speech_to_text
- transcribing voice commands to textsystem
- name of speech to text system (pocketsphinx
,kaldi
,remote
,command
,remote
,hermes
, ordummy
)pocketsphinx
- configuration for Pocketsphinxcompatible
- true if profile can use pocketsphinx for speech recognitionacoustic_model
- directory with CMU 16 kHz acoustic modelbase_dictionary
- large text file with word pronunciations (read only)custom_words
- small text file with words/pronunciations added by userdictionary
- text file with all words/pronunciations needed for example sentencesunknown_words
- small text file with guessed word pronunciations (from phonetisaurus)language_model
- text file with trigram ARPA language model built from example sentencesopen_transcription
- true if general language model should be used (custom voices commands ignored)base_language_model
- large general language model (read only)mllr_matrix
- MLLR matrix from acoustic model tuningmix_weight
- how much of the base language model to mix in during training (0-1)phoneme_examples
- text file with examples for each acoustic model phonemephoneme_map
- text file mapping ASR phonemes to eSpeak phonemes
kaldi
- configuration for Kaldicompatible
- true if profile can use Kaldi for speech recognitionkaldi_dir
- absolute path to Kaldi root directorymodel_dir
- directory where Kaldi model is stored (relative to profile directory)graph
- directory where HCLG.fst is located (relative tomodel_dir
)base_graph
- directory where large general HCLG.fst is located (relative tomodel_dir
)base_dictionary
- large text file with word pronunciations (read only)custom_words
- small text file with words/pronunciations added by userdictionary
- text file with all words/pronunciations needed for example sentencesopen_transcription
- true if general language model should be used (custom voices commands ignored)unknown_words
- small text file with guessed word pronunciations (from phonetisaurus)mix_weight
- how much of the base language model to mix in during training (0-1)phoneme_examples
- text file with examples for each acoustic model phonemephoneme_map
- text file mapping ASR phonemes to eSpeak phonemes
remote
- configuration for remote Rhasspy serverurl
- URL to POST WAV data for transcription (e.g.,http://your-rhasspy-server:12101/api/speech-to-text
)
command
- configuration for external speech-to-text programprogram
- path to executablearguments
- list of arguments to pass to program
sentences_ini
- Ini file with example sentences/JSGF templates grouped by intentsentences_dir
- Directory with additional sentence templates (default:intents
)g2p_model
- finite-state transducer for phonetisaurus to guess word pronunciationsg2p_casing
- casing to force for g2p model (upper
,lower
, or blank)dictionary_casing
- casing to force for dictionary words (upper
,lower
, or blank)slots_dir
- directory to look for slots lists (default:slots
)slot_programs
- directory to look for slot programs (defaultslot_programs
)
intent
- transforming text commands to intentssystem
- intent recognition system (fsticuffs
,fuzzywuzzy
,rasa
,remote
,adapt
,command
, ordummy
)fsticuffs
- configuration for OpenFST-based intent recognizerintent_json
- path to intent graph JSON file generated by [rhasspy-nlu][https://github.com/rhasspy/rhasspy-nlu]converters_dir
- directory to look for converter programs (default:converters
)ignore_unknown_words
- true if words not in the FST symbol table should be ignoredfuzzy
- true if text is matching in a fuzzy manner, skipping words instop_words.txt
fuzzywuzzy
- configuration for simplistic Levenshtein distance based intent recognizerexamples_json
- JSON file with intents/example sentencesmin_confidence
- minimum confidence required for intent to be converted to a JSON event (0-1)
remote
- configuration for remote Rhasspy serverurl
- URL to POST text to for intent recognition (e.g.,http://your-rhasspy-server:12101/api/text-to-intent
)
rasa
- configuration for Rasa NLU based intent recognizerurl
- URL of remote Rasa NLU server (e.g.,http://localhost:5005/
)examples_markdown
- Markdown file to generate with intents/example sentencesproject_name
- name of project to generate during training
adapt
- configuration for Mycroft Adapt based intent recognizerstop_words
- text file with words to ignore in training sentences
command
- configuration for external speech-to-text programprogram
- path to executablearguments
- list of arguments to pass to program
replace_numbers
if true, automatically replace number ranges (N..M
) or numbers (N
) with words
- `text_to_speech` - pronouncing words
    - `system` - text to speech system (`espeak`, `flite`, `picotts`, `marytts`, `command`, `remote`, `hermes`, or `dummy`)
    - `espeak` - configuration for eSpeak
        - `voice` - name of voice to use (e.g., `en`, `fr`)
    - `flite` - configuration for flite
        - `voice` - name of voice to use (e.g., `kal16`, `rms`, `awb`)
    - `picotts` - configuration for PicoTTS
        - `language` - language to use (default if not present)
    - `marytts` - configuration for MaryTTS
        - `url` - address:port of MaryTTS server (port is usually 59125)
        - `voice` - name of voice to use (e.g., `cmu-slt`). Default if not present.
        - `locale` - name of locale to use (e.g., `en-US`). Default if not present.
    - `wavenet` - configuration for Google's WaveNet
        - `cache_dir` - path to directory in your profile where WAV files are cached
        - `credentials_json` - path to the JSON credentials file (generated online)
        - `gender` - gender of speaker (`MALE` or `FEMALE`)
        - `language_code` - language/locale (e.g., `en-US`)
        - `sample_rate` - WAV sample rate (default: 22050)
        - `url` - URL of WaveNet endpoint
        - `voice` - voice to use (e.g., `Wavenet-C`)
        - `fallback_tts` - text to speech system to use when offline or an error occurs (e.g., `espeak`)
    - `remote` - configuration for remote text to speech server
        - `url` - URL to POST sentence to and get back WAV data
    - `command` - configuration for external text-to-speech program
        - `say_program` - path to executable for text to WAV
        - `say_arguments` - list of arguments to pass to say program
        - `voices_program` - path to executable for listing available voices
        - `voices_arguments` - list of arguments to pass to voices program
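For instance, selecting eSpeak with a French voice might look like this in `profile.json` (a sketch; values are illustrative):

```json
{
  "text_to_speech": {
    "system": "espeak",
    "espeak": {
      "voice": "fr"
    }
  }
}
```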
- `training` - training speech/intent recognizers
    - `speech_to_text` - training for speech decoder
        - `system` - speech to text training system (`auto` or `dummy`)
        - `command` - configuration for external speech-to-text training program
            - `program` - path to executable
            - `arguments` - list of arguments to pass to program
        - `remote` - configuration for external HTTP endpoint
            - `url` - URL of speech to text training endpoint
    - `intent` - training for intent recognizer
        - `system` - intent recognizer training system (`auto` or `dummy`)
        - `command` - configuration for external intent recognizer training program
            - `program` - path to executable
            - `arguments` - list of arguments to pass to program
        - `remote` - configuration for external HTTP endpoint
            - `url` - URL of intent recognizer training endpoint
- `wake` - waking Rhasspy up for speech input
    - `system` - wake word recognition system (`raven`, `pocketsphinx`, `snowboy`, `precise`, `porcupine`, `command`, `hermes`, or `dummy`)
    - `raven` - configuration for Raven wake word recognizer
        - `template_dir` - directory where WAV templates are stored in profile (default: `raven`)
        - `probability_threshold` - list with lower/upper probability range for detection (default: [0.45, 0.55])
        - `minimum_matches` - number of templates that must match for a detection (default: 1)
    - `pocketsphinx` - configuration for Pocketsphinx wake word recognizer
        - `keyphrase` - phrase to wake up on (3-4 syllables recommended)
        - `threshold` - sensitivity of detection (recommended range 1e-50 to 1e-5)
        - `chunk_size` - number of bytes per chunk to feed to Pocketsphinx (default 960)
    - `snowboy` - configuration for snowboy
        - `model` - path to model file(s), separated by commas (in profile directory)
        - `sensitivity` - model sensitivity (0-1, default 0.5)
        - `audio_gain` - audio gain (default 1)
        - `apply_frontend` - true if ApplyFrontend should be set
        - `chunk_size` - number of bytes per chunk to feed to snowboy (default 960)
        - `model_settings` - settings for each snowboy model path (e.g., `snowboy/snowboy.umdl`)
            - `<MODEL_PATH>`
                - `sensitivity` - model sensitivity
                - `audio_gain` - audio gain
                - `apply_frontend` - true if ApplyFrontend should be set
    - `precise` - configuration for Mycroft Precise
        - `engine_path` - path to the precise-engine binary
        - `model` - path to model file (in profile directory)
        - `sensitivity` - model sensitivity (0-1, default 0.5)
        - `trigger_level` - number of events to trigger activation (default 3)
        - `chunk_size` - number of bytes per chunk to feed to Precise (default 2048)
    - `porcupine` - configuration for PicoVoice's Porcupine
        - `library_path` - path to `libpv_porcupine.so` for your platform/architecture
        - `model_path` - path to the `porcupine_params.pv` (lib/common)
        - `keyword_path` - path to the `.ppn` keyword file
        - `sensitivity` - model sensitivity (0-1, default 0.5)
    - `command` - configuration for external wake word program
        - `program` - path to executable
        - `arguments` - list of arguments to pass to program
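Putting the Raven settings above together, a wake section in `profile.json` might look like this sketch (the values mirror the stated defaults):

```json
{
  "wake": {
    "system": "raven",
    "raven": {
      "template_dir": "raven",
      "probability_threshold": [0.45, 0.55],
      "minimum_matches": 1
    }
  }
}
```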
- `microphone` - configuration for audio recording
    - `system` - audio recording system (`pyaudio`, `arecord`, `gstreamer`, or `dummy`)
    - `pyaudio` - configuration for PyAudio microphone
        - `device` - index of device to use or empty for default device
        - `frames_per_buffer` - number of frames to read at a time (default 480)
    - `arecord` - configuration for ALSA microphone
        - `device` - name of ALSA device (see `arecord -L`) to use or empty for default device
        - `chunk_size` - number of bytes to read at a time (default 960)
    - `command` - configuration for external audio input program
        - `record_program` - path to executable for audio input
        - `record_arguments` - list of arguments to pass to record program
        - `list_program` - path to executable for listing available input devices
        - `list_arguments` - list of arguments to pass to list program
        - `test_program` - path to executable for testing available input devices
        - `test_arguments` - list of arguments to pass to test program
- `sounds` - configuration for audio output from Rhasspy
    - `system` - which sound output system to use (`aplay`, `command`, `remote`, `hermes`, or `dummy`)
    - `wake` - path to WAV file to play when Rhasspy wakes up
    - `recorded` - path to WAV file to play when a command finishes recording
    - `aplay` - configuration for ALSA speakers
        - `device` - name of ALSA device (see `aplay -L`) to use or empty for default device
    - `command` - configuration for external audio output program
        - `play_program` - path to executable for audio output
        - `play_arguments` - list of arguments to pass to play program
        - `list_program` - path to executable for listing available output devices
        - `list_arguments` - list of arguments to pass to list program
    - `remote` - configuration for remote audio output server
        - `url` - URL to POST WAV data to
- `handle` - intent handling
    - `system` - which intent handling system to use (`hass`, `command`, `remote`, or `dummy`)
    - `remote` - configuration for remote HTTP intent handler
        - `url` - URL to POST intent JSON to and receive response JSON from
    - `command` - configuration for external intent handling program
        - `program` - path to executable
        - `arguments` - list of arguments to pass to program
- `mqtt` - configuration for MQTT
    - `enabled` - true if external broker should be used (false uses internal broker on port 12183)
    - `host` - external MQTT host
    - `port` - external MQTT port
    - `username` - external MQTT username (blank for anonymous)
    - `password` - external MQTT password
    - `site_id` - one or more Hermes site IDs (comma separated); the first ID is used for new messages
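For example, pointing Rhasspy at an external broker could look like this in `profile.json` (host and credentials are placeholders):

```json
{
  "mqtt": {
    "enabled": true,
    "host": "192.168.1.10",
    "port": 1883,
    "username": "rhasspy",
    "password": "CHANGE_ME",
    "site_id": "default"
  }
}
```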
- `dialogue` - configuration for Hermes dialogue manager
    - `system` - which dialogue manager to use (`rhasspy`, `hermes`, or `dummy`)
    - `group_separator` - separator to use when grouping satellites (e.g., `bedroom.front`, `bedroom.back`)
- `download` - configuration for profile file downloading
    - `url_base` - base URL to download profile artifacts (defaults to GitHub)
    - `conditions` - profile settings that will trigger file downloads
        - keys are profile setting paths (e.g., `wake.system`)
        - values are dictionaries whose keys are profile setting values (e.g., `snowboy`)
            - settings may have the form `<=N` or `!X` to mean "less than or equal to N" or "not X"
            - leaf nodes are dictionaries whose keys are destination file paths and whose values reference the `files` dictionary
    - `files` - locations, etc. of files to download
        - keys are names of files
        - values are dictionaries with:
            - `url` - URL of file to download (appended to `url_base`)
            - `bytes_expected` - number of bytes the file should be after decompression
            - `unzip` - `true` if file should be decompressed with `gunzip`
            - `parts` - list of objects representing parts of a file that should be combined with `cat`
                - `fragment` - fragment appended to file URL
                - `bytes_expected` - number of bytes for this part
- `logging` - settings for service loggers
    - `format` - Python logger format string
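As an illustration, a `logging` section with a standard Python logger format string (the format shown is an example, not necessarily Rhasspy's default):

```json
{
  "logging": {
    "format": "[%(levelname)s:%(asctime)s] %(name)s: %(message)s"
  }
}
```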
Data Formats
In addition to the message formats specified in the Hermes protocol, Rhasspy has its own formats for transcriptions and intents. A Rhasspy profile also contains artifacts in standard formats, such as pronunciation dictionaries, language models, and grapheme to phoneme models.
Transcriptions
The `/api/speech-to-text` HTTP endpoint and `/api/events/text` WebSocket endpoint produce JSON in the following format:
```json
{
  "text": "transcription text",
  "transcribe_seconds": 0.123,
  "likelihood": 0.321,
  "wav_seconds": 1.456
}
```
where

- `text` is the most likely transcription of the audio data (string)
- `transcribe_seconds` is the number of seconds it took to transcribe (number)
- `likelihood` is a confidence value returned by the ASR system (number)
- `wav_seconds` is the duration of the WAV audio in seconds (number)
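As a quick sanity check of this format, here is a short sketch that parses such a response (the values are illustrative) and computes the real-time factor of the transcription:

```python
import json

# Illustrative /api/speech-to-text response in the format above
response = """
{
  "text": "turn on the living room lamp",
  "transcribe_seconds": 0.123,
  "likelihood": 0.321,
  "wav_seconds": 1.456
}
"""

result = json.loads(response)

# Real-time factor: transcription time relative to audio duration.
# Values below 1.0 mean the decoder runs faster than real time.
rtf = result["transcribe_seconds"] / result["wav_seconds"]
print(f"{result['text']} (real-time factor: {rtf:.2f})")
```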
Intents
The `/api/text-to-intent`, `/api/speech-to-intent`, `/api/listen-for-command`, and `/api/stop-recording` HTTP endpoints, as well as the `/api/events/intent` WebSocket endpoint, produce JSON in the following format:
```json
{
  "intent": {
    "name": "NameOfIntent",
    "confidence": 1.0
  },
  "entities": [
    { "entity": "entity_1", "value": "value_1", "raw_value": "value_1",
      "start": 0, "end": 1, "raw_start": 0, "raw_end": 1 },
    { "entity": "entity_2", "value": "value_2", "raw_value": "value_2",
      "start": 0, "end": 1, "raw_start": 0, "raw_end": 1 }
  ],
  "slots": {
    "entity_1": "value_1",
    "entity_2": "value_2"
  },
  "text": "transcription text with substitutions",
  "raw_text": "transcription text without substitutions",
  "tokens": ["transcription", "text", "with", "substitutions"],
  "raw_tokens": ["transcription", "text", "without", "substitutions"],
  "recognize_seconds": 0.001
}
```
where

- `intent` describes the recognized intent (object)
    - `name` is the name of the recognized intent (section headers in your sentences.ini) (string)
    - `confidence` is a value between 0 and 1, with 1 being maximally confident (number)
- `entities` is a list of recognized entities (list)
    - `entity` is the name of the slot (string)
    - `value` is the (substituted) value (string)
    - `raw_value` is the (non-substituted) value (string)
    - `start` is the zero-based start index of the entity in `text` (number)
    - `raw_start` is the zero-based start index of the entity in `raw_text` (number)
    - `end` is the zero-based end index (exclusive) of the entity in `text` (number)
    - `raw_end` is the zero-based end index (exclusive) of the entity in `raw_text` (number)
- `slots` is a dictionary of entities/values (object)
    - Assumes one value per entity. See `entities` for the complete list.
- `text` is the input text with substitutions (string)
- `raw_text` is the input text without substitutions (string)
- `tokens` is the list of words/tokens in `text` (list)
- `raw_tokens` is the list of words/tokens in `raw_text` (list)
- `recognize_seconds` is the number of seconds it took to recognize the intent and slots (number)
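Since `slots` keeps one value per entity, it can be reconstructed from `entities`, and the `start`/`end` indices slice directly into `text`. A small sketch with an illustrative payload:

```python
import json

# Illustrative intent payload in the format above
payload = json.loads("""
{
  "intent": {"name": "ChangeLightColor", "confidence": 1.0},
  "entities": [
    {"entity": "name", "value": "bedroom light", "raw_value": "bedroom light",
     "start": 8, "end": 21, "raw_start": 8, "raw_end": 21},
    {"entity": "color", "value": "red", "raw_value": "red",
     "start": 25, "end": 28, "raw_start": 25, "raw_end": 28}
  ],
  "slots": {"name": "bedroom light", "color": "red"},
  "text": "set the bedroom light to red",
  "raw_text": "set the bedroom light to red"
}
""")

# slots holds one value per entity name, so it can be rebuilt from entities
rebuilt = {e["entity"]: e["value"] for e in payload["entities"]}
assert rebuilt == payload["slots"]

# start/end are indices into text: text[start:end] recovers the entity value
for e in payload["entities"]:
    assert payload["text"][e["start"]:e["end"]] == e["value"]
```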
Pronunciation Dictionaries
Dictionaries are expected in plaintext, with the following format:
```
word1 P1 P2 P3
word2 P1 P4 P5
...
```
Each line starts with a word and, after some whitespace, a list of phonemes is given (separated by whitespace). These phonemes must match what the acoustic model was trained to recognize.
Multiple pronunciations for the same word are possible, and may optionally contain an index:
```
word P1 P2 P3
word(1) P2 P2 P3
word(2) P3 P2 P3
```
A `rhasspy` profile will typically contain 3 dictionaries:

- `base_dictionary.txt`
    - A large, pre-built dictionary with most of the words in a given language
- `custom_words.txt`
    - A small, user-defined dictionary with custom words or pronunciations
- `dictionary.txt`
    - Contains exactly the vocabulary needed for a profile
    - Automatically generated during training
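A minimal sketch of reading this format, including the optional `(N)` indices, might look like the following (a hypothetical helper, not part of Rhasspy):

```python
import re
from collections import defaultdict

def parse_dictionary(lines):
    """Map each word to its list of pronunciations (phoneme lists)."""
    pronunciations = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        word, *phonemes = line.split()
        # Strip an optional alternative-pronunciation index: word(1) -> word
        word = re.sub(r"\(\d+\)$", "", word)
        pronunciations[word].append(phonemes)
    return dict(pronunciations)

entries = parse_dictionary([
    "word P1 P2 P3",
    "word(1) P2 P2 P3",
    "word(2) P3 P2 P3",
])
# entries["word"] now holds three alternative pronunciations
```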
Language Models
Language models must be in plaintext ARPA format.
A `rhasspy` profile will typically contain 2 language models:

- `base_language_model.txt`
    - A large, pre-built language model that summarizes a given language
    - Used when the `open_transcription` setting is `true` for the ASR system (e.g., `speech_to_text.pocketsphinx.open_transcription`)
    - Used during language model mixing
- `language_model.txt`
    - Summarizes the valid voice commands for a profile
    - Automatically generated during training
Grapheme To Phoneme Models
A grapheme-to-phoneme (g2p) model helps guess the pronunciations of words outside of the dictionary. These models are trained on each profile's `base_dictionary.txt` file using phonetisaurus and saved in the OpenFST binary format.
G2P prediction can also be done using transformer models.
Command Line Tools
- `rhasspy-client` - Remote control of a Rhasspy server
- `rhasspy-nlu` - Converts `sentences.ini` to an intent graph
- `rhasspy-hermes` - Injects WAV files and other Hermes MQTT messages
- `rhasspy-supervisor` - Converts `profile.json` to a `supervisord` config or `docker-compose.yml`