Services
Rhasspy is composed of independent services that communicate over MQTT using a superset of the Hermes protocol for these components:
- Web Server
- Dialogue Manager
- Audio Input
- Wake Word Detection
- Speech to Text
- Intent Recognition
- Intent Handling
- Text to Speech
- Audio Output
The rhasspy-supervisor tool converts your profile into a runnable configuration for either supervisord or docker-compose.
Each service sends and receives a specific set of MQTT messages. Message payloads are typically JSON objects, except for the following messages whose payloads are binary WAV audio:
hermes/audioServer/<siteId>/audioFrame
- WAV chunk from microphone
hermes/audioServer/<siteId>/playBytes/<requestId>
- WAV audio to play through speakers
rhasspy/asr/<siteId>/<sessionId>/audioCaptured
- WAV audio recorded from session
Most messages contain a string siteId property, whose default value is "default". Each service takes one or more --site-id <NAME> arguments that determine which site IDs the service will listen for. If none are specified, the service will listen for all sites.
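The payload and site rules above can be sketched in a few lines of Python. This is only an illustration, not Rhasspy's own code: the helper names (is_wav_topic, decode_payload, wanted_site) are hypothetical, and the patterns cover only the three binary topics listed above.

```python
import json
import re

# Patterns for the three binary WAV topics listed above.
# Each <placeholder> is assumed to match exactly one topic level.
BINARY_TOPIC_PATTERNS = [
    re.compile(r"^hermes/audioServer/[^/]+/audioFrame$"),
    re.compile(r"^hermes/audioServer/[^/]+/playBytes/[^/]+$"),
    re.compile(r"^rhasspy/asr/[^/]+/[^/]+/audioCaptured$"),
]


def is_wav_topic(topic: str) -> bool:
    """Return True if the topic carries binary WAV audio instead of JSON."""
    return any(pattern.match(topic) for pattern in BINARY_TOPIC_PATTERNS)


def decode_payload(topic: str, payload: bytes):
    """Return raw bytes for WAV topics, otherwise the parsed JSON value."""
    if is_wav_topic(topic):
        return payload
    return json.loads(payload.decode("utf-8"))


def wanted_site(message: dict, site_ids=None) -> bool:
    """Mimic site filtering: no configured site IDs means every site is accepted."""
    if not site_ids:
        return True
    return message.get("siteId", "default") in site_ids
```

A client would call decode_payload on every received message and then drop JSON messages for which wanted_site returns False.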
Internal vs. External MQTT
Rhasspy operates in one of two MQTT modes: internal or external. If you want to interact with Rhasspy over MQTT or use a server with satellites, it's important to understand the difference.
Internal MQTT
When Rhasspy is configured for internal MQTT (the default), a mosquitto broker is automatically started on port 12183 (override with --local-mqtt-port). All of Rhasspy's services will connect to this private broker and send messages through it.
If you're running Rhasspy inside Docker, make sure to add -p 12183:12183 to expose this port. Any downstream MQTT tools, like mosquitto_pub or Node-RED, will need to have the MQTT port changed to 12183.
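As a rough illustration of pointing a downstream tool at the internal broker, the following Python sketch builds a mosquitto_pub command line that targets port 12183. The helper name and the example payload fields (text, siteId) are assumptions made for the example; adjust the host to wherever Rhasspy is running.

```python
import json
import shlex

INTERNAL_MQTT_PORT = 12183  # default internal broker port described above


def mosquitto_pub_command(topic, payload, host="localhost", port=INTERNAL_MQTT_PORT):
    """Build a mosquitto_pub argument list that publishes a JSON payload."""
    return [
        "mosquitto_pub",
        "-h", host,
        "-p", str(port),
        "-t", topic,
        "-m", json.dumps(payload),
    ]


# Example: ask Rhasspy to speak a sentence (payload fields are assumed).
command = mosquitto_pub_command("hermes/tts/say", {"text": "hello", "siteId": "default"})
print(shlex.join(command))
```

The resulting list can be passed to subprocess.run, or the printed line can be pasted into a shell.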
External MQTT
If you have your own MQTT broker that you'd like Rhasspy to share, configure it for external MQTT mode. In this mode, Rhasspy will simply connect all of its services to your broker.
Streaming audio from your microphone can sometimes cause congestion in an MQTT broker shared by many different services. For these scenarios, it's recommended to enable UDP audio streaming for both the Rhasspy audio input service and wake word service. This will disable MQTT audio streaming until the wake word has been detected, and again after a voice command has been spoken.
Web Server
Provides a graphical web interface for managing Rhasspy, and handles downloading language-specific profile artifacts.
Available Services
- rhasspy-server-hermes
- Vue.js based web UI at http://YOUR_SERVER:12101
- Implements Rhasspy's HTTP API and websocket API
Dialogue Manager
Manages sessions initiated by a wake word detection or a startSession message.
Available Services
Input Messages
hermes/dialogueManager/startSession
- Start a new session
hermes/dialogueManager/continueSession
- Continue an existing session
hermes/dialogueManager/endSession
- End an existing session
Output Messages
hermes/dialogueManager/sessionStarted
- New session has started
hermes/dialogueManager/sessionQueued
- New session has been enqueued
hermes/dialogueManager/sessionEnded
- Existing session has terminated
hermes/dialogueManager/intentNotRecognized
- Voice command was not recognized in existing session
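A session can be started and ended by publishing JSON to the input topics above. The sketch below only builds the messages; the payload fields (siteId, init, customData, sessionId, text) follow the Hermes protocol as commonly documented and should be treated as assumptions to verify against the Rhasspy reference. The function names are hypothetical.

```python
import json


def start_session_message(text=None, site_id="default", custom_data=None):
    """Build (topic, payload) for a startSession request of type "action"."""
    init = {"type": "action", "canBeEnqueued": True}
    if text is not None:
        init["text"] = text  # optional sentence spoken before listening
    payload = {"siteId": site_id, "init": init, "customData": custom_data}
    return "hermes/dialogueManager/startSession", json.dumps(payload)


def end_session_message(session_id, text=None):
    """Build (topic, payload) for ending an existing session."""
    payload = {"sessionId": session_id}
    if text is not None:
        payload["text"] = text  # optional sentence spoken before closing
    return "hermes/dialogueManager/endSession", json.dumps(payload)
```

The session ID passed to end_session_message would come from the sessionStarted output message.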
Audio Input
Records audio from a microphone and streams it as WAV chunks over MQTT. See Audio Input for details.
Available Services
Input Messages
rhasspy/audioServer/getDevices
- Requests available input devices
Output Messages
hermes/audioServer/<siteId>/audioFrame
- WAV chunk from microphone
rhasspy/audioServer/devices
- Description of available audio input devices
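Each audioFrame payload is a complete WAV file rather than bare samples. A minimal sketch with Python's standard wave module shows how raw PCM could be wrapped into such a chunk and unwrapped again. The 16 kHz, 16-bit, mono defaults are an assumption made for the example, and the function names are hypothetical.

```python
import io
import wave


def pcm_to_wav_chunk(pcm: bytes, rate=16000, width=2, channels=1) -> bytes:
    """Wrap raw PCM samples in a WAV container suitable for an audioFrame payload."""
    with io.BytesIO() as buffer:
        with wave.open(buffer, "wb") as wav_file:
            wav_file.setframerate(rate)
            wav_file.setsampwidth(width)
            wav_file.setnchannels(channels)
            wav_file.writeframes(pcm)
        return buffer.getvalue()


def wav_chunk_to_pcm(chunk: bytes):
    """Return (rate, sample width, channels, pcm bytes) from a WAV chunk."""
    with io.BytesIO(chunk) as buffer:
        with wave.open(buffer, "rb") as wav_file:
            frames = wav_file.readframes(wav_file.getnframes())
            return (
                wav_file.getframerate(),
                wav_file.getsampwidth(),
                wav_file.getnchannels(),
                frames,
            )


def audio_frame_topic(site_id="default") -> str:
    """Topic on which a microphone chunk for the given site is published."""
    return f"hermes/audioServer/{site_id}/audioFrame"
```

Because every chunk carries its own header, a receiver can read the sample rate and width from the chunk itself.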
Wake Word Detection
Listens to WAV chunks and tries to detect a wake/hotword. See Wake Word for details.
Available Services
- rhasspy-wake-raven-hermes
- Implements raven
- rhasspy-wake-pocketsphinx-hermes
- Implements pocketsphinx
- rhasspy-wake-porcupine-hermes
- Implements porcupine
- rhasspy-wake-precise-hermes
- Implements precise
- rhasspy-wake-snowboy-hermes
- Implements snowboy
Input Messages
hermes/hotword/toggleOn
- Enables wake word detection
hermes/hotword/toggleOff
- Disables wake word detection
rhasspy/hotword/getHotwords
- Request available hotwords
Output Messages
hermes/hotword/<wakewordId>/detected
- Wake word successfully detected
rhasspy/hotword/hotwords
- Description of available hotwords
hermes/error/hotword
- Wake word system error
Speech to Text
Listens to WAV chunks and transcribes voice commands. See Speech to Text for details.
Available Services
- rhasspy-asr-kaldi-hermes
- Implements kaldi
- rhasspy-asr-pocketsphinx-hermes
- Implements pocketsphinx
- rhasspy-asr-deepspeech-hermes
- Implements deepspeech
- rhasspy-remote-http-hermes
- POSTs to remote web server for speech recognition
- Implements remote (--asr-url) and command (--asr-command)
Input Messages
hermes/audioServer/<siteId>/audioFrame
- WAV chunk from microphone for a site
hermes/audioServer/<siteId>/<sessionId>/audioSessionFrame
- WAV chunk from microphone for a session
hermes/asr/toggleOn
- Enable ASR system
hermes/asr/toggleOff
- Disable ASR system
hermes/asr/startListening
- Start recording a voice command
hermes/asr/stopListening
- Stop recording a voice command
rhasspy/asr/<siteId>/train
- Re-train ASR system
rhasspy/g2p/pronounce
- Get phonetic pronunciations for words
Output Messages
hermes/asr/textCaptured
- Successful voice command transcription
hermes/error/asr
- Error during transcription/training
rhasspy/asr/<siteId>/trainSuccess
- ASR training succeeded
rhasspy/asr/<siteId>/<sessionId>/audioCaptured
- Audio recorded from voice command
- Sent when sendAudioCaptured = true in startListening
rhasspy/g2p/phonemes
- Phonetic pronunciations of words
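To receive the recorded audio of a voice command, startListening needs sendAudioCaptured set to true. The sketch below builds that request and the topic on which the audio would then arrive. The siteId and sessionId payload fields are assumed from the Hermes conventions used elsewhere on this page, and the function names are hypothetical.

```python
import json


def start_listening_message(session_id, site_id="default", send_audio_captured=True):
    """Build (topic, payload) asking the ASR service to record a voice command."""
    payload = {
        "siteId": site_id,
        "sessionId": session_id,
        "sendAudioCaptured": send_audio_captured,
    }
    return "hermes/asr/startListening", json.dumps(payload)


def audio_captured_topic(site_id, session_id) -> str:
    """Topic carrying the recorded WAV audio for one session."""
    return f"rhasspy/asr/{site_id}/{session_id}/audioCaptured"
```

A client would subscribe to audio_captured_topic before publishing the startListening message so that the recording is not missed.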
Intent Recognition
Recognizes user intents from text input. See Intent Recognition for details.
Available Services
- rhasspy-fuzzywuzzy-hermes
- Implements fuzzywuzzy
- rhasspy-nlu-hermes
- Implements fsticuffs
- rhasspy-rasa-nlu-hermes
- Implements rasa
- rhasspy-remote-http-hermes
- POSTs to remote web server for intent recognition
- Implements remote (--nlu-url) and command (--nlu-command)
Input Messages
hermes/nlu/query
- Recognize intent from text
rhasspy/nlu/<siteId>/train
- Retrain NLU system
Output Messages
hermes/intent/<intentName>
- Intent successfully recognized
hermes/nlu/intentNotRecognized
- Intent was not recognized
hermes/error/nlu
- Error during recognition/training
rhasspy/nlu/<siteId>/trainSuccess
- NLU training succeeded
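Intent recognition can be exercised without any audio by publishing text to hermes/nlu/query and watching hermes/intent/<intentName>. The following sketch builds the query and extracts the intent name from a result topic. The input field name is assumed from the Hermes protocol, and both function names are hypothetical.

```python
import json

INTENT_PREFIX = "hermes/intent/"


def nlu_query_message(text, site_id="default", session_id=None):
    """Build (topic, payload) asking the NLU service to recognize an intent."""
    payload = {"input": text, "siteId": site_id, "sessionId": session_id}
    return "hermes/nlu/query", json.dumps(payload)


def intent_name_from_topic(topic):
    """Return the intent name from a hermes/intent/<intentName> topic, else None."""
    if topic.startswith(INTENT_PREFIX) and len(topic) > len(INTENT_PREFIX):
        return topic[len(INTENT_PREFIX):]
    return None
```

Subscribing to hermes/intent/# and hermes/nlu/intentNotRecognized covers both possible outcomes of a query.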
Intent Handling
Dispatches recognized intents to home automation software. See Intent Handling for details.
Available Services
- rhasspy-homeassistant-hermes
- Implements homeassistant
- rhasspy-remote-http-hermes
- POSTs to remote web server for intent handling
- Implements remote (--handle-url) and command (--handle-command)
Input Messages
hermes/intent/<intentName>
- Intent successfully recognized
hermes/handle/toggleOn
- Enable intent handling
hermes/handle/toggleOff
- Disable intent handling
Output Messages
hermes/tts/say
- Speak a sentence
Text to Speech
Generates spoken audio for a sentence. See Text to Speech for details.
Available Services
- rhasspy-tts-cli-hermes
- rhasspy-tts-larynx-hermes
- Uses Larynx text to speech system (based on MozillaTTS)
- Implements larynx
- rhasspy-tts-wavenet-hermes
- rhasspy-remote-http-hermes
- POSTs to remote web server for text to speech
- Implements remote (--tts-url) and command (--tts-command)
Input Messages
hermes/tts/say
- Speak a sentence
rhasspy/tts/getVoices
- Request available voices
Output Messages
hermes/tts/sayFinished
- Finished generating audio (actually spoken with playBytes)
rhasspy/tts/voices
- Description of available voices
Audio Output
Plays WAV audio through an audio output device (speakers). See Audio Output for details.
Available Services
- rhasspy-speakers-cli-hermes
- Implements remote and command
- POSTs to remote web server for audio output
Input Messages
hermes/audioServer/<siteId>/playBytes/<requestId>
- WAV audio to play through speakers
hermes/audioServer/toggleOff
- Disable audio output
hermes/audioServer/toggleOn
- Enable audio output
rhasspy/audioServer/getDevices
- Request audio output devices
Output Messages
hermes/audioServer/<siteId>/playFinished
- Audio has finished playing
rhasspy/audioServer/devices
- Details of audio output devices
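The playBytes topic embeds both the site and a request identifier. The sketch below shows one way to build the topic pair for a playback request. Generating the request ID with a UUID is an assumption made for the example, as are the function names.

```python
import uuid


def play_bytes_topic(site_id="default", request_id=None):
    """Return (topic, request_id) for sending WAV audio to the speakers."""
    request_id = request_id or str(uuid.uuid4())
    return f"hermes/audioServer/{site_id}/playBytes/{request_id}", request_id


def play_finished_topic(site_id="default") -> str:
    """Topic announcing that playback on the given site has finished."""
    return f"hermes/audioServer/{site_id}/playFinished"
```

The WAV bytes themselves are the message payload. Keeping the returned request ID makes it possible to tell which playback a later playFinished message refers to, assuming that message echoes the identifier.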
Rhasspy Supervisor
The rhasspy-supervisor tool transforms a Rhasspy profile.json file into:
- A supervisord.conf file that can be run with supervisord
- Runs services on the local machine
- Requires you to have service executables in your PATH (e.g., rhasspy-server-hermes)
- A docker-compose.yml file that can be run with docker-compose
- Runs services inside a virtual Docker network
- Requires Docker and docker-compose to be installed
When you start Rhasspy, it automatically runs rhasspy-supervisor to generate these files in your profile directory. What happens next depends on how you've installed Rhasspy.
Supervisord Restart
When running Rhasspy using supervisord, the process ID (PID) of the supervisord process will be written to a file named supervisord.pid in your profile directory. If a restart is requested from the web interface, a SIGHUP is sent to this PID, causing supervisord to re-read its configuration file and stop/start all child processes.
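The restart mechanism amounts to reading the PID file and signalling that process. A minimal sketch for Unix-like systems, not Rhasspy's actual implementation: the function name is hypothetical, and the kill parameter exists only so the logic can be tried without signalling a real process.

```python
import os
import signal
from pathlib import Path


def request_supervisord_restart(profile_dir, kill=os.kill):
    """Send SIGHUP to the supervisord process recorded in supervisord.pid."""
    pid_path = Path(profile_dir) / "supervisord.pid"
    pid = int(pid_path.read_text().strip())
    kill(pid, signal.SIGHUP)  # supervisord reloads its configuration on SIGHUP
    return pid
```

Calling request_supervisord_restart with the path of your profile directory would have the same effect as requesting a restart from the web interface, assuming supervisord is running and the PID file is current.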
Docker Compose Restart
If you run Rhasspy using docker-compose, the restart process is a bit more complicated than with supervisord. This is due to the need to re-write docker-compose.yml on a profile change and bring the entire Docker container stack down and back up again.
A wrapper script like get-rhasspy.sh needs to monitor the profile directory for a file named .restart_docker. When a restart is requested via the web interface, this file is written and a timeout is set. The wrapper script should restart docker-compose (using down and then up), and then delete the .restart_docker file. Once it's deleted, the web interface will reload the user's page.
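The wrapper's job can be sketched as a single polling step. This is an illustration in Python rather than the contents of get-rhasspy.sh: the function name is hypothetical, running docker-compose from the profile directory and starting it detached with -d are assumptions, and the run parameter allows a dry run without Docker.

```python
import subprocess
from pathlib import Path


def restart_if_requested(profile_dir, run=subprocess.run):
    """Restart the docker-compose stack if .restart_docker exists.

    Returns True when a restart was performed, False otherwise.
    """
    marker = Path(profile_dir) / ".restart_docker"
    if not marker.exists():
        return False
    # Bring the whole stack down, then back up with the regenerated file.
    run(["docker-compose", "down"], cwd=profile_dir, check=True)
    run(["docker-compose", "up", "-d"], cwd=profile_dir, check=True)
    # Deleting the marker signals the web interface that the restart is done.
    marker.unlink()
    return True
```

A wrapper would call this function in a loop with a short sleep between checks, so that the marker is deleted before the web interface's timeout expires.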