Services

Rhasspy is composed of independent services that communicate over MQTT using a superset of the Hermes protocol for these components:

Web Server
Dialogue Manager
Audio Input
Wake Word Detection
Speech to Text
Intent Recognition
Intent Handling
Text to Speech
Audio Output

The rhasspy-supervisor tool converts your profile into a runnable configuration for either supervisord or docker-compose.

Each service sends and receives a specific set of MQTT messages. Message payloads are typically JSON objects, except for the following messages whose payloads are binary WAV audio:

hermes/audioServer/<siteId>/audioFrame
- WAV chunk from microphone
hermes/audioServer/<siteId>/playBytes/<requestId>
- WAV audio to play through speakers
rhasspy/asr/<siteId>/<sessionId>/audioCaptured
- WAV audio recorded from session
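Because only these three topics carry WAV bytes, a client has to decide per topic whether to parse the payload as JSON or as audio. The following Python sketch recognizes the three binary topics above and pulls out their path parameters. The function name `parse_binary_topic` is hypothetical, and it only covers the three topics listed here.

```python
import re
from typing import Optional

# Topic patterns for the three binary (WAV) messages listed above.
# Named groups capture the variable path segments.
_BINARY_TOPICS = {
    "audioFrame": re.compile(r"^hermes/audioServer/(?P<siteId>[^/]+)/audioFrame$"),
    "playBytes": re.compile(
        r"^hermes/audioServer/(?P<siteId>[^/]+)/playBytes/(?P<requestId>[^/]+)$"
    ),
    "audioCaptured": re.compile(
        r"^rhasspy/asr/(?P<siteId>[^/]+)/(?P<sessionId>[^/]+)/audioCaptured$"
    ),
}


def parse_binary_topic(topic: str) -> Optional[dict]:
    """Return the message kind plus its path parameters, or None if the
    topic is not one of the three binary messages listed above."""
    for kind, pattern in _BINARY_TOPICS.items():
        match = pattern.match(topic)
        if match:
            return {"kind": kind, **match.groupdict()}
    return None
```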

Most messages contain a string siteId property, whose default value is "default". Each service takes one or more --siteId <NAME> arguments that determine which site IDs the service will listen for. If not specified, the service will listen for all sites.
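The site ID rule above can be sketched in Python as two small helpers. Both `message_site_id` and `should_handle` are hypothetical names used for illustration. They are not taken from the Rhasspy code base, which may implement the check differently.

```python
import json
from typing import Iterable

DEFAULT_SITE_ID = "default"


def message_site_id(payload: bytes) -> str:
    """Read siteId from a JSON payload, falling back to "default"."""
    message = json.loads(payload)
    return message.get("siteId", DEFAULT_SITE_ID)


def should_handle(site_id: str, listen_site_ids: Iterable[str]) -> bool:
    """Mirror the rule above: no --siteId values means "listen for all
    sites"; otherwise the message's site must be one of the listed IDs."""
    allowed = set(listen_site_ids)
    return not allowed or site_id in allowed
```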

Internal vs. External MQTT

Rhasspy operates in one of two MQTT modes: internal or external. If you want to interact with Rhasspy over MQTT or use a server with satellites, it's important to understand the difference.

Internal MQTT

When Rhasspy is configured for internal MQTT (the default), a mosquitto broker is automatically started on port 12183 (override with --local-mqtt-port). All of Rhasspy's services will connect to this private broker and send messages through it.

(Figure: Internal MQTT broker)

If you're running Rhasspy inside Docker, make sure to add -p 12183:12183 to expose this port. Any downstream MQTT tools, like mosquitto_pub or Node-RED, will need their MQTT port changed to 12183.

External MQTT

If you have your own MQTT broker that you'd like Rhasspy to share, configure it for external MQTT mode. In this mode, Rhasspy will simply connect all of its services to your broker.

(Figure: External MQTT broker)
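The choice between the two modes comes down to which broker every service connects to. Here is a minimal Python sketch of that decision. `BrokerConfig` and `broker_address` are hypothetical. The sketch assumes the internal broker is reachable on `localhost` and that an external broker defaults to the standard MQTT port 1883 when no port is given.

```python
from dataclasses import dataclass
from typing import Optional

INTERNAL_MQTT_PORT = 12183  # default port of Rhasspy's private mosquitto broker


@dataclass
class BrokerConfig:
    mode: str                   # "internal" or "external"
    host: Optional[str] = None  # only used in external mode
    port: Optional[int] = None  # external broker port, or --local-mqtt-port override


def broker_address(config: BrokerConfig) -> tuple:
    """Return the (host, port) that every Rhasspy service would connect to."""
    if config.mode == "internal":
        # Assumption: services share the private broker on the same host.
        return ("localhost", config.port or INTERNAL_MQTT_PORT)
    if config.mode == "external":
        if not config.host:
            raise ValueError("external mode needs the address of your broker")
        # Assumption: fall back to the standard MQTT port 1883.
        return (config.host, config.port or 1883)
    raise ValueError(f"unknown MQTT mode: {config.mode!r}")
```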

Streaming audio from your microphone can sometimes cause congestion in an MQTT broker shared by many different services. For these scenarios, it's recommended to enable UDP audio streaming for both the Rhasspy audio input service and the wake word service. With UDP enabled, audio is not streamed over MQTT until the wake word has been detected, and MQTT streaming stops again once the voice command has been spoken.
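The UDP behavior described above is essentially a two-state switch. The Python class below sketches it. `AudioTransport` and its method names are hypothetical. In a real deployment, the transitions would be driven by the wake word and speech to text messages rather than by direct method calls.

```python
class AudioTransport:
    """Tracks where microphone audio should go when UDP streaming is on.

    Assumed lifecycle (from the description above): audio travels over UDP
    while waiting for the wake word, over MQTT while a voice command is
    being recorded, and back over UDP once the command has been spoken.
    """

    def __init__(self, udp_enabled: bool = True) -> None:
        self.udp_enabled = udp_enabled
        self.in_command = False

    def on_wake_word_detected(self) -> None:
        self.in_command = True

    def on_command_finished(self) -> None:
        self.in_command = False

    def route(self) -> str:
        """Return "mqtt" or "udp" for the next audio chunk."""
        if not self.udp_enabled or self.in_command:
            return "mqtt"
        return "udp"
```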

Web Server

Provides a graphical web interface for managing Rhasspy, and handles downloading language-specific profile artifacts.

Available Services

rhasspy-server-hermes
- alpine.js based web UI at http://YOUR_SERVER:12101
- Implements Rhasspy's HTTP API and websocket API

Dialogue Manager

Manages sessions initiated either by a wake word detection or by a hermes/dialogueManager/startSession message.

(Figure: Hermes dialogue message flow)
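To start a session without a wake word, a client publishes a startSession message. The Python sketch below builds one. `start_session_message` is a hypothetical helper, and the payload fields (`init.type`, `canBeEnqueued`, `customData`) follow the Hermes protocol as we understand it. Check them against the dialogue manager you actually run.

```python
import json
from typing import Optional


def start_session_message(
    site_id: str = "default",
    text: Optional[str] = None,
    custom_data: Optional[str] = None,
) -> tuple:
    """Build a hermes/dialogueManager/startSession message that opens an
    "action" session (the kind a voice command would use)."""
    payload = {
        "siteId": site_id,
        # Assumed Hermes field names for the session initialization.
        "init": {"type": "action", "text": text, "canBeEnqueued": True},
        "customData": custom_data,
    }
    return "hermes/dialogueManager/startSession", json.dumps(payload).encode()
```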

Audio Input

Records audio from a microphone and streams it as WAV chunks over MQTT. See Audio Input for details.

Available Services

rhasspy-microphone-cli-hermes
- Calls an external program for audio input
- Implements arecord and command
rhasspy-microphone-pyaudio-hermes
- Records directly from a PyAudio device
- Implements pyaudio

Input Messages

rhasspy/audioServer/getDevices
- Requests available input devices

Output Messages

hermes/audioServer/<siteId>/audioFrame
- WAV chunk from microphone
rhasspy/audioServer/devices
- Description of available audio input devices
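Each audioFrame payload is a small but complete WAV file, not raw samples. The Python sketch below splits a raw PCM stream into such chunks using only the standard `wave` module. The audio format (16 kHz, 16-bit, mono) and the chunk size of 1024 samples are assumptions chosen for illustration, and `audio_frames` is a hypothetical name.

```python
import io
import wave
from typing import Iterator, Tuple

# Assumed audio format: 16 kHz, 16-bit, mono PCM.
SAMPLE_RATE = 16000
SAMPLE_WIDTH = 2  # bytes per sample
CHANNELS = 1


def wav_chunk(pcm: bytes) -> bytes:
    """Wrap raw PCM samples in a complete WAV container."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setframerate(SAMPLE_RATE)
        wav_file.setsampwidth(SAMPLE_WIDTH)
        wav_file.setnchannels(CHANNELS)
        wav_file.writeframes(pcm)
    return buffer.getvalue()


def audio_frames(pcm: bytes, site_id: str = "default",
                 samples_per_chunk: int = 1024) -> Iterator[Tuple[str, bytes]]:
    """Split a PCM stream into (topic, WAV chunk) pairs for audioFrame."""
    topic = f"hermes/audioServer/{site_id}/audioFrame"
    step = samples_per_chunk * SAMPLE_WIDTH * CHANNELS
    for start in range(0, len(pcm), step):
        yield topic, wav_chunk(pcm[start:start + step])
```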

Wake Word Detection

Listens to WAV chunks and tries to detect a wake word (also called a hotword). See Wake Word for details.

Available Services

rhasspy-wake-raven-hermes
- Implements raven
rhasspy-wake-pocketsphinx-hermes
- Implements pocketsphinx
rhasspy-wake-porcupine-hermes
- Implements porcupine
rhasspy-wake-precise-hermes
- Implements precise
rhasspy-wake-snowboy-hermes
- Implements snowboy

Input Messages

hermes/hotword/toggleOn
- Enables wake word detection
hermes/hotword/toggleOff
- Disables wake word detection
rhasspy/hotword/getHotwords
- Request available hotwords

Output Messages

hermes/hotword/<wakewordId>/detected
- Wake word successfully detected
rhasspy/hotword/hotwords
- Description of available hotwords
hermes/error/hotword
- Wake word system error
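Clients usually subscribe to wake word detections with an MQTT wildcard, such as hermes/hotword/+/detected, and then read the wake word ID from the topic. The Python sketch below implements MQTT filter matching (where `+` matches one level and a trailing `#` matches the rest) and uses it to extract the ID. Both function names are hypothetical.

```python
from typing import Optional


def topic_matches(subscription: str, topic: str) -> bool:
    """Check a topic against an MQTT subscription filter.

    "+" matches exactly one topic level; "#" (last level only) matches
    the parent level and any number of levels below it. Topics beginning
    with "$" are not special-cased here.
    """
    sub_levels = subscription.split("/")
    topic_levels = topic.split("/")
    for i, sub_level in enumerate(sub_levels):
        if sub_level == "#":
            return True
        if i >= len(topic_levels):
            return False
        if sub_level != "+" and sub_level != topic_levels[i]:
            return False
    return len(sub_levels) == len(topic_levels)


def wakeword_id(topic: str) -> Optional[str]:
    """Return <wakewordId> from a hotword detected topic, else None."""
    if topic_matches("hermes/hotword/+/detected", topic):
        return topic.split("/")[2]
    return None
```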

Speech to Text

Listens to WAV chunks and transcribes voice commands. See Speech to Text for details.

Available Services

rhasspy-asr-kaldi-hermes
- Implements kaldi
rhasspy-asr-pocketsphinx-hermes
- Implements pocketsphinx
rhasspy-asr-deepspeech-hermes
- Implements deepspeech
rhasspy-remote-http-hermes
- POSTs to remote web server for speech recognition
- Implements remote (--asr-url) and command (--asr-command)

Input Messages

hermes/audioServer/<siteId>/audioFrame
- WAV chunk from microphone for a site
hermes/audioServer/<siteId>/<sessionId>/audioSessionFrame
- WAV chunk from microphone for a session
hermes/asr/toggleOn
- Enable ASR system
hermes/asr/toggleOff
- Disable ASR system
hermes/asr/startListening
- Start recording a voice command
hermes/asr/stopListening
- Stop recording a voice command
rhasspy/asr/<siteId>/train
- Re-train ASR system
rhasspy/g2p/pronounce
- Get phonetic pronunciations for words

Output Messages

hermes/asr/textCaptured
- Successful voice command transcription
hermes/error/asr
- Error during transcription/training
rhasspy/asr/<siteId>/trainSuccess
- ASR training succeeded
rhasspy/asr/<siteId>/<sessionId>/audioCaptured
- Audio recorded from voice command
- Sent when sendAudioCaptured = true in startListening
rhasspy/g2p/phonemes
- Phonetic pronunciations of words
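The sketch below builds a startListening request that asks for the recorded audio back, and it computes the audioCaptured topic where that audio would arrive. The helper `start_listening` is hypothetical. Apart from siteId, sessionId and sendAudioCaptured (which appear above), the field name stopOnSilence is an assumption about the payload.

```python
import json
import uuid


def start_listening(site_id: str = "default", session_id: str = "",
                    send_audio_captured: bool = False) -> tuple:
    """Build a hermes/asr/startListening message.

    With sendAudioCaptured set, the ASR service is expected to publish the
    recorded command on rhasspy/asr/<siteId>/<sessionId>/audioCaptured.
    """
    session_id = session_id or str(uuid.uuid4())  # generate one if missing
    payload = {
        "siteId": site_id,
        "sessionId": session_id,
        "stopOnSilence": True,  # assumed field name
        "sendAudioCaptured": send_audio_captured,
    }
    captured_topic = f"rhasspy/asr/{site_id}/{session_id}/audioCaptured"
    return "hermes/asr/startListening", json.dumps(payload).encode(), captured_topic
```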

Intent Recognition

Recognizes user intents from text input. See Intent Recognition for details.

Available Services

rhasspy-fuzzywuzzy-hermes
- Implements fuzzywuzzy
rhasspy-nlu-hermes
- Implements fsticuffs
rhasspy-rasa-nlu-hermes
- Implements rasa
rhasspy-remote-http-hermes
- POSTs to remote web server for intent recognition
- Implements remote (--nlu-url) and command (--nlu-command)

Input Messages

hermes/nlu/query
- Recognize intent from text
rhasspy/nlu/<siteId>/train
- Retrain NLU system

Output Messages

hermes/intent/<intentName>
- Intent successfully recognized
hermes/nlu/intentNotRecognized
- Intent was not recognized
hermes/error/nlu
- Error during recognition/training
rhasspy/nlu/<siteId>/trainSuccess
- NLU training succeeded
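A successful recognition is published on a topic that includes the intent name, while a failure goes to one shared topic. The Python sketch below shows both the query and this routing rule. The helper names are hypothetical, and the `input` payload field is an assumption based on the Hermes protocol.

```python
import json
from typing import Optional


def nlu_query(text: str, site_id: str = "default") -> tuple:
    """Build a hermes/nlu/query message asking for the intent of `text`."""
    return "hermes/nlu/query", json.dumps({"input": text, "siteId": site_id}).encode()


def result_topic(intent_name: Optional[str]) -> str:
    """Pick the output topic for a recognition result: one topic per intent
    name on success, a single shared topic when nothing matched."""
    if intent_name:
        return f"hermes/intent/{intent_name}"
    return "hermes/nlu/intentNotRecognized"
```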

Intent Handling

Dispatches recognized intents to home automation software. See Intent Handling for details.

Available Services

rhasspy-homeassistant-hermes
- Implements homeassistant
rhasspy-remote-http-hermes
- POSTs to remote web server for intent handling
- Implements remote (--handle-url) and command (--handle-command)

Input Messages

hermes/intent/<intentName>
- Intent successfully recognized
hermes/handle/toggleOn
- Enable intent handling
hermes/handle/toggleOff
- Disable intent handling

Output Messages

hermes/tts/say
- Speak a sentence
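This hypothetical Python handler ties the input and output messages above together. While enabled, it answers each hermes/intent/<intentName> message with a hermes/tts/say message. The response text is a placeholder. A real handler, such as the Home Assistant service, would act on the intent first.

```python
import json
from typing import Optional, Tuple


class IntentHandler:
    """Hypothetical handler: turns recognized intents into hermes/tts/say
    responses while enabled (toggled by hermes/handle/toggleOn/Off)."""

    def __init__(self) -> None:
        self.enabled = True

    def on_message(self, topic: str, payload: bytes) -> Optional[Tuple[str, bytes]]:
        if topic == "hermes/handle/toggleOff":
            self.enabled = False
        elif topic == "hermes/handle/toggleOn":
            self.enabled = True
        elif topic.startswith("hermes/intent/") and self.enabled:
            intent = json.loads(payload)
            name = topic[len("hermes/intent/"):]
            say = {
                "text": f"Handled {name}",  # placeholder response text
                "siteId": intent.get("siteId", "default"),
                "sessionId": intent.get("sessionId"),
            }
            return "hermes/tts/say", json.dumps(say).encode()
        return None
```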

Text to Speech

Generates spoken audio for a sentence. See Text to Speech for details.

Available Services

rhasspy-tts-cli-hermes
- Calls external program for text to speech
- Implements espeak, flite, picoTTS, nanoTTS, marytts, opentts, and command
rhasspy-tts-larynx-hermes
- Uses the Larynx text to speech system (based on MozillaTTS)
- Implements larynx
rhasspy-tts-wavenet-hermes
- Uses Google's WaveNet
- Implements wavenet
rhasspy-remote-http-hermes
- POSTs to remote web server for text to speech
- Implements remote (--tts-url) and command (--tts-command)

Input Messages

hermes/tts/say
- Speak a sentence
rhasspy/tts/getVoices
- Request available voices

Output Messages

hermes/tts/sayFinished
- Finished generating audio (the audio itself is played via playBytes)
rhasspy/tts/voices
- Description of available voices
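The TTS service does not play audio itself. It hands the WAV data to the audio output service and then reports completion. The Python sketch below lists the messages for one say request. It assumes that the say request's id is reused as the playBytes requestId, and `tts_outputs` is a hypothetical name.

```python
import json
import uuid
from typing import List, Tuple


def tts_outputs(say_payload: bytes, wav_audio: bytes) -> List[Tuple[str, bytes]]:
    """Messages a TTS service would publish for one hermes/tts/say request."""
    say = json.loads(say_payload)
    site_id = say.get("siteId", "default")
    say_id = say.get("id") or str(uuid.uuid4())  # assumed: reused as requestId
    return [
        # The audio itself goes to the audio output service for playback.
        (f"hermes/audioServer/{site_id}/playBytes/{say_id}", wav_audio),
        # Then the TTS service reports that the say request is done.
        ("hermes/tts/sayFinished",
         json.dumps({"id": say_id, "siteId": site_id}).encode()),
    ]
```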

Audio Output

Plays WAV audio through an audio output device (speakers). See Audio Output for details.

Available Services

rhasspy-speakers-cli-hermes
- Implements remote and command
- POSTs to remote web server for audio output

Input Messages

hermes/audioServer/<siteId>/playBytes/<requestId>
- WAV audio to play through speakers
hermes/audioServer/toggleOff
- Disable audio output
hermes/audioServer/toggleOn
- Enable audio output
rhasspy/audioServer/getDevices
- Request audio output devices

Output Messages

hermes/audioServer/<siteId>/playFinished
- Audio has finished playing
rhasspy/audioServer/devices
- Details of audio output devices
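The playBytes/playFinished pair is a request/response keyed by requestId. The hypothetical Python class below sketches an audio output handler for it. Instead of driving speakers, it measures each WAV file's duration with the standard `wave` module. The playFinished payload fields are assumptions, and in this sketch audio is simply ignored while output is disabled.

```python
import io
import json
import wave
from typing import Optional, Tuple


class AudioOutput:
    """Hypothetical audio output handler for the messages listed above."""

    def __init__(self) -> None:
        self.enabled = True
        self.played_seconds = 0.0  # total duration of audio "played"

    def on_message(self, topic: str, payload: bytes) -> Optional[Tuple[str, bytes]]:
        parts = topic.split("/")
        if topic == "hermes/audioServer/toggleOff":
            self.enabled = False
        elif topic == "hermes/audioServer/toggleOn":
            self.enabled = True
        elif (len(parts) == 5 and parts[:2] == ["hermes", "audioServer"]
              and parts[3] == "playBytes" and self.enabled):
            site_id, request_id = parts[2], parts[4]
            # Stand-in for playback: measure how long the audio would take.
            with wave.open(io.BytesIO(payload), "rb") as wav_file:
                self.played_seconds += wav_file.getnframes() / wav_file.getframerate()
            # Echo the requestId so the sender can match the reply.
            done = {"id": request_id, "siteId": site_id}
            return f"hermes/audioServer/{site_id}/playFinished", json.dumps(done).encode()
        return None
```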

Rhasspy Supervisor

The rhasspy-supervisor tool transforms a Rhasspy profile.json file into:

A supervisord.conf file that can be run with supervisord
- Runs services on the local machine
- Requires you to have service executables in your PATH (e.g., rhasspy-server-hermes)
A docker-compose.yml file that can be run with docker-compose
- Runs services inside a virtual Docker network
- Requires Docker and docker-compose to be installed
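The supervisord option amounts to one [program:x] section per service. The Python sketch below renders such a file from a mapping of section names to command lines. It only illustrates the file format. The real output of rhasspy-supervisor will contain different programs, arguments and options, and `supervisord_conf` is a hypothetical name.

```python
import shlex
from typing import Dict, List


def supervisord_conf(services: Dict[str, List[str]]) -> str:
    """Render a minimal supervisord.conf with one [program:x] section per
    service. `services` maps a section name to its command-line arguments."""
    lines = ["[supervisord]", "nodaemon=true", ""]
    for name, argv in services.items():
        lines += [
            f"[program:{name}]",
            "command=" + shlex.join(argv),  # quote arguments safely
            "stopasgroup=true",
            "",
        ]
    return "\n".join(lines)
```

For example, `supervisord_conf({"wake": ["rhasspy-wake-porcupine-hermes", "--siteId", "default"]})` yields a [program:wake] section whose command line runs that service for the default site.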

When you start Rhasspy, it automatically runs rhasspy-supervisor to generate these files in your profile directory. What happens next depends on how you've installed Rhasspy.

Supervisord Restart

When running Rhasspy using supervisord, the process ID (PID) of the supervisord process will be written to a file named supervisord.pid in your profile directory. If a restart is requested from the web interface, a SIGHUP is sent to this PID, causing supervisord to re-read its configuration file and stop/start all child processes.
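The restart described above is a single signal. The Python sketch below reads supervisord.pid from the profile directory and sends SIGHUP (available on Unix only). `restart_supervisord` is a hypothetical helper, and its `kill` parameter exists only so the sketch can be exercised without a running supervisord.

```python
import os
import signal
from pathlib import Path
from typing import Callable


def restart_supervisord(profile_dir: str,
                        kill: Callable[[int, int], None] = os.kill) -> int:
    """Send SIGHUP to the supervisord PID stored in <profile>/supervisord.pid.

    supervisord reacts to SIGHUP by stopping all programs, re-reading its
    configuration, and starting them again. `kill` is injectable for testing.
    """
    pid = int(Path(profile_dir, "supervisord.pid").read_text().strip())
    kill(pid, signal.SIGHUP)
    return pid
```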

Docker Compose Restart

If you run Rhasspy using docker-compose, restarting is more complicated than with supervisord, because docker-compose.yml must be rewritten after a profile change and the entire Docker container stack brought down and back up again.

A wrapper script like get-rhasspy.sh needs to monitor the profile directory for a file named .restart_docker. When a restart is requested via the web interface, this file is written and a timeout is set. The wrapper script should restart docker-compose (using down and then up) and then delete the .restart_docker file. Once it's deleted, the web interface will reload the user's page.
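The wrapper script's job can be sketched as a polling loop. The Python sketch below is not get-rhasspy.sh itself. The function names, the one-second poll interval, and the use of `docker-compose --file <profile>/docker-compose.yml` are assumptions. The `run` parameter lets the single polling step be exercised without Docker.

```python
import subprocess
import time
from pathlib import Path
from typing import Callable, List


def handle_restart_request(
    profile_dir: str,
    run: Callable[[List[str]], object] = lambda cmd: subprocess.run(cmd, check=True),
) -> bool:
    """One polling step: if .restart_docker exists, bring the stack down and
    back up, then delete the marker so the web interface reloads the page."""
    marker = Path(profile_dir, ".restart_docker")
    if not marker.exists():
        return False
    compose_file = str(Path(profile_dir, "docker-compose.yml"))
    run(["docker-compose", "--file", compose_file, "down"])
    run(["docker-compose", "--file", compose_file, "up", "-d"])
    marker.unlink()
    return True


def watch(profile_dir: str, poll_seconds: float = 1.0) -> None:
    """Poll the profile directory forever (stop with Ctrl+C)."""
    while True:
        handle_restart_request(profile_dir)
        time.sleep(poll_seconds)
```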