Audio Input
Rhasspy can listen to audio input from a local microphone or a remote audio stream. Most of the local audio testing has been done with a USB PlayStation Eye camera.
MQTT/Hermes
Rhasspy receives audio over MQTT using the Hermes protocol: specifically, audio chunks in the WAV format on the topic hermes/audioServer/<siteId>/audioFrame
To avoid unnecessary conversion overhead, the WAV audio should be 16-bit 16Khz mono.
PulseAudio
See this example for a way use Rhasspy with PulseAudio.
Set your audio input system to ALSA, which will pass through to PulseAudio.
PyAudio
Streams microphone data from a PyAudio device. This is the default audio input system, and should work with both ALSA and PulseAudio.
Add to your profile:
"microphone": {
"system": "pyaudio",
"pyaudio": {
"device": ""
}
}
Set microphone.pyaudio.device
to a PyAudio device number or leave blank for the default device.
Streams 2048 byte chunks of 16-bit, 16 kHz mono audio by default.
UDP Audio Streaming
By default, audio will streamed over MQTT in WAV chunks. When using Rhasspy in a base station/satellite setup, it may be desirable to only send audio to the MQTT broker after the satellite as woken up. For this case, set both microphone.pyaudio.udp_audio
and wake.<WAKE_SYSTEM>.udp_audio
to the same free port number on your satellite. This will cause the microphone service to stream over UDP until an asr/startListening
message is received. It will go back to UDP stream when an asr/stopListening
.
Implemented by rhasspy-microphone-pyaudio-hermes
ALSA
Starts an arecord
process locally and reads audio data from its standard out.
Works best with ALSA.
Add to your profile:
"microphone": {
"system": "arecord",
"arecord": {
"device": ""
}
}
Set microphone.arecord.device
to the name of the ALSA device to use (-D
flag
to arecord
) or leave blank for the default device.
By default, calls arecord -t raw -r 16000 -f S16_LE -c 1
and reads 2048 byte chunks of audio data at a time.
UDP Audio Streaming
By default, audio will streamed over MQTT in WAV chunks. When using Rhasspy in a base station/satellite setup, it may be desirable to only send audio to the MQTT broker after the satellite as woken up. For this case, set both microphone.arecord.udp_audio
and wake.<WAKE_SYSTEM>.udp_audio
to the same free port number on your satellite. This will cause the microphone service to stream over UDP until an asr/startListening
message is received. It will go back to UDP stream when an asr/stopListening
.
Implemented by rhasspy-microphone-cli-hermes
Command
Calls an external program to record audio. RAW audio data is expected from the program's standard out.
Add to your profile:
"microphone": {
"system": "command",
"command": {
"record_program": "/path/to/record/program",
"record_arguments": [],
"sample_rate": 16000,
"sample_width": 2,
"channels": 1,
"list_program": "/path/to/list/program",
"list_arguments": [],
"test_program": "/path/to/test/program",
"test_arguments": []
}
}
The microphone.command.record_program
is executed when Rhasspy starts. It should output raw PCM audio data on its standard out. The sample_rate
(Hertz), sample_width
(bytes), and channels
parameters tell Rhasspy the format of the raw audio data.
If provided, the microphone.command.list_program
will be executed when a rhasspy/audioServer/getDevices
message is received and the test
field is false
. The program should return a listing of available audio output devices in the same format as arecord -L
.
If provided, the microphone.command.test_program
will be executed when a rhasspy/audioServer/getDevices
message is received and the test
field is true
. This program is called for each device returned by list_command
. The test_program
and its arguments are send to Python's str.format
with the device name as the only argument, so {}
in test_program
or test_arguments
will be replaced with it.
Implemented by rhasspy-microphone-cli-hermes
GStreamer
As of Rhasspy 2.5, you can use gstreamer
through the command microphone system.
Add to your profile:
"microphone": {
"system": "command",
"command": {
"record_program": "gst-launch-1.0",
"record_arguments": "udpsrc port=12333 ! rawaudioparse use-sink-caps=false format=pcm pcm-format=s16le sample-rate=16000 num-channels=1 ! queue ! audioconvert ! audioresample ! filesink location=/dev/stdout",
"sample_rate": 16000,
"sample_width": 2,
"channels": 1
}
}
This command receives raw 16-bit 16 kHz audio chunks via UDP port 12333. If you're using Docker, make sure to add -p 12333:12333/udp
to your docker run
command.
You can then stream microphone audio to Rhasspy from another computer by running the following terminal command:
gst-launch-1.0 \
autoaudiosrc ! \
audioconvert ! \
audioresample ! \
audio/x-raw, rate=16000, channels=1, format=S16LE ! \
udpsink host=RHASSPY_SERVER port=12333
where RHASSPY_SERVER
is the hostname of your Rhasspy server (e.g., localhost
). You may need to install the gstreamer1.0-tools
and gstreamer1.0-plugins-good
packages first.
The official Rhasspy Docker image contains the "good" plugin set for GStreamer, which includes a wide variety of ways to stream/transform audio.
Dummy
Disables microphone recording.
Add to your profile:
"microphone": {
"system": "dummy"
}
See rhasspy.audio_recorder.DummyAudioRecorder
for details.