Audio Input

Rhasspy can listen to audio input from a local microphone or from a remote audio stream. Most of the local audio testing has been done with a USB PlayStation Eye camera.

PyAudio

Streams microphone data from a PyAudio device. This is the default audio input system, and should work with both ALSA and PulseAudio.

Add to your profile:

"microphone": {
  "system": "pyaudio",
  "pyaudio": {
    "device": "",
    "frames_per_buffer": 480
  }
}

Set microphone.pyaudio.device to a PyAudio device number or leave blank for the default device. By default, Rhasspy streams 30 ms chunks of 16-bit, 16 kHz mono audio (480 frames per chunk).
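
The relationship between chunk duration, frames_per_buffer, and bytes can be checked with a little arithmetic. The following is a minimal sketch, not Rhasspy code; the function and constant names are made up for illustration.

```python
# Audio format described above: 16-bit, 16 kHz, mono.
SAMPLE_RATE_HZ = 16000
SAMPLE_WIDTH_BYTES = 2  # 16 bits per sample
CHANNELS = 1


def frames_per_chunk(chunk_ms: int, sample_rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Number of audio frames in a chunk lasting chunk_ms milliseconds."""
    return sample_rate_hz * chunk_ms // 1000


def bytes_per_chunk(chunk_ms: int) -> int:
    """Size in bytes of the same chunk for 16-bit mono audio."""
    return frames_per_chunk(chunk_ms) * SAMPLE_WIDTH_BYTES * CHANNELS


if __name__ == "__main__":
    print(frames_per_chunk(30))  # 480 frames, the frames_per_buffer default
    print(bytes_per_chunk(30))   # 960 bytes
```

If you change frames_per_buffer, the same formula gives the chunk duration you are asking for (for example, 960 frames would be 60 ms).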

See rhasspy.audio_recorder.PyAudioRecorder for details.

ALSA

Starts an arecord process locally and reads audio data from its standard output. Works best with ALSA.

Add to your profile:

"microphone": {
  "system": "arecord",
  "arecord": {
    "device": "",
    "chunk_size": 960
  }
}

Set microphone.arecord.device to the name of the ALSA device to use (-D flag to arecord) or leave blank for the default device. By default, calls arecord -t raw -r 16000 -f S16_LE -c 1 and reads 30ms (960 bytes) of audio data at a time.
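
As a rough illustration of this approach (not the actual ARecordAudioRecorder implementation), the sketch below builds the arecord command described above and reads fixed-size chunks from its standard output. The helper names are hypothetical, and appending -D at the end of the command is an assumption about argument order.

```python
import subprocess
from typing import Iterator, List


def build_arecord_command(device: str = "") -> List[str]:
    """Build the arecord command line for raw 16-bit, 16 kHz mono audio."""
    command = ["arecord", "-t", "raw", "-r", "16000", "-f", "S16_LE", "-c", "1"]
    if device:
        # Only pass -D when a device name is configured.
        command.extend(["-D", device])
    return command


def read_chunks(device: str = "", chunk_size: int = 960) -> Iterator[bytes]:
    """Yield chunk_size-byte blocks of audio until arecord exits."""
    process = subprocess.Popen(build_arecord_command(device), stdout=subprocess.PIPE)
    try:
        while True:
            chunk = process.stdout.read(chunk_size)
            if not chunk:
                break  # arecord closed its output
            yield chunk
    finally:
        process.terminate()
        process.wait()
```

Calling read_chunks() requires arecord to be installed; build_arecord_command() can be used on its own to check what would be run.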

See rhasspy.audio_recorder.ARecordAudioRecorder for details.

MQTT/Hermes

Listens to the hermes/audioServer/<SITE_ID>/audioFrame topic for WAV data (Hermes protocol). This allows Rhasspy to receive audio from Snips.AI. Audio data is automatically converted to 16-bit, 16 kHz mono with sox.

Add to your profile:

"microphone": {
  "system": "hermes"
},

"mqtt": {
  "enabled": true,
  "host": "localhost",
  "username": "",
  "port": 1883,
  "password": "",
  "site_id": "default",
  "tls": {
    "enabled": false,
    "ca_certs": "",
    "cert_reqs": "CERT_REQUIRED",
    "certfile": "",
    "ciphers": "",
    "keyfile": ""
  }
}

Adjust the mqtt configuration to connect to your MQTT broker. Set mqtt.site_id to match your Snips.AI siteId.
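
To make the message format concrete, here is a small sketch that builds the audioFrame topic for a site ID and unpacks a WAV payload using only the Python standard library. It is an illustration, not Rhasspy's HermesAudioRecorder; the function names are hypothetical, and it does not resample (Rhasspy uses sox for that step).

```python
import io
import wave
from typing import Tuple


def audio_frame_topic(site_id: str = "default") -> str:
    """Topic on which audio frames for a given site are published."""
    return f"hermes/audioServer/{site_id}/audioFrame"


def wav_payload_to_pcm(payload: bytes) -> Tuple[Tuple[int, int, int], bytes]:
    """Return ((sample rate, sample width in bytes, channels), raw PCM data)."""
    with wave.open(io.BytesIO(payload), "rb") as wav_file:
        params = (
            wav_file.getframerate(),
            wav_file.getsampwidth(),
            wav_file.getnchannels(),
        )
        pcm = wav_file.readframes(wav_file.getnframes())
    return params, pcm


def pcm_to_wav_payload(pcm: bytes, rate: int = 16000, width: int = 2, channels: int = 1) -> bytes:
    """Wrap raw PCM in a WAV container, e.g. to build a test message."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setframerate(rate)
        wav_file.setsampwidth(width)
        wav_file.setnchannels(channels)
        wav_file.writeframes(pcm)
    return buffer.getvalue()
```

With an MQTT client of your choice, you would subscribe to audio_frame_topic(site_id) and pass each message payload to wav_payload_to_pcm.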

See rhasspy.audio_recorder.HermesAudioRecorder for details.

HTTP Stream

Accepts chunks of 16-bit 16 kHz mono audio via an HTTP POST stream (assumes chunked transfer encoding).

Add to your profile:

"microphone": {
  "system": "http",
  "http": {
    "host": "127.0.0.1",
    "port": 12333,
    "stop_after": "never"
  }
}

Set microphone.http.stop_after to one of "never", "text", or "intent". When set to "never", you can continuously stream (chunked) audio into Rhasspy across multiple voice commands. When set to "text" or "intent", the stream will be closed when the first voice command has been transcribed ("text") or recognized ("intent"). Once closed, you can perform an HTTP GET request to the stream URL to retrieve the result (text for transcriptions or JSON for intent).

Note that microphone.http.port must be different from the port of Rhasspy's web server (usually 12101).
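
A client for this input could look roughly like the sketch below, which uses Python's standard http.client module. The URL path "/" is a placeholder assumption (this page does not state the stream URL), the function names are hypothetical, and when the server sends its response likely depends on stop_after.

```python
import http.client
from typing import Iterator


def iter_chunks(pcm: bytes, chunk_size: int = 960) -> Iterator[bytes]:
    """Split raw 16-bit, 16 kHz mono PCM into chunk_size-byte pieces."""
    for start in range(0, len(pcm), chunk_size):
        yield pcm[start:start + chunk_size]


def stream_audio(pcm: bytes, host: str = "127.0.0.1", port: int = 12333, path: str = "/") -> int:
    """POST the audio as a chunked stream and return the HTTP status code."""
    connection = http.client.HTTPConnection(host, port)
    try:
        # An iterable body without a Content-Length header is sent using
        # chunked transfer encoding by http.client (Python 3.6+).
        connection.request("POST", path, body=iter_chunks(pcm))
        return connection.getresponse().status
    finally:
        connection.close()


def fetch_result(host: str = "127.0.0.1", port: int = 12333, path: str = "/") -> bytes:
    """GET the stream URL after the stream has closed to read the result."""
    connection = http.client.HTTPConnection(host, port)
    try:
        connection.request("GET", path)
        return connection.getresponse().read()
    finally:
        connection.close()
```

With stop_after set to "text" or "intent", you would call stream_audio() with a recorded command and then fetch_result() to read the transcription or intent JSON.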

See rhasspy.audio_recorder.HTTPAudioRecorder for details.

GStreamer

Receives audio chunks via stdout from a GStreamer pipeline.

Add to your profile:

"microphone": {
  "system": "gstreamer",
  "gstreamer": {
    "pipeline": "..."
  }
}

Set microphone.gstreamer.pipeline to your GStreamer pipeline without a sink (this will be added by Rhasspy). By default, the pipeline is:

udpsrc port=12333 ! rawaudioparse use-sink-caps=false format=pcm pcm-format=s16le sample-rate=16000 num-channels=1 ! queue ! audioconvert ! audioresample

which "simply" receives raw 16-bit 16 kHz audio chunks via UDP port 12333. You could stream microphone audio to Rhasspy from another machine by running the following terminal command:

gst-launch-1.0 \
    autoaudiosrc ! \
    audioconvert ! \
    audioresample ! \
    audio/x-raw, rate=16000, channels=1, format=S16LE ! \
    udpsink host=RHASSPY_SERVER port=12333

where RHASSPY_SERVER is the hostname of your Rhasspy server (e.g., localhost).
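
Because the default pipeline just parses raw PCM arriving on a UDP port, already-recorded audio can in principle be sent from a script as well. The sketch below is an assumption-laden illustration rather than a tested Rhasspy client: the function name is hypothetical, and it assumes that plain datagrams of 16-bit, 16 kHz mono PCM, sent at roughly real-time speed, are acceptable to udpsrc and rawaudioparse.

```python
import socket
import time


def send_pcm_over_udp(pcm: bytes, host: str, port: int = 12333,
                      chunk_size: int = 960, realtime: bool = True) -> int:
    """Send raw PCM as UDP datagrams and return how many were sent."""
    # Duration of one chunk of 16-bit (2-byte) mono audio at 16 kHz.
    chunk_seconds = chunk_size / (16000 * 2)
    sent = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for start in range(0, len(pcm), chunk_size):
            sock.sendto(pcm[start:start + chunk_size], (host, port))
            sent += 1
            if realtime:
                # Pace the stream so the receiver is not flooded.
                time.sleep(chunk_seconds)
    return sent
```

For example, send_pcm_over_udp(pcm, "RHASSPY_SERVER") would send the contents of pcm in 30 ms datagrams, with RHASSPY_SERVER replaced by your server's hostname.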

The Rhasspy Docker images contain the "good" plugin set for GStreamer, which includes a wide variety of ways to stream and transform audio.

See rhasspy.audio_recorder.GStreamerAudioRecorder for details.

Dummy

Disables microphone recording.

Add to your profile:

"microphone": {
  "system": "dummy"
}

See rhasspy.audio_recorder.DummyAudioRecorder for details.