Voice Chat

You're cooking dinner and your hands are covered in flour. You need to know how long to roast the chicken at 375°. You could wash your hands, dry them, unlock your phone, type the question... or you could just say "Hey Meggy, how long do I roast a chicken at 375?" and get an answer spoken back to you.

Voice Chat turns Meggy into a hands-free assistant. It listens for your voice, understands what you say, thinks about the answer using the same powerful AI engine behind text conversations, and speaks the response back to you. Same tools, same memory, same intelligence — just no keyboard required.

How It Works

Voice Chat is built on a four-stage pipeline:

1. Wake Word Detection

Meggy listens for its wake word — "Hey Meggy" — using a lightweight local detection model. This runs continuously in the background without sending any audio to the cloud. When the wake word is detected, the microphone activates and recording begins.

You can also use push-to-talk mode if you prefer — hold a key to speak, release to send. Both modes are available in settings.
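The two activation modes can be pictured as a small state machine. The sketch below is illustrative only: the names `Mode`, `Event`, and `next_recording_state` are hypothetical and do not come from Meggy's code, and it assumes that in wake word mode recording stops when end of speech is detected.

```python
from enum import Enum, auto


class Mode(Enum):
    WAKE_WORD = auto()
    PUSH_TO_TALK = auto()


class Event(Enum):
    WAKE_WORD_HEARD = auto()
    SPEECH_ENDED = auto()   # assumed to come from end-of-speech detection
    KEY_DOWN = auto()
    KEY_UP = auto()


# Hypothetical transition table: (mode, recording?, event) -> recording?
_TRANSITIONS = {
    (Mode.WAKE_WORD, False, Event.WAKE_WORD_HEARD): True,
    (Mode.WAKE_WORD, True, Event.SPEECH_ENDED): False,
    (Mode.PUSH_TO_TALK, False, Event.KEY_DOWN): True,
    (Mode.PUSH_TO_TALK, True, Event.KEY_UP): False,
}


def next_recording_state(mode: Mode, recording: bool, event: Event) -> bool:
    """Return whether the microphone should be recording after `event`.

    Events that do not apply in the current mode or state are ignored,
    so the recording flag is returned unchanged.
    """
    return _TRANSITIONS.get((mode, recording, event), recording)
```

Keeping the transitions in one table makes it easy to see that a key press does nothing in wake word mode, and that the wake word does nothing in push-to-talk mode.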

2. Speech-to-Text (STT)

Once you finish speaking, your audio is transcribed into text. Meggy supports multiple STT providers:

Provider   Model                  Runs Locally?
OpenAI     Whisper                Cloud
Google     Cloud Speech-to-Text   Cloud
Local      Whisper.cpp            ✅ Yes, fully on-device

If privacy is your priority, the local Whisper option means your voice never leaves your machine.
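As a rough illustration of the privacy trade-off, the provider table can be modeled as data and filtered by a "local only" preference. This is a sketch under assumptions: `SttProvider`, `PROVIDERS`, and `allowed_providers` are hypothetical names, not part of Meggy's settings or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SttProvider:
    name: str
    model: str
    runs_locally: bool


# Rows taken from the provider table; the identifiers are illustrative.
PROVIDERS = [
    SttProvider("OpenAI", "Whisper", runs_locally=False),
    SttProvider("Google", "Cloud Speech-to-Text", runs_locally=False),
    SttProvider("Local", "Whisper.cpp", runs_locally=True),
]


def allowed_providers(local_only: bool) -> list[SttProvider]:
    """Return the providers compatible with the privacy preference.

    With local_only=True, only on-device providers remain, so audio
    would never be sent to a cloud service.
    """
    return [p for p in PROVIDERS if p.runs_locally or not local_only]
```

With `local_only=True` the only remaining option is Whisper.cpp, which matches the privacy note above.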

3. AI Processing

The transcribed text is processed through the exact same AI pipeline as typed messages, so Voice Chat has access to the same tools and memory as text conversations.

You can ask voice questions that trigger tool calls — "What's the weather like?", "Turn off the bedroom lights", "Add milk to my shopping list" — and Meggy will use the appropriate tools to fulfill the request.

4. Text-to-Speech (TTS)

The AI's response is spoken back to you using natural-sounding voice synthesis:

Provider     Voices                                                        Quality
ElevenLabs   Hundreds of natural voices                                    Premium, highly expressive
OpenAI       6 built-in voices (Alloy, Echo, Fable, Onyx, Nova, Shimmer)   High quality, fast
Google       Cloud TTS with multiple languages                             Good quality, wide language support

In settings, you can choose your preferred provider and voice and adjust the speed and pitch.
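Stages 2 through 4 can be sketched as three functions chained together. This is a minimal sketch, not Meggy's implementation: the type aliases, `handle_utterance`, and the `fake_*` stubs are all hypothetical, and the stubs stand in for real STT, AI, and TTS providers so the example runs on its own.

```python
from typing import Callable

# Hypothetical stage signatures; real providers would sit behind these.
Transcribe = Callable[[bytes], str]   # stage 2: audio -> text
Respond = Callable[[str], str]        # stage 3: text -> reply text
Synthesize = Callable[[str], bytes]   # stage 4: reply text -> audio


def handle_utterance(audio: bytes, transcribe: Transcribe,
                     respond: Respond, synthesize: Synthesize) -> bytes:
    """Run stages 2-4 on audio captured after activation (stage 1)."""
    text = transcribe(audio)
    reply = respond(text)  # the same handler a typed message would use
    return synthesize(reply)


# Stub stages so the sketch runs without any provider.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_respond(text: str) -> str:
    return f"You said: {text}"


def fake_synthesize(reply: str) -> bytes:
    return reply.encode("utf-8")


spoken = handle_utterance(b"add milk to my shopping list",
                          fake_transcribe, fake_respond, fake_synthesize)
```

Because the middle stage only sees text, swapping the STT or TTS provider does not change how the request itself is handled.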

Voice Activity Detection (VAD)

Meggy uses Voice Activity Detection to know when you've finished speaking. VAD analyzes the audio stream in real time to detect speech boundaries — it knows when you start talking and when you stop, so it doesn't cut you off mid-sentence or wait awkwardly after you've finished.
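How Meggy's VAD works internally is not described here. As a simple illustration of the general idea, the sketch below uses an energy threshold with a "hangover" period, one of the most basic endpointing techniques; production systems often use trained models instead. The function names, the threshold, and the hangover length are all assumptions chosen for the example.

```python
import math
from typing import Iterable, Optional, Sequence


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one frame of samples in [-1.0, 1.0]."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def find_end_of_speech(frames: Iterable[Sequence[float]],
                       threshold: float = 0.02,
                       hangover_frames: int = 3) -> Optional[int]:
    """Return the index of the frame at which the utterance is declared over.

    Speech starts at the first frame whose energy exceeds `threshold`.
    It ends once `hangover_frames` consecutive quiet frames follow, so a
    short pause mid-sentence does not cut the speaker off. Returns None
    if speech never started or had not ended when the frames ran out.
    """
    in_speech = False
    quiet_run = 0
    for i, frame in enumerate(frames):
        if rms(frame) > threshold:
            in_speech = True
            quiet_run = 0          # any loud frame resets the pause counter
        elif in_speech:
            quiet_run += 1
            if quiet_run >= hangover_frames:
                return i
    return None
```

The hangover length is the trade-off the paragraph above describes: too short and a brief pause ends the recording early, too long and the assistant waits noticeably after the speaker has finished.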

Platform Support

Voice Chat works on all supported platforms.

Setting Up Voice Chat

  1. Open Settings → Voice Chat
  2. Choose your STT provider (OpenAI Whisper recommended for best accuracy)
  3. Choose your TTS provider (ElevenLabs for the most natural voices)
  4. Select a voice from the provider's catalog
  5. Toggle wake word detection on if you want hands-free activation
  6. Start talking!

Voice Chat integrates with all of Meggy's channels — you can start a voice conversation on desktop and continue it via text on WhatsApp, or vice versa. It's all the same conversation, the same memory, the same assistant.