Voice Chat

You're cooking dinner and your hands are covered in flour. You need to know how long to roast the chicken at 375°F. You could wash your hands, dry them, unlock your phone, and type the question. Or you could just say "Hey Meggy, how long do I roast a chicken at 375?" and have the answer spoken back to you.

Voice Chat turns Meggy into a hands-free assistant. It listens for your voice, understands what you say, thinks about the answer using the same powerful AI engine behind text conversations, and speaks the response back to you. Same tools, same memory, same intelligence — just no keyboard required.

How It Works

Voice Chat is built on a four-stage pipeline:

1. Wake Word Detection

Meggy listens for its wake word, "Hey Meggy", using a lightweight detection model that runs on your device. The model runs continuously in the background and never sends audio to the cloud. When it detects the wake word, Meggy starts recording your request.

You can also use push-to-talk mode if you prefer — hold a key to speak, release to send. Both modes are available in settings.
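The two activation modes can be pictured as a small state machine. The sketch below is illustrative only: the class, the mode strings, and the event names are hypothetical and are not Meggy's actual API. It also assumes that a wake-word recording ends when speech stops, while a push-to-talk recording ends when the key is released.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"            # waiting for the wake word or a key press
    RECORDING = "recording"  # capturing the user's request


class Activation:
    """Hypothetical controller; event names are illustrative, not Meggy's API."""

    def __init__(self, mode: str = "wake_word"):
        if mode not in ("wake_word", "push_to_talk"):
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode
        self.state = State.IDLE

    def handle(self, event: str) -> State:
        # The event that starts a recording depends on the selected mode.
        start = "wake_word_detected" if self.mode == "wake_word" else "key_down"
        # Assumption: wake-word mode stops on end of speech, push-to-talk on key release.
        stop = "speech_ended" if self.mode == "wake_word" else "key_up"

        if self.state is State.IDLE and event == start:
            self.state = State.RECORDING
        elif self.state is State.RECORDING and event == stop:
            self.state = State.IDLE
        # Any other event is ignored and leaves the state unchanged.
        return self.state
```

In this sketch, a key press in wake-word mode is simply ignored, which keeps the two modes from interfering with each other.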

2. Speech-to-Text (STT)

Once you finish speaking, your audio is transcribed into text. Meggy supports multiple STT providers:

Provider	Model	Runs Locally?
OpenAI	Whisper	No (cloud)
Google	Cloud Speech-to-Text	No (cloud)
Local	Whisper.cpp	✅ Yes, fully on-device

If privacy is your priority, the local Whisper option means your voice never leaves your machine.

3. AI Processing

The transcribed text is processed through the exact same AI pipeline as typed messages. This means Voice Chat has access to:

All 110+ built-in tools
Your vault documents
Unified memory (facts, preferences, episodes)
Active skills
Connected agents

You can ask voice questions that trigger tool calls — "What's the weather like?", "Turn off the bedroom lights", "Add milk to my shopping list" — and Meggy will use the appropriate tools to fulfill the request.

4. Text-to-Speech (TTS)

The AI's response is spoken back to you using natural-sounding voice synthesis:

Provider	Voices	Quality
ElevenLabs	Hundreds of natural voices	Premium, highly expressive
OpenAI	6 built-in voices (Alloy, Echo, Fable, Onyx, Nova, Shimmer)	High quality, fast
Google	Cloud TTS with multiple languages	Good quality, wide language support

In settings, you can choose your provider and preferred voice, and adjust the speaking speed and pitch.
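To make the flow concrete, here is a minimal sketch of how stages 2 through 4 might be chained once a recording exists. Everything in it is an assumption for illustration: `VoicePipeline` and its three callables are hypothetical stand-ins for whichever STT, AI, and TTS providers are configured, and the stub stages just pass text through so the example runs without a microphone or network access.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoicePipeline:
    """Hypothetical wiring of the stages; each callable stands in for a provider."""
    transcribe: Callable[[bytes], str]   # stage 2: audio -> text (STT)
    respond: Callable[[str], str]        # stage 3: text -> reply (same path as typed chat)
    synthesize: Callable[[str], bytes]   # stage 4: reply -> audio (TTS)

    def run(self, recorded_audio: bytes) -> bytes:
        # Stage 1 (wake word or push-to-talk) is assumed to have produced recorded_audio.
        text = self.transcribe(recorded_audio)
        if not text.strip():
            return b""  # nothing intelligible was said, so stay silent
        reply = self.respond(text)
        return self.synthesize(reply)


# Stub stages so the sketch runs without audio hardware or network access.
pipeline = VoicePipeline(
    transcribe=lambda audio: audio.decode("utf-8"),
    respond=lambda text: f"You asked: {text}",
    synthesize=lambda reply: reply.encode("utf-8"),
)
```

Keeping each stage behind a plain callable is one way to let the STT and TTS providers be swapped independently, which matches the provider choices described above.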

Voice Activity Detection (VAD)

Meggy uses Voice Activity Detection to know when you've finished speaking. VAD analyzes the audio stream in real time to detect speech boundaries — it knows when you start talking and when you stop, so it doesn't cut you off mid-sentence or wait awkwardly after you've finished.
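One simple way to find speech boundaries is an energy threshold with a short "hangover" that tolerates brief pauses. The sketch below shows that technique only. It is not a description of Meggy's VAD, which may use a trained model instead, and the threshold and hangover values are assumed for the example.

```python
import math
from typing import Iterable, Optional, Sequence, Tuple


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one audio frame (samples in [-1.0, 1.0])."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def find_utterance(
    frames: Iterable[Sequence[float]],
    threshold: float = 0.02,    # assumed energy level separating speech from silence
    hangover_frames: int = 3,   # assumed number of quiet frames tolerated inside speech
) -> Optional[Tuple[int, int]]:
    """Return (start, end) frame indices of the first utterance, end exclusive."""
    start: Optional[int] = None
    last_voiced = -1
    quiet_run = 0
    for i, frame in enumerate(frames):
        voiced = rms(frame) >= threshold
        if start is None:
            if voiced:
                start, last_voiced = i, i  # speech begins here
            continue
        if voiced:
            last_voiced, quiet_run = i, 0
        else:
            quiet_run += 1
            # A short pause keeps the utterance open; a longer one closes it.
            if quiet_run > hangover_frames:
                return (start, last_voiced + 1)
    if start is None:
        return None  # no speech was found at all
    return (start, last_voiced + 1)
```

The hangover is what prevents a mid-sentence pause from ending the recording: a larger value waits longer before deciding you have finished, at the cost of a slower response.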

Platform Support

Voice Chat works on all supported platforms:

macOS — Full support including wake word
Windows — Full support including wake word
Linux — Full support including wake word

Setting Up Voice Chat

1. Open Settings → Voice Chat
2. Choose your STT provider (OpenAI Whisper recommended for best accuracy)
3. Choose your TTS provider (ElevenLabs for the most natural voices)
4. Select a voice from the provider's catalog
5. Toggle wake word detection on if you want hands-free activation
6. Start talking!
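The choices made in these steps can be thought of as a small settings record. The sketch below is purely illustrative: the provider names come from the tables above, but the field names, the identifiers, and the defaults are hypothetical and do not reflect how Meggy actually stores its settings.

```python
from dataclasses import dataclass
from typing import Optional

# Provider names come from the tables above; the identifiers are hypothetical.
STT_PROVIDERS = {"openai": False, "google": False, "local": True}  # value: runs on-device?
TTS_PROVIDERS = {"elevenlabs", "openai", "google"}


@dataclass(frozen=True)
class VoiceSettings:
    stt_provider: str = "openai"      # assumed default, following the accuracy recommendation
    tts_provider: str = "elevenlabs"  # assumed default, following the voice recommendation
    voice: Optional[str] = None       # a voice name from the chosen provider's catalog
    wake_word_enabled: bool = False

    def __post_init__(self) -> None:
        # Reject providers that are not listed in the tables above.
        if self.stt_provider not in STT_PROVIDERS:
            raise ValueError(f"unknown STT provider: {self.stt_provider}")
        if self.tts_provider not in TTS_PROVIDERS:
            raise ValueError(f"unknown TTS provider: {self.tts_provider}")

    @property
    def audio_leaves_device_for_stt(self) -> bool:
        # True when transcription is handled by a cloud provider.
        return not STT_PROVIDERS[self.stt_provider]
```

For example, `VoiceSettings(stt_provider="local", wake_word_enabled=True)` would describe a hands-free setup in which transcription stays on-device.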

Voice Chat integrates with all of Meggy's channels — you can start a voice conversation on desktop and continue it via text on WhatsApp, or vice versa. It's all the same conversation, the same memory, the same assistant.