Imagine giving your AI assistant its own phone number — one that anyone can call, from any phone in the world. No app download, no account, no screen required. Just dial, speak, and get an intelligent response spoken back to you.
Phone Chat makes this possible. It bridges Meggy to the public switched telephone network (PSTN) through Twilio, turning your desktop AI into a full phone agent that handles real calls.
When someone dials your Twilio number, the call flows through a multi-stage pipeline that converts telephone audio into text, processes it through Meggy's AI engine, and speaks the response back — all in real time.
Twilio receives the call and sends a webhook to Meggy's local HTTP server. Meggy responds with TwiML instructions to connect the call via a WebSocket Media Stream. A greeting message — "Hello, how can I help you?" — plays automatically.
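As a rough illustration of the webhook response described above, the sketch below builds a TwiML document with Python's standard library. `<Response>`, `<Say>`, `<Connect>`, and `<Stream>` are standard TwiML verbs, but the function name `build_twiml`, the example stream URL, and the choice to play the greeting with `<Say>` (rather than through Meggy's own TTS over the stream) are assumptions, not Meggy's confirmed implementation.

```python
from xml.etree import ElementTree as ET


def build_twiml(stream_url: str, greeting: str = "Hello, how can I help you?") -> str:
    """Return TwiML that plays a greeting, then opens a bidirectional Media Stream.

    Hypothetical helper: Meggy's real webhook handler may differ.
    """
    response = ET.Element("Response")

    # Assumption: the greeting is spoken by Twilio's <Say> verb.
    say = ET.SubElement(response, "Say")
    say.text = greeting

    # <Connect><Stream> hands the call audio to a WebSocket endpoint.
    connect = ET.SubElement(response, "Connect")
    ET.SubElement(connect, "Stream", url=stream_url)

    return ET.tostring(response, encoding="unicode")


if __name__ == "__main__":
    # The URL is a placeholder for wherever the local server is reachable.
    print(build_twiml("wss://example.invalid/media"))
```

The webhook handler would return this string with an XML content type; Twilio then opens the WebSocket named in the `url` attribute.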
If Meggy is already on a call, the new caller is placed in a queue with a hold message. The queue holds up to 3 callers by default. If the queue is full, callers hear a polite busy message and are disconnected.
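The queueing rule can be sketched as a small state machine. This is a minimal model under stated assumptions: the class name `CallQueue`, its methods, and the string results (`"answer"`, `"hold"`, `"busy"`) are hypothetical and only mirror the behavior described above (one active call, up to 3 waiting callers by default).

```python
from collections import deque
from typing import Optional


class CallQueue:
    """Tracks one active call plus a bounded line of waiting callers."""

    def __init__(self, max_waiting: int = 3):
        self.max_waiting = max_waiting
        self.active: Optional[str] = None
        self.waiting: deque = deque()

    def on_incoming(self, call_id: str) -> str:
        """Decide what a new caller hears."""
        if self.active is None:
            self.active = call_id
            return "answer"          # nobody on the line: take the call
        if len(self.waiting) < self.max_waiting:
            self.waiting.append(call_id)
            return "hold"            # play the hold message
        return "busy"                # queue full: busy message, then disconnect

    def on_hangup(self, call_id: str) -> Optional[str]:
        """Remove a caller; return the next call to answer, if any."""
        if call_id == self.active:
            self.active = self.waiting.popleft() if self.waiting else None
            return self.active
        if call_id in self.waiting:
            self.waiting.remove(call_id)   # a waiting caller gave up
        return None
```

With the default limit, the first caller is answered, the next three are put on hold, and a fifth caller gets the busy message.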
Phone audio arrives as G.711 µ-law encoded audio at 8kHz — the standard telephone codec. Meggy decodes and resamples this to 16kHz PCM for the voice pipeline:
| Direction | Source | Conversion | Target |
|---|---|---|---|
| Inbound | µ-law 8kHz | Decode → Resample | PCM 16kHz (for VAD + STT) |
| Outbound | PCM 24kHz (TTS) | Resample → Encode | µ-law 8kHz (for Twilio) |
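For downsampling, the codec uses box-average decimation: each output sample is the mean of a fixed-size group of consecutive input samples. Because 24kHz to 8kHz is an exact 3:1 ratio, no fractional interpolation is needed, which avoids ringing artifacts and keeps voice audio clear.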
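To make the two conversion paths in the table concrete, here is a pure-Python sketch of G.711 µ-law decoding and encoding plus integer-ratio resampling. The µ-law arithmetic follows the standard bias-and-segment scheme. The function names are hypothetical, and using linear interpolation for the 8kHz to 16kHz upsampling step is an assumption, since only the decimation method is described above.

```python
BIAS = 0x84    # 132, the standard G.711 mu-law bias
CLIP = 32635   # largest magnitude that still fits after adding the bias


def ulaw_decode(data: bytes) -> list[int]:
    """Decode G.711 mu-law bytes into signed 16-bit PCM samples."""
    out = []
    for byte in data:
        byte = ~byte & 0xFF                    # mu-law bytes are stored inverted
        exponent = (byte >> 4) & 0x07
        mantissa = byte & 0x0F
        magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
        out.append(-magnitude if byte & 0x80 else magnitude)
    return out


def ulaw_encode(samples: list[int]) -> bytes:
    """Encode signed 16-bit PCM samples as G.711 mu-law bytes."""
    out = bytearray()
    for s in samples:
        sign = 0x80 if s < 0 else 0x00
        magnitude = min(abs(s), CLIP) + BIAS
        exponent = magnitude.bit_length() - 8  # segment number, 0..7
        mantissa = (magnitude >> (exponent + 3)) & 0x0F
        out.append(~(sign | (exponent << 4) | mantissa) & 0xFF)
    return bytes(out)


def upsample_2x_linear(samples: list[int]) -> list[int]:
    """8 kHz -> 16 kHz by inserting midpoints (assumed method)."""
    out = []
    for i, s in enumerate(samples):
        nxt = samples[i + 1] if i + 1 < len(samples) else s
        out.append(s)
        out.append((s + nxt) // 2)
    return out


def decimate_box(samples: list[int], factor: int) -> list[int]:
    """Box-average decimation: one output per `factor` inputs."""
    usable = len(samples) - len(samples) % factor  # drop a trailing partial group
    return [int(sum(samples[i:i + factor]) / factor)
            for i in range(0, usable, factor)]


def inbound(ulaw_8k: bytes) -> list[int]:
    """mu-law 8 kHz -> PCM 16 kHz, for VAD and STT."""
    return upsample_2x_linear(ulaw_decode(ulaw_8k))


def outbound(pcm_24k: list[int]) -> bytes:
    """PCM 24 kHz (TTS) -> mu-law 8 kHz, for Twilio."""
    return ulaw_encode(decimate_box(pcm_24k, 3))
```

A 160-byte inbound frame becomes 320 PCM samples at 16kHz, and 480 TTS samples at 24kHz become one 160-byte outbound frame. A production codec would more likely use lookup tables or a native library than per-sample Python loops.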
Once the caller finishes speaking (detected by Voice Activity Detection), the audio is transcribed and sent through Meggy's full AI pipeline. Phone conversations have access to all the same capabilities as text chat — tools, memory, vault documents, and connected agents.
The AI response is synthesized via TTS, resampled down to 8kHz, encoded to µ-law, and streamed back through the Twilio WebSocket in 160-byte chunks (20 ms of audio each at 8kHz). Responses are streamed sentence by sentence, so the caller hears the answer progressively rather than after a long pause.
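The outbound streaming step might look like the sketch below. The JSON shape (`event`, `streamSid`, `media.payload` with base64 audio) is the Twilio Media Streams format for sending audio back to a call. The helper names, the regex-based sentence splitter, and padding the final short frame with µ-law silence (`0xFF`) are illustrative assumptions.

```python
import base64
import json
import re
from typing import Iterator

FRAME_BYTES = 160  # 20 ms of 8 kHz mu-law audio, one byte per sample


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def media_messages(ulaw_audio: bytes, stream_sid: str) -> Iterator[str]:
    """Yield one Twilio 'media' message per 160-byte frame."""
    for start in range(0, len(ulaw_audio), FRAME_BYTES):
        frame = ulaw_audio[start:start + FRAME_BYTES]
        # Assumption: pad a short final frame with mu-law silence (0xFF).
        frame = frame.ljust(FRAME_BYTES, b"\xff")
        yield json.dumps({
            "event": "media",
            "streamSid": stream_sid,
            "media": {"payload": base64.b64encode(frame).decode("ascii")},
        })
```

In use, each sentence from `split_sentences` would be synthesized and encoded on its own, and its frames sent over the WebSocket as soon as they are ready, so playback of the first sentence can begin while later ones are still being generated.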
Phone Chat isn't just for receiving calls. You can also tell Meggy to dial someone:
Phone Chat shares the same core voice pipeline as Voice Chat, but with some important differences:
| Feature | Voice Chat | Phone Chat |
|---|---|---|
| Transport | Local microphone/speaker | Twilio WebSocket (PSTN) |
| Wake word | "Hey Meggy" or push-to-talk | None — answering the call starts it |
| Audio format | Native PCM | G.711 µ-law (converted internally) |
| End signal | Stop speaking or close | Say "goodbye", "bye", or "hang up" |
| Queue | N/A (single user) | Up to 3 callers in queue |
| Approval UI | Voice-based | Not supported (audio-only) |
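The phone end signal in the table could be detected on the transcript with a whole-word match, as in this sketch. The phrase list comes from the table; the function name `is_end_signal` and the matching strategy (case-insensitive, anywhere in the utterance) are assumptions, and a real implementation may be stricter about context.

```python
import re

# Phrases listed in the comparison table above.
END_PHRASES = ("goodbye", "bye", "hang up")

# Whole-word, case-insensitive match so "bypass" does not trigger "bye".
_END_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(p) for p in END_PHRASES) + r")\b",
    re.IGNORECASE,
)


def is_end_signal(transcript: str) -> bool:
    """Return True if the caller's utterance contains an end phrase."""
    return bool(_END_PATTERN.search(transcript))
```

Matching anywhere in the utterance is deliberately simple; it would also end the call on a sentence such as "tell him I said bye", which is one reason to treat this as a starting point only.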
`ngrok http 3456` during development

Credentials are stored securely in your system keychain, never in plain text configuration files.
Phone Chat takes security seriously: