Phone Chat

Imagine giving your AI assistant its own phone number — one that anyone can call, from any phone in the world. No app download, no account, no screen required. Just dial, speak, and get an intelligent response spoken back to you.

Phone Chat makes this possible. It connects Meggy to the public switched telephone network (PSTN) through Twilio, turning your desktop AI into a phone agent that handles real calls.

How It Works

When someone dials your Twilio number, the call flows through a multi-stage pipeline that converts telephone audio into text, processes it through Meggy's AI engine, and speaks the response back — all in real time.

1. Incoming Call

Twilio receives the call and sends a webhook request to Meggy's local HTTP server. Meggy replies with TwiML instructions that connect the call to a WebSocket Media Stream, and a greeting ("Hello, how can I help you?") plays automatically.
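For readers new to TwiML, the reply to the webhook is a small XML document. The sketch below builds one with Python's standard library. The helper name is hypothetical, the stream URL is a placeholder, and it assumes the greeting is spoken with Twilio's <Say> verb before <Connect><Stream> opens the Media Stream; Meggy may instead play the greeting as TTS audio over the stream.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def build_connect_twiml(stream_url: str, greeting: Optional[str] = None) -> str:
    """Return a TwiML <Response> that optionally speaks a greeting, then
    connects the call to a bidirectional Media Stream at `stream_url`.

    Hypothetical helper: Meggy's real webhook handler is not shown on this page.
    """
    response = ET.Element("Response")
    if greeting:
        # Assumption: greeting delivered with <Say>; it could also be sent as audio over the stream.
        ET.SubElement(response, "Say").text = greeting
    connect = ET.SubElement(response, "Connect")
    # With <Connect>, the call stays attached to the stream until the WebSocket closes.
    ET.SubElement(connect, "Stream", url=stream_url)
    return ET.tostring(response, encoding="unicode")


twiml = build_connect_twiml("wss://example.ngrok.app/media-stream", "Hello, how can I help you?")
print(twiml)
```

Verbs run in document order, which is why <Say> comes before <Connect> in this sketch.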

If Meggy is already on a call, the new caller is placed in a queue with a hold message. The queue holds up to 3 callers by default. If the queue is full, callers hear a polite busy message and are disconnected.
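The queueing rule can be modelled as one active call plus a bounded wait list. This is a hypothetical sketch: the class name and call IDs are illustrative, and it assumes waiting callers are connected in arrival order (first in, first out), which this page does not state explicitly.

```python
from collections import deque
from typing import Optional


class CallQueue:
    """Hypothetical model: one active call plus up to `max_waiting` callers on hold."""

    def __init__(self, max_waiting: int = 3):
        self.max_waiting = max_waiting
        self.active: Optional[str] = None
        self.waiting = deque()

    def arrive(self, call_id: str) -> str:
        """Decide what a new caller hears: 'connect', 'hold', or 'busy'."""
        if self.active is None:
            self.active = call_id
            return "connect"
        if len(self.waiting) < self.max_waiting:
            self.waiting.append(call_id)
            return "hold"  # play the hold message
        return "busy"  # play the busy message, then disconnect

    def end_active_call(self) -> Optional[str]:
        """Hang up the current call and promote the longest-waiting caller, if any."""
        self.active = self.waiting.popleft() if self.waiting else None
        return self.active


q = CallQueue()
print([q.arrive(f"call-{i}") for i in range(5)])  # → ['connect', 'hold', 'hold', 'hold', 'busy']
print(q.end_active_call())  # → call-1
```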

2. Audio Processing

Phone audio arrives as G.711 µ-law at 8kHz, the standard telephone codec. Meggy decodes and resamples it to 16kHz PCM for the voice pipeline, and converts synthesized speech back the other way for the caller:

Inbound: µ-law 8kHz → decode → resample → PCM 16kHz (for VAD and STT)
Outbound: PCM 24kHz from TTS → resample → encode → µ-law 8kHz (for Twilio)
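G.711 µ-law is a fixed, standardized mapping between 8-bit codewords and linear PCM samples, so the decode and encode steps in the conversions listed above fit in a few lines. The sketch below uses the classic bias-and-segment formulation of the standard, written in Python for illustration; Meggy's own codec may be implemented differently, and `decode_frame` is a hypothetical helper name.

```python
from typing import List

BIAS = 0x84   # 132: offset added before the segment lookup (G.711 µ-law)
CLIP = 32635  # largest magnitude that still encodes without overflow


def ulaw_decode(code: int) -> int:
    """Expand one 8-bit µ-law codeword to a signed linear PCM sample."""
    code = ~code & 0xFF            # codewords are transmitted bit-inverted
    sign = code & 0x80
    exponent = (code >> 4) & 0x07  # segment number
    mantissa = code & 0x0F         # step within the segment
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if sign else magnitude


def ulaw_encode(sample: int) -> int:
    """Compress one signed 16-bit PCM sample to an 8-bit µ-law codeword."""
    sign = 0x80 if sample < 0 else 0
    magnitude = min(abs(sample), CLIP) + BIAS
    # Segment number: which of bits 14..8 is the highest one set (0 if none).
    exponent = 7
    mask = 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def decode_frame(payload: bytes) -> List[int]:
    """Decode a whole µ-law payload (one byte per sample)."""
    return [ulaw_decode(b) for b in payload]


print(ulaw_decode(0xFF), ulaw_decode(0x00), ulaw_encode(0))  # → 0 -32124 255
```

Every codeword survives a decode/encode round trip except 0x7F, the "negative zero", which decodes to 0 and re-encodes as 0xFF.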

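The codec uses box-average decimation for these integer-ratio conversions: when downsampling, each output sample is the mean of a block of input samples (three at a time for 24kHz to 8kHz). A box filter's impulse response has no negative lobes, so it does not produce the ringing artifacts that sharper filters can, which keeps voice audio clean.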
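The two integer-ratio resampling steps can be sketched as follows. Box-average decimation follows the description above; the inbound 8kHz to 16kHz step is shown with linear interpolation, which is an assumption, since this page does not say how Meggy upsamples. Both hypothetical functions process one buffer at a time and do not carry state across chunk boundaries.

```python
from typing import List


def box_decimate(samples: List[int], factor: int) -> List[int]:
    """Downsample by an integer factor by averaging each group of `factor` samples.

    Any trailing partial group is dropped; a streaming version would carry it over.
    """
    usable = len(samples) - len(samples) % factor
    return [
        int(sum(samples[i:i + factor]) / factor)  # group mean, truncated toward zero
        for i in range(0, usable, factor)
    ]


def upsample_2x_linear(samples: List[int]) -> List[int]:
    """Double the sample rate by inserting the midpoint between neighbours (assumed method)."""
    out: List[int] = []
    for i, current in enumerate(samples):
        nxt = samples[i + 1] if i + 1 < len(samples) else current  # hold the last sample
        out.append(current)
        out.append(int((current + nxt) / 2))
    return out


tts_24k = [300, 300, 300, -90, -90, -90]
print(box_decimate(tts_24k, 3))      # 24kHz -> 8kHz, → [300, -90]
print(upsample_2x_linear([0, 100]))  # 8kHz -> 16kHz, → [0, 50, 100, 100]
```

At these ratios, 480 samples of 24kHz TTS audio (20 ms) decimate to exactly 160 samples at 8kHz.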

3. Speech Recognition & AI

When Voice Activity Detection (VAD) determines that the caller has finished speaking, the audio is transcribed with speech-to-text (STT) and sent through Meggy's full AI pipeline. Phone conversations have the same capabilities as text chat: tools, memory, vault documents, and connected agents.
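End-of-utterance detection is the step that decides when to hand audio to STT. Meggy's VAD model is not specified here, so the sketch below swaps in a deliberately simple energy threshold with a silence "hangover": an utterance ends only after a run of quiet frames. The class name, threshold, frame size, and hangover length are all hypothetical.

```python
import math
from typing import List


def rms(frame: List[int]) -> float:
    """Root-mean-square energy of one PCM frame."""
    return math.sqrt(sum(s * s for s in frame) / len(frame)) if frame else 0.0


class EndpointDetector:
    """Energy-based end-of-speech detector (a stand-in for a real VAD model).

    With 20 ms frames at 16kHz (320 samples), hangover_frames=25 means the
    caller must be quiet for about 0.5 s before the turn is considered over.
    """

    def __init__(self, threshold: float = 500.0, hangover_frames: int = 25):
        self.threshold = threshold
        self.hangover_frames = hangover_frames
        self.in_speech = False
        self.silent_frames = 0

    def push(self, frame: List[int]) -> bool:
        """Feed one frame; return True on the frame where an utterance ends."""
        if rms(frame) >= self.threshold:
            self.in_speech = True
            self.silent_frames = 0
            return False
        if self.in_speech:
            self.silent_frames += 1
            if self.silent_frames >= self.hangover_frames:
                self.in_speech = False
                self.silent_frames = 0
                return True
        return False


detector = EndpointDetector(threshold=100.0, hangover_frames=3)
speech, silence = [1000] * 320, [0] * 320
print([detector.push(f) for f in [silence, speech, speech, silence, silence, silence]])
# → [False, False, False, False, False, True]
```

The hangover trades latency for robustness: a longer run avoids cutting callers off mid-pause but delays the reply.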

4. Spoken Response

The AI response is synthesized with text-to-speech (TTS), resampled down to 8kHz, encoded to µ-law, and streamed back through the Twilio WebSocket in 160-byte chunks (20 ms of audio each). Responses are streamed sentence by sentence, so the caller starts hearing the answer as soon as the first sentence is ready instead of waiting through a long pause.
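On a Twilio Media Stream, outbound audio travels as JSON "media" messages: the base64-encoded audio goes in media.payload, and the stream is identified by its streamSid. The generator below sketches that framing for 160-byte chunks. The function name is hypothetical, padding the final short chunk with µ-law silence (0xFF) is an assumption of this sketch rather than documented Meggy behaviour, and actually sending the strings over a WebSocket is left out.

```python
import base64
import json
from typing import Iterator

CHUNK_BYTES = 160    # 20 ms at 8kHz, one byte per µ-law sample
ULAW_SILENCE = 0xFF  # µ-law codeword for a zero-amplitude sample


def media_messages(stream_sid: str, ulaw_audio: bytes) -> Iterator[str]:
    """Split µ-law audio into 160-byte chunks and wrap each in a Twilio 'media' message."""
    for start in range(0, len(ulaw_audio), CHUNK_BYTES):
        chunk = ulaw_audio[start:start + CHUNK_BYTES]
        chunk = chunk.ljust(CHUNK_BYTES, bytes([ULAW_SILENCE]))  # pad the last chunk
        yield json.dumps({
            "event": "media",
            "streamSid": stream_sid,
            "media": {"payload": base64.b64encode(chunk).decode("ascii")},
        })


messages = list(media_messages("MZ0000", bytes([0xFF]) * 400))
print(len(messages))  # 400 bytes -> 3 messages (160 + 160 + 80 padded to 160)
```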

Outbound Calls

Phone Chat isn't just for receiving calls. You can also ask Meggy to dial someone for you.
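How Meggy places outbound calls is not described on this page. For orientation, Twilio's REST API creates a call when you POST To, From, and Url (the address Twilio fetches TwiML from once the call is answered) to the account's Calls resource, authenticated with the Account SID and Auth Token. The sketch below only builds that request with Python's standard library; the function name, phone numbers, credentials, and URL are placeholders, and passing the result to urllib.request.urlopen would actually send it.

```python
import base64
import urllib.parse
import urllib.request


def build_outbound_call_request(account_sid: str, auth_token: str, to_number: str,
                                from_number: str, twiml_url: str) -> urllib.request.Request:
    """Prepare (but do not send) a Twilio 'create call' request."""
    url = f"https://api.twilio.com/2010-04-01/Accounts/{account_sid}/Calls.json"
    body = urllib.parse.urlencode({
        "To": to_number,      # number to dial, E.164 format
        "From": from_number,  # your Twilio number
        "Url": twiml_url,     # Twilio requests call instructions (TwiML) here when the call connects
    }).encode()
    credentials = base64.b64encode(f"{account_sid}:{auth_token}".encode()).decode()
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {credentials}",  # HTTP Basic auth: SID as user, token as password
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


request = build_outbound_call_request(
    "ACxxxxxxxx", "your_auth_token", "+15555550100", "+15555550199",
    "https://example.ngrok.app/outbound-twiml",
)
```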

Key Differences from Voice Chat

Phone Chat shares the same core voice pipeline as Voice Chat, but with some important differences:

Transport: local microphone and speaker (Voice Chat); Twilio WebSocket over the PSTN (Phone Chat)
Wake word: "Hey Meggy" or push-to-talk (Voice Chat); none, since answering the call starts it (Phone Chat)
Audio format: native PCM (Voice Chat); G.711 µ-law, converted internally (Phone Chat)
End signal: stop speaking or close (Voice Chat); say "goodbye", "bye", or "hang up" (Phone Chat)
Queue: not applicable, single user (Voice Chat); up to 3 callers in queue (Phone Chat)
Approval UI: voice-based (Voice Chat); not supported, audio-only (Phone Chat)
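The Phone Chat end signal amounts to checking each transcribed utterance for a hang-up phrase. Below is a minimal keyword sketch using the three phrases from the comparison; matching on word boundaries keeps words such as "by" from triggering it. The pattern and function name are hypothetical, it is an assumption that Meggy matches this way, and plain keyword matching has an obvious weakness: "please don't hang up" would also end the call.

```python
import re

# The three end phrases listed for Phone Chat; \s+ tolerates extra spaces in "hang up".
HANG_UP_PATTERN = re.compile(r"\b(?:goodbye|bye|hang\s+up)\b", re.IGNORECASE)


def wants_to_hang_up(transcript: str) -> bool:
    """Return True if a transcribed utterance contains an end-of-call phrase."""
    return HANG_UP_PATTERN.search(transcript) is not None


for utterance in ["Thanks, goodbye!", "I'll be there by noon", "OK, hang up now"]:
    print(utterance, "->", wants_to_hang_up(utterance))
```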

Setting Up Phone Chat

  1. Get a Twilio account: sign up at twilio.com and purchase a phone number.
  2. Add credentials: open Settings → Sound & Speech → Phone Agent and enter your Account SID, Auth Token, and phone number.
  3. Set up a tunnel: Twilio needs a public HTTPS URL to reach Meggy's local server. During development, run ngrok http 3456.
  4. Configure the webhook: set your Twilio phone number's voice webhook to your tunnel's public URL.
  5. Start the agent: switch the toggle to On in the Phone Agent panel.
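Once the tunnel is up, anyone who learns the webhook URL can send requests to it. Twilio signs each webhook request with an X-Twilio-Signature header: an HMAC-SHA1, keyed with your Auth Token, over the full request URL followed by every POST parameter name and value sorted by name, then base64-encoded. The sketch below checks that header with the standard library. It illustrates Twilio's documented scheme only; whether and how Meggy performs this check is not described here, and the function names, token, URL, and parameters are placeholders.

```python
import base64
import hashlib
import hmac
from typing import Dict


def twilio_signature(auth_token: str, url: str, params: Dict[str, str]) -> str:
    """Compute the expected X-Twilio-Signature for a POST webhook request."""
    payload = url + "".join(name + params[name] for name in sorted(params))
    digest = hmac.new(auth_token.encode("utf-8"), payload.encode("utf-8"), hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


def is_genuine_twilio_request(auth_token: str, url: str, params: Dict[str, str],
                              signature_header: str) -> bool:
    """Compare against the header Twilio sent, in constant time (as bytes, so odd input can't raise)."""
    expected = twilio_signature(auth_token, url, params)
    return hmac.compare_digest(expected.encode("ascii"), signature_header.encode("utf-8"))


token = "test_token"                     # placeholder; use your real Auth Token
url = "https://example.ngrok.app/voice"  # must be the public URL Twilio called
params = {"CallSid": "CA123", "From": "+15555550100", "To": "+15555550199"}
header = twilio_signature(token, url, params)  # what Twilio would send
print(is_genuine_twilio_request(token, url, params, header))  # → True
```

The URL passed in must be the public tunnel URL that Twilio actually requested, not the local address the server listens on, or the signatures will not match.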

Credentials are stored securely in your system keychain — never in plain text configuration files.

Security

Phone Chat takes security seriously: