Phone Chat

Imagine giving your AI assistant its own phone number — one that anyone can call, from any phone in the world. No app download, no account, no screen required. Just dial, speak, and get an intelligent response spoken back to you.

Phone Chat makes this possible. It connects Meggy to the public switched telephone network (PSTN) through Twilio, turning your desktop AI into a phone agent that handles real calls.

How It Works

When someone dials your Twilio number, the call flows through a multi-stage pipeline that converts telephone audio into text, processes it through Meggy's AI engine, and speaks the response back — all in real time.

1. Incoming Call

Twilio receives the call and sends a webhook to Meggy's local HTTP server. Meggy responds with TwiML instructions to connect the call via a WebSocket Media Stream. A greeting message — "Hello, how can I help you?" — plays automatically.
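
As a rough illustration of the webhook reply described above, the following Python sketch builds a TwiML document that speaks a greeting and then connects the call to a Media Stream. The function name build_connect_twiml, the example wss:// URL, and the use of a Say verb for the greeting are assumptions made for illustration; how Meggy actually builds its TwiML or delivers the greeting is not specified here.

```python
from xml.sax.saxutils import escape, quoteattr


def build_connect_twiml(stream_url: str,
                        greeting: str = "Hello, how can I help you?") -> str:
    """Return TwiML that speaks a greeting, then opens a Media Stream.

    Hypothetical helper: the greeting is delivered with <Say> here purely
    as an assumption for this sketch.
    """
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        "<Response>"
        # escape() protects element text; quoteattr() escapes and quotes
        # the attribute value, so dynamic content cannot break the XML.
        f"<Say>{escape(greeting)}</Say>"
        f"<Connect><Stream url={quoteattr(stream_url)} /></Connect>"
        "</Response>"
    )


if __name__ == "__main__":
    print(build_connect_twiml("wss://example.com/media"))
```

The webhook handler would return this string with an XML content type. Escaping the greeting and the URL mirrors the XML escaping safeguard listed under Security.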

If Meggy is already on a call, the new caller is placed in a queue with a hold message. The queue holds up to 3 callers by default. If the queue is full, callers hear a polite busy message and are disconnected.
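
The queueing rule above can be sketched as a small state machine. This is a minimal illustration, not Meggy's implementation: the class name CallQueue, its method names, and the "connect" / "hold" / "busy" return values are all hypothetical; only the default limit of 3 waiting callers comes from the text.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Optional


@dataclass
class CallQueue:
    """One active call plus a bounded first-in, first-out waiting line."""
    max_waiting: int = 3                      # default queue size from the text
    active: Optional[str] = None              # call currently being handled
    waiting: Deque[str] = field(default_factory=deque)

    def on_incoming(self, call_id: str) -> str:
        """Decide what a new caller hears."""
        if self.active is None:
            self.active = call_id
            return "connect"                  # answer immediately
        if len(self.waiting) < self.max_waiting:
            self.waiting.append(call_id)
            return "hold"                     # play the hold message
        return "busy"                         # queue full: busy message, hang up

    def on_call_ended(self) -> Optional[str]:
        """Promote the longest-waiting caller, if any, and return its id."""
        self.active = self.waiting.popleft() if self.waiting else None
        return self.active
```

With the default limit, a fifth simultaneous caller (one active, three waiting) receives "busy".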

2. Audio Processing

Phone audio arrives as G.711 µ-law encoded audio at 8kHz — the standard telephone codec. Meggy decodes and resamples this to 16kHz PCM for the voice pipeline:

Direction	Source	Conversion	Target
Inbound	µ-law 8kHz	Decode → Resample	PCM 16kHz (for VAD + STT)
Outbound	PCM 24kHz (TTS)	Resample → Encode	µ-law 8kHz (for Twilio)

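For integer-ratio downsampling, such as the 24kHz to 8kHz outbound step (a factor of 3), the codec uses box-average decimation: each output sample is the mean of a group of consecutive input samples. This avoids the ringing artifacts that sharper resampling filters can introduce and keeps voice audio clear.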
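
To make the outbound row of the table concrete, here is a minimal Python sketch of G.711 µ-law encoding and decoding together with box-average decimation by an integer factor. The function names are hypothetical and the code is written from the standard G.711 µ-law definition, not taken from Meggy's codec; the inbound upsampling method is not described in this document, so it is not shown.

```python
from typing import List

BIAS = 0x84    # standard G.711 mu-law bias (132)
CLIP = 32635   # largest magnitude that can be encoded after adding the bias


def ulaw_decode(byte: int) -> int:
    """Decode one mu-law byte to a signed 16-bit PCM sample."""
    u = ~byte & 0xFF                          # mu-law bytes are stored inverted
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    sample = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -sample if u & 0x80 else sample


def ulaw_encode(sample: int) -> int:
    """Encode one signed 16-bit PCM sample to a mu-law byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    exponent = magnitude.bit_length() - 8     # 0..7, from the highest set bit
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def box_decimate(samples: List[int], factor: int) -> List[int]:
    """Average each group of `factor` samples; a trailing partial group is dropped."""
    usable = len(samples) - len(samples) % factor
    return [round(sum(samples[i:i + factor]) / factor)
            for i in range(0, usable, factor)]


def pcm24k_to_ulaw8k(pcm: List[int]) -> bytes:
    """Outbound path from the table: 24 kHz PCM -> 8 kHz PCM -> mu-law bytes."""
    return bytes(ulaw_encode(s) for s in box_decimate(pcm, 3))
```

Because 24kHz to 8kHz is an exact factor of 3, every output sample comes from a whole group of three inputs, and no fractional interpolation is needed.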

3. Speech Recognition & AI

Once the caller finishes speaking (detected by Voice Activity Detection), the audio is transcribed and sent through Meggy's full AI pipeline. Phone conversations have access to all the same capabilities as text chat — tools, memory, vault documents, and connected agents.

4. Spoken Response

The AI response is synthesized via TTS, resampled down to 8kHz, encoded to µ-law, and streamed back through the Twilio WebSocket in 160-byte chunks (20 ms of audio each, since µ-law uses one byte per sample at 8kHz). Responses are streamed sentence by sentence, so the caller hears the answer progressively rather than after a long pause.
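
The chunked streaming step might look like the following Python sketch. The JSON shape (event, streamSid, media.payload with base64 audio) follows Twilio's Media Streams message format; the helper names, the naive sentence splitter, and the choice to pad the final chunk with µ-law silence are assumptions made for illustration only.

```python
import base64
import json
import re
from typing import Iterator, List

CHUNK_BYTES = 160  # 160 mu-law bytes = 160 samples = 20 ms at 8 kHz


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def media_messages(stream_sid: str, ulaw_audio: bytes) -> Iterator[str]:
    """Yield one JSON 'media' message per 160-byte chunk of mu-law audio."""
    for start in range(0, len(ulaw_audio), CHUNK_BYTES):
        chunk = ulaw_audio[start:start + CHUNK_BYTES]
        # Assumption: pad a short final chunk with 0xFF, which decodes to
        # silence in mu-law, so every message carries a full 20 ms frame.
        chunk = chunk.ljust(CHUNK_BYTES, b"\xff")
        yield json.dumps({
            "event": "media",
            "streamSid": stream_sid,
            "media": {"payload": base64.b64encode(chunk).decode("ascii")},
        })
```

In a sentence-by-sentence flow, each sentence from split_sentences would be synthesized and converted to µ-law on its own, and its messages sent over the WebSocket while the next sentence is still being generated.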

Outbound Calls

Phone Chat isn't just for receiving calls. You can also tell Meggy to dial someone:

Click Dial in the Phone Agent settings
Enter the phone number in E.164 format (a leading +, then the country code and number, for example +14155550123)
Meggy places the call through Twilio and starts the conversation when the other party answers
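
A number can be checked against the E.164 shape before dialing. The Python sketch below is a format check only (a leading +, a first digit from 1 to 9, and at most 15 digits in total); the function name is_e164 is hypothetical, and the check does not confirm that the number is actually assigned or reachable.

```python
import re

# "+" followed by 2 to 15 digits, the first of which is not 0.
E164_PATTERN = re.compile(r"\+[1-9][0-9]{1,14}")


def is_e164(number: str) -> bool:
    """Return True if `number` has the general E.164 shape."""
    return E164_PATTERN.fullmatch(number) is not None


if __name__ == "__main__":
    for candidate in ("+14155550123", "4155550123", "+1 415 555 0123"):
        print(candidate, is_e164(candidate))
```

Numbers typed with spaces, dashes, or without the leading + would need to be normalized first.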

Key Differences from Voice Chat

Phone Chat shares the same core voice pipeline as Voice Chat, but with some important differences:

Feature	Voice Chat	Phone Chat
Transport	Local microphone/speaker	Twilio WebSocket (PSTN)
Wake word	"Hey Meggy" or push-to-talk	None — answering the call starts it
Audio format	Native PCM	G.711 µ-law (converted internally)
End signal	Stop speaking or close	Say "goodbye", "bye", or "hang up"
Queue	N/A (single user)	Up to 3 callers in queue
Approval UI	Voice-based	Not supported (audio-only)
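
The end signal for Phone Chat in the table above could be detected from the transcript along these lines. This Python sketch assumes the phrase may appear anywhere in the utterance and is matched case-insensitively on word boundaries; the actual matching rule is not specified in this document, and the function name caller_wants_to_end is hypothetical.

```python
import re

# End phrases listed in the comparison table.
END_PHRASES = ("goodbye", "bye", "hang up")

_END_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(p) for p in END_PHRASES) + r")\b",
    re.IGNORECASE,
)


def caller_wants_to_end(transcript: str) -> bool:
    """Return True if the transcript contains one of the end phrases."""
    return _END_PATTERN.search(transcript) is not None
```

Word-boundary matching keeps words such as "bypass" from ending the call by accident.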

Setting Up Phone Chat

Get a Twilio account — sign up at twilio.com and purchase a phone number
Add credentials — open Settings → Sound & Speech → Phone Agent and enter your Account SID, Auth Token, and phone number
Set up a tunnel — Twilio needs a public HTTPS URL. Use ngrok http 3456 during development
Configure the webhook — set your Twilio phone number's voice webhook to your tunnel URL
Start the agent — toggle On in the Phone Agent panel

Credentials are stored securely in your system keychain, never in plain-text configuration files.

Security

Phone Chat includes the following security and reliability safeguards:

Webhook signature validation — every Twilio request is verified with HMAC-SHA1
Keychain storage — Twilio credentials are stored in the OS-level secure keychain
XML escaping — all dynamic content in TwiML responses is sanitized
Stale connection cleanup — calls alive longer than 30 minutes are automatically reaped
Error hangup — if AI generation fails, the call is hung up immediately so the caller isn't left in silence
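
For readers who want to see what webhook signature validation involves, here is a minimal Python sketch of Twilio's published scheme for form-encoded POST requests: append each POST parameter (sorted by name) to the full request URL as name followed by value, compute an HMAC-SHA1 over the result with the Auth Token as the key, base64-encode it, and compare it with the X-Twilio-Signature header. The function names are hypothetical, and this is an illustration rather than Meggy's own validation code.

```python
import base64
import hashlib
import hmac
from typing import Mapping


def compute_twilio_signature(auth_token: str, url: str,
                             params: Mapping[str, str]) -> str:
    """Compute the expected signature for a form-encoded POST webhook."""
    # Full URL, then every parameter name and value, sorted by name,
    # concatenated with no separators.
    payload = url + "".join(name + params[name] for name in sorted(params))
    digest = hmac.new(auth_token.encode("utf-8"),
                      payload.encode("utf-8"),
                      hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


def is_valid_twilio_request(auth_token: str, url: str,
                            params: Mapping[str, str],
                            header_signature: str) -> bool:
    """Compare the computed signature with the header in constant time."""
    expected = compute_twilio_signature(auth_token, url, params)
    return hmac.compare_digest(expected.encode("utf-8"),
                               header_signature.encode("utf-8"))
```

The URL must be the exact public URL Twilio called (the tunnel URL, not the local address), otherwise the signatures will not match.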