Vault — Multimodal RAG Document Store

Tax season rolls around and you're hunting through folders for last year's W-2. Your kid's school sends a permission slip that you need to find again three weeks later. The mechanic gives you a receipt that you swear you saved somewhere.

The Vault solves this. It's Meggy's local document store — drop in your files, and your assistant organizes, indexes, and retrieves them for you. Ask "find my car insurance policy" or "what was the total on that restaurant receipt from December?" and Meggy searches through everything instantly.

It supports 25+ file formats, processes them into semantically chunked segments, and makes them searchable through a hybrid retrieval pipeline — all stored locally in SQLite. Your documents never leave your machine.

Supported Formats

The document parser handles a wide range of formats:

| Category | Formats |
| --- | --- |
| Documents | PDF, DOCX, TXT, RTF, MD, HTML, JSON, EML |
| Spreadsheets | XLSX, CSV |
| Presentations | PPTX |
| Code | JS, TS, PY, Java, C, C++, Go, Rust, and more |
| Markup | HTML, XML, JSON, YAML |
| Images | PNG, JPG, WEBP, SVG, GIF, BMP (with vision-based captioning) |
| Audio | MP3, WAV, M4A, OGG (transcribed via STT model role) |

How Ingestion Works

When you add a file to the Vault, it goes through a multi-stage pipeline:

  1. Format detection — The parser identifies the file type and selects the appropriate extraction strategy
  2. Content extraction — Text, tables, and structured data are extracted from the document
  3. Semantic chunking — Content is split into 400–600 token chunks using semantic boundaries (headings, paragraphs, code blocks)
  4. Embedding generation — Each chunk is embedded using the model assigned to the embedding role. For media files (images, audio), the system can embed the binary content natively when the embedding model supports multimodal input — otherwise it falls back to embedding the text caption or transcript

Note: Multimodal embedding (native binary input for images and audio) currently requires a Gemini embedding model. Other providers do not yet expose multimodal embedding APIs, so media files will be embedded using their text representation (caption or transcript) when a non-Gemini model is selected.

  5. Storage — Chunks, embeddings, and metadata are persisted in SQLite
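
The chunking step (step 3) can be sketched as a greedy splitter that packs paragraphs into the 400–600 token window. This is a minimal illustration, not the actual parser (which also respects headings and code blocks); the ~4-characters-per-token estimate is an assumption:

```python
def chunk_text(text: str, min_tokens: int = 400, max_tokens: int = 600) -> list[str]:
    """Greedily pack paragraphs into chunks of roughly min..max tokens."""
    est = lambda s: max(1, len(s) // 4)  # crude estimate: ~4 chars per token
    chunks, buf, size = [], [], 0
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        t = est(para)
        # close the current chunk once it is full enough, or before it would overflow
        if buf and (size + t > max_tokens or size >= min_tokens):
            chunks.append("\n\n".join(buf))
            buf, size = [], 0
        buf.append(para)
        size += t
    if buf:
        chunks.append("\n\n".join(buf))
    return chunks
```

Splitting on paragraph boundaries (rather than fixed character offsets) keeps each chunk semantically coherent, which is what makes the later embedding step meaningful.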

You can drag and drop files, use the file picker, paste a URL, or let the AI ingest documents programmatically during conversations.

URL & YouTube Ingestion

Paste a URL and the Vault downloads, parses, and indexes the page content — including YouTube videos (transcripts extracted automatically). All fetches go through an SSRF protection layer that blocks requests to internal networks, so it's safe even on shared machines.
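
An SSRF guard of the kind described typically resolves the hostname and rejects anything that lands in a private, loopback, or link-local range. A minimal sketch (the Vault's actual check may cover more cases, such as redirects):

```python
import ipaddress
import socket
from urllib.parse import urlparse

def is_safe_url(url: str) -> bool:
    """Reject URLs that resolve to private, loopback, or link-local addresses."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    try:
        infos = socket.getaddrinfo(parsed.hostname, None)
    except socket.gaierror:
        return False  # unresolvable hosts are not fetched
    for info in infos:
        ip = ipaddress.ip_address(info[4][0])
        if ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved:
            return False
    return True
```

Checking every resolved address (not just the first) matters, because a hostname can resolve to both a public and an internal IP.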

The Vault can also watch URLs you've previously ingested and re-crawl them on a schedule, creating a new version when the content changes.

Hybrid Search

Finding documents is more than just keyword matching. The Vault uses a two-stage hybrid retrieval pipeline that understands both meaning and exact terms:

Stage 1 — Parallel retrieval: A semantic vector search over the chunk embeddings and a keyword-based full-text search run side by side, each producing its own ranked list of results.

Stage 2 — Reciprocal Rank Fusion (RRF): Both result sets are merged using RRF, which weights each result based on its rank position across both retrieval methods. This produces a single, reranked list that captures both semantic relevance and term precision.
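
RRF itself is a small formula: each document's score is the sum of 1/(k + rank) over every list it appears in. A sketch, assuming the conventional k = 60 (the constant actually used here is not specified):

```python
def rrf_merge(result_lists: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked result lists with Reciprocal Rank Fusion."""
    scores: dict[str, float] = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            # a document gets credit from every list it appears in,
            # weighted by how high it ranks there
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

semantic = ["lease.pdf", "insurance.pdf", "receipt.jpg"]   # vector search order
keyword  = ["insurance.pdf", "w2.pdf", "lease.pdf"]        # full-text search order
merged = rrf_merge([semantic, keyword])
```

Documents found by both methods ("insurance.pdf", "lease.pdf") rise above documents found by only one, which is exactly the behavior that balances semantic relevance against term precision.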

Smart Folders & Watched Directories

Smart Folders can be configured with auto-population rules. When new documents are ingested, they're automatically assigned to matching folders based on content type, keywords, or custom filters. Set up folders like "Medical," "Financial," or "Recipes," and Meggy sorts incoming documents for you.
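
Auto-population rules of this kind reduce to a simple match over content type and keywords. A sketch with hypothetical rule fields (the real filter options may differ):

```python
from dataclasses import dataclass, field

@dataclass
class FolderRule:
    folder: str
    keywords: list[str] = field(default_factory=list)
    content_types: list[str] = field(default_factory=list)

def assign_folders(text: str, content_type: str, rules: list[FolderRule]) -> list[str]:
    """Return the smart folders whose rules match an incoming document."""
    lowered = text.lower()
    matched = []
    for rule in rules:
        if rule.content_types and content_type in rule.content_types:
            matched.append(rule.folder)
        elif any(kw.lower() in lowered for kw in rule.keywords):
            matched.append(rule.folder)
    return matched
```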

Watched Directories let you link a local folder on your computer directly to the Vault. Not only are files automatically ingested as you add them, but the entire internal directory structure is mirrored. If you have nested subfolders in your OS, they appear as nested folders in the Vault. Delete a folder on your computer, and the Vault stays perfectly in sync.
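
Mirroring the on-disk tree amounts to walking the watched root and mapping each subfolder to its files. A minimal sketch of that traversal (the real sync also handles deletions and re-ingestion):

```python
from pathlib import Path

def mirror_structure(root: Path) -> dict[str, list[str]]:
    """Map each subfolder (relative to root) to the files it contains,
    reproducing the on-disk tree as Vault folders. Top-level files land
    under the "." key."""
    tree: dict[str, list[str]] = {}
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        if path.is_dir():
            tree.setdefault(str(rel), [])           # empty folders still mirror
        else:
            tree.setdefault(str(rel.parent), []).append(path.name)
    return tree
```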

Knowledge Graph

Every document you add is automatically scanned for entities — people, organizations, dates, locations, concepts, and events. The Vault builds a local knowledge graph that connects these entities across all your documents, so you can ask questions like "which documents mention Dr. Smith?" or "what's related to our home insurance?"
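
Conceptually, the graph is an index from entities to the documents that mention them, plus co-mention edges between entities. Entity extraction itself is done by the model; this sketch only shows the index structure, with hypothetical method names:

```python
from collections import defaultdict

class KnowledgeGraph:
    """Tiny entity index: which documents mention which entities,
    and which entities appear together."""

    def __init__(self):
        self.mentions = defaultdict(set)   # entity -> doc ids
        self.edges = defaultdict(set)      # entity -> co-mentioned entities

    def add_document(self, doc_id: str, entities: list[str]) -> None:
        for e in entities:
            self.mentions[e].add(doc_id)
            self.edges[e].update(x for x in entities if x != e)

    def documents_mentioning(self, entity: str) -> set[str]:
        return self.mentions[entity]

    def related_entities(self, entity: str) -> set[str]:
        return self.edges[entity]
```

A question like "which documents mention Dr. Smith?" is then a single lookup rather than a full-text scan.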

Intelligence Layer

When the AI searches the Vault, it doesn't just return raw text chunks; results pass through an Intelligence Layer that enriches them before they reach the model.

Audio Overviews

Generate a podcast-style audio summary of any document or group of documents. The Vault creates a conversational script and synthesizes it with TTS — perfect for catching up on long reports while commuting or cooking.

Workspaces & Versioning

Workspaces are virtual collections that let you group documents by project or theme without moving files. A document can belong to multiple workspaces.

Versioning tracks changes to ingested documents non-destructively. When a file is re-ingested or a URL is re-crawled, the Vault creates a new version snapshot. You can view the full history, compare any two versions, and restore a previous version at any time.
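
Non-destructive versioning like this usually keys snapshots on a content hash, so an unchanged re-crawl creates no new version. A sketch under that assumption (snapshot fields here are illustrative):

```python
import hashlib
from datetime import datetime, timezone

class VersionedDocument:
    """Append-only version history keyed by content hash."""

    def __init__(self):
        self.versions = []  # list of (hash, timestamp, content) snapshots

    def ingest(self, content: str) -> bool:
        """Record a new version only if the content actually changed."""
        digest = hashlib.sha256(content.encode()).hexdigest()
        if self.versions and self.versions[-1][0] == digest:
            return False  # unchanged re-crawl: no new version
        self.versions.append((digest, datetime.now(timezone.utc), content))
        return True

    def restore(self, index: int) -> str:
        """Return the content of any earlier version, leaving history intact."""
        return self.versions[index][2]
```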

AI-Callable Tools

The Vault exposes 13 tools that the AI assistant can invoke during conversations, covering search, ingestion, folder management, smart folders, and media-specific ingestion.

This means the AI can autonomously pull context from your documents during a conversation — ask "what does my lease say about pets?" and Meggy searches the vault, finds the relevant section, cites the exact page, and gives you the answer.

Vault Shortcuts

Five pre-built slash commands give you quick access to common vault workflows:

| Command | What it does |
| --- | --- |
| /summarize-vault | Summarize all vault documents or a specific folder |
| /compare-docs | Compare two vault documents side-by-side |
| /ask-vault | Ask a question answered from vault knowledge |
| /find-related | Find documents related to a given topic |
| /vault-gaps | Identify gaps or missing topics in your collection |

What's Next?