Vault — Multimodal RAG Document Store

Tax season rolls around and you're hunting through folders for last year's W-2. Your kid's school sends a permission slip that you need to find again three weeks later. The mechanic gives you a receipt that you swear you saved somewhere.

The Vault solves this. It's Meggy's local document store: drop in your files, and your assistant organizes, indexes, and retrieves them for you. Ask "find my car insurance policy" or "what was the total on that restaurant receipt from December?" and Meggy searches everything you've stored to find the answer.

It supports 25+ file formats, processes them into semantically chunked segments, and makes them searchable through a hybrid retrieval pipeline — all stored locally in SQLite. Your documents never leave your machine.

Supported Formats

The document parser handles a wide range of formats:

Category       Formats
Documents      PDF, DOCX, DOC, TXT, RTF, ODT, EPUB
Spreadsheets   XLSX, XLS, CSV, TSV
Presentations  PPTX, PPT, ODP
Code           JS, TS, PY, Java, C, C++, Go, Rust, and more
Markup         MD, HTML, XML, JSON, YAML
Images         PNG, JPG, WEBP, SVG (with OCR and vision extraction)
Audio          MP3, WAV, M4A, OGG (transcribed by the model assigned to the speech-to-text (STT) role)
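The format-detection step of ingestion can be as simple as an extension lookup. The sketch below assumes exactly that, transcribing the table into a dictionary. The category keys, the `detect_category` and `select_strategy` helpers, and the strategy labels are all hypothetical, and a real parser may also inspect file contents (magic bytes) instead of trusting the extension.

```python
from pathlib import Path
from typing import Optional

# Hypothetical extension map transcribed from the table above. The "Code" row
# ends in "and more", so only the languages it names are listed here.
FORMAT_CATEGORIES: dict[str, set[str]] = {
    "document": {"pdf", "docx", "doc", "txt", "rtf", "odt", "epub"},
    "spreadsheet": {"xlsx", "xls", "csv", "tsv"},
    "presentation": {"pptx", "ppt", "odp"},
    "code": {"js", "ts", "py", "java", "c", "cpp", "go", "rs"},
    "markup": {"md", "html", "xml", "json", "yaml"},
    "image": {"png", "jpg", "webp", "svg"},
    "audio": {"mp3", "wav", "m4a", "ogg"},
}

# Hypothetical extraction strategies chosen per category.
STRATEGIES: dict[str, str] = {
    "document": "text",
    "spreadsheet": "table",
    "presentation": "text",
    "code": "text",
    "markup": "text",
    "image": "ocr+vision",
    "audio": "stt",
}


def detect_category(filename: str) -> Optional[str]:
    """Return the format category for a file, or None if it is unsupported."""
    ext = Path(filename).suffix.lower().lstrip(".")
    for category, extensions in FORMAT_CATEGORIES.items():
        if ext in extensions:
            return category
    return None


def select_strategy(filename: str) -> Optional[str]:
    """Map a file to the extraction strategy for its category."""
    category = detect_category(filename)
    return STRATEGIES[category] if category else None
```

For example, `detect_category("W2_2023.PDF")` returns `"document"` (the lookup is case-insensitive), and `select_strategy("voicemail.m4a")` returns `"stt"`.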

How Ingestion Works

When you add a file to the Vault, it goes through a multi-stage pipeline:

  1. Format detection — The parser identifies the file type and selects the appropriate extraction strategy
  2. Content extraction — Text, tables, and structured data are extracted from the document
  3. Semantic chunking — Content is split into 400–600 token chunks using semantic boundaries (headings, paragraphs, code blocks)
  4. Embedding generation — Each chunk is embedded using the model assigned to the embedding role (e.g., text-embedding-3-small or Nomic)
  5. Storage — Chunks, embeddings, and metadata are persisted in SQLite
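Step 3 can be approximated with a greedy packer: split on paragraph and heading boundaries, then fill each chunk up to the 600-token ceiling, preferring to start a new chunk at a heading once the current one has reached 400 tokens. This is a minimal sketch, not the Vault's chunker. It assumes whitespace-separated words stand in for model tokens (a real system would count with the embedding model's tokenizer), treats lines starting with `#` as headings, and omits code-block detection for brevity. `chunk_text` and `count_tokens` are hypothetical names.

```python
import re


def count_tokens(text: str) -> int:
    # Assumption: whitespace-separated words approximate model tokens.
    return len(text.split())


def split_blocks(text: str) -> list[str]:
    """Split on blank lines so each paragraph or heading becomes one block."""
    return [b.strip() for b in re.split(r"\n\s*\n", text) if b.strip()]


def chunk_text(text: str, min_tokens: int = 400, max_tokens: int = 600) -> list[str]:
    chunks: list[str] = []
    current: list[str] = []
    size = 0

    def flush() -> None:
        nonlocal current, size
        if current:
            chunks.append("\n\n".join(current))
            current, size = [], 0

    for block in split_blocks(text):
        n = count_tokens(block)
        # Prefer a semantic break before a heading once the chunk is big enough.
        if block.startswith("#") and size >= min_tokens:
            flush()
        # Never let a chunk exceed the ceiling.
        if size + n > max_tokens:
            flush()
        if n > max_tokens:
            # A single oversized paragraph is hard-split into word windows.
            words = block.split()
            for i in range(0, len(words), max_tokens):
                chunks.append(" ".join(words[i:i + max_tokens]))
            continue
        current.append(block)
        size += n
    flush()
    return chunks
```

Ten 150-word paragraphs pack into chunks of 600, 600, and 300 words. Because the packer is greedy, a chunk can end below 400 tokens when the next paragraph would overflow it; a production chunker might merge such short chunks with a neighbor.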

You can drag and drop files, use the file picker, or let the AI ingest documents programmatically during conversations.
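Step 5 persists chunks, embeddings, and metadata in SQLite. One way to lay that out is sketched below with Python's built-in `sqlite3` module, assuming a hypothetical two-table schema (`documents` and `chunks`) and embeddings stored as packed float32 values in a BLOB column. The table names, columns, and the `open_vault`, `store_document`, and `load_embedding` helpers are illustrative, not the Vault's actual schema.

```python
import sqlite3
from array import array

# Hypothetical schema: one row per file, one row per chunk.
SCHEMA = """
CREATE TABLE IF NOT EXISTS documents (
    id INTEGER PRIMARY KEY,
    path TEXT NOT NULL UNIQUE,
    category TEXT
);
CREATE TABLE IF NOT EXISTS chunks (
    id INTEGER PRIMARY KEY,
    document_id INTEGER NOT NULL REFERENCES documents(id),
    position INTEGER NOT NULL,
    text TEXT NOT NULL,
    embedding BLOB NOT NULL
);
"""


def open_vault(db_path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(db_path)
    conn.executescript(SCHEMA)
    return conn


def store_document(conn: sqlite3.Connection, path: str, category: str,
                   chunks: list[str], embeddings: list[list[float]]) -> int:
    """Insert a document and its chunks in one transaction; return its id."""
    if len(chunks) != len(embeddings):
        raise ValueError("each chunk needs exactly one embedding")
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO documents (path, category) VALUES (?, ?)", (path, category))
        doc_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO chunks (document_id, position, text, embedding) "
            "VALUES (?, ?, ?, ?)",
            [(doc_id, i, text, array("f", vec).tobytes())
             for i, (text, vec) in enumerate(zip(chunks, embeddings))],
        )
    return doc_id


def load_embedding(blob: bytes) -> list[float]:
    """Unpack a float32 BLOB back into a list of floats."""
    vec = array("f")
    vec.frombytes(blob)
    return vec.tolist()
```

Packing vectors as float32 keeps each embedding compact; the trade-off is that similarity search must load and unpack vectors in application code unless a vector extension is available.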

Hybrid Search

Finding documents is more than just keyword matching. The Vault uses a two-stage hybrid retrieval pipeline that understands both meaning and exact terms:

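Stage 1 — Parallel retrieval: The query runs through two retrievers at the same time: a semantic search over the chunk embeddings, which finds passages with related meaning, and a keyword search, which finds passages containing the exact terms. Each returns its own ranked list of chunks.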
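A minimal sketch of the parallel stage, assuming two stand-in retrievers: a brute-force cosine-similarity search over in-memory embeddings, and a naive term-overlap ranker in place of a real full-text index (SQLite's FTS5 extension with its `bm25()` ranking function would be a typical choice). All function names and the in-memory `chunks` dictionary are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory corpus: chunk id -> (text, embedding).
Corpus = dict[str, tuple[str, list[float]]]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def vector_search(query_vec: list[float], chunks: Corpus, k: int) -> list[str]:
    """Rank chunk ids by cosine similarity to the query embedding."""
    ranked = sorted(chunks, key=lambda cid: cosine(query_vec, chunks[cid][1]),
                    reverse=True)
    return ranked[:k]


def keyword_search(query: str, chunks: Corpus, k: int) -> list[str]:
    """Rank chunk ids by how many query terms they contain (BM25 stand-in)."""
    terms = set(query.lower().split())
    scores = {cid: sum(1 for w in text.lower().split() if w in terms)
              for cid, (text, _) in chunks.items()}
    ranked = sorted((cid for cid, s in scores.items() if s > 0),
                    key=lambda cid: scores[cid], reverse=True)
    return ranked[:k]


def parallel_retrieve(query: str, query_vec: list[float], chunks: Corpus,
                      k: int = 20) -> tuple[list[str], list[str]]:
    # Threads pay off when each retriever is I/O-bound (e.g. a database query);
    # for this pure-Python stand-in they only illustrate the structure.
    with ThreadPoolExecutor(max_workers=2) as pool:
        semantic = pool.submit(vector_search, query_vec, chunks, k)
        lexical = pool.submit(keyword_search, query, chunks, k)
        return semantic.result(), lexical.result()
```

The two lists overlap but disagree on order, which is exactly what the fusion stage has to reconcile.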

Stage 2 — Reciprocal Rank Fusion (RRF): Both result sets are merged using RRF, which weights each result based on its rank position across both retrieval methods. This produces a single, reranked list that captures both semantic relevance and term precision.
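In standard RRF, each document's fused score is the sum over the input rankings of 1 / (k + rank), where rank is its 1-based position in that list and k is a smoothing constant. Documents ranked highly by both retrievers therefore rise above documents ranked highly by only one. The sketch below implements that formula; k = 60 is the value commonly used in the literature and is an assumption here, not a documented Vault setting.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked id lists into one, best first."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank); absent ids contribute 0.
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

With a semantic list `["a", "b", "c"]` and a keyword list `["c", "a", "d"]`, the fused order is `["a", "c", "b", "d"]`: `a` and `c` appear in both lists, and `a` wins because its second-best rank (2) beats `c`'s (3). A larger k flattens the gap between top and lower ranks.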

Smart Folders

Vault folders can be configured with auto-population rules. When new documents are ingested, they're automatically assigned to matching folders based on content type, keywords, or custom filters. Set up folders like "Medical," "Financial," or "Recipes," and Meggy sorts incoming documents for you.
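A minimal sketch of how auto-population rules might be evaluated. It assumes that every criterion a rule sets must pass, that any one of a rule's keywords is enough to match, and that a document can land in more than one folder; the `Doc` and `FolderRule` types and their fields are hypothetical, not the Vault's configuration format.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Doc:
    path: str
    category: str  # e.g. "document", "spreadsheet", "image"
    text: str


@dataclass
class FolderRule:
    folder: str
    categories: set[str] = field(default_factory=set)  # empty = any type
    keywords: set[str] = field(default_factory=set)    # empty = no keyword check
    custom: Optional[Callable[[Doc], bool]] = None     # optional extra filter

    def matches(self, doc: Doc) -> bool:
        if self.categories and doc.category not in self.categories:
            return False
        if self.keywords:
            lowered = doc.text.lower()
            if not any(kw.lower() in lowered for kw in self.keywords):
                return False
        if self.custom is not None and not self.custom(doc):
            return False
        return True


def assign_folders(doc: Doc, rules: list[FolderRule]) -> list[str]:
    """Return every folder whose rule matches the newly ingested document."""
    return [rule.folder for rule in rules if rule.matches(doc)]


RULES = [
    FolderRule("Medical", keywords={"prescription", "diagnosis"}),
    FolderRule("Financial", categories={"document", "spreadsheet"},
               keywords={"invoice", "w-2", "receipt"}),
    FolderRule("Recipes", keywords={"tablespoon"},
               custom=lambda d: "recipe" in d.path.lower()),
]
```

For instance, a scanned mechanic's receipt (`Doc("scans/mechanic_receipt.pdf", "document", "Receipt: brake pads, total $212")`) is assigned to `["Financial"]`.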

AI-Callable Tools

The Vault exposes tools that the AI assistant can invoke during conversations:

This means the AI can autonomously pull context from your documents during a conversation — ask "what does my lease say about pets?" and Meggy searches the vault, finds the relevant section, and gives you the answer.