Vault — Multimodal RAG Document Store

Tax season rolls around and you're hunting through folders for last year's W-2. Your kid's school sends a permission slip that you need to find again three weeks later. The mechanic gives you a receipt that you swear you saved somewhere.

The Vault solves this. It's Meggy's local document store: drop in your files, and your assistant organizes, indexes, and retrieves them for you. Ask "find my car insurance policy" or "what was the total on that restaurant receipt from December?" and Meggy searches everything you've stored to find the answer.

It supports 25+ file formats, splits each file into semantically coherent chunks, and makes those chunks searchable through a hybrid retrieval pipeline. Everything is stored locally in SQLite, so your files and their index stay on your machine.

Supported Formats

The document parser handles a wide range of formats:

Category	Formats
Documents	PDF, DOCX, DOC, TXT, RTF, ODT, EPUB
Spreadsheets	XLSX, XLS, CSV, TSV
Presentations	PPTX, PPT, ODP
Code	JS, TS, PY, Java, C, C++, Go, Rust, and more
Markup	MD, HTML, XML, JSON, YAML
Images	PNG, JPG, WEBP, SVG (with OCR + vision extraction)
Audio	MP3, WAV, M4A, OGG (transcribed via STT model role)

How Ingestion Works

When you add a file to the Vault, it goes through a multi-stage pipeline:

Format detection — The parser identifies the file type and selects the appropriate extraction strategy
Content extraction — Text, tables, and structured data are extracted from the document
Semantic chunking — Content is split into 400–600 token chunks using semantic boundaries (headings, paragraphs, code blocks)
Embedding generation — Each chunk is embedded using the model assigned to the embedding role (e.g., text-embedding-3-small or Nomic)
Storage — Chunks, embeddings, and metadata are persisted in SQLite
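
To make the chunking step concrete, here is a minimal sketch that packs consecutive paragraphs into chunks under a token budget. It is an illustration, not Meggy's actual chunker: the function names are hypothetical, tokens are approximated by whitespace-separated words rather than a real tokenizer, and only paragraph boundaries are considered (headings and code blocks would need their own rules).

```python
import re


def approx_tokens(text: str) -> int:
    """Rough token count: whitespace-separated words (an assumption, not a real tokenizer)."""
    return len(text.split())


def chunk_by_paragraph(text: str, max_tokens: int = 600) -> list[str]:
    """Pack consecutive paragraphs into chunks of at most max_tokens approximate tokens.

    A single paragraph larger than max_tokens is kept whole as its own chunk.
    """
    # Blank lines are treated as the semantic boundary between paragraphs.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks: list[str] = []
    current: list[str] = []
    current_size = 0
    for para in paragraphs:
        size = approx_tokens(para)
        # Close the current chunk if adding this paragraph would overflow it.
        if current and current_size + size > max_tokens:
            chunks.append("\n\n".join(current))
            current, current_size = [], 0
        current.append(para)
        current_size += size
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Because chunks only ever break between paragraphs, a sentence is never cut in half, which is the main point of chunking on semantic boundaries rather than at fixed character offsets.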

You can drag and drop files, use the file picker, or let the AI ingest documents programmatically during conversations.

Hybrid Search

Finding documents is more than just keyword matching. The Vault uses a two-stage hybrid retrieval pipeline that understands both meaning and exact terms:

Stage 1 — Parallel retrieval:

Vector search — Cosine similarity against stored embeddings finds semantically related chunks (e.g., searching for "health coverage" finds your "medical insurance" document)
FTS5 keyword search — SQLite's full-text search engine finds exact term matches (e.g., searching for "W-2" finds that exact form)
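
The vector half of Stage 1 can be sketched in a few lines. This is an illustrative brute-force version, assuming embeddings are plain lists of floats keyed by a hypothetical chunk ID; how Meggy actually loads and compares embeddings is not described here.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def vector_search(
    query_vec: list[float],
    chunk_vecs: dict[str, list[float]],
    top_k: int = 5,
) -> list[tuple[str, float]]:
    """Score every stored chunk against the query and return the top_k best matches."""
    scored = [(cid, cosine_similarity(query_vec, vec)) for cid, vec in chunk_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

Cosine similarity compares direction rather than length, so two chunks about the same topic score close to 1.0 even when one is much longer than the other.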

Stage 2 — Reciprocal Rank Fusion (RRF): Both result sets are merged using RRF, which weights each result based on its rank position across both retrieval methods. This produces a single, reranked list that captures both semantic relevance and term precision.
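
As a rough sketch of how RRF merges the two lists: each result earns 1 / (k + rank) from every list it appears in, and the contributions are summed. The function name and chunk IDs below are hypothetical, and k = 60 is the commonly used default for RRF, not a confirmed Vault setting.

```python
from collections import defaultdict


def reciprocal_rank_fusion(
    rankings: list[list[str]],
    k: int = 60,
) -> list[tuple[str, float]]:
    """Fuse several ranked lists of chunk IDs into one list, best first."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top result in a list contributes 1 / (k + 1).
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


vector_hits = ["c1", "c2", "c3"]   # from vector search, best first
keyword_hits = ["c2", "c4"]        # from keyword search, best first
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

Here c2 comes out on top because it appears in both lists, even though it was not first in the vector results. RRF uses only rank positions, so the raw cosine and FTS5 scores never have to be put on a common scale.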

Smart Folders

Vault folders can be configured with auto-population rules. When new documents are ingested, they're automatically assigned to matching folders based on content type, keywords, or custom filters. Set up folders like "Medical," "Financial," or "Recipes," and Meggy sorts incoming documents for you.
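
One way to picture an auto-population rule is as a small record holding allowed content types and keywords, checked against each new document. The sketch below is an assumption about how such rules could work; the FolderRule class, its fields, and the matching logic are hypothetical and not Meggy's actual rule format.

```python
from dataclasses import dataclass, field


@dataclass
class FolderRule:
    """Hypothetical rule: a document matches if its type and keywords both pass."""
    folder: str
    content_types: set[str] = field(default_factory=set)  # empty set = any type
    keywords: set[str] = field(default_factory=set)       # lowercase; empty set = no keyword filter

    def matches(self, content_type: str, text: str) -> bool:
        type_ok = not self.content_types or content_type in self.content_types
        lowered = text.lower()
        keyword_ok = not self.keywords or any(kw in lowered for kw in self.keywords)
        return type_ok and keyword_ok


def assign_folders(rules: list[FolderRule], content_type: str, text: str) -> list[str]:
    """Return the names of every folder whose rule matches the document."""
    return [rule.folder for rule in rules if rule.matches(content_type, text)]


rules = [
    FolderRule("Medical", keywords={"insurance", "prescription"}),
    FolderRule("Financial", content_types={"pdf"}, keywords={"w-2", "invoice"}),
]
```

With these rules, a PDF containing "W-2" lands in "Financial", while a text file mentioning "insurance" lands in "Medical". A document can match more than one folder, and a rule with no filters at all would match every document.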

AI-Callable Tools

The Vault exposes tools that the AI assistant can invoke during conversations:

vault_search — Search the vault for relevant documents given a natural language query
vault_ingest — Add a new document to the vault from a file path or URL
vault_list — List documents and folders in the vault
vault_delete — Remove a document from the vault

This means the AI can autonomously pull context from your documents during a conversation — ask "what does my lease say about pets?" and Meggy searches the vault, finds the relevant section, and gives you the answer.
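
To illustrate how a tool call from the model might be routed to a handler, here is a minimal dispatch sketch. Only the tool names come from the list above; the argument names (query, name), the call format, and the in-memory store are assumptions made for illustration. vault_ingest is left out because it depends on the parsing pipeline, and the search handler is a toy word matcher rather than the hybrid pipeline.

```python
from typing import Any, Callable

# Hypothetical in-memory stand-in for the vault: document name -> extracted text.
store: dict[str, str] = {"lease.pdf": "Pets are allowed with a $300 deposit."}


def vault_search(query: str) -> list[str]:
    """Toy matcher: return documents that share at least one word with the query."""
    words = set(query.lower().split())
    return [name for name, text in store.items() if words & set(text.lower().split())]


def vault_list() -> list[str]:
    """List document names in alphabetical order."""
    return sorted(store)


def vault_delete(name: str) -> bool:
    """Remove a document; returns True if it existed."""
    return store.pop(name, None) is not None


TOOLS: dict[str, Callable[..., Any]] = {
    "vault_search": vault_search,
    "vault_list": vault_list,
    "vault_delete": vault_delete,
}


def dispatch(call: dict[str, Any]) -> Any:
    """Route a tool call of the assumed form {"name": ..., "arguments": {...}}."""
    handler = TOOLS.get(call["name"])
    if handler is None:
        raise ValueError(f"unknown tool: {call['name']}")
    return handler(**call.get("arguments", {}))
```

In this sketch, the lease question would become a call such as dispatch({"name": "vault_search", "arguments": {"query": "pets policy"}}), and the matching document would be handed back to the assistant as context for its answer.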