Infrastructure layer
Two providers run specific technical functions in the background. OpenAI processes desktop embeddings so your vault can be searched. Deepgram transcribes audio for Meeting Copilot and chat mic.
Models Hub
Choose from Groq, Claude, Perplexity, DeepSeek, Kimi, Gemini, or Ollama. Pick the provider that fits the work, then switch when the work changes.
How it works
ZSearch separates background infrastructure from the AI provider that answers your questions, so you always know what is running and why.
The Models Hub is where model choice lives. Groq, Claude, Perplexity, DeepSeek, Kimi, Gemini, and Ollama can each power AI chat, document questions, and AI Lens search.
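The split between the two layers can be pictured as a small lookup. This is only an illustrative sketch, not ZSearch's actual configuration: the feature keys, the `Settings` class, and `provider_for` are hypothetical names, and the mapping simply restates the text above.

```python
from dataclasses import dataclass

# Background layer: fixed per feature, never chosen by the user.
BACKGROUND_PROVIDERS = {
    "vault_search": "OpenAI",       # desktop embeddings
    "meeting_copilot": "Deepgram",  # audio transcription
    "chat_mic": "Deepgram",         # audio transcription
}

# Query layer: features answered by whichever provider is selected.
QUERY_FEATURES = frozenset({"ai_chat", "document_questions", "ai_lens_search"})
QUERY_PROVIDERS = frozenset(
    {"Groq", "Claude", "Perplexity", "DeepSeek", "Kimi", "Gemini", "Ollama"}
)


@dataclass
class Settings:
    """Hypothetical settings holder; only the query provider is selectable."""

    query_provider: str

    def __post_init__(self) -> None:
        if self.query_provider not in QUERY_PROVIDERS:
            raise ValueError(f"unknown query provider: {self.query_provider!r}")


def provider_for(feature: str, settings: Settings) -> str:
    """Return the provider that handles a feature under the given settings."""
    if feature in BACKGROUND_PROVIDERS:
        return BACKGROUND_PROVIDERS[feature]
    if feature in QUERY_FEATURES:
        return settings.query_provider
    raise KeyError(f"unknown feature: {feature!r}")
```

Under this sketch, switching `Settings("Groq")` to `Settings("Claude")` changes who answers `ai_chat`, while `vault_search` and `chat_mic` resolve to the same background providers either way.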
Always on
These run automatically when their feature is used. No provider selection is needed for this layer.
Embeddings (OpenAI)
Voice transcription (Deepgram)
AI query providers
These providers power AI chat, document questions, and AI Lens search. Choose for the outcome you need, then switch providers when the task changes.
Speed
Groq keeps rapid research sessions moving.
Judgment
Claude is the stronger choice when a wrong answer carries risk.
Sovereignty
Ollama keeps AI queries on your machine.
Groq
Lowest latency
Fast inference
AI chat and document questions with very low latency, so rapid follow-up questions feel natural.
Use when
Fast Q&A, high-volume document queries, and everyday research.
Claude
Highest confidence
Careful reasoning
Longer, more structured responses for complex document analysis and high-stakes questions.
Use when
Legal, research, finance, compliance, and analytical document work.
Perplexity
Current citations
Live web search
Combines private document answers with current web results and source citations.
Use when
Journalists, analysts, consultants, and researchers who need cited current answers.
Ollama
Local-only path
Local AI queries
Connects ZSearch to a locally running LLM with no API key and no external AI query provider.
Use when
Maximum privacy, air-gapped workflows, security-sensitive teams, and local model control. AI queries stay on your machine. The only remaining external call is to Deepgram, and only when you use Meeting Copilot or voice input.
DeepSeek
Volume friendly
Efficient reasoning
Fast chat and step-by-step reasoning for large document workloads.
Use when
High-volume processing, lower-cost deployments, and transparent reasoning workflows.
Kimi
Long document memory
Large context
Long document analysis with very large context windows, built for contracts, books, and research archives.
Use when
Long documents, complex reasoning, and high-quality answers without flagship pricing.
Gemini
Visual reasoning
Vision and large context
Reads and reasons over images, charts, diagrams, scans, and very large collections.
Use when
Healthcare, architecture, finance, design, and visually rich document workflows.
Do not overthink the first choice. Start with the provider closest to your workflow and switch later if your needs change.
I just want to get started.
Start with Groq. It is fast, simple, and strong enough for most everyday document questions.
I am doing serious analytical work.
Choose Claude. It is best for contracts, compliance, research, finance, and answers with real consequences.
I run a small team handling sensitive documents.
Use Claude when quality matters, or choose a lower-cost option if budget matters more.
I need current web information.
Choose Perplexity. It is the provider that combines your private vault with live web search.
I want great quality at lower cost.
Start with DeepSeek. Move to Kimi when very long documents are common.
My documents include images or scans.
Gemini can reason about visual material, not just extracted text.
I want AI queries fully local.
Choose Ollama. Your AI questions stay on your machine. Deepgram is still called, but only when you use voice features.
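The guide above reads naturally as a lookup table. The sketch below is one hypothetical way to encode it: the need keys, the `recommend` function, and its flags are invented for illustration. The source names DeepSeek, Kimi, Gemini, and Claude explicitly. The Groq, Perplexity, and Ollama entries are inferred from the provider descriptions, and DeepSeek as the lower-cost option for small teams is an assumption.

```python
# Default recommendation per need, following the guide above.
DEFAULTS = {
    "getting_started": "Groq",         # inferred from the "fast, everyday" description
    "serious_analysis": "Claude",      # inferred from the contracts/compliance description
    "sensitive_small_team": "Claude",  # named in the guide
    "current_web_info": "Perplexity",  # inferred: the live web search provider
    "quality_lower_cost": "DeepSeek",  # named in the guide
    "images_or_scans": "Gemini",       # named in the guide
    "fully_local": "Ollama",           # inferred from the local-only description
}


def recommend(need: str, *, long_documents: bool = False,
              budget_first: bool = False) -> str:
    """Return a starting provider for a need; the flags refine two entries."""
    if need not in DEFAULTS:
        raise KeyError(f"unknown need: {need!r}")
    # "Move to Kimi when very long documents are common."
    if need == "quality_lower_cost" and long_documents:
        return "Kimi"
    # Assumption: DeepSeek stands in for the unnamed "lower-cost option".
    if need == "sensitive_small_team" and budget_first:
        return "DeepSeek"
    return DEFAULTS[need]
```

For example, `recommend("quality_lower_cost", long_documents=True)` moves from DeepSeek to Kimi, matching the guidance for very long documents.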
Enterprise control
ZSearch Enterprise keeps model choice flexible while moving the sensitive compute layer inside your infrastructure. Use managed providers where they make sense, and local models where nothing should leave the perimeter.
Runs on your own infrastructure in Enterprise.
Use Ollama when answers must stay on the machine.
Route work to Groq, Claude, Perplexity, Gemini, and more.
Keep sensitive workflows aligned to your network boundary.
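The routing idea in these points can be sketched as a simple policy: work that must stay on the machine goes to Ollama, and everything else goes to a managed provider. This is a minimal illustration under assumptions, not the Enterprise routing logic. `Task`, `route`, and the `must_stay_local` flag are hypothetical names.

```python
from dataclasses import dataclass

# Managed (external) providers named in this document.
MANAGED_PROVIDERS = frozenset(
    {"Groq", "Claude", "Perplexity", "DeepSeek", "Kimi", "Gemini"}
)
LOCAL_PROVIDER = "Ollama"


@dataclass(frozen=True)
class Task:
    """Hypothetical unit of work with a sensitivity flag."""

    description: str
    must_stay_local: bool


def route(task: Task, managed_provider: str = "Claude") -> str:
    """Send local-only tasks to Ollama and the rest to a managed provider."""
    if managed_provider not in MANAGED_PROVIDERS:
        raise ValueError(f"not a managed provider: {managed_provider!r}")
    return LOCAL_PROVIDER if task.must_stay_local else managed_provider
```

With this policy, `route(Task("board minutes", must_stay_local=True))` resolves to Ollama regardless of which managed provider is configured.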
No lock-in
Switch providers as requirements change.
No hidden layer
Understand which compute runs where.
No forced cloud
Enterprise can keep every layer inside your boundary.
Start with one provider, switch when the work changes, and keep enterprise deployment open for full infrastructure control.