Models Hub

AI models inside ZSearch, without the lock-in.

Choose from Groq, Claude, Perplexity, DeepSeek, Kimi, Gemini, or Ollama. Pick the provider that fits the work, then switch when the work changes.

How it works

Two layers, one clear boundary

ZSearch separates background infrastructure from the AI provider that answers your questions, so you always know what is running and why.

Infrastructure layer

Two providers run specific technical functions in the background. OpenAI processes desktop embeddings so your vault can be searched. Deepgram transcribes audio for Meeting Copilot and chat mic input.

AI query layer

This is where model choice lives. Groq, Claude, Perplexity, DeepSeek, Kimi, Gemini, and Ollama can each power AI chat, document questions, and AI Lens search.
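
As a rough illustration of that boundary, the sketch below maps a feature to the layer and provider that would serve it. The feature keys, the Layer enum, and the resolve function are hypothetical names chosen for this example and are not part of ZSearch. Only the provider names and the split between the two layers come from the description above.

```python
from enum import Enum
from typing import Tuple


class Layer(Enum):
    INFRASTRUCTURE = "infrastructure"
    AI_QUERY = "ai_query"


# Background functions and the provider the text assigns to each.
# The dictionary keys are hypothetical labels for this sketch.
INFRASTRUCTURE_PROVIDERS = {
    "embeddings": "OpenAI",
    "voice_transcription": "Deepgram",
}

# Providers the user can select for the AI query layer.
QUERY_PROVIDERS = (
    "Groq", "Claude", "Perplexity", "DeepSeek", "Kimi", "Gemini", "Ollama",
)

# Features served by whichever query provider is selected (hypothetical keys).
QUERY_FEATURES = ("ai_chat", "document_questions", "ai_lens_search")


def resolve(feature: str, selected_provider: str) -> Tuple[Layer, str]:
    """Return the layer and provider that would handle a feature."""
    if selected_provider not in QUERY_PROVIDERS:
        raise ValueError(f"unknown query provider: {selected_provider}")
    if feature in INFRASTRUCTURE_PROVIDERS:
        # Infrastructure features ignore the user's model choice.
        return Layer.INFRASTRUCTURE, INFRASTRUCTURE_PROVIDERS[feature]
    if feature in QUERY_FEATURES:
        return Layer.AI_QUERY, selected_provider
    raise ValueError(f"unknown feature: {feature}")
```

In this model, switching the selected provider changes the result for query features only. The infrastructure entries stay the same whatever is selected.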

Always on

Infrastructure providers

These run automatically when their feature is used. No provider selection is needed for this layer.

Embeddings

OpenAI

What it powers: Makes your Knowledge Vault searchable so ZSearch can find the right passages when you ask a question.
Why it is here: OpenAI embeddings support the desktop search layer. They run when documents are added, not every time you ask a question.
Best for: All ZSearch Desktop users. On Enterprise, embedding compute can run inside your own infrastructure.

Voice transcription

Deepgram

What it powers: Transcribes microphone and system audio for Meeting Copilot and chat mic input.
Why it is here: Deepgram is built for accurate, real-time transcription, including multiple speakers.
Best for: Anyone using Meeting Copilot or voice input. If you do not use voice features, Deepgram is not called.
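
One way to read the two cards above is as a rule for which infrastructure providers are contacted in a given session. The sketch below encodes that rule. The function and parameter names are hypothetical, and it assumes that running embedding compute inside your own infrastructure on Enterprise means no external embedding call is made, which the text implies but does not state outright.

```python
from typing import List


def external_infrastructure_calls(
    added_documents: bool,
    used_voice: bool,
    embeddings_in_own_infrastructure: bool = False,
) -> List[str]:
    """List the infrastructure providers contacted in a session."""
    calls = []
    # Embeddings run when documents are added, not on every question.
    # Assumption: self-hosted embedding compute replaces the external call.
    if added_documents and not embeddings_in_own_infrastructure:
        calls.append("OpenAI")
    # Deepgram is called only when Meeting Copilot or voice input is used.
    if used_voice:
        calls.append("Deepgram")
    return calls
```

Under these assumptions, a session that only asks typed questions against an existing vault returns an empty list.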

AI query providers

Route each question to the model that fits the work.

These providers power AI chat, document questions, and AI Lens search. Choose for the outcome you need, then switch providers when the task changes.

Speed

Groq keeps rapid research sessions moving.

Judgment

Claude suits questions where a wrong answer carries risk.

Sovereignty

Ollama keeps AI queries on your machine.

Lowest latency

Groq

Fast inference

AI chat and document questions with very low latency, so rapid follow-up questions feel natural.

Use when

Fast Q&A, high-volume document queries, and everyday research.

Highest confidence

Anthropic Claude

Careful reasoning

Longer, more structured responses for complex document analysis and high-stakes questions.

Use when

Legal, research, finance, compliance, and analytical document work.

Current citations

Perplexity

Live web search

Combines private document answers with current web results and source citations.

Use when

Journalists, analysts, consultants, and researchers who need cited current answers.

Local-only path

Ollama

Local AI queries

Connects ZSearch to a locally running LLM with no API key and no external AI query provider.

Use when

Maximum privacy, air-gapped workflows, security-sensitive teams, and local model control. AI queries stay on your machine. At query time, the only remaining external call is Deepgram, and only if you use Meeting Copilot or voice input.

Volume friendly

DeepSeek

Efficient reasoning

Fast chat and step-by-step reasoning for large document workloads.

Use when

High-volume processing, lower-cost deployments, and transparent reasoning workflows.

Long document memory

Moonshot Kimi

Large context

Long document analysis with very large context windows, built for contracts, books, and research archives.

Use when

Long documents, complex reasoning, and high-quality answers without flagship pricing.

Visual reasoning

Google Gemini

Vision and large context

Reads and reasons over images, charts, diagrams, scans, and very large collections.

Use when

Healthcare, architecture, finance, design, and visually rich document workflows.

Which model should you choose?

Do not overthink the first choice. Start with the provider closest to your workflow and switch later if your needs change.

I just want to get started.

Choose Groq.

Fast, simple, and strong enough for most everyday document questions.

I am doing serious analytical work.

Choose Anthropic Claude.

Best for contracts, compliance, research, finance, and answers with real consequences.

I run a small team handling sensitive documents.

Start with Claude.

Use Claude when quality matters, or choose a lower-cost option such as DeepSeek or Kimi if budget matters more.

I need current web information.

Choose Perplexity.

It is the provider that combines your private vault with live web search.

I want great quality at lower cost.

Choose DeepSeek or Kimi.

Start with DeepSeek. Move to Kimi when very long documents are common.

My documents include images or scans.

Choose Google Gemini.

Gemini can reason about visual material, not just extracted text.

I want AI queries fully local.

Choose Ollama.

Your AI questions stay on your machine. Voice features, if you use them, still go through Deepgram.
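
The guide above amounts to a small lookup table. The sketch below writes it out that way. The need keys and the recommend function are hypothetical labels invented for this example, and the recommendations themselves are taken directly from the list.

```python
# Starting recommendation for each situation in the guide.
# The keys are hypothetical shorthand for the headings above.
RECOMMENDATIONS = {
    "getting_started": "Groq",
    "serious_analytical_work": "Anthropic Claude",
    "small_team_sensitive_documents": "Anthropic Claude",
    "current_web_information": "Perplexity",
    "quality_at_lower_cost": "DeepSeek",
    "images_or_scans": "Google Gemini",
    "fully_local_queries": "Ollama",
}


def recommend(need: str, long_documents_common: bool = False) -> str:
    """Return the provider the guide suggests starting with."""
    if need not in RECOMMENDATIONS:
        known = ", ".join(sorted(RECOMMENDATIONS))
        raise ValueError(f"unknown need {need!r}; expected one of: {known}")
    # The guide says to move from DeepSeek to Kimi when very long
    # documents are common.
    if need == "quality_at_lower_cost" and long_documents_common:
        return "Moonshot Kimi"
    return RECOMMENDATIONS[need]
```

The table only captures the starting point. The guide itself expects you to switch providers later as the work changes.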

Enterprise control

Sovereignty is a deployment choice, not a provider promise.

ZSearch Enterprise keeps model choice flexible while moving the sensitive compute layer inside your infrastructure. Use managed providers where they make sense, and local models where nothing should leave the perimeter.

Discuss deployment

Embedding compute

Runs on your own infrastructure in Enterprise.

Local AI queries

Use Ollama when answers must stay on the machine.

Provider optionality

Route work to Groq, Claude, Perplexity, Gemini, and more.

Perimeter control

Keep sensitive workflows aligned to your network boundary.

No lock-in

Switch providers as requirements change.

No hidden layer

Understand which compute runs where.

No forced cloud

Enterprise can keep every layer inside your boundary.

Your AI. Your models. Your call.

Start with one provider, switch when the work changes, and keep enterprise deployment open for full infrastructure control.

Try free