TL;DR - Key Takeaways
- Zvec is an embedded vector database: it runs in-process like SQLite, with no server or daemon. Install with `pip install zvec` and use it from Python.
- Built for edge and on-device RAG: local knowledge bases, semantic search, and agent workloads on laptops, mobile, and constrained hardware.
- Backed by Proxima, Alibaba’s production vector engine; Apache 2.0, Python 3.10–3.12 on Linux x86_64/ARM64 and macOS ARM64.
- Performance: over 8,000 QPS in VectorDBBench (Cohere 10M), more than 2× the previous leaderboard #1 (ZillizCloud), with lower index build time.
- RAG-ready: full CRUD, schema evolution, multi-vector retrieval, built-in reranking (weighted fusion, RRF), and scalar–vector hybrid search with optional inverted indexes.
Why Embedded Vector Search Matters for RAG
RAG and semantic search need more than a bare vector index. You need vectors, scalar fields, full CRUD, and safe persistence. Local knowledge bases change as files, notes, and project state change—so your storage has to keep up without you building a custom layer on top of an index library.
Index-only libraries (e.g. Faiss) give you approximate nearest neighbor search but not scalar storage, crash recovery, or hybrid queries. Embedded extensions (e.g. DuckDB-VSS) add vector search to a DB but often expose fewer index and quantization options and weaker resource control for edge. Service-based systems (e.g. Milvus or managed vector clouds) need network calls and separate deployment—usually overkill for a desktop tool, mobile app, or CLI.
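To make that gap concrete, here is a minimal sketch of the glue code an index-only setup tends to push onto the application: a scalar pre-filter followed by exact cosine ranking. It is plain Python with no Faiss or Zvec calls; the record layout, `filtered_search`, and the `predicate` argument are hypothetical names chosen for illustration, and brute-force scoring stands in for a real ANN index.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def filtered_search(records, query, predicate, topk=3):
    """Apply a scalar filter first, then rank the survivors by exact cosine."""
    candidates = [r for r in records if predicate(r["fields"])]
    scored = [(r["id"], cosine(r["vector"], query)) for r in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topk]

# Hypothetical records: id, vector, and scalar attributes kept side by side.
records = [
    {"id": "note_1", "vector": [1.0, 0.0], "fields": {"type": "note"}},
    {"id": "note_2", "vector": [0.6, 0.8], "fields": {"type": "note"}},
    {"id": "mail_1", "vector": [1.0, 0.0], "fields": {"type": "mail"}},
]

# Only records of type "note" are scored; mail_1 is filtered out before ranking.
hits = filtered_search(records, [1.0, 0.0], lambda f: f["type"] == "note")
print(hits)
```

Even this toy version has to keep vectors and attributes in sync by hand, and it says nothing about persistence or crash recovery, which is the part an embedded database is meant to take over.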


Zvec targets that gap: a vector-native, embedded engine with persistence, resource controls, and RAG-oriented features in a single library.
Core Architecture: In-Process and Vector-Native
Zvec runs as a library inside your process. There is no external server or RPC. You install it with pip install zvec, open collections from Python, and define schemas, insert documents, and run queries through the same API.
Under the hood it uses Proxima, Alibaba’s high-performance vector search engine. Zvec wraps Proxima with a simpler API and an embedded runtime. The project is Apache 2.0 and currently supports Python 3.10–3.12 on Linux x86_64, Linux ARM64, and macOS ARM64. The GitHub repo also offers a Node.js SDK (npm install @zvec/zvec) so you can embed vector search in JS/TS apps. From the repo: Zvec works with dense and sparse vectors and supports multi-vector queries in a single call; it’s built to run wherever your code runs—notebooks, servers, CLI tools, or edge devices.
Design goals are explicit:
- Embedded—runs in-process; no network, no standalone service.
- Vector-native—indexing and storage built for vector workloads.
- Production-ready—persistence and crash safety for zero-ops and edge deployments.
Developer Workflow: Install to Semantic Search in Minutes
The path from install to query is short:
- Install: `pip install zvec`
- Define a `CollectionSchema` (one or more vector fields, optional scalars).
- Create or open a collection on disk with `create_and_open`.
- Insert `Doc` objects (id, vectors, and optional scalar attributes).
- Build an index and run a `VectorQuery` for nearest-neighbor search.
Example:

    import zvec

    # Define collection schema
    schema = zvec.CollectionSchema(
        name="example",
        vectors=zvec.VectorSchema("embedding", zvec.DataType.VECTOR_FP32, 4),
    )

    # Create collection
    collection = zvec.create_and_open(path="./zvec_example", schema=schema)

    # Insert documents
    collection.insert([
        zvec.Doc(id="doc_1", vectors={"embedding": [0.1, 0.2, 0.3, 0.4]}),
        zvec.Doc(id="doc_2", vectors={"embedding": [0.2, 0.3, 0.4, 0.1]}),
    ])

    # Search by vector similarity
    results = collection.query(
        zvec.VectorQuery("embedding", vector=[0.4, 0.3, 0.3, 0.1]),
        topk=10
    )

    # Results: list of {'id': str, 'score': float, ...}, sorted by relevance
    print(results)

Results are dictionaries with IDs and similarity scores, enough to wire a local semantic search or RAG retrieval layer to any embedding model.
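As a rough sketch of that wiring, the snippet below turns a ranked result list into a prompt context. It assumes only the result shape described above (dictionaries with `id` and `score`) and that a higher score means a closer match, which depends on the metric in use. The `build_context` helper, the `texts` lookup, and the sample scores are hypothetical; nothing here calls Zvec itself.

```python
def build_context(results, texts, min_score=0.0, max_chars=500):
    """Turn ranked hits into a prompt context string.

    results: list of dicts with 'id' and 'score' (shape assumed from the article).
    texts:   hypothetical id -> chunk text mapping kept by the application.
    """
    parts, used = [], 0
    for hit in results:
        chunk = texts.get(hit["id"])
        # Skip hits with no stored text or a score below the cutoff.
        if chunk is None or hit["score"] < min_score:
            continue
        # Stop once the character budget for the context would be exceeded.
        if used + len(chunk) > max_chars:
            break
        parts.append(f"[{hit['id']}] {chunk}")
        used += len(chunk)
    return "\n".join(parts)

# Stand-in for the output of a vector query; the scores are made up.
results = [{"id": "doc_1", "score": 0.92}, {"id": "doc_2", "score": 0.31}]
texts = {"doc_1": "Zvec runs in-process.", "doc_2": "Unrelated note."}

context = build_context(results, texts, min_score=0.5)
prompt = f"Answer using only this context:\n{context}\n\nQuestion: How does Zvec run?"
print(prompt)
```

Keeping the chunk text in a separate lookup is only one option; storing it as a scalar field next to the vector would remove the second lookup.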
Performance: VectorDBBench and 8,000+ QPS
Zvec is tuned for high throughput and low latency on CPU: multithreading, cache-friendly layouts, SIMD, and prefetching.
In VectorDBBench on the Cohere 10M dataset, with comparable hardware and matched recall, Zvec reports over 8,000 QPS—more than 2× the previous leaderboard #1 (ZillizCloud)—and substantially lower index build time. That suggests an embedded library can reach cloud-scale performance for similarity search when the workload fits the benchmark profile. The README highlights “searches billions of vectors in milliseconds” for production workloads.


RAG Capabilities: CRUD, Hybrid Search, Fusion, Reranking
The feature set is aimed at RAG and agent-style retrieval:
| Capability | What it gives you |
|---|---|
| Full CRUD | Update the local knowledge base as documents and files change. |
| Schema evolution | Adjust index strategies and fields over time. |
| Multi-vector retrieval | Combine several embedding channels in one query. |
| Built-in reranker | Weighted fusion and Reciprocal Rank Fusion (RRF) without custom merging. |
| Scalar–vector hybrid | Push scalar filters into the index path; optional inverted indexes for attributes. |
You can build on-device assistants that mix semantic retrieval with filters (user, time, type) and multiple embedding models inside one embedded engine.
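For readers unfamiliar with the two fusion strategies named in the table, here is a minimal pure-Python sketch of both. It illustrates the general techniques, not Zvec's reranker API; the function names, the sample channels, and the conventional RRF constant `k=60` are assumptions made for the example.

```python
from collections import defaultdict

def rrf(rankings, k=60):
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

def weighted_fusion(score_maps, weights):
    """Weighted sum of per-channel scores; a doc missing from a channel adds 0."""
    fused = defaultdict(float)
    for scores, weight in zip(score_maps, weights):
        for doc_id, score in scores.items():
            fused[doc_id] += weight * score
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)

# Two hypothetical embedding channels ranking the same small corpus.
title_hits = ["a", "b", "c"]
body_hits = ["b", "c", "a"]
fused_rrf = rrf([title_hits, body_hits])

# Weighted fusion works on scores, so they should be on a comparable scale.
title_scores = {"a": 0.9, "b": 0.7}
body_scores = {"b": 0.8, "c": 0.6}
fused_weighted = weighted_fusion([title_scores, body_scores], [0.3, 0.7])

print([doc_id for doc_id, _ in fused_rrf])
print([doc_id for doc_id, _ in fused_weighted])
```

RRF uses only rank positions, so it tolerates channels whose raw scores are not comparable; weighted fusion gives finer control but needs normalized scores and tuned weights.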

Resource Control for Edge and CLI
For CLI tools, mobile, and other constrained environments, Zvec adds explicit resource limits:
- Streaming writes: e.g. 64 MB chunked writes to avoid loading everything into memory.
- Optional mmap mode: data paged in on demand to avoid OOM when data exceeds RAM.
- Experimental memory cap: `memory_limit_mb` for a hard process-level budget when mmap is off.
- Concurrency: `concurrency`, `optimize_threads`, and `query_threads` to cap CPU use and keep the main thread responsive.
That makes it easier to ship vector search in desktop apps and mobile without blowing memory or freezing the UI.
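The two memory-related ideas, chunked writes and memory-mapped reads, can be sketched with the Python standard library alone. This is an illustration of the general technique under toy assumptions (4-dimensional float32 vectors, a chunk size of 2 vectors instead of 64 MB, a flat binary file); it does not reflect Zvec's storage format or configuration API, and all names are hypothetical.

```python
import mmap
import os
import struct
import tempfile
from itertools import islice

DIM = 4                   # vector dimensionality for this toy example
VEC_BYTES = DIM * 4       # float32 -> 4 bytes per component
FMT = f"<{DIM}f"          # little-endian float32 layout of one vector

def write_in_chunks(path, vectors, chunk_size=2):
    """Consume an iterable of vectors and write them a few at a time."""
    it = iter(vectors)
    with open(path, "wb") as f:
        # Only one chunk is held in memory at any moment.
        while chunk := list(islice(it, chunk_size)):
            f.write(b"".join(struct.pack(FMT, *v) for v in chunk))

def read_vector(path, index):
    """Map the file and decode one vector; the OS pages in only what is touched."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        return struct.unpack_from(FMT, mm, index * VEC_BYTES)

# A generator stands in for a data source too large to hold in memory.
vectors = ([float(i), 0.5, 0.25, 0.0] for i in range(5))
path = os.path.join(tempfile.mkdtemp(), "vectors.bin")

write_in_chunks(path, vectors)
third = read_vector(path, 2)
print(third)
```

The writer never materializes the full dataset, and the reader touches only the pages backing the requested vector, which is the behavior the streaming-write and mmap options above are described as providing.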
From the GitHub Repo
The alibaba/zvec repository is the single place for code, installers, and community:
- Install: `pip install zvec` for Python (PyPI); `npm install @zvec/zvec` for Node.js (npm).
- Platforms: Linux (x86_64, ARM64), macOS (ARM64). For other setups, see Building from Source.
- Contributing: Bug fixes, features, and docs are welcome; see the repo’s Contributing Guide.
- Community: DingTalk, WeChat, Discord, X (Twitter) @zvec_ai.
Where to Learn More
- Docs and intro: Zvec introduction
- Repo: github.com/alibaba/zvec
- Quickstart: Zvec QuickStart
- Benchmarks: Zvec Benchmark
Roadmap items include deeper LangChain and LlamaIndex integration, DuckDB and PostgreSQL extensions, and validation on real edge devices (e.g. iOS, Android, Nvidia Jetson).
Source: MarkTechPost — Alibaba Open-Sources Zvec, February 2026; Zvec official introduction; Zvec GitHub.