“Which vector database should I use for RAG?” I get this question a lot from people building their first RAG pipeline. It's a fair question, because most of the blog posts on the topic are vendor-shaped. So I figured I'd write down what I think about when choosing a vector database.
There isn't one right answer, which is why people get stuck on it. The other thing worth saying upfront: the vector database has less impact on overall RAG quality than your chunking strategy, embedding model, or LLM choice. So don't agonize.
This post walks through how I think about the trade-offs across three systems: FAISS, pgvector, and Qdrant. It's worth knowing why each one exists and which situation each one fits.
FAISS is a thin Python wrapper around highly optimized C++ vector math. That's the whole product. There's no REST API, no metadata filtering, no persistence, no query language. If the process dies, the index dies with it unless you wrote the snapshot logic yourself.
Use it for prototyping, single-process apps, embedded use cases (one binary, one machine, no network), or as the kernel inside something you build. Skip it for anything multi-tenant, anything that needs metadata filtering, anything that has to survive a deploy without you reinventing operational primitives that other tools give you for free.
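To make "just vector math" concrete, here's the brute-force kernel that FAISS's simplest index (`IndexFlatL2`) implements in optimized C++, sketched in plain NumPy. This is a toy illustration, not FAISS itself; everything else FAISS offers is an approximation of this exact search that trades recall for speed.

```python
import numpy as np

def knn_search(index_vectors: np.ndarray, query: np.ndarray, k: int) -> np.ndarray:
    """Exact k-nearest-neighbor search by L2 distance over every stored row."""
    # Squared L2 distance from the query to every stored vector.
    dists = np.sum((index_vectors - query) ** 2, axis=1)
    # Indices of the k smallest distances, ascending.
    return np.argsort(dists)[:k]

rng = np.random.default_rng(0)
index_vectors = rng.standard_normal((10_000, 128)).astype("float32")
# A query that is a lightly perturbed copy of row 42.
query = index_vectors[42] + 0.01 * rng.standard_normal(128).astype("float32")

top = knn_search(index_vectors, query, k=5)
print(top[0])  # row 42 should come back as the nearest neighbor
```

Note what's missing: nothing here survives a restart, and there's nowhere to hang a `tenant_id`. That absence is the whole FAISS trade-off.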
pgvector is the opposite bet: vectors and metadata in one database. One connection pool. One backup strategy. One thing to monitor at 3 AM. It handles 5-10M vectors comfortably; pgvectorscale pushes that further. The cost is 2-5 ms of SQL overhead per query, and your vector queries share I/O with the rest of your workload.
The most common pgvector mistake I see, and I see it constantly, is this: people install the extension, insert a million vectors, run queries, and conclude that pgvector is slow. They forgot to create the HNSW index, so PostgreSQL is sequentially scanning every row. 287 ms per query without the index, 3 ms with it. Same data, two orders of magnitude apart. When you add pgvector, the next thing you do is CREATE INDEX ... USING hnsw.
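Assuming a table named `documents` with a vector column `embedding` (both names are mine, for illustration), the fix looks roughly like this:

```sql
-- HNSW index using cosine distance; use vector_l2_ops for L2 instead.
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);

-- Verify the planner actually uses it: you want an Index Scan here,
-- not a Seq Scan on documents.
EXPLAIN SELECT id FROM documents
ORDER BY embedding <=> '[0.1, 0.2, 0.3]'
LIMIT 10;
```

The `EXPLAIN` check matters because the index only helps if the planner picks it; the `<=>` operator in the ORDER BY has to match the operator class the index was built with.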
If you're already on Postgres and have fewer than 5 million vectors, pgvector is what I'd reach for.
Qdrant isn't faster than pgvector on raw latency in most setups. The case for it is specific: filtering during search.
Concrete example: you run a multi-tenant SaaS, and every query has to filter by tenant_id. On FAISS, you can't. On pgvector, you can, but the moment your filter is selective enough to disqualify most rows, the planner has to choose between scanning the HNSW index without the filter (then post-filtering, risking fewer than K results) or scanning the table by filter (then computing exact distances on whatever is left). Neither path is great. Qdrant applies the filter during HNSW traversal, which is the data structure those queries actually want.
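The post-filtering failure mode is easy to simulate. In this sketch (all numbers invented), we do the ANN search first and apply a 1%-selective tenant filter second, then count how many of the K results survive:

```python
import numpy as np

rng = np.random.default_rng(1)
n, dim, k = 5_000, 32, 10

vectors = rng.standard_normal((n, dim))
# Roughly 1% of rows belong to the tenant we're querying for.
tenant_ids = rng.choice(["tenant_a"] + ["other"] * 99, size=n)
query = rng.standard_normal(dim)

# Post-filtering: nearest-neighbor search first, metadata filter second.
dists = np.sum((vectors - query) ** 2, axis=1)
top_k = np.argsort(dists)[:k]
survivors = [i for i in top_k if tenant_ids[i] == "tenant_a"]

# With a 1% selective filter, the expected survivor count is k * 0.01,
# so most queries come back with zero results instead of k.
print(f"asked for {k}, got {len(survivors)} after filtering")
```

Filtering during traversal sidesteps this: the search keeps walking the graph until it has K results that already satisfy the filter.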
The honest cost is that you're now running another service: backups, monitoring, on-call.
Qdrant earns its keep the moment your filtering needs outgrow what SQL can express cleanly.
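For reference, a filtered search against Qdrant's REST API looks roughly like this; the collection name, tenant value, and vector are all invented for illustration:

```http
POST /collections/chunks/points/search
{
  "vector": [0.12, -0.07, 0.33],
  "filter": {
    "must": [
      { "key": "tenant_id", "match": { "value": "acme-corp" } }
    ]
  },
  "limit": 10
}
```

The filter rides along with the search request itself, rather than being bolted on before or after, which is exactly the property the multi-tenant case needs.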
It comes down to the same answer as most tech-stack decisions: it depends. It depends on how much data you're working with, what kind of filtering you need, and how much operational complexity you can carry right now.
Migrating from one vector database to another in a RAG pipeline isn't that difficult: because you control the write side, you can re-embed and re-ingest from your source documents, which makes it much easier than migrating a traditional database. Unless you need filtering beyond what PostgreSQL supports, pick the boring stack. The interesting problems are downstream of it.
If you ended up somewhere different on your own project, I'd want to know what tipped it. You can always send me an email.