“Which vector database should I use for RAG?” I get this question a lot from people building their first RAG pipeline. It's a fair question, because most of the blog posts on the topic are vendor-shaped. So I figured I'd write down what I think about when choosing a vector database.
There isn't one right answer, which is why people get stuck on it. The other thing worth saying upfront: the vector database has less impact on overall RAG quality than your chunking strategy, embedding model, or LLM choice. So don't agonize.
This post walks through how I think about the trade-offs across three systems: FAISS, pgvector, and Qdrant. It's worth knowing why each one exists and which situation each one fits.
FAISS is a thin Python wrapper around highly optimized C++ vector math. That's the whole product. There's no REST API, no metadata filtering, no persistence, no query language. If the process dies, the index dies with it unless you wrote the snapshot logic yourself.
Use it for prototyping, single-process apps, embedded use cases (one binary, one machine, no network), or as the kernel inside something you build. Skip it for anything multi-tenant, anything that needs metadata filtering, anything that has to survive a deploy without you reinventing operational primitives that other tools give you for free.
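To make "just vector math" concrete, here's the brute-force kernel that FAISS's simplest index (`IndexFlatL2`) implements in optimized C++, sketched in plain NumPy. This is a toy illustration, not FAISS itself; everything else FAISS offers is an approximation of this exact search that trades recall for speed.

```python
import numpy as np

def knn_search(index_vectors: np.ndarray, query: np.ndarray, k: int) -> np.ndarray:
    """Exact k-nearest-neighbor search by L2 distance over every stored row."""
    # Squared L2 distance from the query to every stored vector.
    dists = np.sum((index_vectors - query) ** 2, axis=1)
    # Indices of the k smallest distances, ascending.
    return np.argsort(dists)[:k]

rng = np.random.default_rng(0)
index_vectors = rng.standard_normal((10_000, 128)).astype("float32")
# A query that is a lightly perturbed copy of row 42.
query = index_vectors[42] + 0.01 * rng.standard_normal(128).astype("float32")

top = knn_search(index_vectors, query, k=5)
print(top[0])  # row 42 should come back as the nearest neighbor
```

Note what's missing: nothing here survives a restart, and there's nowhere to hang a `tenant_id`. That absence is the whole FAISS trade-off.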
pgvector is the opposite bet: vectors and metadata in one database. One connection pool. One backup strategy. One thing to monitor at 3 AM. It handles 5-10M vectors comfortably; pgvectorscale pushes that further. The cost is 2-5 ms of SQL overhead per query, and your vector queries share I/O with the rest of your workload.
The most common pgvector mistake I see, and I see it constantly, is this: people install the extension, insert a million vectors, run queries, and conclude that pgvector is slow. They forgot to create the HNSW index, so PostgreSQL is sequentially scanning every row. 287 ms per query without the index, 3 ms with it. Same data, two orders of magnitude apart. When you add pgvector, the next thing you do is CREATE INDEX ... USING hnsw.
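Assuming a table named `documents` with a vector column `embedding` (both names are mine, for illustration), the fix looks roughly like this:

```sql
-- HNSW index using cosine distance; use vector_l2_ops for L2 instead.
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);

-- Verify the planner actually uses it: you want an Index Scan here,
-- not a Seq Scan on documents.
EXPLAIN SELECT id FROM documents
ORDER BY embedding <=> '[0.1, 0.2, 0.3]'
LIMIT 10;
```

The `EXPLAIN` check matters because the index only helps if the planner picks it; the `<=>` operator in the ORDER BY has to match the operator class the index was built with.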
If you're already on Postgres and have fewer than 5 million vectors, pgvector is what I'd reach for.
Qdrant isn't faster than pgvector on raw latency in most setups. The case for it is specific: filtering during search.
Concrete example: you run a multi-tenant SaaS, and every query has to filter by tenant_id. On FAISS, you can't. On pgvector, you can, but the moment your filter is selective enough to disqualify most rows, the planner has to choose between scanning the HNSW index without the filter (then post-filtering, risking fewer than K results) or scanning the table by filter (then computing exact distances on whatever is left). Neither path is great. Qdrant applies the filter during HNSW traversal, which is the data structure those queries actually want.
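The post-filtering failure mode is easy to simulate. In this sketch (all numbers invented), we do the ANN search first and apply a 1%-selective tenant filter second, then count how many of the K results survive:

```python
import numpy as np

rng = np.random.default_rng(1)
n, dim, k = 5_000, 32, 10

vectors = rng.standard_normal((n, dim))
# Roughly 1% of rows belong to the tenant we're querying for.
tenant_ids = rng.choice(["tenant_a"] + ["other"] * 99, size=n)
query = rng.standard_normal(dim)

# Post-filtering: nearest-neighbor search first, metadata filter second.
dists = np.sum((vectors - query) ** 2, axis=1)
top_k = np.argsort(dists)[:k]
survivors = [i for i in top_k if tenant_ids[i] == "tenant_a"]

# With a 1% selective filter, the expected survivor count is k * 0.01,
# so most queries come back with zero results instead of k.
print(f"asked for {k}, got {len(survivors)} after filtering")
```

Filtering during traversal sidesteps this: the search keeps walking the graph until it has K results that already satisfy the filter.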
The honest cost is that you're now running another service: backups, monitoring, on-call.
Qdrant earns its keep the moment your filtering needs outgrow what SQL can express cleanly.
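For reference, a filtered search against Qdrant's REST API looks roughly like this; the collection name, tenant value, and vector are all invented for illustration:

```http
POST /collections/chunks/points/search
{
  "vector": [0.12, -0.07, 0.33],
  "filter": {
    "must": [
      { "key": "tenant_id", "match": { "value": "acme-corp" } }
    ]
  },
  "limit": 10
}
```

The filter rides along with the search request itself, rather than being bolted on before or after, which is exactly the property the multi-tenant case needs.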
It comes down to the same answer as most tech-stack decisions: it depends. It depends on how much data you're working with, what kind of filtering you need, and how much operational complexity you can carry right now.
Migrating from one vector database to another in a RAG pipeline isn't that difficult: because you control the write side, you can re-embed and re-ingest from your source documents, which makes it much easier than migrating a traditional database. Unless you need filtering beyond what PostgreSQL supports, pick the boring stack. The interesting problems are downstream of it.
If you ended up somewhere different on your own project, I'd want to know what tipped it. You can always send me an email.