Which vector database should I use for RAG?

FAISS vs pgvector vs Qdrant: how to choose when you are starting out


“Which vector database should I use for RAG?”

I hear this question often from teams building their first retrieval pipeline. It is a fair question. Most writing about vector databases is vendor-shaped, and every product page makes the same implicit argument: your RAG system will fail unless you choose this database.

That is not how production RAG usually fails.

For most early systems, the vector database has less impact on answer quality than your chunking strategy, embedding model, retrieval design, evaluation loop, and LLM choice. The database matters, but it is rarely the first thing worth optimizing.

So the goal is not to find the perfect vector database. The goal is to pick one that fits your current constraints, avoids obvious traps, and does not create unnecessary operational work.

This post walks through how I think about that choice.

The three useful starting points

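There are many vector databases, but most early RAG projects end up comparing the same three categories: a search library (FAISS), a PostgreSQL extension (pgvector), and a dedicated vector database (Qdrant).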

Each exists for a different reason. Understanding that reason is more useful than memorizing benchmark numbers.

FAISS: the speed ceiling, but you build the rest

FAISS is essentially a high-performance vector search library. It gives you fast similarity search, but not much else.

There is no REST API. No built-in metadata model. No query language. No operational dashboard. No automatic persistence either: FAISS can serialize an index to disk, but when to snapshot and how to reload are up to you. If the process dies, the in-memory index dies with it unless you wrote that logic yourself.

That is not a criticism. It is the design.
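To make "you build the rest" concrete, here is a minimal sketch of the snapshot logic you would own. It assumes the `faiss` Python package, whose `write_index` and `read_index` functions serialize an index to a file. The helper names (`atomic_snapshot`, `save_faiss_index`, `load_faiss_index`) are hypothetical, and the write-to-temp-then-rename pattern is one reasonable choice, not something FAISS prescribes.

```python
import os
import tempfile
from typing import Callable


def atomic_snapshot(write_fn: Callable[[str], None], path: str) -> None:
    """Write a snapshot to a temp file, then move it into place in one step.

    A crash in the middle of a write leaves the previous snapshot untouched.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    os.close(fd)  # write_fn reopens the file by name
    try:
        write_fn(tmp_path)
        os.replace(tmp_path, path)  # atomic when both paths share a filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def save_faiss_index(index, path: str) -> None:
    """Persist a FAISS index (hypothetical helper; needs the faiss package)."""
    import faiss  # imported lazily so atomic_snapshot has no hard dependency

    atomic_snapshot(lambda tmp: faiss.write_index(index, tmp), path)


def load_faiss_index(path: str):
    """Reload a previously saved FAISS index at process start."""
    import faiss

    return faiss.read_index(path)
```

Even this small sketch leaves open when to snapshot, how to handle writes that arrive between snapshots, and where the file lives. Those are the decisions a database makes for you.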

FAISS is excellent when you want vector search inside a single process, a prototype, an embedded application, or a custom system where you are willing to build the surrounding infrastructure yourself. It is also useful as the search kernel inside a larger architecture.

But it is the wrong default for a production RAG system that needs multi-tenancy, metadata filtering, deployment safety, backups, observability, or predictable operations.

Use FAISS when you want a library.

Do not use FAISS when you actually need a database.

pgvector: the boring-correct answer for most teams

For many teams, pgvector is the right first production choice.

The reason is not that it is the fastest possible vector search engine. The reason is that it keeps your system simple.

Your vectors and metadata live in the same database. You use the same connection pool, the same backup strategy, the same monitoring stack, the same access controls, and the same operational habits your team already has. At 3 AM, that matters more than a synthetic benchmark.

For many RAG systems, pgvector comfortably handles millions of vectors. Extensions and related projects such as pgvectorscale can push that further. The trade-off is that your vector workload shares resources with the rest of PostgreSQL, and each query pays a small amount of SQL and planner overhead.

That is usually a good trade.

The most common pgvector mistake is simple: teams install the extension, insert a large number of embeddings, run a few searches, and conclude that pgvector is slow.

Then you look closer and find the missing line.

They never created the HNSW index.

Without the index, PostgreSQL scans the table and computes distances row by row. With the index, the same query can move from hundreds of milliseconds to single-digit milliseconds. Same data. Same database. Completely different result.

When you adopt pgvector for approximate nearest-neighbor search, creating the HNSW index is not an optional tuning step. It is part of the implementation.
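As a rough sketch, the missing line and the query it accelerates look like the following. The table name `chunks`, its `id`, `content`, and `embedding` columns, and the index name are hypothetical. The `hnsw` index method, the `vector_cosine_ops` operator class, and the `<=>` cosine-distance operator are pgvector's own. The Python wrapper assumes a DB-API connection that uses `%s` placeholders, such as psycopg.

```python
from typing import Sequence

# Hypothetical schema: a table "chunks" with an "embedding vector(N)" column.
ENABLE_EXTENSION = "CREATE EXTENSION IF NOT EXISTS vector;"

# The line teams forget. The operator class must match the query operator.
CREATE_HNSW_INDEX = (
    "CREATE INDEX IF NOT EXISTS chunks_embedding_hnsw "
    "ON chunks USING hnsw (embedding vector_cosine_ops);"
)

# <=> is cosine distance, which pairs with vector_cosine_ops above.
NEAREST_CHUNKS = (
    "SELECT id, content FROM chunks "
    "ORDER BY embedding <=> %s::vector LIMIT %s;"
)


def to_vector_literal(values: Sequence[float]) -> str:
    """Format floats in pgvector's text form, e.g. '[0.1,0.2,0.3]'."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"


def nearest_chunks(conn, query_embedding: Sequence[float], k: int = 5):
    """Return the k nearest rows for a query embedding."""
    with conn.cursor() as cur:
        cur.execute(NEAREST_CHUNKS, (to_vector_literal(query_embedding), k))
        return cur.fetchall()
```

Running `EXPLAIN` on the search query is a quick way to confirm that the plan uses the index instead of a sequential scan.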

If you already run PostgreSQL and you do not expect to cross several million vectors soon, pgvector is the default I would reach for.

Qdrant: when filtering becomes part of retrieval

Qdrant earns its keep when filtering matters during search.

That distinction is important. Many databases can filter results before or after vector search. Fewer systems handle selective filtering well while traversing the approximate-nearest-neighbor index.

Consider a multi-tenant SaaS product. Every query must be restricted to one tenant_id. That sounds simple, but it changes the retrieval problem.

With FAISS, there is no native metadata filtering model.

With pgvector, you can express the filter in SQL, but the planner still has to choose a physical execution path. It can search the vector index and filter afterward, which may return too few valid results if most candidates belong to other tenants. Or it can filter the table first and compute distances over the remaining rows, which may become expensive as the filtered set grows.

Neither path is ideal for heavily filtered vector search.
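A toy simulation shows the two paths. This is not pgvector code: an exact sort stands in for the index scan, the embeddings are one-dimensional, and the tenant names and data are invented for illustration.

```python
from typing import List, Tuple

Row = Tuple[int, str, float]  # (id, tenant_id, position on a 1-D "embedding" axis)


def top_k(rows: List[Row], query: float, k: int) -> List[Row]:
    """Exact nearest neighbours by absolute distance; stands in for an index scan."""
    return sorted(rows, key=lambda r: abs(r[2] - query))[:k]


def post_filter(rows: List[Row], query: float, tenant: str, k: int) -> List[Row]:
    """Search first, filter afterwards: can return fewer than k rows."""
    return [r for r in top_k(rows, query, k) if r[1] == tenant]


def pre_filter(rows: List[Row], query: float, tenant: str, k: int) -> List[Row]:
    """Filter first, then compute distances over every remaining row."""
    return top_k([r for r in rows if r[1] == tenant], query, k)


# 100 rows; tenant "small" owns only every 20th row, tenant "big" owns the rest.
ROWS: List[Row] = [
    (i, "small" if i % 20 == 0 else "big", float(i)) for i in range(100)
]

print(post_filter(ROWS, 50.0, "small", 3))                    # []
print([r[0] for r in pre_filter(ROWS, 50.0, "small", 3)])     # [40, 60, 20]
```

The post-filter path returns nothing because the three globally nearest rows all belong to the other tenant. The pre-filter path returns correct results, but only by touching every row the tenant owns.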

Qdrant is designed for this use case. It can apply filters during HNSW traversal, which is what multi-tenant and faceted retrieval often need.
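In practice, the tenant restriction travels with the search request itself. The sketch below builds the JSON body for Qdrant's REST search endpoint (`POST /collections/{collection_name}/points/search`). The function name and the example tenant value are hypothetical, and request shapes can change between Qdrant versions, so treat this as an illustration and check the API reference for the version you run.

```python
import json
from typing import Dict, List


def tenant_search_body(query_vector: List[float], tenant_id: str, limit: int = 5) -> Dict:
    """Build a search request body restricted to a single tenant."""
    return {
        "vector": query_vector,
        "filter": {
            # Every returned point must carry this tenant_id in its payload.
            "must": [
                {"key": "tenant_id", "match": {"value": tenant_id}}
            ]
        },
        "limit": limit,
        "with_payload": True,
    }


body = tenant_search_body([0.1, 0.2, 0.3], tenant_id="acme")
print(json.dumps(body, indent=2))
```

Qdrant's documentation also recommends creating a payload index on fields you filter by, such as `tenant_id` here.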

The cost is operational. You are now running another service. That means another backup strategy, another monitoring surface, another deployment path, and another thing that can page someone.

That cost can be worth it. But it should buy you something specific.

Qdrant is not the default because it is fashionable. It is the right choice when your filtering needs outgrow what PostgreSQL can handle cleanly.

A practical decision tree

Here is the version I use when advising teams.
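A rough sketch of that logic as code, using only the rules of thumb stated in this post. The function name, the parameters, and the 5,000,000 cutoff standing in for "several million" are illustrative assumptions. The final fallback branch is also an assumption for cases the rules above do not cover directly.

```python
def suggest_vector_store(
    production: bool,
    heavy_filtering: bool,
    runs_postgres: bool,
    expected_vectors: int,
    pgvector_ceiling: int = 5_000_000,  # assumption standing in for "several million"
) -> str:
    """Return a starting point, not a verdict."""
    if not production:
        # Prototypes and local experiments only need a library.
        return "FAISS"
    if heavy_filtering:
        # Selective filters (per-tenant, faceted) applied during search.
        return "Qdrant"
    if runs_postgres and expected_vectors < pgvector_ceiling:
        # Keep vectors next to the data and tooling you already operate.
        return "pgvector"
    # Assumed fallback: no existing PostgreSQL, or growth past the ceiling.
    return "Qdrant or a similar dedicated vector database"


print(suggest_vector_store(True, False, True, 200_000))  # pgvector
```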

The three mistakes I see most often

The first mistake is reaching for a managed vector database before the system needs one.

Managed infrastructure is useful. But adopting it before you understand your retrieval workload often means solving the wrong problem. If you have not reached meaningful scale, and you do not yet know your filtering or latency constraints, you are probably paying for operational capacity you do not need yet.

The second mistake is tuning the vector store before fixing the retrieval basics.

Below a few tens of thousands of vectors, brute-force search is often already fast enough. At that stage, better chunking, better metadata, better query rewriting, and better evaluation will usually improve the system more than changing databases.
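For a sense of how little machinery brute force needs, here is a minimal sketch in pure Python: score every stored vector against the query and keep the best k. The function names and the toy corpus are made up for illustration. A real system would more likely use a vectorized library, but the logic is the same.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def brute_force_search(
    query: Sequence[float],
    corpus: List[Tuple[str, Sequence[float]]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Score every vector against the query and return the k most similar."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in corpus]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


corpus = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [1.0, 1.0])]
print([doc_id for doc_id, _ in brute_force_search([1.0, 0.0], corpus, k=2)])  # ['a', 'c']
```

Because this search is exact, it also makes a useful baseline when you later evaluate the recall of an approximate index.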

The third mistake is using pgvector without creating the right index.

This one is common enough to call out twice. If you benchmark pgvector without HNSW, you are not benchmarking pgvector as you would use it in production. You are benchmarking a sequential scan.

Your choice is not final

The reassuring part is that vector database choice is usually reversible.

In a RAG system, you control the write side. You can re-embed, re-index, and replay documents into a new store more easily than you can migrate the source-of-truth transactional database for an application.
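A minimal sketch of that replay loop, assuming you still have the source documents. `embed` and `upsert` are hypothetical callables standing in for your embedding model and the new store's write API; nothing here is specific to any one database.

```python
from typing import Callable, Iterable, Iterator, List, Sequence, Tuple

Document = Tuple[str, str]                 # (doc_id, text)
Vector = Sequence[float]


def batched(items: Iterable[Document], size: int) -> Iterator[List[Document]]:
    """Yield lists of at most `size` documents."""
    batch: List[Document] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def replay_into_new_store(
    documents: Iterable[Document],
    embed: Callable[[List[str]], List[Vector]],
    upsert: Callable[[List[Tuple[str, Vector]]], None],
    batch_size: int = 100,
) -> int:
    """Re-embed documents in batches, write them to the new store, return the count."""
    written = 0
    for batch in batched(documents, batch_size):
        vectors = embed([text for _, text in batch])
        upsert([(doc_id, vec) for (doc_id, _), vec in zip(batch, vectors)])
        written += len(batch)
    return written
```

Keeping stable document IDs in the source of truth is what makes a replay like this safe to rerun.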

That does not mean migrations are free. But it does mean you should not treat the first vector database choice as permanent architecture.

Pick the simplest thing that satisfies your current constraints. For many teams, that means pgvector. For local experiments, FAISS is enough. For serious filtered retrieval, Qdrant or a similar dedicated vector database earns its place.

The interesting problems in RAG usually sit around the vector store, not inside it.

Retrieval quality, chunk boundaries, metadata design, reranking, evaluation, observability, and failure analysis will matter more than whether your first database was the most fashionable choice.

Choose the boring stack until the boring stack stops working. Then change it for a reason.