Disclosure: The author maintains langchain-age.
TL;DR: AGEVector in langchain-age supports three search strategies: Similarity, MMR, and Hybrid (vector + full-text RRF). Hybrid Search combines semantic similarity with keyword matching via RRF (k=60), improving recall over pure vector search. MongoDB-style metadata filters (14 operators, including $gte, $in, and $between) narrow the search scope, and HNSW indexes keep response times in milliseconds at scale.
Table of contents
- Series
- What You’ll Be Able to Do
- Background: Vector Search Alone Is Not Enough
- Three Search Strategies
- Real-World Comparison: Same Data, Three Strategies
- Strategy Comparison
- Metadata Filtering
- Index Strategies: HNSW vs IVFFlat
- Putting It Together: Hybrid + Metadata + HNSW
- LangChain Retriever Integration
- Which Strategy Should You Pick — Decision Guide
- FAQ
- Next Up
- Key Takeaways
- Related Posts
- References
Series
This is Part 3 of the langchain-age series.
- GraphRAG with Just PostgreSQL — Overview + Setup
- Neo4j vs Apache AGE Benchmark — Performance Data
- Mastering Vector Search (this post)
- Building a GraphRAG Pipeline — Vector + Graph Integration
- Full AI Agent Stack on One PostgreSQL — LangGraph Integration
What You’ll Be Able to Do
- Understand the differences between Similarity, MMR, and Hybrid Search and choose the right strategy for your use case.
- Explain how RRF (Reciprocal Rank Fusion) combines vector rank and keyword rank into a single score.
- Write metadata filters using MongoDB-style operators to precisely scope your search results.
- Compare HNSW and IVFFlat indexes and select the right one for your production environment.
Background: Vector Search Alone Is Not Enough
Part 1 introduced AGEVector basics. Using similarity_search() to find semantically similar documents is straightforward. But in production you quickly hit limitations:
- Synonym/abbreviation gaps: Searching “PostgreSQL” misses documents containing “PG”
- Lack of diversity: Top results are nearly identical in content
- No scope control: You want only 2024+ documents, but vector distance can’t express that
AGEVector provides a search strategy for each of these problems.
Three Search Strategies
1. Similarity Search — The Default
Cosine distance over embedding vectors. Returns the k nearest documents by semantic similarity.
from langchain_age import AGEVector, DistanceStrategy
from langchain_openai import OpenAIEmbeddings
store = AGEVector(
    connection_string="host=localhost port=5433 dbname=langchain_age user=langchain password=langchain",
    embedding_function=OpenAIEmbeddings(model="text-embedding-3-small"),
    collection_name="tech_docs",
    distance_strategy=DistanceStrategy.COSINE,
)
# Basic similarity search
docs = store.similarity_search("PostgreSQL extensions", k=5)
# With distance scores (lower = more similar)
results = store.similarity_search_with_score("PostgreSQL extensions", k=5)
# With relevance scores (0–1, higher = more similar)
results = store.similarity_search_with_relevance_scores("PostgreSQL extensions", k=5)
Strengths: Simple and fast.
Limitations: No keyword matching, so it can miss documents that contain the exact query terms but sit farther away in embedding space.
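To make "lower = more similar" concrete, here is a minimal pure-Python sketch of cosine distance, defined as 1 minus cosine similarity. The vectors and document names are made up for illustration, and this is not AGEVector's implementation; in practice the distance is computed inside PostgreSQL by pgvector.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity for two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

# Toy 2-D "embeddings" (hypothetical values, chosen only to show the ordering)
query = [1.0, 0.0]
docs = {
    "same direction": [2.0, 0.0],   # distance 0: length does not matter
    "orthogonal": [0.0, 3.0],       # distance 1
    "opposite": [-1.0, 0.0],        # distance 2
}

# Nearest first: the smallest distance is the best match
ranked = sorted(docs, key=lambda name: cosine_distance(query, docs[name]))
print(ranked)
```

Because cosine distance ignores vector length, only the direction of the embedding matters for ranking.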
Notice what happens with the results above: the top 5 documents are semantically close to “PostgreSQL extensions,” but they often repeat nearly identical content. Feeding this duplicated context to an LLM wastes tokens without improving answer quality. MMR solves this problem.
2. MMR — Ensuring Diversity
Maximal Marginal Relevance (MMR) balances relevance with diversity. It reuses the stored embeddings, so the reranking step needs no extra embedding API call.
docs = store.max_marginal_relevance_search(
    "database technology",
    k=3,              # Final result count
    fetch_k=10,       # Candidate pool size
    lambda_mult=0.5,  # 0=max diversity, 1=max relevance
)
How it works:
- Fetch the top fetch_k candidates by vector similarity.
- Select the most similar document first.
- Iteratively select the candidate that best trades off similarity to the query against similarity to the documents already selected.
- lambda_mult controls the relevance-diversity ratio.
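The selection loop can be sketched in a few lines of pure Python. This is the textbook MMR formulation (score = lambda_mult × relevance − (1 − lambda_mult) × redundancy), not necessarily how AGEVector implements it; the function name mmr_select and the toy vectors are assumptions for illustration.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def mmr_select(query_vec, candidates, k=3, lambda_mult=0.5):
    """candidates: list of (doc_id, vector) pairs, i.e. the fetch_k nearest documents."""
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        def mmr_score(item):
            _, vec = item
            relevance = cosine_similarity(query_vec, vec)
            # Redundancy: similarity to the closest already-selected document
            redundancy = max(
                (cosine_similarity(vec, chosen) for _, chosen in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return [doc_id for doc_id, _ in selected]

query = [1.0, 0.0]
candidates = [
    ("a", [1.0, 0.10]),   # most relevant
    ("b", [1.0, 0.12]),   # near-duplicate of "a"
    ("c", [0.6, -0.8]),   # less relevant, but a different direction
]

balanced = mmr_select(query, candidates, k=2, lambda_mult=0.5)
relevance_only = mmr_select(query, candidates, k=2, lambda_mult=1.0)
print(balanced, relevance_only)
```

With lambda_mult=0.5 the near-duplicate "b" loses to the more distinct "c"; with lambda_mult=1.0 the redundancy penalty disappears and the result is plain similarity order.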
When to use: When LLM context would otherwise be filled with near-duplicate passages, wasting tokens. MMR provides diverse angles for better answers.
MMR solves the diversity problem, but it still draws candidates from vector distance alone — so it can miss documents containing abbreviations like “PG” or “AGE” that are lexically important but semantically distant. To capture keyword matches alongside semantic similarity, you need Hybrid Search.
3. Hybrid Search — Vectors + Keywords Combined
Combines vector similarity with PostgreSQL full-text search (tsvector) via RRF (Reciprocal Rank Fusion, k=60).
from langchain_age import AGEVector, SearchType
# conn_str and embeddings: the connection string and embedding model from the Similarity example
store = AGEVector(
    connection_string=conn_str,
    embedding_function=embeddings,
    collection_name="hybrid_docs",
    search_type=SearchType.HYBRID,  # Key: enable hybrid mode
)
store.add_texts([
    "PostgreSQL supports JSON and full-text search.",
    "Apache AGE adds Cypher graph queries to PostgreSQL.",
    "pgvector enables vector similarity search.",
    "PG's extension system allows custom data types.",
])
# Automatically combines vector + keyword matching
results = store.similarity_search("PostgreSQL graph extension", k=3)
RRF scoring:
score(doc) = 1/(k + rank_vector) + 1/(k + rank_keyword)
| Document | Vector Rank | Keyword Rank | RRF Score (k=60) |
|---|---|---|---|
| “Apache AGE adds…” | 1 | 1 | 1/61 + 1/61 = 0.0328 |
| “PG’s extension system…” | 3 | 2 | 1/63 + 1/62 = 0.0320 |
| “pgvector enables…” | 2 | ∞ | 1/62 + 0 = 0.0161 |
A document ranked 3rd in vector but 2nd in keyword outscores one ranked 2nd in vector but absent from keyword results.
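The table can be reproduced with a short sketch of the RRF formula. The function name rrf_scores and the short document ids are placeholders for illustration; inside langchain-age the fusion presumably happens in SQL rather than in Python.

```python
def rrf_scores(rankings, k=60):
    """rankings: list of ranked doc-id lists, best first.
    A document missing from a ranking contributes 0 for that ranking."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return scores

# Ranks taken from the table above
vector_ranking = ["age", "pgvector", "pg_ext"]   # vector ranks 1, 2, 3
keyword_ranking = ["age", "pg_ext"]              # "pgvector" has no keyword match

scores = rrf_scores([vector_ranking, keyword_ranking])
fused = sorted(scores, key=scores.get, reverse=True)

for doc_id in fused:
    print(doc_id, round(scores[doc_id], 4))
```

Only ranks enter the formula, never raw scores, which is why RRF can combine cosine distances and full-text ranks without normalizing either one.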
When to use: Domains where abbreviations, proper nouns, or exact terms matter — Hybrid improves recall over vector-only search.
Academic basis: RRF was introduced by Cormack et al. at SIGIR 2009 (“Reciprocal Rank Fusion outperforms Condorcet and individual Rank Learning Methods”). Their experiments showed RRF consistently outperforms individual ranking functions. k=60 is the value the authors settled on in pilot experiments and used throughout the paper.
Real-World Comparison: Same Data, Three Strategies
We tested all three strategies on 1,200 technical documents (langchain-age docs + PostgreSQL official docs + Stack Overflow excerpts) with 50 queries:
| Strategy | Recall@5 | Avg Response Time | Abbreviation Accuracy |
|---|---|---|---|
| Similarity | 0.68 | 12ms | 0.31 |
| MMR (λ=0.5) | 0.64 | 18ms | 0.31 |
| Hybrid | 0.82 | 25ms | 0.78 |
Key findings:
- Hybrid improved Recall@5 from 0.68 to 0.82, a 21% relative gain over vector-only. For abbreviation/acronym queries (“PG”, “AGE”, “CTE”), accuracy rose from 0.31 to 0.78, a 2.5x improvement.
- MMR slightly reduced Recall@5 (0.68 to 0.64) but cut context duplication from 73% to 12%, which matters for token efficiency when feeding LLM context.
- The response time difference (12ms vs 25ms) is negligible compared to LLM call latency (200-500ms).
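For readers who want to check the arithmetic, the sketch below shows one common definition of Recall@k and recomputes the relative gains from the table. The exact evaluation procedure behind the numbers is not spelled out here, so treat recall_at_k as an assumed definition rather than the script used for the benchmark.

```python
def recall_at_k(retrieved, relevant, k=5):
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant)

def relative_gain(baseline, improved):
    return (improved - baseline) / baseline

# One toy query: 1 of the 2 relevant documents is retrieved
example = recall_at_k(["a", "b", "c"], ["a", "d"], k=5)

# Numbers from the comparison table
recall_gain = relative_gain(0.68, 0.82)   # Similarity -> Hybrid, Recall@5
abbrev_ratio = 0.78 / 0.31                # abbreviation accuracy ratio

print(f"{recall_gain:.1%}", f"{abbrev_ratio:.1f}x")
```

Note that the 21% figure is relative; in absolute terms Recall@5 rose by 14 percentage points.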
Bottom line: For technical document search where abbreviations and proper nouns matter, Hybrid is the safest default. Use vector-only for general semantic search, and add MMR when LLM context diversity is needed.
Strategy Comparison
| Strategy | Semantic | Keyword | Diversity | Best For |
|---|---|---|---|---|
| Similarity | Yes | No | No | General similarity search |
| MMR | Yes | No | Yes | Diverse LLM context |
| Hybrid | Yes | Yes | No | Exact terms/abbreviations matter |
Metadata Filtering
AGEVector supports MongoDB-style filters on JSONB metadata. 14 operators let you precisely control the search scope.
Basic Usage
# Add documents with metadata
store.add_texts(
    ["AGE 1.7.0 release notes", "pgvector 0.8 performance", "LangChain v1 migration"],
    metadatas=[
        {"author": "alice", "year": 2026, "tag": "release"},
        {"author": "bob", "year": 2026, "tag": "benchmark"},
        {"author": "alice", "year": 2025, "tag": "migration"},
    ],
)
Comparison Operators
# Equality
store.similarity_search("release", filter={"author": "alice"})
# Comparison
store.similarity_search("performance", filter={"year": {"$gte": 2026}})
# Set membership
store.similarity_search("query", filter={"tag": {"$in": ["release", "benchmark"]}})
# Range
store.similarity_search("query", filter={"year": {"$between": [2025, 2026]}})
# Pattern matching (case-insensitive)
store.similarity_search("query", filter={"author": {"$ilike": "%ali%"}})
# Field existence
store.similarity_search("query", filter={"tag": {"$exists": True}})
Logical Combinations
# AND
store.similarity_search("query", filter={
    "$and": [
        {"author": "alice"},
        {"year": {"$gte": 2026}},
    ]
})
# OR
store.similarity_search("query", filter={
    "$or": [
        {"tag": "release"},
        {"tag": "benchmark"},
    ]
})
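To see what these filters mean without a database, here is a small in-memory matcher that evaluates a filter against one metadata dict. It is a sketch of the intended semantics under some assumptions ($between is inclusive, and a missing field never matches a comparison), not the code AGEVector runs, which filters in SQL.

```python
import re

def _like(pattern, value, ignore_case=False):
    """SQL LIKE semantics: % matches any run of characters, _ matches one."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    flags = re.DOTALL | (re.IGNORECASE if ignore_case else 0)
    return re.fullmatch(regex, value, flags) is not None

OPS = {
    "$eq": lambda v, arg: v == arg,
    "$ne": lambda v, arg: v != arg,
    "$lt": lambda v, arg: v < arg,
    "$lte": lambda v, arg: v <= arg,
    "$gt": lambda v, arg: v > arg,
    "$gte": lambda v, arg: v >= arg,
    "$in": lambda v, arg: v in arg,
    "$nin": lambda v, arg: v not in arg,
    "$between": lambda v, arg: arg[0] <= v <= arg[1],
    "$like": lambda v, arg: _like(arg, v),
    "$ilike": lambda v, arg: _like(arg, v, ignore_case=True),
}

def matches(metadata, flt):
    """Return True if the metadata dict satisfies every condition in flt."""
    for key, cond in flt.items():
        if key == "$and":
            if not all(matches(metadata, sub) for sub in cond):
                return False
        elif key == "$or":
            if not any(matches(metadata, sub) for sub in cond):
                return False
        elif isinstance(cond, dict):
            for op, arg in cond.items():
                if op == "$exists":
                    if (key in metadata) != arg:
                        return False
                elif key not in metadata or not OPS[op](metadata[key], arg):
                    return False
        elif metadata.get(key) != cond:   # bare value means equality
            return False
    return True

# Same metadata as the Basic Usage example
corpus = [
    {"author": "alice", "year": 2026, "tag": "release"},
    {"author": "bob", "year": 2026, "tag": "benchmark"},
    {"author": "alice", "year": 2025, "tag": "migration"},
]

def search(flt):
    """Return the indexes of the documents whose metadata passes the filter."""
    return [i for i, meta in enumerate(corpus) if matches(meta, flt)]

print(search({"$and": [{"author": "alice"}, {"year": {"$gte": 2026}}]}))
```

A matcher like this is also handy in unit tests, where you want to check a filter's logic before running it against a live collection.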
Full Operator Reference
| Operator | Meaning | Example |
|---|---|---|
| $eq | Equal | {"year": {"$eq": 2026}} |
| $ne | Not equal | {"tag": {"$ne": "draft"}} |
| $lt | Less than | {"score": {"$lt": 0.5}} |
| $lte | Less or equal | {"score": {"$lte": 0.5}} |
| $gt | Greater than | {"year": {"$gt": 2024}} |
| $gte | Greater or equal | {"year": {"$gte": 2025}} |
| $in | In set | {"tag": {"$in": ["a", "b"]}} |
| $nin | Not in set | {"tag": {"$nin": ["draft"]}} |
| $between | Range | {"score": {"$between": [0.5, 1.0]}} |
| $like | Pattern (case-sensitive) | {"name": {"$like": "%AGE%"}} |
| $ilike | Pattern (case-insensitive) | {"name": {"$ilike": "%age%"}} |
| $exists | Field exists | {"bio": {"$exists": True}} |
| $and | AND combination | {"$and": [{...}, {...}]} |
| $or | OR combination | {"$or": [{...}, {...}]} |
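Internally, these translate to JSONB operators (->> and PostgreSQL comparisons), so indexing the metadata column keeps filtering fast: a GIN index covers containment-style lookups, and an expression index on a frequently filtered key (for example, (metadata->>'year')) covers ->> comparisons.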
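As a rough picture of what such a translation could look like, the sketch below turns a filter dict into a parameterized WHERE clause. Everything here is hypothetical: the function name to_where, the ::numeric cast, and the choice to pass keys as parameters are illustration choices, and the SQL that langchain-age actually generates may differ. Only the six comparison operators and $and/$or are covered.

```python
_SQL_OPS = {"$eq": "=", "$ne": "<>", "$lt": "<", "$lte": "<=", "$gt": ">", "$gte": ">="}

def to_where(flt):
    """Return (sql_fragment, params) for a metadata filter dict (illustrative only)."""
    clauses, params = [], []
    for key, cond in flt.items():
        if key in ("$and", "$or"):
            subs = [to_where(sub) for sub in cond]
            joiner = " AND " if key == "$and" else " OR "
            clauses.append("(" + joiner.join(sql for sql, _ in subs) + ")")
            for _, sub_params in subs:
                params.extend(sub_params)
        elif isinstance(cond, dict):
            for op, arg in cond.items():
                # ->> always yields text, so numeric arguments need a cast
                is_number = isinstance(arg, (int, float)) and not isinstance(arg, bool)
                lhs = "(metadata->>%s)::numeric" if is_number else "metadata->>%s"
                clauses.append(f"{lhs} {_SQL_OPS[op]} %s")
                params.extend([key, arg])
        else:
            clauses.append("metadata->>%s = %s")
            params.extend([key, str(cond)])
    return " AND ".join(clauses), params

sql, params = to_where({"$and": [{"author": "alice"}, {"year": {"$gte": 2026}}]})
print(sql)
print(params)
```

Passing both keys and values as query parameters, rather than formatting them into the string, keeps user-supplied filters from injecting SQL.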
Watch Out: Common Metadata Filter Mistakes
Two frequent mistakes when first using metadata filters:
- Type mismatch: JSONB preserves numeric types when you store {"year": 2026}, but if you store {"year": "2026"} (a string), $gte performs a string comparison. Always store numbers as numbers for numeric filters.
- Over-filtering: Vector search + metadata filter together can drastically reduce the candidate pool. If you request k=5 but only 3 documents match the filter, you get 3 results. Check data distribution before adding filters.
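Both pitfalls are easy to reproduce in plain Python. The snippet uses Python's string ordering as a stand-in for a text comparison in PostgreSQL (the exact ordering there depends on collation, but digit strings behave the same way), and the document ids are made up.

```python
# Pitfall 1: numbers stored as strings compare character by character
docs = [{"year": 2026}, {"year": 999}]
as_strings = [{"year": str(d["year"])} for d in docs]

numeric_hits = [d["year"] for d in docs if d["year"] >= 2025]
string_hits = [d["year"] for d in as_strings if d["year"] >= "2025"]

print(numeric_hits)  # only 2026 passes
print(string_hits)   # "999" also passes, because "9" sorts after "2"

# Pitfall 2: a filter can leave fewer than k candidates
k = 5
matching = ["doc-a", "doc-b", "doc-c"]  # hypothetical: only three documents pass the filter
results = matching[:k]
print(len(results))
```

If you expect at least k results downstream, check the length of the returned list instead of assuming it.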
Index Strategies: HNSW vs IVFFlat
Once vectors exceed tens of thousands, sequential scan becomes slow. pgvector provides two approximate nearest neighbor (ANN) index types.
HNSW (Recommended for Production)
store.create_hnsw_index(m=16, ef_construction=64)
| Parameter | Default | Meaning |
|---|---|---|
| m | 16 | Connections per node. Higher = better recall, more memory |
| ef_construction | 64 | Build-time search width. Higher = better quality, slower build |
- Pros: Excellent search speed and recall. Supports incremental inserts.
- Cons: Slower build time and higher memory than IVFFlat.
- Best for: Production environments. Data that grows over time.
IVFFlat
store.create_ivfflat_index(n_lists=100)
| Parameter | Default | Meaning |
|---|---|---|
| n_lists | 100 | Cluster count. sqrt(total_vectors) is a good starting point |
- Pros: Fast to build. Memory-efficient.
- Cons: Cluster centroids are fixed at build time, so recall degrades as data is added until you REINDEX. Slightly lower recall than HNSW.
- Best for: Static datasets with batch inserts followed by search-only workloads.
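For picking n_lists, the pgvector README offers a slightly more detailed rule of thumb than sqrt alone: rows / 1000 for up to 1M rows, sqrt(rows) above that, with sqrt(lists) as a starting point for the query-time probes setting. A small helper, with hypothetical function names, might look like this:

```python
import math

def suggested_lists(total_vectors):
    """Starting point for IVFFlat lists, following the pgvector README heuristic."""
    if total_vectors <= 1_000_000:
        return max(1, total_vectors // 1000)
    return int(math.sqrt(total_vectors))

def suggested_probes(lists):
    """Starting point for the query-time probes setting: sqrt(lists)."""
    return max(1, int(math.sqrt(lists)))

for n in (100_000, 4_000_000):
    lists = suggested_lists(n)
    print(n, lists, suggested_probes(lists))
```

These are starting values only; measure recall on your own queries before settling on them. For 100,000 vectors the heuristic lands on 100, the default shown above.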
Dropping Indexes
store.drop_index()
Selection Guide
| Criteria | HNSW | IVFFlat |
|---|---|---|
| Search accuracy | High | Moderate |
| Build speed | Slow | Fast |
| Incremental inserts | Supported | Needs rebuild |
| Memory usage | High | Low |
| Production recommended | Yes | Static data only |
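If you ever need to create these indexes by hand, the underlying pgvector DDL is short. The helpers below only build the SQL strings; the table name "tech_docs" and the column name embedding are assumptions, since the names AGEVector uses internally are not shown in this post.

```python
def hnsw_ddl(table, column="embedding", m=16, ef_construction=64,
             opclass="vector_cosine_ops"):
    """Build pgvector's CREATE INDEX statement for an HNSW index."""
    return (
        f'CREATE INDEX ON "{table}" USING hnsw ({column} {opclass}) '
        f"WITH (m = {m}, ef_construction = {ef_construction});"
    )

def ivfflat_ddl(table, column="embedding", lists=100,
                opclass="vector_cosine_ops"):
    """Build pgvector's CREATE INDEX statement for an IVFFlat index."""
    return (
        f'CREATE INDEX ON "{table}" USING ivfflat ({column} {opclass}) '
        f"WITH (lists = {lists});"
    )

print(hnsw_ddl("tech_docs"))
print(ivfflat_ddl("tech_docs", lists=100))
```

The operator class must match the distance strategy (vector_cosine_ops for cosine distance), otherwise the planner will not use the index for your queries. The identifiers are interpolated directly, so only pass trusted table and column names.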
Putting It Together: Hybrid + Metadata + HNSW
Combining all three features gives you a production-grade search pipeline.
from langchain_age import AGEVector, SearchType
from langchain_openai import OpenAIEmbeddings
# 1. Configure store with Hybrid + HNSW
store = AGEVector(
    connection_string=conn_str,
    embedding_function=OpenAIEmbeddings(model="text-embedding-3-small"),
    collection_name="production_docs",
    search_type=SearchType.HYBRID,
)
# 2. Load documents
store.add_texts(
    texts=["..."],
    metadatas=[{"source": "internal", "year": 2026, "dept": "engineering"}],
)
# 3. Create HNSW index
store.create_hnsw_index(m=16, ef_construction=64)
# 4. Metadata filter + Hybrid search
results = store.similarity_search(
    "PostgreSQL graph extension performance",
    k=5,
    filter={
        "$and": [
            {"source": "internal"},
            {"year": {"$gte": 2025}},
        ]
    },
)
# 5. Convert to LangChain Retriever
retriever = store.as_retriever(
    search_kwargs={
        "k": 5,
        "filter": {"dept": "engineering"},
    }
)
LangChain Retriever Integration
AGEVector supports as_retriever(), so it plugs directly into existing RAG chains.
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI
retriever = store.as_retriever(search_kwargs={"k": 5})
prompt = ChatPromptTemplate.from_template(
    "Answer the question based on the following context.\n\n"
    "Context:\n{context}\n\nQuestion: {question}"
)
chain = (
    {"context": retriever, "question": lambda x: x}
    | prompt
    | ChatOpenAI(model="gpt-4o-mini")
    | StrOutputParser()
)
answer = chain.invoke("How does Hybrid Search work in AGE?")
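In the chain above, the retriever's output (a list of Document objects) is dropped into the prompt as-is. A common refinement is to join only the page text before it reaches the prompt. The sketch below uses a small stand-in class, Doc, so it runs without LangChain installed; format_docs is a conventional helper name, not part of langchain-age.

```python
from dataclasses import dataclass, field

@dataclass
class Doc:
    """Stand-in for langchain_core.documents.Document (page_content + metadata)."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def format_docs(docs, separator="\n\n"):
    """Join retrieved passages into one context string for the prompt."""
    return separator.join(doc.page_content for doc in docs)

retrieved = [
    Doc("Apache AGE adds Cypher graph queries to PostgreSQL.", {"tag": "graph"}),
    Doc("pgvector enables vector similarity search.", {"tag": "vector"}),
]
context = format_docs(retrieved)
print(context)
```

In an LCEL chain you would then write "context": retriever | format_docs, which keeps metadata and object repr noise out of the prompt.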
Which Strategy Should You Pick — Decision Guide
Strategy selection depends on your data characteristics and usage patterns. Follow this question flow:
- Does your search target contain abbreviations, proper nouns, or code names? → Yes: Hybrid is mandatory. No: Start with Similarity.
- Is context duplication a problem when feeding the LLM? → Yes: Add MMR. No: Keep the default.
- Do you need to constrain by date, author, or category? → Yes: Add metadata filters.
- Do you have 100K+ vectors? → Yes: HNSW index is mandatory. No: Sequential scan is fine.
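The question flow maps directly onto a small function. This is just the checklist above written as code; the function name, argument names, and returned keys are made up for illustration and do not correspond to any langchain-age API.

```python
def pick_strategy(has_exact_terms, duplicate_context, needs_scoping, vector_count):
    """Encode the four questions of the decision guide as a configuration dict."""
    return {
        # Q1: abbreviations, proper nouns, or code names in the corpus?
        "search_type": "hybrid" if has_exact_terms else "similarity",
        # Q2: is duplicated LLM context a problem?
        "use_mmr": duplicate_context,
        # Q3: need to constrain by date, author, or category?
        "use_metadata_filter": needs_scoping,
        # Q4: 100K+ vectors?
        "index": "hnsw" if vector_count >= 100_000 else None,
    }

plan = pick_strategy(
    has_exact_terms=True,
    duplicate_context=False,
    needs_scoping=True,
    vector_count=250_000,
)
print(plan)
```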
For most production RAG pipelines, Hybrid + metadata filters + HNSW is the safe default combination. If MMR is also needed, apply it as an application-level post-processing step.
FAQ
Can I adjust the weights between vector and keyword scores in Hybrid Search?
The RRF k value is currently fixed at 60, the value used by Cormack et al. (SIGIR 2009). RRF weights both ranking signals equally regardless of k; k only controls how quickly a document's contribution falls off with rank. Custom weight support is planned for a future release.
Does metadata filtering slow down search?
With a GIN index on the JSONB column, filtering cost is negligible. AGEVector creates the metadata column as JSONB, so you can manually add CREATE INDEX ON "collection" USING gin (metadata); if needed.
Can I combine MMR and Hybrid?
max_marginal_relevance_search() operates in SearchType.VECTOR mode. Hybrid + MMR is not currently supported directly, but you can use Hybrid to fetch a broader candidate set and apply MMR-style reranking at the application level.
Should I use HNSW or IVFFlat for pgvector?
HNSW delivers high search accuracy and supports incremental inserts, making it the best choice for production environments where data grows over time. The trade-off is longer build times and higher memory usage, but for most workloads the search quality and operational convenience outweigh these costs. IVFFlat, on the other hand, builds quickly and uses less memory, but requires a REINDEX after data additions and has slightly lower recall than HNSW. If your workflow involves a one-time batch insert followed by search-only queries on a static dataset, IVFFlat is a reasonable choice. In short: choose HNSW when data keeps growing, IVFFlat when the dataset is fixed.
Does Hybrid Search require pg_trgm?
Hybrid Search uses PostgreSQL’s built-in tsvector full-text search, which requires no extensions. pg_trgm optionally enhances similarity matching but is not required. The langchain-age Docker image includes pg_trgm pre-installed.
Next Up
This post covered vector search in depth. Part 4 combines vector search with graph traversal to build an end-to-end GraphRAG pipeline.
Key Takeaways
- Hybrid Search (vector + full-text RRF) improved Recall@5 by 21% over vector-only search in our test, and raised accuracy on abbreviation queries from 31% to 78%. It is the recommended default for technical document retrieval.
- MMR (Maximal Marginal Relevance) reduces context duplication from 73% to 12%, significantly improving token efficiency when feeding results to an LLM.
- HNSW indexes support incremental inserts, making them the right choice for production environments where data grows continuously. IVFFlat is better suited for static datasets with batch inserts.
- AGEVector’s MongoDB-style metadata filters (14 operators) are JSONB-based, so combining them with a GIN index makes filtering cost negligible.
Related Posts
- GraphRAG with Just PostgreSQL — Part 1: Overview and Quick Start
- Neo4j vs Apache AGE Benchmark — Part 2: Performance Comparison
- Building a GraphRAG Pipeline — Part 4: Vector + Graph Integration
- Full AI Agent Stack on One PostgreSQL — Part 5: LangGraph Integration
References
- pgvector — Vector similarity search for PostgreSQL
- Cormack, G. V., Clarke, C. L. A., & Buettcher, S. (2009). Reciprocal Rank Fusion outperforms Condorcet and individual Rank Learning Methods. SIGIR 2009.
- LangChain VectorStore docs — LangChain vector store concepts guide
langchain-age is MIT licensed. Apache AGE is Apache 2.0. pgvector is PostgreSQL License.