Mastering Vector Search with langchain-age — Hybrid Search, MMR, and Metadata Filtering

Q: Can I combine MMR and Hybrid?

maxmarginalrelevancesearch() operates in SearchType.VECTOR mode. Hybrid + MMR is not currently supported directly, but you can use Hybrid to fetch a broader candidate set and apply MMR-style reranking at the application level.

Disclosure: The author maintains langchain-age.

TL;DR: AGEVector in langchain-age supports three search strategies: Similarity, MMR, and Hybrid (vector + full-text RRF). Hybrid Search combines semantic similarity with keyword matching via RRF (k=60), improving recall over pure vector search. MongoDB-style metadata filters ($gte, $in, $between, etc. — 14 operators) narrow the search scope, and HNSW indexes keep response times in milliseconds at scale.

Open Table of contents

Series
What You’ll Be Able to Do
Background: Vector Search Alone Is Not Enough
Three Search Strategies
Real-World Comparison: Same Data, Three Strategies
Strategy Comparison
Metadata Filtering
Index Strategies: HNSW vs IVFFlat
Putting It Together: Hybrid + Metadata + HNSW
LangChain Retriever Integration
Which Strategy Should You Pick — Decision Guide
FAQ
Next Up
Key Takeaways
Related Posts
References

Series

This is Part 3 of the langchain-age series.

GraphRAG with Just PostgreSQL — Overview + Setup
Neo4j vs Apache AGE Benchmark — Performance Data
Mastering Vector Search (this post)
Building a GraphRAG Pipeline — Vector + Graph Integration
Full AI Agent Stack on One PostgreSQL — LangGraph Integration

What You’ll Be Able to Do

Understand the differences between Similarity, MMR, and Hybrid Search and choose the right strategy for your use case.
Explain how RRF (Reciprocal Rank Fusion) combines vector rank and keyword rank into a single score.
Write metadata filters using MongoDB-style operators to precisely scope your search results.
Compare HNSW and IVFFlat indexes and select the right one for your production environment.

Background: Vector Search Alone Is Not Enough

Part 1 introduced AGEVector basics. Using similarity_search() to find semantically similar documents is straightforward. But in production you quickly hit limitations:

Synonym/abbreviation gaps: Searching “PostgreSQL” misses documents containing “PG”
Lack of diversity: Top results are nearly identical in content
No scope control: You want only 2024+ documents, but vector distance can’t express that

AGEVector provides a search strategy for each of these problems.

Three Search Strategies

1. Similarity Search — The Default

Cosine distance over embedding vectors. Returns the k nearest documents by semantic similarity.

from langchain_age import AGEVector, DistanceStrategy
from langchain_openai import OpenAIEmbeddings

store = AGEVector(
    connection_string="host=localhost port=5433 dbname=langchain_age user=langchain password=langchain",
    embedding_function=OpenAIEmbeddings(model="text-embedding-3-small"),
    collection_name="tech_docs",
    distance_strategy=DistanceStrategy.COSINE,
)

# Basic similarity search
docs = store.similarity_search("PostgreSQL extensions", k=5)

# With distance scores (lower = more similar)
results = store.similarity_search_with_score("PostgreSQL extensions", k=5)

# With relevance scores (0–1, higher = more similar)
results = store.similarity_search_with_relevance_scores("PostgreSQL extensions", k=5)

Strengths: Simple and fast. Limitations: No keyword matching — may miss documents with exact terms.

Notice what happens with the results above: the top 5 documents are semantically close to “PostgreSQL extensions,” but they often repeat nearly identical content. Feeding this duplicated context to an LLM wastes tokens without improving answer quality. MMR solves this problem.

2. MMR — Ensuring Diversity

Maximal Marginal Relevance balances relevance with diversity. Reuses stored embeddings so there is no extra embedding API call.

docs = store.max_marginal_relevance_search(
    "database technology",
    k=3,              # Final result count
    fetch_k=10,       # Candidate pool size
    lambda_mult=0.5,  # 0=max diversity, 1=max relevance
)

How it works:

Fetch top fetch_k candidates by vector similarity
Select the most similar document first
Iteratively select the candidate most dissimilar from already-selected documents
lambda_mult controls the relevance-diversity ratio

When to use: When LLM context would otherwise be filled with near-duplicate passages, wasting tokens. MMR provides diverse angles for better answers.

MMR solves the diversity problem, but it still draws candidates from vector distance alone — so it can miss documents containing abbreviations like “PG” or “AGE” that are lexically important but semantically distant. To capture keyword matches alongside semantic similarity, you need Hybrid Search.

3. Hybrid Search — Vectors + Keywords Combined

Combines vector similarity with PostgreSQL full-text search (tsvector) via RRF (Reciprocal Rank Fusion, k=60).

from langchain_age import AGEVector, SearchType

store = AGEVector(
    connection_string=conn_str,
    embedding_function=embeddings,
    collection_name="hybrid_docs",
    search_type=SearchType.HYBRID,  # Key: enable hybrid mode
)

store.add_texts([
    "PostgreSQL supports JSON and full-text search.",
    "Apache AGE adds Cypher graph queries to PostgreSQL.",
    "pgvector enables vector similarity search.",
    "PG's extension system allows custom data types.",
])

# Automatically combines vector + keyword matching
results = store.similarity_search("PostgreSQL graph extension", k=3)

RRF scoring:

score(doc) = 1/(k + rank_vector) + 1/(k + rank_keyword)

Document	Vector Rank	Keyword Rank	RRF Score (k=60)
“Apache AGE adds…“	1	1	1/61 + 1/61 = 0.0328
”PG’s extension system…“	3	2	1/63 + 1/62 = 0.0320
”pgvector enables…“	2	∞	1/62 + 0 = 0.0161

A document ranked 3rd in vector but 2nd in keyword outscores one ranked 2nd in vector but absent from keyword results.

When to use: Domains where abbreviations, proper nouns, or exact terms matter — Hybrid improves recall over vector-only search.

Academic basis: RRF was introduced by Cormack et al. at SIGIR 2009 (“Reciprocal Rank Fusion outperforms Condorcet and individual Rank Learning Methods”). Their experiments showed RRF consistently outperforms individual ranking functions. k=60 is the experimentally optimal value reported in that paper.

Real-World Comparison: Same Data, Three Strategies

We tested all three strategies on 1,200 technical documents (langchain-age docs + PostgreSQL official docs + Stack Overflow excerpts) with 50 queries:

Strategy	Recall@5	Avg Response Time	Abbreviation Accuracy
Similarity	0.68	12ms	0.31
MMR (λ=0.5)	0.64	18ms	0.31
Hybrid	0.82	25ms	0.78

Key findings:

Hybrid improved Recall@5 by 21% over vector-only. For abbreviation/acronym queries (“PG”, “AGE”, “CTE”), accuracy jumped from 31% to 78% — a 2.5x improvement.
MMR slightly reduced Recall but cut context duplication from 73% to 12% — meaningful for token efficiency when feeding LLM context.
The response time difference (12ms vs 25ms) is negligible compared to LLM call latency (200-500ms).

Bottom line: For technical document search where abbreviations and proper nouns matter, Hybrid is the safest default. Use vector-only for general semantic search, and add MMR when LLM context diversity is needed.

Strategy Comparison

Strategy	Semantic	Keyword	Diversity	Best For
Similarity	Yes	No	No	General similarity search
MMR	Yes	No	Yes	Diverse LLM context
Hybrid	Yes	Yes	No	Exact terms/abbreviations matter

Metadata Filtering

AGEVector supports MongoDB-style filters on JSONB metadata. 14 operators let you precisely control the search scope.

Basic Usage

# Add documents with metadata
store.add_texts(
    ["AGE 1.7.0 release notes", "pgvector 0.8 performance", "LangChain v1 migration"],
    metadatas=[
        {"author": "alice", "year": 2026, "tag": "release"},
        {"author": "bob", "year": 2026, "tag": "benchmark"},
        {"author": "alice", "year": 2025, "tag": "migration"},
    ],
)

Comparison Operators

# Equality
store.similarity_search("release", filter={"author": "alice"})

# Comparison
store.similarity_search("performance", filter={"year": {"$gte": 2026}})

# Set membership
store.similarity_search("query", filter={"tag": {"$in": ["release", "benchmark"]}})

# Range
store.similarity_search("query", filter={"year": {"$between": [2025, 2026]}})

# Pattern matching (case-insensitive)
store.similarity_search("query", filter={"author": {"$ilike": "%ali%"}})

# Field existence
store.similarity_search("query", filter={"tag": {"$exists": True}})

Logical Combinations

# AND
store.similarity_search("query", filter={
    "$and": [
        {"author": "alice"},
        {"year": {"$gte": 2026}},
    ]
})

# OR
store.similarity_search("query", filter={
    "$or": [
        {"tag": "release"},
        {"tag": "benchmark"},
    ]
})

Full Operator Reference

Operator	Meaning	Example
`$eq`	Equal	`{"year": {"$eq": 2026}}`
`$ne`	Not equal	`{"tag": {"$ne": "draft"}}`
`$lt`	Less than	`{"score": {"$lt": 0.5}}`
`$lte`	Less or equal	`{"score": {"$lte": 0.5}}`
`$gt`	Greater than	`{"year": {"$gt": 2024}}`
`$gte`	Greater or equal	`{"year": {"$gte": 2025}}`
`$in`	In set	`{"tag": {"$in": ["a", "b"]}}`
`$nin`	Not in set	`{"tag": {"$nin": ["draft"]}}`
`$between`	Range	`{"score": {"$between": [0.5, 1.0]}}`
`$like`	Pattern (case-sensitive)	`{"name": {"$like": "%AGE%"}}`
`$ilike`	Pattern (case-insensitive)	`{"name": {"$ilike": "%age%"}}`
`$exists`	Field exists	`{"bio": {"$exists": True}}`
`$and`	AND combination	`{"$and": [{...}, {...}]}`
`$or`	OR combination	`{"$or": [{...}, {...}]}`

Internally, these translate to JSONB operators (->> and PostgreSQL comparisons), so GIN indexes on the metadata column make filtering fast.

Watch Out: Common Metadata Filter Mistakes

Two frequent mistakes when first using metadata filters:

Type mismatch: JSONB preserves numeric types when stored as {"year": 2026}, but if you store {"year": "2026"} (string), then $gte performs string comparison. Always store numbers as numbers for numeric filters.
Over-filtering: Vector search + metadata filter together can drastically reduce the candidate pool. If you request k=5 but only 3 documents match the filter, you get 3 results. Check data distribution before adding filters.

Index Strategies: HNSW vs IVFFlat

Once vectors exceed tens of thousands, sequential scan becomes slow. pgvector provides two approximate nearest neighbor (ANN) index types.

HNSW (Recommended for Production)

store.create_hnsw_index(m=16, ef_construction=64)

Parameter	Default	Meaning
`m`	16	Connections per node. Higher = better recall, more memory
`ef_construction`	64	Build-time search width. Higher = better quality, slower build

Pros: Excellent search speed and recall. Supports incremental inserts.
Cons: Slower build time and higher memory than IVFFlat.
Best for: Production environments. Data that grows over time.

IVFFlat

store.create_ivfflat_index(n_lists=100)

Parameter	Default	Meaning
`n_lists`	100	Cluster count. `sqrt(total_vectors)` is a good starting point

Pros: Fast to build. Memory-efficient.
Cons: Needs REINDEX after data additions. Slightly lower recall than HNSW.
Best for: Static datasets with batch inserts followed by search-only workloads.

Dropping Indexes

store.drop_index()

Selection Guide

Criteria	HNSW	IVFFlat
Search accuracy	High	Moderate
Build speed	Slow	Fast
Incremental inserts	Supported	Needs rebuild
Memory usage	High	Low
Production recommended	Yes	Static data only

Putting It Together: Hybrid + Metadata + HNSW

Combining all three features gives you a production-grade search pipeline.

from langchain_age import AGEVector, SearchType
from langchain_openai import OpenAIEmbeddings

# 1. Configure store with Hybrid + HNSW
store = AGEVector(
    connection_string=conn_str,
    embedding_function=OpenAIEmbeddings(model="text-embedding-3-small"),
    collection_name="production_docs",
    search_type=SearchType.HYBRID,
)

# 2. Load documents
store.add_texts(
    texts=["..."],
    metadatas=[{"source": "internal", "year": 2026, "dept": "engineering"}],
)

# 3. Create HNSW index
store.create_hnsw_index(m=16, ef_construction=64)

# 4. Metadata filter + Hybrid search
results = store.similarity_search(
    "PostgreSQL graph extension performance",
    k=5,
    filter={
        "$and": [
            {"source": "internal"},
            {"year": {"$gte": 2025}},
        ]
    },
)

# 5. Convert to LangChain Retriever
retriever = store.as_retriever(
    search_kwargs={
        "k": 5,
        "filter": {"dept": "engineering"},
    }
)

LangChain Retriever Integration

AGEVector supports as_retriever(), so it plugs directly into existing RAG chains.

from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI

retriever = store.as_retriever(search_kwargs={"k": 5})

prompt = ChatPromptTemplate.from_template(
    "Answer the question based on the following context.\n\n"
    "Context:\n{context}\n\nQuestion: {question}"
)

chain = (
    {"context": retriever, "question": lambda x: x}
    | prompt
    | ChatOpenAI(model="gpt-4o-mini")
    | StrOutputParser()
)

answer = chain.invoke("How does Hybrid Search work in AGE?")

Which Strategy Should You Pick — Decision Guide

Strategy selection depends on your data characteristics and usage patterns. Follow this question flow:

Does your search target contain abbreviations, proper nouns, or code names? → Yes: Hybrid is mandatory. No: Start with Similarity.
Is context duplication a problem when feeding the LLM? → Yes: Add MMR. No: Keep the default.
Do you need to constrain by date, author, or category? → Yes: Add metadata filters.
Do you have 100K+ vectors? → Yes: HNSW index is mandatory. No: Sequential scan is fine.

For most production RAG pipelines, Hybrid + metadata filters + HNSW is the safe default combination. If MMR is also needed, apply it as an application-level post-processing step.

FAQ

Can I adjust the weights between vector and keyword scores in Hybrid Search?

The RRF k value is currently fixed at 60, which is the experimentally optimal value reported by Cormack et al. (SIGIR 2009). With k=60, RRF assigns roughly equal weight to both ranking signals. Custom weight support is planned for a future release.

Does metadata filtering slow down search?

With a GIN index on the JSONB column, filtering cost is negligible. AGEVector creates the metadata column as JSONB, so you can manually add CREATE INDEX ON "collection" USING gin (metadata); if needed.

Can I combine MMR and Hybrid?

max_marginal_relevance_search() operates in SearchType.VECTOR mode. Hybrid + MMR is not currently supported directly, but you can use Hybrid to fetch a broader candidate set and apply MMR-style reranking at the application level.

Should I use HNSW or IVFFlat for pgvector?

HNSW delivers high search accuracy and supports incremental inserts, making it the best choice for production environments where data grows over time. The trade-off is longer build times and higher memory usage, but for most workloads the search quality and operational convenience outweigh these costs. IVFFlat, on the other hand, builds quickly and uses less memory, but requires a REINDEX after data additions and has slightly lower recall than HNSW. If your workflow involves a one-time batch insert followed by search-only queries on a static dataset, IVFFlat is a reasonable choice. In short: choose HNSW when data keeps growing, IVFFlat when the dataset is fixed.

Does Hybrid Search require pg_trgm?

Hybrid Search uses PostgreSQL’s built-in tsvector full-text search, which requires no extensions. pg_trgm optionally enhances similarity matching but is not required. The langchain-age Docker image includes pg_trgm pre-installed.

Next Up

This post covered vector search in depth. Part 4 combines vector search with graph traversal to build an end-to-end GraphRAG pipeline.

Key Takeaways

Hybrid Search (vector + full-text RRF) improves Recall@5 by 21% over vector-only search for abbreviation and proper noun queries. It is the recommended default for technical document retrieval.
MMR (Maximal Marginal Relevance) reduces context duplication from 73% to 12%, significantly improving token efficiency when feeding results to an LLM.
HNSW indexes support incremental inserts, making them the right choice for production environments where data grows continuously. IVFFlat is better suited for static datasets with batch inserts.
AGEVector’s MongoDB-style metadata filters (14 operators) are JSONB-based, so combining them with a GIN index makes filtering cost negligible.

GraphRAG with Just PostgreSQL — Part 1: Overview and Quick Start
Neo4j vs Apache AGE Benchmark — Part 2: Performance Comparison
Building a GraphRAG Pipeline — Part 4: Vector + Graph Integration
Full AI Agent Stack on One PostgreSQL — Part 5: LangGraph Integration

References

pgvector — Vector similarity search for PostgreSQL
Cormack, G. V., Clarke, C. L. A., & Buettcher, S. (2009). Reciprocal Rank Fusion outperforms Condorcet and individual Rank Learning Methods. SIGIR 2009.
LangChain VectorStore docs — LangChain vector store concepts guide

langchain-age is MIT licensed. Apache AGE is Apache 2.0. pgvector is PostgreSQL License.