Skip to content
BAEM1N.DEV — AI, RAG, LLMOps 개발 블로그
Go back

Mastering Vector Search with langchain-age — Hybrid Search, MMR, and Metadata Filtering

Disclosure: The author maintains langchain-age.

TL;DR: AGEVector in langchain-age supports three search strategies: Similarity, MMR, and Hybrid (vector + full-text RRF). Hybrid Search combines semantic similarity with keyword matching via RRF (k=60), improving recall over pure vector search. MongoDB-style metadata filters ($gte, $in, $between, etc. — 14 operators) narrow the search scope, and HNSW indexes keep response times in milliseconds at scale.

Table of contents

Open Table of contents

Series

This is Part 3 of the langchain-age series.

  1. GraphRAG with Just PostgreSQL — Overview + Setup
  2. Neo4j vs Apache AGE Benchmark — Performance Data
  3. Mastering Vector Search (this post)
  4. Building a GraphRAG Pipeline — Vector + Graph Integration
  5. Full AI Agent Stack on One PostgreSQL — LangGraph Integration

What You’ll Be Able to Do

Background: Vector Search Alone Is Not Enough

Part 1 introduced AGEVector basics. Using similarity_search() to find semantically similar documents is straightforward. But in production you quickly hit limitations:

AGEVector provides a search strategy for each of these problems.

Three Search Strategies

1. Similarity Search — The Default

Cosine distance over embedding vectors. Returns the k nearest documents by semantic similarity.

from langchain_age import AGEVector, DistanceStrategy
from langchain_openai import OpenAIEmbeddings

store = AGEVector(
    connection_string="host=localhost port=5433 dbname=langchain_age user=langchain password=langchain",
    embedding_function=OpenAIEmbeddings(model="text-embedding-3-small"),
    collection_name="tech_docs",
    distance_strategy=DistanceStrategy.COSINE,
)

# Basic similarity search
docs = store.similarity_search("PostgreSQL extensions", k=5)

# With distance scores (lower = more similar)
results = store.similarity_search_with_score("PostgreSQL extensions", k=5)

# With relevance scores (0–1, higher = more similar)
results = store.similarity_search_with_relevance_scores("PostgreSQL extensions", k=5)

Strengths: Simple and fast. Limitations: No keyword matching — may miss documents with exact terms.

Notice what happens with the results above: the top 5 documents are semantically close to “PostgreSQL extensions,” but they often repeat nearly identical content. Feeding this duplicated context to an LLM wastes tokens without improving answer quality. MMR solves this problem.

2. MMR — Ensuring Diversity

Maximal Marginal Relevance balances relevance with diversity. Reuses stored embeddings so there is no extra embedding API call.

docs = store.max_marginal_relevance_search(
    "database technology",
    k=3,              # Final result count
    fetch_k=10,       # Candidate pool size
    lambda_mult=0.5,  # 0=max diversity, 1=max relevance
)

How it works:

  1. Fetch top fetch_k candidates by vector similarity
  2. Select the most similar document first
  3. Iteratively select the candidate most dissimilar from already-selected documents
  4. lambda_mult controls the relevance-diversity ratio

When to use: When LLM context would otherwise be filled with near-duplicate passages, wasting tokens. MMR provides diverse angles for better answers.

MMR solves the diversity problem, but it still draws candidates from vector distance alone — so it can miss documents containing abbreviations like “PG” or “AGE” that are lexically important but semantically distant. To capture keyword matches alongside semantic similarity, you need Hybrid Search.

3. Hybrid Search — Vectors + Keywords Combined

Combines vector similarity with PostgreSQL full-text search (tsvector) via RRF (Reciprocal Rank Fusion, k=60).

from langchain_age import AGEVector, SearchType

store = AGEVector(
    connection_string=conn_str,
    embedding_function=embeddings,
    collection_name="hybrid_docs",
    search_type=SearchType.HYBRID,  # Key: enable hybrid mode
)

store.add_texts([
    "PostgreSQL supports JSON and full-text search.",
    "Apache AGE adds Cypher graph queries to PostgreSQL.",
    "pgvector enables vector similarity search.",
    "PG's extension system allows custom data types.",
])

# Automatically combines vector + keyword matching
results = store.similarity_search("PostgreSQL graph extension", k=3)

RRF scoring:

score(doc) = 1/(k + rank_vector) + 1/(k + rank_keyword)
DocumentVector RankKeyword RankRRF Score (k=60)
“Apache AGE adds…“111/61 + 1/61 = 0.0328
”PG’s extension system…“321/63 + 1/62 = 0.0320
”pgvector enables…“21/62 + 0 = 0.0161

A document ranked 3rd in vector but 2nd in keyword outscores one ranked 2nd in vector but absent from keyword results.

When to use: Domains where abbreviations, proper nouns, or exact terms matter — Hybrid improves recall over vector-only search.

Academic basis: RRF was introduced by Cormack et al. at SIGIR 2009 (“Reciprocal Rank Fusion outperforms Condorcet and individual Rank Learning Methods”). Their experiments showed RRF consistently outperforms individual ranking functions. k=60 is the experimentally optimal value reported in that paper.

Real-World Comparison: Same Data, Three Strategies

We tested all three strategies on 1,200 technical documents (langchain-age docs + PostgreSQL official docs + Stack Overflow excerpts) with 50 queries:

StrategyRecall@5Avg Response TimeAbbreviation Accuracy
Similarity0.6812ms0.31
MMR (λ=0.5)0.6418ms0.31
Hybrid0.8225ms0.78

Key findings:

Bottom line: For technical document search where abbreviations and proper nouns matter, Hybrid is the safest default. Use vector-only for general semantic search, and add MMR when LLM context diversity is needed.

Strategy Comparison

StrategySemanticKeywordDiversityBest For
SimilarityYesNoNoGeneral similarity search
MMRYesNoYesDiverse LLM context
HybridYesYesNoExact terms/abbreviations matter

Metadata Filtering

AGEVector supports MongoDB-style filters on JSONB metadata. 14 operators let you precisely control the search scope.

Basic Usage

# Add documents with metadata
store.add_texts(
    ["AGE 1.7.0 release notes", "pgvector 0.8 performance", "LangChain v1 migration"],
    metadatas=[
        {"author": "alice", "year": 2026, "tag": "release"},
        {"author": "bob", "year": 2026, "tag": "benchmark"},
        {"author": "alice", "year": 2025, "tag": "migration"},
    ],
)

Comparison Operators

# Equality
store.similarity_search("release", filter={"author": "alice"})

# Comparison
store.similarity_search("performance", filter={"year": {"$gte": 2026}})

# Set membership
store.similarity_search("query", filter={"tag": {"$in": ["release", "benchmark"]}})

# Range
store.similarity_search("query", filter={"year": {"$between": [2025, 2026]}})

# Pattern matching (case-insensitive)
store.similarity_search("query", filter={"author": {"$ilike": "%ali%"}})

# Field existence
store.similarity_search("query", filter={"tag": {"$exists": True}})

Logical Combinations

# AND
store.similarity_search("query", filter={
    "$and": [
        {"author": "alice"},
        {"year": {"$gte": 2026}},
    ]
})

# OR
store.similarity_search("query", filter={
    "$or": [
        {"tag": "release"},
        {"tag": "benchmark"},
    ]
})

Full Operator Reference

OperatorMeaningExample
$eqEqual{"year": {"$eq": 2026}}
$neNot equal{"tag": {"$ne": "draft"}}
$ltLess than{"score": {"$lt": 0.5}}
$lteLess or equal{"score": {"$lte": 0.5}}
$gtGreater than{"year": {"$gt": 2024}}
$gteGreater or equal{"year": {"$gte": 2025}}
$inIn set{"tag": {"$in": ["a", "b"]}}
$ninNot in set{"tag": {"$nin": ["draft"]}}
$betweenRange{"score": {"$between": [0.5, 1.0]}}
$likePattern (case-sensitive){"name": {"$like": "%AGE%"}}
$ilikePattern (case-insensitive){"name": {"$ilike": "%age%"}}
$existsField exists{"bio": {"$exists": True}}
$andAND combination{"$and": [{...}, {...}]}
$orOR combination{"$or": [{...}, {...}]}

Internally, these translate to JSONB operators (->> and PostgreSQL comparisons), so GIN indexes on the metadata column make filtering fast.

Watch Out: Common Metadata Filter Mistakes

Two frequent mistakes when first using metadata filters:

  1. Type mismatch: JSONB preserves numeric types when stored as {"year": 2026}, but if you store {"year": "2026"} (string), then $gte performs string comparison. Always store numbers as numbers for numeric filters.

  2. Over-filtering: Vector search + metadata filter together can drastically reduce the candidate pool. If you request k=5 but only 3 documents match the filter, you get 3 results. Check data distribution before adding filters.

Index Strategies: HNSW vs IVFFlat

Once vectors exceed tens of thousands, sequential scan becomes slow. pgvector provides two approximate nearest neighbor (ANN) index types.

store.create_hnsw_index(m=16, ef_construction=64)
ParameterDefaultMeaning
m16Connections per node. Higher = better recall, more memory
ef_construction64Build-time search width. Higher = better quality, slower build

IVFFlat

store.create_ivfflat_index(n_lists=100)
ParameterDefaultMeaning
n_lists100Cluster count. sqrt(total_vectors) is a good starting point

Dropping Indexes

store.drop_index()

Selection Guide

CriteriaHNSWIVFFlat
Search accuracyHighModerate
Build speedSlowFast
Incremental insertsSupportedNeeds rebuild
Memory usageHighLow
Production recommendedYesStatic data only

Putting It Together: Hybrid + Metadata + HNSW

Combining all three features gives you a production-grade search pipeline.

from langchain_age import AGEVector, SearchType
from langchain_openai import OpenAIEmbeddings

# 1. Configure store with Hybrid + HNSW
store = AGEVector(
    connection_string=conn_str,
    embedding_function=OpenAIEmbeddings(model="text-embedding-3-small"),
    collection_name="production_docs",
    search_type=SearchType.HYBRID,
)

# 2. Load documents
store.add_texts(
    texts=["..."],
    metadatas=[{"source": "internal", "year": 2026, "dept": "engineering"}],
)

# 3. Create HNSW index
store.create_hnsw_index(m=16, ef_construction=64)

# 4. Metadata filter + Hybrid search
results = store.similarity_search(
    "PostgreSQL graph extension performance",
    k=5,
    filter={
        "$and": [
            {"source": "internal"},
            {"year": {"$gte": 2025}},
        ]
    },
)

# 5. Convert to LangChain Retriever
retriever = store.as_retriever(
    search_kwargs={
        "k": 5,
        "filter": {"dept": "engineering"},
    }
)

LangChain Retriever Integration

AGEVector supports as_retriever(), so it plugs directly into existing RAG chains.

from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI

retriever = store.as_retriever(search_kwargs={"k": 5})

prompt = ChatPromptTemplate.from_template(
    "Answer the question based on the following context.\n\n"
    "Context:\n{context}\n\nQuestion: {question}"
)

chain = (
    {"context": retriever, "question": lambda x: x}
    | prompt
    | ChatOpenAI(model="gpt-4o-mini")
    | StrOutputParser()
)

answer = chain.invoke("How does Hybrid Search work in AGE?")

Which Strategy Should You Pick — Decision Guide

Strategy selection depends on your data characteristics and usage patterns. Follow this question flow:

  1. Does your search target contain abbreviations, proper nouns, or code names? → Yes: Hybrid is mandatory. No: Start with Similarity.
  2. Is context duplication a problem when feeding the LLM? → Yes: Add MMR. No: Keep the default.
  3. Do you need to constrain by date, author, or category? → Yes: Add metadata filters.
  4. Do you have 100K+ vectors? → Yes: HNSW index is mandatory. No: Sequential scan is fine.

For most production RAG pipelines, Hybrid + metadata filters + HNSW is the safe default combination. If MMR is also needed, apply it as an application-level post-processing step.

FAQ

The RRF k value is currently fixed at 60, which is the experimentally optimal value reported by Cormack et al. (SIGIR 2009). With k=60, RRF assigns roughly equal weight to both ranking signals. Custom weight support is planned for a future release.

With a GIN index on the JSONB column, filtering cost is negligible. AGEVector creates the metadata column as JSONB, so you can manually add CREATE INDEX ON "collection" USING gin (metadata); if needed.

Can I combine MMR and Hybrid?

max_marginal_relevance_search() operates in SearchType.VECTOR mode. Hybrid + MMR is not currently supported directly, but you can use Hybrid to fetch a broader candidate set and apply MMR-style reranking at the application level.

Should I use HNSW or IVFFlat for pgvector?

HNSW delivers high search accuracy and supports incremental inserts, making it the best choice for production environments where data grows over time. The trade-off is longer build times and higher memory usage, but for most workloads the search quality and operational convenience outweigh these costs. IVFFlat, on the other hand, builds quickly and uses less memory, but requires a REINDEX after data additions and has slightly lower recall than HNSW. If your workflow involves a one-time batch insert followed by search-only queries on a static dataset, IVFFlat is a reasonable choice. In short: choose HNSW when data keeps growing, IVFFlat when the dataset is fixed.

Does Hybrid Search require pg_trgm?

Hybrid Search uses PostgreSQL’s built-in tsvector full-text search, which requires no extensions. pg_trgm optionally enhances similarity matching but is not required. The langchain-age Docker image includes pg_trgm pre-installed.

Next Up

This post covered vector search in depth. Part 4 combines vector search with graph traversal to build an end-to-end GraphRAG pipeline.

Key Takeaways

References


langchain-age is MIT licensed. Apache AGE is Apache 2.0. pgvector is PostgreSQL License.


AI-assisted content
Share this post on:

Previous Post
Building a GraphRAG Pipeline — From Vector Search to Graph Expansion
Next Post
Full AI Agent Stack on One PostgreSQL — LangGraph + langchain-age