Posts
All the articles I've posted.
-
Korean RAG Benchmark Conclusion: Look at the Pipeline Before Upgrading the Model
Synthesis of a Korean RAG benchmark — the same GPT-5.4 with a tuned pipeline hits accuracy 0.827, +6.0pp over a 10× costlier model. A 0.6B Korean reranker beats a 4B SOTA by +1.83pp. Reranker is the dominant axis. The 7 findings and the recommended production pipeline.
-
Stacking Univariate Winners Didn't Give the Optimum — A 384-Combination Korean RAG Sweep
Scoring all 384 Pre×Retrieval×Reranker combinations for Korean RAG — query2doc, only 4th by univariate e2e judge, becomes the global winner (judge 4.067/acc 0.827) once paired with jina-reranker-m0. The MRR winner ≠ the answer-quality winner. Interaction is why the full sweep was needed.
-
How Far Have Open-Weight LLMs Come in Korean RAG — 46 Generators and Judge Reliability
Comparing 46 Korean RAG generators (27 open + 19 closed) — gpt-oss-120b and kimi-k2.5 tie for the open-weight lead (acc 0.740), and gpt-oss-20b reaches 0.727 at 13GB VRAM. The closed leader gpt-5.4 (0.787) is 4.7pp ahead. A single LLM-as-Judge shook the rankings.
-
Why a 0.6B Korean Reranker Beat a 4B SOTA — Comparing 25 Rerankers for Korean RAG
Univariate comparison of 25 rerankers for Korean RAG — a 0.6B Korean fine-tune (dragonkue/bge-reranker-v2-m3-ko) hits MRR 0.7697, beating the 6.7× larger 2025 SOTA Qwen3-Reranker-4B (0.7514) by +1.83pp. The reranker was the single biggest axis.
-
Dense Alone Wasn't Enough: BM25-KIWI, Hybrid, and Query Transforms in Korean RAG
Univariate retrieval comparison for Korean RAG — Hybrid 3:7 (Dense + BM25-KIWI) hits MRR 0.7171, beating every single-method retriever. BM25 needs morphological tokenization (KIWI): +14.4pp over whitespace splitting. Pre-retrieval query transforms were noise-level on their own.
-
Korean RAG Ingestion: Simpler Choices Won in Loader, Chunker, Embedding
Univariate comparison over 300 Korean Q&A — PyMuPDF wins the loader at MRR 0.6486; the top char chunker by dense MRR is Chonkie Fast 800 (0.6903), but within ≈1.5pp noise, so the standard LC Recursive 300/50 (0.6816 dense / 0.7171 hybrid) was adopted downstream; KoE5 beats an 8B English model by +0.16 MRR. Korean alignment mattered more than processing complexity.
-
Korean RAG Benchmark: Why I Took the Whole Pipeline Apart with 300 Questions
Methodology of a Korean RAG benchmark that decomposes the pipeline into 6 stages and runs a full 384-combination Cartesian sweep. 300 Q&A × 58 PDF × 5 domains, 46 generators (27 open + 19 closed), 4-metric LLM-as-Judge, ≈1.2M LLM calls.
-
RunPod Referral Link: Get $5-$500 in GPU Credits
Sign up with my RunPod referral link and get a $5-$500 credit bonus when you add $10 for the first time.
-
Vultr Referral Link: Get $300 in Credits
Sign up with my Vultr referral link and get $300 in credits to try VPS, Cloud GPU, Kubernetes, and Object Storage.
-
Local LLM Inference Benchmark: Experimental Design Across 4 Hardware Platforms and 5 Engines
Updated: Methodology, experimental design, and gotchas for a cross-platform benchmark measuring Qwen3.5 on M5 Max, RTX 3090×2, DGX Spark, and Ryzen AI MAX 395+.
-
Qwen3.5 Cross-Platform Benchmark: 4 Hardware Targets × 5 Engines Compared
Updated: Apples-to-apples Qwen3.5 numbers across Mac M5 Max, RTX 3090×2, DGX Spark, and Ryzen AI MAX 395+. Cold prefill, cache disabled, randomized run order.
-
Qwen3.5 Local Inference Benchmark Results: 4 Machines × 5 Engines
Updated: Generation and prefill throughput for Qwen3.5 (9B, 27B, 35B-A3B MoE, 122B-A10B MoE) on M5 Max, RTX 3090×2, DGX Spark GB10, and Ryzen AI MAX 395+, measured with llama.cpp, MLX, Ollama, vLLM, and Lemonade.
-
Building a GraphRAG Pipeline — From Vector Search to Graph Expansion
Solve multi-hop questions that plain vector RAG can't answer. Vectorize graph nodes with from_existing_graph in one line, and auto-convert natural language to Cypher with CypherQAChain.
-
Mastering Vector Search with langchain-age — Hybrid Search, MMR, and Metadata Filtering
Why Hybrid Search matters for pgvector, when to use each strategy, and real recall benchmarks. Includes HNSW vs IVFFlat selection criteria and MongoDB-style metadata filtering.
-
Full AI Agent Stack on One PostgreSQL — LangGraph + langchain-age
Can you replace Neo4j+Redis+Pinecone with just PostgreSQL for an AI Agent? A real architecture that unifies graph, vectors, checkpoints, and long-term memory in one database.