TL;DR: A single `build_llm()` factory function unifies 5 LLM providers (2 cloud + 3 local), with local models auto-detected via the `/v1/models` API.
The 5 Providers
| Provider | Type | Default Model | API Format |
|---|---|---|---|
| Anthropic | Cloud | claude-sonnet-4-6 | Anthropic native |
| OpenRouter | Cloud | anthropic/claude-sonnet-4-5 | OpenAI compatible |
| Ollama | Local | llama3.1 | OpenAI compatible |
| LM Studio | Local | loaded-model | OpenAI compatible |
| vLLM | Local | meta-llama/Llama-3.1-8B | OpenAI compatible |
build_llm(): Single Factory
```python
def build_llm() -> Any:
    provider = config.LLM_PROVIDER or os.getenv("LLM_PROVIDER", "anthropic")
    if provider == "openrouter":
        return ChatOpenAI(model=..., base_url=config.OPENROUTER_BASE_URL, ...)
    elif provider in ("ollama", "lmstudio", "vllm"):
        return ChatOpenAI(model=..., api_key="local", base_url=..., ...)
    else:  # anthropic
        return ChatAnthropic(model=..., ...)
```
Key: `ChatOpenAI`'s `base_url` parameter works with any local server that exposes an OpenAI-compatible API. Only Anthropic needs a dedicated SDK (`ChatAnthropic`). Ollama's OpenAI compatibility mode is the foundation of this integration.
Build Variants (Deploy Mode)
VITE_DEPLOY_MODE controls provider visibility at build time:
```typescript
export const PROVIDERS = (() => {
  const all = [
    { value: "anthropic", cloud: true, local: false },
    { value: "openrouter", cloud: true, local: false },
    { value: "ollama", cloud: false, local: true },
    { value: "lmstudio", cloud: false, local: true },
    { value: "vllm", cloud: false, local: true },
  ];
  if (DEPLOY_MODE === "local") return all.filter((p) => p.local);
  if (DEPLOY_MODE === "cloud") return all.filter((p) => p.cloud);
  return all;
})();
```
| Command | Result |
|---|---|
| `npm run build` | All providers |
| `VITE_DEPLOY_MODE=local npm run build` | Ollama, LM Studio, vLLM only |
| `VITE_DEPLOY_MODE=cloud npm run build` | Anthropic, OpenRouter only |
The “local” build completely removes the API key input UI.
Local Model Auto-Detection
The SettingsModal auto-fetches model lists from local servers using a two-stage approach:

- `/v1/models` — OpenAI-compatible API (all local servers)
- `/api/tags` — Ollama-specific API (fallback)

Both requests use a 5-second timeout via `AbortSignal.timeout(5000)`.
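The two-stage fallback can be sketched provider-agnostically. The endpoint paths and 5-second timeout are from this post; the function names are hypothetical, and the fetcher is injected so the fallback logic itself is testable without a running server:

```python
import json
import urllib.request
from typing import Callable

def _http_get(url: str, timeout: float = 5.0) -> dict:
    """Default fetcher: GET a JSON document with a 5-second timeout."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)

def list_local_models(base_url: str, fetch: Callable[[str], dict] = _http_get) -> list[str]:
    """Try the OpenAI-compatible endpoint first, then Ollama's native API."""
    try:
        # Stage 1: /v1/models (Ollama >= 0.1.24, LM Studio, vLLM)
        data = fetch(f"{base_url}/v1/models")
        return [m["id"] for m in data.get("data", [])]
    except Exception:
        # Stage 2: /api/tags (older Ollama versions 404 on /v1/models)
        data = fetch(f"{base_url}/api/tags")
        return [m["name"] for m in data.get("models", [])]
```

Note the two response shapes: the OpenAI format wraps models in `data` with an `id` field, while Ollama's native API uses `models` with a `name` field.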
Settings Persistence
Settings are saved to `~/.cowork.env` and survive app restarts. Provider, model, and API key changes persist automatically.
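A minimal sketch of this persistence, assuming a plain `KEY=VALUE` dotenv-style format (the actual file layout is not shown in this post):

```python
from pathlib import Path

SETTINGS_PATH = Path.home() / ".cowork.env"  # path from the post

def save_settings(settings: dict[str, str], path: Path = SETTINGS_PATH) -> None:
    """Write settings as KEY=VALUE lines, one per setting."""
    lines = [f"{key}={value}" for key, value in sorted(settings.items())]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")

def load_settings(path: Path = SETTINGS_PATH) -> dict[str, str]:
    """Read KEY=VALUE lines back, skipping blanks and comments."""
    if not path.exists():
        return {}
    result: dict[str, str] = {}
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            result[key] = value
    return result
```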
Agent Rebuild
When the provider or model changes, all active agents rebuild immediately through `rebuild_all_agents_safe()`, which uses an `asyncio.Lock` to prevent concurrent rebuilds.
Benchmark
| Metric | Value |
|---|---|
| Model list fetch latency (Ollama local) | ~120ms |
| Model list fetch latency (LM Studio local) | ~80ms |
| Model list fetch timeout | 5 seconds |
| Provider switch to agent rebuild | ~200ms |
| build_llm() execution time | ~15ms |
Lessons Learned
Passing a tool name (`"write_file"`) to `create_react_agent`'s `interrupt_before` parameter produced a `ValueError`. LangGraph's `interrupt_before` only accepts node names, not tool names. This misunderstanding cost hours of debugging and was a key reason for choosing the DeepAgents SDK, whose `interrupt_on` works with tool names directly.
The second issue was Ollama's model list API. Ollama originally used the `/api/tags` endpoint, but added the OpenAI-compatible `/v1/models` in v0.1.24. Calling only `/v1/models` returned 404 on older Ollama versions. A two-stage fallback strategy (try `/v1/models` first, fall back to `/api/tags`) solved it.
Third, we learned the hard way that omitting the `HTTP-Referer` and `X-Title` headers from OpenRouter requests triggers 429 rate limiting sooner. We should have read the OpenRouter docs more carefully from the start.
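For reference, the fix is a small headers dict attached to every OpenRouter request. The URL and title values here are placeholders, and passing them via `default_headers` on the chat client is one way to wire them in, not necessarily how this app does it:

```python
# Attribution headers OpenRouter uses to identify the calling app;
# omitting them made our requests hit 429 rate limits sooner.
OPENROUTER_HEADERS = {
    "HTTP-Referer": "https://example.com/your-app",  # hypothetical app URL
    "X-Title": "DeepCoWork",
}

# With langchain's ChatOpenAI these can be attached per-client, e.g.:
# ChatOpenAI(base_url="https://openrouter.ai/api/v1",
#            default_headers=OPENROUTER_HEADERS, ...)
```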
FAQ
Can I run Ollama on a remote server?
Yes. Change the server URL to a remote address such as `http://192.168.1.100:11434/v1`. CORS configuration may be needed.
Why include OpenRouter?
OpenRouter unifies multiple LLM providers under one API. You can use Claude, GPT-4, Gemini, and more with a single API key — convenient for testing and comparison.
Does changing the model affect in-progress conversations?
No. Current streams are unaffected. The new model applies from the next message.
Series
- DeepCoWork: I Built an AI Agent Desktop App
- Tauri 2 + Python Sidecar
- DeepAgents SDK Internals
- System Prompt Design per Mode
- SSE Streaming Pipeline
- HITL Approval Flow
- Multi-Agent ACP Mode
- Agent Memory 4 Layers
- Skills System
- [This post] LLM Provider Integration
- Security Checklist
- GitHub Actions Cross-Platform Build