Documentation
API Reference
Supported Models
Use model: "auto" to let Laghav's ML router pick the cheapest capable model for your query. Or specify any model below directly.
✦Always start with model: "auto"
The auto router achieves 89% cost savings by classifying query complexity (simple, translation, code, complex) and routing accordingly. Override only when you have a specific reason.
Anthropic
| Model ID | Tier | Input cost / 1M tokens | Best for |
|---|---|---|---|
claude-haiku-3 | Cheapest | $0.25 | FAQ, classification, simple Q&A, support bots |
claude-sonnet-4 | Balanced | $3.00 | Code generation, analysis, structured data |
claude-opus-4 | Most capable | $15.00 | Complex reasoning, multi-step agents, PhD-level tasks |
OpenAI
| Model ID | Tier | Input cost / 1M tokens | Best for |
|---|---|---|---|
gpt-4o-mini | Cheapest | $0.15 | Fast classification, simple tasks |
gpt-4o | Balanced | $5.00 | General purpose, vision, code |
| Model ID | Tier | Input cost / 1M tokens | Best for |
|---|---|---|---|
gemini-1.5-flash | Cheapest | $0.075 | Very long contexts, document analysis |
gemini-1.5-pro | Balanced | $3.50 | Multimodal, 1M token contexts |
Routing categories (auto mode)
The ML router (DistilBERT ONNX, 3.4ms CPU latency) classifies every prompt into four categories and routes accordingly:
| Category | Default model | Triggers |
|---|---|---|
| simple | claude-haiku-3 | FAQ, greetings, yes/no questions, classification |
| translation | claude-haiku-3 | Language translation tasks |
| code | claude-sonnet-4 | Code generation, debugging, technical analysis |
| complex | claude-opus-4 | Multi-step reasoning, research, legal, medical |
ℹFallback behavior
If the ML router confidence score is below 0.70 (e.g., ambiguous query), Laghav falls back to a pattern-matching rule set. If the routing service is unavailable, it defaults to
claude-haiku-3.