Skip to main content
Documentation
API Reference

POST /v1/complete

The primary endpoint. Compresses your prompt, routes to the cheapest capable model, checks the cache, and returns quality scores — all in a single call.

POSThttps://api.laghav.ai/v1/complete

Request body

request.json
{
messages: [
{role: "system", content: "You are a helpful assistant"},
{role: "user", content: "string"}
],
model: "auto",
max_tokens: 1000,
stream: false,
laghav_options: {
compress: true,
route: true,
cache: true,
score: true,
budget_id: "engineering",
mask_pii: false,
protocol_id: "acme-corp-v1",
skip_rules: ["intent"],
max_aggressiveness: 0.7,
conversation_id: "conv_abc123",
max_turns_to_keep: 10,
agent_run_id: "agent_xyz789"
}
}

Request fields

FieldTypeRequiredDefaultDescription
messagesarrayYesArray of {role, content} objects. Standard OpenAI-compatible format.
modelstringNo"auto"Target model. Use "auto" for Laghav routing, or a specific model name.
max_tokensintegerNo1000Maximum tokens in the response.
streambooleanNofalseIf true, response is streamed as Server-Sent Events.
laghav_options.compressbooleanNotrueEnable prompt compression pipeline.
laghav_options.routebooleanNotrueEnable ML model routing.
laghav_options.cachebooleanNotrueEnable semantic dedup cache.
laghav_options.scorebooleanNotrueInclude quality score in response.
laghav_options.budget_idstringNoTeam budget to charge this call against.
laghav_options.mask_piibooleanNofalseEnable PII masking via Presidio (Phase 2).
laghav_options.max_aggressivenessfloatNo0.5Compression level 0.0 (light) to 1.0 (maximum).
laghav_options.skip_rulesarrayNo[]Compression rule names to skip. See Compression docs.
laghav_options.conversation_idstringNoEnables multi-turn conversation optimization.
laghav_options.agent_run_idstringNoEnables agent loop cost tracking and safety guard.

Supported models

Model IDProviderTierCost / 1M tokens
autoLaghav selectsAuto-routingCheapest capable
claude-haiku-3AnthropicCheapest$0.25
claude-sonnet-4AnthropicBalanced$3.00
claude-opus-4AnthropicMost capable$15.00
gpt-4o-miniOpenAICheapest$0.15
gpt-4oOpenAIBalanced$5.00
gemini-1.5-flashGoogleCheapest$0.075
gemini-1.5-proGoogleBalanced$3.50

Response (200)

response.json
{
id: "lgh_req_abc123",
object: "chat.completion",
created: 1717257600,
choices: [{
index: 0,
message: {
role: "assistant",
content: "Here is the analysis..."
},
finish_reason: "stop"
}],
model: "claude-haiku-3-20240307",
laghav_meta: {
original_tokens: 847,
compressed_tokens: 340,
compression_ratio: 0.60,
quality_score: 94,
cost_original_usd: 0.000212,
cost_actual_usd: 0.000085,
saved_usd: 0.000127,
routing_reason: "faq_pattern",
model_requested: "auto",
rules_applied: ["filler", "preamble"],
cache_hit: false,
pii_masked: false,
latency_overhead_ms: 18,
conversation_id: "conv_abc123"
}
}

laghav_meta fields

FieldTypeDescription
original_tokensintegerToken count before compression
compressed_tokensintegerToken count after compression
compression_ratiofloatFraction of tokens removed (e.g. 0.60 = 60% removed)
quality_scoreinteger0–100 semantic similarity score of compressed vs original
cost_original_usdfloatWhat this call would have cost without Laghav
cost_actual_usdfloatWhat you actually paid
saved_usdfloatcost_original_usd - cost_actual_usd
routing_reasonstringWhy this model was selected (e.g. 'faq_pattern', 'code_task')
model_requestedstringThe model field you sent (often 'auto')
rules_appliedarrayCompression rules that modified this prompt
cache_hitbooleantrue if response was served from semantic cache
pii_maskedbooleantrue if PII was detected and masked
latency_overhead_msintegerMilliseconds Laghav added to total latency

Streaming

Set "stream": true to receive Server-Sent Events. Each chunk is a partial choices.delta. The final chunk carries the full laghav_meta.

bash
curl -X POST https://api.laghav.ai/v1/complete \
-H "Authorization: Bearer lgh_live_xxx" \
-H "Content-Type: application/json" \
--no-buffer \
-d '{"messages":[...],"model":"auto","stream":true}'
# data: {"choices":[{"index":0,"delta":{"content":"Here"},"finish_reason":null}]}
# data: {"choices":[{"index":0,"delta":{"content":" is"},"finish_reason":null}]}
# data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"laghav_meta":{...}}
# data: [DONE]
OpenAI compatible
The response schema is a superset of the OpenAI Chat Completions format. Any library that works with OpenAI (LangChain, LlamaIndex, LiteLLM) works with Laghav with only the base URL and API key changed.