POST /v1/complete

The primary endpoint. Compresses your prompt, routes to the cheapest capable model, checks the cache, and returns quality scores — all in a single call.

POSThttps://api.laghav.ai/v1/complete

Request body

request.json

{
  messages: [
    {role: "system", content: "You are a helpful assistant"},
    {role: "user",   content: "string"}
  ],
  model: "auto",
  max_tokens: 1000,
  stream: false,
  laghav_options: {
    compress: true,
    route: true,
    cache: true,
    score: true,
    budget_id: "engineering",
    mask_pii: false,
    protocol_id: "acme-corp-v1",
    skip_rules: ["intent"],
    max_aggressiveness: 0.7,
    conversation_id: "conv_abc123",
    max_turns_to_keep: 10,
    agent_run_id: "agent_xyz789"
  }
}

Request fields

Field	Type	Required	Default	Description
messages	array	Yes	—	Array of {role, content} objects. Standard OpenAI-compatible format.
model	string	No	"auto"	Target model. Use "auto" for Laghav routing, or a specific model name.
max_tokens	integer	No	1000	Maximum tokens in the response.
stream	boolean	No	false	If true, response is streamed as Server-Sent Events.
`laghav_options.compress`	boolean	No	true	Enable prompt compression pipeline.
`laghav_options.route`	boolean	No	true	Enable ML model routing.
`laghav_options.cache`	boolean	No	true	Enable semantic dedup cache.
`laghav_options.score`	boolean	No	true	Include quality score in response.
`laghav_options.budget_id`	string	No	—	Team budget to charge this call against.
`laghav_options.mask_pii`	boolean	No	false	Enable PII masking via Presidio (Phase 2).
`laghav_options.max_aggressiveness`	float	No	0.5	Compression level 0.0 (light) to 1.0 (maximum).
`laghav_options.skip_rules`	array	No	[]	Compression rule names to skip. See Compression docs.
`laghav_options.conversation_id`	string	No	—	Enables multi-turn conversation optimization.
`laghav_options.agent_run_id`	string	No	—	Enables agent loop cost tracking and safety guard.

Supported models

Model ID	Provider	Tier	Cost / 1M tokens
`auto`	Laghav selects	Auto-routing	Cheapest capable
`claude-haiku-3`	Anthropic	Cheapest	$0.25
`claude-sonnet-4`	Anthropic	Balanced	$3.00
`claude-opus-4`	Anthropic	Most capable	$15.00
`gpt-4o-mini`	OpenAI	Cheapest	$0.15
`gpt-4o`	OpenAI	Balanced	$5.00
`gemini-1.5-flash`	Google	Cheapest	$0.075
`gemini-1.5-pro`	Google	Balanced	$3.50

Response (200)

response.json

{
  id: "lgh_req_abc123",
  object: "chat.completion",
  created: 1717257600,
  choices: [{
    index: 0,
    message: {
      role: "assistant",
      content: "Here is the analysis..."
    },
    finish_reason: "stop"
  }],
  model: "claude-haiku-3-20240307",
  laghav_meta: {
    original_tokens: 847,
    compressed_tokens: 340,
    compression_ratio: 0.60,
    quality_score: 94,
    cost_original_usd: 0.000212,
    cost_actual_usd: 0.000085,
    saved_usd: 0.000127,
    routing_reason: "faq_pattern",
    model_requested: "auto",
    rules_applied: ["filler", "preamble"],
    cache_hit: false,
    pii_masked: false,
    latency_overhead_ms: 18,
    conversation_id: "conv_abc123"
  }
}

laghav_meta fields

Field	Type	Description
original_tokens	integer	Token count before compression
compressed_tokens	integer	Token count after compression
compression_ratio	float	Fraction of tokens removed (e.g. 0.60 = 60% removed)
quality_score	integer	0–100 semantic similarity score of compressed vs original
cost_original_usd	float	What this call would have cost without Laghav
cost_actual_usd	float	What you actually paid
saved_usd	float	cost_original_usd - cost_actual_usd
routing_reason	string	Why this model was selected (e.g. 'faq_pattern', 'code_task')
model_requested	string	The model field you sent (often 'auto')
rules_applied	array	Compression rules that modified this prompt
cache_hit	boolean	true if response was served from semantic cache
pii_masked	boolean	true if PII was detected and masked
latency_overhead_ms	integer	Milliseconds Laghav added to total latency

Streaming

Set "stream": true to receive Server-Sent Events. Each chunk is a partial choices.delta. The final chunk carries the full laghav_meta.

bash

curl -X POST https://api.laghav.ai/v1/complete \
  -H "Authorization: Bearer lgh_live_xxx" \
  -H "Content-Type: application/json" \
  --no-buffer \
  -d '{"messages":[...],"model":"auto","stream":true}'
# data: {"choices":[{"index":0,"delta":{"content":"Here"},"finish_reason":null}]}
# data: {"choices":[{"index":0,"delta":{"content":" is"},"finish_reason":null}]}
# data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"laghav_meta":{...}}
# data: [DONE]

ℹOpenAI compatible

The response schema is a superset of the OpenAI Chat Completions format. Any library that works with OpenAI (LangChain, LlamaIndex, LiteLLM) works with Laghav with only the base URL and API key changed.

Authentication Rate Limits