Docs v1.0

Laghav Documentation

Everything you need to compress AI prompts, route to the cheapest capable model, and ship in under 10 minutes. One endpoint change. No refactoring.

New here?
Start with the Python Quickstart — you'll be saving tokens within 5 minutes.

How Laghav works

Laghav sits between your application and any LLM provider. Every call passes through a six-stage pipeline:

1. Compress

Strips filler words, preambles, and duplicated context using 8 specialized rules + LLMLingua-2.

2. Route

An ML classifier (a DistilBERT model in ONNX format) maps each prompt's complexity to the cheapest capable model. For example, FAQ-style queries go to Haiku and code tasks go to Sonnet.

3. Cache

Semantic vector search on Redis Stack. Identical or similar queries are served from the cache, at zero LLM cost.

4. Score

Quality scorer rates the compressed prompt 0–100 before the LLM call. You set the minimum threshold.

5. Govern

PII masking (Presidio), team budget caps, audit logs, per-app API keys, and governance protocols.

6. Observe

Real-time savings dashboard by app, model, team, and compression rule. ClickHouse analytics pipeline.
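To make the flow of the first four stages concrete, here is a minimal, self-contained sketch. It is not Laghav's implementation: the filler-word list, the keyword-based `route`, the word-overlap `score`, the exact-match dictionary cache, and the fallback to the uncompressed prompt are all hypothetical stand-ins for the compression rules, the ML classifier, the quality scorer, and the semantic cache described above.

```python
import re
from dataclasses import dataclass, field

# Hypothetical filler words; the real rule set is not described in these docs.
FILLER = re.compile(r"\b(please|kindly|basically|actually|just)\b\s*", re.IGNORECASE)


def compress(prompt: str) -> str:
    """Toy stage 1: drop filler words and exact duplicate sentences."""
    seen, kept = set(), []
    for sentence in re.split(r"(?<=[.!?])\s+", prompt.strip()):
        cleaned = re.sub(r"\s+", " ", FILLER.sub("", sentence)).strip()
        if cleaned and cleaned.lower() not in seen:
            seen.add(cleaned.lower())
            kept.append(cleaned)
    return " ".join(kept)


def route(prompt: str) -> str:
    """Toy stage 2: a keyword heuristic standing in for the classifier."""
    code_markers = ("def ", "class ", "traceback", "import ")
    return "sonnet" if any(m in prompt.lower() for m in code_markers) else "haiku"


def score(original: str, compressed: str) -> int:
    """Toy stage 4: share of non-filler words that survive compression, 0-100."""
    def words(text: str) -> set:
        return set(re.findall(r"[a-z0-9']+", FILLER.sub("", text).lower()))

    original_words = words(original)
    if not original_words:
        return 100
    return round(100 * len(original_words & words(compressed)) / len(original_words))


@dataclass
class Pipeline:
    min_quality: int = 80                        # caller-chosen threshold
    cache: dict = field(default_factory=dict)    # exact-match stand-in for stage 3

    def run(self, prompt: str, call_llm) -> dict:
        compressed = compress(prompt)
        model = route(compressed)
        key = compressed.lower()
        if key in self.cache:                    # cache hit: no LLM call at all
            return {"model": model, "prompt": compressed, "quality": None,
                    "cached": True, "answer": self.cache[key]}
        quality = score(prompt, compressed)
        # Assumption: below the threshold, send the original prompt instead.
        to_send = compressed if quality >= self.min_quality else prompt
        self.cache[key] = call_llm(model, to_send)
        return {"model": model, "prompt": to_send, "quality": quality,
                "cached": False, "answer": self.cache[key]}
```

Calling `Pipeline().run(prompt, call_llm)` with any `call_llm(model, prompt)` callable shows the ordering: a second prompt that compresses to the same text is answered from the cache without invoking `call_llm` again.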

one-line-migration.py
# Before: direct Anthropic call
import anthropic

anthropic_client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
response = anthropic_client.messages.create(
    model="claude-opus-4",
    max_tokens=1024,  # required by the Anthropic Messages API
    messages=[{"role": "user", "content": prompt}],
)

# After: route through Laghav
from laghav import LaghavClient

client = LaghavClient(api_key="lgh_live_xxx")
response = client.complete(
    messages=[{"role": "user", "content": prompt}],
    model="auto",  # Laghav picks the cheapest capable model
)
print(response.laghav_meta.compression_ratio)  # 0.60
print(response.laghav_meta.quality_score)  # 94
print(response.laghav_meta.saved_usd)  # 0.043
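As a rough illustration of how a compression ratio translates into money, the sketch below does the arithmetic by hand. It rests on two assumptions that the docs do not confirm: that `compression_ratio` is the fraction of input tokens removed, and that savings come only from input-token pricing. The function name, its parameters, and the price used in the example are hypothetical; the real `saved_usd` may also reflect routing and cache savings.

```python
def estimate_compression_savings(
    original_tokens: int,
    compression_ratio: float,
    usd_per_million_input_tokens: float,
) -> dict:
    """Estimate input-token savings, assuming the ratio is the fraction removed."""
    if not 0.0 <= compression_ratio <= 1.0:
        raise ValueError("compression_ratio must be between 0 and 1")
    tokens_removed = round(original_tokens * compression_ratio)
    saved_usd = tokens_removed * usd_per_million_input_tokens / 1_000_000
    return {
        "tokens_sent": original_tokens - tokens_removed,
        "tokens_removed": tokens_removed,
        "saved_usd": saved_usd,
    }


# Example with made-up numbers: a 10,000-token prompt, ratio 0.60, $15 per million tokens.
estimate = estimate_compression_savings(10_000, 0.60, 15.0)
print(estimate["tokens_removed"], estimate["tokens_sent"])
```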

Where to go next

61% avg token reduction

94/100 avg quality score

<20ms latency overhead