Quality Scoring

Before every LLM call, Laghav rates the compressed prompt on a 0–100 scale. If quality falls below your threshold, compression is rolled back automatically.

How scores are computed

The quality scorer embeds the original and compressed prompts with sentence-transformers and takes the cosine similarity between the two embeddings. A score of 94/100 indicates that the compressed prompt retains roughly 94% of the semantic content of the original. Scores below the rollback threshold (75 in the table below) trigger automatic compression rollback.
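
As a rough illustration of the calculation described above, the sketch below computes the cosine similarity between two embedding vectors and scales it to 0–100. The vectors are toy values standing in for sentence-transformers output. The function names and the clamp-and-round mapping are assumptions made for illustration, not Laghav's actual implementation.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("vectors must be non-zero")
    return dot / (norm_a * norm_b)


def quality_score(original_vec: Sequence[float], compressed_vec: Sequence[float]) -> int:
    """Scale similarity to 0-100 (assumed mapping: clamp to [0, 1], then round)."""
    similarity = cosine_similarity(original_vec, compressed_vec)
    return round(100 * max(0.0, min(1.0, similarity)))


# Toy embeddings standing in for real sentence-transformers vectors.
original = [0.2, 0.7, 0.1, 0.6]
compressed = [0.25, 0.65, 0.05, 0.6]
print(quality_score(original, compressed))
```

Identical vectors give 100 and orthogonal vectors give 0 under this mapping. How Laghav treats negative similarities is not stated here, so the clamp is only one plausible choice.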

| Score range | Meaning | Action |
| --- | --- | --- |
| 95–100 | Excellent: no semantic loss | Proceed with compressed prompt |
| 85–94 | Good: minimal semantic loss | Proceed with compressed prompt |
| 75–84 | Acceptable: some context trimmed | Proceed with compressed prompt |
| < 75 | Poor: significant context lost | Laghav rolls back to original prompt |
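
The bands in the table above can be expressed as a small lookup. This is a minimal sketch of the decision logic only. `BANDS`, `classify`, and `prompt_to_send` are hypothetical names, and Laghav performs the rollback internally, so you would not normally write this yourself.

```python
from typing import List, Tuple

# (lower bound, meaning, action), highest band first; bounds copied from the table.
BANDS: List[Tuple[int, str, str]] = [
    (95, "Excellent", "proceed"),
    (85, "Good", "proceed"),
    (75, "Acceptable", "proceed"),
    (0, "Poor", "rollback"),
]


def classify(score: float) -> Tuple[str, str]:
    """Return (meaning, action) for a quality score between 0 and 100."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    for lower, meaning, action in BANDS:
        if score >= lower:
            return meaning, action
    raise AssertionError("unreachable: the last band starts at 0")


def prompt_to_send(original: str, compressed: str, score: float) -> str:
    """Pick the original prompt when the band calls for a rollback."""
    _, action = classify(score)
    return original if action == "rollback" else compressed


print(classify(94))  # ('Good', 'proceed')
print(prompt_to_send("long original prompt", "short prompt", 60))  # long original prompt
```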

Reading the quality score

scoring.py
# Assumes `client` is an initialized Laghav client and `messages` is your message list.
response = client.complete(messages=messages, model="auto")
score = response.laghav_meta.quality_score
print(f"Quality score: {score}/100")
# The score always describes the prompt that was actually sent.
# If compression was rolled back, the original prompt is sent, so the score is 100/100.

Disabling scoring

no_score.py
# Skip scoring to save ~2 ms of latency (not recommended for production)
response = client.complete(
    messages=messages,
    model="auto",
    laghav_options={"score": False},
)

Score vs. compression tradeoff

Higher max_aggressiveness values remove more tokens but may lower quality scores. For production, keep max_aggressiveness between 0.5 and 0.7 unless you have validated higher values on your own data.
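
One way to approach that validation is sketched below: collect quality scores for a sample of your own prompts at several max_aggressiveness levels, then keep the highest level whose scores stay above limits you choose. The helper name, both limits, and the numbers are hypothetical placeholders for illustration, not measured results or part of the Laghav API.

```python
from statistics import mean
from typing import Dict, List, Optional


def highest_safe_aggressiveness(
    scores_by_level: Dict[float, List[float]],
    min_mean: float = 85.0,
    min_worst: float = 75.0,
) -> Optional[float]:
    """Return the largest level whose scores pass both checks, or None."""
    passing = [
        level
        for level, scores in scores_by_level.items()
        if scores and mean(scores) >= min_mean and min(scores) >= min_worst
    ]
    return max(passing) if passing else None


# Placeholder scores for illustration only, keyed by max_aggressiveness level.
collected = {
    0.5: [96, 93, 91],
    0.7: [92, 88, 86],
    0.9: [85, 78, 70],
}
print(highest_safe_aggressiveness(collected))  # prints 0.7
```

Checking the worst score as well as the mean guards against a level that looks fine on average but rolls back on a few prompts.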