LlamaIndex

Integrate Laghav into LlamaIndex pipelines with LaghavCallbackHandler, which applies compression and routing to every LLM call and tracks the resulting savings.

Installation

bash

pip install "laghav[llama-index]"
# installs: laghav + llama-index-core
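
To confirm that both packages are importable after installation, a quick check along these lines may help. It is a minimal sketch using only the standard library; `missing_modules` is a hypothetical helper (not part of Laghav), and the module names are taken from the imports used in the examples on this page.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be located."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised when a parent package (e.g. `llama_index`) is absent
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # An empty list means both packages were found
    print(missing_modules(["laghav", "llama_index.core"]))
```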

Basic usage

llamaindex_basic.py

from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.callbacks import CallbackManager
from laghav.integrations.llama_index import LaghavCallbackHandler

# Register Laghav as the global callback handler
handler = LaghavCallbackHandler(
    api_key="lgh_live_...",
    compress=True,
    route=True,
)
Settings.callback_manager = CallbackManager([handler])

# Load the documents to index (assumption: your files live in ./data)
docs = SimpleDirectoryReader("./data").load_data()

# All LlamaIndex calls are now compressed and routed through Laghav
index = VectorStoreIndex.from_documents(docs)
query_engine = index.as_query_engine()
result = query_engine.query("What caused the Q3 revenue drop?")

Session summary

llamaindex_summary.py

# After your LlamaIndex session completes
summary = handler.session_summary
print("Total LLM calls:         ", summary['total_llm_calls'])
print("Total original tokens:   ", summary['total_original_tokens'])
print("Total compressed tokens: ", summary['total_compressed_tokens'])
print("Total saved (USD):       ", summary['total_saved_usd'])
print("Avg quality score:       ", summary['avg_quality_score'])
 
# Example output:
# Total LLM calls:          8
# Total original tokens:    4840
# Total compressed tokens:  1548
# Total saved (USD):        0.213
# Avg quality score:        94.5
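
The token counts in the summary can be turned into a reduction percentage with a little arithmetic. The sketch below assumes only the dictionary keys shown above; `compression_stats` is a hypothetical helper, and the input is a plain dict populated with the example output values rather than a live `handler.session_summary`.

```python
def compression_stats(summary):
    """Derive tokens saved and percent reduction from a summary dict."""
    original = summary["total_original_tokens"]
    compressed = summary["total_compressed_tokens"]
    saved = original - compressed
    # Guard against division by zero when no calls were made
    pct = 100.0 * saved / original if original else 0.0
    return {"tokens_saved": saved, "percent_reduction": round(pct, 1)}


# Values copied from the example output above
example = {
    "total_llm_calls": 8,
    "total_original_tokens": 4840,
    "total_compressed_tokens": 1548,
}
stats = compression_stats(example)
print(stats)  # {'tokens_saved': 3292, 'percent_reduction': 68.0}
```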

Works with any LlamaIndex query engine

The callback hooks into LLMStartEvent and LLMEndEvent in the LlamaIndex event system. It works with any LLM backend (OpenAI, Anthropic, Ollama) and any query engine type.
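
The start/end pairing described above can be pictured with a toy model. This is not Laghav's or LlamaIndex's actual code: `ToyLLMHandler`, its method names, and the whitespace-based token count are all illustrative assumptions, chosen only to show how a handler might match each end event to its start event and accumulate per-session totals.

```python
class ToyLLMHandler:
    """Toy stand-in for a callback handler that pairs start and end events."""

    def __init__(self):
        self._open = {}  # event_id -> prompt token count recorded at start
        self.total_llm_calls = 0
        self.total_prompt_tokens = 0

    def on_llm_start(self, event_id, prompt):
        # Remember the call when the LLM request begins
        self._open[event_id] = len(prompt.split())  # crude token count

    def on_llm_end(self, event_id):
        # Only count calls whose start event was seen
        tokens = self._open.pop(event_id, None)
        if tokens is None:
            return
        self.total_llm_calls += 1
        self.total_prompt_tokens += tokens


toy = ToyLLMHandler()
toy.on_llm_start("a1", "What caused the Q3 revenue drop?")
toy.on_llm_end("a1")
toy.on_llm_end("unknown")  # ignored: no matching start event
print(toy.total_llm_calls, toy.total_prompt_tokens)  # 1 6
```

Because the totals are updated on the end event, a call that starts but never finishes is not counted in this toy version.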

Per-query settings

llamaindex_options.py

# Override Laghav options per handler instance
handler = LaghavCallbackHandler(
    api_key="lgh_live_...",
    compress=True,
    route=True,
    max_aggressiveness=0.8,   # aggressive for document-heavy queries
    skip_rules=["intent"],
)
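
The example above sets options when the handler is constructed, so one way to vary them by workload is to keep named option sets and build a handler from whichever one applies. This is a sketch of a possible usage pattern, not a documented Laghav feature: `BASE_OPTIONS`, `PROFILES`, `handler_options`, the profile names, and the `0.3` value are made up for illustration. Only the keyword names come from the example above.

```python
BASE_OPTIONS = {"compress": True, "route": True}

# Hypothetical named option sets; values other than 0.8 are illustrative
PROFILES = {
    "document_heavy": {"max_aggressiveness": 0.8, "skip_rules": ["intent"]},
    "conservative": {"max_aggressiveness": 0.3},
}


def handler_options(profile, api_key):
    """Merge base options with a named profile into handler keyword arguments."""
    if profile not in PROFILES:
        raise KeyError(f"unknown profile: {profile!r}")
    return {"api_key": api_key, **BASE_OPTIONS, **PROFILES[profile]}


options = handler_options("document_heavy", api_key="lgh_live_...")
print(options)
# With laghav installed, these would be passed as keyword arguments:
# handler = LaghavCallbackHandler(**options)
```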
