Engine/RAG Dashboard

Purpose and audience

Purpose: provide a single operational overview of the online query runtime and the asynchronous indexing pipeline.
Audience: engineers and operators diagnosing latency, failure rate, queue pressure, and ingestion backlog.
Source of truth: monitoring/grafana/dashboards/engine-rag.json
Live dashboard: Grafana Engine/RAG dashboard for RagLogic AI

Provisioned panel inventory

Overview and backend health
RAG runtime health and collection availability
Query rate by mode and by outcome
Query latency by mode and phase latency p95
Provider and retrieval error breakdown
Queue depth
Errors per minute
Traffic
Pipeline progress and counters
Relation sync and extraction status
Worker throughput and latency
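The JSON file is the source of truth, so the inventory above can be regenerated from it instead of being maintained by hand. The following is a minimal Python sketch. It assumes the file holds the dashboard model at the top level, which is what file-based provisioning expects. It also assumes panels live under `panels` and that collapsed rows nest their children in their own `panels` list. The helper names `load_dashboard`, `iter_panels`, and `panel_titles` are hypothetical.

```python
import json
from typing import Iterator


def load_dashboard(path: str) -> dict:
    """Read a provisioned dashboard model from disk."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def iter_panels(panels: list) -> Iterator[dict]:
    """Yield every panel depth-first, including children of collapsed rows."""
    for panel in panels:
        yield panel
        # Collapsed rows store their child panels in a nested "panels" list.
        yield from iter_panels(panel.get("panels", []))


def panel_titles(dashboard: dict) -> list:
    """Titles of all rows and panels in dashboard order; untitled panels are skipped."""
    return [p["title"] for p in iter_panels(dashboard.get("panels", [])) if p.get("title")]
```

For example, `panel_titles(load_dashboard("monitoring/grafana/dashboards/engine-rag.json"))` returns row and panel titles in order. That output can then be diffed against the list above.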

Current panel-to-signal mapping

Expected signal	Current coverage	Notes
Gateway backend health	Present	`lalandre_api_gateway_backend_health`
Gateway traffic	Present	Query and search rates are graphed
Gateway proxy errors	Present	Proxy error rate is graphed
RAG backend health	Present	`lalandre_rag_service_backend_health`, including collection availability
RAG query modes	Present	`lalandre_rag_service_query_requests_total` broken down by `mode`
RAG outcomes	Present	`lalandre_rag_service_query_requests_total` broken down by `outcome`
Query latency by mode	Present	`lalandre_rag_service_query_duration_seconds_bucket`
RAG phase durations	Present	`lalandre_rag_service_phase_duration_seconds_bucket`
Provider errors	Present	`lalandre_rag_service_provider_errors_total`
Retrieval errors	Present	`lalandre_rag_retrieval_errors_total`
Queue depth	Present	Redis key-size panels
Worker throughput	Present	Job-rate panels for chunking, embedding, and extraction
Worker latency	Present	Histogram quantiles for worker durations
Extraction backlog and relation sync	Present	Extraction status and graph sync panels
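The histogram-backed rows above (query latency by mode and RAG phase durations) are typically rendered as p95 panels. These apply `histogram_quantile` over the `_bucket` series, while the counter rows use `rate`. The sketch below only builds those PromQL strings. Several details are assumptions: the 5-minute window, the `phase` label on the phase histogram, and the helper names. The actual expressions in `engine-rag.json` may differ, for example by using Grafana's `$__rate_interval`.

```python
def request_rate(metric: str, by: str, window: str = "5m") -> str:
    """Per-second rate of a counter, split by one label."""
    return f"sum by ({by}) (rate({metric}[{window}]))"


def latency_p95(histogram: str, by: str, window: str = "5m") -> str:
    """p95 latency from a Prometheus histogram, split by one label.

    `histogram` is the base name; the `_bucket` suffix is appended here,
    and `le` is kept in the aggregation because histogram_quantile needs it.
    """
    return (
        "histogram_quantile(0.95, "
        f"sum by (le, {by}) (rate({histogram}_bucket[{window}])))"
    )


# Hypothetical panel queries; the "phase" label name is an assumption.
PANEL_QUERIES = {
    "Query rate by mode": request_rate("lalandre_rag_service_query_requests_total", "mode"),
    "Query rate by outcome": request_rate("lalandre_rag_service_query_requests_total", "outcome"),
    "Query latency p95 by mode": latency_p95("lalandre_rag_service_query_duration_seconds", "mode"),
    "Phase latency p95": latency_p95("lalandre_rag_service_phase_duration_seconds", "phase"),
}
```

`le` must stay in the `sum by` clause. It is what lets `histogram_quantile` combine buckets across instances, and if it is aggregated away the quantile query returns no data.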

Validation notes

The dashboard now exposes the core runtime signals that were previously missing:

canonical query modes such as `rag`, `llm_only`, `summarize`, and `compare`,
response outcomes including `grounded`, `weakly_grounded`, `clarify`, and `hard_block`,
request latency by mode,
phase latency via `lalandre_rag_service_phase_duration_seconds`,
provider failures, retrieval failures, and rag-service backend health.

The dashboard is therefore validated against the current runtime metrics surface, as described in the codebase and in the Sphinx operations pages.
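The comparison of runtime metrics against panel coverage can be scripted. The following is a minimal Python sketch. It assumes that Prometheus-backed panels keep their query in `targets[].expr` and that collapsed rows nest child panels under `panels`. `EXPECTED_METRICS` is copied from the mapping table, and the function names are hypothetical. Queue-depth panels that read Redis through another datasource are not checked by this approach.

```python
import re
from typing import Iterator, Sequence

# Metric names copied from the panel-to-signal mapping table.
EXPECTED_METRICS = (
    "lalandre_api_gateway_backend_health",
    "lalandre_rag_service_backend_health",
    "lalandre_rag_service_query_requests_total",
    "lalandre_rag_service_query_duration_seconds_bucket",
    "lalandre_rag_service_phase_duration_seconds_bucket",
    "lalandre_rag_service_provider_errors_total",
    "lalandre_rag_retrieval_errors_total",
)


def iter_exprs(panels: list) -> Iterator[str]:
    """Yield every non-empty target expression, including inside collapsed rows."""
    for panel in panels:
        for target in panel.get("targets", []):
            expr = target.get("expr")
            if expr:
                yield expr
        yield from iter_exprs(panel.get("panels", []))


def missing_metrics(dashboard: dict, expected: Sequence[str] = EXPECTED_METRICS) -> list:
    """Return the expected metric names that no panel query references."""
    exprs = list(iter_exprs(dashboard.get("panels", [])))
    missing = []
    for metric in expected:
        # Match whole metric identifiers only, so "foo_total" does not match "foo_total_bar".
        pattern = re.compile(rf"(?<![A-Za-z0-9_:]){re.escape(metric)}(?![A-Za-z0-9_:])")
        if not any(pattern.search(expr) for expr in exprs):
            missing.append(metric)
    return missing
```

As a usage example, load `monitoring/grafana/dashboards/engine-rag.json` with `json.load` and pass the result to `missing_metrics`. An empty list means every metric in the table is queried by at least one panel.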

Completion checklist

[x] Provisioned dashboard JSON is versioned in the repo.
[x] Existing panels are inventoried.
[x] Runtime metrics have been compared with panel coverage.
[x] Query mode coverage is complete.
[x] Query outcome coverage is complete.
[x] Phase timing coverage is complete.
[x] Provider and retrieval error coverage is complete.
[x] RAG backend health coverage is complete.
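The query mode and outcome items in the checklist can also be re-checked against live data. The sketch below assumes a reachable Prometheus server; the base URL is a placeholder and the function names are hypothetical. It calls the standard `/api/v1/label/<label_name>/values` endpoint with a `match[]` selector restricted to `lalandre_rag_service_query_requests_total`. The canonical values are taken from the validation notes. It only sees values for which Prometheus has stored series. As a result, a mode that has not been exercised is reported as missing even if the code supports it.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Canonical label values listed in the validation notes.
CANONICAL_MODES = {"rag", "llm_only", "summarize", "compare"}
CANONICAL_OUTCOMES = {"grounded", "weakly_grounded", "clarify", "hard_block"}
QUERY_METRIC = "lalandre_rag_service_query_requests_total"


def label_values_url(base_url: str, label: str, metric: str = QUERY_METRIC) -> str:
    """Build a Prometheus label-values URL restricted to one metric."""
    query = urlencode({"match[]": metric})
    return f"{base_url.rstrip('/')}/api/v1/label/{label}/values?{query}"


def fetch_label_values(base_url: str, label: str) -> set:
    """Return the values Prometheus has stored for `label` on QUERY_METRIC."""
    with urlopen(label_values_url(base_url, label), timeout=10) as resp:
        body = json.load(resp)
    if body.get("status") != "success":
        raise RuntimeError(f"Prometheus query failed: {body}")
    return set(body["data"])


def missing_values(expected: set, observed: set) -> list:
    """Canonical values with no stored series, sorted for stable output."""
    return sorted(expected - observed)
```

For example, `missing_values(CANONICAL_MODES, fetch_label_values("http://prometheus:9090", "mode"))` returns an empty list once every canonical mode has been recorded.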