Engine/RAG Dashboard

Purpose and audience

  • Purpose: provide a single operational view of the online query runtime and the async indexing pipeline.

  • Audience: engineers and operators diagnosing latency, failure rate, queue pressure, and ingestion backlog.

  • Source of truth: monitoring/grafana/dashboards/engine-rag.json

  • Live dashboard: Grafana Engine/RAG dashboard for RagLogic AI

Provisioned panel inventory

  • Overview and backend health

  • RAG runtime health and collection availability

  • Query rate by mode and by outcome

  • Query latency by mode and phase latency p95

  • Provider and retrieval error breakdown

  • Queue depth

  • Errors per minute

  • Traffic

  • Pipeline progress and counters

  • Relation sync and extraction status

  • Worker throughput and latency
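An inventory like the one above can be regenerated from the provisioned JSON rather than maintained by hand. The following is a minimal sketch, assuming the file follows the usual Grafana dashboard model (a top-level `panels` list, with row panels optionally nesting their children under their own `panels` key). The `SAMPLE` dictionary and the helper names are hypothetical and only stand in for the real file.

```python
from typing import Any, Iterator


def iter_panels(panels: list[dict[str, Any]]) -> Iterator[dict[str, Any]]:
    """Yield every panel, descending into rows that nest their children."""
    for panel in panels:
        yield panel
        # Collapsed rows keep their child panels under a nested "panels" key.
        yield from iter_panels(panel.get("panels", []))


def panel_titles(dashboard: dict[str, Any]) -> list[str]:
    """Return the titles of all non-row panels in document order."""
    return [
        panel.get("title", "")
        for panel in iter_panels(dashboard.get("panels", []))
        if panel.get("type") != "row"
    ]


# Hypothetical stand-in for the parsed dashboard JSON.
SAMPLE = {
    "panels": [
        {"type": "row", "title": "Runtime", "panels": [
            {"type": "timeseries", "title": "Query rate by mode"},
        ]},
        {"type": "timeseries", "title": "Queue depth"},
    ]
}

for title in panel_titles(SAMPLE):
    print(title)
```

To run it against the real dashboard, replace `SAMPLE` with the result of `json.load` on monitoring/grafana/dashboards/engine-rag.json.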

Current panel-to-signal mapping

Expected signal                       Current coverage  Notes
------------------------------------  ----------------  ----------------------------------------------------------------
Gateway backend health                Present           lalandre_api_gateway_backend_health
Gateway traffic                       Present           Query and search rates are graphed
Gateway proxy errors                  Present           Proxy error rate is graphed
RAG backend health                    Present           lalandre_rag_service_backend_health including collections
RAG query modes                       Present           lalandre_rag_service_query_requests_total broken down by mode
RAG outcomes                          Present           lalandre_rag_service_query_requests_total broken down by outcome
Query latency by mode                 Present           lalandre_rag_service_query_duration_seconds_bucket
RAG phase durations                   Present           lalandre_rag_service_phase_duration_seconds_bucket
Provider errors                       Present           lalandre_rag_service_provider_errors_total
Retrieval errors                      Present           lalandre_rag_retrieval_errors_total
Queue depth                           Present           Redis key size panels
Worker throughput                     Present           Chunking, embedding, extraction job rate panels
Worker latency                        Present           Histogram quantiles for worker durations
Extraction backlog and relation sync  Present           Extraction status and graph sync panels
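
The rate and latency rows in this mapping are typically expressed as PromQL over the counters and histogram buckets listed in the Notes column. The sketch below builds such expressions as strings. It is an illustration under stated assumptions, not the queries stored in the dashboard JSON: the label names `mode`, `outcome`, and `phase` and the 5m window are assumptions, and the helper names are hypothetical.

```python
def rate_by_label(metric: str, label: str, window: str = "5m") -> str:
    """Per-second rate of a counter, summed per value of `label`."""
    return f"sum by ({label}) (rate({metric}[{window}]))"


def histogram_p95(bucket_metric: str, label: str, window: str = "5m") -> str:
    """p95 from a Prometheus histogram; `le` must survive the aggregation."""
    return (
        f"histogram_quantile(0.95, "
        f"sum by (le, {label}) (rate({bucket_metric}[{window}])))"
    )


# Label names below are assumed, not confirmed by the dashboard source.
QUERIES = {
    "Query rate by mode": rate_by_label(
        "lalandre_rag_service_query_requests_total", "mode"),
    "Query rate by outcome": rate_by_label(
        "lalandre_rag_service_query_requests_total", "outcome"),
    "Query latency p95 by mode": histogram_p95(
        "lalandre_rag_service_query_duration_seconds_bucket", "mode"),
    "Phase latency p95": histogram_p95(
        "lalandre_rag_service_phase_duration_seconds_bucket", "phase"),
}

for panel, expr in QUERIES.items():
    print(f"{panel}: {expr}")
```

Keeping `le` in the `sum by` clause matters: `histogram_quantile` needs the per-bucket series, so aggregating `le` away would make the quantile impossible to compute.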

Validation notes

The dashboard now exposes the core runtime signals that were previously missing:

  • canonical query modes such as rag, llm_only, summarize, and compare,

  • response outcomes including grounded, weakly_grounded, clarify, and hard_block,

  • request latency by mode,

  • phase latency via lalandre_rag_service_phase_duration_seconds,

  • provider failures, retrieval failures, and rag-service backend health.

With these signals in place, the dashboard covers the runtime metrics surface currently described in the codebase and in the Sphinx operations pages.
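
A coverage comparison of this kind can be automated. The following is a minimal sketch, assuming panel queries live in each panel's `targets[].expr` field as in the usual Grafana dashboard model. The metric names come from the mapping table above; the function names and the sample dashboard are hypothetical.

```python
import re
from typing import Any, Iterator

# Metric names taken from the panel-to-signal mapping.
EXPECTED_METRICS = [
    "lalandre_api_gateway_backend_health",
    "lalandre_rag_service_backend_health",
    "lalandre_rag_service_query_requests_total",
    "lalandre_rag_service_query_duration_seconds_bucket",
    "lalandre_rag_service_phase_duration_seconds_bucket",
    "lalandre_rag_service_provider_errors_total",
    "lalandre_rag_retrieval_errors_total",
]


def iter_exprs(panels: list[dict[str, Any]]) -> Iterator[str]:
    """Yield every non-empty query expression, including those inside rows."""
    for panel in panels:
        for target in panel.get("targets", []):
            expr = target.get("expr")
            if expr:
                yield expr
        yield from iter_exprs(panel.get("panels", []))


def missing_metrics(dashboard: dict[str, Any], expected: list[str]) -> list[str]:
    """Return the expected metrics that no panel expression references."""
    exprs = list(iter_exprs(dashboard.get("panels", [])))
    missing = []
    for metric in expected:
        # Match the whole metric name, not a prefix of a longer one.
        pattern = re.compile(
            rf"(?<![A-Za-z0-9_:]){re.escape(metric)}(?![A-Za-z0-9_:])"
        )
        if not any(pattern.search(expr) for expr in exprs):
            missing.append(metric)
    return missing


# Hypothetical dashboard fragment covering only one expected metric.
sample = {"panels": [{"targets": [
    {"expr": "sum(rate(lalandre_rag_service_provider_errors_total[5m]))"}
]}]}
print(missing_metrics(sample, EXPECTED_METRICS))
```

An empty result for the real dashboard JSON would support the checklist items below; any name returned points at a signal with no panel referencing it.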

Completion checklist

  • [x] Provisioned dashboard JSON is versioned in the repo.

  • [x] Existing panels are inventoried.

  • [x] Runtime metrics have been compared with panel coverage.

  • [x] Query mode coverage is complete.

  • [x] Query outcome coverage is complete.

  • [x] Phase timing coverage is complete.

  • [x] Provider and retrieval error coverage is complete.

  • [x] RAG backend health coverage is complete.