Observability
The observability stack combines Prometheus, provisioned Grafana dashboards, and service-local metrics endpoints.
Production Links
The operational endpoints below are live production surfaces. For the full inventory, see Production Surfaces.
| Surface | URL | Usage |
|---|---|---|
| | | Official production technical documentation. |
| | | Dashboards and runtime metrics. |
| | | Vector collection inspection. |
| | | Graph inspection and troubleshooting. |
| | | Driver and Neo4j Browser Bolt access. |
| | | Container state and deployment inspection. |
Provisioned dashboards
| Dashboard JSON | Intent |
|---|---|
| | Runtime and indexing overview |
| | Live ingestion progress and failures |
| | Host and container resource usage |
Prometheus scrape inventory
Source of truth: monitoring/prometheus.yml.
| Job | Target |
|---|---|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
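To cross-check the inventory against what Prometheus is actually scraping, the response from the Prometheus HTTP API endpoint `/api/v1/targets` can be reduced to a job/target/health listing. The following is a minimal sketch, not part of this repository: the job names and scrape URLs in `SAMPLE` are hypothetical placeholders, and in practice the payload would be fetched from the running Prometheus instance.

```python
import json


def summarize_targets(payload: dict) -> list[tuple[str, str, str]]:
    """Return (job, scrape URL, health) for each active target, sorted by job."""
    targets = payload.get("data", {}).get("activeTargets", [])
    rows = [
        (
            t.get("labels", {}).get("job", ""),
            t.get("scrapeUrl", ""),
            t.get("health", "unknown"),
        )
        for t in targets
    ]
    return sorted(rows)


# Hypothetical response body shaped like GET /api/v1/targets.
# The job names and URLs are illustrative, not taken from monitoring/prometheus.yml.
SAMPLE = json.loads(
    """
    {
      "status": "success",
      "data": {
        "activeTargets": [
          {
            "labels": {"job": "example-worker", "instance": "example-worker:9100"},
            "scrapeUrl": "http://example-worker:9100/metrics",
            "health": "down"
          },
          {
            "labels": {"job": "example-api", "instance": "example-api:8000"},
            "scrapeUrl": "http://example-api:8000/metrics",
            "health": "up"
          }
        ],
        "droppedTargets": []
      }
    }
    """
)

if __name__ == "__main__":
    for job, url, health in summarize_targets(SAMPLE):
        print(f"{job:<16} {url:<40} {health}")
```

Comparing this listing with monitoring/prometheus.yml is one way to spot a job that is configured but has no healthy target.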
Note
rerank-service is covered today through health probes surfaced by
api-gateway and rag-service. It does not have a dedicated Prometheus scrape
job in monitoring/prometheus.yml.
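A gap like this one can be detected mechanically by comparing the list of deployed services with the job names defined in the scrape configuration. The sketch below assumes, purely for illustration, that scrape job names match service names; the `scrape_jobs` list is a hypothetical stand-in for the jobs in monitoring/prometheus.yml.

```python
def services_without_scrape_job(services: list[str], scrape_jobs: list[str]) -> list[str]:
    """Return the services that have no scrape job of the same name, sorted."""
    jobs = set(scrape_jobs)
    return sorted(s for s in services if s not in jobs)


if __name__ == "__main__":
    # Service names come from the note above; the job list is an assumption.
    services = ["api-gateway", "rag-service", "rerank-service"]
    scrape_jobs = ["api-gateway", "rag-service"]
    print(services_without_scrape_job(services, scrape_jobs))
```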
Core runtime signals
| Signal family | Metric examples |
|---|---|
| Gateway edge behavior | |
| RAG request behavior | |
| Phase timing | |
| Provider failures | |
| Retrieval failures | |
| Backend health | |
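When checking one of these signal families by hand, it can help to read a service-local metrics endpoint directly. The sketch below parses the Prometheus text exposition format into a dictionary of series to values. It is a simplified illustration: the metric names in `SAMPLE` are hypothetical (not the metrics this stack exports), and the parser assumes lines carry no trailing timestamp.

```python
def parse_samples(text: str) -> dict[str, float]:
    """Parse 'series value' lines, skipping blank lines and # comments."""
    samples: dict[str, float] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # With no timestamp present, the value is the last space-separated field.
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


# Hypothetical exposition output; metric and label names are placeholders.
SAMPLE = """\
# HELP example_requests_total Hypothetical request counter.
# TYPE example_requests_total counter
example_requests_total{route="/query",status="200"} 42
example_requests_total{route="/query",status="500"} 3
# TYPE example_backend_up gauge
example_backend_up{backend="vector-store"} 1
"""

if __name__ == "__main__":
    for series, value in parse_samples(SAMPLE).items():
        print(series, value)
```

For anything beyond a quick inspection, a maintained parser such as the one in the official Prometheus Python client is likely a better choice than this hand-rolled version.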
Current assessment
The Engine/RAG dashboard now covers the runtime metrics surface expected by the current query stack. The chart audit pages still separate provisioned panels from expectations, but the critical runtime gaps are now closed for Engine/RAG.
For the architectural role of each async worker and the queue/job chain behind these metrics endpoints, see Workers.