Observability

The observability stack combines Prometheus, provisioned Grafana dashboards, and service-local metrics endpoints.

Provisioned dashboards

| Dashboard JSON | Intent |
| --- | --- |
| monitoring/grafana/dashboards/engine-rag.json | Runtime and indexing overview |
| monitoring/grafana/dashboards/ingestion-runs.json | Live ingestion progress and failures |
| monitoring/grafana/dashboards/vps-resources.json | Host and container resource usage |
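When reviewing a provisioned dashboard, it can help to list its panels without opening Grafana. The following is a minimal sketch, assuming only the standard Grafana dashboard JSON layout (a top-level `title`, `uid`, and `panels` list, with row panels optionally nesting their own `panels`). The sample document and its panel titles are hypothetical and are not taken from the repository's dashboards.

```python
import json


def panel_titles(dashboard: dict) -> list:
    """Collect panel titles, descending into row panels that nest a 'panels' list."""
    titles = []
    for panel in dashboard.get("panels", []):
        if panel.get("title"):
            titles.append(panel["title"])
        # Row panels may carry their own nested panels.
        titles.extend(panel_titles(panel))
    return titles


def summarize(raw_json: str) -> dict:
    """Return the dashboard title, uid, and a flat list of panel titles."""
    dashboard = json.loads(raw_json)
    return {
        "title": dashboard.get("title"),
        "uid": dashboard.get("uid"),
        "panels": panel_titles(dashboard),
    }


# Hypothetical sample; real dashboards live under monitoring/grafana/dashboards/.
SAMPLE = json.dumps({
    "title": "Example dashboard",
    "uid": "example",
    "panels": [
        {"type": "timeseries", "title": "Query rate"},
        {"type": "row", "title": "Latency", "panels": [
            {"type": "timeseries", "title": "Phase duration"},
        ]},
    ],
})

summary = summarize(SAMPLE)
```

To inspect a real file, read it from disk and pass its contents to `summarize`, for example the text of monitoring/grafana/dashboards/engine-rag.json.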

Prometheus scrape inventory

Source of truth: monitoring/prometheus.yml.

| Job | Target |
| --- | --- |
| api_gateway | api-gateway:8000/metrics |
| rag_service | rag-service:8001/metrics |
| embedding_service | embedding-service:8002/metrics |
| chunking_worker | chunking-worker:9109/metrics |
| embedding_worker | embedding-worker:9108/metrics |
| embedding_worker_e5 | embedding-worker-e5:9108/metrics |
| extraction_worker | extraction-worker:9107/metrics |
| postgresql | postgres_exporter:9187 |
| redis | redis_exporter:9121/metrics |
| pdf_extract | pdf-extract:8080/metrics |
| docker_stats | docker-stats-exporter:9417/ |
| node | node-exporter:9100 |
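The targets in the inventory above mix three shapes: an explicit `/metrics` path, a bare `/`, and no path at all. The sketch below shows one way such entries could map onto a Prometheus scrape job, where the address goes under `static_configs.targets` and the path under `metrics_path` (which Prometheus defaults to `/metrics` when omitted). The helper names are hypothetical, and the rendered YAML is an illustration only; monitoring/prometheus.yml remains the source of truth and may be laid out differently.

```python
from typing import Tuple

# A few entries copied from the inventory above: job name -> target as listed.
SCRAPE_INVENTORY = {
    "api_gateway": "api-gateway:8000/metrics",
    "postgresql": "postgres_exporter:9187",
    "docker_stats": "docker-stats-exporter:9417/",
}


def split_target(target: str) -> Tuple[str, str]:
    """Split 'host:port/path' into ('host:port', '/path').

    A target without any path falls back to Prometheus' default, /metrics.
    """
    address, sep, path = target.partition("/")
    return address, ("/" + path) if sep else "/metrics"


def render_job(job: str, target: str) -> str:
    """Render one scrape job as a YAML fragment (illustrative layout only)."""
    address, path = split_target(target)
    return "\n".join([
        f"  - job_name: {job}",
        f"    metrics_path: {path}",
        "    static_configs:",
        f"      - targets: ['{address}']",
    ])


fragment = "scrape_configs:\n" + "\n".join(
    render_job(job, target) for job, target in SCRAPE_INVENTORY.items()
)
```

Printing `fragment` gives a `scrape_configs` section that can be compared by eye against the real file when auditing the inventory.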

Note: rerank-service is covered today through health probes surfaced by api-gateway and rag-service. It does not have a dedicated Prometheus scrape job in monitoring/prometheus.yml.

Core runtime signals

| Signal family | Metric examples |
| --- | --- |
| Gateway edge behavior | lalandre_api_gateway_query_requests_total, lalandre_api_gateway_proxy_errors_total |
| RAG request behavior | lalandre_rag_service_query_requests_total, lalandre_rag_service_query_duration_seconds |
| Phase timing | lalandre_rag_service_phase_duration_seconds |
| Provider failures | lalandre_rag_service_provider_errors_total |
| Retrieval failures | lalandre_rag_retrieval_errors_total |
| Backend health | lalandre_api_gateway_backend_health, lalandre_rag_service_backend_health |
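The metric names above can be turned into PromQL expressions for ad hoc checks. The sketch below builds query strings only; it does not contact Prometheus. It assumes that the `_total` metrics are counters and that `lalandre_rag_service_query_duration_seconds` is exported as a histogram with `_bucket` series. Neither assumption is confirmed here, and the helper names, the 5m window, and the 0.95 quantile are illustrative choices.

```python
def rate_query(counter: str, window: str = "5m") -> str:
    """Per-second rate of a counter, summed across all label sets."""
    return f"sum(rate({counter}[{window}]))"


def quantile_query(histogram: str, q: float = 0.95, window: str = "5m") -> str:
    """Approximate quantile from histogram buckets (assumes a *_bucket series exists)."""
    return (
        f"histogram_quantile({q}, "
        f"sum by (le) (rate({histogram}_bucket[{window}])))"
    )


def error_ratio_query(errors: str, requests: str, window: str = "5m") -> str:
    """Ratio of error rate to request rate over the same window."""
    return f"{rate_query(errors, window)} / {rate_query(requests, window)}"


gateway_error_ratio = error_ratio_query(
    "lalandre_api_gateway_proxy_errors_total",
    "lalandre_api_gateway_query_requests_total",
)
rag_latency_p95 = quantile_query("lalandre_rag_service_query_duration_seconds")
```

The resulting strings can be pasted into the Prometheus expression browser or a Grafana panel query. If the duration metric turns out to be a summary rather than a histogram, `quantile_query` does not apply.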

Current assessment

  • The Engine/RAG dashboard now covers the runtime metrics surface expected by the current query stack.

  • The chart audit pages still track provisioned panels separately from expected ones, but the critical runtime gaps for Engine/RAG are now closed.

For the architectural role of each async worker and the queue/job chain behind these metrics endpoints, see Workers.