Streaming and Phases

The frontend builds its trace model from structured server-sent events (SSE). The most important runtime contract is that every phase remains understandable and auditable.
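As a rough illustration of how a trace could be assembled from an SSE stream, the sketch below parses raw SSE lines and collects phase records. The event name `phase` and the JSON fields `phase`, `status`, and `duration_ms` are assumptions made for this example, not the service's confirmed wire format.

```python
import json
from typing import Iterable, Iterator, List, Optional, Tuple


def iter_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per complete SSE message found in a stream of lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current message.
            if data:
                yield {"event": event, "data": json.loads("\n".join(data))}
            event, data = "message", []
        elif line.startswith(":"):
            continue  # SSE comment, often used as a keep-alive
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())


def build_trace(lines: Iterable[str]) -> List[Tuple[str, str, Optional[int]]]:
    """Collect (phase, status, duration_ms) from hypothetical 'phase' events."""
    trace = []
    for msg in iter_sse_events(lines):
        if msg["event"] == "phase":  # assumed event name
            payload = msg["data"]
            trace.append(
                (payload["phase"], payload["status"], payload.get("duration_ms"))
            )
    return trace


SAMPLE_STREAM = [
    "event: phase",
    'data: {"phase": "routing", "status": "done", "duration_ms": 12}',
    "",
    ": keep-alive",
    "event: phase",
    'data: {"phase": "retrieval", "status": "done", "duration_ms": 340}',
    "",
    "event: token",
    'data: {"text": "Hello"}',
    "",
]
```

Calling `build_trace(SAMPLE_STREAM)` keeps only the two phase records and ignores the token event and the keep-alive comment.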

Main phases

Phase          Meaning
-------------  -------------------------------------------------------
routing        Question qualification and profile selection
planning       Agentic planning, clarification, and retrieval shaping
retrieval      Search, fusion, rerank, and result shaping
crag           Sufficiency evaluation and refinement
enrichment     Context and graph enrichment
complementary  Additional retrieval for deeper search modes
compression    Context reduction before generation
generation     LLM answer generation
citations      Citation validation and repair
cypher         Graph support branch activity
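One way to keep phases auditable is to check incoming phase names against the documented list. The sketch below is a minimal example under that assumption; `KNOWN_PHASES` and `unknown_phases` are hypothetical names, and the tuple order mirrors the list above without implying an execution order.

```python
from typing import Iterable, List

# Phase names as documented above; the order carries no runtime meaning here.
KNOWN_PHASES = (
    "routing",
    "planning",
    "retrieval",
    "crag",
    "enrichment",
    "complementary",
    "compression",
    "generation",
    "citations",
    "cypher",
)


def unknown_phases(names: Iterable[str]) -> List[str]:
    """Return the sorted phase names that are not in the documented list."""
    return sorted(set(names) - set(KNOWN_PHASES))
```

A trace consumer could log or flag anything returned by `unknown_phases` so that new or misspelled phases do not silently disappear from charts.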

Metrics linkage

Phase timings are exported through the lalandre_rag_service_phase_duration_seconds metric. Besides the main phases, the metric can also carry normalized phase names derived from the following timing fields:

  • phase_timings_ms

  • graph_query_phase_timings_ms

  • graph_retrieval_phase_timings_ms
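The sketch below shows one plausible way to flatten the three timing fields into a single mapping of phase name to seconds, ready to be observed on a duration metric. The prefixing scheme, the name normalization rule, and the helper names are assumptions for illustration; only the three field names and the millisecond-to-second relationship implied by their suffixes come from the text above.

```python
import re
from typing import Dict, Mapping

# Source field -> prefix applied to its phase names (the prefixes are assumed).
TIMING_SOURCES = {
    "phase_timings_ms": "",
    "graph_query_phase_timings_ms": "graph_query_",
    "graph_retrieval_phase_timings_ms": "graph_retrieval_",
}


def normalize_phase_name(name: str) -> str:
    """Lowercase a name and collapse non-alphanumeric runs into underscores."""
    return re.sub(r"[^a-z0-9]+", "_", name.strip().lower()).strip("_")


def phase_durations_seconds(payload: Mapping[str, Mapping[str, float]]) -> Dict[str, float]:
    """Flatten the timing fields into {normalized_phase_name: seconds}."""
    durations = {}
    for field, prefix in TIMING_SOURCES.items():
        for name, millis in (payload.get(field) or {}).items():
            durations[prefix + normalize_phase_name(name)] = millis / 1000.0
    return durations


example_payload = {
    "phase_timings_ms": {"Retrieval": 250, "generation": 1200},
    "graph_query_phase_timings_ms": {"cypher-exec": 40},
}
```

Each resulting pair could then be recorded as one observation on the duration metric, with the normalized name used as the phase label value.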

Why the phase chart matters

The Engine/RAG dashboard should expose these phases clearly enough for:

  • runtime bottleneck detection,

  • regression analysis after prompt or retrieval changes,

  • explanation of user-visible latency in the chat trace.

That coverage is currently incomplete; the gaps are tracked in the dashboard audit pages.
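To make the bottleneck and latency-explanation uses above concrete, the following sketch ranks phases by their share of total request time. It assumes a flat mapping of phase name to milliseconds, such as the contents of phase_timings_ms; the function name and the sample values are hypothetical.

```python
from typing import List, Mapping, Tuple


def latency_breakdown(timings_ms: Mapping[str, float]) -> List[Tuple[str, float, float]]:
    """Return (phase, ms, share_of_total) rows, slowest phase first."""
    total = sum(timings_ms.values())
    if total <= 0:
        return []
    rows = [(name, ms, ms / total) for name, ms in timings_ms.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


sample_timings = {"routing": 10, "retrieval": 40, "generation": 150}
for phase, ms, share in latency_breakdown(sample_timings):
    print(f"{phase:<12} {ms:>6.0f} ms  {share:6.1%}")
```

Comparing such breakdowns before and after a prompt or retrieval change is one simple way to spot which phase a regression comes from.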