Streaming and Phases
The frontend trace model is built from structured SSE events. The most important runtime contract is that phases remain understandable and auditable.
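As a rough illustration of how structured SSE events could be turned into trace entries, the sketch below parses raw SSE text into typed events. The SSE framing (`event:` and `data:` fields, blank-line terminator, `:` comment lines) follows the standard format. The event name `phase` and the payload fields `phase`, `status`, and `duration_ms` are hypothetical and are not taken from the service.

```python
import json
from dataclasses import dataclass
from typing import Iterator


@dataclass
class SseEvent:
    event: str  # value of the "event:" field, "message" if absent
    data: dict  # JSON-decoded "data:" payload


def parse_sse(stream_text: str) -> Iterator[SseEvent]:
    """Split raw SSE text into events; a blank line terminates an event."""
    event_name = "message"
    data_lines: list[str] = []
    # Append one extra blank line so a trailing event is always flushed.
    for line in stream_text.splitlines() + [""]:
        if line == "":
            if data_lines:
                yield SseEvent(event_name, json.loads("\n".join(data_lines)))
            event_name, data_lines = "message", []
        elif line.startswith(":"):
            continue  # comment / keep-alive line
        elif line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format strips a single leading space from the value.
            data_lines.append(value[1:] if value.startswith(" ") else value)


# Hypothetical payloads: event name and fields are illustrative only.
raw = (
    "event: phase\n"
    'data: {"phase": "example_phase", "status": "done", "duration_ms": 120}\n'
    "\n"
    ": keep-alive\n"
    "event: phase\n"
    'data: {"phase": "another_phase", "status": "done", "duration_ms": 80}\n'
    "\n"
)
events = list(parse_sse(raw))
```

Keeping the parser separate from the trace model means the same events can be replayed later when auditing a run.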
Main phases
| Phase | Meaning |
|---|---|
| | Question qualification and profile selection |
| | Agentic planning, clarification, and retrieval shaping |
| | Search, fusion, rerank, and result shaping |
| | Sufficiency evaluation and refinement |
| | Context and graph enrichment |
| | Additional retrieval for deeper search modes |
| | Context reduction before generation |
| | LLM answer generation |
| | Citation validation and repair |
| | Graph support branch activity |
Metrics linkage
Phase timings are exported through lalandre_rag_service_phase_duration_seconds.
The metric can also contain normalized names derived from:
phase_timings_ms
graph_query_phase_timings_ms
graph_retrieval_phase_timings_ms
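A minimal sketch of how the three timing fields named above could be flattened into per-phase observations in seconds, matching the unit of lalandre_rag_service_phase_duration_seconds. It assumes each field is a mapping from phase name to milliseconds. The normalization rule (lowercasing, underscores, and a prefix taken from the field name) is an assumption for illustration, not the service's actual rule.

```python
import re

# Payload fields named in the text; their shape (phase name -> ms) is assumed.
TIMING_FIELDS = (
    "phase_timings_ms",
    "graph_query_phase_timings_ms",
    "graph_retrieval_phase_timings_ms",
)


def normalize_phase(field: str, phase: str) -> str:
    """Build a label-safe phase name, prefixing graph fields with their origin."""
    # "graph_query_phase_timings_ms" -> "graph_query"; "phase_timings_ms" -> ""
    prefix = field[: -len("phase_timings_ms")].rstrip("_")
    name = re.sub(r"[^a-z0-9]+", "_", phase.lower()).strip("_")
    return f"{prefix}_{name}" if prefix else name


def to_observations(payload: dict) -> list[tuple[str, float]]:
    """Return (phase_label, seconds) pairs ready to feed a duration metric."""
    out = []
    for field in TIMING_FIELDS:
        for phase, ms in (payload.get(field) or {}).items():
            out.append((normalize_phase(field, phase), ms / 1000.0))
    return out
```

Each returned pair could then be recorded against the duration metric with the phase label attached, using whichever metrics client the service already uses.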
Why the phase chart matters
The Engine/RAG dashboard should expose these phases clearly enough for:
runtime bottleneck detection,
regression analysis after prompt or retrieval changes,
explanation of user-visible latency in the chat trace.
That coverage is currently incomplete and is tracked in the dashboard audit pages.
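For the bottleneck-detection use case, one simple approach is to rank phases by their share of total request time. The sketch below assumes a flat mapping of phase name to milliseconds, and the function name `rank_phases` is hypothetical. A dashboard would more likely compute this from the exported metric than from a single trace.

```python
def rank_phases(timings_ms: dict[str, float]) -> list[tuple[str, float, float]]:
    """Sort phases by duration, returning (phase, ms, share_of_total)."""
    total = sum(timings_ms.values())
    if total <= 0:
        return []  # nothing measured, so there is nothing to rank
    ranked = sorted(timings_ms.items(), key=lambda kv: kv[1], reverse=True)
    return [(phase, ms, ms / total) for phase, ms in ranked]
```

Comparing these shares before and after a prompt or retrieval change gives a quick view of which phase a regression comes from.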