HybridFlow-inspired routing (not a direct implementation of the paper, which is an edge-cloud design). Requests escalate T0 → T1 → T2, and all local tiers are exhausted before any cloud call. T3 is feature-flagged and disabled by default. An async classifier written in Rust on Tokio routes each request based on measured utility.
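A minimal sketch of the escalation order, assuming the classifier picks a starting tier and a request only ever moves up the ladder. `Tier`, `Flags`, `cloud_enabled`, and `cascade` are hypothetical names, and this is plain synchronous Rust rather than the Tokio-based classifier itself:

```rust
/// Serving tiers, cheapest and most local first.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Tier {
    T0, // Qwen2.5-0.5B on the Pi 5
    T1, // Qwen2.5-3B on the Pi 5
    T2, // Llama3.1-8B on the home server
    T3, // cloud models, behind a feature flag
}

/// Hypothetical flag set; `Default` leaves the cloud tier disabled.
#[derive(Debug, Default)]
struct Flags {
    cloud_enabled: bool,
}

impl Tier {
    /// Returns the tier to try after `self`, or `None` when the cascade
    /// is exhausted. T3 is reachable only from T2, and only when the
    /// cloud flag is on.
    fn next(self, flags: &Flags) -> Option<Tier> {
        match self {
            Tier::T0 => Some(Tier::T1),
            Tier::T1 => Some(Tier::T2),
            Tier::T2 if flags.cloud_enabled => Some(Tier::T3),
            Tier::T2 | Tier::T3 => None,
        }
    }
}

/// Lists every tier a request may visit, in order, starting at `start`.
fn cascade(start: Tier, flags: &Flags) -> Vec<Tier> {
    let mut path = vec![start];
    let mut current = start;
    while let Some(next) = current.next(flags) {
        path.push(next);
        current = next;
    }
    path
}
```

With default flags, `cascade(Tier::T0, &Flags::default())` yields `[T0, T1, T2]`; T3 appears only once `cloud_enabled` is set. Putting the flag check inside `next` means no path can reach the cloud tier without passing through T2 first.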
| Tier | Model | Hardware | Throughput | Cost | Used for |
|---|---|---|---|---|---|
| T0 | Qwen2.5-0.5B | Pi 5 (Ollama) | 15-20 tok/s | $0 | Quick confirmations, simple follow-ups |
| T1 | Qwen2.5-3B | Pi 5 (Ollama) | 5-8 tok/s | $0 | Most Socratic dialogue, agent routing, BKT updates |
| T2 | Llama3.1-8B | Home server (llama.cpp, Vulkan) | 20-40 tok/s | $0 | Deep reasoning, content generation, complex scaffolding |
| T3 | Claude Sonnet / Gemini Flash / Grok | Cloud (feature-flagged, disabled by default) | Variable | $0-30/mo | Only when local confidence < 0.85 for specific agents (debates, emotional coaching) |
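The throughput column translates directly into response latency. A small sketch of that arithmetic, ignoring prompt processing and model load time; `Throughput`, `LOCAL_TIERS`, and `generation_seconds` are hypothetical names, and the 120-token reply used below is an arbitrary example length:

```rust
/// Generation throughput range for a tier, in tokens per second.
struct Throughput {
    min_tok_s: f64,
    max_tok_s: f64,
}

/// Local tiers with the throughput ranges from the tier table.
const LOCAL_TIERS: [(&str, Throughput); 3] = [
    ("T0 Qwen2.5-0.5B", Throughput { min_tok_s: 15.0, max_tok_s: 20.0 }),
    ("T1 Qwen2.5-3B", Throughput { min_tok_s: 5.0, max_tok_s: 8.0 }),
    ("T2 Llama3.1-8B", Throughput { min_tok_s: 20.0, max_tok_s: 40.0 }),
];

/// Returns (fastest, slowest) seconds to generate `tokens` output tokens.
/// Prompt evaluation and model loading are deliberately not modeled.
fn generation_seconds(t: &Throughput, tokens: u32) -> (f64, f64) {
    let n = tokens as f64;
    (n / t.max_tok_s, n / t.min_tok_s)
}
```

For a 120-token reply this gives 6-8 s on T0, 15-24 s on T1, and 3-6 s on T2. The tier order therefore tracks model size and placement rather than speed: T1, which carries most Socratic dialogue, is the slowest of the three local tiers.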
SELECTIVE PREMIUM BOOST (T3)
When local confidence falls below 0.85 for specific agents (e.g., Debate Coach deep Socratic reasoning, Zen emotional coaching), the request is routed to Claude Sonnet, Gemini Flash, or Grok. You decide which agents are eligible and which provider each one uses. By default everything stays 100% local, and the T3 feature flag is off until you enable it.
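The per-agent rule can be expressed as a small pure function. A sketch, assuming each agent reports a local confidence score in [0, 1] (how that score is measured is not specified here) and that each opted-in agent maps to exactly one provider; `PremiumBoost`, `CloudModel`, `route`, and agent keys such as `"debate_coach"` are hypothetical:

```rust
use std::collections::HashMap;

/// Local confidence below this value makes an opted-in agent eligible for T3.
const CONFIDENCE_THRESHOLD: f32 = 0.85;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CloudModel {
    ClaudeSonnet,
    GeminiFlash,
    Grok,
}

/// Hypothetical T3 policy. `Default` = flag off, no agents opted in.
#[derive(Debug, Default)]
struct PremiumBoost {
    enabled: bool,                          // global T3 feature flag
    per_agent: HashMap<String, CloudModel>, // opted-in agent -> provider
}

impl PremiumBoost {
    /// Returns the cloud model to call, or `None` to keep the local answer.
    /// `local_confidence` is an assumed score in [0, 1] from the local tier.
    fn route(&self, agent: &str, local_confidence: f32) -> Option<CloudModel> {
        if !self.enabled || local_confidence >= CONFIDENCE_THRESHOLD {
            return None;
        }
        // Agents that were never opted in always stay local.
        self.per_agent.get(agent).copied()
    }
}
```

With `PremiumBoost::default()` the flag is off and every call returns `None`, matching the 100%-local default. Keeping the decision a pure function of flag, agent, and score makes it easy to unit test before any provider client exists.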