Choosing LLM Tiers and Families
Last updated: 2026-04-06
Quick answer
Decide by measurable task needs—latency, accuracy on your evals, context length, tool reliability, and $/1M tokens—not by leaderboard vibes. Most systems mix a fast tier and a strong tier.
Decision criteria
- Latency SLA: interactive UI vs batch jobs.
- Task error cost: wrong triage vs wrong legal summary.
- Context: long documents vs short prompts.
- Tool calling: structured JSON reliability under your schema.
- Residency and compliance: hosted region, logging, retention.
- Unit economics: volume where open-weight or committed capacity wins.
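The criteria above can be collapsed into a routing sketch. This is a minimal, hypothetical example (the `TaskProfile` fields, tier names, and the 32k-token threshold are illustrative assumptions, not any vendor's API); it covers the error-cost, tool-calling, and context criteria, with latency handled by defaulting everything else to the fast tier.

```python
from dataclasses import dataclass

# Hypothetical request profile; the fields mirror the decision criteria
# above and are illustrative, not part of any real SDK.
@dataclass
class TaskProfile:
    error_cost: str      # "low" | "high" — cost of a wrong answer
    needs_tools: bool    # structured tool calls in the loop
    context_tokens: int  # prompt + retrieved documents

def pick_tier(p: TaskProfile) -> str:
    # Escalate to the strong tier when the task's needs demand it;
    # the 32k cutoff is a made-up example threshold.
    if p.error_cost == "high" or p.needs_tools or p.context_tokens > 32_000:
        return "strong"
    # Everything else defaults to the fast tier for time-to-first-token;
    # rely on evals and fallbacks to catch the hard inputs.
    return "fast"
```

In practice the escalation rule would be tuned against your own eval set rather than hard-coded thresholds.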
Tradeoff breakdown
Larger hosted “frontier” tiers usually improve reasoning and tool-use consistency at higher latency and price. Smaller or specialized models reduce cost and time-to-first-token but raise the burden of evals and fallbacks when inputs are hard.
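The price side of that tradeoff is easy to put numbers on. A back-of-envelope sketch, using made-up per-1M-token prices (not any vendor's real rates):

```python
def monthly_cost_usd(requests_per_day: int, tokens_in: int, tokens_out: int,
                     price_in_per_m: float, price_out_per_m: float,
                     days: int = 30) -> float:
    """Back-of-envelope monthly spend at published per-1M-token prices."""
    per_request = (tokens_in / 1e6 * price_in_per_m
                   + tokens_out / 1e6 * price_out_per_m)
    return requests_per_day * days * per_request

# Illustrative prices only: strong tier at $5 in / $15 out,
# fast tier at $0.25 in / $1.25 out, per 1M tokens.
strong = monthly_cost_usd(50_000, 2_000, 500, 5.00, 15.00)  # ≈ $26,250/month
fast = monthly_cost_usd(50_000, 2_000, 500, 0.25, 1.25)     # ≈ $1,688/month
```

At this volume the tiers differ by roughly 15x per month, which is the gap that pays for the extra eval and fallback work a smaller model demands.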
When to choose option A (hosted frontier / general strong tier)
Complex planning, fragile tool chains, long-context synthesis, and low-tolerance failure modes where retries are expensive—provided budget and data-handling terms fit.
When to choose option B (smaller, specialized, or self-hosted tier)
High-volume classification, extraction, routing, redaction, or on-prem requirements where you can prove parity on a fixed eval set and operate the stack.
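"Prove parity on a fixed eval set" can be as simple as the sketch below: score the candidate small model and the incumbent strong tier on the same labeled examples and gate the switch on the accuracy gap. `call_model` is a placeholder for your actual inference client, and `max_drop` is an assumed tolerance you would set per task.

```python
def parity_check(eval_set, call_model, strong_id, small_id,
                 max_drop: float = 0.01):
    """Return (passes, strong_acc, small_acc) on a fixed labeled eval set.

    call_model(model_id, input) -> predicted label; a stand-in for your
    real inference client.
    """
    def accuracy(model_id):
        correct = sum(1 for ex in eval_set
                      if call_model(model_id, ex["input"]) == ex["label"])
        return correct / len(eval_set)

    strong_acc = accuracy(strong_id)
    small_acc = accuracy(small_id)
    return small_acc >= strong_acc - max_drop, strong_acc, small_acc
```

Exact-match accuracy is the simplest gate; extraction and routing tasks often need per-field or per-class breakdowns before the parity claim is trustworthy.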
Failure modes
- One model for every step.
- No shadow testing when providers bump versions.
- Ignoring that product modes change behavior even when the “same” model name appears in the UI.
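The shadow-testing gap is cheap to close. A minimal sketch, assuming you can call both the pinned and candidate versions (`call_pinned`, `call_candidate`, and the log shape are all hypothetical names): serve from the pinned version, mirror a sample of traffic to the candidate, and log disagreements for offline review instead of failing the request.

```python
import random

def handle(request, call_pinned, call_candidate, log, sample_rate: float = 0.05):
    """Serve from the pinned model; shadow a sample to the candidate."""
    answer = call_pinned(request)
    if random.random() < sample_rate:
        shadow = call_candidate(request)
        if shadow != answer:
            # Disagreements go to offline review; the user never sees them.
            log({"request": request, "pinned": answer, "candidate": shadow})
    return answer
```

Running this for a week before a version bump turns "the provider changed something" from an incident into a diff you reviewed in advance.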
Related pages
LLMs in agentic systems · Engineering code review case study · Local and open-weight models · AI platforms and tools