Choosing LLM Tiers and Families
Last updated: 2026-04-06
Quick answer
Decide by measurable task needs—latency, accuracy on your evals, context length, tool reliability, and $/1M tokens—not by leaderboard vibes. Most systems mix a fast tier and a strong tier.
Decision criteria
- Latency SLA: interactive UI vs batch jobs.
- Task error cost: wrong triage vs wrong legal summary.
- Context: long documents vs short prompts.
- Tool calling: structured JSON reliability under your schema.
- Residency and compliance: hosted region, logging, retention.
- Unit economics: volume where open-weight or committed capacity wins.
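The criteria above can be collapsed into a routing sketch. This is a minimal, hypothetical example (the `TaskProfile` fields, tier names, and the 32k-token threshold are illustrative assumptions, not any vendor's API); it covers the error-cost, tool-calling, and context criteria, with latency handled by defaulting everything else to the fast tier.

```python
from dataclasses import dataclass

# Hypothetical request profile; the fields mirror the decision criteria
# above and are illustrative, not part of any real SDK.
@dataclass
class TaskProfile:
    error_cost: str      # "low" | "high" — cost of a wrong answer
    needs_tools: bool    # structured tool calls in the loop
    context_tokens: int  # prompt + retrieved documents

def pick_tier(p: TaskProfile) -> str:
    # Escalate to the strong tier when the task's needs demand it;
    # the 32k cutoff is a made-up example threshold.
    if p.error_cost == "high" or p.needs_tools or p.context_tokens > 32_000:
        return "strong"
    # Everything else defaults to the fast tier for time-to-first-token;
    # rely on evals and fallbacks to catch the hard inputs.
    return "fast"
```

In practice the escalation rule would be tuned against your own eval set rather than hard-coded thresholds.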
Tradeoff breakdown
Larger hosted “frontier” tiers usually improve reasoning and tool-use consistency at higher latency and price. Smaller or specialized models reduce cost and time-to-first-token but raise the burden of evals and fallbacks when inputs are hard.
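The price side of that tradeoff is easy to put numbers on. A back-of-envelope sketch, using made-up per-1M-token prices (not any vendor's real rates):

```python
def monthly_cost_usd(requests_per_day: int, tokens_in: int, tokens_out: int,
                     price_in_per_m: float, price_out_per_m: float,
                     days: int = 30) -> float:
    """Back-of-envelope monthly spend at published per-1M-token prices."""
    per_request = (tokens_in / 1e6 * price_in_per_m
                   + tokens_out / 1e6 * price_out_per_m)
    return requests_per_day * days * per_request

# Illustrative prices only: strong tier at $5 in / $15 out,
# fast tier at $0.25 in / $1.25 out, per 1M tokens.
strong = monthly_cost_usd(50_000, 2_000, 500, 5.00, 15.00)  # ≈ $26,250/month
fast = monthly_cost_usd(50_000, 2_000, 500, 0.25, 1.25)     # ≈ $1,688/month
```

At this volume the tiers differ by roughly 15x per month, which is the gap that pays for the extra eval and fallback work a smaller model demands.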
When to choose option A (hosted frontier / general strong tier)
Complex planning, fragile tool chains, long-context synthesis, and low-tolerance failure modes where retries are expensive—provided budget and data-handling terms fit.
When to choose option B (smaller, specialized, or self-hosted tier)
High-volume classification, extraction, routing, redaction, or on-prem requirements where you can prove parity on a fixed eval set and operate the stack.
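"Prove parity on a fixed eval set" can be as simple as the sketch below: score the candidate small model and the incumbent strong tier on the same labeled examples and gate the switch on the accuracy gap. `call_model` is a placeholder for your actual inference client, and `max_drop` is an assumed tolerance you would set per task.

```python
def parity_check(eval_set, call_model, strong_id, small_id,
                 max_drop: float = 0.01):
    """Return (passes, strong_acc, small_acc) on a fixed labeled eval set.

    call_model(model_id, input) -> predicted label; a stand-in for your
    real inference client.
    """
    def accuracy(model_id):
        correct = sum(1 for ex in eval_set
                      if call_model(model_id, ex["input"]) == ex["label"])
        return correct / len(eval_set)

    strong_acc = accuracy(strong_id)
    small_acc = accuracy(small_id)
    return small_acc >= strong_acc - max_drop, strong_acc, small_acc
```

Exact-match accuracy is the simplest gate; extraction and routing tasks often need per-field or per-class breakdowns before the parity claim is trustworthy.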
Failure modes
- One model for every step.
- No shadow testing when providers bump versions.
- Ignoring that product modes change behavior even when the “same” model name appears in the UI.
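The shadow-testing gap is cheap to close. A minimal sketch, assuming you can call both the pinned and candidate versions (`call_pinned`, `call_candidate`, and the log shape are all hypothetical names): serve from the pinned version, mirror a sample of traffic to the candidate, and log disagreements for offline review instead of failing the request.

```python
import random

def handle(request, call_pinned, call_candidate, log, sample_rate: float = 0.05):
    """Serve from the pinned model; shadow a sample to the candidate."""
    answer = call_pinned(request)
    if random.random() < sample_rate:
        shadow = call_candidate(request)
        if shadow != answer:
            # Disagreements go to offline review; the user never sees them.
            log({"request": request, "pinned": answer, "candidate": shadow})
    return answer
```

Running this for a week before a version bump turns "the provider changed something" from an incident into a diff you reviewed in advance.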
Related pages
LLMs in agentic systems · Engineering code review case study · Local and open-weight models · AI platforms and tools