LLM Tiers
Last updated April 7, 2026
Small / Medium / Large — how plugins pick LLMs without binding to specific models.
The SDK's LLM surface is tier-based. Your plugin doesn't say "use gpt-4o-mini" — it says "this is a simple task" by asking for the small tier. The user's kb.config.json decides which model that tier maps to. The same plugin can run against OpenAI, Anthropic, or a local model without changing a line of code.
This page covers the three tiers, how they're configured, what happens on mismatch, and the immutability guarantees that matter in concurrent code.
The three tiers
```typescript
type LLMTier = 'small' | 'medium' | 'large';
```

- `small` — the plugin is saying: this task is simple, doesn't need much thought. Good for short classification, simple extraction, yes/no decisions, parameter filling.
- `medium` — standard task, typical reasoning. Good for summarization, moderate code generation, conversational flows, single-step agents.
- `large` — complex task, needs maximum quality. Good for deep reasoning, long-horizon planning, multi-step agents, hard coding problems.
Tiers are user-defined slots, not model identifiers. The user decides what small means in their deployment. In one setup it might be gpt-4o-mini; in another it might be claude-haiku; in a third it might be an on-premise Llama. The plugin doesn't care.
Using tiers in handler code
```typescript
import { useLLM, getLLMTier } from '@kb-labs/sdk';

const llmSmall = useLLM({ tier: 'small' });
const llmMedium = useLLM({ tier: 'medium' });
const llmLarge = useLLM({ tier: 'large' });

// With capability constraints:
const llmForCode = useLLM({ tier: 'medium', capabilities: ['coding'] });
const llmForVision = useLLM({ tier: 'large', capabilities: ['vision', 'reasoning'] });

// Default tier (what the user configured as default):
const llm = useLLM();
```

Every `useLLM(...)` call returns an `ILLM` instance bound to the resolved tier. Call `complete`, `stream`, or `chatWithTools` on it — the underlying adapter handles the actual model selection.
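As a sketch of what a handler does with the returned instance — assuming only a minimal `ILLM` shape with a promise-returning `complete` (the real interface has more methods and richer types), and a hypothetical in-memory fake standing in for a resolved tier binding:

```typescript
// Minimal stand-in for the SDK's ILLM; the real interface also has
// stream and chatWithTools, and richer option types.
interface ILLM {
  complete(prompt: string): Promise<string>;
}

// Hypothetical fake representing what a resolved small-tier binding returns.
const fakeSmall: ILLM = {
  complete: async (prompt) => `echo(small): ${prompt}`,
};

// A "small" task: one short completion, no multi-step reasoning.
async function classify(input: string, llm: ILLM): Promise<string> {
  return llm.complete(`Classify as bug/feature/question: ${input}`);
}

async function demo() {
  return classify('app crashes on login', fakeSmall);
}
```

In real handler code the `llm` argument would simply be the result of `useLLM({ tier: 'small' })`.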
Configuring tiers
Tiers are mapped to models in kb.config.json under platform.adapterOptions.llm.tierMapping:
```json
{
  "platform": {
    "adapters": {
      "llm": [
        "@kb-labs/adapters-openai",
        "@kb-labs/adapters-vibeproxy"
      ]
    },
    "adapterOptions": {
      "llm": {
        "defaultTier": "small",
        "tierMapping": {
          "small": [
            {
              "adapter": "@kb-labs/adapters-openai",
              "model": "gpt-4o-mini",
              "priority": 1,
              "capabilities": ["fast"]
            }
          ],
          "medium": [
            {
              "adapter": "@kb-labs/adapters-vibeproxy",
              "model": "claude-sonnet-4-6",
              "priority": 1,
              "capabilities": ["coding", "reasoning", "vision"]
            },
            {
              "adapter": "@kb-labs/adapters-vibeproxy",
              "model": "gpt-5-codex",
              "priority": 2,
              "capabilities": ["coding"]
            }
          ],
          "large": [
            {
              "adapter": "@kb-labs/adapters-vibeproxy",
              "model": "gpt-5.1-codex-max",
              "priority": 1,
              "capabilities": ["reasoning", "coding"]
            },
            {
              "adapter": "@kb-labs/adapters-vibeproxy",
              "model": "claude-opus-4-6",
              "priority": 2,
              "capabilities": ["reasoning", "coding", "vision"]
            }
          ]
        }
      }
    }
  }
}
```

Each tier holds a list of `TierModelEntry` objects:
```typescript
interface TierModelEntry {
  adapter?: string;               // which adapter to use; defaults to primary
  model: string;                  // vendor-specific model ID
  priority: number;               // lower = higher priority
  capabilities?: LLMCapability[]; // what this model can do
}
```

When a plugin asks for a tier, the router picks the highest-priority entry in that tier that satisfies any `capabilities` constraint the plugin passed. If the plugin doesn't specify capabilities, the first entry by priority wins.
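The within-tier selection rule can be sketched as a pure function. This is a minimal sketch of the described behavior, not the SDK's actual router code — `resolveEntry` is an illustrative name:

```typescript
type LLMCapability = 'reasoning' | 'coding' | 'vision' | 'fast';

interface TierModelEntry {
  adapter?: string;
  model: string;
  priority: number; // lower = higher priority
  capabilities?: LLMCapability[];
}

// Illustrative name; the real algorithm lives in the ILLMRouter implementation.
function resolveEntry(
  entries: TierModelEntry[],
  required: LLMCapability[] = [],
): TierModelEntry | undefined {
  return entries
    // keep only entries that declare every required capability
    .filter((e) => required.every((c) => (e.capabilities ?? []).includes(c)))
    // lowest priority number wins
    .sort((a, b) => a.priority - b.priority)[0];
}
```

With the `medium` tier from the config above, `resolveEntry(medium, ['coding'])` would pick the priority-1 claude-sonnet entry, and an unsatisfiable filter would return `undefined` (which is where the escalation rules below take over).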
See Configuration → kb.config.json for the full schema.
Capabilities
```typescript
type LLMCapability = 'reasoning' | 'coding' | 'vision' | 'fast';
```

- `reasoning` — complex reasoning chains, multi-step thinking.
- `coding` — code generation, understanding, review.
- `vision` — image input support.
- `fast` — low latency, real-time responses.
A plugin declares what the task requires:
```typescript
const llm = useLLM({
  tier: 'medium',
  capabilities: ['coding'],
});
```

The router filters the tier's entries by the required capabilities (all must be present) and picks the highest-priority match. If nothing in the tier matches, it escalates (see below).
Capabilities are advisory — users can mis-tag models in config, and the router trusts the declared tags. Don't use capabilities as a hard filter for correctness; use them as a hint about intent.
Mismatch behavior
What happens when the user's configured tiers don't match the plugin's request?
Escalation: plugin asks for small, only medium is configured
Silent upgrade. The router picks medium and uses it, no warning. Rationale: a plugin asking for small is asking for the cheapest thing that works — anything better also works, just more expensively.
Degradation: plugin asks for large, only medium is configured
Picks medium, emits a warning through the platform logger. Plugin still gets a working LLM, but the log line lets operators know the configured setup can't fully satisfy the request.
Capability filter empty
Plugin asks for { tier: 'medium', capabilities: ['vision'] }, nothing in medium has vision. Router escalates to large and retries; if nothing in large has vision either, the request falls back to whichever entry has the highest priority overall, with a warning.
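The three mismatch rules can be sketched together. This is a minimal illustrative sketch under the tier/entry shapes described above — the authoritative algorithm is the `ILLMRouter` implementation, and `resolveTier` is not an SDK name:

```typescript
type LLMTier = 'small' | 'medium' | 'large';
type LLMCapability = 'reasoning' | 'coding' | 'vision' | 'fast';

interface TierModelEntry {
  model: string;
  priority: number;
  capabilities?: LLMCapability[];
}

type TierMapping = Partial<Record<LLMTier, TierModelEntry[]>>;

const ORDER: LLMTier[] = ['small', 'medium', 'large'];

// Illustrative sketch of the mismatch rules; not the SDK's actual router.
function resolveTier(
  mapping: TierMapping,
  requested: LLMTier,
  required: LLMCapability[] = [],
  warn: (msg: string) => void = () => {},
): TierModelEntry | undefined {
  const pick = (tier: LLMTier) =>
    (mapping[tier] ?? [])
      .filter((e) => required.every((c) => (e.capabilities ?? []).includes(c)))
      .sort((a, b) => a.priority - b.priority)[0];

  // 1. Try the requested tier, then escalate upward (silent upgrade).
  const start = ORDER.indexOf(requested);
  for (let i = start; i < ORDER.length; i++) {
    const hit = pick(ORDER[i]);
    if (hit) return hit;
  }

  // 2. Degrade downward, warning the operator.
  for (let i = start - 1; i >= 0; i--) {
    const hit = pick(ORDER[i]);
    if (hit) {
      warn(`tier '${requested}' degraded to '${ORDER[i]}'`);
      return hit;
    }
  }

  // 3. Nothing anywhere satisfies the capability filter: fall back to the
  //    highest-priority entry overall, with a warning.
  const all = ORDER.flatMap((t) => mapping[t] ?? []);
  const fallback = [...all].sort((a, b) => a.priority - b.priority)[0];
  if (fallback) warn(`no entry satisfies [${required.join(', ')}]; falling back`);
  return fallback;
}
```

The ordering encodes the page's rationale: anything better than the request also works, so upgrades are silent, while downgrades and global fallbacks are worth a log line.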
The full resolution algorithm lives in the ILLMRouter implementation in @kb-labs/core-platform — see llm-types.ts for the contract types.
getLLMTier()
Returns the tier that the configured LLM would resolve to by default:
```typescript
import { getLLMTier } from '@kb-labs/sdk';

const defaultTier = getLLMTier(); // 'small' | 'medium' | 'large' | undefined
```

It returns `undefined` when no LLM is configured. Use it for diagnostic output or for deciding branch paths based on what the user has set up — e.g., "if they have large as default, they probably want long-form output; if small, short snippets".
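That branching idea can be sketched as a small helper that takes the result of `getLLMTier()`. The helper name and style values are hypothetical, not SDK API:

```typescript
type LLMTier = 'small' | 'medium' | 'large';

// Hypothetical helper: pass it the result of getLLMTier().
// Style names are illustrative.
function outputStyle(tier: LLMTier | undefined): 'none' | 'snippet' | 'long-form' {
  if (tier === undefined) return 'none'; // no LLM configured at all
  return tier === 'large' ? 'long-form' : 'snippet';
}
```

In a handler this would be called as `outputStyle(getLLMTier())` and used to decide how verbose the generated output should be.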
LazyBoundLLM and the immutability guarantee
This is an advanced topic. Most plugin code doesn't need to think about it.
useLLM({ tier: 'large' }) doesn't hand you the underlying adapter directly. It hands you a LazyBoundLLM — a lazy wrapper that resolves the tier on first use and caches the binding. Each useLLM() call returns a new, independent LazyBoundLLM instance.
Why this matters: in concurrent code, if two handlers call useLLM({ tier: 'small' }) and useLLM({ tier: 'large' }) simultaneously, both should get the tier they asked for. If the underlying router held mutable state (one global "currently selected tier"), one call would clobber the other. The lazy binding ensures each call is independent.
In practice:
```typescript
// In handler A:
const llmA = useLLM({ tier: 'small' }); // lazy binding for small
await llmA.complete('quick task');      // resolves to small here

// Concurrent in handler B:
const llmB = useLLM({ tier: 'large' }); // independent lazy binding for large
await llmB.complete('hard task');       // resolves to large here
```

Neither call affects the other. The router resolves each `LazyBoundLLM` the first time its `complete` / `stream` / `chatWithTools` method is called, then caches the result for subsequent calls on the same instance.
This was a real issue before the lazy binding was introduced — it's called out in the source as the reason LazyBoundLLM exists. Don't try to "optimize" by caching useLLM(...) results at module level — let each handler call useLLM(...) fresh, and the lazy binding does the right thing.
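The resolve-once-then-cache behavior can be pictured with a minimal sketch, in which a resolver callback stands in for the router; the class name echoes the text, but the real implementation lives in the SDK:

```typescript
interface ILLM {
  complete(prompt: string): Promise<string>;
}

// Minimal sketch of lazy tier binding: the resolver runs once, on first
// use, and the resulting binding is cached per instance.
class LazyBoundLLMSketch implements ILLM {
  private bound?: ILLM;

  constructor(private resolve: () => ILLM) {}

  private binding(): ILLM {
    if (!this.bound) this.bound = this.resolve(); // first use: resolve and cache
    return this.bound;
  }

  complete(prompt: string): Promise<string> {
    return this.binding().complete(prompt);
  }
}
```

Because each `useLLM(...)` call would construct a fresh instance around its own tier request, concurrent handlers never share mutable resolution state.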
Picking a tier for a task
Rules of thumb:
- Extract a field from user input? → `small`.
- Classify into a known set? → `small`.
- Summarize a few paragraphs? → `small` or `medium`.
- Write a commit message for a diff? → `medium`.
- Review a PR for bugs? → `medium` or `large`.
- Plan a multi-step refactor? → `large`.
- Debug a hard production issue? → `large`.
- Anything involving an agent loop? → `medium` or `large`, depending on how many turns.
Start with `small`, measure quality, escalate if needed. `large` is expensive; don't reach for it out of habit.
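The rules of thumb above can be encoded as a simple lookup. The `TaskKind` names are hypothetical, purely to illustrate mapping task categories to tiers in plugin code:

```typescript
type LLMTier = 'small' | 'medium' | 'large';

// Hypothetical task categories mirroring the rules of thumb above.
type TaskKind =
  | 'extract'
  | 'classify'
  | 'summarize'
  | 'commit-message'
  | 'pr-review'
  | 'refactor-plan'
  | 'debug';

function tierForTask(kind: TaskKind): LLMTier {
  switch (kind) {
    case 'extract':
    case 'classify':
    case 'summarize':
      return 'small'; // start cheap, escalate if measured quality suffers
    case 'commit-message':
    case 'pr-review':
      return 'medium'; // 'large' may be warranted for subtle bug hunts
    case 'refactor-plan':
    case 'debug':
      return 'large';
  }
}
```

Treat this as a starting point per deployment, not a fixed policy — the whole point of tiers is that quality/cost trade-offs are the user's to tune.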
Testing with tiers
The mockLLM() builder from @kb-labs/sdk/testing ignores tiers — it returns whatever you scripted for the method. For testing tier-specific logic, write a custom fake:
```typescript
import { mockLLM } from '@kb-labs/sdk/testing';

const smallLLM = mockLLM().onAnyComplete().respondWith('short');
const largeLLM = mockLLM().onAnyComplete().respondWith('detailed');

// In test setup, override useLLM to return different mocks by tier:
vi.mock('@kb-labs/sdk', async () => {
  const actual = await vi.importActual('@kb-labs/sdk');
  return {
    ...actual,
    useLLM: (opts?: { tier?: string }) =>
      opts?.tier === 'large' ? largeLLM : smallLLM,
  };
});
```

For most tests, a single mock is enough — tier behavior is the router's responsibility and doesn't need to be re-tested inside every plugin's tests.
What to read next
- SDK → Hooks → useLLM — the full `useLLM` API, including options and return types.
- Configuration → kb.config.json → LLM options — the full `tierMapping` schema.
- Adapters → LLM — the `ILLM` adapter interface and its capabilities.
- SDK → Testing — `mockLLM()` for test scenarios.