LLM Tiers
Last updated April 7, 2026
Small / Medium / Large — how plugins pick LLMs without binding to specific models.
The SDK's LLM surface is tier-based. Your plugin doesn't say "use gpt-4o-mini" — it says "this is a simple task" by asking for the small tier. The user's kb.config.json decides which model that tier maps to. The same plugin can run against OpenAI, Anthropic, or a local model without changing a line of code.
This page covers the three tiers, how they're configured, what happens on mismatch, and the immutability guarantees that matter in concurrent code.
The three tiers
```typescript
type LLMTier = 'small' | 'medium' | 'large';
```

- `small` — the plugin is saying: this task is simple, doesn't need much thought. Good for short classification, simple extraction, yes/no decisions, parameter filling.
- `medium` — standard task, typical reasoning. Good for summarization, moderate code generation, conversational flows, single-step agents.
- `large` — complex task, needs maximum quality. Good for deep reasoning, long-horizon planning, multi-step agents, hard coding problems.
Tiers are user-defined slots, not model identifiers. The user decides what small means in their deployment. In one setup it might be gpt-4o-mini; in another it might be claude-haiku; in a third it might be an on-premise Llama. The plugin doesn't care.
Using tiers in handler code
```typescript
import { useLLM, getLLMTier } from '@kb-labs/sdk';

const llmSmall = useLLM({ tier: 'small' });
const llmMedium = useLLM({ tier: 'medium' });
const llmLarge = useLLM({ tier: 'large' });

// With capability constraints:
const llmForCode = useLLM({ tier: 'medium', capabilities: ['coding'] });
const llmForVision = useLLM({ tier: 'large', capabilities: ['vision', 'reasoning'] });

// Default tier (what the user configured as default):
const llm = useLLM();
```

Every `useLLM(...)` call returns an `ILLM` instance bound to the resolved tier. Call `complete`, `stream`, or `chatWithTools` on it — the underlying adapter handles the actual model selection.
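As a sketch of what a handler does with the returned instance — assuming only a minimal `ILLM` shape with a promise-returning `complete` (the real interface has more methods and richer types), and a hypothetical in-memory fake standing in for a resolved tier binding:

```typescript
// Minimal stand-in for the SDK's ILLM; the real interface also has
// stream and chatWithTools, and richer option types.
interface ILLM {
  complete(prompt: string): Promise<string>;
}

// Hypothetical fake representing what a resolved small-tier binding returns.
const fakeSmall: ILLM = {
  complete: async (prompt) => `echo(small): ${prompt}`,
};

// A "small" task: one short completion, no multi-step reasoning.
async function classify(input: string, llm: ILLM): Promise<string> {
  return llm.complete(`Classify as bug/feature/question: ${input}`);
}

async function demo() {
  return classify('app crashes on login', fakeSmall);
}
```

In real handler code the `llm` argument would simply be the result of `useLLM({ tier: 'small' })`.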
Configuring tiers
Tiers are mapped to models in kb.config.json under platform.adapterOptions.llm.tierMapping:
```json
{
  "platform": {
    "adapters": {
      "llm": [
        "@kb-labs/adapters-openai",
        "@kb-labs/adapters-vibeproxy"
      ]
    },
    "adapterOptions": {
      "llm": {
        "defaultTier": "small",
        "tierMapping": {
          "small": [
            {
              "adapter": "@kb-labs/adapters-openai",
              "model": "gpt-4o-mini",
              "priority": 1,
              "capabilities": ["fast"]
            }
          ],
          "medium": [
            {
              "adapter": "@kb-labs/adapters-vibeproxy",
              "model": "claude-sonnet-4-6",
              "priority": 1,
              "capabilities": ["coding", "reasoning", "vision"]
            },
            {
              "adapter": "@kb-labs/adapters-vibeproxy",
              "model": "gpt-5-codex",
              "priority": 2,
              "capabilities": ["coding"]
            }
          ],
          "large": [
            {
              "adapter": "@kb-labs/adapters-vibeproxy",
              "model": "gpt-5.1-codex-max",
              "priority": 1,
              "capabilities": ["reasoning", "coding"]
            },
            {
              "adapter": "@kb-labs/adapters-vibeproxy",
              "model": "claude-opus-4-6",
              "priority": 2,
              "capabilities": ["reasoning", "coding", "vision"]
            }
          ]
        }
      }
    }
  }
}
```

Each tier holds a list of `TierModelEntry` objects:
```typescript
interface TierModelEntry {
  adapter?: string;               // which adapter to use; defaults to primary
  model: string;                  // vendor-specific model ID
  priority: number;               // lower = higher priority
  capabilities?: LLMCapability[]; // what this model can do
}
```

When a plugin asks for a tier, the router picks the highest-priority entry in that tier that satisfies any `capabilities` constraint the plugin passed. If the plugin doesn't specify capabilities, the first entry by priority wins.
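The within-tier selection rule can be sketched as a pure function. This is a minimal sketch of the described behavior, not the SDK's actual router code — `resolveEntry` is an illustrative name:

```typescript
type LLMCapability = 'reasoning' | 'coding' | 'vision' | 'fast';

interface TierModelEntry {
  adapter?: string;
  model: string;
  priority: number; // lower = higher priority
  capabilities?: LLMCapability[];
}

// Illustrative name; the real algorithm lives in the ILLMRouter implementation.
function resolveEntry(
  entries: TierModelEntry[],
  required: LLMCapability[] = [],
): TierModelEntry | undefined {
  return entries
    // keep only entries that declare every required capability
    .filter((e) => required.every((c) => (e.capabilities ?? []).includes(c)))
    // lowest priority number wins
    .sort((a, b) => a.priority - b.priority)[0];
}
```

With the `medium` tier from the config above, `resolveEntry(medium, ['coding'])` would pick the priority-1 claude-sonnet entry, and an unsatisfiable filter would return `undefined` (which is where the escalation rules below take over).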
See Configuration → kb.config.json for the full schema.
Capabilities
```typescript
type LLMCapability = 'reasoning' | 'coding' | 'vision' | 'fast';
```

- `reasoning` — complex reasoning chains, multi-step thinking.
- `coding` — code generation, understanding, review.
- `vision` — image input support.
- `fast` — low latency, real-time responses.
A plugin declares what the task requires:
```typescript
const llm = useLLM({
  tier: 'medium',
  capabilities: ['coding'],
});
```

The router filters the tier's entries by the required capabilities (all must be present) and picks the highest-priority match. If nothing in the tier matches, it escalates (see below).
Capabilities are advisory — users can mis-tag models in config, and the router trusts the declared tags. Don't use capabilities as a hard filter for correctness; use them as a hint about intent.
Mismatch behavior
What happens when the user's configured tiers don't match the plugin's request?
Escalation: plugin asks for small, only medium is configured
Silent upgrade. The router picks medium and uses it, no warning. Rationale: a plugin asking for small is asking for the cheapest thing that works — anything better also works, just more expensively.
Degradation: plugin asks for large, only medium is configured
Picks medium, emits a warning through the platform logger. Plugin still gets a working LLM, but the log line lets operators know the configured setup can't fully satisfy the request.
Capability filter empty
Plugin asks for { tier: 'medium', capabilities: ['vision'] }, nothing in medium has vision. Router escalates to large and retries; if nothing in large has vision either, the request falls back to whichever entry has the highest priority overall, with a warning.
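The three mismatch rules can be sketched together. This is a minimal illustrative sketch under the tier/entry shapes described above — the authoritative algorithm is the `ILLMRouter` implementation, and `resolveTier` is not an SDK name:

```typescript
type LLMTier = 'small' | 'medium' | 'large';
type LLMCapability = 'reasoning' | 'coding' | 'vision' | 'fast';

interface TierModelEntry {
  model: string;
  priority: number;
  capabilities?: LLMCapability[];
}

type TierMapping = Partial<Record<LLMTier, TierModelEntry[]>>;

const ORDER: LLMTier[] = ['small', 'medium', 'large'];

// Illustrative sketch of the mismatch rules; not the SDK's actual router.
function resolveTier(
  mapping: TierMapping,
  requested: LLMTier,
  required: LLMCapability[] = [],
  warn: (msg: string) => void = () => {},
): TierModelEntry | undefined {
  const pick = (tier: LLMTier) =>
    (mapping[tier] ?? [])
      .filter((e) => required.every((c) => (e.capabilities ?? []).includes(c)))
      .sort((a, b) => a.priority - b.priority)[0];

  // 1. Try the requested tier, then escalate upward (silent upgrade).
  const start = ORDER.indexOf(requested);
  for (let i = start; i < ORDER.length; i++) {
    const hit = pick(ORDER[i]);
    if (hit) return hit;
  }

  // 2. Degrade downward, warning the operator.
  for (let i = start - 1; i >= 0; i--) {
    const hit = pick(ORDER[i]);
    if (hit) {
      warn(`tier '${requested}' degraded to '${ORDER[i]}'`);
      return hit;
    }
  }

  // 3. Nothing anywhere satisfies the capability filter: fall back to the
  //    highest-priority entry overall, with a warning.
  const all = ORDER.flatMap((t) => mapping[t] ?? []);
  const fallback = [...all].sort((a, b) => a.priority - b.priority)[0];
  if (fallback) warn(`no entry satisfies [${required.join(', ')}]; falling back`);
  return fallback;
}
```

The ordering encodes the page's rationale: anything better than the request also works, so upgrades are silent, while downgrades and global fallbacks are worth a log line.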
The full resolution algorithm lives in the ILLMRouter implementation in @kb-labs/core-platform — see llm-types.ts for the contract types.
getLLMTier()
Returns the tier that the configured LLM would resolve to by default:
```typescript
import { getLLMTier } from '@kb-labs/sdk';

const defaultTier = getLLMTier(); // 'small' | 'medium' | 'large' | undefined
```

It returns `undefined` when no LLM is configured. Use it for diagnostic output or for deciding branch paths based on what the user has set up — e.g., "if they have large as default, they probably want long-form output; if small, short snippets".
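That branching idea can be sketched as a small helper that takes the result of `getLLMTier()`. The helper name and style values are hypothetical, not SDK API:

```typescript
type LLMTier = 'small' | 'medium' | 'large';

// Hypothetical helper: pass it the result of getLLMTier().
// Style names are illustrative.
function outputStyle(tier: LLMTier | undefined): 'none' | 'snippet' | 'long-form' {
  if (tier === undefined) return 'none'; // no LLM configured at all
  return tier === 'large' ? 'long-form' : 'snippet';
}
```

In a handler this would be called as `outputStyle(getLLMTier())` and used to decide how verbose the generated output should be.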
LazyBoundLLM and the immutability guarantee
This is an advanced topic. Most plugin code doesn't need to think about it.
useLLM({ tier: 'large' }) doesn't hand you the underlying adapter directly. It hands you a LazyBoundLLM — a lazy wrapper that resolves the tier on first use and caches the binding. Each useLLM() call returns a new, independent LazyBoundLLM instance.
Why this matters: in concurrent code, if two handlers call useLLM({ tier: 'small' }) and useLLM({ tier: 'large' }) simultaneously, both should get the tier they asked for. If the underlying router held mutable state (one global "currently selected tier"), one call would clobber the other. The lazy binding ensures each call is independent.
In practice:
```typescript
// In handler A:
const llmA = useLLM({ tier: 'small' }); // lazy binding for small
await llmA.complete('quick task');      // resolves to small here

// Concurrent in handler B:
const llmB = useLLM({ tier: 'large' }); // independent lazy binding for large
await llmB.complete('hard task');       // resolves to large here
```

Neither call affects the other. The router resolves each `LazyBoundLLM` the first time its `complete` / `stream` / `chatWithTools` method is called, then caches the result for subsequent calls on the same instance.
This was a real issue before the lazy binding was introduced — it's called out in the source as the reason LazyBoundLLM exists. Don't try to "optimize" by caching useLLM(...) results at module level — let each handler call useLLM(...) fresh, and the lazy binding does the right thing.
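The resolve-once-then-cache behavior can be pictured with a minimal sketch, in which a resolver callback stands in for the router; the class name echoes the text, but the real implementation lives in the SDK:

```typescript
interface ILLM {
  complete(prompt: string): Promise<string>;
}

// Minimal sketch of lazy tier binding: the resolver runs once, on first
// use, and the resulting binding is cached per instance.
class LazyBoundLLMSketch implements ILLM {
  private bound?: ILLM;

  constructor(private resolve: () => ILLM) {}

  private binding(): ILLM {
    if (!this.bound) this.bound = this.resolve(); // first use: resolve and cache
    return this.bound;
  }

  complete(prompt: string): Promise<string> {
    return this.binding().complete(prompt);
  }
}
```

Because each `useLLM(...)` call would construct a fresh instance around its own tier request, concurrent handlers never share mutable resolution state.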
Picking a tier for a task
Rules of thumb:
- Extract a field from user input? → `small`.
- Classify into a known set? → `small`.
- Summarize a few paragraphs? → `small` or `medium`.
- Write a commit message for a diff? → `medium`.
- Review a PR for bugs? → `medium` or `large`.
- Plan a multi-step refactor? → `large`.
- Debug a hard production issue? → `large`.
- Anything involving an agent loop? → `medium` or `large`, depending on how many turns.
Start with `small`, measure quality, escalate if needed. `large` is expensive; don't reach for it out of habit.
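The rules of thumb above can be encoded as a simple lookup. The `TaskKind` names are hypothetical, purely to illustrate mapping task categories to tiers in plugin code:

```typescript
type LLMTier = 'small' | 'medium' | 'large';

// Hypothetical task categories mirroring the rules of thumb above.
type TaskKind =
  | 'extract'
  | 'classify'
  | 'summarize'
  | 'commit-message'
  | 'pr-review'
  | 'refactor-plan'
  | 'debug';

function tierForTask(kind: TaskKind): LLMTier {
  switch (kind) {
    case 'extract':
    case 'classify':
    case 'summarize':
      return 'small'; // start cheap, escalate if measured quality suffers
    case 'commit-message':
    case 'pr-review':
      return 'medium'; // 'large' may be warranted for subtle bug hunts
    case 'refactor-plan':
    case 'debug':
      return 'large';
  }
}
```

Treat this as a starting point per deployment, not a fixed policy — the whole point of tiers is that quality/cost trade-offs are the user's to tune.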
Testing with tiers
The mockLLM() builder from @kb-labs/sdk/testing ignores tiers — it returns whatever you scripted for the method. For testing tier-specific logic, write a custom fake:
```typescript
import { mockLLM } from '@kb-labs/sdk/testing';

const smallLLM = mockLLM().onAnyComplete().respondWith('short');
const largeLLM = mockLLM().onAnyComplete().respondWith('detailed');

// In test setup, override useLLM to return different mocks by tier:
vi.mock('@kb-labs/sdk', async () => {
  const actual = await vi.importActual('@kb-labs/sdk');
  return {
    ...actual,
    useLLM: (opts?: { tier?: string }) =>
      opts?.tier === 'large' ? largeLLM : smallLLM,
  };
});
```

For most tests, a single mock is enough — tier behavior is the router's responsibility and doesn't need to be re-tested inside every plugin's tests.
What to read next
- SDK → Hooks → useLLM — the full `useLLM` API, including options and return types.
- Configuration → kb.config.json → LLM options — the full `tierMapping` schema.
- Adapters → LLM — the `ILLM` adapter interface and its capabilities.
- SDK → Testing — `mockLLM()` for test scenarios.