KB Labs Docs

Testing

Last updated April 7, 2026


Unit-testing plugin handlers with createTestContext and the mock builders.

@kb-labs/sdk/testing is a separate subpath that exports a full set of test utilities: a test-context factory, per-service mock builders (LLM, cache, storage, logger, tool), and a command test runner. It's designed for writing fast, deterministic unit tests for plugin handlers without booting the full platform.

Source: platform/kb-labs-sdk/packages/sdk/src/testing/index.ts. The underlying utilities come from @kb-labs/shared-testing and @kb-labs/shared-tool-kit/testing — both re-exported through the SDK so plugin tests only ever import from @kb-labs/sdk/testing.

The import

TypeScript
import {
  createTestContext,
  mockLLM,
  mockCache,
  mockStorage,
  mockLogger,
  testCommand,
  setupTestPlatform,
} from '@kb-labs/sdk/testing';

All test utilities come from this one subpath. Don't reach into @kb-labs/shared-testing directly — the SDK is the stable surface.

createTestContext

Builds a PluginContextV3 suitable for passing as the first argument to your handler's execute:

TypeScript
const { ctx, cleanup } = createTestContext({
  host: 'cli',                          // default
  platform: {
    llm: mockLLM(),
    cache: mockCache(),
  },
});
 
await handler.execute(ctx, { flags: { ... }, argv: [] });
 
cleanup();                              // reset the platform singleton

Options

TypeScript
interface CreateTestContextOptions {
  host?: HostType;                      // 'cli' | 'rest' | 'workflow' | 'webhook' | 'ws'
  platform?: Partial<PlatformServices>; // override individual adapters
  runtime?: Partial<RuntimeAPI>;        // override the sandboxed runtime
  api?: Partial<PluginAPI>;             // override plugin APIs
  ui?: Partial<UIFacade>;               // override the UI facade
  cwd?: string;
  outdir?: string;
  tenantId?: string;
  config?: unknown;                     // set ctx.config
  hostContext?: HostContext;            // host-specific context
  signal?: AbortSignal;
}

Everything is optional. Defaults are sane for most unit tests — host: 'cli', sensible tenant/cwd, fresh mock instances where adapters are omitted.

Return value

TypeScript
interface TestContextResult {
  ctx: PluginContextV3;
  cleanup: () => void;
  platform: PlatformServices;
  runtime: RuntimeAPI;
  api: PluginAPI;
  ui: UIFacade;
}

cleanup() resets the platform singleton — important because useLLM(), useCache(), etc. read from a module-scoped singleton, and without cleanup, state leaks between tests. Call it in afterEach:

TypeScript
import { describe, it, expect, afterEach } from 'vitest';
import { createTestContext, mockLLM } from '@kb-labs/sdk/testing';
 
describe('my handler', () => {
  let cleanup: () => void;
 
  afterEach(() => cleanup?.());
 
  it('uses the LLM', async () => {
    const llm = mockLLM().onAnyComplete().respondWith('hi');
    const test = createTestContext({ platform: { llm } });
    cleanup = test.cleanup;
 
    await handler.execute(test.ctx, { flags: {}, argv: [] });
 
    expect(llm.complete).toHaveBeenCalled();
  });
});

Both ctx.platform.llm and useLLM() return the same mock after createTestContext — the hooks read from the platform singleton, which createTestContext swaps out. This is the key insight that makes hook-based handlers testable.
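The mechanism is easy to picture with a standalone sketch (illustrative names and internals, not the real SDK code): hooks read from a module-scoped slot, and the test factory swaps that slot for the mocks and hands back a restore function.

TypeScript
```typescript
// Illustrative singleton-swap sketch. `swapPlatform`, `platformSingleton`,
// and this `LLM` shape are hypothetical stand-ins for the SDK internals.
type LLM = { complete(prompt: string): Promise<string> };

let platformSingleton: { llm: LLM } | undefined;

// A hook reads from the module-scoped slot, just like useLLM() does.
function useLLM(): LLM {
  if (!platformSingleton) throw new Error('platform not initialized');
  return platformSingleton.llm;
}

// The test factory replaces the slot and returns a cleanup function
// that restores whatever was there before.
function swapPlatform(services: { llm: LLM }): () => void {
  const previous = platformSingleton;
  platformSingleton = services;
  return () => { platformSingleton = previous; };
}

// Both access paths resolve to the same object:
const fakeLLM: LLM = { complete: async () => 'hi' };
const cleanup = swapPlatform({ llm: fakeLLM });
console.log(useLLM() === fakeLLM); // true
cleanup();
```

Because the swap happens at module scope, a handler that only ever calls hooks still ends up talking to your mocks — no dependency injection required in the handler itself.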

mockLLM

Builder for an ILLM mock with a fluent API for scripting responses:

TypeScript
const llm = mockLLM()
  .onAnyComplete()
  .respondWith('generic answer');
 
const llmWithSpecific = mockLLM()
  .onComplete('What is 2+2?').respondWith('4')
  .onComplete('What is 3+3?').respondWith('6')
  .onAnyComplete().respondWith('Unknown');
 
const llmWithToolCalls = mockLLM()
  .onAnyChatWithTools()
  .respondWith({
    content: 'Calling search',
    toolCalls: [
      { id: '1', name: 'search', input: { query: 'hello' } },
    ],
  });

The recorded surface

Every call is recorded on llm.calls:

TypeScript
interface LLMCall {
  method: 'complete' | 'stream' | 'chatWithTools';
  prompt?: string;
  messages?: LLMMessage[];
  options?: LLMOptions;
  response: LLMResponse | LLMToolCallResponse;
  timestamp: number;
}
 
// Inspect in assertions:
expect(llm.calls).toHaveLength(1);
expect(llm.calls[0].method).toBe('complete');
expect(llm.calls[0].prompt).toContain('commit');

Mock instance methods

Every ICache/ILLM/IStorage method is wrapped in a Vitest-style spy. You can assert on calls, reset the mock, and use it like a real adapter from handler code:

TypeScript
interface MockLLMInstance {
  complete: Mock;
  stream: Mock;
  chatWithTools: Mock;
 
  calls: LLMCall[];
 
  onComplete(promptMatch: string | RegExp): ResponseBuilder;
  onAnyComplete(): ResponseBuilder;
  onStream(promptMatch: string | RegExp): ResponseBuilder;
  onAnyStream(): ResponseBuilder;
  onChatWithTools(matcher: (messages: LLMMessage[]) => boolean): ResponseBuilder;
  onAnyChatWithTools(): ResponseBuilder;
 
  reset(): void;
}

The respondWith builder takes either a literal response (string for complete, LLMResponse for full control) or a function (prompt, options) => LLMResponse for dynamic responses.
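As a self-contained illustration of how such a first-match rule list could behave (hypothetical internals, not the SDK's actual code), the sketch below shows the two respondWith shapes — a literal string and a function derived from the prompt:

TypeScript
```typescript
// Hypothetical sketch of a fluent matcher list: rules are checked in
// registration order, so specific onComplete rules should be registered
// before the onAnyComplete fallback.
type Responder = string | ((prompt: string) => string);

interface MockLLMSketch {
  onComplete(m: string | RegExp): { respondWith(r: Responder): MockLLMSketch };
  onAnyComplete(): { respondWith(r: Responder): MockLLMSketch };
  complete(prompt: string): Promise<string>;
}

function makeMockLLM(): MockLLMSketch {
  type Rule = { match: (p: string) => boolean; respond: Responder };
  const rules: Rule[] = [];
  const add = (match: Rule['match']) => ({
    respondWith(r: Responder) { rules.push({ match, respond: r }); return api; },
  });
  const api: MockLLMSketch = {
    onComplete: (m) =>
      add((p) => (typeof m === 'string' ? p.includes(m) : m.test(p))),
    onAnyComplete: () => add(() => true),
    async complete(prompt) {
      const rule = rules.find((r) => r.match(prompt));
      if (!rule) throw new Error(`no mock response for: ${prompt}`);
      return typeof rule.respond === 'function' ? rule.respond(prompt) : rule.respond;
    },
  };
  return api;
}

// Literal for the known prompt, dynamic function as the fallback:
const llm = makeMockLLM()
  .onComplete('2+2').respondWith('4')
  .onAnyComplete().respondWith((p) => `echo: ${p}`);
```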

mockCache, mockStorage, mockLogger

Same shape as mockLLM — fluent builders plus recorded calls:

TypeScript
const cache = mockCache();
// Every ICache method is spied; state is stored in memory
 
const storage = mockStorage();
// IStorage methods; internally uses a Map for file state
 
const logger = mockLogger();
// ILogger methods; log entries accumulate on logger.entries

Cache

TypeScript
const cache = mockCache();
const test = createTestContext({ platform: { cache } });
 
await handler.execute(test.ctx, input);
 
expect(cache.set).toHaveBeenCalledWith('key', expect.any(Object), 60_000);
expect(await cache.get('key')).toBeDefined();

The mock cache is a real in-memory KV store — get returns what you set, TTLs are stored, sorted sets and atomic operations work.
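For illustration, a TTL-aware in-memory store behaves roughly like the sketch below (hypothetical internals, not the real mock — only get/set are shown):

TypeScript
```typescript
// Illustrative sketch: get() returns what set() stored, and an entry
// whose TTL has elapsed reads back as undefined.
type Entry = { value: unknown; expiresAt?: number };

function makeMockCache() {
  const store = new Map<string, Entry>();
  return {
    async set(key: string, value: unknown, ttlMs?: number): Promise<void> {
      store.set(key, {
        value,
        expiresAt: ttlMs !== undefined ? Date.now() + ttlMs : undefined,
      });
    },
    async get(key: string): Promise<unknown> {
      const entry = store.get(key);
      if (!entry) return undefined;
      if (entry.expiresAt !== undefined && Date.now() > entry.expiresAt) {
        store.delete(key); // lazily evict expired entries on read
        return undefined;
      }
      return entry.value;
    },
  };
}
```

This is why assertions like `expect(await cache.get('key')).toBeDefined()` work after the handler runs — the mock holds real state, not just spy records.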

Storage

TypeScript
const storage = mockStorage();
const test = createTestContext({ platform: { storage } });
 
await handler.execute(test.ctx, input);
 
expect(await storage.read('reports/summary.md')).not.toBeNull();
expect(storage.write).toHaveBeenCalled();

Logger

TypeScript
const logger = mockLogger();
const test = createTestContext({ platform: { logger } });
 
await handler.execute(test.ctx, input);
 
expect(logger.entries).toContainEqual(
  expect.objectContaining({ level: 'info', message: 'task started' }),
);

logger.entries is an array of LogEntry objects — level, message, and metadata fields for every log call.

mockTool

From @kb-labs/shared-tool-kit/testing, re-exported through the SDK. Builder for mocking LLM tool-call responses when testing tool-use flows:

TypeScript
import { mockTool } from '@kb-labs/sdk/testing';
 
const searchTool = mockTool('search')
  .withInput({ query: 'hello' })
  .returning({ results: [{ id: 1, title: 'Hello World' }] });

Use it when your handler uses native tool calls and you want to control what the tools return without spinning up real implementations.
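Conceptually, the builder pairs a recorded input with a canned result. A standalone sketch of that idea (the `execute` method and all internals here are hypothetical illustrations, not the real @kb-labs/shared-tool-kit API):

TypeScript
```typescript
// Hypothetical sketch: the named tool returns its canned result only
// for the recorded input, so unexpected tool calls fail loudly in tests.
interface MockToolSketch {
  name: string;
  withInput(input: unknown): MockToolSketch;
  returning(value: unknown): MockToolSketch;
  execute(input: unknown): Promise<unknown>;
}

function makeMockTool(name: string): MockToolSketch {
  let expected: unknown;
  let result: unknown;
  const api: MockToolSketch = {
    name,
    withInput(input) { expected = input; return api; },
    returning(value) { result = value; return api; },
    async execute(input) {
      if (JSON.stringify(input) !== JSON.stringify(expected)) {
        throw new Error(`${name}: unexpected input ${JSON.stringify(input)}`);
      }
      return result;
    },
  };
  return api;
}
```

Failing fast on an unexpected input is the useful property: a drifting prompt that changes the tool-call arguments shows up as a test failure instead of a silently wrong canned result.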

testCommand

A higher-level test runner for command handlers. Wraps createTestContext with a command-specific flow:

TypeScript
import { testCommand } from '@kb-labs/sdk/testing';
import greetHandler from '../src/cli/commands/greet';
 
describe('greet command', () => {
  it('greets with a name', async () => {
    const result = await testCommand(greetHandler, {
      flags: { name: 'Alice' },
    });
 
    expect(result.exitCode).toBe(0);
    expect(result.result).toEqual({
      greeting: 'Hello, Alice!',
      source: 'deterministic',
    });
  });
 
  it('uses the LLM when --ai is passed', async () => {
    const llm = mockLLM().onAnyComplete().respondWith('Hi Alice!');
    const result = await testCommand(greetHandler, {
      flags: { name: 'Alice', ai: true },
      platform: { llm },
    });
 
    expect(result.result?.source).toBe('llm');
    expect(llm.complete).toHaveBeenCalled();
  });
});

Options

TypeScript
interface TestCommandOptions extends CreateTestContextOptions {
  flags?: Record<string, unknown>;
  argv?: string[];
  input?: unknown;                   // override the full input for non-CLI commands
}

Return value

TypeScript
interface TestCommandResult {
  exitCode: number;
  result?: unknown;
  meta?: Record<string, unknown>;
  error?: { code?: string; message: string };
  ctx: PluginContextV3;             // the context used for the call
  cleanup: () => void;
}

testCommand runs the handler, collects the result, and returns it alongside the context for further assertions. The context is still alive — call cleanup() when done (typically in afterEach).

setupTestPlatform

Lower-level helper that sets up the platform singleton without building a full context. Use it when you want to test code that calls hooks directly but isn't a handler:

TypeScript
import { setupTestPlatform, mockLLM } from '@kb-labs/sdk/testing';
import { useLLM } from '@kb-labs/sdk';
 
describe('pure function using useLLM', () => {
  let cleanup: () => void;
  afterEach(() => cleanup?.());
 
  it('calls useLLM correctly', async () => {
    const llm = mockLLM().onAnyComplete().respondWith('result');
    const setup = setupTestPlatform({ llm });
    cleanup = setup.cleanup;
 
    const result = await someUtilityThatCallsUseLLM();
 
    expect(llm.complete).toHaveBeenCalled();
    expect(result).toBe('result');
  });
});

It's the same mechanism createTestContext uses under the hood — swap out the platform singleton, return a cleanup function. Prefer createTestContext when you're testing handlers; use setupTestPlatform for helper functions that don't take a context.

Testing patterns

Unit-testing a handler with flag inputs

TypeScript
import { describe, it, expect, afterEach } from 'vitest';
import { testCommand, mockLLM } from '@kb-labs/sdk/testing';
import commitHandler from '../src/cli/commands/commit';
 
describe('commit command', () => {
  let cleanup: () => void;
  afterEach(() => cleanup?.());
 
  it('dry-runs without applying', async () => {
    const result = await testCommand(commitHandler, {
      flags: { 'dry-run': true },
      platform: {
        llm: mockLLM().onAnyComplete().respondWith('feat: add feature'),
      },
    });
    cleanup = result.cleanup;
 
    expect(result.exitCode).toBe(0);
    expect(result.meta?.dryRun).toBe(true);
  });
});

Asserting on platform calls

TypeScript
it('caches results', async () => {
  const cache = mockCache();
  const result = await testCommand(handler, {
    flags: { query: 'test' },
    platform: { cache },
  });
  cleanup = result.cleanup;
 
  // First call should miss and set
  expect(cache.get).toHaveBeenCalledWith('query:test');
  expect(cache.set).toHaveBeenCalledWith(
    'query:test',
    expect.any(Object),
    60_000,
  );
});

Testing cancellation

TypeScript
it('handles abort signal', async () => {
  const controller = new AbortController();
  const testPromise = testCommand(longRunningHandler, {
    flags: {},
    signal: controller.signal,
  });
 
  setTimeout(() => controller.abort(), 10);
  const result = await testPromise;
  cleanup = result.cleanup;
 
  expect(result.exitCode).toBe(1);
  expect(result.error?.code).toBe('CANCELLED');
});

Testing REST handlers

TypeScript
import { createTestContext } from '@kb-labs/sdk/testing';
import generateHandler from '../src/rest/handlers/generate';
 
it('returns NO_CHANGES for clean working tree', async () => {
  const test = createTestContext({ host: 'rest' });
  const result = await generateHandler.execute(test.ctx, {
    scope: undefined,
    dryRun: false,
  });
 
  expect(result?.exitCode).toBe(1);
  expect(result?.error?.code).toBe('NO_CHANGES');
  test.cleanup();
});

REST handlers are tested the same way as CLI handlers — different host value, different input shape, same context factory.

Things that are NOT in testing

  • Integration tests against a real platform. For end-to-end tests, spin up a test instance of the REST API or workflow daemon with an in-memory adapter stack. The mocks here are for unit tests.
  • LLM response snapshotting. If you want to test against real LLM output and snapshot it, use Vitest's snapshot support directly, not the SDK's mocks.
  • Permission enforcement. The mocked runtime bypasses permission checks. For permission testing, write integration tests that go through the real sandbox.