KB Labs Docs

Testing Plugins

Last updated April 7, 2026


Unit tests, integration tests, and mocking platform services in plugin code.

Plugins are regular TypeScript packages — they're tested with whatever test runner you already use (Vitest is the convention in the KB Labs monorepo). The SDK ships @kb-labs/sdk/testing with helpers for constructing test contexts and mocking platform services, so handler code is testable without booting the full platform.

This guide covers three layers: unit tests for pure logic, unit tests for handler logic with mocked platform services, and integration tests against a real platform instance.

For the full reference of test utilities, see SDK → Testing. This page is the practical workflow.

The test layers

┌─────────────────────────────────────────┐
│  Integration tests                      │
│  (real platform, real adapters, slow)   │
├─────────────────────────────────────────┤
│  Handler unit tests                     │
│  (createTestContext + mocks, fast)      │
├─────────────────────────────────────────┤
│  Pure logic unit tests                  │
│  (no platform at all, instant)          │
└─────────────────────────────────────────┘

Most of your tests should be at the bottom two layers. Integration tests are slow and brittle — save them for the critical paths.

Setup

Add Vitest to your plugin package:

JSON
{
  "devDependencies": {
    "vitest": "^3.0.0",
    "@vitest/ui": "^3.0.0"
  },
  "scripts": {
    "test": "vitest run",
    "test:watch": "vitest",
    "test:ui": "vitest --ui"
  }
}

vitest.config.ts:

TypeScript
import { defineConfig } from 'vitest/config';
 
export default defineConfig({
  test: {
    environment: 'node',
    globals: false,
    include: ['src/**/*.test.ts'],
  },
});

Keep tests next to the code they test: src/cli/commands/hello.ts → src/cli/commands/hello.test.ts.

Layer 1 — Pure logic tests

If your handler can be decomposed into pure functions, test those directly. No platform, no context, no mocks.

TypeScript
// src/lib/parse-scope.ts
export function parseScope(raw: string): string[] {
  return raw.split(',').map(s => s.trim()).filter(Boolean);
}
 
// src/lib/parse-scope.test.ts
import { describe, it, expect } from 'vitest';
import { parseScope } from './parse-scope';
 
describe('parseScope', () => {
  it('splits comma-separated values', () => {
    expect(parseScope('a,b,c')).toEqual(['a', 'b', 'c']);
  });
 
  it('trims whitespace', () => {
    expect(parseScope('a , b , c')).toEqual(['a', 'b', 'c']);
  });
 
  it('drops empty entries', () => {
    expect(parseScope('a,,b, ,c')).toEqual(['a', 'b', 'c']);
  });
});

Pure tests run in milliseconds and have no side effects. If a function can be tested this way, it should be.

Layer 2 — Handler unit tests

Handlers call platform services through hooks such as useLLM() and useCache(). You can't test them as pure functions because they need a context and a platform singleton. The SDK's testing helpers (testCommand, createTestContext, and mock builders such as mockLLM) replace both with fakes.

TypeScript
// src/cli/commands/hello.test.ts
import { describe, it, expect, afterEach } from 'vitest';
import { testCommand, mockLLM } from '@kb-labs/sdk/testing';
import helloHandler from './hello';
 
describe('hello:greet', () => {
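  let cleanup: (() => void) | undefined;
 
  // Reset the platform after every test, and clear the reference so a test
  // that fails before assigning it doesn't re-run the previous cleanup
  afterEach(() => {
    cleanup?.();
    cleanup = undefined;
  });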
 
  it('returns a canned greeting without --ai', async () => {
    const result = await testCommand(helloHandler, {
      flags: { name: 'Alice' },
    });
    cleanup = result.cleanup;
 
    expect(result.exitCode).toBe(0);
    expect(result.result).toEqual({
      greeting: 'Hello, Alice!',
      source: 'deterministic',
    });
  });
 
  it('uses LLM when --ai is passed', async () => {
    const llm = mockLLM()
      .onAnyComplete()
      .respondWith('Hi Alice, great to see you!');
 
    const result = await testCommand(helloHandler, {
      flags: { name: 'Alice', ai: true },
      platform: { llm },
    });
    cleanup = result.cleanup;
 
    expect(result.exitCode).toBe(0);
    expect(result.result?.source).toBe('llm');
    expect(result.result?.greeting).toContain('Alice');
    expect(llm.complete).toHaveBeenCalled();
  });
 
  it('falls back to canned greeting when LLM is unavailable and --ai is passed', async () => {
    // No llm in the platform override — useLLM() returns undefined
    const result = await testCommand(helloHandler, {
      flags: { name: 'Alice', ai: true },
    });
    cleanup = result.cleanup;
 
    expect(result.exitCode).toBe(0);
    expect(result.result?.source).toBe('deterministic');
  });
});
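The three tests above each cover one branch of the handler. You can also pull that branching out into pure functions and test it at Layer 1. The sketch below shows one way to do that. cannedGreeting and pickGreetingSource are hypothetical helpers, not SDK APIs. The result shape mirrors what the tests assert on.

```typescript
// Hypothetical pure core of the hello:greet handler. The handler itself would
// call useLLM() and pass whether an LLM is available into pickGreetingSource.
export type GreetingSource = 'llm' | 'deterministic';

export interface GreetingResult {
  greeting: string;
  source: GreetingSource;
}

// Deterministic greeting, used without --ai and as the fallback.
export function cannedGreeting(name: string): GreetingResult {
  return { greeting: `Hello, ${name}!`, source: 'deterministic' };
}

// Use the LLM only when --ai was passed AND an LLM adapter is available.
export function pickGreetingSource(ai: boolean, llmAvailable: boolean): GreetingSource {
  return ai && llmAvailable ? 'llm' : 'deterministic';
}
```

With this split, the handler test only has to check the wiring. The decision table gets covered by instant pure tests.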

Three things to internalize:

  1. testCommand wraps your handler in a test context and runs it.
  2. platform.llm in options becomes what useLLM() returns. You control the mock; the handler sees it as the real thing.
  3. cleanup() in afterEach resets the platform singleton between tests. Without it, mocks installed by one test leak into the next.
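Points 2 and 3 describe a simple mechanism. The sketch below is a toy model of it, not the SDK's actual implementation. It has three parts:

  • a module-level platform object that hooks read from;
  • an override step standing in for what testCommand does with its platform option;
  • a cleanup function that restores the previous state.

setPlatformForTest, this useLLM, and the LLM interface shape are all hypothetical.

```typescript
// Toy model only: the real SDK internals may differ.
interface LLM {
  complete(prompt: string): Promise<string>;
}

interface Platform {
  llm?: LLM;
}

// Module-level singleton that hooks read from.
let current: Platform = {};

// Hypothetical hook: returns whatever LLM is installed, or undefined.
function useLLM(): LLM | undefined {
  return current.llm;
}

// Stands in for testCommand's platform option: install the overrides and
// return a cleanup function that restores the previous singleton.
function setPlatformForTest(overrides: Platform): () => void {
  const previous = current;
  current = { ...previous, ...overrides };
  return () => {
    current = previous;
  };
}
```

If you skip the cleanup call, the fake LLM stays installed for whatever runs next. That is exactly the leak described in point 3.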

See SDK → Testing for the full API of testCommand, createTestContext, and every mock builder.

Mocking the cache

TypeScript
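import { it, expect } from 'vitest';
import { testCommand, mockCache } from '@kb-labs/sdk/testing';
 
// Assumes the same describe block as above, so cleanup and afterEach are in scope;
// handler is the command handler under test.
it('caches results', async () => {
  const cache = mockCache();
 
  const result = await testCommand(handler, {
    flags: { query: 'test' },
    platform: { cache },
  });
  cleanup = result.cleanup;
 
  // First call should miss, then set with the handler's 60s TTL
  expect(cache.get).toHaveBeenCalledWith('query:test');
  expect(cache.set).toHaveBeenCalledWith(
    'query:test',
    expect.any(Object),
    60_000,
  );
 
  // The value persists in the mock's in-memory store
  // (toBeTruthy rather than toBeDefined, which would also pass for null)
  expect(await cache.get('query:test')).toBeTruthy();
});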

mockCache() is a real in-memory ICache implementation. get/set work as expected; the spies let you assert on call counts and arguments.
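The sketch below shows what "a real implementation plus spies" means in practice: a working in-memory store that also records every call. It is written without Vitest spies. The SimpleCache interface (get/set with an optional TTL in milliseconds) is an assumption for illustration, not the SDK's ICache definition.

```typescript
// Assumed interface for illustration; the SDK's ICache may differ.
interface SimpleCache {
  get<T>(key: string): Promise<T | null>;
  set<T>(key: string, value: T, ttlMs?: number): Promise<void>;
}

interface RecordedCall {
  method: 'get' | 'set';
  args: unknown[];
}

// A working in-memory cache that also records every call, the way a spy would.
class RecordingCache implements SimpleCache {
  readonly calls: RecordedCall[] = [];
  private readonly store = new Map<string, { value: unknown; expiresAt: number }>();
  private readonly now: () => number;

  // The clock is injectable so TTL expiry can be tested without waiting.
  constructor(now: () => number = Date.now) {
    this.now = now;
  }

  async get<T>(key: string): Promise<T | null> {
    this.calls.push({ method: 'get', args: [key] });
    const entry = this.store.get(key);
    if (!entry || entry.expiresAt <= this.now()) {
      this.store.delete(key); // absent or expired: treat as a miss
      return null;
    }
    return entry.value as T;
  }

  async set<T>(key: string, value: T, ttlMs?: number): Promise<void> {
    this.calls.push({ method: 'set', args: [key, value, ttlMs] });
    const expiresAt = ttlMs === undefined ? Infinity : this.now() + ttlMs;
    this.store.set(key, { value, expiresAt });
  }
}
```

Keeping real behavior behind the recorded calls means a test can assert on both questions. "Was set called with a 60s TTL?" and "does the value come back?" are answered by the same fake.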

Mocking storage

TypeScript
import { testCommand, mockStorage } from '@kb-labs/sdk/testing';
 
it('writes a report', async () => {
  const storage = mockStorage();
  const result = await testCommand(handler, {
    flags: { output: 'reports/summary.md' },
    platform: { storage },
  });
  cleanup = result.cleanup;
 
  expect(storage.write).toHaveBeenCalled();
  expect(await storage.read('reports/summary.md')).not.toBeNull();
});

Mocking tool calls

For handlers that use LLM tool-calling:

TypeScript
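import { testCommand, mockLLM, mockTool } from '@kb-labs/sdk/testing';
 
it('handles tool calls', async () => {
  // Canned result for the 'search' tool. This snippet doesn't wire it into
  // the handler; see SDK → Testing for how mockTool is used.
  const searchTool = mockTool('search')
    .withInput({ query: 'hello' })
    .returning({ results: [{ id: 1 }] });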
 
  const llm = mockLLM()
    .onAnyChatWithTools()
    .respondWith({
      content: 'Calling search',
      toolCalls: [{ id: '1', name: 'search', input: { query: 'hello' } }],
    });
 
  const result = await testCommand(handler, {
    flags: { query: 'hello' },
    platform: { llm },
  });
  cleanup = result.cleanup;
 
  expect(llm.chatWithTools).toHaveBeenCalled();
});
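For orientation, this is roughly the loop such a test exercises. The model's response carries toolCalls, and the handler looks each one up by name and runs it with its input. The sketch below is minimal and uses synchronous tools for brevity. The ToolCall shape mirrors the mock response above. ToolImpl and dispatchToolCalls are hypothetical names, not SDK APIs.

```typescript
// Shape mirrors the toolCalls in the mockLLM response above.
interface ToolCall {
  id: string;
  name: string;
  input: Record<string, unknown>;
}

// Hypothetical tool implementation signature (real tools are likely async).
type ToolImpl = (input: Record<string, unknown>) => unknown;

interface ToolResult {
  id: string;
  output?: unknown;
  error?: string;
}

// Run each requested tool by name. Unknown tools become error results instead
// of exceptions, so the handler can report them back to the model.
function dispatchToolCalls(calls: ToolCall[], tools: Map<string, ToolImpl>): ToolResult[] {
  return calls.map(call => {
    const impl = tools.get(call.name);
    if (!impl) return { id: call.id, error: `unknown tool: ${call.name}` };
    return { id: call.id, output: impl(call.input) };
  });
}
```

Factored out like this, the dispatch logic needs no LLM mock at all. More of the tool-calling behavior can then be tested at Layer 1.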

Layer 3 — Integration tests

When you need to test real adapter behavior — actually talking to Qdrant, actually calling OpenAI, actually writing to SQLite — spin up a real platform instance.

Option A — Test workspace with kb.config.json

Create a test-specific workspace:

tests/integration/fixtures/workspace/
├── .kb/
│   └── kb.config.json       # test config: NoOp and filesystem adapters
└── packages/
    └── your-plugin/         # linked via marketplace

Test config:

JSON
{
  "platform": {
    "adapters": {
      "llm": null,
      "cache": null,
      "storage": "@kb-labs/adapters-fs"
    },
    "adapterOptions": {
      "storage": { "basePath": ".kb/test-storage" }
    },
    "execution": { "mode": "in-process" }
  }
}

null explicitly installs the NoOp adapter for tokens you don't want to configure. in-process keeps everything in the test process so you can assert on internal state without IPC.
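A NoOp adapter satisfies the adapter interface while doing nothing, so code paths that touch the service still run. The sketch below illustrates the idea for a cache. The SimpleCache interface and the NoOpCache class are assumptions for illustration, not the platform's actual NoOp implementation.

```typescript
// Assumed cache interface for illustration.
interface SimpleCache {
  get<T>(key: string): Promise<T | null>;
  set<T>(key: string, value: T, ttlMs?: number): Promise<void>;
}

// Every read is a miss and every write is discarded, so callers fall through
// to their uncached path without any cache-specific branching.
class NoOpCache implements SimpleCache {
  async get<T>(_key: string): Promise<T | null> {
    return null;
  }

  async set<T>(_key: string, _value: T, _ttlMs?: number): Promise<void> {
    // Intentionally empty: nothing is stored.
  }
}
```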

Boot the platform against this workspace from your test file:

TypeScript
import { beforeAll } from 'vitest';
import { createServiceBootstrap } from '@kb-labs/core-runtime';
import { join } from 'node:path';
 
beforeAll(async () => {
  // Boot once per test file against the fixture's .kb/kb.config.json
  await createServiceBootstrap({
    appId: 'test',
    repoRoot: join(__dirname, 'fixtures/workspace'),
  });
});

Option B — Live services + HTTP assertions

For higher-fidelity integration tests, run the REST API as a subprocess and hit it over HTTP:

TypeScript
import { spawn, type ChildProcess } from 'node:child_process';
import { afterAll, beforeAll, expect, it } from 'vitest';
 
const BASE_URL = 'http://localhost:15050';
let proc: ChildProcess | undefined;
 
beforeAll(async () => {
  proc = spawn('node', ['path/to/rest-api/dist/index.js'], {
    env: { ...process.env, PORT: '15050' },
  });
 
  // Poll /health until it answers with a 2xx status (about 6s of retries)
  for (let i = 0; i < 30; i++) {
    try {
      const res = await fetch(`${BASE_URL}/api/v1/health`);
      if (res.ok) return;
    } catch {
      // Not listening yet
    }
    await new Promise(r => setTimeout(r, 200));
  }
  throw new Error('REST API did not start');
});
 
afterAll(() => {
  // Stop the server so it doesn't outlive the test run
  proc?.kill();
});
 
it('calls the plugin over HTTP', async () => {
  const response = await fetch(`${BASE_URL}/api/v1/plugins/hello/greet`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', 'Authorization': 'Bearer dev-token' },
    body: JSON.stringify({ name: 'Alice' }),
  });
 
  expect(response.status).toBe(200);
  const data = await response.json();
  expect(data.greeting).toBeDefined();
});

Integration tests are slow (service startup takes seconds) but give you confidence that the full plugin → REST → platform → adapter chain works end-to-end.

What NOT to test

  • Don't test that the SDK works. defineCommand has its own tests; you don't need to verify that it wraps your handler correctly.
  • Don't test the hooks. useLLM / useCache / useStorage have their own tests. Mock them and assert on your handler's behavior with the mock's return value.
  • Don't test Ant Design components. If you're writing Studio page tests, mock the hooks and assert on the data flow, not on the rendered HTML.
  • Don't test the host guard. defineCommand throws if called from the wrong host — that's tested in the SDK. You don't need to re-verify it.

Running tests

Bash
pnpm test            # one-shot
pnpm test:watch      # watch mode
pnpm test:ui         # Vitest UI

In CI:

YAML
- run: pnpm install --frozen-lockfile
- run: pnpm build
- run: pnpm test
- run: pnpm type-check
- run: pnpm lint

Test organization

Structure I recommend for plugin repos:

src/
├── cli/
│   └── commands/
│       ├── hello.ts
│       └── hello.test.ts             ← handler unit tests
├── lib/
│   ├── parse-scope.ts
│   └── parse-scope.test.ts           ← pure logic unit tests
└── rest/
    └── handlers/
        ├── greet.ts
        └── greet.test.ts             ← REST handler unit tests
tests/
├── integration/
│   ├── fixtures/
│   │   └── workspace/
│   └── greet.test.ts                 ← integration tests
└── e2e/
    └── smoke.test.ts                 ← optional smoke tests against real env

Unit tests live next to the code. Integration tests live in a separate directory because they share fixtures and have a different lifecycle (longer startup, network dependencies).
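The vitest.config.ts from Setup only includes src/**/*.test.ts, so pnpm test never picks up the suites under tests/. One way to run them separately is a second Vitest config with longer timeouts for service startup. This is a sketch. The file name vitest.integration.config.ts and the 60s values are arbitrary choices, not a KB Labs convention.

```typescript
// vitest.integration.config.ts (file name is an arbitrary choice)
import { defineConfig } from 'vitest/config';

export default defineConfig({
  test: {
    environment: 'node',
    globals: false,
    include: ['tests/integration/**/*.test.ts'],
    // Service startup can take seconds; raise Vitest's defaults
    // (5s per test, 10s per hook).
    hookTimeout: 60_000,
    testTimeout: 60_000,
  },
});
```

Run it with vitest run --config vitest.integration.config.ts, for example from a test:integration script. Running it this way keeps the default pnpm test fast.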
