Testing Plugins
Last updated April 7, 2026
Unit tests, integration tests, and mocking platform services in plugin code.
Plugins are regular TypeScript packages — they're tested with whatever test runner you already use (Vitest is the convention in the KB Labs monorepo). The SDK ships `@kb-labs/sdk/testing` with helpers for constructing test contexts and mocking platform services, so handler code is testable without booting the full platform.
This guide covers three layers: unit tests for pure logic, unit tests for handler logic with mocked platform services, and integration tests against a real platform instance.
For the full reference of test utilities, see SDK → Testing. This page is the practical workflow.
The test layers
```
┌─────────────────────────────────────────┐
│  Integration tests                      │
│  (real platform, real adapters, slow)   │
├─────────────────────────────────────────┤
│  Handler unit tests                     │
│  (createTestContext + mocks, fast)      │
├─────────────────────────────────────────┤
│  Pure logic unit tests                  │
│  (no platform at all, instant)          │
└─────────────────────────────────────────┘
```

Most of your tests should be at the bottom two layers. Integration tests are slow and brittle — save them for the critical paths.
Setup
Add Vitest to your plugin package:
```json
{
  "devDependencies": {
    "vitest": "^3.0.0",
    "@vitest/ui": "^3.0.0"
  },
  "scripts": {
    "test": "vitest run",
    "test:watch": "vitest",
    "test:ui": "vitest --ui"
  }
}
```

`vitest.config.ts`:
```typescript
import { defineConfig } from 'vitest/config';

export default defineConfig({
  test: {
    environment: 'node',
    globals: false,
    include: ['src/**/*.test.ts'],
  },
});
```

Keep tests next to the code they test: `src/cli/commands/hello.ts` → `src/cli/commands/hello.test.ts`.
Layer 1 — Pure logic tests
If your handler can be decomposed into pure functions, test those directly. No platform, no context, no mocks.
```typescript
// src/lib/parse-scope.ts
export function parseScope(raw: string): string[] {
  return raw.split(',').map(s => s.trim()).filter(Boolean);
}
```

```typescript
// src/lib/parse-scope.test.ts
import { describe, it, expect } from 'vitest';
import { parseScope } from './parse-scope';

describe('parseScope', () => {
  it('splits comma-separated values', () => {
    expect(parseScope('a,b,c')).toEqual(['a', 'b', 'c']);
  });

  it('trims whitespace', () => {
    expect(parseScope('a , b , c')).toEqual(['a', 'b', 'c']);
  });

  it('drops empty entries', () => {
    expect(parseScope('a,,b, ,c')).toEqual(['a', 'b', 'c']);
  });
});
```

Pure tests run in milliseconds and have no side effects. If a function can be tested this way, it should be.
Layer 2 — Handler unit tests
Handlers call platform services through hooks. You can't test them as pure functions because they need a context and a platform singleton. The SDK's `createTestContext` + mock builders swap both out with fakes.
```typescript
// src/cli/commands/hello.test.ts
import { describe, it, expect, afterEach } from 'vitest';
import { testCommand, mockLLM } from '@kb-labs/sdk/testing';
import helloHandler from './hello';

describe('hello:greet', () => {
  let cleanup: () => void;
  afterEach(() => cleanup?.());

  it('returns a canned greeting without --ai', async () => {
    const result = await testCommand(helloHandler, {
      flags: { name: 'Alice' },
    });
    cleanup = result.cleanup;

    expect(result.exitCode).toBe(0);
    expect(result.result).toEqual({
      greeting: 'Hello, Alice!',
      source: 'deterministic',
    });
  });

  it('uses LLM when --ai is passed', async () => {
    const llm = mockLLM()
      .onAnyComplete()
      .respondWith('Hi Alice, great to see you!');

    const result = await testCommand(helloHandler, {
      flags: { name: 'Alice', ai: true },
      platform: { llm },
    });
    cleanup = result.cleanup;

    expect(result.exitCode).toBe(0);
    expect(result.result?.source).toBe('llm');
    expect(result.result?.greeting).toContain('Alice');
    expect(llm.complete).toHaveBeenCalled();
  });

  it('falls back to canned greeting when LLM is unavailable and --ai is passed', async () => {
    // No llm in the platform override — useLLM() returns undefined
    const result = await testCommand(helloHandler, {
      flags: { name: 'Alice', ai: true },
    });
    cleanup = result.cleanup;

    expect(result.exitCode).toBe(0);
    expect(result.result?.source).toBe('deterministic');
  });
});
```

Three things to internalize:

- `testCommand` wraps your handler in a test context and runs it.
- `platform.llm` in the options becomes what `useLLM()` returns. You control the mock; the handler sees it as the real thing.
- `cleanup()` in `afterEach` resets the platform singleton between tests. Without it, state leaks between tests.
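If assigning `cleanup` by hand in every test feels error-prone, one pattern is to collect cleanups centrally and flush them in `afterEach`. This is a sketch of my own, not part of `@kb-labs/sdk/testing` — the helper names are hypothetical:

```typescript
// Hypothetical helper — not part of the SDK. Collects cleanup callbacks
// so a forgotten `result.cleanup` can't leak platform state between tests.
const cleanups: Array<() => void> = [];

// Wrap any result that carries a cleanup function; returns it unchanged.
export function trackCleanup<T extends { cleanup: () => void }>(result: T): T {
  cleanups.push(result.cleanup);
  return result;
}

// Call from afterEach: runs and clears all pending cleanups,
// newest first, so later setups tear down before earlier ones.
export function flushCleanups(): void {
  while (cleanups.length) cleanups.pop()!();
}
```

Register `afterEach(flushCleanups)` once per file and wrap each run, e.g. `const result = trackCleanup(await testCommand(handler, options));`.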
See SDK → Testing for the full API of `testCommand`, `createTestContext`, and every mock builder.
Mocking the cache
```typescript
import { testCommand, mockCache } from '@kb-labs/sdk/testing';

it('caches results', async () => {
  const cache = mockCache();

  const result = await testCommand(handler, {
    flags: { query: 'test' },
    platform: { cache },
  });
  cleanup = result.cleanup;

  // First call should miss and set
  expect(cache.get).toHaveBeenCalledWith('query:test');
  expect(cache.set).toHaveBeenCalledWith(
    'query:test',
    expect.any(Object),
    60_000,
  );

  // Value persists in the mock's in-memory store
  expect(await cache.get('query:test')).toBeDefined();
});
```

`mockCache()` is a real in-memory `ICache` implementation: `get`/`set` work as expected, and the spies let you assert on call counts and arguments.
Mocking storage
```typescript
import { mockStorage } from '@kb-labs/sdk/testing';

it('writes a report', async () => {
  const storage = mockStorage();

  const result = await testCommand(handler, {
    flags: { output: 'reports/summary.md' },
    platform: { storage },
  });
  cleanup = result.cleanup;

  expect(storage.write).toHaveBeenCalled();
  expect(await storage.read('reports/summary.md')).not.toBeNull();
});
```

Mocking tool calls
For handlers that use LLM tool-calling:
```typescript
import { mockLLM, mockTool } from '@kb-labs/sdk/testing';

it('handles tool calls', async () => {
  const searchTool = mockTool('search')
    .withInput({ query: 'hello' })
    .returning({ results: [{ id: 1 }] });

  const llm = mockLLM()
    .onAnyChatWithTools()
    .respondWith({
      content: 'Calling search',
      toolCalls: [{ id: '1', name: 'search', input: { query: 'hello' } }],
    });

  const result = await testCommand(handler, {
    flags: { query: 'hello' },
    platform: { llm },
  });
  cleanup = result.cleanup;

  expect(llm.chatWithTools).toHaveBeenCalled();
});
```

Layer 3 — Integration tests
When you need to test real adapter behavior — actually talking to Qdrant, actually calling OpenAI, actually writing to SQLite — spin up a real platform instance.
Option A — Test workspace with kb.config.json
Create a test-specific workspace:
```
tests/fixtures/workspace/
├── .kb/
│   └── kb.config.json      # test config with in-memory adapters
└── packages/
    └── your-plugin/        # linked via marketplace
```

Test config:
```json
{
  "platform": {
    "adapters": {
      "llm": null,
      "cache": null,
      "storage": "@kb-labs/adapters-fs"
    },
    "adapterOptions": {
      "storage": { "basePath": ".kb/test-storage" }
    },
    "execution": { "mode": "in-process" }
  }
}
```

`null` explicitly installs the NoOp adapter for tokens you don't want to configure. `in-process` keeps everything in the test process so you can assert on internal state without IPC.
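For intuition about what a NoOp adapter does, here is a rough sketch of the behavior for the cache token — hypothetical shape, not the platform's actual class. Every read misses and every write is silently dropped, so handler code that guards on `undefined` keeps working with no backend configured:

```typescript
// Hypothetical NoOp cache — illustrative only, not the platform's code.
export const noopCache = {
  async get(_key: string): Promise<unknown> {
    return undefined; // always a miss
  },
  async set(_key: string, _value: unknown, _ttlMs?: number): Promise<void> {
    // write is dropped
  },
  async delete(_key: string): Promise<void> {
    // nothing was stored, nothing to delete
  },
};
```

This is why `null` is safe in test configs: the plugin's cache-aware code paths still execute, they just always take the cache-miss branch.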
Run your handler against this workspace from your test file:
```typescript
import { createServiceBootstrap } from '@kb-labs/core-runtime';
import { join } from 'path';

beforeAll(async () => {
  await createServiceBootstrap({
    appId: 'test',
    repoRoot: join(__dirname, 'fixtures/workspace'),
  });
});
```

Option B — Live services + HTTP assertions
For higher-fidelity integration tests, run the REST API as a subprocess and hit it over HTTP:
```typescript
import { spawn, type ChildProcess } from 'child_process';

let proc: ChildProcess;

beforeAll(async () => {
  proc = spawn('node', ['path/to/rest-api/dist/index.js'], {
    env: { ...process.env, PORT: '15050' },
  });

  // Wait for /health
  for (let i = 0; i < 30; i++) {
    try {
      await fetch('http://localhost:15050/api/v1/health');
      return;
    } catch {
      await new Promise(r => setTimeout(r, 200));
    }
  }
  throw new Error('REST API did not start');
});

afterAll(() => {
  // Kill the subprocess so the test run doesn't leak it
  proc?.kill();
});

it('calls the plugin over HTTP', async () => {
  const response = await fetch('http://localhost:15050/api/v1/plugins/hello/greet', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', 'Authorization': 'Bearer dev-token' },
    body: JSON.stringify({ name: 'Alice' }),
  });

  expect(response.status).toBe(200);
  const data = await response.json();
  expect(data.greeting).toBeDefined();
});
```

Integration tests are slow (service startup takes seconds) but give you confidence that the full plugin → REST → platform → adapter chain works end-to-end.
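The ad-hoc `/health` polling loop generalizes to a small helper you could reuse across integration suites. A sketch — `waitFor` and its option names are mine, not part of the SDK:

```typescript
// Poll an async condition until it holds or attempts run out.
export async function waitFor(
  check: () => Promise<boolean>,
  { attempts = 30, intervalMs = 200 } = {},
): Promise<void> {
  for (let i = 0; i < attempts; i++) {
    try {
      if (await check()) return; // condition met
    } catch {
      // treat errors (e.g. connection refused) as "not ready yet"
    }
    await new Promise(r => setTimeout(r, intervalMs));
  }
  throw new Error(`Condition not met after ${attempts} attempts`);
}
```

With it, the readiness wait becomes `await waitFor(async () => (await fetch('http://localhost:15050/api/v1/health')).ok)`.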
What NOT to test
- Don't test that the SDK works. `defineCommand` has its own tests; you don't need to verify that it wraps your handler correctly.
- Don't test the hooks. `useLLM`/`useCache`/`useStorage` have their own tests. Mock them and assert on your handler's behavior with the mock's return value.
- Don't test Ant Design components. If you're writing Studio page tests, mock the hooks and assert on the data flow, not on the rendered HTML.
- Don't test the host guard. `defineCommand` throws if called from the wrong host — that's tested in the SDK. You don't need to re-verify it.
Running tests
```sh
pnpm test          # one-shot
pnpm test:watch    # watch mode
pnpm test:ui       # Vitest UI
```

In CI:

```yaml
- run: pnpm install --frozen-lockfile
- run: pnpm build
- run: pnpm test
- run: pnpm type-check
- run: pnpm lint
```

Test organization
Structure I recommend for plugin repos:
```
src/
├── cli/
│   └── commands/
│       ├── hello.ts
│       └── hello.test.ts        ← handler unit tests
├── lib/
│   ├── parse-scope.ts
│   └── parse-scope.test.ts      ← pure logic unit tests
└── rest/
    └── handlers/
        ├── greet.ts
        └── greet.test.ts        ← REST handler unit tests
tests/
├── integration/
│   ├── fixtures/
│   │   └── workspace/
│   └── greet.test.ts            ← integration tests
└── e2e/
    └── smoke.test.ts            ← optional smoke tests against real env
```

Unit tests live next to the code. Integration tests live in a separate directory because they share fixtures and have a different lifecycle (longer startup, network deps).
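One way to wire this split into the runner is a Vitest workspace file, so unit tests stay the fast default and integration tests run only when asked. A sketch — the project names are arbitrary, and newer Vitest versions also support the same split via `test.projects` in a single config:

```typescript
// vitest.workspace.ts — one possible unit/integration split (sketch)
import { defineWorkspace } from 'vitest/config';

export default defineWorkspace([
  {
    test: {
      name: 'unit',
      environment: 'node',
      include: ['src/**/*.test.ts'],
    },
  },
  {
    test: {
      name: 'integration',
      environment: 'node',
      include: ['tests/integration/**/*.test.ts'],
      testTimeout: 60_000, // service startup is slow
    },
  },
]);
```

Then `vitest --project unit` runs just the fast layer locally, while CI can run both projects.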
What to read next
- SDK → Testing — complete API reference for `createTestContext`, `testCommand`, and every mock builder.
- SDK → Handler Context — the shape of `ctx` that `createTestContext` constructs.
- SDK → Hooks — the hooks that the mock builders swap out.