Record & Replay

VCR-style record-and-replay support. When a request doesn't match any fixture, aimock proxies it to the real upstream provider, records the response as a fixture on disk and in memory, then replays it on subsequent identical requests.

How It Works

  1. Client sends a request to aimock
  2. aimock attempts fixture matching as usual
  3. On miss: the request is forwarded to the configured upstream provider
  4. The upstream response is relayed back to the client immediately
  5. The response is collapsed (if streaming) and saved as a fixture to disk and memory
  6. Subsequent identical requests match the newly recorded fixture
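
The steps above can be sketched as a small in-memory handler. This is an illustration only, not aimock's implementation: handleRequest, Fixture, and the injected upstream function are hypothetical names, and matching is reduced to exact equality on the user message.

```typescript
// Hypothetical types and names for illustration; not aimock's internals.
type Fixture = { match: { userMessage: string }; response: { content: string } };
type Upstream = (userMessage: string) => Promise<string>;

async function handleRequest(
  fixtures: Fixture[],
  userMessage: string,
  upstream: Upstream,
): Promise<string> {
  // Step 2: try fixture matching first.
  const hit = fixtures.find((f) => f.match.userMessage === userMessage);
  if (hit) return hit.response.content;

  // Steps 3-4: on a miss, forward to the upstream and relay its answer.
  const content = await upstream(userMessage);

  // Step 5: keep the response in memory (a real recorder would also write it to disk).
  fixtures.push({ match: { userMessage }, response: { content } });

  // Step 6 happens on the next call: the lookup above now finds the new fixture.
  return content;
}
```

Calling handleRequest twice with the same message reaches the upstream only once; the second call is served from the recorded fixture.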

Proxy-Only Mode

Use --proxy-only instead of --record when you want unmatched requests to always reach the real provider — no fixture files are written to disk and no responses are cached in memory. Matched fixtures still work normally.

This is ideal for demos and live environments where canned fixtures cover the repeatable scenarios you want to show, while every other interaction should reach the real provider. With --record instead, the first real API call would be recorded and cached, and subsequent identical requests would receive that stale recorded response rather than a live one.

Proxy-only mode shell
$ npx -p @copilotkit/aimock llmock -f ./fixtures \
  --proxy-only \
  --provider-openai https://api.openai.com

Proxy-only mode (Docker) shell
$ docker run -d -p 4010:4010 \
  -v $(pwd)/fixtures:/fixtures \
  ghcr.io/copilotkit/aimock \
  -f /fixtures -h 0.0.0.0 \
  --proxy-only \
  --provider-openai https://api.openai.com

Mode          Unmatched request                      Writes to disk  Caches in memory
--record      Proxy → save → replay next time        Yes             Yes
--proxy-only  Proxy → relay → proxy again next time  No              No

Quick Start

CLI usage shell
$ npx -p @copilotkit/aimock llmock -f ./fixtures \
  --record \
  --provider-openai https://api.openai.com \
  --provider-anthropic https://api.anthropic.com

CLI usage (Docker) shell
$ docker run -d -p 4010:4010 \
  -v $(pwd)/fixtures:/fixtures \
  ghcr.io/copilotkit/aimock \
  -f /fixtures -h 0.0.0.0 \
  --record \
  --provider-openai https://api.openai.com \
  --provider-anthropic https://api.anthropic.com

Record & Replay CLI Flags

Flag Description
-f, --fixtures <path> Path to fixtures directory or file (default: ./fixtures)
-p, --port <port> Port to listen on (default: 4010)
-h, --host <host> Host to bind to (default: 127.0.0.1)
--record Enable record mode (proxy, save, and cache on miss)
--proxy-only Proxy mode (forward on miss, no saving or caching)
--strict Strict mode: return 503 (not 404) on unmatched requests
-w, --watch Watch fixture path for changes and reload
--log-level <level> Log verbosity: silent, info, debug (default: info)
--validate-on-load Validate fixture schemas at startup
--journal-max <n> Max request entries retained in memory (default: 1000 for both serve and createServer() since v1.14.2; 0 = unbounded; direct new Journal() instantiation still defaults to unbounded for back-compat)
--fixture-counts-max <n> Max unique testIds retained in the fixture match-count map (default: 500; 0 = unbounded)
--provider-openai <url> Upstream URL for OpenAI
--provider-anthropic <url> Upstream URL for Anthropic
--provider-gemini <url> Upstream URL for Gemini
--provider-vertexai <url> Upstream URL for Vertex AI
--provider-bedrock <url> Upstream URL for Bedrock
--provider-azure <url> Upstream URL for Azure OpenAI
--provider-ollama <url> Upstream URL for Ollama
--provider-cohere <url> Upstream URL for Cohere
--upstream-timeout-ms <ms> Connection idle timeout (ms) on the upstream request socket before the response body begins (default: 30000). Increase for upstreams with slow initial responses (reasoning models, queue-backed providers).
--body-timeout-ms <ms> Inter-chunk idle timeout (ms) on the upstream response body — fires if no bytes arrive for this duration after the response has started streaming (default: 30000). Reasoning models under concurrent load can leave 30s+ gaps between chunks; increase to e.g. 180000 in those setups.
--agui-record Enable AG-UI recording (proxy unmatched AG-UI requests)
--agui-proxy-only AG-UI proxy mode (forward on miss, no saving or caching)
--agui-upstream <url> Upstream AG-UI agent URL (used with --agui-record / --agui-proxy-only)
--chaos-drop <rate> Probability (0-1) of dropping requests with 500
--chaos-malformed <rate> Probability (0-1) of returning malformed JSON
--chaos-disconnect <rate> Probability (0-1) of destroying the connection mid-stream

Programmatic API

Programmatic recording ts
import { LLMock } from "@copilotkit/aimock";

const mock = new LLMock();
await mock.start();

try {
  // Enable recording — unmatched requests are proxied AND saved as fixtures
  mock.enableRecording({
    providers: {
      openai: "https://api.openai.com",
      anthropic: "https://api.anthropic.com",
    },
    fixturePath: "./fixtures/recorded",
  });

  // Make requests — unmatched ones are proxied and recorded
  // ...

  // Disable recording — recorded fixtures persist on disk
  mock.disableRecording();
} finally {
  // Always release the port, even if a test above threw
  await mock.stop();
}

To proxy unmatched requests without writing fixtures to disk, set proxyOnly: true and omit fixturePath. Despite the method name, enableRecording({ proxyOnly: true }) does not write fixtures; to actually record, omit proxyOnly or set it to false. Call mock.stop() when you are done:

Proxy-only (no recording) ts
try {
  mock.enableRecording({
    providers: {
      openai: "https://api.openai.com",
      anthropic: "https://api.anthropic.com",
    },
    proxyOnly: true,
  });
  // ...make requests; unmatched ones proxy through without being saved
} finally {
  await mock.stop();
}

Stream Collapsing

When the upstream provider returns a streaming response, aimock collapses it into a non-streaming fixture. Six streaming formats are supported:

Format               Provider           Content-Type
OpenAI SSE           OpenAI, Azure      text/event-stream
Anthropic SSE        Anthropic          text/event-stream
Gemini SSE           Gemini, Vertex AI  text/event-stream
Cohere SSE           Cohere             text/event-stream
Ollama NDJSON        Ollama             application/x-ndjson
Bedrock EventStream  AWS Bedrock        application/vnd.amazon.eventstream

The collapse extracts text content and tool calls from streaming chunks and produces a simple { content } or { toolCalls } fixture response.
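
As a rough illustration of the collapse step for the OpenAI SSE format, the sketch below concatenates the text deltas of a streamed chat completion into a single { content } value. collapseOpenAISSE is a hypothetical helper, not an aimock export, and it ignores tool calls and the other five formats.

```typescript
// Minimal sketch: collapse an OpenAI-style SSE body into { content }.
// collapseOpenAISSE is a hypothetical name used for illustration only.
function collapseOpenAISSE(body: string): { content: string } {
  let content = "";
  for (const line of body.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed.startsWith("data:")) continue; // skip blank separator lines
    const payload = trimmed.slice("data:".length).trim();
    if (payload === "[DONE]") break; // end-of-stream sentinel
    const chunk = JSON.parse(payload);
    // Each chunk carries an incremental delta; concatenate the text parts.
    content += chunk.choices?.[0]?.delta?.content ?? "";
  }
  return { content };
}

const sse = [
  'data: {"choices":[{"delta":{"content":"Hel"}}]}',
  "",
  'data: {"choices":[{"delta":{"content":"lo"}}]}',
  "",
  "data: [DONE]",
].join("\n");

console.log(collapseOpenAISSE(sse).content); // Hello
```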

Header Forwarding

When proxying to upstream providers, aimock forwards every header from the original request except hop-by-hop headers (per RFC 2616 §13.5.1) and headers the upstream HTTP client must set from the target URL or body. This pass-through behavior (in place since v1.6.1) means custom gateway headers, openai-organization, anthropic-version, and any other provider-specific auth or routing header all reach the upstream as-is.

The following headers are stripped before proxying:

Auth headers are never saved in recorded fixtures. The fixture only contains the match criteria (derived from the last user message) and the response content.
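
The pass-through rule described in this section can be sketched as a simple filter. The hop-by-hop names below are the ones listed in RFC 2616 §13.5.1; the second set (host, content-length) is an assumption about which headers an HTTP client derives from the target URL or body, and forwardableHeaders is a hypothetical helper. The exact list aimock strips may differ.

```typescript
// Hop-by-hop headers as listed in RFC 2616 §13.5.1.
const HOP_BY_HOP = new Set([
  "connection",
  "keep-alive",
  "proxy-authenticate",
  "proxy-authorization",
  "te",
  "trailers",
  "transfer-encoding",
  "upgrade",
]);

// Assumption: headers the upstream client sets itself from the URL or body.
const CLIENT_SET = new Set(["host", "content-length"]);

// Hypothetical helper: returns the headers that would be forwarded upstream.
function forwardableHeaders(incoming: Record<string, string>): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [name, value] of Object.entries(incoming)) {
    const key = name.toLowerCase(); // header names are case-insensitive
    if (HOP_BY_HOP.has(key) || CLIENT_SET.has(key)) continue;
    out[key] = value;
  }
  return out;
}
```

With this filter, provider headers such as authorization or anthropic-version pass through unchanged, while connection-level headers are dropped.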

Strict Mode

When --strict is enabled, unmatched requests that cannot be proxied (no upstream configured for that provider) return 503 Service Unavailable instead of the default 404. This is useful for CI environments where you want to catch unexpected API calls.

Fixture Auto-Generation

Recorded fixtures are saved to disk with timestamped filenames:

Recorded fixture file json
// fixtures/recorded/openai-<YYYY-MM-DD>T<HH-MM-SS>-000Z-0.json
{
  "fixtures": [
    {
      "match": { "userMessage": "What is the weather?" },
      "response": { "content": "I don't have real-time weather data..." }
    }
  ]
}

Match criteria are derived from the original request: the last user message becomes userMessage, or for embedding requests, the input becomes inputText. If no match criteria can be derived (e.g., empty messages), the fixture is saved to disk with a warning but not registered in memory.

Model-Aware Recording

When recording fixtures, aimock automatically includes the model name in match criteria. This prevents collisions when your app makes multiple LLM calls with the same user message but different models (e.g., Opus for chat + Haiku for title generation).

Model names are normalized by stripping date/version suffixes so fixtures survive provider version bumps:

Request Model               Recorded As
claude-opus-4-20250514      claude-opus-4
gpt-4o-2024-08-06           gpt-4o
claude-3-5-sonnet-20241022  claude-3-5-sonnet
llama3.1                    llama3.1 (no date suffix, unchanged)

Matching uses prefix comparison, so model: "claude-opus-4" in a fixture matches requests for claude-opus-4-20250514, claude-opus-4-20250915, or any future version.
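
A minimal sketch of the normalization and prefix comparison, assuming the suffix is a trailing -YYYYMMDD or -YYYY-MM-DD date as in the table above. normalizeModel and modelMatches are hypothetical names, and aimock's actual rules may cover more suffix shapes.

```typescript
// Hypothetical helper: strip a trailing -YYYYMMDD or -YYYY-MM-DD suffix, if present.
function normalizeModel(model: string): string {
  return model.replace(/-(?:\d{8}|\d{4}-\d{2}-\d{2})$/, "");
}

// Hypothetical helper: a fixture model matches any request model it is a prefix of.
function modelMatches(fixtureModel: string, requestModel: string): boolean {
  return requestModel.startsWith(fixtureModel);
}

normalizeModel("claude-opus-4-20250514"); // "claude-opus-4"
normalizeModel("gpt-4o-2024-08-06"); // "gpt-4o"
normalizeModel("llama3.1"); // "llama3.1"
modelMatches("claude-opus-4", "claude-opus-4-20250915"); // true
```

Note that in this sketch a plain prefix test also lets "gpt-4o" match "gpt-4o-mini", which is one reason to record the full model version when two such models are used side by side.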

To record the full model version instead (disabling normalization), set recordFullModelVersion to true in the recording config:

Disable model normalization json
{
  "llm": {
    "record": {
      "providers": { "openai": "https://api.openai.com" },
      "recordFullModelVersion": true
    }
  }
}

Or programmatically:

Programmatic usage ts
mock.enableRecording({
  providers: { openai: "https://api.openai.com" },
  fixturePath: "./fixtures/recorded",
  recordFullModelVersion: true,
});

Context-Aware Recording

When a request carries an X-AIMock-Context header, the recorder automatically captures the context value in match.context. On replay, fixtures with context only match requests carrying that exact header value — fixtures without context remain shared across all callers.
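
The replay rule can be sketched as a single predicate. Match and contextAllows are hypothetical names used only to illustrate the behavior described above.

```typescript
// Hypothetical shape of a fixture's match block, reduced to the relevant fields.
type Match = { userMessage: string; context?: string };

// headerValue is the X-AIMock-Context value of the incoming request, if any.
function contextAllows(match: Match, headerValue: string | undefined): boolean {
  // Fixtures without a context are shared across all callers.
  if (match.context === undefined) return true;
  // Fixtures with a context require exactly the same header value.
  return match.context === headerValue;
}
```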

Directory routing

Without snapshot-style recording (X-Test-Id), recorded fixtures for a given context are written to a <fixturePath>/<context>/ subdirectory:

Context directory layout text
fixtures/recorded/
  openai-2026-05-18T10-30-00-000Z-a1b2c3d4.json     # no context (shared)
  langgraph-python/
    openai-2026-05-18T10-30-01-000Z-e5f6a7b8.json   # context = langgraph-python
  crewai/
    openai-2026-05-18T10-30-02-000Z-c9d0e1f2.json   # context = crewai

When X-Test-Id is also present, snapshot-style paths take precedence and the context is captured only in match.context within the fixture file, not in the directory structure.

Sending X-AIMock-Context

Header example ts
// Set as a default header on your LLM client
const client = new OpenAI({
  baseURL: "http://localhost:4010/v1",
  apiKey: "mock",
  defaultHeaders: { "X-AIMock-Context": "langgraph-python" },
});

Upstream Timeouts

By default, aimock aborts a proxied request if the upstream socket is idle for 30 seconds (before the response body) or if no bytes arrive for 30 seconds during streaming. Reasoning models under concurrent load can exceed these limits during the thinking phase. Override with upstreamTimeoutMs and bodyTimeoutMs:

Custom timeouts ts
mock.enableRecording({
  providers: { openai: "https://api.openai.com" },
  fixturePath: "./fixtures/recorded",
  bodyTimeoutMs: 180_000,
});

Or via CLI: --body-timeout-ms 180000. Values must be positive finite numbers; zero, negative, NaN, and Infinity are rejected (CLI exits non-zero; programmatic API falls back to the 30s default).
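
The validation rule for the programmatic API can be sketched as follows. resolveTimeout and isValidTimeout are hypothetical helpers that mirror the behavior described above (invalid values fall back to the 30s default); they are not aimock exports.

```typescript
const DEFAULT_TIMEOUT_MS = 30_000;

// Accept only positive finite numbers; reject 0, negatives, NaN, and Infinity.
function isValidTimeout(value: unknown): value is number {
  return typeof value === "number" && Number.isFinite(value) && value > 0;
}

// Programmatic behavior as described: fall back to the default on invalid input.
function resolveTimeout(value: unknown): number {
  return isValidTimeout(value) ? value : DEFAULT_TIMEOUT_MS;
}

resolveTimeout(180_000); // 180000
resolveTimeout(0); // 30000
resolveTimeout(Number.NaN); // 30000
```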

Drift Detection Metadata

Recorded fixtures include a metadata block with hashes of the system prompt and tool definitions at recording time. These are informational only — not used for matching — and help you detect when your prompts or tools have changed since the fixture was recorded.

Recorded fixture with metadata json
{
  "match": { "userMessage": "hello", "model": "claude-opus-4" },
  "metadata": { "systemHash": "a7f3c291", "toolsHash": "e4b12d08" },
  "response": { "content": "Hi there!" }
}

When you re-record a fixture and the hashes differ from the previous version, it signals that your application’s prompts or tool definitions have evolved. This is useful for auditing fixture freshness — if the hashes don’t match, the recorded response may no longer reflect what the real provider would return for the current prompt.
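
A drift check along these lines could compare the stored hashes against hashes of the current prompt and tools. The hash algorithm aimock uses is not stated here, so the sketch substitutes an FNV-1a style 32-bit hash over UTF-16 code units purely to produce an 8-hex-digit value; shortHash and hasDrifted are hypothetical names.

```typescript
// Assumption: any stable short hash works for illustration; this is FNV-1a style.
function shortHash(text: string): string {
  let hash = 0x811c9dc5; // FNV-1a 32-bit offset basis
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // multiply by the FNV prime, keeping 32 bits
  }
  return (hash >>> 0).toString(16).padStart(8, "0");
}

type Metadata = { systemHash: string; toolsHash: string };

// Hypothetical helper: build the metadata block for a prompt and tool list.
function buildMetadata(systemPrompt: string, tools: unknown[]): Metadata {
  return { systemHash: shortHash(systemPrompt), toolsHash: shortHash(JSON.stringify(tools)) };
}

// True when the current prompt or tools no longer hash to the recorded values.
function hasDrifted(recorded: Metadata, systemPrompt: string, tools: unknown[]): boolean {
  const current = buildMetadata(systemPrompt, tools);
  return current.systemHash !== recorded.systemHash || current.toolsHash !== recorded.toolsHash;
}
```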

Snapshot-Style Recording

When the X-Test-Id header is present on a request, aimock uses snapshot-style recording instead of the default timestamp-based filenames. Fixtures are organized by test, producing stable file paths that work well with version control and PR diffs.

Directory structure

The test ID is slugified into a directory name, and each provider gets its own file within that directory:

Snapshot directory layout text
fixtures/recorded/
  agent-chat--handles-tool-call/
    openai.json        # All OpenAI fixtures for this test
    anthropic.json     # All Anthropic fixtures for this test
  simple-test/
    openai.json

Slugify rules: a leading test-file name (ending in .spec.ts, .test.tsx, .e2e.js, etc.) is stripped from the test ID before slugifying, so my-app.spec.ts › greeting becomes greeting. Then Playwright's › separator becomes --, non-word characters become -, runs of 3+ dashes collapse to --, and the result is lowercased. For example, "agent chat › handles tool call" becomes agent-chat--handles-tool-call.
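
The rules above can be approximated with a chain of replacements. slugifyTestId is a hypothetical helper, and the file-name pattern is an assumption covering only the .spec, .test, and .e2e variants mentioned; the real implementation may recognize more.

```typescript
// Hypothetical sketch of the slugify rules; not aimock's actual function.
function slugifyTestId(testId: string): string {
  return testId
    // Drop a leading test-file segment such as "my-app.spec.ts › ".
    .replace(/^\S+\.(?:spec|test|e2e)\.[cm]?[jt]sx?\s*›\s*/, "")
    // Playwright's " › " separator becomes a double dash.
    .replace(/\s*›\s*/g, "--")
    // Any remaining run of non-word characters becomes a single dash.
    .replace(/[^\w-]+/g, "-")
    // Runs of three or more dashes collapse to a double dash.
    .replace(/-{3,}/g, "--")
    .toLowerCase();
}

slugifyTestId("agent chat › handles tool call"); // "agent-chat--handles-tool-call"
slugifyTestId("my-app.spec.ts › greeting"); // "greeting"
```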

Merge behavior on re-run

When you re-run a test, the new fixture is appended to the existing <provider>.json file rather than overwriting it. This preserves multi-turn conversations in a single file. If the existing file is corrupted (invalid JSON), it is silently replaced.
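
The append-or-replace behavior can be sketched as a pure function over the file's text. mergeFixture is a hypothetical helper; it assumes the { "fixtures": [...] } file shape shown earlier and leaves the actual disk I/O to the caller.

```typescript
type FixtureFile = { fixtures: unknown[] };

// existingText is the current content of <provider>.json, or null if the file is absent.
function mergeFixture(existingText: string | null, fixture: unknown): string {
  let file: FixtureFile = { fixtures: [] };
  if (existingText !== null) {
    try {
      const parsed = JSON.parse(existingText);
      if (parsed && Array.isArray(parsed.fixtures)) file = parsed;
    } catch {
      // Invalid JSON: start from an empty file, which replaces the corrupted one.
    }
  }
  file.fixtures.push(fixture); // append rather than overwrite
  return JSON.stringify(file, null, 2);
}
```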

Sending X-Test-Id from test frameworks

Playwright ts
// Playwright exposes testInfo.titlePath: an array of the file name, suite titles, and test title
import { test } from "@playwright/test";

test.describe("agent chat", () => {
  test("handles tool call", async ({ page }, testInfo) => {
    // titlePath = ["<file>.spec.ts", "agent chat", "handles tool call"]
    const testId = testInfo.titlePath.join(" › ");
    // Set on your OpenAI/Anthropic client config as a default header:
    // headers: { "X-Test-Id": testId }
  });
});
Vitest ts
import { describe, it } from "vitest";

describe("agent chat", () => {
  it("handles tool call", async () => {
    // Pass X-Test-Id on each LLM request:
    const resp = await fetch("http://localhost:4010/v1/chat/completions", {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "X-Test-Id": "agent chat › handles tool call",
      },
      // ...body
    });
  });
});

Fallback behavior

When no X-Test-Id header is present (or the value is __default__), recording falls back to the standard timestamp-based filename: <provider>-<timestamp>-<uuid>.json.

Fixture Lifecycle

Local Development Workflow

Record once against real APIs, then replay from fixtures for fast, offline development.

Record then replay shell
# First run: record real API responses
$ npx -p @copilotkit/aimock llmock --record --provider-openai https://api.openai.com -f ./fixtures

# Subsequent runs: replay from recorded fixtures
$ npx -p @copilotkit/aimock llmock -f ./fixtures

Record then replay (Docker) shell
# First run: record real API responses
$ docker run -d -p 4010:4010 \
  -v $(pwd)/fixtures:/fixtures \
  ghcr.io/copilotkit/aimock \
  --record --provider-openai https://api.openai.com -f /fixtures -h 0.0.0.0

# Subsequent runs: replay from recorded fixtures
$ docker run -d -p 4010:4010 \
  -v $(pwd)/fixtures:/fixtures \
  ghcr.io/copilotkit/aimock \
  -f /fixtures -h 0.0.0.0

CI Pipeline Workflow

Use the Docker image in CI with --strict mode to ensure every request matches a recorded fixture. No API keys needed, no flaky network calls.

GitHub Actions example yaml
- name: Start aimock
  run: |
    docker run -d --rm --name aimock \
      -v $(pwd)/fixtures:/fixtures \
      -p 4010:4010 \
      ghcr.io/copilotkit/aimock \
      --strict -f /fixtures -h 0.0.0.0

- name: Run tests
  env:
    OPENAI_BASE_URL: http://localhost:4010/v1
  run: pnpm test

- name: Stop aimock
  if: always()
  run: docker rm -f aimock

Request Transform

Prompts often contain dynamic data — timestamps, UUIDs, session IDs — that changes between runs. This causes fixture mismatches on replay because the recorded key no longer matches the live request. The requestTransform option normalizes requests before both matching and recording, stripping out the volatile parts.

Strip timestamps before matching ts
import { LLMock } from "@copilotkit/aimock";

const mock = new LLMock({
  requestTransform: (req) => ({
    ...req,
    messages: req.messages.map((m) => ({
      ...m,
      content:
        typeof m.content === "string"
          ? m.content.replace(/\d{4}-\d{2}-\d{2}T[\d:.+Z-]+/g, "").trim()
          : m.content,
    })),
  }),
});

// Fixture uses the cleaned key (no timestamp)
mock.onMessage("tell me the weather", { content: "Sunny" });

// Request with a timestamp still matches after transform
await mock.start();

When requestTransform is set, string matching for userMessage and inputText switches from substring (includes) to exact equality (===). This prevents shortened keys from accidentally matching unrelated prompts. Without a transform, the existing includes behavior is preserved for backward compatibility.

The transform is applied in both directions: recording saves the transformed match key (no timestamps in the fixture file), and matching transforms the incoming request before comparison. This means recorded fixtures and live requests always use the same normalized key.
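
The switch between substring and exact matching can be sketched as follows. userMessageMatches is a hypothetical helper, and stripTimestamps reuses the regular expression from the example above on a single string.

```typescript
// The timestamp-stripping step from the example above, applied to one string.
function stripTimestamps(text: string): string {
  return text.replace(/\d{4}-\d{2}-\d{2}T[\d:.+Z-]+/g, "").trim();
}

// Hypothetical matcher: exact equality when a transform is set, substring otherwise.
function userMessageMatches(
  fixtureKey: string,
  incoming: string,
  transform?: (text: string) => string,
): boolean {
  if (transform) return transform(incoming) === fixtureKey;
  return incoming.includes(fixtureKey);
}

// With the transform, the volatile timestamp is removed before the exact comparison.
userMessageMatches("tell me the weather", "2026-05-18T10:30:00Z tell me the weather", stripTimestamps); // true
// Without a transform, a short key matches as a substring.
userMessageMatches("weather", "tell me the weather"); // true
// With a transform, the same short key no longer matches.
userMessageMatches("weather", "tell me the weather", stripTimestamps); // false
```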

Building Fixture Sets

A practical workflow for building and maintaining fixture sets:

  1. Run with --record against real APIs during development
  2. Review recorded fixtures in fixtures/recorded/
  3. Move and rename to organized fixture directories
  4. Switch to --strict mode in CI
  5. Re-record when upstream APIs change (drift detection catches this)

Recording Multi-Turn Conversations

The recorder is stateless across turns. Every incoming request is treated as an independent unit: aimock derives fixture match criteria from a single request at a time and does not remember or hash prior turns of the same test session. As a result, turns that end with the same user message produce colliding fixtures. Two post-record remedies handle the two flavors of collision: add toolCallId for tool-round follow-ups (covered on the Multi-Turn Conversations page), or add sequenceIndex when the same user prompt repeats (covered below).

Specifically, for chat requests the recorder reads only the last user message of the request and saves it as match.userMessage. For embedding requests it uses the input text. There is no history-based keying — unlike VCR-style recorders that fingerprint the full request body, aimock keys fixtures purely on the last user message.

Recorder match derivation ts
// src/recorder.ts (simplified; real implementation guards against a null last-user-message)
function buildFixtureMatch(request) {
  if (request.embeddingInput) {
    return { inputText: request.embeddingInput };
  }
  // Chat/multimedia — key on the LAST user message only
  const lastUser = getLastMessageByRole(request.messages, "user");
  const match = { userMessage: getTextContent(lastUser.content) };
  // Capture context from X-AIMock-Context header if present
  if (request._context) match.context = request._context;
  return match;
}

Consequence: identical final prompts collide

Two turns of the same conversation that happen to end with the same user message produce two fixture entries that share the same match.userMessage. On replay, the router picks the first fixture that matches (first-wins by file load order), and the second fixture is effectively shadowed. This matters whenever a test says “continue”, “yes”, “retry”, or any other short prompt twice in a row.

Recommended workflow

  1. Run the test once under --record and let aimock capture one fixture per turn as usual.
  2. Review the recorded fixtures. For turns whose purpose is answering a tool call, rewrite the match to key on toolCallId instead of userMessage — this is the canonical tool-round idiom.
  3. For genuine repeats of the same user prompt, add sequenceIndex (0, 1, …) to each fixture to differentiate them by call order.
  4. Move the hand-edited fixtures into your organized fixture directory and switch to --strict for replay in CI.

Gotcha: shadowed repeats

If you record a conversation that sends “continue” twice, you will get two fixtures with identical match.userMessage: "continue". On replay, the second one will never fire until you add sequenceIndex post-record. See Sequential / Stateful Responses for the mechanics. For tool-round conversations where each post-tool turn carries a distinct tool_call_id, prefer keying by toolCallId — see Multi-Turn Conversations.
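
To make the shadowing concrete, the sketch below models a first-wins router with a per-prompt call counter. It assumes that a fixture with sequenceIndex n matches only the n-th (0-based) request for that prompt; createRouter and SeqFixture are hypothetical names, and the Sequential / Stateful Responses page remains the reference for the real mechanics.

```typescript
type SeqFixture = {
  match: { userMessage: string; sequenceIndex?: number };
  response: { content: string };
};

// Hypothetical first-wins router that counts how often each prompt has been requested.
function createRouter(fixtures: SeqFixture[]) {
  const seen = new Map<string, number>();
  return (userMessage: string): string | undefined => {
    const n = seen.get(userMessage) ?? 0;
    seen.set(userMessage, n + 1);
    const hit = fixtures.find(
      (f) =>
        f.match.userMessage === userMessage &&
        // Assumption: an absent sequenceIndex matches every call; otherwise only call n.
        (f.match.sequenceIndex === undefined || f.match.sequenceIndex === n),
    );
    return hit?.response.content;
  };
}

// As recorded: both fixtures share the same key, so the second is shadowed.
const shadowed = createRouter([
  { match: { userMessage: "continue" }, response: { content: "first" } },
  { match: { userMessage: "continue" }, response: { content: "second" } },
]);

// After hand-editing: sequenceIndex separates the two turns by call order.
const ordered = createRouter([
  { match: { userMessage: "continue", sequenceIndex: 0 }, response: { content: "first" } },
  { match: { userMessage: "continue", sequenceIndex: 1 }, response: { content: "second" } },
]);
```

In this model, two calls to shadowed("continue") both return "first", while two calls to ordered("continue") return "first" and then "second".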

Cross-Language Testing

The Docker image serves any language that speaks HTTP. Point your client at the mock server's URL instead of the real API.

Any language, one server bash
# Docker image serves all languages
docker run -d -p 4010:4010 -v $(pwd)/fixtures:/fixtures ghcr.io/copilotkit/aimock -f /fixtures -h 0.0.0.0

# Python
import openai
client = openai.OpenAI(base_url="http://localhost:4010/v1", api_key="mock")

# Go — github.com/sashabaranov/go-openai
config := openai.DefaultConfig("mock")
config.BaseURL = "http://localhost:4010/v1"
client := openai.NewClientWithConfig(config)

# Rust — async-openai
let config = OpenAIConfig::new()
    .with_api_base("http://localhost:4010/v1")
    .with_api_key("mock");
let client = Client::with_config(config);