Record & Replay
VCR-style record-and-replay support. When a request doesn't match any fixture, aimock proxies it to the real upstream provider, records the response as a fixture on disk and in memory, then replays it on subsequent identical requests.
How It Works
- Client sends a request to aimock
- aimock attempts fixture matching as usual
- On miss: the request is forwarded to the configured upstream provider
- The upstream response is relayed back to the client immediately
- The response is collapsed (if streaming) and saved as a fixture to disk and memory
- Subsequent identical requests match the newly recorded fixture
Proxy-Only Mode
Use --proxy-only instead of --record when you want unmatched
requests to always reach the real provider — no fixture files are written to disk
and no responses are cached in memory. Matched fixtures still work normally.
This is ideal for demos and live environments where canned fixtures cover the
repeatable scenarios you want to show, while every other interaction is proxied
to the real provider. Without --proxy-only, the first real API call would be
recorded and cached, and subsequent identical requests would receive that stale
recording instead of reaching the live provider.
$ npx -p @copilotkit/aimock llmock -f ./fixtures \
--proxy-only \
--provider-openai https://api.openai.com
$ docker run -d -p 4010:4010 \
-v $(pwd)/fixtures:/fixtures \
ghcr.io/copilotkit/aimock \
-f /fixtures -h 0.0.0.0 \
--proxy-only \
--provider-openai https://api.openai.com
| Mode | Unmatched request | Writes to disk | Caches in memory |
|---|---|---|---|
| --record | Proxy → save → replay next time | Yes | Yes |
| --proxy-only | Proxy → relay → proxy again next time | No | No |
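The difference between the two modes can be sketched with a small in-memory model. This is not aimock's implementation: createMock, the Fixture shape, and the synchronous upstream callback are assumptions made for illustration, and matching is reduced to exact equality on the user message.

```typescript
// Hypothetical model of --record vs --proxy-only (not aimock's real code).
type Mode = "record" | "proxy-only";
type Fixture = { userMessage: string; content: string };
type Upstream = (userMessage: string) => string;

function createMock(mode: Mode, upstream: Upstream, fixtures: Fixture[] = []) {
  return {
    handle(userMessage: string): string {
      // Matched fixtures are served in both modes.
      const hit = fixtures.find((f) => f.userMessage === userMessage);
      if (hit) return hit.content;

      // On a miss, forward to the upstream provider.
      const content = upstream(userMessage);

      // Only record mode keeps the response for later replay.
      if (mode === "record") fixtures.push({ userMessage, content });
      return content;
    },
  };
}
```

In this model, "record" calls the upstream once per distinct message, while "proxy-only" calls it on every miss and still answers pre-loaded fixtures directly.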
Quick Start
$ npx -p @copilotkit/aimock llmock -f ./fixtures \
--record \
--provider-openai https://api.openai.com \
--provider-anthropic https://api.anthropic.com
$ docker run -d -p 4010:4010 \
-v $(pwd)/fixtures:/fixtures \
ghcr.io/copilotkit/aimock \
-f /fixtures -h 0.0.0.0 \
--record \
--provider-openai https://api.openai.com \
--provider-anthropic https://api.anthropic.com
Record & Replay CLI Flags
| Flag | Description |
|---|---|
| -f, --fixtures <path> | Path to fixtures directory or file (default: ./fixtures) |
| -p, --port <port> | Port to listen on (default: 4010) |
| -h, --host <host> | Host to bind to (default: 127.0.0.1) |
| --record | Enable record mode (proxy, save, and cache on miss) |
| --proxy-only | Proxy mode (forward on miss, no saving or caching) |
| --strict | Strict mode: return 503 (not 404) on unmatched requests |
| -w, --watch | Watch fixture path for changes and reload |
| --log-level <level> | Log verbosity: silent, info, debug (default: info) |
| --validate-on-load | Validate fixture schemas at startup |
| --journal-max <n> | Max request entries retained in memory (default: 1000 for both serve and createServer() since v1.14.2; 0 = unbounded; direct new Journal() instantiation still defaults to unbounded for back-compat) |
| --fixture-counts-max <n> | Max unique testIds retained in the fixture match-count map (default: 500; 0 = unbounded) |
| --provider-openai <url> | Upstream URL for OpenAI |
| --provider-anthropic <url> | Upstream URL for Anthropic |
| --provider-gemini <url> | Upstream URL for Gemini |
| --provider-vertexai <url> | Upstream URL for Vertex AI |
| --provider-bedrock <url> | Upstream URL for Bedrock |
| --provider-azure <url> | Upstream URL for Azure OpenAI |
| --provider-ollama <url> | Upstream URL for Ollama |
| --provider-cohere <url> | Upstream URL for Cohere |
| --upstream-timeout-ms <ms> | Connection idle timeout (ms) on the upstream request socket before the response body begins (default: 30000). Increase for upstreams with slow initial responses (reasoning models, queue-backed providers). |
| --body-timeout-ms <ms> | Inter-chunk idle timeout (ms) on the upstream response body; fires if no bytes arrive for this duration after the response has started streaming (default: 30000). Reasoning models under concurrent load can leave 30s+ gaps between chunks; increase to e.g. 180000 in those setups. |
| --agui-record | Enable AG-UI recording (proxy unmatched AG-UI requests) |
| --agui-proxy-only | AG-UI proxy mode (forward on miss, no saving or caching) |
| --agui-upstream <url> | Upstream AG-UI agent URL (used with --agui-record / --agui-proxy-only) |
| --chaos-drop <rate> | Probability (0–1) of dropping requests with 500 |
| --chaos-malformed <rate> | Probability (0–1) of returning malformed JSON |
| --chaos-disconnect <rate> | Probability (0–1) of destroying the connection mid-stream |
Programmatic API
import { LLMock } from "@copilotkit/aimock";
const mock = new LLMock();
await mock.start();
try {
// Enable recording — unmatched requests are proxied AND saved as fixtures
mock.enableRecording({
providers: {
openai: "https://api.openai.com",
anthropic: "https://api.anthropic.com",
},
fixturePath: "./fixtures/recorded",
});
// Make requests — unmatched ones are proxied and recorded
// ...
// Disable recording — recorded fixtures persist on disk
mock.disableRecording();
} finally {
// Always release the port, even if a test above threw
await mock.stop();
}
To proxy unmatched requests without writing fixtures to disk, set
proxyOnly: true and omit fixturePath. Despite the method name,
enableRecording({ proxyOnly: true }) does not write fixtures;
omit proxyOnly or set it to false to actually record. Remember to call
mock.stop() when done:
try {
mock.enableRecording({
providers: {
openai: "https://api.openai.com",
anthropic: "https://api.anthropic.com",
},
proxyOnly: true,
});
// ...make requests; unmatched ones proxy through without being saved
} finally {
await mock.stop();
}
Stream Collapsing
When the upstream provider returns a streaming response, aimock collapses it into a non-streaming fixture. Six streaming formats are supported:
| Format | Provider | Content-Type |
|---|---|---|
| OpenAI SSE | OpenAI, Azure | text/event-stream |
| Anthropic SSE | Anthropic | text/event-stream |
| Gemini SSE | Gemini, Vertex AI | text/event-stream |
| Cohere SSE | Cohere | text/event-stream |
| Ollama NDJSON | Ollama | application/x-ndjson |
| Bedrock EventStream | AWS Bedrock | application/vnd.amazon.eventstream |
The collapse extracts text content and tool calls from streaming chunks and produces a
simple { content } or { toolCalls } fixture response.
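As a rough illustration of collapsing, the sketch below folds an OpenAI-style SSE stream into a { content } response. It handles only text deltas (choices[0].delta.content) and ignores tool calls; collapseOpenAISse is a hypothetical name, not aimock's internal function.

```typescript
// Minimal sketch: collapse OpenAI-style SSE text deltas into one string.
type CollapsedResponse = { content: string };

function collapseOpenAISse(raw: string): CollapsedResponse {
  let content = "";
  for (const line of raw.split(/\r?\n/)) {
    // SSE payload lines start with "data:"; everything else is skipped.
    if (!line.startsWith("data:")) continue;
    const payload = line.slice("data:".length).trim();
    // "[DONE]" is the end-of-stream sentinel, not JSON.
    if (payload === "" || payload === "[DONE]") continue;

    const chunk = JSON.parse(payload);
    const delta = chunk.choices?.[0]?.delta?.content;
    if (typeof delta === "string") content += delta;
  }
  return { content };
}
```

A fuller version would also accumulate tool-call deltas and dispatch on the Content-Type values listed in the table, since each provider frames its chunks differently.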
Header Forwarding
When proxying to upstream providers, aimock forwards every header from
the original request except hop-by-hop headers (per RFC 2616 §13.5.1) and headers the
upstream HTTP client must set from the target URL or body. This pass-through behavior (in
place since v1.6.1) means custom gateway headers, openai-organization,
anthropic-version, and any other provider-specific auth or routing header all
reach the upstream as-is.
The following headers are stripped before proxying:
- Hop-by-hop (RFC 2616 §13.5.1): connection, keep-alive, transfer-encoding, te, trailer, upgrade, proxy-authorization, proxy-authenticate
- Set by the HTTP client from the target URL / body: host, content-length
- Not relevant for LLM APIs (avoid leaking or mismatched encoding): cookie, accept-encoding
Auth headers are never saved in recorded fixtures. The fixture only contains the match criteria (derived from the last user message) and the response content.
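The stripping rule amounts to a deny-list filter over the incoming headers. The sketch below uses the header names listed above; forwardableHeaders and STRIPPED_HEADERS are hypothetical names chosen for illustration.

```typescript
// Headers removed before proxying, per the three groups listed above.
const STRIPPED_HEADERS = new Set<string>([
  // Hop-by-hop
  "connection", "keep-alive", "transfer-encoding", "te",
  "trailer", "upgrade", "proxy-authorization", "proxy-authenticate",
  // Set by the HTTP client from the target URL / body
  "host", "content-length",
  // Not relevant for LLM APIs
  "cookie", "accept-encoding",
]);

function forwardableHeaders(incoming: Record<string, string>): Record<string, string> {
  const out: Record<string, string> = {};
  for (const name of Object.keys(incoming)) {
    // Header names are case-insensitive, so compare in lowercase.
    if (!STRIPPED_HEADERS.has(name.toLowerCase())) out[name] = incoming[name];
  }
  return out;
}
```

Because this is a deny-list rather than an allow-list, new provider-specific headers pass through without any change to the filter.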
Strict Mode
When --strict is enabled, unmatched requests that cannot be proxied (no
upstream configured for that provider) return 503 Service Unavailable
instead of the default 404. This is useful for CI environments where you want to catch
unexpected API calls.
Fixture Auto-Generation
Recorded fixtures are saved to disk with timestamped filenames:
// fixtures/recorded/openai-<YYYY-MM-DD>T<HH-MM-SS>-000Z-0.json
{
"fixtures": [
{
"match": { "userMessage": "What is the weather?" },
"response": { "content": "I don't have real-time weather data..." }
}
]
}
Match criteria are derived from the original request: the last user message becomes
userMessage, or for embedding requests, the input becomes
inputText. If no match criteria can be derived (e.g., empty messages), the
fixture is saved to disk with a warning but not registered in memory.
Model-Aware Recording
When recording fixtures, aimock automatically includes the model name in match criteria. This prevents collisions when your app makes multiple LLM calls with the same user message but different models (e.g., Opus for chat + Haiku for title generation).
Model names are normalized by stripping date/version suffixes so fixtures survive provider version bumps:
| Request Model | Recorded As |
|---|---|
| claude-opus-4-20250514 | claude-opus-4 |
| gpt-4o-2024-08-06 | gpt-4o |
| claude-3-5-sonnet-20241022 | claude-3-5-sonnet |
| llama3.1 | llama3.1 (no date suffix; unchanged) |
Matching uses prefix comparison, so model: "claude-opus-4" in a fixture
matches requests for claude-opus-4-20250514,
claude-opus-4-20250915, or any future version.
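Normalization and prefix matching can be approximated as follows. The regular expression is an assumption that covers only the two suffix shapes in the table (YYYYMMDD and YYYY-MM-DD); aimock's actual rules may handle more cases, and normalizeModel and modelMatches are hypothetical names.

```typescript
// Strip a trailing date suffix such as -20250514 or -2024-08-06.
function normalizeModel(model: string): string {
  return model.replace(/-(\d{8}|\d{4}-\d{2}-\d{2})$/, "");
}

// A fixture's model matches any request model that starts with it.
function modelMatches(fixtureModel: string, requestModel: string): boolean {
  return requestModel.startsWith(fixtureModel);
}
```

Note that a plain prefix check like this one also matches any longer model name that happens to share the prefix, which is the trade-off for surviving version bumps.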
To record the full model version instead (disabling normalization), set
recordFullModelVersion to true in the recording config:
{
"llm": {
"record": {
"providers": { "openai": "https://api.openai.com" },
"recordFullModelVersion": true
}
}
}
Or programmatically:
mock.enableRecording({
providers: { openai: "https://api.openai.com" },
fixturePath: "./fixtures/recorded",
recordFullModelVersion: true,
});
Context-Aware Recording
When a request carries an X-AIMock-Context header, the recorder automatically
captures the context value in match.context. On replay, fixtures with
context only match requests carrying that exact header value — fixtures
without context remain shared across all callers.
Directory routing
Without snapshot-style recording (X-Test-Id), recorded fixtures for a given
context are written to a <fixturePath>/<context>/ subdirectory:
fixtures/recorded/
openai-2026-05-18T10-30-00-000Z-a1b2c3d4.json # no context (shared)
langgraph-python/
openai-2026-05-18T10-30-01-000Z-e5f6a7b8.json # context = langgraph-python
crewai/
openai-2026-05-18T10-30-02-000Z-c9d0e1f2.json # context = crewai
When X-Test-Id is also present, snapshot-style paths take precedence and the
context is captured only in match.context within the fixture file, not in the
directory structure.
Sending X-AIMock-Context
// Set as a default header on your LLM client
const client = new OpenAI({
baseURL: "http://localhost:4010/v1",
apiKey: "mock",
defaultHeaders: { "X-AIMock-Context": "langgraph-python" },
});
Upstream Timeouts
By default, aimock aborts a proxied request if the upstream socket is idle for 30 seconds
(before the response body) or if no bytes arrive for 30 seconds during streaming.
Reasoning models under concurrent load can exceed these limits during the thinking phase.
Override with upstreamTimeoutMs and bodyTimeoutMs:
mock.enableRecording({
providers: { openai: "https://api.openai.com" },
fixturePath: "./fixtures/recorded",
bodyTimeoutMs: 180_000,
});
Or via CLI: --body-timeout-ms 180000. Values must be positive finite numbers;
zero, negative, NaN, and Infinity are rejected (CLI exits
non-zero; programmatic API falls back to the 30s default).
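The programmatic validation rule described above (positive finite numbers only, otherwise the 30s default) can be sketched as a small helper. resolveTimeoutMs is a hypothetical name, not part of aimock's API.

```typescript
const DEFAULT_TIMEOUT_MS = 30_000;

// Accept only positive finite numbers; anything else falls back to the default.
function resolveTimeoutMs(value: unknown): number {
  if (typeof value === "number" && Number.isFinite(value) && value > 0) {
    return value;
  }
  return DEFAULT_TIMEOUT_MS;
}
```

The CLI applies the same check but exits non-zero instead of falling back, so a typo in a flag fails fast rather than silently using 30 seconds.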
Drift Detection Metadata
Recorded fixtures include a metadata block with hashes of the system prompt
and tool definitions at recording time. These are informational only — not used for
matching — and help you detect when your prompts or tools have changed since the
fixture was recorded.
{
"match": { "userMessage": "hello", "model": "claude-opus-4" },
"metadata": { "systemHash": "a7f3c291", "toolsHash": "e4b12d08" },
"response": { "content": "Hi there!" }
}
When you re-record a fixture and the hashes differ from the previous version, it signals that your application’s prompts or tool definitions have evolved. This is useful for auditing fixture freshness — if the hashes don’t match, the recorded response may no longer reflect what the real provider would return for the current prompt.
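A comparison like the one described could be scripted when reviewing re-recorded fixtures. The sketch below only compares the two metadata fields shown above; detectDrift is a hypothetical helper, and it makes no assumption about how aimock computes the hashes.

```typescript
type FixtureMetadata = { systemHash?: string; toolsHash?: string };

// Return the names of the metadata hashes that differ between two recordings.
function detectDrift(previous: FixtureMetadata, current: FixtureMetadata): string[] {
  const changed: string[] = [];
  if (previous.systemHash !== current.systemHash) changed.push("systemHash");
  if (previous.toolsHash !== current.toolsHash) changed.push("toolsHash");
  return changed;
}
```

An empty result means the system prompt and tool definitions were the same at both recording times; any entry in the list is a hint that the older response may be stale.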
Snapshot-Style Recording
When the X-Test-Id header is present on a request, aimock uses
snapshot-style recording instead of the default timestamp-based
filenames. Fixtures are organized by test, producing stable file paths that work well with
version control and PR diffs.
Directory structure
The test ID is slugified into a directory name, and each provider gets its own file within that directory:
fixtures/recorded/
agent-chat--handles-tool-call/
openai.json # All OpenAI fixtures for this test
anthropic.json # All Anthropic fixtures for this test
simple-test/
openai.json
The slugify rules: a leading test-file segment (a file name ending in .spec.ts,
.test.tsx, .e2e.js, etc.) is automatically stripped from the
test ID before slugifying, so my-app.spec.ts › greeting becomes
greeting. Then Playwright's › separator becomes
--, non-word characters become -, runs of 3+ dashes collapse to
--, and the result is lowercased. For example,
"agent chat › handles tool call" becomes
agent-chat--handles-tool-call.
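The rules above can be approximated with a chain of replacements. This is a sketch under assumptions: slugifyTestId is a hypothetical name, and the file-segment pattern covers only .spec, .test, and .e2e files with JS/TS extensions, which may be narrower or broader than what aimock actually strips.

```typescript
function slugifyTestId(testId: string): string {
  return testId
    // Drop a leading "<file>.spec.ts › " style segment (assumed pattern).
    .replace(/^[^›]*\.(spec|test|e2e)\.[cm]?[jt]sx?\s*›\s*/, "")
    // Playwright's separator becomes a double dash.
    .replace(/›/g, "--")
    // Every non-word character (spaces, punctuation) becomes a dash.
    .replace(/\W/g, "-")
    // Runs of three or more dashes collapse to exactly two.
    .replace(/-{3,}/g, "--")
    .toLowerCase();
}
```

The order matters: the separator is expanded before non-word characters are replaced, so the spaces around it merge into one dash run that then collapses to --.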
Merge behavior on re-run
When you re-run a test, the new fixture is appended to the existing
<provider>.json file rather than overwriting it. This preserves
multi-turn conversations in a single file. If the existing file is corrupted (invalid
JSON), it is silently replaced.
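The append-or-replace behavior can be modeled as a pure function over the file's text. mergeIntoFile and the types below are hypothetical, and treating a file without a fixtures array as corrupted is an extra assumption of this sketch.

```typescript
type Fixture = { match: Record<string, unknown>; response: Record<string, unknown> };
type FixtureFile = { fixtures: Fixture[] };

// existingText is the current file content, or null if the file does not exist.
function mergeIntoFile(existingText: string | null, recorded: Fixture): string {
  let file: FixtureFile = { fixtures: [] };
  if (existingText !== null) {
    try {
      const parsed = JSON.parse(existingText);
      if (parsed && Array.isArray(parsed.fixtures)) file = parsed;
    } catch {
      // Invalid JSON: fall through and start from an empty fixture list.
    }
  }
  file.fixtures.push(recorded);
  return JSON.stringify(file, null, 2);
}
```

Because re-runs append rather than overwrite, repeated recording sessions for the same test grow the file; deleting it before a fresh recording avoids stale entries.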
Sending X-Test-Id from test frameworks
// Playwright exposes testInfo.titlePath: the test file name followed by suite and test titles
import { test } from "@playwright/test";
test("handles tool call", async ({ page }, testInfo) => {
// e.g. inside describe("agent chat"): titlePath = ["my-app.spec.ts", "agent chat", "handles tool call"]
const testId = testInfo.titlePath.join(" › ");
// Set on your OpenAI/Anthropic client config as a default header:
// headers: { "X-Test-Id": testId }
});
import { describe, it } from "vitest";
describe("agent chat", () => {
it("handles tool call", async () => {
// Pass X-Test-Id on each LLM request:
const resp = await fetch("http://localhost:4010/v1/chat/completions", {
method: "POST",
headers: { "Content-Type": "application/json", "X-Test-Id": "agent chat › handles tool call" },
// ...body
});
});
});
Fallback behavior
When no X-Test-Id header is present (or the value is
__default__), recording falls back to the standard timestamp-based filename:
<provider>-<timestamp>-<uuid>.json.
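Putting the snapshot, context, and fallback rules together, path selection might look like the sketch below. recordedFixturePath and its input shape are hypothetical; the timestamp and id are passed in already formatted, and slugify stands in for the slug rules described earlier.

```typescript
type PathInput = {
  fixturePath: string;
  provider: string;
  timestamp: string; // already formatted for use in a filename
  id: string; // unique suffix
  testId?: string; // X-Test-Id header value, if any
  context?: string; // X-AIMock-Context header value, if any
  slugify: (testId: string) => string;
};

function recordedFixturePath(input: PathInput): string {
  const { fixturePath, provider, timestamp, id, testId, context, slugify } = input;

  // Snapshot-style paths take precedence when a real test ID is present.
  if (testId !== undefined && testId !== "__default__") {
    return `${fixturePath}/${slugify(testId)}/${provider}.json`;
  }

  // Otherwise use a timestamped file, inside a context subdirectory if set.
  const file = `${provider}-${timestamp}-${id}.json`;
  return context ? `${fixturePath}/${context}/${file}` : `${fixturePath}/${file}`;
}
```

In the snapshot branch the context does not affect the path at all, which mirrors the rule that it is stored only in match.context in that case.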
Fixture Lifecycle
- On disk: Fixtures persist in the configured fixturePath directory (default: ./fixtures/recorded)
- In memory: Recorded fixtures are immediately available for matching subsequent requests in the same session
- After restart: Load the recorded fixture directory to replay previous recordings
Local Development Workflow
Record once against real APIs, then replay from fixtures for fast, offline development.
# First run: record real API responses
$ npx -p @copilotkit/aimock llmock --record --provider-openai https://api.openai.com -f ./fixtures
# Subsequent runs: replay from recorded fixtures
$ npx -p @copilotkit/aimock llmock -f ./fixtures
# First run: record real API responses
$ docker run -d -p 4010:4010 \
-v $(pwd)/fixtures:/fixtures \
ghcr.io/copilotkit/aimock \
--record --provider-openai https://api.openai.com -f /fixtures -h 0.0.0.0
# Subsequent runs: replay from recorded fixtures
$ docker run -d -p 4010:4010 \
-v $(pwd)/fixtures:/fixtures \
ghcr.io/copilotkit/aimock \
-f /fixtures -h 0.0.0.0
CI Pipeline Workflow
Use the Docker image in CI with --strict mode to ensure every request matches
a recorded fixture. No API keys needed, no flaky network calls.
- name: Start aimock
run: |
docker run -d --rm --name aimock \
-v $(pwd)/fixtures:/fixtures \
-p 4010:4010 \
ghcr.io/copilotkit/aimock \
--strict -f /fixtures -h 0.0.0.0
- name: Run tests
env:
OPENAI_BASE_URL: http://localhost:4010/v1
run: pnpm test
- name: Stop aimock
if: always()
run: docker rm -f aimock
Request Transform
Prompts often contain dynamic data — timestamps, UUIDs, session IDs — that
changes between runs. This causes fixture mismatches on replay because the recorded key no
longer matches the live request. The requestTransform option normalizes
requests before both matching and recording, stripping out the volatile parts.
import { LLMock } from "@copilotkit/aimock";
const mock = new LLMock({
requestTransform: (req) => ({
...req,
messages: req.messages.map((m) => ({
...m,
content:
typeof m.content === "string"
? m.content.replace(/\d{4}-\d{2}-\d{2}T[\d:.+Z-]+/g, "").trim()
: m.content,
})),
}),
});
// Fixture uses the cleaned key (no timestamp)
mock.onMessage("tell me the weather", { content: "Sunny" });
// Request with a timestamp still matches after transform
await mock.start();
When requestTransform is set, string matching for
userMessage and inputText switches from substring
(includes) to exact equality (===). This prevents shortened keys
from accidentally matching unrelated prompts. Without a transform, the existing
includes behavior is preserved for backward compatibility.
The transform is applied in both directions: recording saves the transformed match key (no timestamps in the fixture file), and matching transforms the incoming request before comparison. This means recorded fixtures and live requests always use the same normalized key.
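The switch between substring and exact matching can be illustrated in isolation. userMessageMatches and stripTimestamps are hypothetical helpers; the regular expression is the one from the requestTransform example above.

```typescript
// Same timestamp-stripping rule as the requestTransform example.
function stripTimestamps(text: string): string {
  return text.replace(/\d{4}-\d{2}-\d{2}T[\d:.+Z-]+/g, "").trim();
}

// With a transform configured, keys must match exactly; otherwise substring.
function userMessageMatches(fixtureKey: string, requestText: string, hasTransform: boolean): boolean {
  return hasTransform ? requestText === fixtureKey : requestText.includes(fixtureKey);
}
```

With the transform, "tell me the weather 2026-05-18T10:30:00Z" normalizes to the fixture key and matches exactly, while an unrelated prompt that merely contains the key no longer matches.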
Building Fixture Sets
A practical workflow for building and maintaining fixture sets:
- Run with --record against real APIs during development
- Review recorded fixtures in fixtures/recorded/
- Move and rename to organized fixture directories
- Switch to --strict mode in CI
- Re-record when upstream APIs change (drift detection catches this)
Recording Multi-Turn Conversations
The recorder is stateless across turns. Every incoming request is treated
as an independent unit: aimock derives fixture match criteria from a
single request at a time, and it doesn’t remember or hash prior turns of
the same test session. Two post-record remedies handle the two flavors of collision: add
toolCallId for tool-round follow-ups (covered on
the Multi-Turn Conversations page), or add
sequenceIndex for the same user prompt repeating (covered below).
Specifically, for chat requests the recorder reads only the
last user message of the request and saves it as
match.userMessage. For embedding requests it uses the input text. There is no
history-based keying — unlike VCR-style recorders that fingerprint the full request
body, aimock keys fixtures purely on the last user message.
// src/recorder.ts (simplified; real implementation guards against a null last-user-message)
function buildFixtureMatch(request) {
if (request.embeddingInput) {
return { inputText: request.embeddingInput };
}
// Chat/multimedia — key on the LAST user message only
const lastUser = getLastMessageByRole(request.messages, "user");
const match = { userMessage: getTextContent(lastUser.content) };
// Capture context from X-AIMock-Context header if present
if (request._context) match.context = request._context;
return match;
}
Consequence: identical final prompts collide
Two turns of the same conversation that happen to end with the same user message produce
two fixture entries that share the same match.userMessage. On replay, the
router picks the first fixture that matches (first-wins by file load
order), and the second fixture is effectively shadowed. This matters whenever a test says
“continue”, “yes”, “retry”, or any other short prompt
twice in a row.
Recommended workflow
- Run the test once under --record and let aimock capture one fixture per turn as usual.
- Review the recorded fixtures. For turns whose purpose is answering a tool call, rewrite the match to key on toolCallId instead of userMessage; this is the canonical tool-round idiom.
- For genuine repeats of the same user prompt, add sequenceIndex (0, 1, …) to each fixture to differentiate them by call order.
- Move the hand-edited fixtures into your organized fixture directory and switch to --strict for replay in CI.
Gotcha: shadowed repeats
If you record a conversation that sends “continue” twice, you will get two
fixtures with identical match.userMessage: "continue". On replay, the second
one will never fire until you add sequenceIndex post-record. See
Sequential / Stateful Responses for the mechanics. For
tool-round conversations where each post-tool turn carries a distinct
tool_call_id, prefer keying by toolCallId — see
Multi-Turn Conversations.
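The shadowing problem and the sequenceIndex remedy can be demonstrated with a toy router. This is a simplified model, not aimock's matcher: createRouter is hypothetical, matching is exact equality, and the assumption that sequenceIndex n matches only the (n+1)th call for a given user message should be checked against the Sequential / Stateful Responses page.

```typescript
type Fixture = {
  match: { userMessage: string; sequenceIndex?: number };
  response: { content: string };
};

function createRouter(fixtures: Fixture[]) {
  // Number of calls seen so far for each user message.
  const seen = new Map<string, number>();
  return (userMessage: string): string | undefined => {
    const callIndex = seen.get(userMessage) ?? 0;
    seen.set(userMessage, callIndex + 1);
    // First-wins: the earliest fixture that satisfies the criteria is used.
    const hit = fixtures.find(
      (f) =>
        f.match.userMessage === userMessage &&
        (f.match.sequenceIndex === undefined || f.match.sequenceIndex === callIndex),
    );
    return hit?.response.content;
  };
}
```

Without sequenceIndex, both "continue" calls resolve to the first fixture; once the fixtures carry sequenceIndex 0 and 1, the second call reaches the second fixture.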
Cross-Language Testing
The Docker image serves any language that speaks HTTP. Point your client at the mock server's URL instead of the real API.
# Docker image serves all languages
docker run -d -p 4010:4010 -v $(pwd)/fixtures:/fixtures ghcr.io/copilotkit/aimock -f /fixtures -h 0.0.0.0
# Python
import openai
client = openai.OpenAI(base_url="http://localhost:4010/v1", api_key="mock")
// Go: github.com/sashabaranov/go-openai
config := openai.DefaultConfig("mock")
config.BaseURL = "http://localhost:4010/v1"
client := openai.NewClientWithConfig(config)
// Rust: async-openai
let config = OpenAIConfig::new()
.with_api_base("http://localhost:4010/v1")
.with_api_key("mock");
let client = Client::with_config(config);