Fixtures

Fixtures define what the mock server returns. Each fixture has match criteria and a response. Load them from JSON files, register them programmatically, or mix both approaches.

File Format

fixtures/example.json json

{
  "fixtures": [
    {
      "match": {
        "userMessage": "hello",
        "model": "gpt-4"
      },
      "response": {
        "content": "Hello!"
      },
      "latency": 200,
      "chunkSize": 10
    }
  ]
}

Match Fields

Field	Type	Description
userMessage	string | RegExp	Match on the last user message: a string (substring, or exact when `requestTransform` is set) or a regex (pattern match)
inputText	string | RegExp	Match on embedding input text
toolCallId	string	Match on `tool_call_id` of the last `role: "tool"` message. The `onToolResult(id, response)` helper is sugar over this field
toolName	string	Match on tool function name — compared against the names of tool definitions in the request’s `tools:` array
model	string | RegExp	Match on the requested model name
responseFormat	string	Match on response_format.type (e.g. "json_object")
sequenceIndex	number	Match on the Nth occurrence of this pattern
turnIndex	number	Count of `role: "assistant"` messages in the request. Stateless — derived from request content, safe for shared instances. See Multi-Turn Conversations
hasToolResult	boolean	`true` when at least one `role: "tool"` message is present; `false` when none are. Stateless alternative to ordering fixtures by `toolCallId`. See Multi-Turn Conversations
endpoint	string	Restrict to endpoint type: chat, image, speech, transcription, video, embedding. Search, rerank, and moderation services (added in 1.7.0) are registered through their own fixture APIs rather than via this field
context	string	Restrict to a named context via `X-AIMock-Context` header. Fixtures with `context` only match requests carrying that exact value; fixtures without `context` match any request. Same opt-in semantics as `endpoint`
predicate	function	Custom function: (req) => boolean (programmatic only)
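
The stateless fields turnIndex and hasToolResult are described above as derived purely from request content. A minimal sketch of that derivation, assuming an OpenAI-style messages array; the `ChatMessage` type and both function names are illustrative, not part of the library's API:

```typescript
// Hypothetical message shape (assumption: OpenAI-style chat messages).
type ChatMessage = {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
};

// turnIndex: the count of role "assistant" messages in the request.
function deriveTurnIndex(messages: ChatMessage[]): number {
  return messages.filter((m) => m.role === "assistant").length;
}

// hasToolResult: true when at least one role "tool" message is present.
function deriveHasToolResult(messages: ChatMessage[]): boolean {
  return messages.some((m) => m.role === "tool");
}

const conversation: ChatMessage[] = [
  { role: "user", content: "What's the weather in SF?" },
  { role: "assistant", content: "" },
  { role: "tool", content: "72F" },
];

console.log(deriveTurnIndex(conversation)); // 1
console.log(deriveHasToolResult(conversation)); // true
```

Because both values depend only on the request body, two test workers sharing one mock instance would compute the same result for the same request.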

Matching Semantics

These are the rules the router uses to pick a fixture for a given request. All fields on match are AND-ed — every one must pass for the fixture to be selected.

1. `userMessage` matches only the LAST user message

userMessage is compared against the content of the last message with role: "user" in the request. Earlier user messages in the conversation history are ignored. A request that contains ten turns of prior history plus one new user turn only matches against that final turn — never against anything earlier.

This is the single rule that trips people up most often. If you need to differentiate conversations by earlier context (for example, to return a different response on the second round of a tool-using conversation), use toolCallId, sequenceIndex, or a predicate instead of piling keywords into userMessage. See Multi-Turn Conversations for the tool-round idiom.
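
To make the rule concrete, here is a small sketch of "last user message only" matching. It is an illustration under assumptions (the `UserTurnMessage` type and `lastUserContent` helper are hypothetical), not the router's actual code:

```typescript
// Hypothetical message shape for illustration.
type UserTurnMessage = { role: string; content: string };

// Walk backwards and return the content of the final role "user" message.
function lastUserContent(messages: UserTurnMessage[]): string | undefined {
  for (let i = messages.length - 1; i >= 0; i--) {
    const message = messages[i];
    if (message && message.role === "user") return message.content;
  }
  return undefined;
}

const multiTurn: UserTurnMessage[] = [
  { role: "user", content: "hello" },
  { role: "assistant", content: "Hi! How can I help?" },
  { role: "user", content: "what is the weather?" },
];

// "hello" appears in the history, but not in the final user turn.
const matchesHello = (lastUserContent(multiTurn) ?? "").includes("hello");
console.log(matchesHello); // false
```

A fixture with userMessage: "hello" would therefore not be selected for this request, even though an earlier turn contains the word.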

2. `toolCallId` matches the LAST tool message

toolCallId is compared against the tool_call_id of the last role: "tool" message in the request — regardless of whether that’s the overall last message. If no tool message is present in the history, toolCallId never matches. See Multi-Turn Conversations for the tool-round idiom.
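
The same idea applies to tool messages. The sketch below (hypothetical `ToolAwareMessage` type and `lastToolCallId` helper, assuming OpenAI-style `tool_call_id` fields) shows that the last tool message is used even when a user message follows it:

```typescript
// Hypothetical message shape for illustration.
type ToolAwareMessage = { role: string; content: string; tool_call_id?: string };

// Return the tool_call_id of the final role "tool" message, if any.
function lastToolCallId(messages: ToolAwareMessage[]): string | undefined {
  for (let i = messages.length - 1; i >= 0; i--) {
    const message = messages[i];
    if (message && message.role === "tool") return message.tool_call_id;
  }
  return undefined;
}

const toolRound: ToolAwareMessage[] = [
  { role: "user", content: "weather and time in SF?" },
  { role: "tool", content: "72F", tool_call_id: "call_1" },
  { role: "tool", content: "10:00", tool_call_id: "call_2" },
  { role: "user", content: "thanks" },
];

console.log(lastToolCallId(toolRound)); // call_2
```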

3. First match wins, in file order

Fixtures are evaluated in the order they were registered. The first fixture whose match criteria all pass is returned — subsequent fixtures are not consulted. For file-loaded fixtures, that means order within the JSON array. For loadFixtureDir(), files are loaded in sorted filename order, so a 00-catchall.json loaded before 10-specific.json will shadow the specific fixture. Put more specific fixtures before broader ones.

sequenceIndex lets a single pattern return different responses on repeated matches — see Sequential Responses.
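
The ordering rule can be sketched as a linear scan that stops at the first passing fixture. The `SketchFixture` type and `pickFixture` function are hypothetical stand-ins, assuming nothing about the router beyond the rule stated above:

```typescript
// Hypothetical fixture shape: a name plus a single match function.
type SketchFixture = { name: string; matches: (text: string) => boolean };

// Evaluate in registration order; the first passing fixture wins.
function pickFixture(fixtures: SketchFixture[], text: string): SketchFixture | undefined {
  return fixtures.find((fixture) => fixture.matches(text));
}

const catchAll: SketchFixture = { name: "catch-all", matches: () => true };
const specific: SketchFixture = {
  name: "weather",
  matches: (text) => text.includes("weather"),
};

// Catch-all registered first: the specific fixture is never reached.
const shadowed = pickFixture([catchAll, specific], "weather in SF");
// Specific fixture registered first: it wins, and the catch-all stays a fallback.
const ordered = pickFixture([specific, catchAll], "weather in SF");

console.log(shadowed?.name); // catch-all
console.log(ordered?.name); // weather
```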

4. Substring by default, exact when a `requestTransform` is set

By default, a string userMessage (or inputText) matches via String.includes, so userMessage: "hello" matches "say hello world". Pass a RegExp when you need pattern matching.

If the router is configured with a requestTransform (typically used to strip dynamic data like timestamps or UUIDs from the request before matching), string userMessage and inputText flip to strict equality (===). The rationale: transforms normalize requests to a canonical form, and once normalized, the sensible comparison is exact — substring matching on a normalized string is more likely to hide bugs than catch flexible input.
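
The two comparison modes can be summarized in a few lines. This is a sketch under assumptions: `matchesUserMessage` and its `hasRequestTransform` flag are hypothetical, and it assumes RegExp patterns behave the same whether or not a transform is configured:

```typescript
// Hypothetical matcher illustrating substring vs. exact comparison.
function matchesUserMessage(
  pattern: string | RegExp,
  text: string,
  hasRequestTransform: boolean,
): boolean {
  // RegExp patterns are tested as-is (assumption: unaffected by the transform).
  if (pattern instanceof RegExp) return pattern.test(text);
  // Strings: strict equality with a transform, substring match without one.
  return hasRequestTransform ? text === pattern : text.includes(pattern);
}

console.log(matchesUserMessage("hello", "say hello world", false)); // true
console.log(matchesUserMessage("hello", "say hello world", true)); // false
console.log(matchesUserMessage("hello", "hello", true)); // true
```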

5. Validation warnings surface shadowing at load time

validateFixtures() runs when fixtures are loaded and emits warnings for common shadowing mistakes:

Duplicate userMessage: two fixtures with the same string userMessage produce a warning of the form duplicate userMessage 'hello' — shadows fixture 0, where 'hello' is the duplicated message and 0 is the zero-based index of the earlier fixture in the collision. The check factors in turnIndex, hasToolResult, context, and sequenceIndex when deciding whether two fixtures collide, but it does not consider toolCallId, model, or predicate, so the warning can still fire when one of those discriminators separates the fixtures. Treat it as advisory, not a hard error: if such a differentiator is in place, the fixtures do not shadow each other at match time and the warning is safe to ignore. If there is no differentiator at all, the second fixture is never reached because the first always wins, and the warning deserves investigation.
Catch-all not last: a fixture with an empty match (no discriminator fields) matches everything. If it is not the final fixture, every fixture after it is unreachable. The warning is of the form empty match acts as catch-all but is not the last fixture — shadows fixtures 3+, where 3 is the zero-based index of the first shadowed fixture (that is, every fixture from that index onward is shadowed).
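
The two checks above can be approximated as follows. This is a rough sketch, not the real validateFixtures(): the `MatchSketch` and `ShadowWarning` types and the `findShadowing` function are hypothetical, and it returns structured warnings rather than the message strings quoted above.

```typescript
// Hypothetical subset of match fields that the duplicate check considers.
type MatchSketch = {
  userMessage?: string;
  turnIndex?: number;
  hasToolResult?: boolean;
  context?: string;
  sequenceIndex?: number;
};

type ShadowWarning =
  | { kind: "duplicate-userMessage"; index: number; earlierIndex: number }
  | { kind: "catch-all-not-last"; index: number; firstShadowed: number };

function findShadowing(matches: MatchSketch[]): ShadowWarning[] {
  const warnings: ShadowWarning[] = [];
  const firstSeen = new Map<string, number>();
  matches.forEach((match, index) => {
    // An empty match is a catch-all; warn when anything follows it.
    if (Object.keys(match).length === 0 && index < matches.length - 1) {
      warnings.push({ kind: "catch-all-not-last", index, firstShadowed: index + 1 });
    }
    if (typeof match.userMessage === "string") {
      // Two fixtures collide only if the message and all considered
      // discriminators are identical.
      const key = JSON.stringify([
        match.userMessage,
        match.turnIndex ?? null,
        match.hasToolResult ?? null,
        match.context ?? null,
        match.sequenceIndex ?? null,
      ]);
      const earlierIndex = firstSeen.get(key);
      if (earlierIndex === undefined) firstSeen.set(key, index);
      else warnings.push({ kind: "duplicate-userMessage", index, earlierIndex });
    }
  });
  return warnings;
}

const sampleWarnings = findShadowing([
  { userMessage: "hello" },
  { userMessage: "hello" }, // collides with index 0
  { userMessage: "hello", turnIndex: 1 }, // differentiated by turnIndex
  {}, // catch-all, but not last
  { userMessage: "bye" },
]);
console.log(sampleWarnings.length); // 2
```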

6. Use `predicate` for arbitrary logic

When the built-in match fields can't express the condition you need, a predicate function receives the full request and returns a boolean. It is the escape hatch for anything from inspecting the assistant's prior tool call arguments to gating on system-prompt content. Predicates are programmatic-only — JSON fixture files cannot serialize functions.

predicate.ts ts

mock.on(
  { predicate: (req) => req.messages.at(-1)?.role === "tool" },
  { content: "Done!" }
);

Response Types

Type	Fields	Description
Text	content, role?, finishReason?, reasoning?, webSearches?	Plain text response
Tool Call	toolCalls[], finishReason?	Function call(s) with name + arguments
Content + Tool Calls	content, toolCalls[], blocks?, reasoning?, finishReason?	Text and tool calls in a single response. Add an optional `blocks` array to control stream order (e.g. tool-first) — see Ordered blocks below.
Error	error.message, error.type?, status?	Error response with HTTP status
Embedding	embedding[]	Vector of numbers
Image	image.url or images[].url	Generated image URL(s) or base64 data
Speech	audio	Base64-encoded audio data
Transcription	transcription.text, words?, segments?	Transcribed text with optional timestamps
Video	video.id, video.status, video.url?, video.error?, video.b64?, video.cost?	Generated video with async polling — `error` is the failure message surfaced by async video jobs, `b64` is base64-encoded video bytes served by content-download endpoints, `cost` is the generation cost surfaced in usage envelopes

Override fields: Text, Tool Call, and Content + Tool Calls responses also accept the override fields listed below (id, model, usage, finishReason, role, systemFingerprint, created).

Ordered blocks (tool-first & interleaved streaming)

By default a Content + Tool Calls response streams its text first, then its tool calls. To control that order — for example to emit a tool call before any text (“tool-first”), or to interleave text and tool calls — add an optional blocks array. Each entry is one of:

{ "type": "text", "text": "..." } — a text segment
{ "type": "toolCall", "name": "...", "arguments": ..., "id": "..." } — a tool call (id optional; arguments accepts an object or string, same auto-stringify rules as elsewhere)

When blocks is present it takes precedence over the content and toolCalls fields for stream ordering: the blocks are streamed in array order. When blocks is absent, legacy { content, toolCalls } fixtures stream exactly as before — text-first, byte-identical to prior releases. The field is purely additive.

Blocks-only fixtures (first-class)

A fixture can be written with only a blocks array — no content or toolCalls needed. A non-empty blocks array is a first-class response shape: the builders derive the aggregate text and tool calls from the blocks themselves, and validateFixtures() accepts it without requiring the legacy fields. This is the cleanest way to author a tool-first or interleaved response — you express the order once, in one place, with no duplicated aggregate to keep in sync.

tool-first.json json

{
  "blocks": [
    { "type": "toolCall", "name": "get_weather", "arguments": { "city": "SF" }, "id": "call_1" },
    { "type": "text", "text": "Here is the weather." }
  ]
}

The example above streams the get_weather tool call before the text, with no separate content / toolCalls fields. For an interleaved stream, list blocks in the desired order, e.g. [toolCall, text, toolCall].

You may still supply content and toolCalls alongside blocks if you want an explicit aggregate — for example to assert a specific merged shape independently of the order. Both forms are supported; blocks always wins for stream ordering.
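
As an illustration of how an aggregate could be derived from a blocks-only fixture, consider the sketch below. The `ContentBlock` type and `aggregateBlocks` function are hypothetical, and joining text segments by plain concatenation is an assumption rather than documented builder behavior:

```typescript
// Block shapes as described above; the type name is illustrative.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "toolCall"; name: string; arguments: string | object; id?: string };

type AggregatedToolCall = { name: string; arguments: string; id: string | undefined };

// Derive the legacy-style aggregate (content + toolCalls) from ordered blocks.
function aggregateBlocks(blocks: ContentBlock[]): {
  content: string;
  toolCalls: AggregatedToolCall[];
} {
  let content = "";
  const toolCalls: AggregatedToolCall[] = [];
  for (const block of blocks) {
    if (block.type === "text") {
      content += block.text; // assumption: segments are concatenated as-is
    } else {
      const args =
        typeof block.arguments === "string"
          ? block.arguments
          : JSON.stringify(block.arguments);
      toolCalls.push({ name: block.name, arguments: args, id: block.id });
    }
  }
  return { content, toolCalls };
}

const toolFirst = aggregateBlocks([
  { type: "toolCall", name: "get_weather", arguments: { city: "SF" }, id: "call_1" },
  { type: "text", text: "Here is the weather." },
]);
console.log(toolFirst.content); // Here is the weather.
console.log(toolFirst.toolCalls[0]?.arguments); // {"city":"SF"}
```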

Validation: validateFixtures() checks a blocks array at load time so a malformed array is rejected before it reaches a builder — blocks must be an array; each entry must be an object with type "text" or "toolCall"; a text block needs a non-empty string text; a toolCall block needs a non-empty name, arguments that are a valid-JSON string or an object, and an optional string id. If a fixture carries both blocks and legacy content/toolCalls that disagree, loading warns (the redundant legacy fields are ignored in favor of blocks).
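
The rules listed above translate fairly directly into code. The following is a sketch of equivalent checks, not the library's validator: `blockErrors` and its error strings are hypothetical, and rejecting null as arguments is an assumption.

```typescript
// Return a list of human-readable problems; an empty list means the array passes.
function blockErrors(blocks: unknown): string[] {
  if (!Array.isArray(blocks)) return ["blocks must be an array"];
  const errors: string[] = [];
  blocks.forEach((entry: unknown, index: number) => {
    if (typeof entry !== "object" || entry === null) {
      errors.push(`blocks[${index}] must be an object`);
      return;
    }
    const block = entry as Record<string, unknown>;
    const type = block["type"];
    if (type === "text") {
      const text = block["text"];
      if (typeof text !== "string" || text.length === 0) {
        errors.push(`blocks[${index}].text must be a non-empty string`);
      }
    } else if (type === "toolCall") {
      const name = block["name"];
      const args = block["arguments"];
      const id = block["id"];
      if (typeof name !== "string" || name.length === 0) {
        errors.push(`blocks[${index}].name must be a non-empty string`);
      }
      if (typeof args === "string") {
        try {
          JSON.parse(args); // string arguments must be valid JSON
        } catch {
          errors.push(`blocks[${index}].arguments is not valid JSON`);
        }
      } else if (typeof args !== "object" || args === null) {
        errors.push(`blocks[${index}].arguments must be a JSON string or an object`);
      }
      if (id !== undefined && typeof id !== "string") {
        errors.push(`blocks[${index}].id must be a string when present`);
      }
    } else {
      errors.push(`blocks[${index}].type must be "text" or "toolCall"`);
    }
  });
  return errors;
}

console.log(blockErrors([{ type: "text", text: "" }]).length); // 1
```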

Per-provider observability

How faithfully “tool-first” / interleaved order is observable depends on each provider's wire protocol — and, for some providers, on whether the request is streaming. The mock always emits in block order; what a client can reconstruct from the result varies. A shape is Full when the wire carries the blocks in a single positionally-ordered structure (indexed content blocks, ordered output items, ordered steps); it is Non-observable when text and tool calls land in separate top-level fields that the client merges without a shared order. It is Partial when block order is carried on the wire (chunk arrival order) but the structure is not positionally indexed, so some clients reassemble positionally rather than honoring arrival order — observable best-effort, not guaranteed. The classifications below were verified against each provider's builder.

Provider / shape	Block-order support	Notes
Anthropic (Claude Messages)	Full	Typed `text` / `tool_use` content blocks at incrementing indices — tool-first and interleaved are natively observable, streaming and non-streaming alike.
OpenAI Responses API	Full	Ordered `output` items (message vs `function_call`) carry `output_index` — SDKs honor the order, so a tool call can precede the message.
Gemini	Full	Ordered parts/candidate chunks carry `functionCall` and text in any order.
Gemini Interactions (replay)	Full	One step per block in array order — a `function_call` step takes a lower `index` than a later `model_output` step, streaming (`step.*` events) and non-streaming (`steps[]`) alike. Record side is args-normalization only — see the note below.
Bedrock invoke	Full	Mirrors the Anthropic Messages content array: ordered `text` / `tool_use` entries non-streaming, indexed `content_block_*` events streaming — tool-first is wire-expressible on both.
Bedrock Converse	Full	Positional `content[]` blocks non-streaming, indexed `contentBlock*` events (carrying `contentBlockIndex`) streaming — a `toolUse` can precede the text on both.
Cohere (streaming)	Full	SSE emits `content-` and `tool-call-` events in block array order, each carrying an `index` — tool-first / interleaved is observable on the stream.
Ollama (streaming)	Partial	A `tool_calls` chunk can be emitted before content on the wire, but some clients reassemble positionally. Best-effort.
OpenAI chat-completions	Non-observable	`delta.content` and `delta.tool_calls` (streaming), or `message.content` and `message.tool_calls` (non-streaming), are separate channels/fields the client merges. The mock emits in block order and the streamed wire order is assertable, but the merge is not positionally interleaved, so tool-first is not semantically observable to clients on this channel.
Cohere (non-streaming)	Non-observable	The non-streaming body keeps text in `message.content[]` and tool calls in the separate `message.tool_calls[]` field — the relative order of a text vs. a toolCall block is not on the wire. Use the streaming shape when order matters.
Ollama (non-streaming)	Non-observable	The aggregated reply carries `message.content` and `message.tool_calls` as separate fields — no positional ordering between a text and a toolCall block. Use the streaming shape when order matters.

Recording: In record mode the recorder only persists a blocks array when the recorded upstream stream was genuinely tool-first or interleaved (a tool-call delta arrives before the first content delta, or content arrives after a tool-call delta). Ordinary text-then-tools streams are saved in the legacy { content, toolCalls } shape with no blocks key, so existing golden recordings round-trip byte-identically. The Cohere and Bedrock collapsers capture block order this way alongside the original providers.

Gemini Interactions is the exception: its record-side collapser normalizes tool-call arguments only and does not reorder blocks on capture — its step-index protocol can't reconcile arrival-order blocks at record time. Ordering is still honored on replay from a hand-authored blocks fixture; it is simply not reconstructed automatically from a recording.

JSON auto-stringify: In fixture files and the programmatic API, arguments and content fields accept both objects and strings. Objects are automatically stringified via JSON.stringify(). Prefer the object form for readability, since it avoids hand-escaped JSON strings.
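
In effect, the object form and the pre-escaped string form describe the same payload. A one-function sketch of that normalization (`autoStringify` is a hypothetical name, not an exported helper):

```typescript
// Strings pass through unchanged; objects are serialized with JSON.stringify.
function autoStringify(value: string | object): string {
  return typeof value === "string" ? value : JSON.stringify(value);
}

const fromObject = autoStringify({ city: "SF" });
const fromString = autoStringify('{"city":"SF"}');
console.log(fromObject === fromString); // true
```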

Dynamic responses: Responses can also be sync or async functions that receive the request and return the response dynamically. See Dynamic Responses on the Examples page.

Response Override Fields

Fixture responses can include optional fields to override auto-generated envelope values. These map across the OpenAI, Claude, Gemini, and Responses API formats; the Provider Support Matrix lists the providers that do not support overrides.

Field	Type	Description
id	string	Override auto-generated response ID
created	number	Override Unix timestamp
model	string	Override model name in response
usage	object	Override token counts: `{ prompt_tokens, completion_tokens, total_tokens }`. Also accepts Anthropic field names (`input_tokens`, `output_tokens`) and Gemini field names (`promptTokenCount`, `candidatesTokenCount`, `totalTokenCount`). OpenAI Chat Completions includes usage in the response body; the Responses API uses a separate `response.usage` object. When omitted, token counts are auto-computed from content length
finishReason	string	Override finish reason (default: "stop" or "tool_calls"). Provider mappings: `stop` → `end_turn` (Claude), `STOP` (Gemini), `completed` (Responses API); `tool_calls` → `tool_use` (Claude), `FUNCTION_CALL` (Gemini), `completed` (Responses API); `length` → `max_tokens` (Claude), `MAX_TOKENS` (Gemini), `incomplete` (Responses API); `content_filter` → `SAFETY` (Gemini), `failed` (Responses API)
role	string	Override message role (default: "assistant")
systemFingerprint	string	Add system_fingerprint to response
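
The finishReason mappings in the table can be read as a lookup keyed by the fixture value and the provider format. The sketch below encodes only the mappings listed above; `ProviderFormat`, `FINISH_REASON_MAP`, and `mapFinishReason` are hypothetical names, and returning undefined for an unlisted combination is an assumption about what a caller would want, not documented behavior.

```typescript
type ProviderFormat = "claude" | "gemini" | "responses";

// Mappings copied from the finishReason row of the table above.
const FINISH_REASON_MAP: Record<string, Partial<Record<ProviderFormat, string>>> = {
  stop: { claude: "end_turn", gemini: "STOP", responses: "completed" },
  tool_calls: { claude: "tool_use", gemini: "FUNCTION_CALL", responses: "completed" },
  length: { claude: "max_tokens", gemini: "MAX_TOKENS", responses: "incomplete" },
  content_filter: { gemini: "SAFETY", responses: "failed" },
};

function mapFinishReason(reason: string, provider: ProviderFormat): string | undefined {
  return FINISH_REASON_MAP[reason]?.[provider];
}

console.log(mapFinishReason("length", "claude")); // max_tokens
console.log(mapFinishReason("tool_calls", "gemini")); // FUNCTION_CALL
```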

Fixture Options

Field	Type	Description
latency	number	Delay in milliseconds between SSE chunks (streaming)
chunkSize	number	Characters per SSE chunk (streaming)
truncateAfterChunks	number	Abort stream after N chunks (error injection)
disconnectAfterMs	number	Disconnect after N ms (error injection)
streamingProfile	object	Streaming physics profile: `{ ttft, tps, jitter }`. See Streaming Physics
chaos	object	Per-fixture chaos config: `{ dropRate, malformedRate, disconnectRate }`. See Chaos Testing
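
To show how chunkSize and truncateAfterChunks interact, here is a small sketch. It assumes chunking by UTF-16 code units via String.slice, which may differ from how the server actually splits content; `chunkContent` is a hypothetical helper.

```typescript
// Split content into pieces of at most chunkSize characters.
function chunkContent(content: string, chunkSize: number): string[] {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  const chunks: string[] = [];
  for (let i = 0; i < content.length; i += chunkSize) {
    chunks.push(content.slice(i, i + chunkSize));
  }
  return chunks;
}

const allChunks = chunkContent("Hello, world!", 5);
console.log(allChunks); // [ 'Hello', ', wor', 'ld!' ]

// A truncateAfterChunks value of 2 would correspond to sending only these:
const truncated = allChunks.slice(0, 2);
console.log(truncated.join("")); // Hello, wor
```

With latency: 200, a three-chunk response like this one would take roughly 400 ms of inter-chunk delay, assuming the delay applies between chunks as the table states.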

Loading Fixtures

From a file

load-file.ts ts

const mock = new LLMock();
mock.loadFixtureFile("./fixtures/chat.json");
mock.loadFixtureFile("./fixtures/tools.json");

From a directory

load-dir.ts ts

// Loads all .json files in the directory (non-recursive)
mock.loadFixtureDir("./fixtures");

Snapshot-style recording: When recording with X-Test-Id, fixtures are automatically organized into per-test directories (<fixturePath>/<test-slug>/<provider>.json). See Snapshot-Style Recording for details.

Context-scoped fixtures

fixtures/context-example.json json

{
  "fixtures": [
    {
      "match": { "userMessage": "hello", "context": "langgraph-python" },
      "response": { "content": "Hi from LangGraph!" }
    },
    {
      "match": { "userMessage": "hello" },
      "response": { "content": "Hi from the shared fallback!" }
    }
  ]
}

Requests with X-AIMock-Context: langgraph-python match the first fixture; all other requests fall through to the shared fixture.
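
The opt-in semantics reduce to a single comparison. A minimal sketch, with `contextMatches` as a hypothetical helper and the header value passed in as a plain string:

```typescript
// A fixture without context matches any request; a fixture with context
// matches only requests whose X-AIMock-Context header has that exact value.
function contextMatches(
  fixtureContext: string | undefined,
  headerValue: string | undefined,
): boolean {
  return fixtureContext === undefined || fixtureContext === headerValue;
}

console.log(contextMatches("langgraph-python", "langgraph-python")); // true
console.log(contextMatches("langgraph-python", undefined)); // false
console.log(contextMatches(undefined, "anything")); // true
```

Combined with first-match-wins ordering, this is why the context-scoped fixture is listed before the shared fallback in the example above.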

Programmatically

programmatic.ts ts

// Shorthand methods
mock.onMessage("hello", { content: "Hi!" });
mock.onToolCall("get_weather", { content: "72F" });
mock.onEmbedding("my text", { embedding: [0.1, 0.2] });
mock.onImage("sunset", { image: { url: "https://example.com/sunset.png" } });
mock.onSpeech("hello", { audio: "SGVsbG8=" });
mock.onTranscription({ transcription: { text: "Hello" } });
mock.onVideo("cats", { video: { id: "vid-1", status: "completed", url: "https://example.com/cats.mp4" } });
mock.onJsonOutput("data", { key: "value" });
mock.onToolResult("call_123", { content: "Done" });

// Full fixture object
mock.addFixture({
  match: { userMessage: "hello", model: "gpt-4" },
  response: { content: "Hi!" },
  latency: 100,
  chunkSize: 5,
});

// Predicate-based routing
mock.on(
  { predicate: (req) => req.messages.at(-1)?.role === "tool" },
  { content: "Done!" }
);

JSON files cannot use predicate (functions can't be serialized). Use programmatic registration for predicate-based routing.

onTranscription takes the response object directly — there is no user-provided input to match against, unlike onMessage / onToolCall / onEmbedding. Every transcription request matches the same fixture.

Provider Support Matrix

Feature	OpenAI Chat	OpenAI Responses	Claude	Gemini	Gemini Int.	Vertex AI	Bedrock	Azure	Ollama	Cohere
Text	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes
Tool Calls	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes
Content + Tool Calls	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes
Streaming	SSE	SSE	SSE	SSE	SSE	SSE	Binary EventStream	SSE	NDJSON	SSE
Reasoning	Yes	Yes	Yes	Yes	Record only^†	Yes	Yes	Yes	Yes	Yes
Web Searches	—	Yes	—	—	—	—	—	—	—	—
Response Overrides	Yes	Yes	Yes	Yes	Yes	Yes	—	Yes^*	—	—

^* Azure inherits OpenAI’s override support because Azure OpenAI routes through the OpenAI Chat Completions response format internally.

^† Gemini Interactions captures reasoning on record (its collapser assembles thought_summary deltas into reasoning), but its replay builders do not re-emit reasoning, so a replayed turn carries none.