fal.ai
aimock mocks the fal.ai inference API — queue-based and synchronous runs for image,
video, audio, and any other fal model. Routes by x-fal-target-host header to
mirror the @fal-ai/client proxy convention.
How It Works
The @fal-ai/client SDK routes requests through a proxy using the
x-fal-target-host header to indicate the upstream fal service:
- queue.fal.run — queue-based operations (submit, poll status, fetch result)
- fal.run — synchronous single-shot runs
- rest.fal.ai — storage and other REST endpoints
aimock intercepts POST /fal/{owner}/{model} requests bearing this header and
handles the full queue lifecycle: it auto-mints a request_id, returns status
and response URLs, and serves the matched fixture payload on result fetch.
Legacy path-based routing is also supported —
/fal/queue/submit/{model}, /fal/queue/requests/{id}, and
/fal/run/{model} all continue to work for backward compatibility.
Quick Start (Programmatic)
import { LLMock } from "@copilotkit/aimock";
import { describe, it, expect, beforeAll, afterAll } from "vitest";
let mock: LLMock;
beforeAll(async () => {
mock = new LLMock();
await mock.start();
});
afterAll(async () => {
await mock.stop();
});
it("image generation via queue", async () => {
// Register a queue fixture for Flux image generation
mock.onFalQueue(/flux/, { images: [{ url: "https://example.com/cat.png" }] });
// Submit → status → result, just like the real API
});
it("video generation via queue", async () => {
mock.onFalQueue(/kling/, { video: { url: "https://example.com/v.mp4" } });
});
it("synchronous transcription", async () => {
// Sync runs skip the queue entirely
mock.onFalRun(/whisper/, { text: "Hello world" });
});
Typed Helpers: onFalImage / onFalVideo
onFalQueue takes a raw JSON payload — the exact bytes that come out of fal.
When you want stronger types and don't want to hand-write the envelope, use the typed
helpers: they accept the same ImageResponse /
VideoResponse shapes you use with onImage / onVideo
and translate them into fal's wire shape before storing.
// Equivalent to onFalQueue(..., { images: [...], timings, seed, has_nsfw_concepts, prompt })
mock.onFalImage(/flux/, {
images: [{ url: "https://mock.fal.media/x.png" }],
});
// Equivalent to onFalQueue(..., { video: { url, content_type, file_name, file_size }, seed })
mock.onFalVideo(/kling/, {
video: { id: "v1", status: "completed", url: "https://mock.fal.media/clip.mp4" },
});
Defaults filled in:
- Image: width: 1024, height: 1024, content_type inferred from the URL extension, has_nsfw_concepts: [false, …] (one per image), timings.inference: 0, seed: 0.
- Video: content_type and file_name inferred from the URL, file_size: 0, seed: 0.
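The image defaulting described above can be sketched as a plain function. This is an illustrative model of the documented behavior, not aimock's implementation; `fillImageDefaults` and `contentTypeFromUrl` are hypothetical names:

```typescript
// Illustrative sketch of the image defaults described above; not part of the aimock API.
type ImageInput = { url: string; width?: number; height?: number; content_type?: string };

function contentTypeFromUrl(url: string): string {
  const ext = url.split(".").pop()?.toLowerCase();
  const map: Record<string, string> = {
    png: "image/png", jpg: "image/jpeg", jpeg: "image/jpeg", webp: "image/webp",
  };
  return map[ext ?? ""] ?? "application/octet-stream";
}

function fillImageDefaults(images: ImageInput[]) {
  return {
    images: images.map((img) => ({
      url: img.url,
      width: img.width ?? 1024,
      height: img.height ?? 1024,
      content_type: img.content_type ?? contentTypeFromUrl(img.url),
    })),
    has_nsfw_concepts: images.map(() => false), // one entry per image
    timings: { inference: 0 },
    seed: 0,
  };
}
```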
Client Configuration
Point the @fal-ai/client at aimock using requestMiddleware to
rewrite the proxy URL:
import { fal } from "@fal-ai/client";
fal.config({
requestMiddleware: fal.withMiddleware(
fal.withProxy({
targetUrl: "http://localhost:4005/fal", // aimock default port
})
),
});
The client sends the original target host (e.g. queue.fal.run) in the
x-fal-target-host header. aimock reads this header to decide whether to
handle the request as a queue operation, a sync run, or a storage call.
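Conceptually, that routing decision reduces to a lookup on the header value. The sketch below models the dispatch described above; the function and route names are illustrative, not aimock internals:

```typescript
// Sketch of x-fal-target-host dispatch; route labels are illustrative.
type FalRoute = "queue" | "sync" | "rest";

function routeFromTargetHost(targetHost: string | undefined): FalRoute | undefined {
  switch (targetHost) {
    case "queue.fal.run": return "queue"; // queue lifecycle: submit / status / result / cancel
    case "fal.run":       return "sync";  // synchronous single-shot run
    case "rest.fal.ai":   return "rest";  // storage and other REST endpoints
    default:              return undefined; // unknown or missing header
  }
}
```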
Queue Lifecycle
Queue-based operations follow a four-step lifecycle. aimock handles all steps automatically once a fixture is registered:
| Step | Method | Path | Response |
|---|---|---|---|
| Submit | POST | /fal/{owner}/{model} | { request_id, status_url, response_url, cancel_url } |
| Status | GET | /fal/{owner}/{model}/requests/{id}/status | { status, request_id, response_url, logs[] } — queue_position while pending, metrics.inference_time once COMPLETED |
| Result | GET | /fal/{owner}/{model}/requests/{id} | The matched fixture payload (200) once COMPLETED; the status body (202) before |
| Cancel | PUT | /fal/{owner}/{model}/requests/{id}/cancel | { status: "CANCELLED" } (200) before completion; { status: "ALREADY_COMPLETED" } (400) after |
| Submit (bad body) | POST | /fal/{owner}/{model} | 400 with { error: { code: "invalid_json", type: "invalid_request_error", message } } when the request body is not valid JSON |
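The table above can be condensed into a tiny in-memory model. This sketch follows the documented default where a job completes on submit; it is a mental model, not aimock's source:

```typescript
import { randomUUID } from "node:crypto";

// Minimal in-memory model of the queue lifecycle table above (illustrative only).
type JobStatus = "COMPLETED" | "CANCELLED";
const jobs = new Map<string, { status: JobStatus; payload: unknown }>();

function submit(owner: string, model: string, payload: unknown) {
  const request_id = randomUUID(); // auto-minted on submit
  jobs.set(request_id, { status: "COMPLETED", payload }); // default: complete on submit
  const base = `/fal/${owner}/${model}/requests/${request_id}`;
  return { request_id, status_url: `${base}/status`, response_url: base, cancel_url: `${base}/cancel` };
}

function result(request_id: string) {
  const job = jobs.get(request_id);
  if (!job) return { code: 404 as const };
  // 200 with the fixture payload once COMPLETED; 202 with the status body before.
  return job.status === "COMPLETED" ? { code: 200 as const, body: job.payload } : { code: 202 as const };
}

function cancel(request_id: string) {
  const job = jobs.get(request_id);
  if (!job) return { code: 404 as const };
  if (job.status === "COMPLETED") return { code: 400 as const, body: { status: "ALREADY_COMPLETED" } };
  job.status = "CANCELLED";
  return { code: 200 as const, body: { status: "CANCELLED" } };
}
```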
Polling Realism
By default a queued job completes on submit — status polls return
COMPLETED immediately and tests stay fast. To exercise client code that
reacts to IN_QUEUE / IN_PROGRESS (queue position decay, log
accumulation, latency metrics), pass falQueue with positive poll thresholds.
The job advances through the state machine over the configured number of
/status calls.
const mock = new LLMock({
port: 0,
falQueue: { pollsBeforeInProgress: 1, pollsBeforeCompleted: 2 },
});
mock.onFalImage(/flux/, { images: [{ url: "..." }] });
// Submit → IN_QUEUE, queue_position: 1
// status1 → IN_PROGRESS, queue_position: 0, logs[2]
// status2 → COMPLETED, metrics.inference_time set
// result → 200 with the matched payload
When only pollsBeforeInProgress is set,
pollsBeforeCompleted defaults to pollsBeforeInProgress + 1 so
the job always spends at least one poll in IN_PROGRESS. Set both explicitly
for full control.
If pollsBeforeCompleted is set lower than pollsBeforeInProgress,
it is clamped up so IN_PROGRESS is never skipped.
logs always contains at least one entry (job enqueued); a transition entry is
appended for each state change. Cancelling a job before completion sets status to
CANCELLED and subsequent polls keep reporting that state.
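The threshold rules above (default of pollsBeforeInProgress + 1, and clamping so IN_PROGRESS is never skipped) can be modeled as pure functions. A sketch under stated assumptions; the helper names are hypothetical:

```typescript
// Models the poll-threshold rules described above; illustrative, not aimock internals.
type QueueStatus = "IN_QUEUE" | "IN_PROGRESS" | "COMPLETED";

function resolveThresholds(opts: { pollsBeforeInProgress?: number; pollsBeforeCompleted?: number }) {
  // With no thresholds configured, the job completes on submit.
  if (opts.pollsBeforeInProgress === undefined && opts.pollsBeforeCompleted === undefined) {
    return { inProgress: 0, completed: 0 };
  }
  const inProgress = opts.pollsBeforeInProgress ?? 0;
  // Default pollsBeforeCompleted to inProgress + 1; clamp up so IN_PROGRESS is never skipped.
  const completed = Math.max(opts.pollsBeforeCompleted ?? inProgress + 1, inProgress + 1);
  return { inProgress, completed };
}

function statusAfterPolls(polls: number, t: { inProgress: number; completed: number }): QueueStatus {
  if (polls >= t.completed) return "COMPLETED";
  if (polls >= t.inProgress) return "IN_PROGRESS";
  return "IN_QUEUE";
}
```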
JSON Fixture File
{
"fixtures": [
{
"match": { "model": "fal-ai/flux/dev", "endpoint": "fal" },
"response": {
"json": {
"images": [{ "url": "https://example.com/result.png" }]
}
}
}
]
}
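A fixture's match block presumably compares the requested model (and endpoint) against each registered entry in order. The matcher below is a sketch inferred from the JSON example above; the field semantics and first-match behavior are assumptions:

```typescript
// Illustrative matcher for fixture entries like the JSON example above; assumptions, not aimock's code.
type Fixture = {
  match: { model: string | RegExp; endpoint?: string };
  response: { json: unknown };
};

function findFixture(fixtures: Fixture[], model: string, endpoint = "fal"): unknown | undefined {
  for (const f of fixtures) {
    const endpointOk = (f.match.endpoint ?? "fal") === endpoint;
    const modelOk = typeof f.match.model === "string"
      ? f.match.model === model   // exact match for string patterns
      : f.match.model.test(model); // regex match for RegExp patterns
    if (endpointOk && modelOk) return f.response.json; // first match wins (assumed)
  }
  return undefined;
}
```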
Record & Replay
Use --record with the providers.fal configuration to capture
real fal.ai responses and replay them in tests:
npx @copilotkit/aimock --record --fixtures fixtures/fal.json
When recording, the x-fal-target-host header is used to resolve the upstream
fal service automatically — no additional provider configuration is needed.
Responses are saved as fixtures that can be replayed without network access.
Legacy Routes
For backward compatibility, aimock also supports the older path-based routing convention
used by the audio-specific handler (fal-audio.ts):
- POST /fal/queue/submit/{model} — submit a queue job
- GET /fal/queue/requests/{id} — fetch the result
- POST /fal/run/{model} — synchronous run
These paths work identically to the header-routed equivalents and share the same fixture matching logic.
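Mapping those legacy paths onto the same three operations amounts to a small path parser. A sketch of that mapping, not aimock's routing code:

```typescript
// Sketch: map legacy path-based routes onto the same operations as header routing.
type LegacyOp =
  | { op: "submit"; model: string }  // POST /fal/queue/submit/{model}
  | { op: "result"; id: string }     // GET  /fal/queue/requests/{id}
  | { op: "run"; model: string };    // POST /fal/run/{model}

function parseLegacyPath(method: string, path: string): LegacyOp | undefined {
  let m: RegExpMatchArray | null;
  if (method === "POST" && (m = path.match(/^\/fal\/queue\/submit\/(.+)$/))) {
    return { op: "submit", model: m[1] };
  }
  if (method === "GET" && (m = path.match(/^\/fal\/queue\/requests\/([^/]+)$/))) {
    return { op: "result", id: m[1] };
  }
  if (method === "POST" && (m = path.match(/^\/fal\/run\/(.+)$/))) {
    return { op: "run", model: m[1] };
  }
  return undefined;
}
```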