fal.ai

aimock mocks the fal.ai inference API, covering queue-based and synchronous runs for image, video, audio, and any other fal model. It routes requests by the x-fal-target-host header, mirroring the @fal-ai/client proxy convention.

How It Works

The @fal-ai/client SDK routes requests through a proxy and uses the x-fal-target-host header to name the upstream fal service (for example, queue.fal.run).

aimock intercepts POST /fal/{owner}/{model} requests bearing this header and handles the full queue lifecycle: it auto-mints a request_id, returns status and response URLs, and serves the matched fixture payload on result fetch.
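As a rough illustration of the request shape described above, the sketch below builds a header-routed submit request by hand. The helper name buildFalSubmit, the base URL, and the owner/model values are illustrative assumptions and not part of aimock's API.

```typescript
// Hypothetical helper: builds the POST that a proxied fal client would send.
// The base URL and the "queue.fal.run" host value are assumptions for illustration.
function buildFalSubmit(
  baseUrl: string,
  owner: string,
  model: string,
  input: unknown,
): Request {
  return new Request(`${baseUrl}/fal/${owner}/${model}`, {
    method: "POST",
    headers: {
      "content-type": "application/json",
      // Names the upstream fal service the request was originally meant for.
      "x-fal-target-host": "queue.fal.run",
    },
    body: JSON.stringify(input),
  });
}

const submit = buildFalSubmit("http://localhost:4005", "fal-ai", "flux", {
  prompt: "a cat",
});
console.log(submit.method, submit.url, submit.headers.get("x-fal-target-host"));
```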

Legacy path-based routing is also supported — /fal/queue/submit/{model}, /fal/queue/requests/{id}, and /fal/run/{model} all continue to work for backward compatibility.

Quick Start (Programmatic)

fal-mock.test.ts ts
import { LLMock } from "@copilotkit/aimock";
import { describe, it, expect, beforeAll, afterAll } from "vitest";

let mock: LLMock;

beforeAll(async () => {
  mock = new LLMock();
  await mock.start();
});

afterAll(async () => {
  await mock.stop();
});

it("image generation via queue", async () => {
  // Register a queue fixture for Flux image generation
  mock.onFalQueue(/flux/, { images: [{ url: "https://example.com/cat.png" }] });

  // Submit → status → result, just like the real API
});

it("video generation via queue", async () => {
  mock.onFalQueue(/kling/, { video: { url: "https://example.com/v.mp4" } });
});

it("synchronous transcription", async () => {
  // Sync runs skip the queue entirely
  mock.onFalRun(/whisper/, { text: "Hello world" });
});

Typed Helpers: onFalImage / onFalVideo

onFalQueue takes a raw JSON payload — the exact bytes that come out of fal. When you want stronger types and don't want to hand-write the envelope, use the typed helpers: they accept the same ImageResponse / VideoResponse shapes you use with onImage / onVideo and translate them into fal's wire shape before storing.

typed.test.ts ts
// Equivalent to onFalQueue(..., { images: [...], timings, seed, has_nsfw_concepts, prompt })
mock.onFalImage(/flux/, {
  images: [{ url: "https://mock.fal.media/x.png" }],
});

// Equivalent to onFalQueue(..., { video: { url, content_type, file_name, file_size }, seed })
mock.onFalVideo(/kling/, {
  video: { id: "v1", status: "completed", url: "https://mock.fal.media/clip.mp4" },
});

Defaults filled in for image: width: 1024, height: 1024, content_type inferred from URL extension, has_nsfw_concepts: [false, …] (one per image), timings.inference: 0, seed: 0. For video: content_type + file_name inferred from URL, file_size: 0, seed: 0.
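The image defaults listed above can be pictured as a small translation step. This is only a model of the documented behavior, not aimock's actual code: the function names, the extension-to-MIME table, and the image/png fallback are all assumptions.

```typescript
// Illustrative model of the documented image defaults; not aimock's implementation.
interface FalImageInput {
  url: string;
  width?: number;
  height?: number;
  content_type?: string;
}

// Assumed extension-to-MIME table; the real inference rules may differ.
const MIME_BY_EXT: Record<string, string> = {
  png: "image/png",
  jpg: "image/jpeg",
  jpeg: "image/jpeg",
  webp: "image/webp",
};

function inferContentType(url: string, fallback: string): string {
  const ext = new URL(url).pathname.split(".").pop()?.toLowerCase() ?? "";
  return MIME_BY_EXT[ext] ?? fallback;
}

function toFalImagePayload(images: FalImageInput[]) {
  return {
    images: images.map((img) => ({
      url: img.url,
      width: img.width ?? 1024,
      height: img.height ?? 1024,
      // The "image/png" fallback is an assumption for URLs with no known extension.
      content_type: img.content_type ?? inferContentType(img.url, "image/png"),
    })),
    has_nsfw_concepts: images.map(() => false), // one flag per image
    timings: { inference: 0 },
    seed: 0,
  };
}

const payload = toFalImagePayload([{ url: "https://mock.fal.media/x.png" }]);
console.log(JSON.stringify(payload, null, 2));
```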

Client Configuration

Point the @fal-ai/client at aimock using requestMiddleware to rewrite the proxy URL:

fal-client-config.ts ts
// withMiddleware and withProxy are package-level exports, not methods on `fal`
import { fal, withMiddleware, withProxy } from "@fal-ai/client";

fal.config({
  requestMiddleware: withMiddleware(
    withProxy({
      targetUrl: "http://localhost:4005/fal", // aimock default port
    })
  ),
});

The client sends the original target host (e.g. queue.fal.run) in the x-fal-target-host header. aimock reads this header to decide whether to handle the request as a queue operation, a sync run, or a storage call.
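The header-based decision described above might look roughly like the following. Only queue.fal.run is named in this document; the fal.run and rest.alpha.fal.ai values, along with the function and type names, are assumptions made for the sake of the sketch.

```typescript
type FalRouteKind = "queue" | "run" | "storage" | "unknown";

// Hypothetical classifier. "queue.fal.run" comes from the text above;
// the sync and storage host names are assumed and may not match aimock.
function classifyFalHost(targetHost: string | null): FalRouteKind {
  switch (targetHost) {
    case "queue.fal.run":
      return "queue";
    case "fal.run":
      return "run";
    case "rest.alpha.fal.ai":
      return "storage";
    default:
      return "unknown";
  }
}

const incoming = new Headers({ "x-fal-target-host": "queue.fal.run" });
console.log(classifyFalHost(incoming.get("x-fal-target-host")));
```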

Queue Lifecycle

Queue-based operations follow a four-step lifecycle. aimock handles all steps automatically once a fixture is registered:

| Step | Method | Path | Response |
| --- | --- | --- | --- |
| Submit | POST | /fal/{owner}/{model} | { request_id, status_url, response_url, cancel_url } |
| Status | GET | /fal/{owner}/{model}/requests/{id}/status | { status, request_id, response_url, logs[] }, plus queue_position while pending and metrics.inference_time once COMPLETED |
| Result | GET | /fal/{owner}/{model}/requests/{id} | The matched fixture payload (200) once COMPLETED; the status body (202) before |
| Cancel | PUT | /fal/{owner}/{model}/requests/{id}/cancel | { status: "CANCELLED" } (200) before completion; { status: "ALREADY_COMPLETED" } (400) after |
| Submit (bad body) | POST | /fal/{owner}/{model} | 400 with { error: { code: "invalid_json", type: "invalid_request_error", message } } when the request body is not valid JSON |
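To make the lifecycle concrete, here is a minimal sketch that drives submit, status, and result by hand using the paths listed above. The function name, the maxPolls limit, and the injected fetchImpl parameter are assumptions; the fetch function is injected so the flow can be exercised with a stub as well as against a running mock.

```typescript
type FetchLike = (url: string, init?: RequestInit) => Promise<Response>;

// Hypothetical driver for submit -> status -> result.
// `modelPath` is "{owner}/{model}"; `maxPolls` is an arbitrary safety limit.
async function runFalQueueJob(
  fetchImpl: FetchLike,
  baseUrl: string,
  modelPath: string,
  input: unknown,
  maxPolls = 10,
): Promise<unknown> {
  const headers = {
    "content-type": "application/json",
    "x-fal-target-host": "queue.fal.run",
  };
  const root = `${baseUrl}/fal/${modelPath}`;

  // Submit: the mock mints a request_id for the job.
  const submitRes = await fetchImpl(root, {
    method: "POST",
    headers,
    body: JSON.stringify(input),
  });
  if (!submitRes.ok) throw new Error(`submit failed with ${submitRes.status}`);
  const { request_id } = (await submitRes.json()) as { request_id: string };

  // Status: poll until the job reports COMPLETED, then fetch the result.
  for (let poll = 0; poll < maxPolls; poll++) {
    const statusRes = await fetchImpl(`${root}/requests/${request_id}/status`, { headers });
    const { status } = (await statusRes.json()) as { status: string };
    if (status === "COMPLETED") {
      const resultRes = await fetchImpl(`${root}/requests/${request_id}`, { headers });
      return resultRes.json();
    }
    if (status === "CANCELLED") throw new Error("job was cancelled");
  }
  throw new Error(`job did not complete within ${maxPolls} polls`);
}
```

Passing the global fetch and the mock's base URL would run the same flow against a live aimock instance; passing a stub keeps a unit test free of network access.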

Polling Realism

By default a queued job completes on submit — status polls return COMPLETED immediately and tests stay fast. To exercise client code that reacts to IN_QUEUE / IN_PROGRESS (queue position decay, log accumulation, latency metrics), pass falQueue with positive poll thresholds. The job advances through the state machine over the configured number of /status calls.

polling.test.ts ts
const mock = new LLMock({
  port: 0,
  falQueue: { pollsBeforeInProgress: 1, pollsBeforeCompleted: 2 },
});
mock.onFalImage(/flux/, { images: [{ url: "..." }] });

// Submit  → IN_QUEUE,    queue_position: 1
// status1 → IN_PROGRESS, queue_position: 0, logs[2]
// status2 → COMPLETED,   metrics.inference_time set
// result  → 200 with the matched payload

When only pollsBeforeInProgress is set, pollsBeforeCompleted defaults to pollsBeforeInProgress + 1 so the job always spends at least one poll in IN_PROGRESS. Set both explicitly for full control.

If pollsBeforeCompleted is set lower than pollsBeforeInProgress, it is clamped up so IN_PROGRESS is never skipped.

logs always contains at least one entry (job enqueued); a transition entry is appended for each state change. Cancelling a job before completion sets status to CANCELLED and subsequent polls keep reporting that state.
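The threshold rules above can be modeled as a small pure function. This is a sketch of the documented behavior rather than aimock's implementation, and it makes one loud assumption: the clamp raises pollsBeforeCompleted to pollsBeforeInProgress + 1. The function and type names are hypothetical.

```typescript
type FalStatus = "IN_QUEUE" | "IN_PROGRESS" | "COMPLETED";

interface FalQueueOptions {
  pollsBeforeInProgress?: number;
  pollsBeforeCompleted?: number;
}

// Resolves the effective thresholds from the options.
// Assumption: clamping raises pollsBeforeCompleted to pollsBeforeInProgress + 1.
function resolveThresholds(opts: FalQueueOptions = {}) {
  const inProgress = opts.pollsBeforeInProgress ?? 0;
  const requested = opts.pollsBeforeCompleted;
  // No positive thresholds: the job completes on submit.
  if (inProgress <= 0 && (requested ?? 0) <= 0) {
    return { inProgress: 0, completed: 0 };
  }
  const completed = Math.max(requested ?? inProgress + 1, inProgress + 1);
  return { inProgress, completed };
}

// Status reported once `polls` calls to /status have been made (0 = at submit).
function statusAfter(polls: number, opts?: FalQueueOptions): FalStatus {
  const { inProgress, completed } = resolveThresholds(opts);
  if (polls >= completed) return "COMPLETED";
  if (polls >= inProgress) return "IN_PROGRESS";
  return "IN_QUEUE";
}

const configured = { pollsBeforeInProgress: 1, pollsBeforeCompleted: 2 };
console.log([0, 1, 2].map((n) => statusAfter(n, configured)));
```

With the configuration from the polling example, the model walks IN_QUEUE, IN_PROGRESS, COMPLETED across submit and two status polls, matching the commented trace in that example.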

JSON Fixture File

fixtures/fal.json json
{
  "fixtures": [
    {
      "match": { "model": "fal-ai/flux/dev", "endpoint": "fal" },
      "response": {
        "json": {
          "images": [{ "url": "https://example.com/result.png" }]
        }
      }
    }
  ]
}
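If fixture files are generated from code, a typed builder can keep them consistent. The types below are inferred only from the example above and are an assumption; aimock may accept additional match or response fields. The falFixture helper is hypothetical.

```typescript
// Assumed shape, inferred from the example fixture; not an official schema.
interface FalFixture {
  match: { model: string; endpoint: "fal" };
  response: { json: unknown };
}

interface FalFixtureFile {
  fixtures: FalFixture[];
}

// Hypothetical helper that wraps a raw fal payload in the fixture envelope.
function falFixture(model: string, json: unknown): FalFixture {
  return { match: { model, endpoint: "fal" }, response: { json } };
}

const fixtureFile: FalFixtureFile = {
  fixtures: [
    falFixture("fal-ai/flux/dev", {
      images: [{ url: "https://example.com/result.png" }],
    }),
  ],
};

// Writing the string to fixtures/fal.json is left to the caller.
const serialized = JSON.stringify(fixtureFile, null, 2);
console.log(serialized);
```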

Record & Replay

Use --record with the providers.fal configuration to capture real fal.ai responses and replay them in tests:

CLI sh
npx @copilotkit/aimock --record --fixtures fixtures/fal.json

When recording, the x-fal-target-host header is used to resolve the upstream fal service automatically — no additional provider configuration is needed. Responses are saved as fixtures that can be replayed without network access.

Legacy Routes

For backward compatibility, aimock also supports the older path-based routing convention used by the audio-specific handler (fal-audio.ts): /fal/queue/submit/{model}, /fal/queue/requests/{id}, and /fal/run/{model}.

These paths work identically to the header-routed equivalents and share the same fixture matching logic.