fal.ai
aimock mocks the fal.ai inference API — queue-based and synchronous runs for image,
video, audio, and any other fal model. Routes by x-fal-target-host header to
mirror the @fal-ai/client proxy convention.
How It Works
The @fal-ai/client SDK routes requests through a proxy using the
x-fal-target-host header to indicate the upstream fal service:
- queue.fal.run — queue-based operations (submit, poll status, fetch result)
- fal.run — synchronous single-shot runs
- rest.fal.ai — storage and other REST endpoints
aimock intercepts POST /fal/{owner}/{model} requests bearing this header and
handles the full queue lifecycle: it auto-mints a request_id, returns status
and response URLs, and serves the matched fixture payload on result fetch.
Legacy path-based routing is also supported —
/fal/queue/submit/{model}, /fal/queue/requests/{id}, and
/fal/run/{model} all continue to work for backward compatibility.
Quick Start (Programmatic)
import { LLMock } from "@copilotkit/aimock";
import { describe, it, expect, beforeAll, afterAll } from "vitest";
let mock: LLMock;
beforeAll(async () => {
mock = new LLMock();
await mock.start();
});
afterAll(async () => {
await mock.stop();
});
it("image generation via queue", async () => {
// Register a queue fixture for Flux image generation
mock.onFalQueue(/flux/, { images: [{ url: "https://example.com/cat.png" }] });
// Submit → status → result, just like the real API
});
it("video generation via queue", async () => {
mock.onFalQueue(/kling/, { video: { url: "https://example.com/v.mp4" } });
});
it("synchronous transcription", async () => {
// Sync runs skip the queue entirely
mock.onFalRun(/whisper/, { text: "Hello world" });
});
Typed Helpers: onFalImage / onFalVideo
onFalQueue takes a raw JSON payload — the exact bytes that come out of fal.
When you want stronger types and don't want to hand-write the envelope, use the typed
helpers: they accept the same ImageResponse /
VideoResponse shapes you use with onImage / onVideo
and translate them into fal's wire shape before storing.
// Equivalent to onFalQueue(..., { images: [...], timings, seed, has_nsfw_concepts, prompt })
mock.onFalImage(/flux/, {
images: [{ url: "https://mock.fal.media/x.png" }],
});
// Equivalent to onFalQueue(..., { video: { url, content_type, file_name, file_size }, seed })
mock.onFalVideo(/kling/, {
video: { id: "v1", status: "completed", url: "https://mock.fal.media/clip.mp4" },
});
Defaults filled in:
- Image: width: 1024, height: 1024, content_type inferred from the URL extension, has_nsfw_concepts: [false, …] (one per image), timings.inference: 0, seed: 0.
- Video: content_type and file_name inferred from the URL, file_size: 0, seed: 0.
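The image defaulting described above can be sketched as a plain function. This is an illustrative model of the documented behavior, not aimock's implementation; `fillImageDefaults` and `contentTypeFromUrl` are hypothetical names:

```typescript
// Illustrative sketch of the image defaults described above; not part of the aimock API.
type ImageInput = { url: string; width?: number; height?: number; content_type?: string };

function contentTypeFromUrl(url: string): string {
  const ext = url.split(".").pop()?.toLowerCase();
  const map: Record<string, string> = {
    png: "image/png", jpg: "image/jpeg", jpeg: "image/jpeg", webp: "image/webp",
  };
  return map[ext ?? ""] ?? "application/octet-stream";
}

function fillImageDefaults(images: ImageInput[]) {
  return {
    images: images.map((img) => ({
      url: img.url,
      width: img.width ?? 1024,
      height: img.height ?? 1024,
      content_type: img.content_type ?? contentTypeFromUrl(img.url),
    })),
    has_nsfw_concepts: images.map(() => false), // one entry per image
    timings: { inference: 0 },
    seed: 0,
  };
}
```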
Client Configuration
Point the @fal-ai/client at aimock using requestMiddleware to
rewrite the proxy URL:
import { fal } from "@fal-ai/client";
fal.config({
requestMiddleware: fal.withMiddleware(
fal.withProxy({
targetUrl: "http://localhost:4005/fal", // aimock default port
})
),
});
The client sends the original target host (e.g. queue.fal.run) in the
x-fal-target-host header. aimock reads this header to decide whether to
handle the request as a queue operation, a sync run, or a storage call.
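Conceptually, that routing decision reduces to a lookup on the header value. The sketch below models the dispatch described above; the function and route names are illustrative, not aimock internals:

```typescript
// Sketch of x-fal-target-host dispatch; route labels are illustrative.
type FalRoute = "queue" | "sync" | "rest";

function routeFromTargetHost(targetHost: string | undefined): FalRoute | undefined {
  switch (targetHost) {
    case "queue.fal.run": return "queue"; // queue lifecycle: submit / status / result / cancel
    case "fal.run":       return "sync";  // synchronous single-shot run
    case "rest.fal.ai":   return "rest";  // storage and other REST endpoints
    default:              return undefined; // unknown or missing header
  }
}
```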
Queue Lifecycle
Queue-based operations follow a four-step lifecycle. aimock handles all steps automatically once a fixture is registered:
| Step | Method | Path | Response |
|---|---|---|---|
| Submit | POST | /fal/{owner}/{model} | { request_id, status_url, response_url, cancel_url } |
| Status | GET | /fal/{owner}/{model}/requests/{id}/status | { status, request_id, response_url, logs[] } — queue_position while pending, metrics.inference_time once COMPLETED |
| Result | GET | /fal/{owner}/{model}/requests/{id} | The matched fixture payload (200) once COMPLETED; the status body (202) before |
| Cancel | PUT | /fal/{owner}/{model}/requests/{id}/cancel | { status: "CANCELLED" } (200) before completion; { status: "ALREADY_COMPLETED" } (400) after |
| Submit (bad body) | POST | /fal/{owner}/{model} | 400 with { error: { code: "invalid_json", type: "invalid_request_error", message } } when the request body is not valid JSON |
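The table above can be condensed into a tiny in-memory model. This sketch follows the documented default where a job completes on submit; it is a mental model, not aimock's source:

```typescript
import { randomUUID } from "node:crypto";

// Minimal in-memory model of the queue lifecycle table above (illustrative only).
type JobStatus = "COMPLETED" | "CANCELLED";
const jobs = new Map<string, { status: JobStatus; payload: unknown }>();

function submit(owner: string, model: string, payload: unknown) {
  const request_id = randomUUID(); // auto-minted on submit
  jobs.set(request_id, { status: "COMPLETED", payload }); // default: complete on submit
  const base = `/fal/${owner}/${model}/requests/${request_id}`;
  return { request_id, status_url: `${base}/status`, response_url: base, cancel_url: `${base}/cancel` };
}

function result(request_id: string) {
  const job = jobs.get(request_id);
  if (!job) return { code: 404 as const };
  // 200 with the fixture payload once COMPLETED; 202 with the status body before.
  return job.status === "COMPLETED" ? { code: 200 as const, body: job.payload } : { code: 202 as const };
}

function cancel(request_id: string) {
  const job = jobs.get(request_id);
  if (!job) return { code: 404 as const };
  if (job.status === "COMPLETED") return { code: 400 as const, body: { status: "ALREADY_COMPLETED" } };
  job.status = "CANCELLED";
  return { code: 200 as const, body: { status: "CANCELLED" } };
}
```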
Polling Realism
By default a queued job completes on submit — status polls return
COMPLETED immediately and tests stay fast. To exercise client code that
reacts to IN_QUEUE / IN_PROGRESS (queue position decay, log
accumulation, latency metrics), pass falQueue with positive poll thresholds.
The job advances through the state machine over the configured number of
/status calls.
const mock = new LLMock({
port: 0,
falQueue: { pollsBeforeInProgress: 1, pollsBeforeCompleted: 2 },
});
mock.onFalImage(/flux/, { images: [{ url: "..." }] });
// Submit → IN_QUEUE, queue_position: 1
// status1 → IN_PROGRESS, queue_position: 0, logs[2]
// status2 → COMPLETED, metrics.inference_time set
// result → 200 with the matched payload
When only pollsBeforeInProgress is set,
pollsBeforeCompleted defaults to pollsBeforeInProgress + 1 so
the job always spends at least one poll in IN_PROGRESS. Set both explicitly
for full control.
If pollsBeforeCompleted is set lower than pollsBeforeInProgress,
it is clamped up so IN_PROGRESS is never skipped.
logs always contains at least one entry (job enqueued); a transition entry is
appended for each state change. Cancelling a job before completion sets status to
CANCELLED and subsequent polls keep reporting that state.
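The threshold rules above (default of pollsBeforeInProgress + 1, and clamping so IN_PROGRESS is never skipped) can be modeled as pure functions. A sketch under stated assumptions; the helper names are hypothetical:

```typescript
// Models the poll-threshold rules described above; illustrative, not aimock internals.
type QueueStatus = "IN_QUEUE" | "IN_PROGRESS" | "COMPLETED";

function resolveThresholds(opts: { pollsBeforeInProgress?: number; pollsBeforeCompleted?: number }) {
  // With no thresholds configured, the job completes on submit.
  if (opts.pollsBeforeInProgress === undefined && opts.pollsBeforeCompleted === undefined) {
    return { inProgress: 0, completed: 0 };
  }
  const inProgress = opts.pollsBeforeInProgress ?? 0;
  // Default pollsBeforeCompleted to inProgress + 1; clamp up so IN_PROGRESS is never skipped.
  const completed = Math.max(opts.pollsBeforeCompleted ?? inProgress + 1, inProgress + 1);
  return { inProgress, completed };
}

function statusAfterPolls(polls: number, t: { inProgress: number; completed: number }): QueueStatus {
  if (polls >= t.completed) return "COMPLETED";
  if (polls >= t.inProgress) return "IN_PROGRESS";
  return "IN_QUEUE";
}
```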
JSON Fixture File
{
"fixtures": [
{
"match": { "model": "fal-ai/flux/dev", "endpoint": "fal" },
"response": {
"json": {
"images": [{ "url": "https://example.com/result.png" }]
}
}
}
]
}
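A fixture's match block presumably compares the requested model (and endpoint) against each registered entry in order. The matcher below is a sketch inferred from the JSON example above; the field semantics and first-match behavior are assumptions:

```typescript
// Illustrative matcher for fixture entries like the JSON example above; assumptions, not aimock's code.
type Fixture = {
  match: { model: string | RegExp; endpoint?: string };
  response: { json: unknown };
};

function findFixture(fixtures: Fixture[], model: string, endpoint = "fal"): unknown | undefined {
  for (const f of fixtures) {
    const endpointOk = (f.match.endpoint ?? "fal") === endpoint;
    const modelOk = typeof f.match.model === "string"
      ? f.match.model === model   // exact match for string patterns
      : f.match.model.test(model); // regex match for RegExp patterns
    if (endpointOk && modelOk) return f.response.json; // first match wins (assumed)
  }
  return undefined;
}
```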
Record & Replay
Use --record with the providers.fal configuration to capture
real fal.ai responses and replay them in tests:
npx @copilotkit/aimock --record --fixtures fixtures/fal.json
When recording, the x-fal-target-host header is used to resolve the upstream
fal service automatically — no additional provider configuration is needed.
Responses are saved as fixtures that can be replayed without network access.
Legacy Routes
For backward compatibility, aimock also supports the older path-based routing convention
used by the audio-specific handler (fal-audio.ts):
- POST /fal/queue/submit/{model} — submit a queue job
- GET /fal/queue/requests/{id} — fetch the result
- POST /fal/run/{model} — synchronous run
These paths work identically to the header-routed equivalents and share the same fixture matching logic.
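Mapping those legacy paths onto the same three operations amounts to a small path parser. A sketch of that mapping, not aimock's routing code:

```typescript
// Sketch: map legacy path-based routes onto the same operations as header routing.
type LegacyOp =
  | { op: "submit"; model: string }  // POST /fal/queue/submit/{model}
  | { op: "result"; id: string }     // GET  /fal/queue/requests/{id}
  | { op: "run"; model: string };    // POST /fal/run/{model}

function parseLegacyPath(method: string, path: string): LegacyOp | undefined {
  let m: RegExpMatchArray | null;
  if (method === "POST" && (m = path.match(/^\/fal\/queue\/submit\/(.+)$/))) {
    return { op: "submit", model: m[1] };
  }
  if (method === "GET" && (m = path.match(/^\/fal\/queue\/requests\/([^/]+)$/))) {
    return { op: "result", id: m[1] };
  }
  if (method === "POST" && (m = path.match(/^\/fal\/run\/(.+)$/))) {
    return { op: "run", model: m[1] };
  }
  return undefined;
}
```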