Text-to-Speech

The POST /v1/audio/speech endpoint returns audio data for a text input. It supports multiple output formats: mp3, opus, aac, flac, wav, and pcm.

Endpoint

Method	Path	Format
POST	/v1/audio/speech	JSON request, binary/base64 response

Unit Test: Basic Speech

Using the programmatic API with vitest, register a fixture and assert on the response.

speech-basic.test.ts ts

import { LLMock } from "@copilotkit/aimock";
import { it, expect, beforeAll, afterAll } from "vitest";

let mock: LLMock;

beforeAll(async () => {
  mock = new LLMock();
  await mock.start();
});

afterAll(async () => {
  await mock.stop();
});

it("returns audio for text input", async () => {
  mock.onSpeech("Hello world", { audio: "SGVsbG8gd29ybGQ=" });

  const res = await fetch(`${mock.url}/v1/audio/speech`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "tts-1",
      input: "Hello world",
      voice: "alloy",
    }),
  });

  expect(res.ok).toBe(true);
  const body = await res.json();
  expect(body.audio).toBe("SGVsbG8gd29ybGQ=");
});

Format Options

The response_format field in the request controls the audio output format. Supported values:

Format	Content-Type	Description
mp3	audio/mpeg	Default format, widely supported
opus	audio/opus	Low latency, good for streaming
aac	audio/aac	Preferred for mobile devices
flac	audio/flac	Lossless compression
wav	audio/wav	Uncompressed, no decoding overhead
pcm	audio/pcm	Raw samples, 24kHz 16-bit signed little-endian
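
The table above can be read as a simple lookup from response_format to Content-Type. The sketch below is illustrative only: SPEECH_CONTENT_TYPES and contentTypeFor are hypothetical names, not part of aimock's API, and throwing on an unknown value is an assumption about how a caller might want to handle it.

```typescript
// Hypothetical helper (not part of aimock): mirrors the format table above.
const SPEECH_CONTENT_TYPES = new Map<string, string>([
  ["mp3", "audio/mpeg"],
  ["opus", "audio/opus"],
  ["aac", "audio/aac"],
  ["flac", "audio/flac"],
  ["wav", "audio/wav"],
  ["pcm", "audio/pcm"],
]);

// Resolve the Content-Type for a request's response_format.
// An omitted value falls back to mp3, the default listed in the table.
function contentTypeFor(responseFormat?: string): string {
  const contentType = SPEECH_CONTENT_TYPES.get(responseFormat ?? "mp3");
  if (contentType === undefined) {
    // Assumption: unknown formats are rejected; aimock's behavior may differ.
    throw new Error(`Unsupported response_format: ${responseFormat}`);
  }
  return contentType;
}
```

A test could use such a helper to assert that the Content-Type header of a response matches the format it requested.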

JSON Fixture

fixtures/speech.json json

{
  "fixtures": [
    {
      "match": { "userMessage": "Hello world" },
      "response": {
        "audio": "SGVsbG8gd29ybGQ="
      }
    }
  ]
}

Response Format

Returns audio data matching the requested format:

audio — base64-encoded audio data in the fixture response
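
Because the audio field is base64 text, a test that needs raw bytes has to decode it first. A minimal sketch, assuming a Node.js runtime where Buffer is available; decodeFixtureAudio is a hypothetical helper name, not an aimock function.

```typescript
// Hypothetical helper: decode a fixture's base64 `audio` string into raw bytes.
// Assumes Node.js, where the global Buffer class is available.
function decodeFixtureAudio(audio: string): Uint8Array {
  return new Uint8Array(Buffer.from(audio, "base64"));
}

const bytes = decodeFixtureAudio("SGVsbG8gd29ybGQ=");
console.log(bytes.length); // 11
console.log(new TextDecoder().decode(bytes)); // "Hello world"
```

The sample payload used throughout this page decodes to the text "Hello world", so it is a placeholder rather than playable audio. That is enough for asserting on the mock's behavior, but a test that feeds the bytes to an audio decoder would need a real encoded clip.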

Speech fixtures use match.userMessage, which maps to the input field in the request body. The matcher checks for substring matches on the text to be spoken.
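
The matching rule can be pictured roughly as follows. This is a sketch under assumptions, not aimock's actual implementation: it assumes the fixture's userMessage must appear somewhere inside the request's input, that the comparison is case-sensitive, and that the first matching fixture wins. SpeechFixture and findSpeechFixture are hypothetical names.

```typescript
// Hypothetical shape of a speech fixture, modeled on the JSON fixture above.
interface SpeechFixture {
  match: { userMessage: string };
  response: { audio: string };
}

// Return the first fixture whose userMessage occurs inside the request input.
// Assumptions: case-sensitive comparison, first match wins.
function findSpeechFixture(
  fixtures: SpeechFixture[],
  input: string,
): SpeechFixture | undefined {
  return fixtures.find((fixture) => input.includes(fixture.match.userMessage));
}

const fixtures: SpeechFixture[] = [
  { match: { userMessage: "Hello world" }, response: { audio: "SGVsbG8gd29ybGQ=" } },
];

// The input contains "Hello world", so the fixture matches.
console.log(findSpeechFixture(fixtures, "Say: Hello world!")?.response.audio);
// No fixture text occurs in this input, so the result is undefined.
console.log(findSpeechFixture(fixtures, "Goodbye"));
```

Under this reading, a short userMessage matches many inputs, so more specific strings make fixtures easier to tell apart.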

Record & Replay

When no fixture matches an incoming request, aimock can proxy it to the real API and record the response as a fixture for future replays. Enable recording with the --record flag or via RecordConfig in the programmatic API. Binary audio from the provider is base64-encoded in the recorded fixture, with the format derived from the response Content-Type header (e.g. audio/mpeg for mp3). Subsequent requests replay the cached audio without hitting the real API.

CLI sh

npx -p @copilotkit/aimock llmock --record --provider-openai https://api.openai.com
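
To make the recording step concrete, the sketch below shows how a binary provider response could be turned into a fixture: the bytes are base64-encoded and the format is derived from the Content-Type header. It is an illustration under assumptions, not aimock's recorder. toRecordedFixture and FORMAT_BY_CONTENT_TYPE are hypothetical names, the format field in the result is an assumed field name, and Buffer requires a Node.js runtime.

```typescript
// Reverse of the format table: Content-Type -> format name.
const FORMAT_BY_CONTENT_TYPE = new Map<string, string>([
  ["audio/mpeg", "mp3"],
  ["audio/opus", "opus"],
  ["audio/aac", "aac"],
  ["audio/flac", "flac"],
  ["audio/wav", "wav"],
  ["audio/pcm", "pcm"],
]);

// Hypothetical recorder step: build a fixture from a proxied provider response.
function toRecordedFixture(
  input: string,
  audioBytes: Uint8Array,
  contentType: string,
) {
  // Drop any parameters such as "; charset=..." before the lookup.
  const mime = contentType.split(";")[0].trim().toLowerCase();
  return {
    match: { userMessage: input },
    response: {
      // Binary audio is stored as base64 text so it fits in a JSON fixture.
      audio: Buffer.from(audioBytes).toString("base64"),
      // Assumed field name; undefined when the Content-Type is unrecognized.
      format: FORMAT_BY_CONTENT_TYPE.get(mime),
    },
  };
}

const recorded = toRecordedFixture(
  "Hello world",
  new TextEncoder().encode("Hello world"), // stand-in for real audio bytes
  "audio/mpeg",
);
console.log(JSON.stringify(recorded, null, 2));
```

Storing the audio as base64 keeps the recorded fixture in the same JSON shape as hand-written ones, so replay does not need a separate code path for binary data.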