Text-to-Speech

The POST /v1/audio/speech endpoint returns audio data for text input. It supports six output formats: mp3, opus, aac, flac, wav, and pcm.

Endpoint

Method  Path              Format
POST    /v1/audio/speech  JSON request, binary/base64 response

Unit Test: Basic Speech

Using the programmatic API with vitest, register a fixture and assert on the response.

speech-basic.test.ts ts
import { LLMock } from "@copilotkit/aimock";
import { it, expect, beforeAll, afterAll } from "vitest";

let mock: LLMock;

beforeAll(async () => {
  mock = new LLMock();
  await mock.start();
});

afterAll(async () => {
  await mock.stop();
});

it("returns audio for text input", async () => {
  // Register a fixture: requests whose input contains "Hello world" get this base64 audio.
  mock.onSpeech("Hello world", { audio: "SGVsbG8gd29ybGQ=" });

  const res = await fetch(`${mock.url}/v1/audio/speech`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "tts-1",
      input: "Hello world",
      voice: "alloy",
    }),
  });

  expect(res.ok).toBe(true);
  const body = await res.json();
  expect(body.audio).toBe("SGVsbG8gd29ybGQ=");
});

Format Options

The response_format field in the request controls the audio output format. Supported values:

Format  Content-Type  Description
mp3     audio/mpeg    Default format, widely supported
opus    audio/opus    Low latency, good for streaming
aac     audio/aac     Preferred for mobile devices
flac    audio/flac    Lossless compression
wav     audio/wav     Uncompressed, no decoding overhead
pcm     audio/pcm     Raw samples, 24 kHz, 16-bit signed little-endian
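
The table above can be written as a lookup for tests that need to know which Content-Type corresponds to a given response_format. This is a minimal sketch: SPEECH_CONTENT_TYPES, isSpeechFormat, and expectedContentType are hypothetical names chosen for illustration and are not part of aimock's API.

```typescript
// Content types copied from the format table; all names here are illustrative only.
const SPEECH_CONTENT_TYPES = {
  mp3: "audio/mpeg",
  opus: "audio/opus",
  aac: "audio/aac",
  flac: "audio/flac",
  wav: "audio/wav",
  pcm: "audio/pcm",
} as const;

type SpeechFormat = keyof typeof SPEECH_CONTENT_TYPES;

// Narrow an arbitrary string to one of the six supported formats.
function isSpeechFormat(value: string): value is SpeechFormat {
  return Object.prototype.hasOwnProperty.call(SPEECH_CONTENT_TYPES, value);
}

// Resolve the expected Content-Type. mp3 is used when response_format is
// omitted, because the table lists it as the default format.
function expectedContentType(responseFormat: string = "mp3"): string {
  if (!isSpeechFormat(responseFormat)) {
    throw new Error(`Unsupported response_format: ${responseFormat}`);
  }
  return SPEECH_CONTENT_TYPES[responseFormat];
}
```

A test could compare expectedContentType("opus") against the response's Content-Type header, assuming the mock sets that header as listed in the table.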

JSON Fixture

fixtures/speech.json json
{
  "fixtures": [
    {
      "match": { "userMessage": "Hello world" },
      "response": {
        "audio": "SGVsbG8gd29ybGQ="
      }
    }
  ]
}
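
The audio value in a fixture is a base64 string. The sketch below decodes it into raw bytes in Node.js and shows that the sample value used on this page is placeholder data (the text "Hello world") rather than real audio. decodeFixtureAudio is a hypothetical helper name, not an aimock function.

```typescript
// Hypothetical helper: decode a fixture's base64 "audio" field into raw bytes.
// Uses Node.js Buffer, so it assumes a Node runtime.
function decodeFixtureAudio(audioBase64: string): Uint8Array {
  return new Uint8Array(Buffer.from(audioBase64, "base64"));
}

const sampleBytes = decodeFixtureAudio("SGVsbG8gd29ybGQ=");
console.log(sampleBytes.length); // 11
console.log(Buffer.from(sampleBytes).toString("utf8")); // Hello world
```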

Response Format

The endpoint returns audio data in the format requested via response_format.

Speech fixtures use match.userMessage, which maps to the input field in the request body. The matcher checks whether the fixture's userMessage occurs as a substring of the text to be spoken.
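
The substring rule can be illustrated as follows. This is not aimock's implementation: SpeechFixture and findSpeechFixture are hypothetical names, and the sketch assumes a case-sensitive comparison in which the first matching fixture wins.

```typescript
// Hypothetical fixture shape, mirroring the JSON fixture on this page.
interface SpeechFixture {
  match: { userMessage: string };
  response: { audio: string };
}

// Return the first fixture whose userMessage occurs inside the request's
// input text. Assumptions: case-sensitive comparison, first match wins.
function findSpeechFixture(
  fixtures: SpeechFixture[],
  input: string,
): SpeechFixture | undefined {
  return fixtures.find((fixture) => input.includes(fixture.match.userMessage));
}

const exampleFixtures: SpeechFixture[] = [
  { match: { userMessage: "Hello world" }, response: { audio: "SGVsbG8gd29ybGQ=" } },
];

// The input only needs to contain the fixture text, not equal it.
const matched = findSpeechFixture(exampleFixtures, "Hello world, how are you?");
const unmatched = findSpeechFixture(exampleFixtures, "Goodbye");
```

Under these assumptions, matched is the "Hello world" fixture and unmatched is undefined, which is the case where record and replay (if enabled) would take over.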

Record & Replay

When no fixture matches an incoming request, aimock can proxy it to the real API and record the response as a fixture for future replays. Enable recording with the --record flag or via RecordConfig in the programmatic API. Binary audio from the provider is base64-encoded in the recorded fixture, with the format derived from the response Content-Type header (e.g. audio/mpeg for mp3). Subsequent requests replay the cached audio without hitting the real API.

CLI sh
npx -p @copilotkit/aimock llmock --record --provider-openai https://api.openai.com
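
The recording step described above (base64-encode the binary audio and derive the format from the Content-Type header) can be sketched like this. Everything here is an assumption made for illustration: toRecordedFixture, RecordedSpeechFixture, and in particular the format field on the fixture are hypothetical, and the fixture shape aimock actually writes may differ.

```typescript
// Hypothetical shape of a recorded fixture; the "format" field is an assumption.
interface RecordedSpeechFixture {
  match: { userMessage: string };
  response: { audio: string; format?: string };
}

// Reverse of the format table: Content-Type to format name.
const FORMAT_BY_CONTENT_TYPE: Record<string, string> = {
  "audio/mpeg": "mp3",
  "audio/opus": "opus",
  "audio/aac": "aac",
  "audio/flac": "flac",
  "audio/wav": "wav",
  "audio/pcm": "pcm",
};

// Build a fixture from a proxied provider response (illustrative only).
function toRecordedFixture(
  input: string,
  audio: Uint8Array,
  contentTypeHeader: string,
): RecordedSpeechFixture {
  // Drop any parameters such as "; charset=..." before the lookup.
  const mimeType = (contentTypeHeader.split(";")[0] ?? "").trim().toLowerCase();
  const format = FORMAT_BY_CONTENT_TYPE[mimeType];
  return {
    match: { userMessage: input },
    response: {
      // Binary audio is stored as base64 so it fits in a JSON fixture file.
      audio: Buffer.from(audio).toString("base64"),
      // Only include format when the Content-Type is recognised.
      ...(format !== undefined ? { format } : {}),
    },
  };
}
```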